This documentation provides a comprehensive technical overview of the Skia Performance Dashboard (Perf), targeting software engineers new to the project. It focuses on the architectural rationale, data lifecycle, and core concepts beyond simple feature descriptions.
Skia Perf is a large-scale performance monitoring and regression detection platform. It ingests high-frequency telemetry data from diverse sources (Chrome, Android, Fuchsia, Skia), organizes it into searchable time-series “traces,” and automatically identifies performance regressions using statistical analysis.
The system is designed to handle millions of data points across years of history, translating non-linear source control history into linear, searchable performance trends.
Traces are identified by structured keys composed of key=value parameters (e.g., benchmark=motion_mark, bot=pixel_6, unit=ms). These parameters are indexed, which powers the query UI. Perf is built as a modular system where a single Go binary, perfserver, performs different roles based on its execution mode.
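To make the key structure concrete, here is a minimal sketch of parsing such a structured trace key into its parameters. The leading/trailing-comma convention is an assumption based on the key examples above, not a quote of the real tracestore code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTraceKey splits a structured trace key of the form
// ",benchmark=motion_mark,bot=pixel_6,unit=ms," into a parameter map.
// The surrounding commas are trimmed before splitting.
func parseTraceKey(key string) map[string]string {
	params := map[string]string{}
	for _, pair := range strings.Split(strings.Trim(key, ","), ",") {
		if kv := strings.SplitN(pair, "=", 2); len(kv) == 2 {
			params[kv[0]] = kv[1]
		}
	}
	return params
}

func main() {
	p := parseTraceKey(",benchmark=motion_mark,bot=pixel_6,unit=ms,")
	fmt.Println(p["benchmark"], p["bot"], p["unit"]) // prints "motion_mark pixel_6 ms"
}
```

Because every parameter is an indexed key=value pair, the query UI can offer drop-downs per parameter name rather than free-text search.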
[ Data Sources ]       [ Ingestion Pipeline ]       [ Storage Layer ]     [ Analysis & UI ]
       |                         |                         |                      |
[ GCS Buckets ] -----> [ perfserver ingest ] ----> [ Spanner/CDB ] <---- [ perfserver cluster ]
       |           (Parses JSON, Maps Commits)      (Trace Store)        (Finds Regressions)
       |                                                  ^
       |                                                  |
       v                                                  v
[ Git Repos ] --------------------------------------------+-------- [ perfserver frontend ]
                                                                       (Web UI & API)
Perf mandates a specific JSON schema for incoming data to decouple performance producers from the dashboard.
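A hedged sketch of what one ingestion file might look like. The field names are drawn from the demo-data section later in this document; the authoritative definition lives in FORMAT.md and formatSchema.json, and the hash and values here are made up:

```json
{
  "version": 1,
  "git_hash": "8dcc84f7dc8523dd90501a4feb1f632808337c34",
  "key": { "bot": "pixel_6", "branch": "main" },
  "results": [
    {
      "key": { "unit": "ms" },
      "measurements": {
        "test": [
          { "value": "encode", "measurement": 12.3 },
          { "value": "decode", "measurement": 8.7 }
        ]
      }
    }
  ]
}
```

The top-level key describes the environment; each entry in results contributes one or more trace points whose full identity is the union of the global key and the per-result key.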
Incoming files are written to GCS under a YYYY/MM/DD/HH directory structure. This allows the ingester to process files in chronological order and prevents GCS directory listing bottlenecks. The perfgit module monitors Git repositories to map every incoming git_hash to a CommitNumber; if the database is empty, it reconstructs this mapping from the Git source of truth on startup. Traces are stored in a specialized “tiled” format within the database (Google Cloud Spanner or CockroachDB).
Tiles are joined into a DataFrame for the UI. This design ensures O(1) lookup time for any specific point and linear scaling for range queries. Regression detection is not just simple thresholding; it uses shape-based analysis.
Each candidate step is scored as StepSize / LeastSquaresError; a high score means a clean, significant jump with low noise. The system supports two alerting strategies based on data density.
go/tracestore (Backend Storage): This module is the “brain” of data persistence. It manages the separation of trace values (the numbers) from the trace parameters (the metadata). It implements the logic for “joining” disparate tiles into a cohesive DataFrame for the frontend.
go/regression (Anomaly Logic): Coordinates the regression detection lifecycle. It pulls configurations (Alerts), fetches data from the tracestore, executes the clustering algorithms, and writes found anomalies to the regressionstore. It is responsible for the “Start-Status-Result” polling pattern used by the UI.
modules/explore-simple-sk (Frontend Orchestrator): The primary UI component for data interaction. It manages the reactive loop:
It delegates to plot-google-chart-sk to render the SVG lines.

go/notify (Alert Delivery): A modular system for dispatching alerts. It formats regression data into HTML or Markdown templates and interacts with external APIs like the Google Issue Tracker (Buganizer) or SMTP servers. It handles deduplication to ensure a single regression doesn't result in multiple redundant bugs.
When a user selects a filter in the UI:
1. stateReflector updates the URL with the new query.
2. explore-simple-sk calls /_/frame/start with the query.
3. dfbuilder identifies the necessary Tiles, fetches trace data, and applies any requested formulas (e.g., norm(), moving_average()).
4. The UI polls /_/frame/status until the data is ready.
5. The resulting DataFrame is passed to the chart, which translates CommitNumbers into X-coordinates using the ChartLayoutInterface.

Backups are taken with perf-tool database backup. Only “user-generated” data (Alerts, Regressions, Shortcuts) is backed up.

This module provides the operational glue for interacting with the CockroachDB cluster deployed within the Skia infrastructure. Rather than managing the database engine itself, this module focuses on providing developer and administrator access to the data layer for debugging, manual SQL intervention, and performance monitoring.
The module is designed around the principle of ephemeral, secure access to a distributed database running inside a Kubernetes environment (perf-cockroachdb). Because the database is not exposed directly to the public internet, the scripts implement two primary access patterns:
Direct SQL Execution (In-Cluster): The connect.sh script facilitates a “cloud-native” way to interact with the database. It spins up a temporary, short-lived Kubernetes pod running the official CockroachDB image. This pod connects to the perf-cockroachdb-public service internally. This approach is preferred for quick SQL queries as it avoids local toolchain dependencies and keeps the traffic entirely within the cluster's private network.
Local Tunneling (Port-Forwarding): For more complex operations—such as using a local IDE, a native cockroach binary, or accessing the web-based Admin UI—the module utilizes Kubernetes port-forwarding.
admin.sh bridges the local port 8080 to the database's status server. This allows developers to use a local browser to inspect cluster health, node status, and query performance metrics.

skia-infra-public-port-forward.sh establishes a tunnel from the local machine to the database's wire protocol port (26257). This is routed through the skia-infra-public context, enabling developers to use local CLI tools as if the database were running on localhost.

SQL Access Utilities (connect.sh, skia-infra-public-port-forward.sh): These scripts manage the lifecycle of a connection. The design choice to use the --insecure flag suggests that the cluster is configured to rely on network-level isolation and Kubernetes RBAC rather than client-side certificate management for these specific administrative entry points.
Observability Bridge (admin.sh): This component targets the CockroachDB built-in HTTP console. It automates the dual step of establishing the network tunnel and launching the browser, reducing the friction required to monitor the database during performance testing or troubleshooting.
The following diagram illustrates the lifecycle of a remote administrative session using the tunneling scripts:
[ Developer Machine ]                      [ Kubernetes Cluster (perf) ]
        |
 (1) Run admin.sh ----------------------> [ Pod: perf-cockroachdb-0 ]
        | <-- Port-Forwarding --            :8080 (HTTP)
        |
 (2) Browser opens localhost:8080
        |
 (3) Run skia-...-port-forward.sh ------> [ Pod: perf-cockroachdb-0 ]
        | <-- Port-Forwarding --            :26257 (SQL)
        |
 (4) Local SQL Client ------------------- connects to 127.0.0.1:25000
The scripts target the pod perf-cockroachdb-0 specifically for port-forwarding. This implies a StatefulSet deployment where the zero-ordinal pod acts as a reliable entry point for administrative tasks, even if the service itself is distributed across multiple nodes. The CockroachDB image is pinned to v19.2.5. This ensures compatibility with the server-side wire protocol and guarantees that the administrative environment is reproducible across different developer machines.

The /configs directory serves as the central repository for the operational environment definitions of Skia Perf. Each JSON file in this directory represents a unique instance of the Perf service, defining how it interacts with data stores, ingestion pipelines, version control systems, and alerting mechanisms.
These configurations are designed to be deserialized into the config.InstanceConfig Go struct, which acts as the “source of truth” for the application's behavior at runtime.
Perf is designed as a generic engine for time-series visualization and anomaly detection. Rather than hard-coding logic for different projects (like Chrome, Android, or V8), the system uses these configuration files to define the project-specific “shape” of data. This allows a single codebase to support diverse use cases, from local development to massive-scale production environments.
The module defines how performance data moves from build systems into the database.
Local ingestion can read from a directory on disk (source_type: "dir"). This is used in demo_spanner.json and local.json to allow developers to test the full ingestion stack without cloud dependencies. Through git_repo_config, instances specify how to parse commit positions (e.g., using commit_number_regex) to ensure the X-axis of performance graphs remains linear and meaningful across thousands of commits. The configurations also allow for fine-grained control over the underlying storage engine (primarily Google Cloud Spanner).
The tile_size parameter determines the granularity of data partitioning. Smaller tiles (e.g., 256) are optimized for sparse datasets or frequent small-range queries, whereas larger tiles (e.g., 8192) in high-traffic instances like Chrome minimize overhead during massive bulk ingestion. By setting level1_cache_key (often set to bot or benchmark), the system can pre-index and cache common query patterns in Redis or local memory.

Beyond simple visualization, these configurations integrate Perf into the broader CI/CD ecosystem:
[ Ingestion ] -> [ Trace Store ] -> [ Regression Detection ] -> [ Bug Filing ]
^ | |
| v v
[ Git Repo ] <---------------------- [ Anomaly Grouping ] -------- [ Issue Tracker ]
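The storage and caching parameters discussed above might be expressed like this in an instance file. The nesting shown is an illustrative assumption; consult the config.InstanceConfig struct for the exact shape:

```json
{
  "data_store_config": {
    "datastore_type": "spanner",
    "tile_size": 8192
  },
  "query_config": {
    "cache_config": {
      "level1_cache_key": "benchmark"
    }
  }
}
```

A high-traffic instance would pick the large tile_size shown here, while a sparse demo instance would use a much smaller value such as 256.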
Instances can set use_regression2_schema to enable advanced SQL-based anomaly tracking. The issue_tracker_config and notify_config blocks define the “where” and “how” of alerting, ranging from simple email notifications to automated bug creation in Google's Issue Tracker, including the use of specific API keys and secrets.

The top-level fields (e.g., instance_name, contact, favorites, extra_links) define the identity of the instance. The favorites and extra_links sections are particularly important for usability, allowing administrators to curate specific views or link to external documentation and dashboards directly within the Perf UI.
Data Store (data_store_config): This component defines the backend storage technology. While the project is shifting towards Spanner (as seen in demo_spanner.json and the /spanner subdirectory), the config remains flexible enough to define connection strings and database types, ensuring the application knows how to communicate with the PostgreSQL-compatible Spanner interface.
Query Configuration (query_config): This section controls how users interact with the data:
It enumerates the keys (e.g., benchmark, bot, test) that the UI should expose for filtering and searching.

local.json and demo_spanner.json are essential for the development lifecycle. They point to local data directories (./demo/data/) and use simplified auth schemes to allow developers to run the entire Perf stack on a single workstation for testing and debugging.

The /configs/spanner directory contains JSON configuration files for various Skia Perf instances that utilize Google Cloud Spanner as their primary data store. These configurations define how performance data is ingested, stored, queried, and reported for specific projects such as Chrome, Android, V8, Flutter, and Fuchsia.
Each file in this directory represents a distinct environment (production, experiment, or internal) for a performance monitoring dashboard. By using Cloud Spanner, these instances benefit from a horizontally scalable, globally consistent relational database, which is particularly suited for handling the high volume of “traces” (time-series performance data) generated by large-scale CI/CD systems.
The move to Spanner (referenced in these configs via the datastore_type: "spanner" and a PostgreSQL-compatible connection string) represents an architectural shift toward high-performance SQL-based storage for performance metrics.
The configurations use a “tile-based” storage approach, controlled by the tile_size parameter.
Many instances set enable_follower_reads. This improves read latency and reduces costs by allowing the application to read from Spanner replicas that might be slightly behind the leader, which is acceptable for dashboard visualization.

Data flow is standardized across instances using a Google Cloud Storage (GCS) to Pub/Sub pipeline.
[ Build System ] -> [ GCS Bucket ] -> [ Pub/Sub Topic ] -> [ Perf Ingestion Service ] -> [ Spanner DB ]
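In the instance configs, this pipeline is wired up by a handful of fields. The sketch below uses the field names mentioned in this section, but the nesting and the bucket/topic names are illustrative assumptions, not copied from a real instance file:

```json
{
  "ingestion_config": {
    "source_config": {
      "source_type": "gcs",
      "sources": ["gs://example-perf-bucket/ingest"],
      "topic": "example-perf-ingestion",
      "subscription": "example-perf-ingestion-sub",
      "dl_topic": "example-perf-ingestion-dead-letter",
      "dl_subscription": "example-perf-ingestion-dead-letter-sub"
    }
  }
}
```

The dead-letter topic and subscription give failed files a second queue rather than silently dropping them.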
The ingestion source type is gcs, identifying the bucket where performance JSON files are uploaded. The topic and subscription fields define the “push” mechanism that triggers ingestion as soon as new data arrives in GCS, while dl_topic and dl_subscription handle failed ingestion attempts without losing data.

The configurations also define how the system reacts to performance regressions:
Notification types include markdown_issuetracker (for automated bug creation), html_email, or anomalygroup (which clusters related regressions before alerting). enable_sheriff_config allows these instances to pull alert thresholds and ownership data from a central management system. Several instances set use_regression2_schema: true and fetch_anomalies_from_sql: true, indicating a transition to a more robust, queryable SQL schema for tracking performance changes over time.

Git Repository Configuration (git_repo_config): Determines how commits are mapped to performance data.
The repository is typically accessed via gitiles, which is optimized for Google-hosted source code. v8 and chrome use commit_number_regex to extract “Commit Positions” (e.g., refs/heads/main@{#12345}), which are used as a linear X-axis instead of raw Git hashes.

Query Configuration (query_config): Customizes the UI and discovery experience for each project's unique metric structure.
It defines which keys (e.g., benchmark, bot, test, subtest_1) are indexed and searchable in the Perf UI. Instances such as android2 use this to automatically select specific stats (like min for timeNs) when a user selects a certain metric, reducing the manual effort required to find meaningful data. A cache (cache_config) stores common query results, specifically targeting level1_cache_key (usually benchmark) to speed up dashboard loading.

Temporal Configuration (temporal_config): Specific to internal Chrome and Fuchsia instances, this links the Perf dashboard to Temporal, a workflow orchestration engine. This is used to trigger automated “bisects” (pinpoint_task_queue), a process that automatically finds the exact CL responsible for a performance regression.
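The commit_number_regex mechanism described above can be sketched in a few lines of Go. The pattern below is a plausible example for Chromium-style commit-position footers; each instance configures its own regex:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// commitPosition pulls the linear "commit position" out of a footer such
// as "Cr-Commit-Position: refs/heads/main@{#12345}". The exact pattern a
// real instance uses lives in its git_repo_config.
var commitPosition = regexp.MustCompile(`refs/heads/main@\{#(\d+)\}`)

// commitNumber returns the extracted position and whether one was found.
func commitNumber(footer string) (int, bool) {
	m := commitPosition.FindStringSubmatch(footer)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	return n, err == nil
}

func main() {
	n, ok := commitNumber("Cr-Commit-Position: refs/heads/main@{#12345}")
	fmt.Println(n, ok) // prints "12345 true"
}
```

Using the extracted integer instead of the raw Git hash is what keeps the X-axis linear even when the underlying history is not.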
Public instances (e.g., chrome-public.json, v8-public.json) are the primary dashboards used by developers. Internal instances (e.g., chrome-internal.json, eskia-internal.json) are restricted instances for proprietary code or sensitive performance metrics. Experimental instances (e.g., v8-internal-autopush.json, chrome-internal-experiment.json) are testing grounds for new Perf UI features or experimental Spanner schemas.

The /coverage module provides a comprehensive quality assurance suite for the Perf project. Instead of relying on a single metric, it implements a “triangulated” approach to code health by measuring type safety, test execution coverage, and test effectiveness through mutation testing.
The primary goal of this module is to generate actionable reports that reside in a unified dashboard, allowing developers to identify not just untested code, but also “weakly” tested code or type-system gaps.
The module is structured around three distinct methodologies for evaluating the codebase:
Type coverage reports where an any type or missing annotations might be bypassing the safety checks of the compiler. This ensures that the codebase remains maintainable and less prone to runtime errors. Test coverage uses c8 and mocha to track which lines and branches of code are executed during unit tests. This is the traditional metric for identifying “dead” zones in the test suite. Mutation testing deliberately introduces small changes into the code (e.g., flipping > to <). If the test suite still passes despite these changes, the mutation “survived,” indicating that the tests are not sensitive enough to detect logic regressions in that area.

perf-coverage.sh (The Orchestrator): This script acts as the central entry point for generating coverage reports. It encapsulates the complex CLI arguments required for various tools, ensuring consistency between local execution and CI/CD pipelines. It allows for targeted runs (e.g., only running mutation tests) or a full suite execution.
add-coverage-links.py (Navigation Post-Processor): Most coverage tools generate static HTML reports that are isolated from one another. This script uses BeautifulSoup to programmatically inject a navigation header (“Back to Perf Coverage Dashboard”) into the generated HTML files. This transforms a collection of disparate reports into a cohesive, navigable documentation site. It handles idempotency by removing existing links before inserting new ones to prevent duplicate UI elements during re-runs.
stryker.config.json: Configures the Mutation Testing framework. It is specifically tuned to exclude Puppeteer (integration) tests and page objects, focusing purely on the core business logic within perf/modules. It balances performance and thoroughness by defining precise ignorePatterns.
tsconfig.coverage.json: A specialized TypeScript configuration used specifically for type-coverage reporting. It extends the base project configuration but restricts the scope to source files, excluding tests and demo files to ensure the reported coverage percentage reflects the production logic accurately.
The following diagram illustrates how the module transforms source code and tests into a unified quality dashboard:
Source Code + Tests
  |
  |----(typescript-coverage-report)----> [Type Coverage HTML]
  |
  |----(c8 + mocha)--------------------> [Test Coverage HTML]
  |
  |----(stryker)-----------------------> [Mutation Report HTML]
                                                  |
                                                  v
                                         [Raw HTML Reports]
                                                  |
                        (add-coverage-links.py: injects Navigation UI)
                                                  v
                                    [Unified Coverage Dashboard]
Mutation testing excludes *_puppeteer_test.ts. This is a deliberate choice to keep the coverage feedback loop fast: integration tests are often too slow and “noisy” for mutation testing, which requires thousands of test re-runs. The post-processor parses reports with lxml to ensure that even if the reporting tools produce slightly malformed HTML, the navigation links can be reliably injected without corrupting the reports. By combining tsconfig checks with stryker runtime analysis, the module covers the entire lifecycle of code reliability, from compile-time correctness to runtime logic validation.

csv2days is a command-line utility designed to post-process CSV files exported from Skia Perf. Its primary purpose is to aggregate time-series data from sub-day granularity (RFC3339 timestamps) into daily granularity.
When exporting data from Perf, a CSV may contain multiple columns representing different data points collected on the same calendar day. This granularity can be excessive for certain types of reporting or spreadsheet analysis. csv2days simplifies these files by collapsing all columns belonging to the same date into a single column.
When multiple columns from the same day are merged, the tool must decide how to represent the data for that day. csv2days implements a Max strategy. For any set of columns being collapsed, the tool calculates the maximum numerical value across those columns for each row.
This decision is rooted in the common use case of monitoring performance metrics where the “peak” value for a day is often more significant than an average or a sum, particularly when dealing with sparse data where different columns represent different runs of the same task. If a value cannot be parsed as a float, the tool defaults to the first available string in that “run” of columns.
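The Max strategy with its string fallback can be sketched as a small helper. This is an illustration of the behavior described above, not the actual csv2days implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// collapseRun reduces the cells belonging to one calendar day (a "run"
// of columns) to a single value using the Max strategy: the largest
// parseable float wins. If no cell parses as a float, the first
// non-empty string is used instead.
func collapseRun(cells []string) string {
	best, found := 0.0, false
	for _, c := range cells {
		if v, err := strconv.ParseFloat(c, 64); err == nil {
			if !found || v > best {
				best, found = v, true
			}
		}
	}
	if found {
		return strconv.FormatFloat(best, 'g', -1, 64)
	}
	for _, c := range cells {
		if c != "" {
			return c
		}
	}
	return ""
}

func main() {
	fmt.Println(collapseRun([]string{"10", "20", "15"})) // prints "20"
	fmt.Println(collapseRun([]string{"n/a", ""}))        // prints "n/a"
}
```

Applied row by row across each day's run of columns, this produces the single daily column the tool emits.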
The transformation logic is strictly driven by the headers of the CSV. The tool assumes that the CSV contains a horizontal timeline where headers follow the RFC3339 format.
Headers belonging to the same day are truncated to their YYYY-MM-DD portion.

main.go: The core logic resides in transformCSV. The process follows a specific pipeline:
The transformed CSV is written to stdout.

The following diagram illustrates how multiple timestamped columns are collapsed into a single date column:
INPUT CSV:
  [Header] | Key | 2023-01-01T08:00Z | 2023-01-01T12:00Z | 2023-01-02T09:00Z |
  [Row 1]  | A   | 10                | 20                | 15                |

PROCESS:
  1. Identify 2023-01-01 columns as a "Run"
  2. Calculate Max(10, 20) for Row 1 -> 20
  3. Truncate headers to YYYY-MM-DD
  4. Remove redundant indices

OUTPUT CSV:
  [Header] | Key | 2023-01-01 | 2023-01-02 |
  [Row 1]  | A   | 20         | 15         |
The tool requires an input file specified by the --in flag and outputs the transformed CSV directly to standard output:
csv2days --in=perf_export.csv > daily_summary.csv
The /demo module provides a self-contained environment for generating and storing synthetic performance data. Its primary purpose is to demonstrate the capabilities of the Perf system—such as anomaly detection, regression tracking, and trend visualization—without requiring a live production environment.
The module is designed to work in tandem with the perf-demo-repo, mapping performance metrics to specific Git commits within that repository.
Rather than providing static files that might become stale, the module includes a data generator (generate_data.go) that programmatically creates JSON files following the format.Format schema.
The generation logic is intentionally designed to simulate real-world performance scenarios:
1. Data Generator (generate_data.go) This is a Go binary responsible for creating the data/ directory. It iterates through a hardcoded list of Git hashes from the demo repository and generates a JSON payload for each.
It imports perf/go/ingest/format directly, ensuring the generated data is always compatible with the system's ingestion requirements. The payloads vary parameters (e.g., bot, benchmark, units) to demonstrate how Perf can pivot and filter data across different environmental facets.

2. Storage (/demo/data) This directory acts as a mock “data lake.” It contains the JSON output of the generator.
It is consumed via the ingestion source type dir. This replicates a simple filesystem-based ingestion workflow where a watcher monitors a directory for new performance results. Files are named sequentially (demo_data_commit_N.json) to provide a clear chronological lineage for performance trends.

The workflow follows a path from source code state to visual representation:
[ Git Commits ]          [ generate_data.go ]          [ /demo/data/*.json ]
(perf-demo-repo) ------> (Logic + Randomness) ------> (Structured JSONs)
                                                              |
                                                              | (Ingested by Perf)
                                                              v
                                                    [ Perf UI / Alerting ]
                                                    - Detect "encode" spike
                                                    - Graph "ms" vs "kb"
Each file reports multiple result groups (e.g., one for time in ms and one for memory in kb). This illustrates how a single ingestion event can update multiple disparate metrics (CPU vs. RAM) simultaneously. The architecture (x86) and the branch (master) are stored in a top-level Key map. This allows the Perf system to index these files efficiently and enables users to compare performance across different hardware configurations or branches. The use of SingleMeasurement objects (mapping test categories to specific operations like encode or decode) provides a granular view, allowing the system to track specific sub-routines within a larger benchmark.

The /demo/data directory serves as the primary storage for performance benchmark results within the project. It contains a collection of JSON files, each representing a “point-in-time” snapshot of performance metrics associated with specific Git commits. This structured data allows for regression tracking, performance analysis over time, and cross-platform comparisons.
The module follows a standardized JSON schema (Version 1) designed to decouple the environmental metadata from the actual performance measurements. This structure ensures that as new benchmarks or hardware bots are added, the reporting format remains consistent.
1. Metadata and Context Each file identifies the specific build and environment that produced the results:
git_hash: The unique identifier for the source code state. key: A set of environmental descriptors including the benchmark ID, the hardware architecture/platform (bot), and the project branch (master).

2. Result Grouping Performance data is grouped within the results array. This grouping strategy is chosen to allow a single commit to report multiple categories of metrics (e.g., time-based vs. size-based) in a single transaction. Each result group is defined by its own key (typically defining the units).
3. Measurement Hierarchy Inside each result group, the measurements object maps specific test categories to an array of values.
test: The common category for operational metrics. value: The specific operation performed (e.g., “encode”, “decode”). measurement: The raw numerical data point.

4. Extensibility via Links The schema supports optional links objects at both the global and measurement levels. This design allows for traceability, enabling tools to link a specific performance outlier directly to external logs, search queries, or profiling reports.
The data is intended to be consumed by a visualization or monitoring system. The relationship between the files reflects a chronological progression of the codebase:
[ Commit Hash A ] -> [ Commit Hash B ] -> [ Commit Hash C ]
       |                   |                   |
       v                   v                   v
+--------------+    +--------------+    +--------------+
| JSON Data 1  |    | JSON Data 2  |    | JSON Data 3  |
| (encode: X)  |    | (encode: Y)  |    | (encode: Z)  |
+--------------+    +--------------+    +--------------+
       |                   |                   |
       +---------+---------+---------+---------+
                           |
                           v
             [ Performance Trend Graph ]
             (Detection of Regressions)
Units are carried in the key object rather than a hardcoded field. This allows the reporting logic to be agnostic of what is being measured (e.g., milliseconds, kilobytes, or operations per second). Using structured measurement objects (value and measurement) rather than a simple map allows for the future inclusion of per-measurement metadata, such as the links found in demo_data_commit_4.json.

The /docs module serves as the central knowledge repository and architectural blueprint for the Skia Performance Dashboard (Perf). It acts as the “source of truth” for the system’s design, data protocols, and operational procedures. Beyond simple user guides, this module defines the rigid contracts required for cross-project data ingestion and the multi-service architecture that enables performance regression detection at scale.
The structure of the /docs module reflects a design philosophy where documentation is treated with the same rigor as source code:
Specifications like FORMAT.md (defining the nanobench JSON structure) exist to decouple data producers from the dashboard. Because Perf ingests data from diverse ecosystems (Fuchsia, Chrome, Android), a strictly versioned and documented format allows the ingestion pipeline to remain generic and stable while producers evolve independently. Architectural context is captured in ai_generated_doc.md. This reduces the onboarding barrier for a highly fragmented microservice architecture.

Architecture Documentation (ai_generated_doc.md): This component acts as the primary technical manual for the entire project. It documents the “why” behind the most significant architectural decisions, such as:
perfserver is a single binary that handles ingestion, clustering, and frontend serving through different flags to simplify containerized deployments.

Data Format Specification (FORMAT.md): This file is the definitive specification for how performance data must be structured before it reaches the dashboard. It defines the hierarchical relationships within the format.
API Documentation (API.md): Defines the programmatic interfaces for external interactions, primarily focusing on alert management. This allows automated tools to create, list, or update alerts without human intervention, facilitating a “monitoring-as-code” workflow.
The documentation defines how data moves through the system, ensuring that every component adheres to the documented state transitions:
PRODUCERS (Fuchsia, Chrome, CI)
  |
  | 1. Format raw data into 'nanobench' JSON (per FORMAT.md)
  v
STORAGE (Google Cloud Storage)
  |
  | 2. Organized by YYYY/MM/DD/HH structure
  v
INGESTION SERVICE (perfserver ingest)
  |
  | 3. Validate against formatSchema.json (in /go/ingest/format)
  | 4. Resolve Git Hash to Commit Number (via /go/git)
  v
DATABASE (Google Cloud Spanner)
  |
  | 5. Store TraceValues and inverted ParamSets (per /go/sql/spanner)
  v
ANALYSIS ENGINE (perfserver cluster)
  |
  | 6. Group similar traces using k-means (per /go/clustering2)
  | 7. Fit Step Functions to detect regressions (per /go/stepfit)
  v
USER INTERFACE (perf.skia.org)
  |
  | 8. Visualize via explore-simple-sk and handle triage
The /docs module provides the necessary context to understand how the various sub-directories function as a unified whole:
It complements the /configs module by explaining how JSON configuration files translate to specific Perf instance identities. It contextualizes /go, particularly the complex relationship between the tracestore and regression packages. It frames /modules, where UI elements like chart-tooltip-sk and anomalies-table-sk are orchestrated to provide a cohesive exploration experience.

Backend (/go): The /go directory contains the core backend services, data processing pipelines, and administrative tools for Skia Perf, a large-scale performance monitoring and regression detection platform.
Skia Perf is designed to ingest high-frequency telemetry data from diverse sources (Chrome, Android, Fuchsia, Skia), organize it into searchable time-series “traces,” and automatically identify performance regressions.
The architecture follows a specialized “tiled” storage model to handle millions of data points across years of history. It decouples the heavy analytical tasks—like k-means clustering and step-fit detection—from the user-facing web interface, utilizing an asynchronous “Start-Status-Result” pattern to maintain responsiveness.
A fundamental design choice in Perf is the translation of non-linear Git history into a linear, integer-based coordinate system (CommitNumber).
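Because commits are addressed by a dense integer, locating a commit's tile is simple integer arithmetic. A minimal sketch, assuming a fixed tile size (8192 is one of the values this document mentions, not a universal constant):

```go
package main

import "fmt"

// CommitNumber is the dense, linear index assigned to each commit.
type CommitNumber int32

// TileNumber identifies a fixed-size partition of the commit timeline.
type TileNumber int32

// tileNumber maps a commit to the tile that stores its trace values.
func tileNumber(c CommitNumber, tileSize int32) TileNumber {
	return TileNumber(int32(c) / tileSize)
}

// offsetInTile locates the commit's column within its tile.
func offsetInTile(c CommitNumber, tileSize int32) int32 {
	return int32(c) % tileSize
}

func main() {
	const tileSize = 8192
	c := CommitNumber(100000)
	fmt.Println(tileNumber(c, tileSize), offsetInTile(c, tileSize)) // prints "12 1696"
}
```

A range query only needs to touch the tiles whose number falls between the tile of the first and last requested commits, which is what keeps range scans linear in the size of the window rather than the size of the history.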
Traces are partitioned into fixed-size tiles; tracestore and dfbuilder operate on these tiles to fetch only the temporal slices necessary for a given request, preventing memory exhaustion. The system is highly multi-tenant: a single binary supports vastly different projects (e.g., v8 vs android) by interpreting an InstanceConfig.
The config and validate modules perform semantic checks (e.g., dry-running Go templates and pre-compiling Regex) at startup to ensure the instance is logically sound before it handles data. Alert definitions can be kept in version control (sheriffconfig) and synchronized into the operational database, allowing teams to manage monitoring thresholds via standard code reviews.

Heavy operations like regression detection and bisection are managed via Temporal workflows (workflows).
The progress and dfiter modules facilitate a pattern where the frontend triggers a task and polls for updates, allowing the backend to handle long-running computations outside the HTTP request lifecycle.

The project is organized into functional layers:
- ingest & process: The continuous pipeline that monitors sources (GCS/PubSub), parses incoming files (parser), and populates the database.
- tracestore & sqltracestore: The low-level persistence layer. It separates numeric values from metadata (traceparamstore) to optimize query performance.
- perfgit: Manages the mapping between Git hashes and the internal CommitNumber timeline.
- regression: The central engine that coordinates regression detection across the commit history.
- clustering2 & kmeans: Implements shape-based grouping to find similar performance shifts across disparate tests.
- stepfit: Provides the mathematical logic for identifying “steps” (sudden jumps or drops) in individual traces.
- samplestats: Conducts statistical tests (Mann-Whitney U, Welch's T-test) to compare “before” and “after” samples.
- frontend: The web server orchestrator. It manages authentication, serves the UI, and coordinates between various backend stores.
- dataframe & dfbuilder: Constructs the matrix-like data structures used by the UI to render graphs and tables.
- ui/frame: The “brain” of the exploration page, handling complex query resolution and formula calculations (calc).
- notify: A modular delivery system that formats regressions into human-readable messages (HTML/Markdown) and dispatches them to Email or Issue Trackers.
- anomalygroup: Aggregates individual regressions into logical “groups” to prevent alert fatigue and streamline bisection.
- issuetracker: A high-level client for the Google Issue Tracker, automating bug filing and status updates.
- perf-tool: A Swiss-army-knife CLI for administrators to perform re-ingestion, backups, and database migrations.
- maintenance: A dedicated service role for background tasks like schema migration, cache warming (psrefresh), and data retention.
- ts: Automates the generation of TypeScript definitions from Go structs to ensure frontend/backend type safety.

This workflow illustrates how a single performance measurement moves from a test bot to a user's screen.
[ Test Bot ] --(JSON)--> [ GCS Bucket ]
                              |
                    (PubSub Notification)
                              |
                              v
                      [ Ingest Worker ] ----> [ Parser / Filter ]
                              |                      |
                              |  (CommitNumber) <--- [ perfgit ]
                              v                      |
                      [ TraceStore ] <---------------+
                              |
                      [ dfbuilder ] <--- [ Frontend Query ]
                              |
                              v
                      [ DataFrame ] ----> [ Web UI (Graph) ]
This workflow shows how new data triggers automated analysis and alerting.
[ Ingest Event ]
       |
       v
[ continuous/Detector ] ----> [ alert/ConfigProvider ]
       |                               |
       | (Fetch Alert Configs) <-------'
       v
[ regression/Detector ] ----> [ clustering2 / stepfit ]
       |                               |
       | (Anomalies Found) <-----------'
       v
[ anomalygroup ] ------------> [ notify ]
       |                          |
       | (Merge into Group)       |-- [ Email ]
       |                          |-- [ IssueTracker ]
       v                          `-- [ Pinpoint (Bisection) ]
[ Temporal Workflow ]
- /proto: Defines the gRPC and storage contracts used for cross-service communication.
- infra/go/sql: Provides underlying SQL pooling and timeout management.
- infra/go/pubsub: Manages the event-driven triggers for the ingestion pipeline.

The alertfilter module provides a centralized set of constants used to define the scope of alert visibility within the Skia Perf application. It acts as a shared vocabulary between the backend logic that queries alert configurations and the frontend components that allow users to toggle between different views of those alerts.
The primary design goal of this module is to eliminate “magic strings” and ensure consistency across the Perf codebase when filtering alerts. Alerts in Perf can be numerous, often belonging to different teams or individual developers. To make the system manageable, the UI provides mechanisms to filter these alerts based on ownership.
By defining these modes as shared constants, the system guarantees that every layer, from the frontend toggle to the backend query, interprets a filter value the same way.
The module currently defines two primary filtering modes:
- ALL: Represents a global view. This mode is used when a user or a service needs to inspect every active alert configuration within the instance, regardless of who created them or who is listed as the owner.
- OWNER: Represents a personalized view. This mode restricts the alert list to those specifically associated with the authenticated user. This is the primary mechanism for reducing noise in the dashboard, allowing developers to focus on the performance regressions they are directly responsible for.

When a user interacts with the Perf alert dashboard, the filtering logic typically follows this flow:
User Interface            Backend Handler           Database/Store
+--------------+          +-----------------+       +------------------+
| Select View  |          | Validate Filter |       | Query Alerts     |
| (ALL/OWNER)  |--------> | using constants |-----> | with Filter Type |
+--------------+          +-----------------+       +------------------+
                                                            |
                                                            v
                                                  [Result Set Filtered]
This simple constant-based approach ensures that the “intent” of the user's filter is preserved and correctly interpreted as it passes through the various layers of the Perf service.
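As a concrete illustration of the constant-based approach, the sketch below shows how such shared filter constants might be declared and validated. The constant values and the helper function are assumptions for illustration, not the actual go/alertfilter source.

```go
package main

import "fmt"

// Filter modes shared between backend handlers and frontend toggles.
// The string values here are illustrative assumptions; the real module
// defines the equivalent constants in go/alertfilter.
const (
	ALL   = "ALL"   // global view: every alert configuration
	OWNER = "OWNER" // personalized view: only the caller's alerts
)

// isValidFilter rejects anything outside the shared vocabulary, which is
// how constants keep "magic strings" from leaking through the layers.
func isValidFilter(f string) bool {
	return f == ALL || f == OWNER
}

func main() {
	fmt.Println(isValidFilter("ALL"), isValidFilter("owner"))
}
```

Because both the handler and the UI reference the same identifiers, a typo becomes a compile error rather than a silently empty result set.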
The alerts module provides the core data structures and logic for managing performance regression detection configurations in Perf. It defines how an alert is structured, how to derive specific trace queries from generalized alert configurations, and provides a caching layer to ensure high-performance access to these configurations during the anomaly detection process.
The module is designed around the concept of a “Dynamic Alert Configuration.” Rather than requiring a separate alert for every single hardware/software combination, the system allows for generalized queries that can be expanded into many specific sub-queries using a “Group By” mechanism. This reduces configuration toil while maintaining granular detection.
- The system uses the paramset of the data to expand a single Alert config into multiple specific queries. This allows an admin to say “alert on all models,” and the system will automatically generate a query for “model=nexus4”, “model=nexus6”, etc.
- Alerts carry a ConfigState (ACTIVE to DELETED) to maintain historical context for previously detected anomalies.
- To bridge the gap between int64 database IDs and frontend JSON/JavaScript requirements, the module uses a custom SerializesToString type. This ensures that large integer IDs do not lose precision in the browser and that uninitialized IDs (like 0 for issue tracker components) are handled gracefully as empty strings.

config.go

The Alert struct is the central entity. It contains:
- What to measure: the Query string (URL-encoded params) and GroupBy fields.
- How to detect: the clustering Algo (e.g., K-Means), Step detection settings, Radius (the window of commits to analyze), and Interesting (the threshold for regression).
- Who to notify: the destinations (Alert email, IssueTrackerComponent) and what Action to take (report, bisect, or none).

configprovider.go

Since anomaly detection is a frequent background process, querying the database for every check would be inefficient. The ConfigProvider implements a thread-safe, in-memory cache of all alert configurations.
It periodically refreshes from the Store to update the local cache.

store.go

This file defines the Store interface, which abstracts the persistence layer. It supports standard CRUD operations and specialized batch operations like ReplaceAll (used for synchronizing alerts with external subscription files). Implementations of this interface (such as sqlalertstore) handle the mapping between the Go structs and the database schema.
When the detection engine processes an Alert, it doesn't just run the raw Query. It expands it based on the GroupBy field:
1. Alert Config:
     Query:   "benchmark=blink_perf"
     GroupBy: "browser, machine"

2. ParamSet (Available Data):
     browser: [chrome, firefox]
     machine: [m1, m2]

3. Expansion (QueriesFromParamset):
     -> "benchmark=blink_perf&browser=chrome&machine=m1"
     -> "benchmark=blink_perf&browser=chrome&machine=m2"
     -> "benchmark=blink_perf&browser=firefox&machine=m1"
     -> "benchmark=blink_perf&browser=firefox&machine=m2"
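The expansion above can be sketched as a cross-product over the GroupBy keys. This is a simplified stand-in for the real QueriesFromParamset logic; the function name and signature here are illustrative.

```go
package main

import "fmt"

// queriesFromParamset expands a base query by the cross-product of the
// values that the paramset holds for each GroupBy key, mirroring the
// example expansion shown in the text.
func queriesFromParamset(base string, groupBy []string, paramset map[string][]string) []string {
	queries := []string{base}
	for _, key := range groupBy {
		var next []string
		for _, q := range queries {
			for _, val := range paramset[key] {
				next = append(next, fmt.Sprintf("%s&%s=%s", q, key, val))
			}
		}
		queries = next
	}
	return queries
}

func main() {
	ps := map[string][]string{
		"browser": {"chrome", "firefox"},
		"machine": {"m1", "m2"},
	}
	for _, q := range queriesFromParamset("benchmark=blink_perf", []string{"browser", "machine"}, ps) {
		fmt.Println(q)
	}
}
```

Each GroupBy key multiplies the number of generated sub-queries, which is why a single generalized alert can fan out into one detection query per hardware/software combination.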
The ConfigProvider ensures that the detection engine always has a low-latency view of the configurations:
Detection Engine          ConfigProvider               Alert Store (DB)
      |                         |                            |
      | GetAllAlertConfigs()    |                            |
      |------------------------>| (Check Cache)              |
      |      [Alert List]       |                            |
      |<------------------------|                            |
      |                         |                            |
      |                         | <--- Periodically Refresh  |
      |                         | List(includeDeleted)       |
      |                         |--------------------------->|
      |                         |       [Fresh Alerts]       |
      |                         |<---------------------------|
      |                         | (Update Internal Maps)     |
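The cache-plus-periodic-refresh pattern above can be sketched with a read/write mutex. This is a hedged, minimal stand-in; the real ConfigProvider has a richer API and runs its refresh loop in the background.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert is a stand-in for alerts.Alert.
type Alert struct {
	ID    int64
	Query string
}

type listFunc func(includeDeleted bool) []Alert

// cachedProvider serves reads from memory and only touches the Store
// (represented here by the list callback) during Refresh.
type cachedProvider struct {
	mu    sync.RWMutex
	cache []Alert
	list  listFunc
}

// GetAllAlertConfigs never blocks on the database; detection workers
// always see the last refreshed snapshot.
func (p *cachedProvider) GetAllAlertConfigs() []Alert {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.cache
}

// Refresh re-reads from the Store; in the real system this runs
// periodically in a background goroutine.
func (p *cachedProvider) Refresh() {
	fresh := p.list(false)
	p.mu.Lock()
	p.cache = fresh
	p.mu.Unlock()
}

func main() {
	db := []Alert{{ID: 1, Query: "benchmark=blink_perf"}}
	p := &cachedProvider{list: func(bool) []Alert { return db }}
	p.Refresh()
	fmt.Println(len(p.GetAllAlertConfigs()))
}
```

The RWMutex lets many detection goroutines read concurrently while a single refresh swaps the snapshot in.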
- sqlalertstore: The primary SQL implementation of the Store interface.
- mock: Mock implementations of Store and ConfigProvider for unit testing.
- perf/go/types: Provides shared enums and types like StepDetection and RegressionDetectionGrouping.

The go/alerts/mock module provides autogenerated mock implementations of the interfaces defined in the go/alerts package. These mocks are built using testify, allowing developers to simulate the behavior of alert storage and configuration retrieval in unit tests without requiring a live database or complex setup.
The primary goal of this module is to decouple testing of higher-level components (like the anomaly detection engine or the UI handlers) from the underlying persistence layer. By providing programmable behaviors for alert configurations, tests can verify how the system reacts to specific alert states, missing configurations, or database errors.
The mocks are generated via mockery and adhere to the standard testify/mock pattern. Each mock struct includes a New[InterfaceName] constructor that automatically registers a cleanup function with the test runner (t.Cleanup), ensuring that expectations are asserted when the test finishes.
The ConfigProvider mock simulates an object responsible for providing read access to alert configurations. This is typically used by components that need to query alert settings frequently, possibly with caching logic in the real implementation.
Its key methods are GetAlertConfig and GetAllAlertConfigs.

The Store mock simulates the persistent storage layer (usually backed by PostgreSQL). It encompasses the full CRUD lifecycle of an alert configuration.
- It covers write operations (Save, Delete, ReplaceAll) and complex read operations (List, ListForSubscription).
- The ReplaceAll method accepts a pgx.Tx parameter. In the mock, this allows verifying that bulk updates are intended to be part of a transaction, even if no actual transaction is executed during the test.

The mocks are utilized by setting expectations on specific method calls and defining what they should return.
[ Test Case ]
|
| 1. Create Mock: m := mock.NewStore(t)
|
| 2. Set Expectation: m.On("Save", ...).Return(nil)
|
| 3. Inject Mock into Component under test
|
| 4. Execute Logic
|
V
[ Assertions ] <--- (Automatic cleanup checks if "Save" was called)
By using these mocks, you can simulate failure modes that are difficult to trigger with a real database, such as specific pgx errors or race conditions where an alert is deleted between two different read operations.
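The flow above can be illustrated with a hand-rolled stand-in for a generated mock. The real mocks are produced by mockery and use testify's On/Return/AssertExpectations machinery; this sketch only shows the shape of the interaction, with an invented one-method Store interface.

```go
package main

import "fmt"

// Store is a trimmed-down stand-in for the alerts.Store interface.
type Store interface {
	Save(query string) error
}

// mockStore records calls and returns canned values, which is what the
// generated testify mocks do under the hood.
type mockStore struct {
	saveReturn error
	saveCalls  []string
}

func (m *mockStore) Save(query string) error {
	m.saveCalls = append(m.saveCalls, query)
	return m.saveReturn
}

// createAlert is the "component under test": it depends only on the
// Store interface, so the mock can be injected in its place.
func createAlert(s Store, query string) error {
	return s.Save(query)
}

func main() {
	m := &mockStore{saveReturn: nil} // 1. create mock with a canned return
	err := createAlert(m, "bot=linux") // 2. inject and execute
	fmt.Println(err, len(m.saveCalls)) // 3. assert on the call history
}
```

With the generated mocks, steps 1 and 3 collapse into m.On("Save", ...).Return(nil) and the automatic t.Cleanup assertion, but the dependency-injection shape is identical.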
The sqlalertstore module provides a SQL-backed implementation of the alerts.Store interface used in Perf. It manages the persistence, retrieval, and lifecycle of alert configurations, which define how the system detects anomalies in performance data.
To balance the need for high-performance querying with the flexibility required for evolving alert configurations, this module employs a hybrid storage strategy:
- The full configuration is serialized into a JSON blob, preserving flexibility as the alert definition evolves.
- Frequently filtered fields (config_state and sub_name) are “promoted” from the JSON blob to top-level SQL columns. This enables the database to perform efficient indexing and filtering, which is essential for performance-sensitive tasks such as dashboard rendering and subscription-based alert processing.

The primary struct SQLAlertStore implements the alerts.Store interface. It wraps a database connection pool (pool.Pool) and manages a pre-defined map of SQL statements.
The underlying table structure uses specific columns to optimize common workflows:
- config_state: Records whether an alert is ACTIVE or DELETED. The store implements “soft deletes” by updating this column rather than removing rows, ensuring historical data remains intact while allowing the application to filter for active alerts quickly.
- sub_name and sub_revision: Link alerts to specific subscriptions. An index on sub_name ensures that ListForSubscription operations are highly performant.

When an alert is saved, the store determines if it is a new entry or an update based on the presence of a valid ID. It serializes the entire configuration into JSON and extracts the relational fields for the SQL columns.
Application              SQLAlertStore                Database
     |                         |                          |
     | Save(SaveRequest)       |                          |
     |------------------------>|                          |
     |                         | Serializes Cfg to JSON   |
     |                         | Identifies ID status     |
     |                         |                          |
     |                         | INSERT/UPDATE ...        |
     |                         |------------------------->|
     |                         |       Success/Error      |
     |                         |<-------------------------|
     |<------------------------|                          |
ReplaceAll

This workflow is used when a set of alerts needs to be synchronized with an external source (like a subscription configuration). It operates within a single transaction:
It first marks the existing ACTIVE alerts as DELETED, then writes the replacement set.

The List and ListForSubscription methods retrieve alerts and deserialize the JSON blobs back into Go structs. Because database results are not guaranteed to be ordered by application-level logic, the module explicitly sorts the resulting slice by DisplayName (and then by ID as a tie-breaker) before returning it to the caller.
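The deterministic ordering described above is a small, self-contained piece of logic; a sketch of it, using an assumed two-field Alert shape, looks like this:

```go
package main

import (
	"fmt"
	"sort"
)

// Alert holds just the two fields relevant to ordering.
type Alert struct {
	ID          int64
	DisplayName string
}

// sortAlerts orders by DisplayName, then by ID as a tie-breaker, so the
// caller sees a stable order regardless of SQL row order.
func sortAlerts(alerts []Alert) {
	sort.Slice(alerts, func(i, j int) bool {
		if alerts[i].DisplayName != alerts[j].DisplayName {
			return alerts[i].DisplayName < alerts[j].DisplayName
		}
		return alerts[i].ID < alerts[j].ID
	})
}

func main() {
	alerts := []Alert{{ID: 3, DisplayName: "b"}, {ID: 2, DisplayName: "a"}, {ID: 1, DisplayName: "a"}}
	sortAlerts(alerts)
	fmt.Println(alerts)
}
```

Sorting in the application rather than via ORDER BY keeps the SQL statements simple and makes the tie-breaking rule explicit in one place.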
- The Delete method performs an UPDATE statement setting config_state to 1 (DELETED) and updating the last_modified timestamp.
- The JSON blob stores the full alerts.Alert struct. During retrieval, the SQL id is injected back into the struct after unmarshaling to ensure the application remains synchronized with the database's primary key.
- Through last_modified and standard SQL transactions (in ReplaceAll), the store maintains consistency even when multiple processes attempt to update alert configurations simultaneously.

The sqlalertstore/schema module defines the structural contract for persisting Perf alerts within a SQL database. It serves as the single source of truth for the database schema, ensuring that the Go representation of an alert maps correctly to the relational storage layer used by the sqlalertstore.
The schema employs a hybrid storage strategy, balancing relational querying capabilities with the flexibility of document storage:
- The full alert configuration is stored as a serialized blob in the Alert column. This allows the alert definition to evolve (adding or removing fields) without requiring frequent and expensive SQL migrations.
- The ConfigState column extracts the operational state of an alert (e.g., active, deleted) from the JSON blob. By storing this as an integer, the system can perform rapid lookups of all “active” alerts across thousands of entries without parsing JSON strings in the database engine.
Alerts in Perf are often tied to specific subscriptions. The schema explicitly tracks:
- SubscriptionName: The identifier for the alert's origin.
- SubscriptionRevision: A pointer to the specific version of the subscription configuration.

The module includes an explicit index (idx_alerts_subname) on the sub_name column. This design choice optimizes for the common workflow of retrieving all alerts associated with a specific subscription, which is a frequent operation in both the UI and the automated ingestion pipeline.
The LastModified column stores a Unix timestamp. This is primarily used for cache invalidation and ensuring that downstream consumers (like the anomaly detection engine) are operating on the most recent version of the alert definition.
The following diagram illustrates how the schema interacts with the application and storage layers:
Go Application Layer              SQL Database Layer
+-----------------------+         +----------------------------+
|                       |         | Alerts Table               |
| alerts.Alert Struct   | ----+   | [ID] (Primary Key)         |
|                       |     |   |                            |
+-----------------------+     +-->| [Alert] (JSON Blob)        |
|                       |     |   |                            |
| (Extraction Logic)    |     +-->| [ConfigState] (Indexed)    |
|                       |     |   |                            |
+-----------------------+     +-->| [SubscriptionName] (Index) |
                                  | [LastModified]             |
                                  +----------------------------+
The schema identifies two specific areas for technical debt reduction to improve performance and consistency:
- Migrating from TEXT to JSONB for the Alert column to allow for more efficient internal database indexing of the blob content.
- Unifying the ConfigState representation across the Go codebase and SQL to prevent casting overhead and improve type safety.

The go/anomalies module defines the core abstraction for interacting with performance anomalies (regressions) within Skia Perf. It provides a standardized interface for querying anomaly data, regardless of whether that data resides in the legacy Chrome Perf system or Skia Perf's native SQL-based regression store.
The primary goal of this module is to decouple the consumption of anomaly data—used for visualization, alerting, and analysis—from the underlying storage implementation and the specific protocols required to communicate with external APIs.
The central component of the module is the Store interface. This design choice allows the Perf system to remain agnostic about the data source. By using a single interface, the system can switch between a direct SQL backend, a proxied Chrome Perf API, or a cached implementation without modifying the business logic of the calling components.
The module leverages data structures defined in chromeperf.AnomalyMap and chromeperf.AnomalyForRevision. This maintains a consistent data contract between the frontend (which historically expected Chrome Perf formats) and various backends. This compatibility layer ensures that anomalies generated by different systems can be merged and displayed in a uniform way on performance dashboards.
The interface is designed to support the three primary ways users and automated systems interact with performance data: commit-range queries, time-range queries, and revision-centered lookups.
anomalies.go

This file defines the Store interface, which is the foundational contract for the entire module.
- GetAnomalies: Retrieves anomalies based on commit positions. It allows for filtering by specific traceNames. If the slice is empty, the implementation is expected to return all anomalies within the commit range.
- GetAnomaliesInTimeRange: Facilitates temporal lookups. Implementations (like the SQL-based one) often need to resolve these time ranges into commit ranges using a Git provider before querying the underlying database.
- GetAnomaliesAroundRevision: Provides a way to “zoom in” on a specific point in history, returning anomalies that occurred at or near a target revision.

The module's functionality is extended and specialized through its submodules:
- impl: Contains the concrete logic for data retrieval. This includes sql_impl.go for native Skia Perf storage and chromeperf_impl.go for interacting with the legacy Google-internal Chrome Perf API.
- cache: Implements a middleware layer that wraps another Store. It uses LRU (Least Recently Used) caches and a time-based invalidation strategy to reduce the latency of repeated queries and minimize the load on the source-of-truth databases or APIs.
- mock: Provides autogenerated mocks for unit testing, allowing other modules to simulate various anomaly data scenarios (such as empty results or API errors) in a controlled environment.

The following diagram illustrates how the Store interface acts as a gateway between the Perf UI/Services and the various data backends:
         +---------------------------------------+
         |    Perf UI / Regression Detection     |
         +---------------------------------------+
                            |
                            v
                 +-----------------------+
                 |  anomalies.Store (I)  |
                 +-----------------------+
                            |
           +----------------+----------------+
           |                |                |
           v                v                v
  +----------------+ +----------------+ +----------------+
  | cache.Store    | | sql.Store      | | chromeperf.    |
  | (Middleware)   | | (Native DB)    | | Store (API)    |
  +----------------+ +----------------+ +----------------+
           |                |                |
           +------ wraps ---+                +--- calls ---> Chrome Perf API
The module depends heavily on the perf/go/chromeperf package for its data models. This dependency reflects the module's role as a bridge between the modern Skia Perf infrastructure and the established data formats of the Chrome Performance monitoring ecosystem.
The anomalies/cache module provides a performance-optimized caching layer for anomaly data retrieved from the Chrome Perf API. It acts as an intermediary Store that reduces the load on external API services and improves the responsiveness of Skia Perf when querying for regressions and anomalies.
It is designed to handle the same three types of lookups as the underlying store: commit-range queries, time-range queries, and revision-centered queries.
The module utilizes two distinct LRU (Least Recently Used) caches to balance memory usage and performance:
- testsCache: Indexed by a composite key of trace name and commit range. This handles the most frequent queries where users are looking at specific performance graphs.
- revisionCache: Indexed by revision number, supporting workflows that investigate specific changesets.

A key challenge in caching anomaly data is that anomalies can be modified (e.g., marked as “invalid” or “fixed”) in the source system. To handle this, the module implements an invalidationMap.
Instead of a complex, fine-grained invalidation logic that would require deep inspection of every cache entry, the module uses a “simple and safe” approach:
When a trace is invalidated via InvalidateTestsCacheForTraceName, its name is added to a map, and subsequent lookups treat cached entries for that trace as misses until fresh data is fetched.

Standard LRU behavior handles capacity, but it doesn't account for data staleness. The module implements a background goroutine that periodically checks the oldest items in the cache against a Time-to-Live (TTL) of 10 minutes. This ensures that even low-traffic data is eventually refreshed to reflect the current state of the Chrome Perf database.
cache.go

This is the primary implementation file. It defines the store struct and the logic for the anomaly store.
- GetAnomalies: Orchestrates a hybrid fetch. It checks the LRU cache for each requested trace. Any traces missing from the cache or marked in the invalidationMap are bundled into a single batch request to the ChromePerf client. The results are then merged and the cache is updated.
- cleanupCache: A background worker function that drains the LRU cache of items older than the cacheItemTTL. It specifically targets the “oldest” items to minimize the work performed during each tick.
- getAnomalyCacheKey: Generates a deterministic string key: traceName:startCommit:endCommit. This ensures that different ranges for the same trace are cached independently, preventing range-mismatch bugs.

The following diagram illustrates how the GetAnomalies method handles a request for multiple traces:
User Request (Traces A, B, C)
            |
            v
 +---------+----------+
 |  Check testsCache  | <------------+
 +---------+----------+              |
            |                        |
      +-----+-----+                  |
      |           |                  |
 [A, B] Hit   [C] Miss/Invalid       |
      |           |                  |
      |  +--------v-----------------+|
      |  | Fetch [C] from ChromePerf||
      |  +--------+-----------------+|
      |           |                  |
      |    +------v------+           |
      |    | Update Cache|-----------+
      |    +------+------+
      |           |
      +-----------+
            |
            v
     Merged Results
The module heavily relies on the chromeperf.AnomalyApiClient interface. This decoupling allows the cache to be tested with mocks (as seen in cache_test.go) and ensures the caching logic remains independent of the underlying transport (HTTP/gRPC) used to communicate with Chrome Perf.
The go/anomalies/impl module provides concrete implementations of the anomalies.Store interface. Its primary purpose is to abstract the retrieval of performance anomalies (regressions) from different backends—specifically the legacy Chrome Perf API and the modern Skia Perf SQL-based regression store.
By providing a unified interface, this module allows the rest of the Perf system to query for anomalies using commit ranges, time ranges, or specific revisions without needing to know whether the data is coming from an external service or a local database.
chromeperf_impl.go

The store struct in this file acts as a proxy to the Chrome Perf Anomaly API. It is used in deployments where Skia Perf needs to display or synchronize with anomalies managed by the legacy Chrome Perf system.
- It delegates requests to a chromeperf.AnomalyApiClient.
- Results are returned in the chromeperf.AnomalyMap format.

sql_impl.go

The sqlAnomaliesStore provides an implementation that retrieves anomalies directly from Skia Perf's own database by wrapping a regression.Store.
- Rather than returning raw regression.Regression objects, this implementation uses compat.ConvertRegressionToAnomalies to transform them.
- It maintains a multiplicities map during the conversion process to increment the Multiplicity field of the anomaly, ensuring each unique regression is identifiable even if they overlap in the commit/trace dimensions.
- It depends on a git.Git provider, because Skia Perf's regression store is indexed by commit numbers. When a user requests anomalies for a time range, the store first uses the Git provider to resolve that time range into a slice of commits, determining the start and end commit positions before querying the database.

When querying by time, the module performs a two-step resolution to bridge the gap between wall-clock time and the commit-indexed database.
User Request (Time Range)
        |
        v
+-----------------------+       +-----------------------+
| sqlAnomaliesStore     |       | git.Git               |
|                       |------>|                       |
| GetAnomaliesInTime... |       | CommitSliceFromTime...|
+-----------------------+       +-----------|-----------+
        |                                   |
        | <---------- Commit IDs -----------+
        v
+-----------------------+       +-----------------------+
| regression.Store      |       | SQL Database          |
|                       |<----->|                       |
| Range/RangeFiltered   |       | (Regressions Table)   |
+-----------------------+       +-----------------------+
        |
        v
Convert to Anomalies --------> Result Map
For the GetAnomaliesAroundRevision method, the SQL implementation uses a “sliding window” strategy. It defines a hardcoded window (currently 500 commits) around the target revision. This provides context to the user, showing not just an anomaly at a specific point, but also nearby fluctuations that might be related to the same root cause.
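The window computation itself is trivial; a sketch, with the function name and the clamp-at-zero detail as assumptions (only the 500-commit size comes from the text):

```go
package main

import "fmt"

// windowSize comes from the prose: a hardcoded 500-commit window.
const windowSize = 500

// revisionWindow centers a fixed-size commit window on the target
// revision, clamping the start so it never goes negative.
func revisionWindow(revision int) (start, end int) {
	start = revision - windowSize/2
	if start < 0 {
		start = 0
	}
	end = revision + windowSize/2
	return start, end
}

func main() {
	s, e := revisionWindow(1000)
	fmt.Println(s, e)
}
```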
- The module re-uses the chromeperf package's data structures (like AnomalyMap and AnomalyForRevision). This design choice was made to maintain compatibility with existing UI components and tools that were originally built for the Chrome Perf ecosystem.
- In sql_impl.go, the code distinguishes between fetching all regressions (Range) and fetching regressions for specific traces (RangeFiltered). This allows the module to offload filtering to the database layer when possible, rather than fetching all data and filtering in-memory.

The go/anomalies/mock module provides a mock implementation of the anomaly storage interface used within the Perf system. Its primary purpose is to facilitate unit testing for components that depend on anomaly data without requiring a live connection to ChromePerf or a production database.
The module leverages the testify/mock framework to provide a programmable substitute for the anomaly Store. This approach allows developers to define expected behaviors, such as returning specific sets of anomalies or simulating network errors, ensuring that higher-level logic (like regression detection or UI rendering) handles various data scenarios correctly.
The mock is autogenerated based on the Store interface defined in the anomalies package. This ensures that the mock remains synchronized with the actual interface used by the system.
The core of this module is the Store struct. It implements the methods required to query anomaly data across different dimensions:
- The GetAnomalies method allows tests to simulate retrieving anomalies within a specific range of commit positions for a set of traces.
- The GetAnomaliesInTimeRange method enables testing workflows that rely on temporal queries rather than commit sequences.
- The GetAnomaliesAroundRevision method provides functionality to mock the retrieval of anomalies centered around a specific point in history, useful for validating “nearby” anomaly detection logic.

When writing a test for a component that consumes anomalies, the mock.Store is instantiated and injected as a dependency. The general flow for using this module is as follows:
+------------------+      +------------------------+      +------------------+
| Unit Test        |      | Mock Store             |      | System Under Test|
+------------------+      +------------------------+      +------------------+
        |                             |                            |
        | 1. Setup Mock               |                            |
        | (e.g., GetAnomalies)        |                            |
        |---------------------------->| Register Expectations      |
        |                             | (On/Return)                |
        |                             |                            |
        | 2. Inject Mock              |                            |
        |-------------------------------------------------------->| Execute Logic
        |                             |                            |
        |                             | 3. Intercept Call  <-------| Call Store API
        |                             |    Return Mock Data ------>|
        |                             |                            |
        | 4. Verify                   |                            |
        |    AssertExpects            |                            |
        |---------------------------->| Check Call History         |
- Use NewStore(t) to create a new mock instance. This automatically registers cleanup functions to verify that all expected calls were made before the test finishes.
- Use the .On(...) syntax to define which parameters the mock should expect and what values (or errors) it should return.

The anomalygroup module provides a centralized system for aggregating individual performance regressions (anomalies) into cohesive groups. In a high-scale performance monitoring environment, a single root cause (like a specific commit) often triggers multiple alerts across different benchmarks or configurations. This module shifts the workflow from managing hundreds of isolated alerts to managing a single “Anomaly Group,” which acts as the unit of work for bisection, bug reporting, and remediation tracking.
The primary goal of this module is to reduce “alert fatigue” and streamline root-cause analysis. It achieves this by correlating new anomalies with existing ones based on shared metadata (e.g., benchmark name, domain, and subscription) and overlapping commit ranges.
The module is structured as a tiered system:
- A storage layer that defines the Store interface and provides a SQL implementation for persisting group metadata and membership.
- A gRPC service (AnomalyGroupService) that exposes group data to other services.
- A notifier that bridges the Perf detection engine to the grouping logic.
- A utility layer (AnomalyGrouper) that implements the high-level find-or-create business logic.

The module follows a strict “find-or-create” pattern. When a regression is detected, the system does not immediately alert a human. Instead, it queries the Store for existing groups that match the anomaly's context.
A key design decision is the “Common Revision Range” logic. When an anomaly is added to a group, the group‘s start_commit and end_commit are narrowed to the intersection of the current group range and the new anomaly’s range. This ensures that a group only contains anomalies that could logically have been caused by the same commit.
The module uses JSONB (in the SQL implementation) for group metadata. This allows the system to store heterogeneous attributes like subscription_name or benchmark_name without requiring rigid schema migrations as the types of performance data evolve.
To prevent race conditions where multiple detection workers might try to create a group for the same regression simultaneously, the utility layer employs a global mutex during the “find-or-create” phase. This ensures that the mapping of anomalies to groups remains consistent and prevents duplicate bisection jobs or bug reports.
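The mutex-guarded find-or-create can be sketched as follows. The group key and store shape are illustrative assumptions; the point is that the lookup and the insert happen under one lock, so two workers cannot both create a group for the same context.

```go
package main

import (
	"fmt"
	"sync"
)

type group struct {
	key       string
	anomalies []string
}

// A package-level mutex serializes lookup-then-insert, mirroring the
// global mutex described in the text.
var (
	mu     sync.Mutex
	groups = map[string]*group{}
)

// findOrCreate returns the existing group for key, creating it if
// absent, and records the anomaly's membership.
func findOrCreate(key, anomalyID string) *group {
	mu.Lock()
	defer mu.Unlock()
	g, ok := groups[key]
	if !ok {
		g = &group{key: key}
		groups[key] = g
	}
	g.anomalies = append(g.anomalies, anomalyID)
	return g
}

func main() {
	findOrCreate("blink_perf/linux", "anomaly-1")
	g := findOrCreate("blink_perf/linux", "anomaly-2")
	fmt.Println(len(groups), len(g.anomalies))
}
```

In the real system the critical section also spans the external side effects (bisection trigger, bug filing), which is exactly why a coarse global lock is preferred over per-group locking here.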
/go/anomalygroup/store.go

The Store interface defines the data access layer. It is responsible for the persistence of groups and their associations (Anomaly IDs and Culprit IDs).
- AddAnomalyID: Not only links an ID but also performs the mathematical narrowing of the group's commit range.
- FindExistingGroup: The primary discovery mechanism used to deduplicate regressions.

/go/anomalygroup/service

The AnomalyGroupService implements the gRPC API. It acts as an aggregator, fetching data from the anomalygroup store while also pulling detailed regression data (like medians and trace params) to provide a rich view of the group's impact. It includes ranking logic to identify “top anomalies” within a group based on the percentage change in performance.
/go/anomalygroup/notifier

This acts as the glue between the Perf engine and the grouping logic. It filters out “summary-level” regressions (which are too broad for specific grouping) and constructs a canonical Test Path (e.g., master/bot/benchmark/test/subtest) required for consistent cross-referencing with external systems like Chromeperf.
/go/anomalygroup/utils

Contains the AnomalyGrouper, which handles the high-level business logic. It coordinates between the internal stores and external systems (Issue Trackers and Temporal). It decides whether to post a comment to an existing bug or trigger a new bisection workflow.
This workflow illustrates how a detected anomaly is integrated into the grouping system.
[ Perf Detection Engine ]
            |
            v
[ AnomalyGroupNotifier ] --------------------.
            |                                |
            | (Validate Trace &              |
            |  Build Test Path)              |
            v                                |
[ AnomalyGrouper (Utils) ] <-----------------'
            |
       (Lock Mutex)
            |
            v
   [ FindExistingGroup? ]
        /         \
      (No)       (Yes)
       |           |
       v           v
[ Create New Group ]    [ AddAnomalyID ]
[ Trigger Bisection ]   [ Update Issue/Bug ]
       |           |
       '-----.-----'
             |
      (Unlock Mutex)
             |
             v
    [ Return GroupID ]
When adding an anomaly to a group, the module maintains the narrowest possible window for bisection:
Group Range:   [100 ..................... 150]
New Anomaly:         [120 ............. 160]
               ===============================
Result Range:        [120 .......... 150]
               (Narrowest common overlap)
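The narrowing shown above is a plain interval intersection. A sketch, with illustrative names (the no-overlap handling is an assumption; the intersection itself matches the diagram):

```go
package main

import (
	"errors"
	"fmt"
)

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// narrowRange shrinks the group's commit range to the intersection of
// the existing range and the new anomaly's range.
func narrowRange(groupStart, groupEnd, anomalyStart, anomalyEnd int) (int, int, error) {
	start := maxInt(groupStart, anomalyStart)
	end := minInt(groupEnd, anomalyEnd)
	if start > end {
		// No overlap: the anomaly cannot share a culprit with the group.
		return 0, 0, errors.New("no common revision range")
	}
	return start, end, nil
}

func main() {
	start, end, _ := narrowRange(100, 150, 120, 160)
	fmt.Println(start, end)
}
```

Keeping the range as narrow as possible directly shrinks the search space a subsequent bisection has to cover.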
The go.skia.org/infra/perf/go/anomalygroup/mocks module provides mock implementations of the interfaces defined in the anomalygroup package. Its primary purpose is to facilitate unit testing for components that depend on anomaly group persistence and retrieval without requiring a live database or complex setup.
This module is generated using mockery and is based on the testify assertion framework. It focuses on mocking the Store interface, which is the primary abstraction for managing groups of performance anomalies. By using these mocks, developers can simulate various database states, such as existing groups, missing records, or successful updates, to verify the business logic of higher-level services.
The Store struct is a mock implementation of the anomalygroup.Store interface. It records method calls and allows tests to define expected return values or behaviors (using On and Return methods from the testify mock package).
The mock covers the full lifecycle of an anomaly group:
- Create and FindExistingGroup allow testing of the logic that decides whether a new anomaly should start a new group or join an existing one based on subscription name, domain, and commit range.
- AddAnomalyID and AddCulpritIDs enable verification that the system correctly links individual detections and identified root causes to a group.
- UpdateBisectID and UpdateReportedIssueID are used to test the integration between anomaly grouping and external systems like bisection services and issue trackers.
- LoadById, GetAnomalyIdsByAnomalyGroupId, and GetAnomalyIdsByIssueId support testing data access patterns and reporting logic.

When writing a test for a component that manages anomalies, the mock is typically initialized and injected as a dependency.
Tester Mock Store Component Under Test
| | |
|-- NewStore(t) ------->| |
| | |
|-- On("LoadById")... ->| |
| | |
|-- Inject Store ------>|-------------------------->|
| | |
|-- Trigger Action -------------------------------->|
| | |
| |<-- Call LoadById() -------|
| | |
| |--- Return AnomalyGroup -->|
| | |
|-- AssertExpectations()| |
- mockery ensures that the mock stays in sync with the Store interface defined in the core anomalygroup package. If the interface changes, the mock can be regenerated to reflect the new API.
- github.com/stretchr/testify/mock provides a standard, expressive syntax for setting up expectations, making tests more readable and maintainable.
- The mock depends on go.skia.org/infra/perf/go/anomalygroup/proto/v1, ensuring that the data structures returned by the mock methods (like v1.AnomalyGroup) are exactly the same as those used by the real implementation.

The anomalygroup/notifier module provides an implementation of a regression notifier that integrates with the Anomaly Grouping system. Instead of simply sending a static notification (like an email or chat message), this notifier delegates the handling of a detected regression to an AnomalyGrouper, which manages how regressions are aggregated, tracked, and associated with issues.
In the Perf system, a “Notifier” is typically responsible for alerting users when a regression is found. The AnomalyGroupNotifier fulfills this interface but focuses on the structured management of anomalies. Its primary role is to validate the incoming regression data, extract relevant metadata (such as “Test Paths”), and pass it to the anomalygroup/utils package for logic-heavy operations like finding or creating groups and updating issue trackers.
- The notifier builds a testPath string by concatenating parameters like master, bot, benchmark, test, and various subtest levels.
- AnomalyGroupNotifier is the central struct that implements the notification interface. It holds a reference to an AnomalyGrouper.
- Records the regression data (vec32) for logging and diagnostic purposes.
- Constructs the testPath.
- Calls ProcessRegressionInGroup to handle the actual grouping and issue tracking.
- RegressionMissing and UpdateNotification are currently implemented as no-ops. This indicates that the anomaly grouping logic currently focuses on the discovery of regressions rather than the automated closing or updating of groups when a regression disappears.

The functions getTestPath and isParamSetValid encapsulate the requirements for a regression to be "groupable."
- A valid paramset must include master, bot, benchmark, test, and subtest_1.
- The resulting test path takes the form master/bot/benchmark/test/subtest_1/.../subtest_3.

The following diagram illustrates how a detected regression flows through this module:
Perf Detection Engine
        |
        v
[AnomalyGroupNotifier.RegressionFound]
        |
        |-- (Validate: Is it a single trace?)
        |-- (Validate: Does it have required params?)
        |-- (Action: Construct Test Path)
        |
        v
[AnomalyGrouper (via /go/anomalygroup/utils)]
        |
        |-- (Action: Find existing group or create new one)
        |-- (Action: Update Issue Tracker)
        |
        v
End Result: Regression is grouped and tracked.
- perf/go/anomalygroup/utils: Contains the AnomalyGrouper interface and the core logic for managing the lifecycle of an anomaly group.
- perf/go/alerts: Provides the alert configurations that trigger these notifications.
- perf/go/issuetracker: Used by the underlying grouper to link anomalies to bug reports.

The /go/anomalygroup/proto module serves as the primary definition layer for the Anomaly Grouping system. It establishes the contract between the performance monitoring services and the data storage layer responsible for organizing regressions. This module is essential for transitioning from "individual data point alerts" to "structured performance investigations."
While the actual implementation details and specific gRPC service definitions are contained within versioned subdirectories (e.g., v1), this root module acts as the entry point for cross-service communication regarding grouped performance regressions.
The choice to define anomaly groups using Protocol Buffers (protobuf) is driven by the need for interoperability across different microservices in the Perf ecosystem. By defining the “Anomaly Group” as a structured message, the system ensures that the detector, the bisection engine, and the reporting UI all share a consistent view of what constitutes a group, regardless of their specific internal languages or storage backends.
The module structure (specifically the use of the v1 subdirectory) reflects a design decision to support long-term API stability. Because Anomaly Groups are often linked to external trackers (like Monorail or Buganizer), the schema must evolve without breaking existing integrations. Versioning allows for:
The primary responsibility of this module is to provide the data models and service definitions required to manage the lifecycle of an anomaly group.
The module defines how individual anomalies (sketches of performance drops) are aggregated. Instead of treating every regression as a unique event, the proto definitions allow for a many-to-one mapping. This choice minimizes “alert fatigue” by ensuring that multiple regressions caused by the same commit or affecting the same benchmark are treated as a single unit of work.
The module hosts the gRPC service definitions that facilitate:
The proto definitions in this module facilitate the following logical flow across the Skia Perf system:
[ Perf Detector ] --> [ Proto: FindExistingGroups ]
|
+---( Existing Group? )
| |
(No) CreateNewGroup <----+ +----> (Yes) UpdateGroup
| |
v v
[ Anomaly Group Data Model: ActionType, Metadata, Anomaly IDs ]
|
+-----------> [ Bisection Service ]
|
+-----------> [ Reporting/Issue Service ]
By providing a unified message format, this module ensures that once a group is created or updated via the gRPC interface, all downstream services (like Pinpoint for bisection or the auto-filer for bug reports) can consume the data without needing to understand the underlying database schema.
The go/anomalygroup/proto/v1 module defines the core data structures and gRPC service interface for managing Anomaly Groups within the Perf system. Anomaly grouping is a critical abstraction used to cluster related performance regressions—typically those sharing a similar commit range, benchmark, or subscription—into a single actionable entity.
By grouping anomalies, the system can automate post-detection workflows, such as filing single bug reports for multiple related regressions or triggering bisection jobs to identify a specific culprit commit.
The design pivots around the GroupActionType enum (REPORT, BISECT, NOACTION). Rather than being a passive collection of data points, an anomaly group is defined by the intended outcome.
- REPORT: Indicates the group is intended for manual review or automated bug filing.
- BISECT: Indicates the group is a candidate for automated regression testing (Pinpoint/Bisection) to find a culprit.

The CreateNewAnomalyGroup RPC is explicitly designed to avoid binding a group to a single regression initially. This allows the system to find "existing" groups that match the criteria of a newly detected anomaly before creating a redundant group. This deduplication logic is supported by FindExistingGroups, which searches based on subscription_name, test_path, and commit ranges.
The AnomalyGroup message separates entity relationships (anomaly_ids, culprit_ids, reported_issue_id) from the metadata that defined the group (subscription_name, benchmark_name). This allows the service to track the evolution of a group as more anomalies are discovered or as a bisection job identifies specific culprits.
This gRPC service (anomalygroup_service.proto) is the primary interface for the anomaly management lifecycle.
- FindExistingGroups is consulted to see whether a new anomaly fits into a current investigation. If not, CreateNewAnomalyGroup initializes a new tracker.
- UpdateAnomalyGroup is used to append culprit_ids (found by bisection) or issue_id (from a bug tracker).
- FindTopAnomalies provides a prioritized list of regressions within a group, allowing the system to pick the "most significant" anomaly to lead a bisection job.

The Anomaly message acts as a bridge between Skia's internal regression format and the requirements of external tools (like Pinpoint). It captures:
- The paramset map translates Skia tags (e.g., stat, measurement) into the "bot/benchmark/story" format required by ChromePerf.
- median_before and median_after provide the raw data needed to calculate the magnitude of the regression.

The following diagram illustrates how a new anomaly interacts with this module to determine if it should trigger a new action or join an existing investigation:
[ New Anomaly Detected ]
|
v
[ FindExistingGroups ] <---------- [ Search by Sub, Benchmark, & Commit ]
|
+---( Match Found? )---+
| |
YES | NO |
v v
[ UpdateAnomalyGroup ] [ CreateNewAnomalyGroup ]
(Append Anomaly ID) |
| v
| [ Determine Action ]
| (BISECT or REPORT)
| |
+----------+-----------+
|
v
[ Anomaly Group State ]
- List of Anomaly IDs
- Culprit IDs (if bisected)
- Issue ID (if reported)
- anomalygroup_service.proto: The source of truth for the API and data models.
- anomalygroup_service.pb.go: Contains the generated Go structs for messages and enums, including the GroupActionType logic.
- anomalygroup_service_grpc.pb.go: Contains the gRPC client and server interfaces used by Perf components to communicate with the anomaly group store.

This module provides mock implementations of the AnomalyGroupService defined in the version 1 Protocol Buffers for the Perf system. Its primary purpose is to facilitate isolated unit testing of components that interact with anomaly grouping logic. By using these mocks, developers can simulate various service behaviors—such as successful data retrieval, persistence errors, or specific search results—without requiring a live gRPC server or an underlying database.
The mocks in this module are built using the stretchr/testify/mock framework. This choice allows for a declarative style of testing where expectations (input arguments) and returns (output data or errors) are defined before the execution of the code under test.
A key implementation detail in AnomalyGroupServiceServer.go is the manual embedding of v1.UnimplementedAnomalyGroupServiceServer.
- The mock embeds the Unimplemented version of the server struct. This ensures forward compatibility; if new methods are added to the Protobuf definition, existing implementations (including mocks) will still satisfy the interface by inheriting the default "Unimplemented" behavior for the new methods.
- Because the mockery generation tool occasionally fails to include this embedding, it was added manually. This ensures the mock type remains a valid AnomalyGroupServiceServer even as the service definition evolves.

The module provides a NewAnomalyGroupServiceServer constructor that integrates with Go's testing.T. It automatically registers a cleanup function via t.Cleanup. This design ensures that mock.AssertExpectations(t) is called at the end of every test, verifying that all expected calls to the service were actually made, which prevents "silent" test passes where expected logic was bypassed.
This is the central mock struct. It mirrors the gRPC server interface and provides hooks for the following service responsibilities:
- CreateNewAnomalyGroup and UpdateAnomalyGroup allow tests to simulate the creation and modification of anomaly clusters.
- LoadAnomalyGroupByID and FindExistingGroups allow callers to simulate the lookup of groups based on specific identifiers or search criteria.
- FindTopAnomalies facilitates testing logic that prioritizes or filters specific anomalies within a group context.

The typical workflow involving this module focuses on intercepting calls between a high-level business logic component and the anomaly group persistence layer.
[ Test Case ]
|
| (1) Set Expectations:
| mock.On("LoadAnomalyGroupByID", ...).Return(fakeGroup, nil)
v
[ Component Under Test ]
|
| (2) Call: LoadAnomalyGroupByID(ctx, req)
v
[ AnomalyGroupServiceServer (Mock) ]
|
| (3) Matches arguments and returns fakeGroup
v
[ Component Under Test ]
|
| (4) Process fakeGroup and perform assertions
v
[ Test Case ]
|
| (5) Cleanup: Verify all mock expectations were met
The anomalygroup/service module provides a gRPC implementation for managing and querying Anomaly Groups. Anomaly groups are logical collections of performance regressions (anomalies) that share common characteristics, such as being detected within the same benchmark, subscription, or commit range.
This service acts as an orchestration layer that interfaces with underlying storage systems for anomaly groups, culprits, and regressions to provide a unified API for the Skia Perf backend.
The service is responsible for the lifecycle and metadata management of grouped anomalies:
When searching for existing groups (FindExistingGroups), the service parses a TestPath string. It expects a specific hierarchical format (e.g., domain/bot/benchmark/measurement/test). The service specifically extracts the Domain and Benchmark to query the store, effectively grouping anomalies that occur on the same benchmark even if they are on different bots or specific test sub-metrics.
The FindTopAnomalies functionality implements a specific ranking strategy:
- Regressions are ranked by the relative change between MedianBefore and MedianAfter.
- To name each anomaly, the service checks subtest_3, then subtest_2, then subtest_1 in the paramset. This prioritization ensures the most specific test description available is returned.

The service enforces a strict schema for anomaly metadata via isParamSetValid. It requires the presence of specific keys (bot, benchmark, test, stat, subtest_1) and ensures that these keys contain exactly one value. This ensures consistency when these anomalies are exported or displayed in the UI.
The primary struct implementing the gRPC server defined in anomalygroup/proto/v1. It integrates the following dependencies:
The UpdateAnomalyGroup method acts as a multi-purpose update sink. Depending on the fields populated in the request, it routes to different store operations:
Request (UpdateAnomalyGroup)
  |
  |-- Has BisectionId? ----> anomalygroupStore.UpdateBisectID
  |
  |-- Has IssueId? --------> anomalygroupStore.UpdateReportedIssueID
  |
  |-- Has AnomalyId? ------> regressionStore.GetByIDs (to get commit range)
  |                            |
  |                            +-> anomalygroupStore.AddAnomalyID
  |
  +-- Has CulpritIds? -----> anomalygroupStore.AddCulpritIDs
When identifying the most significant regressions in a group:
Load Group by ID
  |
Fetch all Regression details for AnomalyIds in Group
  |
For each Regression:
  Calculate: (MedianAfter - MedianBefore) / MedianBefore
  |
Sort descending by calculated diff
  |
Take Top N (Limit)
  |
Extract specific params (bot, benchmark, measurement, etc.)
  |
Return Anomaly list
The sqlanomalygroupstore module provides a SQL-backed implementation for managing anomaly groups in the Perf system. It transitions the system from a “per-anomaly” management style to a “group-centric” workflow, allowing related performance regressions to be handled as a single unit for bisection, bug reporting, and state tracking.
In performance monitoring, a single underlying issue often triggers multiple anomalies across different bots or benchmarks. Treating these as independent events leads to redundant bisections and fragmented issue tracking. This module solves that by providing a persistent store to aggregate these anomalies.
The store acts as the source of truth for the lifecycle of a regression:
The implementation balances relational structure with the flexibility needed for heterogeneous performance data.
- The group_meta_data field uses JSONB to store attributes like subscription_name, domain_name, and benchmark_name. This avoids rigid schema migrations when new metadata categories are introduced while still allowing for efficient SQL filtering via JSON path expressions.
- AnomalyIDs and CulpritIDs are stored as UUID ARRAY (or text arrays). This allows the system to retrieve all members of a group in a single row fetch, optimizing for read-heavy "group view" operations.
- common_rev_start and common_rev_end are stored directly on the group. This denormalization allows the system to perform fast range-based lookups (e.g., "Find all groups affecting commit X") without joining against hundreds of individual anomaly records.

The store implements specific logic when adding an anomaly to an existing group via AddAnomalyID. Rather than just appending an ID, it updates the group's common_rev_start and common_rev_end using GREATEST and LEAST functions respectively. This ensures the group's "Common Revision Range" always represents the narrowest overlapping window shared by all member anomalies, which is essential for accurate bisection.
The FindExistingGroup method is the entry point for anomaly deduplication. When a new anomaly is detected, the system queries for existing groups that match the metadata and whose revision range overlaps with the new anomaly.
New Anomaly Detected
        |
        v
Check Store: FindExistingGroup()
(Match Metadata + Revision Range Overlap)
        |
        +----[ Match Found ]----> AddAnomalyID()
        |                         (Narrows common_rev_start/end)
        |
        +----[ No Match ]-------> Create()
                                  (Starts new group lifecycle)
The module provides dedicated update methods to link the group to external entities.
- UpdateBisectID: Links the group to a specific bisection job.
- UpdateReportedIssueID: Links the group to a bug in an issue tracker.

These links prevent duplicate actions. For example, the system can query GetAnomalyIdsByIssueId to find all data points associated with a specific bug, facilitating "cluster" views of performance regressions.
- sqlanomalygroupstore.go: Implements the AnomalyGroupStore struct and its methods. It contains the raw SQL logic for Spanner/PostgreSQL, including complex array unnesting for ID lookups and JSONB extraction for group metadata.
- schema/: Defines the database layout and provides the conceptual "why" behind the table structures, such as the use of temporal tracking for audit trails.
- sqlanomalygroupstore_test.go: Validates the SQL logic using a real database instance (Spanner), specifically testing edge cases like revision range narrowing and UUID validation.

The schema package defines the structured data model for storing and managing anomaly groups within a SQL database. It serves as the single source of truth for the database layout used by the sqlanomalygroupstore, ensuring that anomaly aggregations, their associated metadata, and subsequent remedial actions are persisted consistently.
In the Perf system, individual anomalies are often related by shared characteristics such as benchmark, bot, or revision range. The AnomalyGroupSchema is designed to transition from a “per-anomaly” view to a “group-centric” view. This grouping is critical for:
The core structure represents a single row in the AnomalyGroups table. The implementation choices reflect a balance between strict relational integrity and the flexibility required for evolving metadata.
Identity and Temporal Tracking: Each group is assigned a UUID (ID) to prevent collisions across distributed systems. It tracks CreationTime and LastModifiedTime to allow the cleanup of stale groups and to provide audit trails for when a group's state last changed.
Anomalies and Culprits (Array Storage): The schema utilizes UUID ARRAY types for AnomalyIDs and CulpritIDs. This design choice favors read performance for group-specific views, as it allows the system to retrieve the entire membership list of a group in a single row fetch, avoiding the overhead of a separate many-to-many mapping table for common operations.
Dynamic Metadata (JSONB): The GroupMetaData field is implemented as a JSONB object. This is a deliberate choice to accommodate the heterogeneous nature of performance data. While currently used for tracking subscriptions and benchmark identifiers, the JSONB format allows the system to store additional context (like environment variables or hardware configurations) without requiring a schema migration every time a new metadata tag is introduced.
Denormalized Revision Ranges: CommonRevStart and CommonRevEnd represent the overlapping revision range shared by all anomalies within the group. These values are recalculated and updated as the group grows. By storing these directly on the group record, the system can quickly identify which groups are relevant to a specific commit range during bisection lookups.
Action and Workflow State: The schema integrates directly with the alerting and bisection workflows through fields like Action, BisectionID, and ReportedIssueID.
- Action acts as a state machine indicator (e.g., report, bisect).
- ActionTime tracks when these external processes were triggered to prevent duplicate actions during subsequent scanning loops.

The following diagram illustrates how the schema fields are populated and updated during the lifecycle of an anomaly group:
  Discovery Phase           Aggregation Phase            Action Phase
 (Anomaly Detected)      (Group Created/Updated)        (Remediation)
         |                         |                         |
         v                         v                         v
[ Individual Anomaly ] ----> [ AnomalyGroupSchema ] ----> [ Bisection Job ]
                             | - CommonRevStart/End        | - BisectionID
                             | - AnomalyIDs (Array)   <---+
                             | - GroupMetaData
                             | - Action ('bisect')
                             |
                             +--------------> [ Issue Tracker ]
                                              | - ReportedIssueID
This workflow ensures that as the system moves from detecting a regression to investigating it, the AnomalyGroupSchema remains the central repository for the group's evolving state and history.
The anomalygroup/utils module provides the logic for organizing individual performance regressions (anomalies) into cohesive groups. Instead of treating every detected regression as an isolated event, this module attempts to correlate new anomalies with existing ones based on shared metadata like subscription names, commit ranges, and test paths. This grouping mechanism is critical for reducing alert fatigue and enabling automated root-cause analysis workflows, such as bisection.
The module is designed around a “find-or-create” pattern for anomaly groups, prioritizing the consolidation of information into existing groups to maintain a single source of truth for related issues.
- The grouper holds a sync.Mutex during the grouping process. This design choice addresses the potential for race conditions where multiple parallel processing containers might simultaneously attempt to create a new group for the same set of regressions.
- Groups carry action types: REPORT (creating/updating bug tracker issues) and BISECT (triggering automated culprit finding). The implementation chooses how to update external systems (like the Issue Tracker) based on these action types.

The AnomalyGrouper interface defines the contract for processing a regression within the context of grouping. The primary implementation, AnomalyGrouperImpl, acts as a coordinator between the Perf backend services, the Issue Tracker, and the Temporal workflow engine.
The core logic, implemented in anomalygrouputils.go, resides in ProcessRegression. Its responsibilities include:
- Calling FindExistingGroups to see if the new anomaly fits into an active group based on its subscription and commit range.
- Triggering the MaybeTriggerBisection Temporal workflow when a new group is created.

The helper function FindIssuesToUpdate encapsulates the logic for mapping an AnomalyGroup back to physical issue IDs.
- For REPORT actions, it looks for a specifically linked ReportedIssueId.
- For BISECT actions, it queries the backend for issues associated with "culprits" (identified causes) linked to the group.

The following diagram illustrates how the module handles an incoming regression and decides whether to create a new group or update an existing one.
[ New Regression Detected ]
|
v
+--------------------------+
| Lock Grouping Mutex | (Prevent race conditions)
+--------------------------+
|
v
+--------------------------+ YES +----------------------------+
| Find Existing Groups? |-------------->| 1. Link Anomaly to Groups |
+--------------------------+ | 2. Find Associated Issues |
| | 3. Post Updates to Issues |
| NO +----------------------------+
v |
+--------------------------+ |
| 1. Create Anomaly Group | |
| 2. Link Anomaly to Group | |
| 3. Trigger Temporal WF | |
+--------------------------+ |
| |
v v
+-----------------------------------------------------------------------+
| Unlock Mutex & Return |
+-----------------------------------------------------------------------+
The anomalygroup/utils/mocks module provides mock implementations of the interfaces defined within the anomalygroup utility suite. Its primary purpose is to facilitate unit testing for components that depend on anomaly grouping logic—specifically the categorization and association of regressions into logical groups—without requiring a live database or the complex state management associated with real anomaly grouping operations.
The module utilizes testify/mock to provide a programmatic way to simulate the behavior of the AnomalyGrouper interface.
The core design decision here is the use of automatically generated mocks (via mockery). This approach ensures that the mock implementation remains strictly in sync with the parent interface. By generating these mocks in a dedicated package, the project maintains a clean separation between production code and testing utilities, preventing test dependencies (like testify) from polluting the production binary.
The mock is designed to support:
This file contains the AnomalyGrouper struct, which mocks the primary service responsible for regression management.
The central responsibility of this mock is to simulate the ProcessRegressionInGroup workflow. In a real-world scenario, this method involves complex logic to determine if a new anomaly should be joined to an existing group or start a new one based on metadata. The mock simplifies this for callers by allowing them to define expectations:
Input Parameters:
  - ctx: Request context.
  - alert: The alert configuration that triggered the detection.
  - anomalyID: The unique identifier for the detected regression.
  - startCommit/endCommit: The range where the regression occurred.
  - testPath/paramSet: Metadata describing the specific trace and attributes.

Return Values:
  - string: The ID of the anomaly group the regression was assigned to.
  - error: Any simulated operational failure.
The mock includes a NewAnomalyGrouper constructor that integrates with the Go testing.T cleanup lifecycle, ensuring that any unmet expectations (e.g., a method was expected to be called but wasn't) are automatically reported as test failures.
When a component (such as a regression detector or a notification manager) identifies a regression, it interacts with the AnomalyGrouper. The mock allows you to simulate this interaction:
+-------------------+ +-----------------------+ +-------------------------+
| Unit Test | | Component Under Test | | Mock AnomalyGrouper |
+---------+---------+ +-----------+-----------+ +------------+------------+
| | |
| 1. Set Expectations | |
| (On ProcessRegressionInGroup)| |
+---------------------------->| |
| | |
| 2. Trigger Action | |
+---------------------------->| 3. Call Process... |
| +--------------------------->|
| | |
| | 4. Return Preset Result |
| |<---------------------------+
| 5. Assert Requirements | |
+---------------------------->| |
The go/backend module implements the internal gRPC service architecture for the Skia Perf application. It serves as a centralized, non-user-facing API layer designed to decouple the frontend from heavy background operations and workflow orchestrations.
The backend service acts as a standard interface contract between different components of the Perf cluster. By isolating logic such as manual Pinpoint job triggering, anomaly group management, and culprit tracking into a dedicated service, the system ensures that user-facing components (the frontend) remain responsive.
This architecture allows for significant backend implementation changes—such as swapping out the underlying workflow engine (Temporal) or database logic—without requiring modifications to the frontend or other calling services.
- Services implement the BackendService interface, which requires providing an AuthorizationPolicy. This policy is then enforced by a unified gRPC interceptor.
- The Backend struct is initialized with various "stores" (AnomalyGroup, Culprit, Subscription, Regression). This allows the service to remain agnostic of the specific storage implementation (e.g., Spanner vs. CockroachDB) while facilitating easier unit testing through mocks.

backend.go is the core orchestrator. Its primary responsibility is the lifecycle management of the gRPC server: during initialization it builds the stores, connects external services, registers the gRPC services, and applies the authorization interceptors.
pinpoint.go provides a specialized wrapper around the Pinpoint service logic. It bridges the Perf backend to the Pinpoint bisection engine. Its primary role is to expose gRPC endpoints that allow the Perf UI to trigger and monitor performance bisection jobs. It implements strict role-based access control, typically requiring the Editor role.
Contained within the shared sub-package, the AuthorizationPolicy structure defines the security contract for every endpoint. It supports:
To ensure uniformity across the codebase, the client sub-package provides a factory for creating gRPC clients. It abstracts away the complexities of:
- Invoking grpc.Dial with appropriate interceptors.

The following diagram illustrates how the backend service starts up and wires its internal dependencies:
[ Config File ] -> [ validate.LoadAndValidate ]
        |
        v
[ Storage Builders ] -> [ NewAnomalyGroupStore ] [ NewCulpritStore ] [ NewRegressionStore ]
        |
        v
[ External Services ] -> [ NewTemporalClient ] [ GetDefaultNotifier ]
        |
        v
[ Service Registry ] -> [ NewPinpointService ] [ NewAnomalyGroupServ ] [ NewCulpritService ]
        |
        v
[ gRPC Server ] <------- [ Apply Auth Interceptors ]
        |
        +--> [ Listen on Port (e.g., :8005) ]
        +--> [ Enable Reflection ]
        +--> [ Serve Traffic ]
- backendserver: The executable entry point that parses CLI flags and calls the backend initialization logic.
- testdata: Contains environment-specific configurations (like demo.json) used to bootstrap the service in development or CI environments.

The backendserver module provides the executable entry point for the Perf backend service. Its primary purpose is to act as a thin wrapper that bootstraps the backend environment, parses operational configuration from the command line, and initiates the long-running service process. It bridges the gap between the infrastructure's execution environment and the core logic defined in the perf/go/backend package.
The module is designed around the urfave/cli framework to ensure that the service is highly configurable and self-documenting.
- The service uses the config.BackendFlags struct to define its requirements. This allows the deployment system to pass parameters directly, facilitating easier integration with container orchestration tools.
- The main.go file intentionally contains minimal logic. It delegates the heavy lifting (such as database connections, caching, and API routing) to the perf/go/backend package. This ensures that the core backend logic is decoupled from the CLI interface, making the system easier to test and reuse in different contexts.

The core responsibility of main.go is to define the command structure for the backend. It currently supports a run command, which serves as the primary execution path for the service.
When the run command is executed:
- The parser translates config.BackendFlags into CLI flags.
- The command calls backend.New(), passing the parsed flags. While the current implementation passes nil for several parameters (likely reserved for dependency injection or specialized handlers), this is where the system's core components are wired together.
- It then calls Serve(), which enters the main event loop of the backend, handling incoming requests until an interrupt signal is received.

The following diagram illustrates the initialization and execution flow of the backendserver:
[ OS Args ]
     |
     v
[ CLI Flag Parser ] ----> [ Log Configuration ]
     |
     v
[ backend.New() ] <----- [ BackendFlags ]
     |
     +--> [ Instantiate internal components ]
     +--> [ Setup Listeners/Handlers ]
     |
     v
[ b.Serve() ] <--------- [ Infinite Loop ]
     |
     +--> [ Accept RPC/HTTP Requests ]
     +--> [ Process Data ]
- perf/go/backend: Contains the actual service implementation. The backendserver is essentially a caller for this package.
- perf/go/config: Defines the schema for the backend's configuration.
- go.skia.org/infra/go/urfavecli: Provides the standardized CLI scaffolding used across Skia infrastructure projects.

The backend/client module serves as the central factory for establishing gRPC connections to various Perf backend services. It abstracts the complexities of authentication, transport security, and connection management, providing a unified interface for other components of the system to communicate with backend microservices like Anomaly Groups, Culprits, and Pinpoint.
The module is designed around the concept of a shared connection utility (getGrpcConnection). By centralizing how gRPC connections are dialed, the system ensures consistent application of security policies and authentication headers across all clients. This approach allows developers to instantiate high-level service clients without needing to understand the underlying networking or security configuration of the cluster.
The client supports two primary connection modes based on the environment and specific service requirements:
- Secure (TLS with OAuth2): For deployed environments, the client dials with TLS configured with InsecureSkipVerify: true. This decision reflects a common internal networking pattern where communication stays within a trusted VPC/cluster boundary, making full certificate chain validation secondary to ensuring encrypted transit. OAuth2 token sources are attached as PerRPCCredentials to the gRPC connection, ensuring that every request is authorized with the appropriate identity (scoped to userinfo.email).
- Insecure: For local development and testing, the client dials with insecure transport credentials.

The module relies on the global perf/go/config to determine the target host (BackendServiceHostUrl). This allows the same binary to target different backend instances based on the deployment configuration. Additionally, every client factory supports an override parameter, facilitating flexible routing for integration tests or cross-cluster communication.
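The transport choice can be sketched with the standard library alone. This is an illustrative simplification, not the real getGrpcConnection: the actual code builds gRPC dial options, while this sketch only shows the TLS-versus-plaintext decision the text describes.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// transportConfig is a simplified stand-in for the dial options chosen by
// getGrpcConnection: either plaintext, or TLS that skips chain verification.
type transportConfig struct {
	Plaintext bool
	TLS       *tls.Config
}

// chooseTransport mirrors the decision described above: local/test traffic
// dials insecurely, while in-cluster secure traffic uses TLS with
// InsecureSkipVerify set, relying on the trusted network boundary for
// endpoint identity while still encrypting transit.
func chooseTransport(secure bool) transportConfig {
	if !secure {
		return transportConfig{Plaintext: true}
	}
	return transportConfig{
		// Encrypted transit, but no certificate chain validation.
		TLS: &tls.Config{InsecureSkipVerify: true},
	}
}

func main() {
	fmt.Println(chooseTransport(true).TLS.InsecureSkipVerify)
}
```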
This is the primary implementation file containing the logic for connection lifecycle management and client instantiation.
- Connection management (getGrpcConnection): This internal function manages the grpc.Dial process. It handles the logic for choosing between insecure credentials and the TLS/OAuth2 stack.
- NewPinpointClient: For interacting with the Pinpoint service.
- NewAnomalyGroupServiceClient: For managing and querying anomaly groups.
- NewCulpritServiceClient: For accessing information regarding identified culprits.

The following diagram illustrates the internal process when a consumer requests a new service client:
[ Consumer Call ]
|
v
[ Check if Backend Enabled? ] ---- No ----> [ Return Error ]
|
Yes
|
[ Determine Host URL ] <--- (Override or Global Config)
|
[ Create gRPC Connection ]
|
+---- If Secure: [ Fetch OAuth Token ]
| [ Configure TLS (Skip Verify) ]
|
+---- If Insecure: [ Use Insecure Creds ]
|
[ grpc.Dial(host, opts) ]
|
v
[ Wrap Connection in Service Client ]
|
v
[ Return (e.g., AnomalyGroupServiceClient) ]
- perf/go/anomalygroup/proto/v1: Provides the interface for anomaly group interactions.
- perf/go/culprit/proto/v1: Provides the interface for culprit tracking.
- pinpoint/proto/v1: Provides the interface for Pinpoint integration.
- go/auth: Used for managing Google-based authentication scopes.

The backend/shared module serves as a centralized location for common data structures and logic used across various backend services within the Perf system. Its primary purpose is to standardize how cross-cutting concerns—specifically security and access control—are defined and enforced across different service implementations.
The core of this module is the AuthorizationPolicy structure. Rather than hard-coding permission checks within individual RPC handlers or middleware, this module provides a declarative way to define access requirements. This approach decouples the “rules” of the service from the “engine” that enforces them.
- By splitting roles into AuthorizedRoles (service-wide) and MethodAuthorizedRoles (method-specific), the system allows developers to define a baseline security posture for a service while overriding or tightening requirements for sensitive operations.
- Roles come from the shared go/roles package. This ensures that the backend uses a unified identity and permission vocabulary, preventing discrepancies where different services might interpret "Admin" or "Viewer" differently.
- The AllowUnauthenticated flag allows the policy to explicitly document when a service is intended to be public. This makes security audits easier, as public-facing endpoints are opted into explicitly rather than being the default state.

When a request enters a backend service, the service implementation typically references an AuthorizationPolicy instance to determine if the request should proceed.
Incoming Request
      |
      v
[ Auth Middleware ] <--- References --- [ AuthorizationPolicy ]
      |
      +---- (1) Is AllowUnauthenticated? ------+--> [ Allow ]
      |                                    YES
      |
      +---- (2) Does user have a role in ------+--> [ Allow ]
      |         MethodAuthorizedRoles[RPC]? YES
      |
      +---- (3) Does user have a role in ------+--> [ Allow ]
      |         AuthorizedRoles?            YES
      |
      +---- (4) No conditions met -------------+--> [ Deny (403) ]
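The four-step check can be sketched directly from the diagram. This is a minimal illustration, not the real middleware: the Role type and the Authorized method are assumed stand-ins for the go/roles vocabulary and the enforcement engine.

```go
package main

import "fmt"

// Role is a stand-in for the roles defined in the shared go/roles package.
type Role string

// AuthorizationPolicy mirrors the declarative structure described above.
type AuthorizationPolicy struct {
	AllowUnauthenticated  bool
	AuthorizedRoles       []Role
	MethodAuthorizedRoles map[string][]Role
}

// hasAny reports whether the user holds at least one of the wanted roles.
func hasAny(userRoles, wanted []Role) bool {
	for _, u := range userRoles {
		for _, w := range wanted {
			if u == w {
				return true
			}
		}
	}
	return false
}

// Authorized walks the checks from the diagram in order: open endpoint,
// method-specific roles, service-wide roles, then deny.
func (p AuthorizationPolicy) Authorized(method string, userRoles []Role) bool {
	if p.AllowUnauthenticated {
		return true // (1) explicitly public service
	}
	if perMethod, ok := p.MethodAuthorizedRoles[method]; ok && hasAny(userRoles, perMethod) {
		return true // (2) method-specific grant
	}
	return hasAny(userRoles, p.AuthorizedRoles) // (3) service-wide grant, else (4) deny
}

func main() {
	p := AuthorizationPolicy{
		AuthorizedRoles:       []Role{"viewer"},
		MethodAuthorizedRoles: map[string][]Role{"Delete": {"admin"}},
	}
	fmt.Println(p.Authorized("Delete", []Role{"admin"}), p.Authorized("Get", []Role{"guest"}))
}
```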
authorization.go: Defines the AuthorizationPolicy struct. This file is the source of truth for how backend services should describe their security requirements. It acts as the contract between service definitions and the middleware responsible for enforcing those definitions.

The /go/backend/testdata directory serves as a repository for static configuration files used to simulate real-world runtime environments during development, testing, and demonstration of the Perf backend. Rather than relying on hardcoded defaults within the Go source code, this module provides a centralized location for JSON-based configurations that define how a Perf instance behaves, connects to data sources, and interacts with external services.
The primary motivation for maintaining this module is to provide a “Single Source of Truth” for a functional Perf deployment environment that can be spun up locally or in a CI environment.
By using demo.json, the system achieves:
- A living example of the Config struct used within the backend, ensuring that changes to the configuration format are reflected in a working example.

demo.json is the core of the module. It defines a comprehensive instance profile. Its responsibilities include:
- Identity and ports: Naming the instance (chrome-perf-demo) and mapping the local communication ports for both the frontend and backend services.
- Storage: Selecting cockroachdb as the storage engine and defining the tile_size (e.g., 256). This choice impacts how the backend optimizes data retrieval for trace queries.
- Ingestion: Pointing the ingester at a local directory (./demo/data/) rather than a cloud-based Pub/Sub or GCS bucket. This is crucial for offline development and rapid prototyping of data parsers.
- Auth and Git: Configuring the authentication header (X-WEBAUTH-USER) and Git repository synchronization. By pointing to a public demo repo (perf-demo-repo.git), it allows the system to demonstrate commit-linking functionality without requiring private credentials.

The backend utilizes these files to bootstrap its internal services. The flow generally follows this pattern:
[ Backend Startup ]
|
V
[ Load /go/backend/testdata/demo.json ]
|
+-----> [ Initialize CockroachDB Connection ]
| (Using connection_string)
|
+-----> [ Initialize Ingestion Service ]
| (Watching ./demo/data/ for new trace files)
|
+-----> [ Sync Git Provider ]
| (Cloning/Updating /tmp/perf-demo)
|
+-----> [ Apply Auth/Notification Policies ]
(Setting header names and issue tracker secrets)
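A hypothetical fragment in the spirit of demo.json might look like the following. The field names here are illustrative only; the authoritative schema is the Go Config struct, and the repository host is a placeholder.

```
{
  "instance_id": "chrome-perf-demo",
  "data_store_config": {
    "datastore_type": "cockroachdb",
    "tile_size": 256
  },
  "ingestion_config": {
    "source_config": {
      "sources": ["./demo/data/"]
    }
  },
  "git_repo_config": {
    "url": "https://example.com/perf-demo-repo.git"
  },
  "auth_config": {
    "header_name": "X-WEBAUTH-USER"
  }
}
```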
This structure ensures that the backend can transition from a “demo” state to a “production” state simply by swapping the configuration file, keeping the underlying binary logic identical across environments.
The bug module provides a specialized utility for generating bug reporting URLs within the Perf application. Its primary purpose is to bridge the gap between performance regression detection and issue tracking by dynamically populating bug templates with contextual metadata.
The module is built around the concept of URI templates. Rather than hard-coding support for specific issue trackers (like Monorail or GitHub Issues), it utilizes a template-based approach to remain agnostic of the underlying bug-tracking system. This allows administrators to configure different reporting destinations without modifying the source code.
The core logic relies on the RFC 6570 URI Template standard via the uritemplates library. This ensures that all components of the URL—specifically those containing special characters like query parameters in a cluster link—are correctly escaped and encoded to prevent broken links in the resulting bug report.
Template Expansion (bug.go) The module exposes the Expand function, which serves as the primary entry point. It takes a raw template string and injects three critical pieces of context:
- cluster_url: A direct link to the Skia Perf cluster view where the regression was identified.
- commit_url: The link to the specific git commit (provided via provider.Commit) suspected of causing the regression.
- message: User-provided commentary or a summary of the issue.

The function handles the mapping of these domain-specific concepts to the template variables, ensuring that the integration between the performance monitoring UI and the bug tracker is seamless.
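The escaping concern can be illustrated with a standard-library sketch. The real bug.Expand uses the uritemplates library for RFC 6570 expansion; this stand-in (expand, with a {name} placeholder syntax) only demonstrates why nested query parameters must be percent-encoded.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// expand is a simplified stand-in for bug.Expand: it substitutes the three
// documented variables into a template, query-escaping each value the way
// an RFC 6570 query expansion would. The {name} placeholder syntax and the
// function itself are illustrative, not the real API.
func expand(template, clusterURL, commitURL, message string) string {
	vars := map[string]string{
		"cluster_url": clusterURL,
		"commit_url":  commitURL,
		"message":     message,
	}
	out := template
	for name, value := range vars {
		out = strings.ReplaceAll(out, "{"+name+"}", url.QueryEscape(value))
	}
	return out
}

func main() {
	// The cluster link carries its own query string, which must survive
	// being nested inside the bug tracker URL.
	u := expand(
		"https://bugs.example.org/new?link={cluster_url}&comment={message}",
		"https://perf.skia.org/t/?begin=1&end=2",
		"https://skia.googlesource.com/skia/+/abc123",
		"perf regression",
	)
	fmt.Println(u)
}
```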
The following diagram illustrates how the module transforms raw performance data and user input into a navigable bug report link:
[Perf UI / Detection] [Git Provider] [User Input]
| | |
v v v
(clusterLink) (commit.URL) (message)
| | |
+-------------------------+------------------+
|
v
+----------------------+
| bug.Expand | <--- [URI Template]
+----------------------+
|
v
[Encoded Reporting URL]
|
v
(Opens in User Browser)
The module includes an ExampleExpand function and associated tests to verify that the encoding logic correctly handles complex URLs. This is particularly important for the cluster_url, which often contains its own set of encoded query parameters that must be safely nested within the final bug reporting URL.
The go/builders module serves as the central factory for the Skia Perf application. It is responsible for instantiating complex objects—such as data stores, version control interfaces, and file sources—by interpreting a central config.InstanceConfig object.
The primary motivation for this module is to resolve cyclical dependencies. Many sub-packages within Perf (like tracestore or regression) need to know about the configuration, but the configuration logic often needs to reference these packages to define how they are initialized. By centralizing the “construction” logic here, other packages can remain focused on their specific domains without needing to know how their peers are instantiated or how the global configuration is structured.
A key implementation choice is the use of a Singleton Database Pool. Since a Perf instance typically talks to a single backend (like Spanner or PostgreSQL), the module maintains a global singletonPool. This prevents the application from accidentally opening multiple connection pools to the same database, which could exhaust file descriptors or database connection limits.
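The singleton pattern can be sketched as follows. The pool type and newDBPool are simplified stand-ins for the pgx pool and NewDBPoolFromConfig; only the mutex-guarded single-initialization logic matches the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a stand-in for the pgx connection pool wrapped by the builders.
type pool struct{ connString string }

var (
	singletonPool      *pool
	singletonPoolMutex sync.Mutex
)

// newDBPool mirrors the singleton pattern described above: the first caller
// creates the pool, every later caller gets the same instance, and the
// mutex prevents a startup race from opening two pools.
func newDBPool(connString string) *pool {
	singletonPoolMutex.Lock()
	defer singletonPoolMutex.Unlock()
	if singletonPool == nil {
		singletonPool = &pool{connString: connString}
	}
	return singletonPool
}

func main() {
	var wg sync.WaitGroup
	pools := make([]*pool, 8)
	for i := range pools {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			pools[i] = newDBPool("postgresql://localhost:5432/perf")
		}(i)
	}
	wg.Wait()
	fmt.Println(pools[0] == pools[7]) // every goroutine shares one pool
}
```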
The module handles the lifecycle of the database connection pool.
- NewDBPoolFromConfig: This is the core initializer. It parses connection strings, configures connection limits (MaxConns and MinConns), and wraps the raw pool in a timeout layer to ensure query hygiene.
- Schema checking: It consults expectedschema to ensure the database is compatible with the current version of the code before the application starts processing traffic.

The module provides New[Component]StoreFromConfig functions for every major data entity in Perf. These functions encapsulate the logic of choosing between different implementations (e.g., SQL-based vs. cache-backed stores).
- Trace stores: Builds sqltracestore instances. It also manages the initialization of InMemoryTraceParams to optimize trace lookups.
- Regression stores: Chooses between sqlregressionstore and sqlregression2store based on the UseRegression2 config flag.

The builders resolve how Perf reads incoming data files:
- NewSourceFromConfig: Determines whether data should be pulled from Google Cloud Storage (GCSSource) or a local directory (DirSource).
- NewIngestedFSFromConfig: Provides a standard Go fs.FS interface to the underlying storage, allowing the rest of the application to treat GCS and local filesystems interchangeably.

The GetCacheFromConfig function determines the caching layer for queries. It supports both in-memory (LRU) caches and Memcached, as selected by CacheConfig.
The typical flow for initializing a component involves resolving the database pool first, then passing it into the specific constructor for the requested store.
Config Object (InstanceConfig)
      |
      v
[ NewDBPoolFromConfig ] <-----------+
      |                             |
      | (Check Schema)              |
      v                             |
+------> [ singletonPool ] ---+     |
|        (Thread-safe)        |     |
|                             |     |
v                             v     |
[ New...StoreFromConfig ]   [ NewPerfGitFromConfig ]
      |                             |
      +---> Returns Interface       +---> Returns perfgit.Git
            (e.g. alerts.Store)
- Concurrency: singletonPool is protected by a sync.Mutex (singletonPoolMutex) to ensure that concurrent calls to initialize the database during startup do not create race conditions or multiple pools.
- Logging: A pgxLogAdaptor redirects internal database driver logs (from pgx) into the standard sklog system, ensuring unified log formatting across the application.
- Timeouts: Pools are wrapped by go/sql/pool/wrapper/timeout. This enforces that every context passed to a database operation has a deadline, preventing "hanging" queries from blocking the application indefinitely.

The chromeperf module provides a comprehensive Go client and integration layer for interacting with the Chrome Performance Monitoring (Chromeperf) ecosystem. Its primary responsibility is to bridge the gap between Skia Perf's internal data structures and the legacy Chromeperf APIs, specifically focusing on anomaly detection, regression reporting, and alert group management.
The module acts as a translation and transport layer, allowing Skia Perf to:
- Translate trace identifiers between Skia's key-value format and Chromeperf's hierarchical TestPath format.

A key architectural decision is the use of skia-bridge-dot-chromeperf.appspot.com as the default endpoint. While a legacy direct path to chromeperf.appspot.com exists, the module defaults to the bridge. This design allows for a more stable interface and potentially specialized authentication/filtering logic between the two systems. The ChromePerfClient interface abstracts this, supporting URL overrides for local development and testing.
The SendPostRequest and SendGetRequest implementations in chromeperfClient.go incorporate specific logic for “accepted status codes.” Unlike standard HTTP clients that might treat any 2xx as success, this module allows callers to define exactly which codes are valid for a given operation. For example, ReportRegression accepts 404 as a non-error state in specific scenarios where parameter names differ between systems, preventing transient synchronization issues from triggering hard failures in the Skia backend.
Chromeperf identifies performance series using a hierarchical string (e.g., Master/Bot/Benchmark/Test/Subtest), whereas Skia Perf uses a flat map of key-value pairs. The TraceNameToTestPath function implements a deterministic mapping strategy:
The mapping orders keys as master -> bot -> benchmark -> test -> subtest_1...N. Because legacy series often carry statistic suffixes (_avg, _max), the translator can optionally append suffixes based on Skia's stat parameter to ensure lookups hit the correct legacy series.

Skia Perf restricts certain characters in trace keys (like ? or :), replacing them with underscores. To prevent this from breaking the ability to query the original data source, the module utilizes a ReverseKeyMapStore. This allows the system to "remember" that a sanitized Skia value like cpu_io actually corresponds to a Chromeperf value of cpu:io.
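The key-to-test-path conversion can be sketched as follows. This is a simplified stand-in for TraceNameToTestPath, and the stat-to-suffix table is an illustrative assumption, not the real mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// traceToTestPath parses a Skia structured key (",k=v,k=v,") and reassembles
// the Chromeperf hierarchy master/bot/benchmark/test. Simplified sketch:
// subtests and error handling are omitted, and the suffix table below is a
// hypothetical example of the stat-based suffixing described in the text.
func traceToTestPath(traceName string) string {
	params := map[string]string{}
	for _, pair := range strings.Split(strings.Trim(traceName, ","), ",") {
		if kv := strings.SplitN(pair, "=", 2); len(kv) == 2 {
			params[kv[0]] = kv[1]
		}
	}
	path := strings.Join([]string{
		params["master"], params["bot"], params["benchmark"], params["test"],
	}, "/")
	// Hypothetical suffix mapping: legacy series often exist as test_avg, test_max.
	if suffix, ok := map[string]string{"value": "_avg", "max": "_max"}[params["stat"]]; ok {
		path += suffix
	}
	return path
}

func main() {
	fmt.Println(traceToTestPath(",master=CP,bot=M1,benchmark=SunSpider,test=total,stat=value,"))
}
```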
Anomaly API (anomalyApi.go): This is the core functional area of the module. It defines the Anomaly struct, which contains extensive metadata about performance shifts (medians before/after, P-values, segment sizes, and bug tracking information).
- ReportRegression sends new detections to Chromeperf to trigger the alerting pipeline.
- Retrieval supports both revision-based (GetAnomalies) and time-based (GetAnomaliesTimeBased) queries.
- A custom UnmarshalJSON method for Anomaly handles legacy numeric IDs by transparently converting them to strings, ensuring compatibility with different versions of the Chromeperf backend.

Alert Group API (alertGroupApi.go): Manages the grouping of anomalies. It provides methods to fetch details for a specific group key. A critical function here is GetQueryParams, which parses the anomaly list within a group to generate Skia-compatible query parameters, allowing users to jump from a Chromeperf alert group directly to a Skia Perf visualization of all affected traces.
Transport (chromeperfClient.go): The low-level transport implementation. It handles:
- Authentication via Google credentials with the userinfo.email scope.

The following diagram illustrates how the module transforms a Skia trace request into a Chromeperf anomaly set:
[ Skia Perf ]
  Trace Name: ",master=CP,bot=M1,benchmark=SunSpider,test=total,stat=value,"
      |
      v
[ TraceNameToTestPath ]
  Converts to: "CP/M1/SunSpider/total_avg"
      |
      v
[ AnomalyApiClient.GetAnomalies ]
  POST /anomalies/find
  { "tests": ["CP/M1/SunSpider/total_avg"], ... }
      |
      v
[ Chromeperf Backend ]
  Returns: { "anomalies": { "CP/M1/SunSpider/total_avg": [ {Anomaly_Data} ] } }
      |
      v
[ getAnomalyMapFromChromePerfResult ]
  1. Maps "CP/M1/SunSpider/total_avg" back to the original Skia Trace Name.
  2. Resolves Git Hashes to Commit Numbers using perfgit.Git.
      |
      v
[ AnomalyMap ]
  { "trace_name": { CommitNumber: Anomaly } }
- compat/: A translation layer that converts internal Skia regression.Regression objects into chromeperf.Anomaly structures.
- sqlreversekeymapstore/: A SQL-backed implementation of the ReverseKeyMapStore for persisting character transformation mappings.
- mock/: Autogenerated mocks for unit testing components that depend on these APIs.

The compat module provides a translation layer between Skia Perf's internal regression formats and the legacy ChromePerf (Anomaly) data structures. Its primary purpose is to ensure interoperability during the transition or integration period where Skia Perf needs to communicate regression data to systems that still rely on the ChromePerf "Anomaly" schema.
The module simplifies the complex, multi-dimensional data captured in a Skia regression.Regression object into a flat, trace-oriented chromeperf.AnomalyMap.
The translation logic addresses several structural differences between the two systems:
- Identity mapping: ChromePerf identifies series by a hierarchical TestPath (e.g., Master/Bot/Benchmark/Test). This module handles the mapping of these identifiers to ensure regressions are attributed to the correct entities in legacy dashboards.
- Revision ranges: It populates StartRevision and EndRevision fields to satisfy the "range-based" anomaly model used by ChromePerf.
- Triage state: It translates Skia's TriageStatus into a string-based state and applies specific flags (like IgnoreBugIDFlag) when a regression is marked as "Ignored," ensuring the legacy system respects the triage decisions made in Skia.

The core functionality is encapsulated in ConvertRegressionToAnomalies. The process follows this logical flow:
1. Extract the trace keys from the regression's frame and resolve each one to a ChromePerf TestPath.
2. Map the regression's medians, revisions, bug IDs, and triage state into an Anomaly struct.
3. Collect the results into a CommitNumberAnomalyMap, indexed by the trace key, allowing callers to look up anomalies by their specific performance series.

[regression.Regression]
      |
      v
+------------------------------+
| ConvertRegressionToAnomalies |
+------------------------------+
      |
      |-- Extract TraceSet Keys
      |-- Resolve TestPaths (e.g. Master/Bot/...)
      |-- Map Medians & Revisions
      |-- Resolve Bug IDs & Triage State
      v
[chromeperf.AnomalyMap]
  {
    "trace_key_A": { CommitNum: Anomaly },
    "trace_key_B": { CommitNum: Anomaly }
  }
- compat.go: Contains the primary conversion logic. It is responsible for the heavy lifting of data transformation, error handling for malformed trace names, and the temporary logic for narrowing down multiple bug assignments into a single field.
- compat_test.go: Validates the conversion accuracy across various scenarios, including successful mappings, handling of nil data frames, and ensuring that different triage statuses (like Ignored) result in the correct legacy flag values.

The /go/chromeperf/mock module provides a suite of autogenerated mock implementations for the interfaces defined in the chromeperf package. These mocks are designed to facilitate hermetic unit testing of the Skia Perf service by simulating interactions with external Chrome Performance monitoring APIs and storage layers.
The module leverages the testify/mock framework and is maintained via mockery. This approach was chosen to ensure that the testing infrastructure remains synchronized with the primary interfaces. When the core chromeperf interfaces evolve—such as adding new parameters to anomaly queries or modifying the regression reporting structure—the mocks can be regenerated to reflect these changes, reducing the manual overhead of updating test suites.
By using these mocks, developers can:
- Simulate responses such as an AnomalyMap or ReportRegressionResponse without requiring a live backend.

The AnomalyApiClient mock simulates high-level operations related to performance anomalies. It allows tests to define expectations for fetching anomaly data across several dimensions:
- Range queries: Mocking GetAnomalies and GetAnomaliesTimeBased allows tests to simulate data retrieval over commit ranges or specific time intervals.
- Revision-centered queries: GetAnomaliesAroundRevision enables testing of logic that centers on a specific point in time or a specific commit.
- Regression reporting: The ReportRegression mock is critical for verifying the logic that identifies and pushes new performance regressions to the Chrome Perf dashboard, including the validation of metadata like median values before and after a change.

The ChromePerfClient mock represents the lower-level transport layer. While AnomalyApiClient focuses on the "what" (anomalies), ChromePerfClient focuses on the "how" (generic HTTP-like requests). It mocks SendGetRequest and SendPostRequest, providing a way to test the underlying serialization and communication logic. This is particularly useful for verifying that the correct API endpoints and query parameters are constructed before being sent over the wire.
The ReverseKeyMapStore mock facilitates testing of the data translation layer. In Skia Perf, keys or trace names may be modified or obfuscated for storage or display. This mock simulates the persistence and retrieval of mappings between “modified” values and “original” values. It allows tests to verify that the system can correctly resolve internal identifiers back to their source values during data processing or anomaly reporting.
The standard workflow for utilizing these mocks involves setting expectations within a Go test, injecting the mock into the component under test, and asserting that the interactions occurred as predicted.
+-------------------+      +-----------------------+      +-------------------------+
|      Go Test      |      | Component Under Test  |      |  Mock AnomalyApiClient  |
+-------------------+      +-----------------------+      +-------------------------+
        |                              |                               |
        | 1. Set Expectations          |                               |
        |------------------------------------------------------------>|
        | (On "GetAnomalies").Return() |                               |
        |                              |                               |
        | 2. Execute Logic             |                               |
        |----------------------------->|                               |
        |                              | 3. Call API Method            |
        |                              |------------------------------>|
        |                              |                               |
        |                              | 4. Return Mock Data           |
        |                              |<------------------------------|
        | 5. Assert Expectations       |                               |
        |------------------------------------------------------------>|
        | (AssertExpectations)         |                               |
The New... constructor functions in each file include a Cleanup registration. This design ensures that AssertExpectations is automatically called at the end of each test, preventing “silent” failures where a test passes even if an expected API call was never actually made by the code.
The sqlreversekeymapstore module provides a persistent storage mechanism for mapping sanitized Skia Perf parameter values back to their original Chromeperf identifiers. This is a critical utility for maintaining interoperability between the two systems, particularly during anomaly detection and cross-platform data lookups.
When data flows from Chromeperf to Skia Perf, certain characters in test paths and parameter keys are considered “invalid” by Skia's internal naming conventions. To ensure compatibility, these characters are typically replaced with underscores (_).
This transformation is lossy. For example, both cpu:io and cpu-io might be sanitized to cpu_io. Because multiple distinct original values can map to the same sanitized value, it is impossible to programmatically “undo” the sanitization to find the original Chromeperf source of truth.
This module solves the problem by recording these transformations as they occur. By maintaining a lookup table, the system can deterministically resolve a sanitized Skia parameter back to the specific Chromeperf value it originated from, enabling accurate queries against Chromeperf's legacy APIs.
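The lossy sanitization and the reverse lookup it necessitates can be sketched with an in-memory stand-in for the store. The sanitize replacer and memStore below are illustrative, not the real SQL-backed implementation; memStore's Create keeps the first write, mirroring the conflict-ignoring insert behavior described for the real store.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize mirrors the lossy transformation: characters Skia rejects
// (e.g. ':', '-', '?') are replaced with underscores, so "cpu:io" and
// "cpu-io" both collide on "cpu_io".
func sanitize(v string) string {
	return strings.NewReplacer(":", "_", "-", "_", "?", "_").Replace(v)
}

// memStore is an in-memory stand-in for ReverseKeyMapStoreImpl: it records
// (modifiedValue, paramKey) -> originalValue.
type memStore struct {
	m map[[2]string]string
}

func newMemStore() *memStore { return &memStore{m: map[[2]string]string{}} }

// Create keeps the first mapping and ignores later conflicting writes,
// analogous to INSERT ... ON CONFLICT DO NOTHING.
func (s *memStore) Create(modified, key, original string) string {
	k := [2]string{modified, key}
	if _, exists := s.m[k]; !exists {
		s.m[k] = original
	}
	return s.m[k]
}

// Get returns the recorded original value, or "" when no mapping exists.
func (s *memStore) Get(modified, key string) string {
	return s.m[[2]string{modified, key}]
}

func main() {
	store := newMemStore()
	orig := "cpu:io"
	store.Create(sanitize(orig), "test_path", orig)
	fmt.Println(store.Get("cpu_io", "test_path"))
}
```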
Core implementation (sqlreversekeymapstore.go): The core logic is encapsulated in the ReverseKeyMapStoreImpl struct. It abstracts the database interactions required to store and retrieve these mappings.
- Dialect awareness: The store inspects config.DataStoreType to select the appropriate SQL syntax, specifically handling differences in INSERT ... ON CONFLICT behavior.
- Idempotent writes: The Create method is designed to be safe for concurrent or repeated calls. If a mapping for a specific ModifiedValue and ParamKey already exists, the database ignores the new insertion attempt.
- Lookups: The Get method allows callers to provide a sanitized value and its associated parameter key to retrieve the original string.

The underlying database table, ReverseKeyMap, is structured to optimize for lookup speed and data consistency:
- Composite primary key on (modified_value, param_key). This ensures that for any given parameter category (like a test path component), a sanitized string can only point to one "correct" original string.

The following diagram demonstrates the lifecycle of a parameter value as it moves from Chromeperf to Skia and back again via the store:
[ Chromeperf ] [ Sanitization ] [ Skia Perf ]
Original Value ---> Transformation ---> Modified Value
"cpu:io" ( ":" -> "_" ) "cpu_io"
| |
| [ Store.Create ] |
+----------------------------------------------+
|
[ SQL ReverseKeyMap ]
Modified: "cpu_io"
ParamKey: "test_path"
Original: "cpu:io"
|
+------------------------+
| [ Store.Get ]
v
[ Original Restored ] <--- Used for Anomaly Lookups in Chromeperf
- New(db pool.Pool, dbType config.DataStoreType): Initializes the store with the appropriate SQL dialect based on the database provider.
- Create(ctx, modifiedValue, key, originalValue): Persists a new mapping. Returns the originalValue if successful, or an empty string/error if a collision or validation issue occurs.
- Get(ctx, modifiedValue, key): Retrieves the original value associated with the sanitized input. If no mapping exists, it returns an empty string without an error, signifying that no transformation was recorded for that specific pair.

The sqlreversekeymapstore/schema module defines the database structure required to maintain a mapping between sanitized Skia Perf parameter values and their original Chromeperf counterparts. This mapping is essential for maintaining interoperability between the two systems, specifically during anomaly lookups and cross-platform queries.
When data is migrated or uploaded from Chromeperf to Skia Perf, “invalid” characters within test paths are replaced with underscores to comply with Skia’s data requirements. Because this transformation is lossy (multiple distinct original characters might all be mapped to the same underscore), it is mathematically impossible to deterministically reconstruct the original Chromeperf test path from the modified Skia Perf path without external metadata.
Without this schema, querying Chromeperf for anomalies based on a Skia Perf test path would be unreliable, as the system would not know which original characters the underscores represent.
By storing these transformations as they occur, the system can perform a reverse lookup to find the “source of truth” original value. The design assumes that the set of unique test paths is relatively stable; therefore, while the table grows initially as new paths are encountered, the storage overhead is expected to plateau once all existing test paths have been processed.
This file defines the ReverseKeyMapSchema struct, which represents the relational table structure. The schema is designed around three primary attributes:
- ModifiedValue: The sanitized value as stored in Skia Perf.
- ParamKey: The parameter category the value belongs to (e.g., a test path component).
- OriginalValue: The original Chromeperf value.

The schema enforces uniqueness through a composite primary key consisting of the ModifiedValue and the ParamKey.
By using ModifiedValue and ParamKey as the primary key, the database is optimized for the most common workflow: taking a known Skia Perf parameter and looking up its original Chromeperf identity.

The following diagram illustrates how this schema facilitates communication between the two systems:
Chromeperf Path Skia Perf Path Reverse Key Map
(Original) (Sanitized) (Database Store)
---------------- --------------- -----------------------------
"master/bot/cpu:io" -> "master/bot/cpu_io" -> Modified: "cpu_io"
ParamKey: "test_path"
Original: "cpu:io"
|
|
[Anomaly Detection] <- [Query Original] <- [Lookup via ModifiedValue]
The clustering2 module provides the logic for grouping performance traces based on their shapes using the k-means algorithm. It is primarily used within the Perf framework to identify patterns in telemetry data, such as regressions or improvements, by clustering similar behavioral trends across different test configurations.
The module is designed around the concept of “trace shapes.” Instead of looking at individual data points, it treats a series of values over time (a trace) as a multi-dimensional vector. By clustering these vectors, the system can discover that a specific set of tests all experienced a similar performance shift at the same point in time, even if the absolute values of their metrics differ.
Each cluster summary also reports how often particular parameter values (e.g., arch=x86) appear within that cluster.

Clustering (clustering.go): The primary entry point is CalculateClusterSummaries. It orchestrates the following workflow:
1. Conversion: It converts a dataframe.DataFrame into a slice of kmeans.Clusterable objects. Traces are normalized or processed via ctrace2 to ensure the clustering is based on the shape of the data rather than absolute magnitude.
2. Iteration: It runs the k-means loop until the total error stabilizes (changes by less than KMEAN_EPSILON).
3. Summarization: The results are packaged into:
- ClusterSummary: Contains the centroid data, the list of representative trace keys, the results of the step-fit analysis, and a summary of the parameters common to the cluster.
- ClusterSummaries: A container for all clusters found during a single run, including metadata like the K value used and the standard deviation threshold.

Value reporting (valuepercent.go): This component analyzes the metadata keys of all traces in a cluster to identify commonalities.
- ValuePercent: Represents how often a specific key=value pair appears as a percentage of the total cluster size.
- The SortValuePercentSlice function implements a specialized sorting logic. It groups values by their key (e.g., all config values together) and then sorts those groups by the highest percentage. This ensures that the most dominant traits of a cluster appear at the top of the report.

DataFrame (Traces)
      |
      v
[Convert to Clusterable Traces] <--- Normalize shapes
      |
      v
[Initialize K Centroids] <--------- Randomly select K traces
      |
      +----[ Loop: K-Means Iteration ]
      |        |
      |        v
      |   [Assign Traces to Nearest Centroid]
      |   [Recalculate Centroid Positions]
      |   [Calculate Total Error]
      |        |
      +--------+--- (Break if Error Change < EPSILON)
      |
      v
[Post-Processing]
      |
      +--> [Fit Centroids to Step Functions]
      +--> [Calculate Parameter Percentages]
      +--> [Sort Members by Distance to Centroid]
      |
      v
ClusterSummaries (Final Result)
- Distance metric: The module relies on the Distance implementation provided by the ctrace2 package's ClusterableTrace, which typically measures the similarity between two floating-point arrays.
- Centroid calculation: New centroids are computed from cluster members (via ctrace2.CalculateCentroid).
- Progress reporting: The work happens within a single CalculateClusterSummaries call, though it accepts a context.Context for cancellation and a Progress callback to report the total error back to the caller/UI.

The go/config module defines the structural and semantic requirements for configuring a Skia Perf instance. It serves as the single source of truth for the application's runtime behavior, governing how data is ingested, stored, queried, and notified.
Perf is a highly configurable system designed to handle diverse performance data sources. The configuration system is built around a central InstanceConfig struct, which is typically populated from a JSON file at startup. This module handles:
Rather than maintaining a separate JSON schema file and Go struct, this module uses the invopop/jsonschema library. By performing reflection on the InstanceConfig struct, the system generates instanceConfigSchema.json. This ensures that any change to a Go field (like adding a new QueryConfig parameter) is automatically reflected in the validation logic and IDE autocompletion for configuration authors.
Validation is split into two distinct phases to maximize reliability:
1. Structural validation: The JSON is checked against the generated schema for correct types, required fields, and valid nesting.
2. Semantic validation: Deeper logical checks live in the validate submodule. This is crucial because a configuration might be valid JSON but logically broken (e.g., a notification template referencing a non-existent variable, or a Regex that uses unsupported syntax).

Standard Go time.Duration serializes to an integer (nanoseconds) in JSON, which is not human-readable. The module implements a custom DurationAsString type. It supports marshaling/unmarshaling strings like "2h" or "10m", making the JSON configuration files much easier for humans to maintain and review.
InstanceConfig (config.go): The root configuration object. It aggregates several sub-configs, each responsible for a specific subsystem:
- DataStoreConfig: Defines where trace data lives. It supports Spanner as the primary datastore and allows configuring connection pools and caching layers (either in-memory LRU or Memcached via CacheConfig).
- IngestionConfig & SourceConfig: Control the flow of data into Perf. They define where files come from (Google Cloud Storage or local directories) and how to handle arrival events via PubSub (including "Dead Letter" topics for failing messages).
- GitRepoConfig: Configures how Perf interacts with source control. It supports both CLI-based git and the Gitiles API. It also handles "commit number" logic, allowing Perf to map git hashes to sequential integers used for graphing.
- NotifyConfig & IssueTrackerConfig: Manage regression alerts. These utilize Go text templates for subjects and bodies, allowing instances to customize how they report anomalies to developers.
- QueryConfig: Customizes the "Explore" UI. It allows instances to set default parameter selections (e.g., always default stat to value) and define "Conditional Defaults" (e.g., if a user selects metric=cpu, automatically suggest stat=avg).

Validation (/validate): This submodule ensures the provided JSON is safe to run. It doesn't just check syntax; it performs "dry runs" of notification templates and compiles all regular expressions to ensure they are compatible with Go's RE2 engine.
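A skeletal configuration file might look like the following. The top-level block names (data_store_config, ingestion_config, git_repo_config) are confirmed by the embedded schema; the inner keys and values here are illustrative assumptions, not the authoritative schema:

```json
{
  "data_store_config": {
    "datastore_type": "spanner",
    "tile_size": 256
  },
  "ingestion_config": {
    "source_config": {
      "sources": ["gs://my-perf-bucket/ingest"]
    }
  },
  "git_repo_config": {
    "url": "https://example.googlesource.com/myrepo"
  }
}
```

Any file of this shape must pass both the structural (JSON Schema) and semantic (validate submodule) checks before the server will boot with it.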
The module provides AsCliFlags() methods for different service types (BackendFlags, FrontendFlags, IngestFlags). This allows the various Perf microservices to share a consistent set of command-line arguments (like --config_filename and --connection_string) while keeping their specific needs isolated.
The following process describes how a configuration file moves from a static file to a running service:
[ config.json ]
       |
       v
+-----------------------+
|   Structural Check    |  Checks: JSON types, required fields,
|     (JSON Schema)     |  and valid nesting.
+-----------------------+
       |
       v
+-----------------------+  Checks:
|  Semantic Validation  |  - Do Go templates compile?
|     (validate.go)     |  - Are Regex patterns valid RE2?
+-----------------------+  - Are TileSizes logically consistent?
       |
       v
+-----------------------+
|  Global Config State  |  The validated object is stored in
|    (config.Config)    |  config.Config for the app to use.
+-----------------------+
- MaxSampleTracesPerCluster: Limits the number of traces shown in a cluster summary (default: 50) to maintain UI performance.
- QueryMaxRunTime: Hard limit (10 minutes) on trace queries to prevent runaway database processes from exhausting resources.
- MinStdDev: The floor for normalization (0.001); values smaller than this are treated as zero to avoid division-by-zero or noise amplification in regression detection.

The /go/config/generate module serves as a bridge between Go type definitions and runtime configuration validation. Its primary responsibility is to ensure that the InstanceConfig struct—the central configuration object for Perf—is accurately represented as a JSON Schema.
By automating the generation of this schema, the system guarantees that any structural changes made to the configuration in Go code are immediately reflected in the validation logic. This prevents the “drift” that often occurs when manual documentation or separate validation files are maintained alongside source code.
The module is implemented as a minimal Go binary designed to be executed via go generate.
The core logic utilizes the jsonschema utility package to perform reflection on the config.InstanceConfig struct. This process transforms Go-specific metadata (such as struct tags, nested types, and field types) into a formal JSON Schema specification.
This approach was chosen to maintain a single source of truth. Instead of manually writing a JSON Schema to validate incoming configuration files, the Go struct itself defines the constraints. The generated schema at ../validate/instanceConfigSchema.json then acts as a portable artifact that can be used by:
The generation process follows a linear path from Go source to a serialized JSON file:
[ Go Source Code ]
        |
        | (reflection)
        v
[ InstanceConfig Struct ] ----> [ jsonschema generator ]
                                        |
                                        | (serialization)
                                        v
                          [ instanceConfigSchema.json ]
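The real generator delegates to the invopop/jsonschema library, but the underlying idea (deriving a schema from struct tags via reflection) can be sketched with only the standard library. The struct and field names below are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// Config stands in for InstanceConfig in this simplified sketch.
type Config struct {
	URL      string `json:"url"`
	TileSize int    `json:"tile_size"`
}

// schemaFor builds a minimal JSON-Schema-like description of a struct by
// reflecting over its fields and json tags.
func schemaFor(v any) map[string]any {
	t := reflect.TypeOf(v)
	props := map[string]any{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := strings.Split(f.Tag.Get("json"), ",")[0]
		var typ string
		switch f.Type.Kind() {
		case reflect.String:
			typ = "string"
		case reflect.Int:
			typ = "integer"
		default:
			typ = "object"
		}
		props[name] = map[string]any{"type": typ}
	}
	return map[string]any{"type": "object", "properties": props}
}

func main() {
	b, _ := json.MarshalIndent(schemaFor(Config{}), "", "  ")
	fmt.Println(string(b))
}
```

Because the schema is derived from the struct itself, adding a field to the struct automatically updates the generated schema, which is exactly the "single source of truth" property the module relies on.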
- The generator couples the config package (where the business logic definitions reside) with the jsonschema package (the transformation engine). It targets a specific output path in the validate directory, ensuring the generated schema is placed where the validation logic expects it.
- The InstanceConfig struct from //perf/go/config is the critical input. The generator relies on the struct tags (like json:) and documentation comments within that struct to produce a human-readable and accurate schema.

The go/config/validate module provides a robust validation layer for Skia Perf instance configurations. Its primary purpose is to ensure that JSON configuration files are not only structurally sound according to a schema but also semantically valid for the Perf runtime environment.
Configuration in Perf is complex, involving regular expressions, Go templates for notifications, and interdependent database settings. Simple JSON schema validation is insufficient for catching errors like an invalid regex or a notification template that references a non-existent field. This module bridges that gap by performing deep inspection of the configuration object before the application starts.
The validation process follows a two-tier approach:
1. Structural: the file is checked against a schema (instanceConfigSchema.json) to ensure types, required fields, and nesting are correct.
2. Semantic: the unmarshaled object is deep-inspected for logical consistency (templates, regexes, cross-field constraints).

The Schema (instanceConfigSchema.json): The module embeds a JSON schema that defines the structure of an InstanceConfig. This schema is the first line of defense, ensuring that mandatory blocks like data_store_config, ingestion_config, and git_repo_config are present. It also constrains the allowed properties for various sub-configs (e.g., QueryConfig, AuthConfig), preventing "silent" typos in configuration keys.
Semantic Validation (validate.go): The core validation logic resides in the Validate function. It performs several critical checks:
- Template dry runs: For notification types like MarkdownIssueTracker, the validator doesn't just check if the template is valid Go syntax; it attempts to actually "dry-run" the template. It mocks data for commits, alerts, and clusters to ensure that the user-provided templates (subject and body) can be successfully expanded without runtime errors.
- Regex compilation: Patterns such as invalid_param_char_regex are compiled using Go's regexp package. This ensures that the patterns are compatible with RE2 syntax. Specifically, for invalid_param_char_regex, the validator enforces that the regex must match both a comma (,) and an equals sign (=), as these are fundamental delimiters in the Perf trace system.
- Cross-field consistency: The validator verifies that if notifications is set to a specific tracker type, the corresponding API key secrets are also provided. It also validates that CommitChunkSize in the query config is logically consistent with the TileSize in the data store config.

Test Fixtures (testdata/ and validate_test.go): The module includes a comprehensive suite of fixtures to prevent regressions:
Negative fixtures include invalid_regex.json (testing unsupported RE2 features like lookaheads) and invalid-notify-template.json (testing references to non-existent template fields). The following diagram illustrates the lifecycle of a configuration file as it passes through this module:
[ JSON Config File ]
|
v
+-----------------------+
| JSON Schema Check | ----> [ Fail: Invalid types/missing keys ]
+-----------------------+
|
v
+-----------------------+ +-----------------------------------+
| Semantic Validation | | - Compile Regex |
| (Validate) | <--> | - Dry-run Notification Templates |
+-----------------------+ | - Verify cross-field logic |
| +-----------------------------------+
v
+-----------------------+
| Load into Global Mem | ----> [ Success: Perf proceeds to boot ]
| (config.Config) |
+-----------------------+
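One of the semantic checks described above, the invalid_param_char_regex delimiter rule, might look roughly like this. The function name and error wording are assumptions; only the rule itself (must compile as RE2 and must match both "," and "=") comes from the documentation:

```go
package main

import (
	"fmt"
	"regexp"
)

// validateInvalidParamCharRegex checks that the configured pattern compiles
// under RE2 and matches both trace-key delimiters, "," and "=".
func validateInvalidParamCharRegex(pattern string) error {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return fmt.Errorf("not valid RE2: %w", err)
	}
	if !re.MatchString(",") || !re.MatchString("=") {
		return fmt.Errorf("pattern %q must match both ',' and '='", pattern)
	}
	return nil
}

func main() {
	// Matches both delimiters, so validation passes (prints <nil>).
	fmt.Println(validateInvalidParamCharRegex(`[^a-zA-Z0-9]`))
	// Matches neither delimiter, so validation fails with an error.
	fmt.Println(validateInvalidParamCharRegex(`[0-9]`))
}
```

Checking the delimiters at startup prevents a misconfigured instance from silently corrupting trace keys at ingestion time.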
The module provides two primary entry points:
- InstanceConfigFromFile: Reads a file from disk, performs schema validation, unmarshals it into the Go struct, and then runs semantic validation.
- LoadAndValidate: A higher-level wrapper that logs schema violations to the system logs and populates the global config.Config singleton if validation passes. This is typically called during the initial setup of the Perf server.

The testdata module provides a suite of JSON-based test fixtures used to verify the robustness of configuration validation logic. Its primary purpose is to exercise the parser's ability to distinguish between structurally sound configurations and those that contain semantic errors in complex fields, such as Go templates and regular expressions.
The data within this module is structured to target specific failure modes that are difficult to catch with simple schema validation:
- Notification templates: success cases (valid-notify-template.json) containing all supported variables (e.g., .Commit.GitHash, .Alert.DisplayName) and failure cases (invalid-notify-template.json). This allows the validator to ensure that templates do not reference non-existent properties or use invalid syntax, which would otherwise lead to runtime errors during alert generation.
- Regular expressions: Go's regexp package uses the RE2 syntax, which does not support certain features like lookahead assertions. The invalid_regex.json file specifically includes a lookahead pattern ((?=...)) to verify that the validator correctly identifies and rejects patterns that are incompatible with the underlying Go environment.
- Empty configurations: the empty.json file serves as a baseline for testing how the system handles null or empty inputs, ensuring that required fields are enforced and default values are applied correctly when an incomplete configuration is provided.

The module is categorized by the specific validation criteria it aims to test:
The files valid-notify-template.json and invalid-notify-template.json define the expected interface for the notification engine.
- Templates may reference Commit, Alert, Cluster, and StepFit objects.
- Usage ranges from simple field access like {{ .Alert.DirectionAsString }} to indexed lookups like {{ index .ParamSet "device_name" }}.

Configuration fields like invalid_param_char_regex and commit_number_regex are validated to ensure they can be compiled by the application.
- invalid_regex.json provides a negative test case for patterns that might be valid in other engines (like Perl or JavaScript) but are unsupported in the project's Go environment.
- empty.json tests the "fail-fast" capability of the validator. It ensures that the application does not attempt to boot with a blank configuration, requiring at least the presence of mandatory blocks like auth_config or data_store_config.

The following diagram illustrates how these files are typically utilized by the validation logic:
[ Configuration File ] [ Validation Logic ] [ Result ]
| | |
|---(Load JSON)--------------->| |
| |-- Check Structure |
| |-- Compile Templates |
| |-- Compile Regex |
| | |
|<-----------------------------|---(Report Errors)-------|
| | |
V V V
(testdata/*.json) (config/validate) (Pass / Fail)
By maintaining these fixtures, the module ensures that any changes to the configuration schema or the notification engine's data model are accompanied by corresponding updates to the validation suite, preventing regressions in configuration parsing.
The ctrace2 module provides the bridging logic between raw performance trace data and the kmeans clustering engine. It defines how individual performance traces are normalized, compared, and averaged to facilitate anomaly detection and pattern discovery in Perf.
The primary goal of ctrace2 is to transform raw, noisy performance data into a standardized mathematical representation. Performance traces often vary significantly in scale (e.g., one test might take 10ms while another takes 500ms), making direct comparison difficult.
To solve this, ctrace2 implements the ClusterableTrace struct, which satisfies the interfaces required by the kmeans package. This allows the clustering algorithm to treat performance traces as points in an N-dimensional space.
A key design choice in this module is the mandatory normalization of data via NewFullTrace. Before a trace can be used for clustering, it undergoes two critical transformations:
1. Missing data points are filled in (vec32.Fill). This ensures that traces with intermittent data can still be compared.
2. Values are normalized to a mean of 0 and a standard deviation of 1 (vec32.Norm). This shift from absolute values to relative "shapes" allows the system to cluster traces that show similar behavior (e.g., a 10% performance regression) even if their absolute magnitudes differ.

The normalization includes a minStdDev parameter to prevent the amplification of "flatline" noise; if a trace is almost perfectly flat, it will not be scaled up to unit standard deviation, as doing so would exaggerate insignificant measurement jitter.
This is the central data structure. It holds the identifying Key of a trace and its normalized Values.
CalculateCentroid averages a group of traces, producing a new ClusterableTrace that serves as the cluster's center. The following diagram illustrates how raw performance data is processed to become part of a cluster:
Raw Trace Data        Normalization (NewFullTrace)       K-Means Processing
+-------------+       +---------------------------+      +-------------------+
| Key: "test" |       | 1. Fill missing values    |      | Compare Distance  |
| [10, e, 12] |  ===> | 2. Shift mean to 0        | ===> | to other traces   |
+-------------+       | 3. Scale std dev to 1     |      +---------+---------+
                      +-------------+-------------+                |
                                                                   v
                      +-------------------+        +-------------------+
                      | ClusterableTrace  | <----- | CalculateCentroid |
                      | [ -1.2, 0.1, 1.1] |        | (Average of group)|
                      +-------------------+        +-------------------+
The centroid's key is set to special_centroid to distinguish the "average" shape from the actual measured data traces.

The go/culprit module serves as the central authority for managing "culprits"—specific commits or sets of commits definitively identified as the cause of performance regressions—within the Skia Perf ecosystem. It provides the infrastructure to persist culprit data, link it to detected anomalies, and orchestrate the notification process to alert developers via external issue trackers.
This module bridges the gap between the bisection engine (which discovers culprits) and the communication layers (which report them). It is responsible for the entire lifecycle of a culprit, from persistence and anomaly linkage through developer notification.
The module is structured as a gRPC service (/proto and /service). This design allows different components of the Skia Perf backend—such as automated bisection tools or manual triage UIs—to interact with culprit data through a unified interface.
A key architectural choice in the data schema (found in culprit_service.proto) is the local definition of the Anomaly structure. Instead of referencing external proto files from other services, culprit maintains its own representation. This ensures service independence: changes to how other modules represent anomalies won't cause cascading breaking changes in the culprit management logic.
Because performance alerts can be noisy, the service includes a “Subscription Guarding” mechanism (/service). Before sending a notification, the system checks an allowlist (SheriffConfigsToNotify). If a subscription is not yet verified, the service automatically reroutes the notification to a safe, internal “mock” destination. This allows for testing new configurations in production environments without spamming development teams.
The notification logic is strictly separated into two domains:
- Formatters (/formatter): Use Go's text/template engine to turn protobuf data into Markdown. This allows for flexible, instance-specific report styling without changing the underlying logic.
- Transports (/transport): Handle the actual network communication (e.g., Google Issue Tracker API). This allows the system to swap delivery methods (or use a NoopTransport for local development) without affecting the notification orchestration.

Culprit Service (/service and /proto): The orchestration layer. It implements PersistCulprit and NotifyUserOfCulprit. It coordinates between the storage layer and the notifier to ensure that when a bug is filed, the resulting Issue ID is recorded back into the database, creating a bidirectional link between the regression and the ticket.
Storage (/sqlculpritstore and store.go): The persistent storage implementation.
- If a culprit already exists for a commit, the store appends the new anomaly_group_id rather than creating a duplicate, ensuring a single commit's impact is tracked holistically.
- It uses JSONB to store the GroupIssueMap, allowing it to track which specific anomaly group triggered which specific bug report in a single, efficient record.

Formatter (/formatter): The "Logic-to-Markdown" engine. It takes Culprit and Anomaly protos and applies templates to generate subjects and bodies. It includes helper functions to calculate percentage changes and build URLs to the Perf UI or Git hosts.
Notifier (/notify): The coordinator for alerts. It takes a request to notify, fetches the content from a Formatter, and passes it to a Transport.
The following diagram illustrates how a culprit is processed from the moment a bisection tool identifies it to the point an external bug is filed:
[ Bisection Engine ]     [ Culprit Service ]     [ SQL Store ]     [ Issue Tracker ]
        |                        |                     |                   |
        |-- 1. PersistCulprit -->|                     |                   |
        |   (Commit, GroupID)    |-- 2. Upsert() ----->|                   |
        |                        |                     |                   |
        |-- 3. NotifyCulprit --->|                     |                   |
        |   (CulpritID)          |-- 4. Get Culprit -->|                   |
        |                        |                     |                   |
        |                        |-- 5. Format Msg ----|                   |
        |                        |                     |                   |
        |                        |-- 6. Transport Send ------------------>|
        |                        |                     |                   |
        |                        |<-- 7. Issue ID ------------------------|
        |                        |                     |                   |
        |                        |-- 8. AddIssueId() ->|                   |
        |<-- 9. Success (ID) ----|                     |                   |
The module manages the complex relationship between commits and regressions, where a single culprit commit may be linked to many anomaly groups and to multiple filed issues.
The formatter module is responsible for transforming raw performance regression data into human-readable notifications. It sits between the regression detection logic and the notification delivery systems (such as issue trackers or email services), ensuring that alerts contain actionable context like commit links, benchmark details, and performance delta percentages.
The module provides a standardized way to generate subjects and message bodies for two primary scenarios: reporting a specific culprit commit, and summarizing a group of related anomalies.
By decoupling the data representation from the final message format, the system allows for flexible reporting that can be customized per-instance via configuration files.
The core implementation uses Go's text/template engine. This choice allows the formatting logic to remain generic while supporting complex data injection. The MarkdownFormatter uses predefined default templates but can be overridden by an instance's IssueTrackerConfig.
This design supports:
- Structured contexts: TemplateContext (for culprits) and ReportTemplateContext (for anomaly groups), which include metadata about the subscription, the commit, and the anomalies themselves.
- Helper functions: buildCommitURL, buildGroupUrl, and buildAnomalyDetails are registered within the template engine. This moves complex string manipulation (like calculating percentage changes or formatting bot names) out of the raw template and into tested Go code.

The NewMarkdownFormatter implements a "fallback" pattern. If the InstanceConfig does not provide a specific subject or body template, the module uses hardcoded defaults such as defaultNewCulpritSubject and defaultNewReportBody. This ensures the system is always capable of sending a notification even with a minimal configuration.
The Formatter is defined as an interface. This abstraction allows the Perf system to swap implementations easily:
Formatter interface (formatter.go): Defines the contract for all formatting implementations. It requires two methods:
- GetCulpritSubjectAndBody: Formats a message for a specific Culprit proto.
- GetReportSubjectAndBody: Formats a summary for an AnomalyGroup and its associated list of Anomaly protos.

MarkdownFormatter (formatter.go): The primary implementation. It stores compiled templates and instance-specific URLs (like the host URL and commit URL templates). During initialization, it parses the templates and attaches the functional maps required to generate links.
The following diagram illustrates how the formatter processes an anomaly group into a notification:
[ Data Source ] [ MarkdownFormatter ] [ Output ]
| | |
|-- AnomalyGroup -------->| |
|-- Subscription -------->|-- Resolve Templates ------|
|-- Top Anomalies ------->|-- Execute Funcs: ---------|
| | * buildGroupUrl |
| | * buildAnomalyDetails --|--> Subject String
| | |--> Body (Markdown)
Noop implementation (noop.go): A stub implementation that returns empty strings. It serves as a safe default when no notification formatting is required, preventing nil pointer exceptions in the orchestration services.
- Culprit notifications consume the Culprit commit information and the Subscription details (e.g., the name of the team or component being notified).
- Report notifications consume the AnomalyGroup (group ID, benchmark name) and a list of TopAnomalies, which are the most significant regressions selected to represent the group.

The go.skia.org/infra/perf/go/culprit/formatter/mocks module provides automated mock implementations of the Formatter interface. This module exists to facilitate unit testing for components within the Perf system that handle notifications and reports related to performance regressions (culprits) and anomaly groups.
The primary design goal is to allow developers to test high-level notification logic without depending on the actual formatting logic (which typically involves complex template rendering or external metadata lookups). By using these mocks, tests can verify that the system correctly passes data to the formatter and handles the resulting subject lines and message bodies as expected.
The implementation utilizes the testify mock framework. This choice allows for expressive test assertions, such as ensuring that a specific culprit or subscription triggered a formatting request, or simulating error conditions during the message generation process.
The Formatter struct in Formatter.go is the central component of this module. It is a mock object that simulates the behavior of a culprit/anomaly formatter. It implements two primary functional workflows:
- Via GetCulpritSubjectAndBody, the mock simulates the creation of notification content for a specific performance culprit. It accepts a Culprit proto and a Subscription proto, returning a mocked subject string, body string, and error.
- Via GetReportSubjectAndBody, the mock simulates the creation of reports for collections of anomalies. This is used in testing workflows where multiple regressions are aggregated into a single notification for a specific subscription.

In a typical test scenario, the mock acts as a stand-in for the real formatter to verify the orchestration logic of the notification service:
[ Test Case ] -> [ Notification Service ] -> [ Mock Formatter ]
| | |
|-- 1. Setup expectation ------------------>| (Expect GetCulpritSubjectAndBody)
| | |
|-- 2. Trigger Action ---->| |
| |-- 3. Call Format() --->|
| |<-- 4. Return Mock Data-|
| | |
|-- 5. Assert service used mock data ------>|
The NewFormatter function is the standard entry point for using this mock. It automatically registers cleanup functions with the Go testing framework (t.Cleanup), ensuring that expectations (e.g., “this method must be called exactly once”) are asserted at the end of the test execution without manual boilerplate.
The go/culprit/mocks module provides autogenerated mock implementations of the interfaces defined in the culprit package. Its primary purpose is to facilitate unit testing for components that depend on culprit persistence and retrieval without requiring a live database or a complex setup of the culprit.Store.
The module leverages testify/mock to provide a flexible way to simulate the behavior of the culprit storage layer. By using mocks, developers can:
- Test components in isolation from a live database.
- Assert that storage methods receive the expected arguments, such as anomaly_group_ids or commit slices.

This file contains the Store struct, which mocks the primary interface for managing culprits. It is generated via mockery and mirrors the methods required to interact with culprit data in the Perf system.
The mock provides implementations for the following critical workflows:
- Culprit persistence (Upsert): Allows tests to simulate the creation or updating of culprits associated with specific anomaly groups. It mimics the behavior of returning a list of generated culprit IDs based on provided commit information.
- Issue association (AddIssueId): Simulates the linking of a culprit to an external issue tracker ID. This is crucial for testing the integration between Perf's internal culprit tracking and external bug reporting systems.
- Data retrieval (Get, GetAnomalyGroupIdsForIssueId): Facilitates testing of UI endpoints or reporting tools by returning pre-defined v1.Culprit protobuf messages or mapping issue IDs back to internal anomaly groups.

When utilizing this module, a test typically follows the "Setup-Expect-Verify" pattern:
Test Component Mock Store Internal Logic
| | |
|-- 1. Setup Mock ----->| |
| | |
|-- 2. Set Expectations | |
| (On "Get" return X) | |
| | |
|-- 3. Call Method ---->|---------------------->|
| | |
| |<-- 4. Call "Get" -----|
| | |
| |--- 5. Return X ------>|
| | |
|-- 6. Verify Mocks ----| |
The NewStore function simplifies this by automatically registering a cleanup function on the provided testing.T instance, ensuring that AssertExpectations is called when the test completes.
The go.skia.org/infra/perf/go/culprit/notify module is responsible for orchestrating the notification process when performance regressions (anomalies) or their root causes (culprits) are identified. It acts as a bridge between the internal detection logic and external communication platforms, such as issue trackers.
The module abstracts the “how” of notification by separating the content generation (formatting) from the delivery mechanism (transport).
The module follows a “Strategy” pattern to handle different notification environments and requirements.
- Components depend on the CulpritNotifier interface. This allows the system to switch between real notifications and no-op (no-operation) modes easily, which is essential for local development or testing environments where sending real bugs is undesirable.
- The GetDefaultNotifier function acts as a factory that inspects the InstanceConfig. It determines whether to instantiate a functional IssueNotify system or a NoneNotify (noop) system based on the deployment configuration.

CulpritNotifier: The primary contract for the module. It defines two main entry points:
- NotifyAnomaliesFound: Triggered when a group of regressions is first detected.
- NotifyCulpritFound: Triggered when an automated analysis has narrowed down a specific commit as the cause of a regression.

DefaultCulpritNotifier: This is the standard implementation of the CulpritNotifier. It does not contain formatting or transport logic itself; instead, it coordinates the two. It fetches the content from the formatter, passes it to the transport, and returns the resulting identifier (e.g., a Bug ID).
This file handles the lifecycle of a notification. It ensures that if a Subscription (the configuration defining who should be alerted) is missing, the system fails gracefully or logs the omission rather than attempting to send a malformed alert.
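The coordination between formatter and transport can be sketched as follows. The interfaces are simplified for illustration; the real Perf interfaces take protos and a context.Context:

```go
package main

import "fmt"

// Formatter produces notification content from a detected culprit.
type Formatter interface {
	GetSubjectAndBody(culprit string) (subject, body string, err error)
}

// Transport delivers a notification and returns an external identifier.
type Transport interface {
	SendNewNotification(subject, body string) (bugID string, err error)
}

// defaultNotifier coordinates the two without owning either concern,
// mirroring the role of DefaultCulpritNotifier.
type defaultNotifier struct {
	formatter Formatter
	transport Transport
}

func (n *defaultNotifier) NotifyCulpritFound(culprit string) (string, error) {
	subject, body, err := n.formatter.GetSubjectAndBody(culprit)
	if err != nil {
		return "", err
	}
	return n.transport.SendNewNotification(subject, body)
}

// Trivial stand-ins so the sketch runs end to end.
type markdownFormatter struct{}

func (markdownFormatter) GetSubjectAndBody(c string) (string, string, error) {
	return "Culprit found: " + c, "details", nil
}

// noopTransport mimics a NoopTransport-style stub for local development.
type noopTransport struct{}

func (noopTransport) SendNewNotification(subject, body string) (string, error) {
	return "bug-0", nil
}

func main() {
	n := &defaultNotifier{markdownFormatter{}, noopTransport{}}
	id, _ := n.NotifyCulpritFound("abc123")
	fmt.Println(id) // prints: bug-0
}
```

Because the notifier only sees the two interfaces, swapping the noop transport for a real issue-tracker client requires no change to the orchestration logic.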
The following diagram shows how the DefaultCulpritNotifier coordinates the flow of information from a detected event to an external system:
[ Caller ] [ DefaultCulpritNotifier ] [ Formatter ] [ Transport ]
| | | |
|-- NotifyCulpritFound() ---->| | |
| |-- GetSubjectAndBody() ->| |
| |<-- (subject, body) -----| |
| | | |
| |-- SendNewNotification() ------------------->|
| | | |
| |<--------- (bug_id) -------------------------|
|<----- (bug_id, err) --------| | |
The module includes a mocks sub-package (generated via mockery). This is used by other parts of the Perf system to simulate the notification layer. By using these mocks, developers can verify that the detection pipeline correctly triggers notifications with the right metadata without actually creating tickets in an issue tracker.
The go.skia.org/infra/perf/go/culprit/notify/mocks module provides automated mock implementations for the culprit notification system within Perf. Its primary purpose is to facilitate unit testing for components that depend on the notification logic—such as anomaly detection pipelines or culprit analysis engines—without triggering actual external notifications (e.g., creating real bug reports or sending emails).
The module is built using testify/mock, which was chosen to provide a consistent, type-safe way to assert that notification events occur with the expected parameters.
A key design choice in this module is the use of mockery for code generation. By generating the CulpritNotifier mock automatically from an interface definition (presumably located in the parent notify package), the project ensures that the test infrastructure stays in lockstep with the production API. This prevents “stale” tests where a mock might satisfy an old version of an interface that has since changed.
The implementation focuses on two distinct stages of the Perf alerting lifecycle: the initial detection of anomaly groups and the subsequent identification of a culprit commit.
This file contains the CulpritNotifier struct, which implements the interface required to simulate the notification subsystem. It manages the lifecycle of notifications through two primary mocked methods:
- NotifyAnomaliesFound: This method simulates the process of alerting users about a new AnomalyGroup. It accepts the group details, the associated Subscription (which contains routing/alerting metadata), and a list of specific Anomaly objects. In a test environment, this allows developers to verify that the system correctly identifies which subscription should be notified when a set of regressions is detected.
- NotifyCulpritFound: This method simulates the final stage of an investigation where a specific Culprit (a commit) has been identified. It validates that the notification logic correctly associates a culprit with the right subscription and returns a simulated notification ID (like a bug URL).

The file also includes a constructor, NewCulpritNotifier, which leverages Go's Cleanup interface. This is a critical design pattern here as it automatically registers AssertExpectations to run at the end of a test, ensuring that no expected notification calls were missed without requiring the developer to manually call assertion methods.
The following diagram illustrates how this mock integrates into a typical test suite workflow to validate the notification logic:
[ Test Case ] [ Component Under Test ] [ Mock CulpritNotifier ]
| | |
|-- Register Expectation -> |
| (NotifyCulpritFound) | |
| | |
|---- Execute Action ---->| |
| |---- Call NotifyCulprit() -->|
| | |-- Record Call
| |<------- Return Mock ID -----|
| | |
| <--- Verify Results ----| |
| | |
| (Test Cleanup) | |
|------------------------------------------------------>|-- AssertExpectations()
| (Fails if not called)
The go/culprit/proto module defines the communication interface and data schema for the Culprit Service. This service acts as the central authority for managing “culprits”—commits definitively identified as the cause of performance regressions—within the Skia Perf ecosystem.
By providing a unified gRPC interface, this module bridges the gap between the bisection engine (which discovers culprits), the storage layer (which persists them), and the notification systems (which alert developers).
The Anomaly data structure in this module is a local definition rather than a reference to external proto files. While this creates some duplication with services like anomalygroup, it is a deliberate architectural choice to ensure service independence. If the anomaly grouping logic changes its internal representation, the Culprit Service remains stable, preventing breaking changes from cascading through the microservice architecture.
A key design feature of the Culprit message is the group_issue_map. In large-scale performance monitoring, a single problematic commit (a culprit) often triggers multiple regressions across different platforms or benchmarks, which might be tracked in different anomaly groups. This mapping allows the service to:
The Commit message is designed to be repository-agnostic. By explicitly requiring host, project, and ref alongside the revision, the service can handle culprits across the diverse set of repositories monitored by Skia (e.g., Chrome, Skia, V8, Angle). This allows a single instance of the service to manage regressions originating from different source control providers.
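For illustration, a commit link might be assembled from these fields like so. The Gitiles-style URL layout is an assumption for this sketch, not necessarily what the formatter's buildCommitURL produces:

```go
package main

import "fmt"

// Commit mirrors the repository-agnostic fields described above.
type Commit struct {
	Host     string
	Project  string
	Ref      string
	Revision string
}

// URL builds a Gitiles-style link from the commit's coordinates. Carrying
// host and project explicitly is what lets one service instance link to
// commits across many repositories.
func (c Commit) URL() string {
	return fmt.Sprintf("https://%s/%s/+/%s", c.Host, c.Project, c.Revision)
}

func main() {
	c := Commit{
		Host:     "chromium.googlesource.com",
		Project:  "chromium/src",
		Ref:      "refs/heads/main",
		Revision: "abc123",
	}
	fmt.Println(c.URL()) // prints: https://chromium.googlesource.com/chromium/src/+/abc123
}
```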
The Service Definition (culprit_service.proto): The CulpritService defines the lifecycle management of a regression:
- The PersistCulprit method transforms the results of a bisection (a commit) into a permanent record linked to an anomaly group.
- NotifyUserOfAnomaly is used for initial “regression found” alerts, while NotifyUserOfCulprit is used for “culprit found” alerts, allowing the system to provide immediate feedback followed by precise root-cause analysis.
- Anomaly: Captures the state of the world at the time of regression. It stores the “before” and “after” medians, which are critical for calculating the magnitude of the impact, and the dimensions (test name, bot name) to identify the specific environment affected.
- Culprit: Represents a validated performance regression. It serves as an audit log, containing the commit metadata and a history of the notifications sent to developers.

The following diagram illustrates how the Culprit Service coordinates between the engine that finds bugs and the trackers that manage them:
Detection/Bisection Engine      Culprit Service        Database / External API
        |                             |                          |
        |--- PersistCulprit --------->|                          |
        |    (Commit + GroupID)       |---- Store Culprit ------>|
        |                             |                          |
        |-- NotifyUserOfCulprit ----->|                          |
        |    (CulpritID)              |---- Fetch Metadata ----->|
        |                             |                          |
        |                             |---- Create/Update Bug -->|
        |                             |<--- Return Issue ID -----|
        |                             |                          |
        |<------- Success ------------|---- Update Map/Link ---->|
- culprit_service.proto: The primary definition file. It defines the gRPC service and all message types used for requests and responses.
- culprit_service.pb.go & culprit_service_grpc.pb.go: The compiled Go code. These files provide the concrete types and client/server boilerplate used by other Go services in the repository to interact with the Culprit Service.
- generate.go: The automation hook that ensures the generated Go code stays in sync with the proto definitions.

This module defines the gRPC interface and data structures for the Culprit Service, a component of the Skia Perf ecosystem responsible for managing performance regression culprits and user notifications. It serves as the contract between the bisection engine (which identifies culprits) and the storage/notification layers.
The Culprit Service handles the lifecycle of a “culprit”: a specific commit identified as the cause of a performance regression. Its primary responsibilities are persisting identified culprits, retrieving their metadata, and notifying users of anomalies and culprits.
The data structures for Anomaly are intentionally duplicated from other services (like anomalygroup_service.proto). This redundancy allows the Culprit Service to evolve its definition of an anomaly independently of the grouping service, preventing tight coupling in a microservices environment where different teams might own different parts of the pipeline.
The Culprit message includes a group_issue_map. This design choice recognizes that a single commit (culprit) might cause regressions across multiple different test suites or “anomaly groups.” By mapping anomaly_group_id to issue_id, the service can track which bugs were filed for which specific performance regressions associated with the same culprit.
The Commit message provides a normalized way to identify changes across different repositories. By including host, project, ref, and revision, it ensures that the service can uniquely identify commits even when Skia Perf monitors multiple disparate Git repositories (e.g., Chromium, V8, Skia).
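The repository-agnostic commit identity can be sketched in Go. The field names mirror the message described above, but the key format and the `Key` helper are illustrative assumptions, not the service's actual storage scheme:

```go
package main

import "fmt"

// Commit mirrors the proto message fields described above: a commit is
// only unique given its full source-control coordinates, not just its hash.
type Commit struct {
	Host     string // e.g. "chromium.googlesource.com"
	Project  string // e.g. "chromium/src"
	Ref      string // e.g. "refs/heads/main"
	Revision string // the git hash
}

// Key builds a stable identity string. The exact format is an
// illustrative assumption, not the service's real key layout.
func (c Commit) Key() string {
	return fmt.Sprintf("%s/%s/%s@%s", c.Host, c.Project, c.Ref, c.Revision)
}

func main() {
	a := Commit{"skia.googlesource.com", "skia", "refs/heads/main", "abc123"}
	b := Commit{"chromium.googlesource.com", "chromium/src", "refs/heads/main", "abc123"}
	// Same revision hash, but different repositories => different culprits.
	fmt.Println(a.Key() == b.Key()) // false
}
```

Two commits sharing a revision hash but living in different repositories are distinct culprits, which is exactly why the hash alone is insufficient as an identifier.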
The gRPC service definition (culprit_service.proto) defines the following core operations:
- PersistCulprit: Called after a bisection process identifies a culprit. It links a list of commits to an anomaly_group_id.
- GetCulprit: Used by the UI or other services to fetch detailed metadata about identified culprits.
- NotifyUserOfAnomaly: Triggered when a regression is first detected. This typically results in the creation of a tracking issue.
- NotifyUserOfCulprit: Triggered when bisection finishes. It updates existing issues or creates new ones to alert developers that a specific commit they authored caused a regression.
- Anomaly: Contains the statistical context of a regression, including “before” and “after” medians and the specific dimensions (bot, benchmark, measurement) where the regression occurred.
- Culprit: The record of a confirmed regression-causing commit, maintaining links to the anomaly groups it affected and the issues filed in response.

The following diagram illustrates how the Culprit Service interacts with the bisection and notification flow:
Bisection Engine           Culprit Service          Storage / Issue Tracker
      |                          |                            |
      |-- PersistCulprit ------->|                            |
      |   (Commits + GroupID)    |---- Save to DB ----------->|
      |                          |                            |
      |-- NotifyUser ----------->|                            |
      |   (Culprit IDs)          |---- Create/Update Issue -->|
      |                          |<--- Return Issue ID -------|
      |<-- Return Success -------|                            |
- culprit_service.proto: The source of truth for the service interface and message definitions.
- culprit_service.pb.go: Generated Go structures for messages.
- culprit_service_grpc.pb.go: Generated gRPC client and server interfaces.
- generate.go: Contains the go:generate directives used to rebuild the protobuf and gRPC code via Bazel.

The go.skia.org/infra/perf/go/culprit/proto/v1/mocks module provides mock implementations of the Culprit Service gRPC server. Its primary purpose is to facilitate unit testing for components that depend on the CulpritService. By using these mocks, developers can simulate various service behaviors—such as successful culprit persistence or notification failures—without requiring a running gRPC backend or database.
The module relies on the testify/mock framework to provide a flexible, programmable interface for defining expected behaviors during tests.
A critical implementation detail in this module is the manual handling of gRPC interface requirements. In standard Go gRPC implementations, a server must embed an Unimplemented... struct to ensure forward compatibility with the interface. Since many auto-generation tools (like mockery) may fail to include this embedding, it has been manually added to the CulpritServiceServer struct. This ensures the mock remains a valid implementation of the v1.CulpritServiceServer interface defined in the parent proto package.
This is the central mock type. It mimics the behavior of the CulpritService by allowing tests to “stub” responses for specific RPC calls. It covers the following key service responsibilities:
- GetCulprit and PersistCulprit allow tests to simulate the retrieval and storage of performance regression culprits.
- NotifyUserOfAnomaly and NotifyUserOfCulprit enable verification of the notification logic, ensuring that the system correctly attempts to alert users when regressions or specific culprits are identified.

The module provides a NewCulpritServiceServer constructor. This function is designed to integrate tightly with Go's testing.T. It automatically registers a cleanup function that calls AssertExpectations, which ensures that all programmed mock behaviors (e.g., “expect this function to be called exactly once”) were actually executed before the test finishes.
When testing a component that interacts with the Culprit Service, the workflow generally follows these steps:
1. Setup mock: Create a mock instance using NewCulpritServiceServer(t).
2. Set expectations: Define what inputs are expected and what should be returned,
   e.g. On("PersistCulprit", ...).Return(&v1.PersistCulpritResponse{}, nil).
3. Injection: Pass the mock into the component being tested.
4. Execution: Run the logic of the component under test.
5. Verification: The Cleanup function automatically verifies that the
   component called PersistCulprit as expected.
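The record-and-verify pattern behind these steps can be illustrated without the testify dependency. The hand-rolled mock below is a simplified stand-in for the generated CulpritServiceServer mock; all names here are illustrative:

```go
package main

import "fmt"

// mockCulpritService is a simplified stand-in for the generated testify
// mock: it records calls and can be verified after the test runs.
type mockCulpritService struct {
	persistCalls int
	returnID     string
}

// PersistCulprit records the invocation and returns the stubbed ID,
// mimicking On("PersistCulprit", ...).Return(...).
func (m *mockCulpritService) PersistCulprit(groupID string) (string, error) {
	m.persistCalls++
	return m.returnID, nil
}

// assertExpectations plays the role of testify's AssertExpectations,
// which the real constructor wires up via t.Cleanup.
func (m *mockCulpritService) assertExpectations() error {
	if m.persistCalls != 1 {
		return fmt.Errorf("expected PersistCulprit to be called once, got %d", m.persistCalls)
	}
	return nil
}

func main() {
	// 1. Setup mock with a stubbed return value.
	m := &mockCulpritService{returnID: "culprit-42"}
	// 2-4. Inject and execute: the component under test calls the service.
	id, _ := m.PersistCulprit("group-1")
	fmt.Println(id) // culprit-42
	// 5. Verification, normally run automatically at test cleanup.
	if err := m.assertExpectations(); err != nil {
		panic(err)
	}
}
```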
- CulpritServiceServer.go: Contains the mock struct and method definitions for the gRPC service. This is where the manual embedding of v1.UnimplementedCulpritServiceServer resides to satisfy gRPC interface constraints.

The culprit/service module provides a gRPC implementation for managing culprits and automating the notification process when performance regressions (anomalies) are identified in the Perf system. It acts as the orchestration layer between the storage of anomaly data and the external notification systems (e.g., bug trackers).
The primary purpose of this service is to handle the lifecycle of a “culprit”—a specific commit or set of commits identified as the cause of a performance change. It bridges several domains: culprit persistence (culprit.Store), anomaly-group tracking (anomalygroup.Store), and external notification (notify.CulpritNotifier).
The service is designed to be used by backend components that perform bisection and need to report their findings, or by systems that detect anomalies and require immediate user notification.
The service coordinates with culprit.Store and anomalygroup.Store to ensure that when a culprit is identified, the relationship between the problematic commit and the group of affected traces is maintained.
- PersistCulprit: This workflow ensures atomicity at the application level. It first saves the culprit commits to the culpritStore and then updates the corresponding AnomalyGroup to include these new Culprit IDs. This bidirectional link is essential for tracking which regressions were caused by which commits.
- GetCulprit: Provides a standard interface to fetch culprit metadata by ID.

The service handles two types of notifications via the notify.CulpritNotifier interface. Both workflows rely on “Subscriptions” to determine where and how to file reports (e.g., which bug component, labels, or CC list to use).
- Culprit notification (NotifyUserOfCulprit): Typically triggered after a successful bisection. It loads the culprit details, identifies the relevant subscription associated with the anomaly group, and files a bug specifically for that culprit. It also records the resulting Issue ID back into the culprit record.
- Anomaly notification (NotifyUserOfAnomaly): Triggered when a group of anomalies is identified but a specific culprit may not yet be confirmed (or is being reported as a set). This files a broader report based on the anomaly group's characteristics.

A unique aspect of this service is the PrepareSubscription logic. Because performance alerts can be noisy or sensitive, the service includes a safety mechanism to prevent accidental notifications to end-users during testing or when onboarding new “Sheriff” configurations.
The service checks each subscription against the InstanceConfig.SheriffConfigsToNotify allowlist. If a subscription's name is not in this list, the service overwrites the bug destination (labels, components, CCs) with “mock” values. This ensures that even if a notification is triggered in a staging environment or for an unverified config, the bug is routed to a safe, internal hotlist rather than the actual team's queue.

When a bisection tool finds a culprit, the following process occurs:
Bisection Tool -> PersistCulprit(Commits, GroupID)
       |
       v
[ Culprit Store ] <--- Save Commits
       |
       v
[ AnomalyGroup Store ] <--- Link Culprit IDs to Group
       |
       +------> Response (Culprit IDs)

Bisection Tool -> NotifyUserOfCulprit(CulpritIDs, GroupID)
       |
       +-----> Load Subscription (via Group Name)
       |
       +-----> PrepareSubscription (Safe-guarding/Mocking)
       |
       v
[ Culprit Notifier ] ---> EXTERNAL: File Bug
       |
       v
[ Culprit Store ] <--- Record Issue ID
- By implementing backend.BackendService, this module integrates easily into the Skia Perf backend infrastructure, inheriting standard service registration and (eventually) centralized authorization policies.
- The PrepareSubscription function is an intentional “shim” in the implementation. It allows the team to run the full service logic in production-like environments while ensuring that experimental anomaly groups do not spam developers until their configurations are explicitly added to the allowlist.

The sqlculpritstore module provides a persistent SQL-based implementation for managing “Culprits” within the Skia Perf ecosystem. A Culprit represents a specific commit (defined by its host, project, ref, and revision) that has been identified as the root cause of one or more performance regressions.
The primary challenge in managing culprits is the N:M relationship between code changes, diagnostic clusters (Anomaly Groups), and tracking systems (Issue Trackers). A single commit can cause regressions in multiple tests, and a single bug report might track several related regressions.
To address this, the store is designed around the following principles:
- A single culprit record tracks both where it was detected (via AnomalyGroupIDs) and where it is being tracked (via IssueIds).
- Through the GroupIssueMap, the store maintains a JSONB-encoded link between specific anomaly groups and their corresponding issue IDs. This allows the system to determine exactly which regression triggered a specific bug report without complex join operations.

sqlculpritstore.go
The main struct implementing the storage interface. It handles the translation between Go protobuf messages (pb.Culprit) and the underlying SQL schema.
- The Upsert method is a critical path. It identifies whether a culprit already exists based on its commit coordinates. If it exists, the method appends the new anomaly_group_id to the existing list and updates the last_modified timestamp. If not, it generates a new UUID and creates a record. This ensures that a single commit is never duplicated in the store, regardless of how many regressions it causes.
- The AddIssueId method enforces data integrity by ensuring an issue can only be linked to a culprit if the associated group_id is already recognized as being caused by that culprit.

/schema
Defines the table structure and indexing strategy. A notable implementation choice is the by_revision composite index:

INDEX by_revision (revision, host, project, ref)
By leading with the revision (a high-entropy hash), the database avoids “hotspots” and distributes data more evenly across partitions compared to leading with low-entropy strings like host.
When the system identifies a set of suspect commits for a regression:
Discovery Engine -> [Anomaly Group ID + Commits]
        |
        v
CulpritStore.Upsert()
        |
        +-- Check if (Host/Project/Ref/Revision) exists?
        |     |
        |     +-- YES: Append Anomaly Group ID to array; Update LastModified
        |     |
        |     +-- NO: Generate UUID; Create new record
        v
[ Database Updated ]
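The upsert decision above can be sketched with an in-memory map; the types and the UUID placeholder are illustrative stand-ins for the real SQL-backed store:

```go
package main

import "fmt"

// culpritRecord is an illustrative stand-in for a row in the Culprits table.
type culpritRecord struct {
	ID              string
	AnomalyGroupIDs []string
	LastModified    int64
}

// store keys records by the (host, project, ref, revision) coordinates,
// here flattened into a single string for brevity.
type store struct {
	byCommit map[string]*culpritRecord
	nextID   int
}

// upsert appends the group ID if the commit is already known; otherwise
// it creates a new record, so a commit is never duplicated in the store.
func (s *store) upsert(commitKey, groupID string, now int64) *culpritRecord {
	if rec, ok := s.byCommit[commitKey]; ok {
		rec.AnomalyGroupIDs = append(rec.AnomalyGroupIDs, groupID)
		rec.LastModified = now
		return rec
	}
	s.nextID++
	rec := &culpritRecord{
		ID:              fmt.Sprintf("uuid-%d", s.nextID), // the real store generates a UUID
		AnomalyGroupIDs: []string{groupID},
		LastModified:    now,
	}
	s.byCommit[commitKey] = rec
	return rec
}

func main() {
	s := &store{byCommit: map[string]*culpritRecord{}}
	s.upsert("host/proj/ref@abc", "group-1", 100)
	// Same commit reported by a second regression: append, don't duplicate.
	rec := s.upsert("host/proj/ref@abc", "group-2", 200)
	fmt.Println(rec.ID, rec.AnomalyGroupIDs, rec.LastModified)
}
```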
When a user or automated system files a bug for a specific regression:
Issue Tracker -> [Culprit ID + Issue ID + Anomaly Group ID]
        |
        v
CulpritStore.AddIssueId()
        |
        +-- Verify: Is Anomaly Group ID linked to this Culprit?
        |     |
        |     +-- NO: Return Error (Prevents orphaned/incorrect links)
        |
        +-- Update: Append Issue ID; Update GroupIssueMap (JSONB)
        v
[ Database Updated ]
- The store maintains a last_modified field (Unix timestamp) to allow external caches or services to synchronize and identify updated culprit records efficiently.
- The Upsert method performs a validation check to ensure that all commits in a single batch belong to the same repository (Host, Project, and Ref), preventing accidental cross-pollination of repository metadata.
- The GroupIssueMap is stored as JSONB to provide flexibility for future metadata expansion while allowing the system to retrieve the full context of a culprit's impact in a single query.

The schema package defines the foundational data structure for persisting “Culprits” within the Perf system's SQL storage. A Culprit represents a specific commit identified as the root cause of a performance regression.
In a performance monitoring ecosystem, a single commit might trigger multiple regressions across different subsystems or test suites. Conversely, multiple anomaly groups might eventually point to the same underlying code change.
To handle this N:M relationship, the schema is designed to treat the Culprit as a central entity that tracks its associations across various diagnostic contexts (Anomaly Groups) and tracking systems (Issue Trackers).
1. The Culprit Identity
A Culprit is uniquely identified by its source control coordinates: Host, Project, Ref, and Revision. While the system generates a UUID for primary key lookups, the business logic primarily interacts with the commit hash.
2. Relational Mapping and the Group-Issue Link
The schema manages the relationship between regressions and their resolutions through three specific fields:
- AnomalyGroupIDs: Tracks which diagnostic clusters have flagged this commit.
- IssueIds: Tracks which bug reports are associated with this commit.
- GroupIssueMap: A JSONB field that explicitly maps a specific Anomaly Group to a specific Issue ID.

The inclusion of GroupIssueMap as a JSONB object allows the system to maintain the context of why a bug was filed (i.e., which regression group triggered it) without requiring complex join tables for metadata that is frequently accessed together. Note: There is a planned refactoring to consolidate AnomalyGroupIDs and IssueIds into this map to reduce data redundancy.
3. Performance and Indexing Strategy
The schema implements a composite index by_revision to optimize for the most common query pattern: “Is this specific commit already known as a culprit?”
The ordering of the index is a deliberate choice for database performance: INDEX by_revision (revision, host, project, ref)
By placing the revision (a high-entropy git hash) at the leading edge of the index, the storage engine can effectively distribute data across nodes and avoid “hotspots” that occur when sequential or low-entropy data (like a Host name) is used as the primary index prefix.
Commit Hash (Revision)
        |
        v
[ Culprit Record ] <-----------+
        |                      |
        +--[ Anomaly Group A ]-+--> [ Issue 123 ]
        |                      |
        +--[ Anomaly Group B ]-+--> [ Issue 456 ]
        |                      |
        +--[ GroupIssueMap ]---+
             (Stores the explicit links)
The schema currently supports LastModified as a Unix timestamp to facilitate cache invalidation and synchronization workflows, ensuring that external services can efficiently poll for updates to culprit statuses.
The culprit/transport module provides a unified abstraction for dispatching notifications regarding identified culprits in the Skia Perf system. By decoupling the notification logic from the culprit detection engine, the system can support diverse communication channels—starting with automated issue tracking—while maintaining a consistent interface for the rest of the application.
The module is designed around the Transport interface, which abstracts the “where” and “how” of message delivery.
- Implementations consume Subscription metadata (defined in subscription/proto/v1) to determine routing details like component IDs, priorities, and CC lists.

Defined in transport.go, this interface contains a single method: SendNewNotification. It returns a threadingReference (typically a bug ID or message URL) which allows the calling system to track the notification or perform follow-up actions (like posting comments on an existing thread).
The primary production implementation of the Transport interface. It bridges Skia Perf with the Google Issue Tracker (Buganizer).
- Authentication: It relies on the secret module to retrieve API keys and uses OAuth2 for authorized requests to the issuetracker service.
- Translation: It converts subscription metadata (BugComponent, BugPriority, Hotlists) into the specific data structures required by the Issue Tracker API.
- Validation: It checks that required routing information, such as the BugComponent, is present before attempting to create an issue, preventing orphaned or unroutable notifications.

A “No-Operation” implementation, found in noop.go, is used in environments where notifications are undesirable (e.g., local development or dry-run modes). It satisfies the interface by returning a successful result without performing any network I/O or side effects.
The following diagram illustrates how the IssueTrackerTransport processes a notification request:
+----------------+ +------------------------+ +-------------------+
| Culprit | | IssueTrackerTransport | | Google Issue |
| Service | | | | Tracker API |
+-------+--------+ +-----------+------------+ +---------+---------+
| | |
| 1. SendNewNotification() | |
|--------------------------->| |
| (Subscription, Subj, Body) | 2. Map Proto to Issue |
| | (Priority, CCs, etc.) |
| | |
| | 3. POST /v1/issues |
| |----------------------------->|
| | |
| | 4. Return Issue ID |
| |<-----------------------------|
| | |
| 5. Increment Success Metric| |
| 6. Return Issue ID String | |
|<---------------------------| |
| | |
- Metrics: IssueTrackerTransport maintains two counters, perf_issue_tracker_sent_new_culprit and perf_issue_tracker_sent_new_culprit_fail. These are essential for monitoring the health of the alerting pipeline.
- Formatting: Issues are created with FormattingMode: "MARKDOWN", allowing the culprit detector to send rich text, links, and tables to the issue tracker for better readability.

The culprit/transport/mocks module provides a programmatic double for the Transport interface used within the Skia Perf culprit detection system. Its primary purpose is to facilitate unit testing of components that handle culprit notifications—such as anomaly detection engines or alert managers—without triggering actual external side effects like sending emails or filing issue tracker tickets.
The module relies on the stretchr/testify/mock framework. This choice allows developers to write declarative tests that specify exactly how the notification system should be invoked. By using a mock rather than a fake or a manual stub, the system ensures that:
- Tests can verify that the subject and body contain the expected metadata (e.g., commit hashes, regression magnitudes) before they are sent to a real user.

The Transport struct is an autogenerated mock implementation. It mirrors the methods required to dispatch culprit information to various communication channels.
- SendNewNotification: Accepts the routing configuration (a Subscription proto). In this mock implementation, it captures the context.Context, the Subscription configuration, and the message content. It returns a mockable string (typically representing a message ID or URL) and an error.

The mock is designed to be integrated into Go tests via the NewTransport constructor, which automatically handles test cleanup and expectation assertions.
+------------------+ +----------------------+ +------------------+
| Unit Test | | Component Under | | Mocks Transport |
| (Logic/Policy) | | Test | | (This Module) |
+---------+--------+ +----------+-----------+ +---------+--------+
| | |
| 1. Setup Expectations | |
|------------------------------>| |
| (Expect SendNewNotification) | |
| | |
| 2. Execute Action | |
|------------------------------>| 3. Trigger Notification |
| |---------------------------->|
| | | 4. Record Call
| | | 5. Return Mock
| | <---------------------------| Values
| | |
| 6. Assertions (Auto-Cleanup) | |
| <-----------------------------| |
The implementation of SendNewNotification uses type assertion logic to provide flexible return values. It can return static values configured via .Return() or dynamic values generated by a function passed to .Run(). This is particularly useful when the “Message ID” returned by the transport needs to be used in subsequent logic within the test case.
The dataframe module provides a structured, table-like representation of performance measurement data, specifically optimized for the Skia Perf ecosystem. A DataFrame combines a set of time-series traces (TraceSet) with their corresponding commit metadata (ColumnHeader) and a calculated set of searchable attributes (ParamSet).
In the context of Perf, a “Trace” is a series of measurements associated with a unique key (a set of key-value pairs). The DataFrame organizes these traces so they can be visualized or analyzed over a common timeline of git commits.
Unlike a simple collection of data points, a DataFrame represents a cohesive “slice” of performance history. The design choice to include Header, TraceSet, and ParamSet in a single object is driven by the need for self-contained data:
- Efficiency: Traces are stored as raw measurement vectors ([]float32) while still being linked to rich git history.
- Searchability: The ParamSet is the calculated union of all key-value pairs in the TraceSet. This is maintained within the object to allow the UI to quickly provide filtering options based only on the data currently loaded.

The module treats columns as discrete points in time (commits). Operations like MergeColumnHeaders and Join are implemented to handle the “sparse” nature of performance data, where different traces might have data for different sets of commits.
Trace Key A: [ 1.2, nil, 1.4 ]   (Commits 1, 2, 3)
Trace Key B: [ nil, 2.2, 2.4 ]   (Commits 1, 2, 3)
                |    |    |
          Header[0] Header[1] Header[2]
- The Compress() method identifies and removes columns that contain no data across all traces. This is vital for reducing the payload size when sending data to a frontend, especially after a query returns a range where many commits might not have produced results for the requested traces.
- The Slice() method enables efficient pagination or windowing of data by creating sub-frames.
- The module uses vec32.MissingDataSentinel to represent gaps in data, ensuring that trace arrays remain a fixed length relative to the Header while explicitly marking missing measurements.

The DataFrameBuilder defines how data frames are constructed from underlying storage. It abstracts the complexity of querying the database and joining it with git metadata.
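The column-compression idea behind Compress() can be sketched as follows; a local `missing` constant stands in for vec32.MissingDataSentinel, and the function shape is illustrative rather than the real API:

```go
package main

import "fmt"

// missing stands in for vec32.MissingDataSentinel.
const missing = float32(1e32)

// compress drops every column index at which all traces are missing,
// returning the re-aligned traces and the surviving column indices.
func compress(traces map[string][]float32, cols int) (map[string][]float32, []int) {
	keep := []int{}
	for c := 0; c < cols; c++ {
		for _, tr := range traces {
			if tr[c] != missing {
				keep = append(keep, c) // at least one trace has data here
				break
			}
		}
	}
	out := map[string][]float32{}
	for key, tr := range traces {
		vals := make([]float32, 0, len(keep))
		for _, c := range keep {
			vals = append(vals, tr[c])
		}
		out[key] = vals
	}
	return out, keep
}

func main() {
	traces := map[string][]float32{
		"a": {1.2, missing, 1.4},
		"b": {missing, missing, 2.4},
	}
	out, keep := compress(traces, 3)
	fmt.Println(keep)     // column 1 is empty for every trace, so it is dropped
	fmt.Println(out["a"]) // values re-aligned to the surviving columns
}
```

A real implementation would also shrink the Header to the kept columns so headers and trace values stay index-aligned.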
- NewFromQueryAndRange handles fetching data matching specific attributes (e.g., “arch=x86”) over a time window.
- NewFromKeysAndRange is used when specific trace IDs are already known.
- NewNFromQuery and NewNFromKeys are designed for “overview” or “sparkline” views, where the user wants exactly N points of history leading up to a specific time.

The Join and MergeColumnHeaders functions are the core of the module's data-alignment logic. They perform an “outer join” on the commit offsets.
The BuildParamSet() method is responsible for reflecting the current state of the data. If traces are filtered out (via FilterOut), the ParamSet must be rebuilt so the UI doesn't display filtering options for data that is no longer present in the frame.
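The rebuild can be sketched by re-deriving the key/value union from the surviving trace keys. The comma-separated `key=value` key format matches Perf's structured trace keys, but the parsing below is simplified and the function name is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// buildParamSet recomputes the union of key-value pairs over the traces
// currently in the frame, e.g. after FilterOut has removed some of them.
func buildParamSet(traceKeys []string) map[string][]string {
	ps := map[string][]string{}
	seen := map[string]bool{}
	for _, key := range traceKeys {
		// Simplified parsing of ",arch=x86,config=gl," style keys.
		for _, pair := range strings.Split(strings.Trim(key, ","), ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) != 2 || seen[pair] {
				continue // skip malformed or already-counted pairs
			}
			seen[pair] = true
			ps[kv[0]] = append(ps[kv[0]], kv[1])
		}
	}
	return ps
}

func main() {
	// After filtering, only these two traces remain in the frame.
	ps := buildParamSet([]string{",arch=x86,config=gl,", ",arch=arm,config=gl,"})
	fmt.Println(ps["arch"])   // both values survive
	fmt.Println(ps["config"]) // deduplicated to a single value
}
```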
When joining two DataFrames (A and B) that represent different time ranges or different sets of traces:
DataFrame A Headers: [C1, C2, C4]
DataFrame B Headers: [C3, C4, C5]

1. Merge Headers -> [C1, C2, C3, C4, C5]
2. Map A indices -> 0->0, 1->1, 2->3
3. Map B indices -> 0->2, 1->3, 2->4
4. Resulting Trace for Key X:
   [ValA(C1), ValA(C2), ValB(C3), ValA/B(C4), ValB(C5)]
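That outer join on commit offsets can be sketched by treating headers as plain commit numbers; the function name and return shape are illustrative, not the real MergeColumnHeaders signature:

```go
package main

import (
	"fmt"
	"sort"
)

// mergeHeaders computes the sorted union of two commit-number lists and,
// for each input, a map from its old column index to the merged index.
func mergeHeaders(a, b []int) (merged []int, aMap, bMap map[int]int) {
	set := map[int]bool{}
	for _, c := range a {
		set[c] = true
	}
	for _, c := range b {
		set[c] = true
	}
	for c := range set {
		merged = append(merged, c)
	}
	sort.Ints(merged)
	// Position of each commit in the merged header list.
	pos := map[int]int{}
	for i, c := range merged {
		pos[c] = i
	}
	aMap, bMap = map[int]int{}, map[int]int{}
	for i, c := range a {
		aMap[i] = pos[c]
	}
	for i, c := range b {
		bMap[i] = pos[c]
	}
	return merged, aMap, bMap
}

func main() {
	// Matches the worked example above: A has C1,C2,C4 and B has C3,C4,C5.
	merged, aMap, bMap := mergeHeaders([]int{1, 2, 4}, []int{3, 4, 5})
	fmt.Println(merged) // union of both header sets
	fmt.Println(aMap)   // where each column of A lands in the merged frame
	fmt.Println(bMap)
}
```

Each trace is then copied into a merged-width slice using its index map, with untouched positions left as the missing-data sentinel.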
- dataframe.go: Defines the primary DataFrame and ColumnHeader structs and implements the logic for merging, joining, and filtering data.
- dataframe_test.go: Contains logic for validating the complex index-mapping required during joins and ensuring that ParamSet calculations correctly reflect the trace data.
- mocks/: Provides a mock implementation of the DataFrameBuilder for testing higher-level components (like the Perf API handlers) without requiring a database.

The /go/dataframe/mocks module provides auto-generated mock implementations for the DataFrameBuilder interface. These mocks are primarily used to facilitate unit testing in the Perf system by simulating complex data retrieval and frame construction processes without requiring a live database or the actual heavy-duty dataframe implementation.
The core of this module is the DataFrameBuilder mock, which is generated using mockery. The decision to provide these mocks in a dedicated sub-package allows other modules within the Skia infrastructure to write deterministic tests for components that depend on data loading, such as UI handlers, alert systems, or analysis pipelines.
By using these mocks, developers can:
- Verify that components pass the correct query.Query objects or time ranges to the builder.

This file contains the DataFrameBuilder struct, which embeds mock.Mock from the testify framework. It implements the dataframe.DataFrameBuilder interface, covering several data retrieval patterns:
- NewFromQueryAndRange and NewNFromQuery allow tests to simulate fetching data based on structured queries.
- NewFromKeysAndRange and NewNFromKeys simulate fetching specific traces when the exact keys are already known.
- NumMatches and PreflightQuery allow testing of the “dry run” or “count” functionality often used in the Perf UI to tell a user how many traces a query will return before they execute it.

The mock is designed to be integrated into Go tests using the testify pattern.
- Create the mock with NewDataFrameBuilder(t). This automatically registers cleanup functions to assert that all expected calls were actually made.
- Inject the mock wherever a builder is required (it satisfies the dataframe.DataFrameBuilder interface).
- The testify framework handles the verification of calls during the test's cleanup phase.

+-------------------+      +----------------------+      +-------------------------+
|     Unit Test     |      | MockDataFrameBuilder |      |  Component Under Test   |
+---------+---------+      +-----------+----------+      +------------+------------+
          |                            |                              |
          | 1. Setup Expectations      |                              |
          |--------------------------->|                              |
          |                            |                              |
          | 2. Execute Action          |                              |
          |---------------------------------------------------------->|
          |                            |                              |
          |                            |  3. Call Interface Method    |
          |                            |<-----------------------------|
          |                            |                              |
          |                            |  4. Return Mock Data         |
          |                            |----------------------------->|
          |                            |                              |
          | 5. Assertions/Cleanup      |                              |
          |<---------------------------|                              |
The implementation uses mockery's standard template, providing flexible return value handling. For every method, it checks if a functional return has been provided (allowing for dynamic logic in mocks) or if a static value was registered via .Return().
Special attention is given to the progress.Progress interface, which is passed to most builder methods. The mock allows testers to verify that progress is being tracked or to ignore it using mock.Anything.
The dfbuilder module is responsible for constructing DataFrame objects by querying and aggregating performance trace data from a TraceStore. It acts as the orchestration layer that translates high-level user requests (time ranges, queries, specific keys) into efficient, often parallelized, database operations.
A DataFrame in the Perf system is a matrix of performance data where columns represent commits (ordered by time) and rows represent individual traces (identified by structured keys). The dfbuilder handles the complexity of:
- Navigating the TraceStore's tiled architecture to fetch data efficiently.

The TraceStore stores data in fixed-size “tiles” (e.g., 256 commits per tile). When a user requests a large time range, the dfbuilder calculates which tiles are involved and launches parallel goroutines to query each tile simultaneously. This avoids serial bottlenecks and utilizes the horizontal scalability of the underlying database.
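The fan-out can be sketched with plain goroutines; the real code uses an errgroup and the TraceStore API, and the tile size and helper names here are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

const tileSize = 256 // illustrative; configured per instance

// tilesForRange computes which tiles cover an inclusive commit range.
func tilesForRange(begin, end int) []int {
	tiles := []int{}
	for t := begin / tileSize; t <= end/tileSize; t++ {
		tiles = append(tiles, t)
	}
	return tiles
}

// queryTiles fans a query out to one goroutine per tile and merges the
// per-tile results. queryTile stands in for a TraceStore call. Note the
// merge here simply appends; the real builder places values at
// commit-aligned indices so column order is preserved.
func queryTiles(tiles []int, queryTile func(tile int) map[string][]float32) map[string][]float32 {
	var mu sync.Mutex
	var wg sync.WaitGroup
	merged := map[string][]float32{}
	for _, tile := range tiles {
		wg.Add(1)
		go func(tile int) {
			defer wg.Done()
			res := queryTile(tile)
			mu.Lock()
			defer mu.Unlock()
			for k, v := range res {
				merged[k] = append(merged[k], v...)
			}
		}(tile)
	}
	wg.Wait()
	return merged
}

func main() {
	tiles := tilesForRange(200, 600) // spans tiles 0, 1, and 2
	fmt.Println(tiles)
	merged := queryTiles(tiles, func(tile int) map[string][]float32 {
		return map[string][]float32{"trace-a": {float32(tile)}}
	})
	fmt.Println(len(merged["trace-a"])) // one value per tile
}
```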
Many UI views request the “N most recent points.” Because performance data can be sparse (not every trace has data for every commit), the dfbuilder implements a backward-searching algorithm. It starts from the most recent commit and steps backward through tiles until it has collected exactly N data points for the requested traces, or until it hits a configurable maxEmptyTiles limit.
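A sketch of that backward search, with a hypothetical per-tile fetch function standing in for the TraceStore query:

```go
package main

import "fmt"

// collectLastN walks tiles backward from the newest one, gathering points
// until n values are found or too many consecutive empty tiles are seen.
// fetchTile stands in for a per-tile trace query.
func collectLastN(newestTile, n, maxEmptyTiles int, fetchTile func(tile int) []float32) []float32 {
	points := []float32{}
	empty := 0
	for tile := newestTile; tile >= 0 && len(points) < n; tile-- {
		vals := fetchTile(tile)
		if len(vals) == 0 {
			empty++
			if empty > maxEmptyTiles {
				break // give up on overly sparse history
			}
			continue
		}
		empty = 0
		// Prepend so the result stays in oldest-to-newest order.
		points = append(vals, points...)
	}
	if len(points) > n {
		points = points[len(points)-n:] // keep only the most recent n
	}
	return points
}

func main() {
	// Tiles 3 and 1 have data; tile 2 is empty and gets skipped.
	data := map[int][]float32{3: {7, 8}, 1: {4, 5, 6}}
	got := collectLastN(3, 4, 2, func(tile int) []float32 { return data[tile] })
	fmt.Println(got) // [5 6 7 8]
}
```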
To provide a responsive “Query” UI, the dfbuilder performs “pre-flight” queries. Instead of fetching all raw data, it:
- Counts how many traces match the query.
- Computes the ParamSet of valid options for the next dropdown based on current selections.

When a user requests data for a specific time range and a trace query:
User Request (Range, Query)
        |
        v
[Git Service]  <---- Resolve time range to Commit Numbers/Headers
        |
        v
[DFBuilder]    <---- Calculate required Tile Numbers
        |
 +------+------+  (Parallel Tile Queries)
 |             |
[Tile N]   [Tile N-1]
 |             |
 +------+------+
        |
        v
[TraceSetBuilder] <--- Merge results into a matrix
        |
        v
[DataFrame] (Compressed & Filtered)
The module includes logic to filter out “parent” traces. In Perf, traces often have a hierarchical structure (e.g., benchmark, bot, test, subtest). If a specific subtest trace exists, the higher-level “parent” trace (which might be an average or aggregate) is often redundant in the same view. The filterParentTraces function uses a TraceFilter to prune the TraceSet to only include the most specific (leaf) nodes.
builder
The primary implementation of the dataframe.DataFrameBuilder interface. It maintains references to:
- perfgit.Git: For commit metadata.
- tracestore.TraceStore: For raw data access.
- tracecache.TraceCache: An optional caching layer to speed up trace ID lookups.

preflightProcessRecentTiles
Handles the logic for scanning the most recent data tiles to populate the query UI. It uses an errgroup to query multiple tiles in parallel, ensuring that the “count” and “available parameters” are calculated quickly even if the latest tile is partially empty.
fromIndexRange
A utility that bridges the gap between Git and the DataFrame. It converts a range of commit numbers into ColumnHeader objects containing the Git hash, author, and timestamp required for the DataFrame header.
- Each tile query is bounded by singleTileQueryTimeout (default 1 minute). This prevents a single “poison” tile or a massive ingestion spike from locking up the server during regression detection.
- The builder uses vec32.MissingDataSentinel to represent gaps in traces where no data was recorded for a specific commit, ensuring the matrix alignment remains consistent across all traces.
- Results carry SourceInfo, which points back to the original files (e.g., Google Cloud Storage paths) from which the data was ingested, allowing for “drill-down” features in the UI.

The dfiter module provides a high-level abstraction for iterating over performance data stored in DataFrames. Its primary responsibility is to transform raw, potentially sparse data retrieved from a TraceStore into a series of smaller, dense “windows” suitable for regression detection algorithms.
In the Skia Perf ecosystem, regression detection involves analyzing traces over time to find “steps” or shifts in performance. Because these algorithms typically operate on a fixed-size window of points (defined by an Alert.Radius), the dfiter module acts as a bridge. It manages the complexity of querying the underlying data builders and then “slices” that data into the specific shapes required by the detection logic.
dfiter.go: The module centers around the DataFrameIterator interface. This design allows the regression detection engine to remain agnostic of whether it is processing a single specific commit or scanning a wide range of history.
- If a Domain.Offset is provided, the iterator returns a single DataFrame centered on that specific commit.
- If Offset is zero, the iterator behaves as a sliding window over a larger dataset.

dfIterProvider.go: Dataframe generation is an expensive operation involving database lookups and data processing. To optimize this, the DfProvider implements:
- Request deduplication via golang.org/x/sync/singleflight to prevent “thundering herd” problems. If multiple concurrent requests ask for the same DataFrame (e.g., several different regression tasks triggered by the same alert), only one builder execution occurs; the result is then shared among all callers.

traceSlicer.go: The module handles data differently depending on the regression algorithm being used. The choice of slicer is controlled by the DfIterTraceSlicer experiment flag and the Alert.Algo type.
K-Means Slicing (kmeansDataframeSlicer): This is the legacy approach. It treats the entire DataFrame as a unit, creating a sliding window across all traces simultaneously.
Original DataFrame: [C1, C2, C3, C4, C5] (Radius=1, WindowSize=3)
Iter 1: [C1, C2, C3]
Iter 2: [C2, C3, C4]
Iter 3: [C3, C4, C5]
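A hedged sketch of the column-window slicing shown above; slideColumns is an illustrative name, not the real slicer API:

```go
package main

import "fmt"

// slideColumns returns every contiguous window of 2*radius+1 commit
// columns, the shape consumed by the k-means style detector.
func slideColumns(commits []string, radius int) [][]string {
	size := 2*radius + 1
	var out [][]string
	for i := 0; i+size <= len(commits); i++ {
		out = append(out, commits[i:i+size])
	}
	return out
}

func main() {
	// Radius=1 over 5 commits yields 3 windows of 3 columns each.
	for _, w := range slideColumns([]string{"C1", "C2", "C3", "C4", "C5"}, 1) {
		fmt.Println(w)
	}
}
```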
StepFit Slicing (stepFitDfTraceSlicer): This strategy is optimized for individual trace analysis. It iterates through the DataFrame trace-by-trace rather than column-by-column.
This slicer also filters out vec32.MissingDataSentinel values. If a trace is sparse, the slicer collapses it into a dense array of valid points before applying the window. This ensures the regression algorithm always sees a full set of real data points, even if they were originally non-contiguous in time.

When NewDataFrameIterator is called, the following logic determines the data source:
User/System Request
|
v
Check Domain.Offset?
        |
        +--- [!= 0] ---> Find specific Commit Time -> Fetch exactly 2*Radius+1 points
        |
        +--- [== 0] ---> Check DfProvider Cache
                             |
                             +--- [Hit]  ---> Return Cached DF
                             |
                             +--- [Miss] ---> Call DataFrameBuilder -> Cache Result
        |
        v
Select Slicer Implementation
(K-Means vs. StepFit)
        |
        v
Return DataFrameIterator
The stepFitDfTraceSlicer provides a more granular iteration than traditional time-based slicing:
Trace A: [1.1, nan, 1.2, 1.3, nan, 1.4]
Trace B: [5.0, 5.1, 5.2, 5.3, 5.4, 5.5]

StepFit Filtering (Radius 1):
1. Collapse Trace A -> [1.1, 1.2, 1.3, 1.4]
2. Slice A (Win 1)  -> [1.1, 1.2, 1.3]
3. Slice A (Win 2)  -> [1.2, 1.3, 1.4]
4. Move to Trace B...
5. Slice B (Win 1)  -> [5.0, 5.1, 5.2]
...and so on.
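The collapse-then-slice behavior can be sketched as follows. Here missingDataSentinel is a NaN stand-in for the real vec32.MissingDataSentinel, and denseWindows is an illustrative name:

```go
package main

import (
	"fmt"
	"math"
)

// missingDataSentinel stands in for vec32.MissingDataSentinel.
var missingDataSentinel = float32(math.NaN())

// denseWindows drops sentinel values from one trace, then slides a
// 2*radius+1 window over the remaining real points.
func denseWindows(trace []float32, radius int) [][]float32 {
	dense := make([]float32, 0, len(trace))
	for _, v := range trace {
		if !math.IsNaN(float64(v)) {
			dense = append(dense, v)
		}
	}
	size := 2*radius + 1
	var out [][]float32
	for i := 0; i+size <= len(dense); i++ {
		out = append(out, dense[i:i+size])
	}
	return out
}

func main() {
	nan := missingDataSentinel
	traceA := []float32{1.1, nan, 1.2, 1.3, nan, 1.4}
	// Collapses to 4 real points, yielding two dense windows of 3.
	fmt.Println(denseWindows(traceA, 1))
}
```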
- dfiter.go: Entry point for creating iterators; handles “settling time” logic and metadata metrics.
- dfIterProvider.go: Implements the caching layer and concurrency controls.
- traceSlicer.go: Contains the logic for the different sliding window implementations.
- traceSlicer_test.go: Provides extensive examples of how missing data is handled during slicing.

The dryrun module provides the functionality to test Perf Alerts against historical data. This allows users to validate and tune alert configurations by observing which regressions would have been detected had the alert been active over a specific range of commits.
The primary purpose of a dry run is to simulate the regression detection pipeline without triggering side effects like filing bug reports or sending notifications.
The module is designed around an asynchronous execution model. Because scanning historical data for regressions can be a long-running process, the module uses a “tracker” pattern. When a dry run is initiated, it returns a progress handle immediately, while the actual computation continues in a background goroutine.
- Requests struct: The central coordinator that manages dependencies required for regression detection, including access to Git data, trace shortcuts, and data frame builders.
- StartHandler: The entry point for HTTP requests. It decodes a RegressionDetectionRequest, validates the alert configuration, and kicks off the background processing.
- detectorResponseProcessor: A specialized callback function defined within the start handler. It acts as the glue between the raw clustering results and the user-facing progress updates. Its responsibilities include:
  - Converting raw clustering results into Regression objects.
  - Updating the progress.Tracker so the UI can display real-time results.

A single dry run may execute multiple queries (e.g., if the alert uses a “group by” clause). This can result in multiple detections for the same commit across different sub-queries. The implementation uses a map (foundRegressions) keyed by CommitNumber to aggregate these results. As the detector finds new clusters, it merges them into the existing regression record for that commit, ensuring the user sees a consolidated view of all issues found at a specific point in time.
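A minimal sketch of the merge-by-commit aggregation; CommitNumber and Regression are simplified stand-ins for the real types:

```go
package main

import (
	"fmt"
	"sort"
)

type CommitNumber int

// Regression accumulates every cluster detected at one commit.
type Regression struct {
	Clusters []string
}

// mergeClusters folds a new detection into the per-commit record,
// mirroring how foundRegressions consolidates sub-query results.
func mergeClusters(found map[CommitNumber]*Regression, commit CommitNumber, cluster string) {
	reg, ok := found[commit]
	if !ok {
		reg = &Regression{}
		found[commit] = reg
	}
	reg.Clusters = append(reg.Clusters, cluster)
}

func main() {
	found := map[CommitNumber]*Regression{}
	// Two "group by" sub-queries detect issues at the same commit.
	mergeClusters(found, 1024, "arch=arm")
	mergeClusters(found, 1024, "arch=x86")
	mergeClusters(found, 2048, "arch=arm")

	commits := make([]int, 0, len(found))
	for c := range found {
		commits = append(commits, int(c))
	}
	sort.Ints(commits)
	for _, c := range commits {
		fmt.Println(c, found[CommitNumber(c)].Clusters)
	}
}
```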
The following diagram illustrates the lifecycle of a dry run request:
User Request (POST)
        |
        v
[StartHandler] ------------------------+
  |                                    |
  | 1. Validate Alert Config           |
  | 2. Return Progress ID              |
  |    to Frontend immediately         |
  | 3. Add to Progress Tracker         |
  | 4. Launch Goroutine                |
  |                                    +-----> [HTTP Response (JSON)]
  v
[Background Goroutine]
  |
  | calls regression.ProcessRegressions(...)
  |
  +-----> [detectorResponseProcessor] (Callback)
  |         |
  |         A. Convert cluster results to Regression objects
  |         B. Lookup Git commit details
  |         C. Merge results for same commits
  |         D. Update Tracker with current findings
  v
[Progress Tracker] <------- (Frontend polls this)
- The background goroutine uses context.Background() instead of the request context (r.Context()). This prevents the dry run from being cancelled when the user's initial HTTP request terminates.
- The RegressionAtCommit struct is used to package the raw regression data with the provider.Commit metadata. This ensures the frontend has all the information necessary to display a human-readable list of results without performing additional lookups.
- Errors are recorded on the req.Progress object. This allows the system to report failures (like invalid queries or database timeouts) back to the user through the progress polling mechanism.

The go/e2e module provides a specialized test runner and infrastructure for executing end-to-end (E2E) tests within the Skia infrastructure. It serves as a bridge between the high-level Task Driver system and specific test suites (such as Node.js-based Puppeteer tests). The primary goal of this module is to automate the lifecycle of E2E testing: checking out the source code, executing tests via Bazel, capturing results in a standardized xUnit XML format, and persisting those results to Google Cloud Storage (GCS).
The module is designed as a Task Driver, leveraging the task_driver library to ensure that E2E tests can be executed reliably on Swarming bots with full observability and step-by-step logging.
- Result paths combine a timestamp (YYYY-MM-DD/HH-MM-SS) and an iterative collision check to ensure every test run has a dedicated, non-overlapping location in the storage bucket.
- The runner supports both --local and bot-based execution. When running on a bot, it automatically handles complex environment setup, including Gerrit authentication and Git cookie management, which are abstracted away from the actual test logic.
- By passing --config=mayberemote and --nocache_test_results, it ensures that E2E tests (which are often sensitive to environment state) are executed fresh when requested, while still benefiting from RBE (Remote Build Execution) when available.

test_runner.go: This is the core entry point. Its responsibilities include:
- Performing the source checkout via the checkout and git_steps helpers.
- Generating the TestSuites and TestSuite XML structures.

BUILD.bazel: The build configuration defines the e2e-test-runner binary, which bundles the logic required to interact with the Google Cloud API, Gerrit, and the internal Task Driver framework. It marks the runner as a public binary, allowing it to be triggered by various CI tasks across the repository.
The following diagram illustrates how the runner coordinates the lifecycle of an E2E test run:
[ Task Driver Start ]
|
v
[ Setup Environment ] ----> [ Initialize Git/Gerrit Auth ]
| [ Create Temp Workdir ]
v
[ Perform Checkout ] -----> [ Ensure Git Clone at Target Revision ]
|
v
[ Execute Bazel ] --------> [ Run Node.js E2E Test Target ]
| [ Capture Stdout and Exit Codes ]
v
[ Process Results ] ------> [ Regex Parse Failures ]
| [ Generate xUnit XML ]
v
[ Archive Artifacts ] ----> [ Generate Unique GCS Path ]
| [ Upload test_result.xml ]
v
[ Task Driver End ]
- test_runner.go: Contains the main logic for the task driver, including the GCS upload logic, the Bazel execution wrapper, and the XML result generation.
- BUILD.bazel: Defines the Go library and binary targets, specifying the dependencies on the cloud storage, Gerrit, and Task Driver libraries.

The /go/e2e/tests module provides a framework and suite for end-to-end (E2E) testing of web-based applications. Unlike unit or integration tests that verify isolated logic or API contracts, this module validates the entire system stack by simulating real user interactions within a headless browser. Its primary goal is to ensure that critical user journeys—from page load to UI state changes—function correctly in a production-like environment.
The testing strategy is built around programmatic browser control and behavior-driven assertions. The following design choices guide the implementation:
- The browser binary is resolved via the CHROME_BIN environment variable. This decouples the test logic from the specific installation path of the browser.
- While the browser instance is shared across a suite (launched in before hooks) to save overhead, individual test cases (it blocks) utilize fresh browser pages/tabs. This prevents state leakage between tests while avoiding the heavy cost of restarting the browser executable for every assertion.
- The browser is launched with flags such as --no-sandbox and --disable-dev-shm-usage. These are intentional design choices to ensure compatibility with containerized execution environments where shared memory may be limited or namespace sandboxing is restricted.

The module relies on the Chai assertion library to provide a descriptive and readable syntax for validating application state. The responsibility of a test file is to define a specific user scenario, navigate to the target service, and verify outcomes such as page titles, element visibility, or data consistency.
Each test suite is responsible for managing its own lifecycle. This involves:
- Tearing down pages and the browser (in after hooks) to prevent memory leaks and orphaned processes in the testing infrastructure.

The following diagram illustrates the lifecycle of an E2E test within this module:
[ Test Suite Start ]
|
v
[ Launch Browser ] <--- Uses CHROME_BIN and container-optimized flags
|
+---- [ Setup Page ] <--- Create a new isolated tab/page
| |
| v
| [ Navigate ] <--- page.goto(baseUrl)
| |
| v
| [ Verify ] <--- Assert DOM state or page metadata
| |
| v
| [ Close Page ]
|
v
[ Close Browser ] <--- Teardown process to free resources
|
v
[ Test Suite End ]
- example_nodejs_test.ts: Serves as the reference implementation for new E2E tests. It demonstrates how to initialize the Puppeteer instance, manage the page lifecycle, and perform assertions using the Chai library.
- BUILD.bazel: Defines the test targets. It identifies the necessary dependencies, such as the Puppeteer driver and assertion libraries, ensuring the test runner has access to the required Node.js modules and browser binaries.

The go/favorites module defines the core domain model and persistence interface for “Favorites” within the Perf system. A “Favorite” represents a saved configuration—specifically a name, description, and URL—that allows users to bookmark and revisit specific data visualizations or query states.
This module acts as a contract layer, decoupling the business logic of the Perf application from specific storage implementations (such as SQL databases or mock objects used in testing).
The module is designed around a strictly defined interface to ensure that the Perf frontend can manage user preferences consistently, regardless of the underlying infrastructure.
- By defining Store as an interface, the system supports pluggable backends. This allows for the sqlfavoritestore implementation in production and the mocks implementation for unit testing.
- The Favorite struct encapsulates all metadata required to reconstruct a saved view. The inclusion of LastModified (as a Unix timestamp) enables the UI to sort or filter favorites by recency without requiring complex timezone handling at the database level.
- Keeping a SaveRequest struct separate from the Favorite struct is a deliberate choice to distinguish between input data (what a user provides) and stored records (which include system-generated fields like ID and LastModified).
- The interface includes a Liveness method. While not strictly a “favorites” function, the Store was selected as a lightweight probe point to verify database connectivity for the entire application. This avoids adding overhead to more performance-critical stores while ensuring the system can monitor its health.

store.go: This file defines the data structures and the behavioral contract for favorite management.
- Favorite Struct: The primary data model. It includes the UserId (typically an email address) to enforce ownership and a Url which contains the encoded state of the Perf dashboard.
- Store Interface: Defines the lifecycle of a favorite:
  - Create and Update use the SaveRequest pattern to ensure only mutable fields are passed from the frontend.
  - Get fetches a single record by ID, while List provides a collection of all favorites owned by a specific user.
  - The Delete method requires both an id and a userId. This ensures that the storage layer enforces ownership, preventing a user from deleting another person's favorite by simply knowing its ID.

The following diagram illustrates the lifecycle of a Favorite configuration from creation to retrieval:
User Action                Data Structure                 Store Interface
-----------                --------------                 ---------------
[ Save Dashboard ] ----> [ SaveRequest ]
                         (Name, URL, Desc, User)
                                |
                                +---------------------> [ Create() ]
                                                        (ID & Timestamp generated)
[ View My List ] <------ [ []*Favorite ] <------------- [ List(UserId) ]
                         (ID, Name, URL, Modified)
[ Delete Entry ] ------- (ID + UserId) ---------------> [ Delete() ]
The go/favorites/mocks module provides a programmatic double of the favorites.Store interface. Its primary purpose is to facilitate unit testing for components within the Perf system that depend on “favorites” functionality—such as saving, retrieving, or listing user-defined favorite configurations—without requiring a live database or a real implementation of the storage layer.
By using these mocks, developers can isolate the business logic of higher-level services, ensuring that tests are deterministic, fast, and do not rely on external infrastructure like Spanner or SQL.
The module utilizes automated mock generation via mockery. This choice ensures that the mock implementation remains perfectly synchronized with the favorites.Store interface definition found in go/favorites.
Update or Delete) were called with the expected arguments and the correct number of times.testifyThe implementation is built on the github.com/stretchr_testify/mock framework. This allows for a fluent API when setting up expectations:
Test Code                    Mock Store                    Component Under Test
---------                    ----------                    --------------------
Setup Expectation ---> [ On("Get").Return(...) ]
Execute Test -----------------------------------------> Calls Store.Get()
                       [ Match Arguments ]  <------------------|
                       [ Return Fake Data ] ----------> Result processed by logic
Assert Expectations <- [ AssertExpectations ]
This is the core file of the module, containing the Store struct. It implements every method required by the favorites.Store interface:
- The Create, Get, Update, and Delete methods are implemented to capture arguments and return values defined in the test suite.
- The List method simulates retrieving all favorites associated with a specific user ID.
- The Liveness method is mocked to allow tests to simulate the health status of the underlying storage engine.

The NewStore function is the standard entry point for using this module in a test. It takes a testing.T (or compatible interface) which allows it to:
- Register a Cleanup phase to call AssertExpectations, ensuring that any configured expectations were actually met before the test finished.

This module provides a SQL-based implementation of the favorites.Store interface, enabling the persistence and management of user “Favorites” (saved URLs/configurations) within the Perf application. It bridges the gap between the high-level favorites domain logic and the underlying relational database (typically CockroachDB or Spanner).
The module is built with a focus on performance for user-centric queries and reliability in a distributed database environment.
- SQL is kept in named constants (rather than inline statements). This separation ensures that the Go logic for scanning rows remains clean while providing a single location to tune or update database queries.
- The FavoriteStore struct is designed to be stateless, holding only a reference to the database connection pool (pool.Pool). This allows the store to be safely shared across multiple goroutines.
- The store provides a Liveness check. This is a strategic inclusion for cloud-native deployments, allowing the application's frontend to verify its database connectivity independently of functional queries.
- The store applies LastModified logic at the application level using Unix timestamps. This ensures consistency in how time is recorded regardless of the database's internal time configuration.

sqlfavoritestore.go: This is the core of the module, implementing the FavoriteStore and its associated methods. It handles the translation between Go structs (from the favorites package) and SQL parameters.
- Create and Update set the last_modified timestamp.
- Delete requires both the ID and the UserId to perform a deletion. This is a security-in-depth choice, ensuring a user can only delete their own favorites even if an ID is guessed or leaked.
- List retrieves a summary of fields (id, name, url, description) for all favorites owned by a user, optimized for summary views.
- Errors are wrapped with skerr to add contextual information to standard database errors (e.g., “Failed to load favorite”), making it easier to trace issues in logs.

schema/ (Submodule): Defines the architectural “blueprint” for the Favorites table. It ensures that the database indexes and constraints (like NOT NULL on URLs) align with the requirements of the Go code. It manages the identity strategy (UUIDs) and the by_user_id index critical for performance.
The following diagram illustrates how the store interacts with the database to persist a user's favorite:
[ Caller ]
    |
    | 1. Create(ctx, SaveRequest{UserId, Name, Url...})
    v
[ FavoriteStore ]
    |
    | 2. Generate current Unix timestamp
    | 3. Execute 'insertFavorite' SQL
    |    (UserId, Name, Url, Desc, Timestamp)
    v
[ SQL Database ]
    |
    | 4. Generate UUID for 'id'
    | 5. Persist record & Update 'by_user_id' index
    v
[ FavoriteStore ]
    |
    | 6. Return nil (Success)
    v
[ Caller ]
    |
    | 7. List(ctx, UserId)
    | 8. Execute 'listFavorites' SQL
    |    (Uses index for fast lookup)
    v
[ SQL Database ]
    |
    | 9. Return matching rows
    v
[ FavoriteStore ]
    |
    | 10. Scan rows into []*favorites.Favorite
    v
[ Caller ]
This module defines the data architecture for persisting user “Favorites” within a SQL database. It serves as the single source of truth for the database schema, ensuring that the Go representation of a favorite aligns with the underlying storage constraints and indexing requirements.
The schema is designed to support a multi-user environment where favorites are frequently queried by ownership but accessed via unique identifiers for updates and deletions.
- The ID uses a UUID generated at the database level (gen_random_uuid()). This choice provides a globally unique identifier that prevents ID exhaustion and allows for future decentralization or data migration without key collisions.
- Ownership is tracked by email (UserId), sourced from uber-proxy authentication. By using a string-based email as the primary key for user association rather than an internal integer ID, the system simplifies integration with the authentication layer and avoids an extra lookup table for user metadata.
- A secondary index named by_user_id is defined on the user_id column. This is a critical performance choice, as the primary access pattern for the application is expected to be “fetch all favorites for the current logged-in user.” Without this index, user-specific queries would require a full table scan.
- The LastModified field uses a standard Unix timestamp (integer). This provides a lightweight, timezone-agnostic way to handle sorting by recency or implementing cache invalidation logic on the client side.

schema.go: This file contains the FavoriteSchema struct, which uses specialized sql struct tags to define the DDL (Data Definition Language) properties of the table.
- The schema separates the functional payload (Url) from descriptive metadata (Name, Description). While the Url is mandatory (NOT NULL), the name and description are optional, allowing users to save links quickly without mandatory labeling.
- By declaring constraints like PRIMARY KEY and NOT NULL directly in the struct tags, the module ensures that the database enforcement layer matches the application's data requirements.

The schema is structured to optimize the flow from authentication to data display:
[ Authentication Layer ]
    |
    | Provides User Email
    v
[ sqlfavoritestore ]
    |
    | Query: SELECT * FROM Favorites WHERE user_id = {email}
    | (Uses 'by_user_id' Index for O(log n) lookup)
    v
[ SQL Database ]
    |
    | Returns set of FavoriteSchema records
    v
[ UI / Consumer ]
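The struct-tag approach can be sketched as follows; the tag contents are illustrative, not the exact tags used by go/favorites/schema:

```go
package main

import (
	"fmt"
	"reflect"
)

// FavoriteSchema declares DDL properties next to the fields, the way
// a schema generator would read them. Tags here are assumptions.
type FavoriteSchema struct {
	ID           string `sql:"id UUID PRIMARY KEY DEFAULT gen_random_uuid()"`
	UserID       string `sql:"user_id TEXT NOT NULL"`
	Name         string `sql:"name TEXT"`
	Url          string `sql:"url TEXT NOT NULL"`
	Description  string `sql:"description TEXT"`
	LastModified int64  `sql:"last_modified INT NOT NULL"`
}

// ddlColumns pulls each column definition out of the sql tags.
func ddlColumns() []string {
	t := reflect.TypeOf(FavoriteSchema{})
	cols := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		cols = append(cols, t.Field(i).Tag.Get("sql"))
	}
	return cols
}

func main() {
	for _, c := range ddlColumns() {
		fmt.Println(c)
	}
}
```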
The file module defines the core abstraction for data ingestion within the Skia Perf system. It provides a common interface and data structure that decouples the ingestion logic (the “how” of processing data) from the storage and transport layers (the “where” of the data).
By standardizing how files are represented and discovered, this module allows the system to seamlessly switch between local development environments, automated testing, and production cloud-scale ingestion.
File Struct: The file.File struct is the primary data transfer object. It encapsulates both the data and the metadata required for a single unit of ingestion.
- Name: The identifier for the file, typically a path or URI.
- Contents: An io.ReadCloser that provides the raw data. This allows the ingestion pipeline to stream data rather than loading entire files into memory, which is critical when handling large performance traces.
- Created: A timestamp indicating when the file was originated.
- PubSubMsg: An optional reference to a pubsub.Message. This is included to allow downstream consumers to manually acknowledge or negatively acknowledge the message if the source is backed by a messaging service like Google Cloud Pub/Sub.

Source Interface: The Source interface defines a unified mechanism for discovering files. It follows a “push-based” model through a Go channel:
type Source interface {
Start(ctx context.Context) (<-chan File, error)
}
This design choice allows the ingestion engine to remain reactive. Whether the files are being walked on a local disk or arriving via real-time cloud notifications, the consumer simply listens to the channel until it is closed.
- Using an io.ReadCloser instead of a byte slice for file contents ensures that the memory footprint of the ingestion process remains low even if individual files are large.
- The Start method is designed to be called once per instance. This prevents race conditions and ensures a predictable lifecycle for the background workers (goroutines) that different implementations (like dirsource or gcssource) use to monitor their respective backends.
- Embedding the PubSubMsg in the File struct is a design compromise that breaks total isolation in favor of reliability. It allows the ingestion pipeline to signal to the underlying transport layer exactly when a file has been successfully persisted or if it needs to be retried.

The following diagram illustrates how the file module acts as the contract between various data providers and the main ingestion engine:
+-------------------+    +-------------------+    +-------------------+
|  Local Directory  |    |    GCS Bucket     |    |  Other Providers  |
|   (dirsource)     |    |   (gcssource)     |    |     (Future)      |
+---------+---------+    +---------+---------+    +---------+---------+
          |                        |                        |
          | implements             | implements             | implements
          +------------------------+------------------------+
                                   |
                           +-------v--------+
                           |  file.Source   |
                           +-------+--------+
                                   |
                                   | Start(ctx) returns <-chan file.File
                                   v
                           +----------------+
                           | Perf Ingestion |
                           |     Engine     |
                           +----------------+
While the file package defines the contract, its submodules provide the concrete implementations used across different environments:
- dirsource: A filesystem-based implementation. It walks a local directory and streams the files found. It is primarily used for local development and unit tests.
- gcssource: A production-ready implementation that listens to Google Cloud Pub/Sub notifications to ingest files from GCS buckets in real-time.
- file.go: Defines the Source interface and the File struct. This file serves as the single point of truth for how data enters the Perf system, ensuring that adding a new storage backend only requires implementing the Source interface.

The dirsource module provides a filesystem-based implementation of the file.Source interface. Its primary purpose is to abstract a local directory into a stream of data, allowing the Skia Perf system to ingest files directly from the disk.
This implementation is intentionally kept simple and is designed specifically for local development, demonstration modes, and unit testing. It allows developers to run the Perf ingestion pipeline against local files without requiring a cloud storage provider or a complex messaging infrastructure.
The implementation of DirSource prioritizes ease of setup over high-performance production features like real-time file watching or sophisticated metadata tracking.
- Rather than watching for filesystem events (e.g., via inotify), the module performs a one-time walk of the directory tree when Start is called. This simplifies the state management of the source, making it predictable for tests.
- Start launches a background goroutine that pushes files into a buffered channel. This allows the ingestion pipeline to begin processing the first file while the source is still discovering the next.
- The module uses the file's ModTime (Modified Time) to fill the Created field in the file.File struct.
- Start can only be called once. This prevents accidental duplicate processing of the same directory within the same lifecycle, ensuring data consistency in demo environments.

Defined in dirsource.go, DirSource is the core struct implementing the file.Source interface. It maintains the absolute path to the target directory and tracks whether the ingestion process has already been initiated.
The Start method performs the following actions:
1. Creates a buffered channel (channelSize = 10) to hold file.File objects.
2. Walks the directory tree using filepath.Walk.
3. For each file found, emits a file.File containing the path, the open reader, and the modification timestamp.

The following diagram illustrates how DirSource transforms a filesystem structure into a stream of data for the ingestion pipeline:
+--------------+     +-----------------------+     +-------------------+
|  Ingestion   |     |      DirSource        |     | Local Filesystem  |
|   Engine     |     |     (Background)      |     |      (Disk)       |
+--------------+     +-----------------------+     +-------------------+
       |                        |                           |
       | (1) Start(ctx)         |                           |
       |----------------------->|                           |
       |                        | (2) filepath.Walk(dir)    |
       | (Returns Channel)      |-------------------------->|
       |<-----------------------|                           |
       |                        | (3) Open & Read Metadata  |
       |                        |<--------------------------|
       |                        |                           |
       | (4) Receive file.File{}|                           |
       |<-----------------------|                           |
       |                        | (5) Repeat for all files  |
       |                        |-------------------------->|
       |                        |                           |
       | (6) Channel Closed     |                           |
       |<-----------------------|                           |
- dirsource.go: Contains the logic for scanning the filesystem and mapping os.FileInfo to the common file.File structure used by Perf.
- dirsource_test.go: Validates that the directory walker correctly identifies files, handles multiple files in a single pass, and properly errors out if Start is invoked more than once.
- testdata/: A collection of static JSON files used during testing to ensure the source correctly reads file contents and handles path resolution.

The testdata directory serves as a controlled environment containing static filesystem artifacts used to validate the behavior of the dirsource module. Rather than relying on dynamically generated files or mock objects, this directory provides concrete, predictable JSON structures that allow for end-to-end testing of directory scanning, file reading, and data parsing logic.
The primary design choice here is the use of representative static assets. By storing physical .json files on disk, the test suite can exercise the full I/O stack—ensuring that the module correctly handles file handles, directory pathing, and content deserialization in a way that matches real-world usage.
Key considerations for these test files include:
- The files (filea.json, fileb.json) follow a uniform structure ({"status": "..."}). This allows tests to verify that the dirsource implementation can iterate through multiple files and map them to a consistent internal data model or interface.
- filea.json: Acts as the primary data point for positive testing. It contains a standard “status” string used to verify that the scanner successfully opens a file and extracts its content accurately.
- fileb.json: Provides a secondary data point. This is used to ensure that the dirsource logic correctly handles collections of files, verifying that the ingestion process doesn't stop after the first successful read and that it maintains data integrity across multiple distinct sources.
+-------------------+       +-----------------------+       +-------------------+
|  dirsource logic  | ----> |  Read /testdata/ dir  | ----> | Map JSON content  |
+---------+---------+       +-----------+-----------+       +---------+---------+
          |                             |                             |
          | (1) Path discovery          | (2) File I/O                | (3) Validation
          v                             v                             v
   [ filea.json ] <---------------------+---------------------> [ fileb.json ]
   [ "A test..." ]                                              [ "just another" ]
This workflow ensures that the system can navigate the filesystem and translate raw disk bytes into structured application data.
The gcssource module provides an implementation of the file.Source interface specifically for Google Cloud Storage (GCS). It is designed to enable real-time ingestion of performance data files as they are uploaded to GCS buckets.
The primary purpose of this module is to bridge GCS storage events with the Perf ingestion pipeline. It leverages GCS Pub/Sub notifications to detect new file arrivals, filters those files based on configuration, and streams the file contents for processing.
By using an event-driven approach rather than polling, the module ensures that the system reacts immediately to new data while minimizing unnecessary API calls to GCS.
The central struct GCSSource manages the lifecycle of the ingestion source. Its responsibilities include:
- Opening an io.ReadCloser for each discovered file, allowing the consumer to read data directly from GCS.

The module implements a multi-stage filtering process to determine if a file should be ingested:
1. Name filtering via a filter.Filter (configured via AcceptIfNameMatches and RejectIfNameMatches) to include or exclude files based on regex-like patterns.
2. Prefix matching against the SourceConfig.Sources list. This prevents the ingestion of files from unauthorized or unrelated directories within the same bucket.
The module defaults to a low number of parallel receives (controlled by maxParallelReceives). This is an intentional design choice to maintain a predictable load on the ingestion system and to simplify the management of GCS read streams.
The module disables automatic deadline extensions by the Pub/Sub library (MaxExtension = -1). This forces the system to either process a message or let it time out quickly. This prevents the “stuck ingestor” problem where a single problematic message holds up the entire queue because its deadline is being automatically extended indefinitely.
GCS Bucket (New File)
        |
        v
Cloud Pub/Sub Topic
        |
        v
GCSSource.Receive (Pub/Sub Message)
        |
        |--[ Deserialize JSON ]--> (If invalid: Ack & Drop)
        |
        |--[ Apply Filter ]------> (If rejected: Ack & Drop)
        |
        |--[ Check Sources ]-----> (If no match: Ack & Drop)
        |
        |--[ Get GCS Reader ]----> (If GCS error: Nack & Retry)
        |
        v
file.File channel (Streamed to Ingestor)
- gcssource.go: Contains the core logic for the GCSSource struct, the Pub/Sub message listener, and the GCS object retrieval logic.
- gcssource_manual_test.go: Provides integration tests using GCP emulators to verify the end-to-end flow from Pub/Sub message publication to file channel output.

The filestore module provides a unified abstraction for interacting with different storage backends (Local and Google Cloud Storage) through the standard Go io/fs interface. It is designed to allow the Skia Perf system to remain storage-agnostic, enabling high-level components to consume data—primarily ingestion files—without needing to know whether the source is a physical disk or a cloud bucket.
The module focuses on a read-only, stream-oriented pattern, prioritizing simplicity and compatibility with the Go standard library over exhaustive implementation of filesystem metadata features.
The decision to wrap storage backends in fs.FS rather than creating a custom storage interface allows the system to leverage Go’s rich ecosystem of standard library tools (e.g., io.ReadAll, bufio.NewScanner). This abstraction ensures that unit tests can swap out a production GCS backend for a local directory or an in-memory filesystem without changing any business logic.
While both backends implement the same interface, they handle path resolution differently based on the nature of the underlying storage:
- GCS: paths are full URLs (gs://<bucket>/<path>). The backend treats the entire GCS namespace as a single virtual filesystem, parsing these URLs on the fly to determine which bucket and object to fetch using the Google Cloud Storage API.

Both implementations are optimized for data consumption.
Methods such as Stat() are often left unimplemented or return minimal information. This reflects the design goal: the Perf ingestion engine cares about the content of the files (the JSON or Proto data) rather than filesystem-level metadata like permissions or modification times.

The gcs submodule bridges the storage.Client from the Google Cloud SDK to fs.FS.
- Translates gs:// paths into API requests.
- Wraps the *storage.Reader within a custom file struct. This allows the module to satisfy the fs.File interface automatically, as storage.Reader already provides the necessary Read and Close methods.

The local submodule wraps the os package's filesystem capabilities.
It uses os.DirFS internally. The primary value-add of this submodule is the path sanitization and translation logic that allows the rest of the application to use standardized paths while the module maps them to the correct local disk location.

The following diagram shows how the filestore abstraction allows the application to remain indifferent to the underlying storage type:
              [ Application Logic ]
                       |
                       | Requests "path/to/data.json"
                       v
               [ filestore (fs.FS) ]
                       |
       +---------------+---------------+
       |                               |
       v                               v
[ Local Implementation ]   OR   [ GCS Implementation ]
       |                               |
       | 1. Resolve relative path      | 1. Parse gs:// URL
       | 2. Access local disk          | 2. storage.NewReader()
       v                               v
[ Physical File ]               [ GCS Object ]
- gcs/gcs.go: Contains the logic for parsing GCS URIs and the implementation of the fs.FS and fs.File interfaces for cloud storage.
- local/local.go: Contains the logic for anchoring file access to a local root directory and implementing the fs.FS interface for disk-based storage.

The gcs module provides a bridge between the standard Go io/fs interface and Google Cloud Storage (GCS). It allows systems—specifically the Skia Perf backend—to treat GCS objects as if they were files in a standard filesystem.
The primary motivation for this module is to abstract the complexities of the GCS client (authentication, bucket management, and reader handling) behind the ubiquitous fs.FS interface. This allows higher-level components to remain storage-agnostic, facilitating testing and potential migrations to other storage backends.
The module implements the fs.FS and fs.File interfaces. However, it follows a “minimal implementation” philosophy tailored to the needs of the Perf system.
- Read-only access: the client is created with storage.ScopeReadOnly during initialization. This design choice minimizes the security footprint of the service, ensuring it can only consume data and never accidentally modify or delete ingestion files.
- Metadata methods such as Stat() on the file struct are intentionally not implemented and return ErrNotImplemented. This decision was made because the Perf ingestion pipeline focuses on streaming data content rather than inspecting metadata (like timestamps or permissions) provided by os.FileInfo.

Because fs.FS traditionally expects paths relative to a root, but GCS requires both a bucket name and an object path, the module utilizes a URL-based naming convention: gs://<bucket>/<path>.
The parseNameIntoBucketAndPath function decomposes these strings. It is designed to handle the nuances of URL parsing, such as stripping leading slashes from the URL path to convert them into valid GCS object keys.
filesystem (gcs.go): This is the central coordinator that satisfies fs.FS. It holds a long-lived storage.Client, which is authenticated using Google Application Default Credentials.
- Translates Open(name) calls into GCS readers. When Open is called, the filesystem parses the provided string into bucket and object components, then initializes a new storage.Reader using a background context.

file (gcs.go): A thin wrapper around *storage.Reader.
- Bridges storage.Reader (which provides Read and Close) with the fs.File interface.
- By embedding *storage.Reader, the struct automatically inherits the methods required for reading, keeping the implementation concise.

The following diagram illustrates how a request for a file is translated from a standard interface call into a GCS network request:
[ Caller ]
|
| 1. Open("gs://my-bucket/data.json")
v
[ filesystem ]
|
| 2. parseNameIntoBucketAndPath() -> ("my-bucket", "data.json")
| 3. storage.Client.Bucket("my-bucket").Object("data.json").NewReader()
v
[ storage.Reader ] <--- Wrapped in ---> [ file (fs.File) ]
|
| 4. Read() / Close() operations
v
[ Google Cloud Storage API ]
The local module provides an implementation of the standard library's fs.FS interface specifically for the local file system. In the context of the larger Perf system, this module serves as a bridge between high-level file operations and physical storage on disk. By wrapping local file access in the fs.FS interface, it allows other components to remain agnostic about whether they are interacting with local storage, cloud storage, or an in-memory mock.
The implementation focuses on creating a “chrooted” view of the local filesystem. This is achieved by anchoring all file operations to a specific rootDir.
A key design choice is the use of os.DirFS. While os.Open can access any path on the system, os.DirFS restricts access to a specific directory tree. By combining an absolute root path with os.DirFS, the module ensures that callers interact with a controlled environment.
When a file is opened via the Open method:
1. The requested path is resolved relative to rootDir.
2. The resulting relative path is opened via the internal os.DirFS instance.

This approach provides a layer of safety and abstraction: the consumer of the local package can provide paths as they exist on the system, and the module handles the translation necessary to satisfy the fs.FS requirements, which typically expect paths relative to the filesystem root.
The filesystem struct, defined in local.go, is the core of the module. It maintains two primary pieces of state:
- rootDir: The absolute path to the base directory. This is captured during initialization via filepath.Abs to ensure that the base of the filesystem is immutable and clearly defined, even if the process's working directory changes.
- fs: An internal fs.FS instance (specifically a dirFS from the os package). This handles the actual low-level directory traversal and file reading.

Open workflow: The Open method acts as a translation layer. Instead of directly opening a path, it enforces the “local root” logic.
Input Path (name)
       |
       v
[ filepath.Rel ] <--- compares 'name' against 'rootDir'
       |
       +---- Error (if name is outside rootDir)
       |
       v
Relative Path
       |
       v
[ f.fs.Open ] <--- os.DirFS handles actual I/O
       |
       v
   fs.File
This workflow ensures that even if a full system path is passed to Open, the module correctly identifies the segment relative to its configured root, preventing accidental access to files outside the intended scope of the perf storage directory.
The /go/frontend module serves as the central web server and orchestration layer for the Skia Perf application. It is responsible for serving the Web UI, managing user authentication, and coordinating communication between various backend services such as trace stores, regression detection engines, and issue trackers.
The frontend service is designed as a controller-based system that abstracts complex performance data operations into user-facing API endpoints. It acts as the “glue” that binds together the telemetry data (traces), the version control history (Git), and the automated analysis tools (clustering and regression detection).
A key design philosophy of this module is asynchronous data handling. Performance dataframes can be massive, and generating them often exceeds the duration of a standard HTTP request. Consequently, the frontend utilizes a “Start-Status-Result” pattern, allowing the UI to poll for progress while a background worker processes the data.
frontend.go acts as the primary initializer for the service. It performs several critical roles:
- Instantiates the core stores (TraceStore, AlertStore, RegressionStore) based on the provided configuration.
- The getPageContext method serializes the application's state into a window.perf JavaScript object. This ensures that the frontend UI has immediate access to instance-specific settings, feature flags (like FetchAnomaliesFromSql), and environment metadata (like the ImageTag).

The logic is partitioned into specialized API structs (the /api sub-module), each responsible for a functional domain of the application.
The proxy handler (proxy.go): To circumvent Cross-Origin Resource Sharing (CORS) limitations when the browser needs to fetch data from external sources (e.g., googlesource.com), the module includes a specialized proxy handler. It forwards GET requests while carefully stripping security-sensitive headers like Origin and Referer to ensure the request is accepted by the destination server.
The module integrates with the alogin package to provide identity management. It uses a decorator pattern (RoleEnforcedHandler) to wrap sensitive endpoints, ensuring that only users with specific roles (e.g., Admin or Bisecter) can access administrative or resource-intensive functions.
When the frontend starts, it doesn't just wait for requests; it initiates several background synchronization tasks to ensure the data served is fresh.
Startup Sequence
    |
    |-- Load & Validate Config JSON
    |-- Initialize Trace & Metadata Stores
    |-- Start ParamSet Refresher (Periodic refresh of available trace keys)
    |-- Start Continuous Clustering (If enabled, runs background regression detection)
    |-- Initialize Notifiers (Email/Issue Tracker integrations)
    |
    V
Serve HTTP
A common workflow involves navigating from a specific Git commit hash to its representation in the Perf UI. The gotoHandler manages this translation:
- The hash is resolved to a CommitNumber using the perfGit provider.

The module enforces strict validation on startup via the testdata fixtures. This ensures that misconfigurations (such as an empty instance_name or invalid connection strings) result in an immediate failure during deployment rather than subtle runtime errors or UI breakage.
To support CI/CD and staging environments, the system includes logic to override hostnames and strip environment-specific suffixes (like -autopush). This allows staging instances to use production-like configurations without requiring a complete duplication of the networking and authentication infrastructure.
The frontend is built to be “backend agnostic” regarding anomaly storage. It can be configured to fetch data from legacy Chromeperf APIs or the modern SQL-based (Spanner/CockroachDB) implementation. This is managed via the FetchAnomaliesFromSql flag, which determines which implementation of the TriageBackend is injected into the API controllers.
The /go/frontend/api module defines the HTTP interface for the Perf application. It serves as the orchestration layer between the web frontend and various backend services, including trace stores, regression detection engines, issue trackers, and the Chromeperf legacy system.
This module follows a controller-based pattern where specific functional areas (Alerts, Anomalies, Graphs, Triage, etc.) are encapsulated in individual API structs. Each struct implements a RegisterHandlers method to attach its endpoints to a central Chi router.
The design emphasizes:
- Interface-driven backends (Store, IssueTracker), allowing the system to switch between Skia-native implementations and Chromeperf-compatible backends.
- Environment awareness: common.go ensures that requests from non-production environments (staging, autopush) are correctly routed or identified, facilitating a seamless CI/CD flow.

Graph data (graphApi.go, mcpApi.go): Responsible for fetching and formatting performance trace data for visualization.
- graphApi manages the “Start-Status-Results” lifecycle for building dataframes. This allows the UI to poll for progress while the backend processes large volumes of trace data.
- Raw ingested files are exposed through ingestedFS.
- mcpApi provides a specialized endpoint for LLM/Agentic tools to query trace data within specific time ranges and query parameters.

Regressions and anomalies (regressionsApi.go, anomaliesApi.go): Handles the detection, listing, and lifecycle of performance regressions.
- anomaliesApi is designed to support both the legacy Chromeperf backend and the modern Skia-native storage. It uses preferLegacy flags to determine whether to proxy requests to Chromeperf or query the local regStore and subStore.

Triage (triageApi.go, triageBackend.go): Facilitates the workflow of turning detected anomalies into actionable bugs.
- Through the TriageBackend interface, the system can either file bugs directly into the Issue Tracker (Skia-native) or proxy the request to Chromeperf's triage service.
- Triage state is reconciled against the regression.Store and the commit history.

Alert configuration (alertsApi.go, sheriffConfigApi.go): Manages the configuration that drives automated regression detection.
- Alert configurations can be exercised before being saved via the alertBugTryHandler and alertNotifyTryHandler.
- sheriffConfigApi provides metadata and validation endpoints used by LUCI Config to ensure that configuration changes in external repositories are valid before being ingested.

Shortcuts and favorites (shortcutsApi.go, favoritesApi.go): Enhances user experience through personalization and shareability.
- shortcutsApi maps complex trace selections (lists of keys) to short IDs, enabling shareable URLs for specific graph views.

The following process is used for operations like /v1/frame/start:
User Request          API Layer              Progress Tracker        Backend Worker
    |                     |                        |                       |
    |-- POST /start ----->|                        |                       |
    |                     |-- Create Progress ID ->|                       |
    |<-- Return ID -------|                        |                       |
    |                     |-- Start Go Routine ----------------------------|
    |                     |                        |                       |-- Fetch Data --
    |-- GET /status ----->|                        |                       |
    |<-- Progress % ------|<---- Query Status -----|                       |
    |                     |                        |                       |-- Process -----
    |-- GET /results ---->|                        |                       |
    |<-- DataFrame -------|<---- Get Results ------|<--- Mark Done --------|
The API handles triage by coordinating between the UI, the internal regression store, and external trackers:
[UI] ----(EditAnomaliesRequest)----> [triageApi]
                                         |
                                  [TriageBackend]
                                     /       \
                    (Skia Native)   /         \   (Chromeperf Proxy)
                                   /           \
                  [regStore.SetBugID]      [Chromeperf Client]
                          |                       |
                     [DB Update]          [External API POST]
In common.go, the function getOverrideNonProdHost is used to strip suffixes like -autopush or -staging. This was implemented to allow testing environments to interact with production-like service configurations without needing to replicate the entire networking and authentication stack for every environment variant.
anomaliesApi.go includes logic to “clean” test names. This is a defensive implementation against malformed trace IDs that might contain characters incompatible with URL parsing or specific database query engines. It uses a configurable regex (InvalidParamCharRegex) to ensure consistency across different Perf instances (e.g., Fuchsia vs. Skia).
The subscriptionsHandler in alertsApi.go is designed to provide a flat list of all monitoring subscriptions. This is critical for the “Sheriff” view in the frontend, where users need to filter regressions based on their team's ownership rather than individual alert IDs.
The /go/frontend/api/testdata directory serves as a controlled environment for testing the Perf frontend API and its configuration parsing logic. It provides a canonical example of a complete system configuration, allowing developers to verify how the application interprets complex settings without relying on a live production environment.
The primary component of this module is config.json. This file is designed to simulate a realistic application state for integration tests and local development mocks. By centralizing these values, the project ensures that API endpoints—which often behave differently based on the underlying data store or authentication headers—can be tested against a predictable “ground truth.”
A key design choice reflected in this data is the use of a local, file-based environment that mirrors production complexity. For instance, the configuration specifies a cockroachdb datastore type and a local directory for data ingestion. This allows the testing suite to validate the frontend's ability to handle SQL-backed data flows and ingestion triggers in an isolated sandbox.
The data within this module covers several critical functional areas of the Perf frontend:
- Instance identity and authentication: defines the instance name (chrome-perf-test) and how it handles user identity. The auth_config specifies X-WEBAUTH-USER as the source of truth for identity, which is essential for testing authorization middleware and audit logging.
- Regression schema selection: by setting use_regression2_schema: true, the test data forces the application into a specific architectural path for regression detection, facilitating tests for the newer data schema.
- Query UI configuration: defines which parameters (such as arch, config, or bot) are indexed for queries and provides a “Favorites” structure. This part of the configuration is used to verify that the frontend correctly renders navigation links and filters based on the configuration file rather than hardcoded values.

The following diagram illustrates how the configuration data in this module influences the behavior of the API during testing:
+-----------------------+      +-------------------------+      +----------------------+
| /testdata/config.json | ---> | API Configuration Layer | ---> |  Mock/Test Handlers  |
+-----------------------+      +-------------------------+      +----------------------+
          |                              |                               |
          | (Defines Auth)               | (Defines Storage)             | (Defines UI)
          v                              v                               v
[Header: X-WEBAUTH-USER]      [Conn: cockroachdb/demo]         [Favorites & Links]
Developers use this module to:
- Verify that the configuration structs in the frontend/api package align with the expected JSON format.
- Exercise the backend_host_url and git_repo_config values to simulate cross-service communication.
- Use invalid_param_char_regex as a standard for testing input validation across various API endpoints to prevent injection or malformed queries.

The /go/frontend/mock module provides a self-contained, high-fidelity mock server for the Perf application. Unlike simple unit tests or component-level demos, this server renders the actual production HTML templates and JavaScript bundles while simulating the entire backend API.
It is primarily used for:
- Automated testing via the test_on_env Bazel rule.

The mock server uses the real frontend.Frontend logic to load and execute Go HTML templates found in perf/pages/production. It injects a specialized mockContext (defined in frontend_mock_for_demo.go) into these templates. This context mimics the global configuration usually provided by the production server, enabling features like the Pinpoint bisect button or specific chart tooltips that are toggleable via feature flags.
The backend is simulated using a set of hardcoded data structures in frontend_mock_api_impl.go. The implementation focuses on mimicking the behavior of the Perf API rather than just returning static JSON:
- nextParamListHandler simulates the hierarchical filtering of trace keys (e.g., selecting an “arch” narrows down the available “os” values).
- Dataframe requests follow the asynchronous /frame/start and /status/{id} pattern. The server stores a “pending” query in memory and returns a “Finished” status with a mock dataframe when polled.

Test-environment support (test_on_env): The server includes specific logic to support automated testing environments:
- When the ENV_DIR environment variable is detected, the server listens on port :0 to avoid collisions during parallel test execution.

| Component | Responsibility |
|---|---|
| frontend_mock_for_demo.go | Entry point. Configures the chi router, handles static asset serving (JS/CSS/Images), and defines the render helper that injects mock global state into HTML templates. |
| frontend_mock_api_impl.go | Contains the mock logic for all /_/ API endpoints, including alerts, regressions, triage, favorites, and trace data retrieval. |
| BUILD.bazel | Defines the mock_dist_files filegroup, which collects all production UI assets (CSS, JS, Maps, Images) required to make the server functional without an external CDN. |
The following diagram illustrates how the mock server simulates the asynchronous data fetching process used by the frontend to render graphs:
Browser                         Mock Server (frontend_mock_server)
   |                                |
   |-- POST /_/frame/start -------->| 1. Stores Queries in m.currentQueries
   |                                | 2. Returns status: "Running", url: "/_/status/demo-req"
   |<-- JSON {status, url} ---------|
   |                                |
   |-- GET /_/status/demo-req ----->| 3. Retrieves stored queries
   |                                | 4. Filters mockTraceData based on queries
   |<-- JSON {status: "Finished", results: {dataframe}}
   |                                |
   | (Browser renders plot)         |
The server maintains a small amount of in-memory state (protected by sync.Mutex) to track the “current” query, but otherwise acts as a functional simulator of:
- Trace data spanning multiple architectures (arm, x86_64) and OSs (Android, Ubuntu, etc.).
- An authenticated user, user@google.com, with admin and bisecter roles.

The testdata module serves as a centralized repository for configuration fixtures used during the unit testing and integration testing of the Perf frontend. It focuses on simulating various application states and ensuring that the configuration parser and validator logic are robust against malformed or boundary-pushing inputs.
The primary design goal for this module is to provide repeatable, immutable data structures that represent both valid and invalid application states. Rather than programmatically generating complex configuration objects in Go code, these JSON files allow developers to see the exact structure the frontend expects to ingest from environment-specific configuration maps.
Using separate files for specific failure modes (e.g., missing or excessive string lengths) allows the test suite to execute table-driven tests that specifically target the validation logic within the frontend initialization phase. This approach ensures that the application fails gracefully with descriptive errors when provided with an invalid configuration, rather than encountering runtime panics.
Standard Configuration Fixture The config.json file represents a complete, valid configuration. It defines the operational environment for the frontend, including:
Validation Boundary Fixtures The remaining files in this module are designed specifically to test the constraints of the instance_name field. This field is a critical identifier used in telemetry, logging, and external service integration.
- config_empty_instance_name.json and config_no_instance_name.json are used to verify that the system correctly identifies missing required fields or prevents the use of empty strings where a unique identifier is expected.
- config_long_instance_name.json provides a value that exceeds standard character limits (typically 64 characters). This is used to test that the frontend validation logic prevents data that might be rejected by downstream cloud services or cause UI layout breakage.

The following diagram illustrates how these files are utilized during the application lifecycle testing:
+-----------------+ +-----------------------+
| Test Runner | ----> | Load JSON fixture |
| (Frontend Unit) | | from /testdata/ |
+-----------------+ +-----------+-----------+
|
v
+-----------------+ +-----------------------+
| Assert Expected | <---- | Execute Unmarshal and |
| Error/Success | | Validation Logic |
+-----------------+ +-----------------------+
By checking against these predefined files, the frontend ensures that changes to the configuration structures in the Go source code do not silently break compatibility with existing configuration formats used in production environments.
fuchsia_to_skia_perf is a command-line utility designed to bridge the gap between Fuchsia's performance testing infrastructure and the Skia Performance monitoring system. It transforms performance data from Fuchsia's native JSON format into a schema that the Skia Perf ingestion pipeline can parse and visualize.
The tool operates as a specialized data ETL (Extract, Transform, Load) pipeline. It extracts raw metrics from Fuchsia build artifacts, transforms them by calculating statistical aggregates and normalizing units, and loads them into a destination suitable for Skia Perf—either a local directory or a Google Cloud Storage (GCS) bucket.
The primary goal of this tool is to ensure that performance regressions in Fuchsia can be tracked using Skia's visualization tools, which require specific metadata (like “improvement direction”) and a flat, trace-based data structure.
Fuchsia test results often bundle multiple test suites into a single large JSON file. However, Skia Perf is optimized for partitioning data based on specific benchmarks. To align these two systems, the converter splits a single Fuchsia input record into multiple Skia Perf JSON files, keyed by the test suite name. This allows Skia to treat each suite as a distinct entity, improving query performance and visualization clarity.
Rather than just reporting raw values, the converter automatically generates two distinct types of entries for every metric:
- A base entry: carries the raw values for the metric.
- An average entry (_avg): a specialized entry focusing on the mean and error (standard deviation).

This dual-entry approach is a deliberate choice to support different visualization needs in Skia Perf: the _avg series provides a clean trend line for dashboards, while the base entry allows for deep-dive analysis into the variance and distribution of test results.
Fuchsia and Skia use different conventions for units and “improvement direction” (i.e., whether a higher or lower number is better). The module implements a mapping logic that:
- Normalizes unit aliases (e.g., nanoseconds, ns) to a canonical format (e.g., ms) and scales values accordingly (e.g., dividing by 1,000,000).
- Assigns a smallerIsBetter or biggerIsBetter flag based on the unit. It supports overrides within the input data, allowing developers to explicitly mark a metric (e.g., ms_biggerIsBetter) if the default assumption is incorrect.

The converter (/convert/lib.go): This is the core logic of the module. It handles the lifecycle of a conversion run:
- Validates build metadata such as build_id and commit_id.
- Maps the grouped results into the SkiaPerfResult schema.

Processing Workflow:

[Fuchsia JSON] -> [Unmarshal] -> [Group by Benchmark]
                                        |
                                        v
[Calculate Stats] <--- [Map Units/Direction] <--- [Scale Values]
        |
        v
[Construct SkiaPerfResult] -> [Write Local File] -> [Upload to GCS (Optional)]
Types (/convert/types.go): This component defines the structural contract between the two systems.
- FuchsiaPerfResults: Models the input, focusing on build metadata and raw measurement arrays.
- SkiaPerfResult: Models the output, which includes a Key map (defining the trace's identity—bot, master, benchmark) and the calculated result items.

The CLI wrapper (main.go) handles environment-specific configuration. It manages:
- Uploading output to GCS under the ingest/YYYY/MM/DD/ structure to facilitate efficient discovery by the Skia Perf ingester.

The tool generates output filenames using a specific pattern: <build_id>-<benchmark>-<bot>-<master>.json. This naming convention ensures uniqueness and provides enough context for administrators to manually inspect the ingestion bucket if necessary.
When calculating the “error” metric for the average results, the tool utilizes a sample standard deviation. This provides a statistically sound representation of variance, which Skia Perf uses to render error bars in its UI.
The convert module provides the logic for transforming performance test results from the Fuchsia JSON format into the Skia Perf format. This conversion allows performance data generated by Fuchsia builders to be ingested and visualized by Skia's performance monitoring tools.
The module functions as a data pipeline that reads a specific Fuchsia performance schema, normalizes units and improvement directions, calculates statistical aggregates, and outputs files compatible with Skia Perf's ingestion requirements.
- Unit normalization: maps Fuchsia unit aliases (e.g., nanoseconds, ns, milliseconds) to a canonical set of Skia units (e.g., ms). It also handles value scaling, such as converting nanoseconds to milliseconds or bytes to MiB.
- Statistical aggregation: every metric additionally produces an _avg item that specifically focuses on the mean and error (standard deviation), which is often the primary metric for visualization.
- Improvement direction: assigns a default per unit (e.g., ms defaults to smallerIsBetter) but allows the input data to override this via a suffix (e.g., ms_biggerIsBetter).

The processing engine (lib.go): This file contains the core processing logic. The conversion process follows this workflow:
Fuchsia JSON Input
        |
        v
Unmarshal & Validate (Check BuildID, CommitID, etc.)
        |
        v
Group results by Test Suite (Benchmark)
        |
        +------> Calculate Stats (Min, Max, Avg, StdDev)
        |
        +------> Map Units & Improvement Direction
        |
        v
Construct SkiaPerfResult Object
        |
        +------> Write to Local Disk (if configured)
        |
        +------> Upload to GCS (if client provided)
- Run(cfg Config): The entry point that orchestrates the file reading, grouping, and output generation.
- PopulateResults(perfResults): Maps the raw measurements into the structured SkiaResultItem format, applying the dual-item (base + average) generation strategy.
- MapUnitAndDirection(input): Resolves the final string used by Skia to determine both the unit type and the visualization polarity (up/down).
- CalculateStats(results): Performs numerical analysis and unit conversion (e.g., ns to ms). It uses a sample standard deviation calculation for the error metric.

The type definitions (types.go) describe the structure of both the input and output formats.
- FuchsiaPerfResults: Represents the input schema, which is a list of build records, each containing metadata like build_id, builder, and commit_id, alongside an array of performance measurements.
- SkiaPerfResult: Represents the output schema. It includes a Key map (defining the trace identity) and a Results array containing the actual measurements and their associated metadata.

The conversion process is controlled via the Config struct, which specifies:
- The GCS destination layout (ingest/YYYY/MM/DD/).

The perf/go/git module provides a high-level abstraction and persistence layer for Git repository data within the Perf system. Its primary responsibility is to bridge the gap between non-linear Git history (identified by hashes) and Perf's internal requirement for a linear, integer-based timeline (identified by CommitNumber).
In the Perf ecosystem, performance data is plotted against a continuous x-axis. Because Git hashes are non-deterministic and non-sequential, this module maps every relevant Git commit to a monotonically increasing CommitNumber.
The module performs three core functions:
- Polling the upstream repository (via the provider abstraction) to find new commits.

A fundamental design choice in Perf is the use of CommitNumber as the primary coordinate for data. This module supports two ways of determining this number:
- Monotonic assignment: each newly discovered commit simply receives the next integer in sequence.
- Footer extraction: if the repository embeds a commit position in its commit messages (e.g., Cr-Commit-Position: refs/heads/master@{#727989}), the module can be configured with a regex to extract this number directly. This ensures that the CommitNumber in Perf matches the official project revision.

To minimize database load, the implementation utilizes an LRU (Least Recently Used) cache for commit details. Given that a typical commit entry is approximately 400 bytes, the cache is capped at 25,000 entries (roughly 10MB). This significantly speeds up the rendering of dashboards and alerts where the same recent commits are requested frequently.
The module follows a “sync-and-cache” pattern. It does not query the Git provider (Gitiles or local CLI) for every user request. Instead, it runs a background goroutine that pulls updates into the local SQL database. This ensures that even if the Git backend is temporarily slow or unavailable, the Perf UI remains responsive using the cached metadata.
interface.go: Defines the Git interface, which is the contract for all Git-related operations in Perf. This includes methods for range lookups (CommitSliceFromTimeRange), history traversal (PreviousGitHashFromCommitNumber), and file-specific auditing (CommitNumbersWhenFileChangesInCommitNumberRange).
impl.go: The primary implementation of the Git interface. It manages the lifecycle of the background updater and contains the SQL logic for both PostgreSQL and Spanner.
- The Update method identifies the delta between the most recent hash in the database and the current HEAD of the repository, then streams and inserts the missing commits.
- Inserts rely on ON CONFLICT DO NOTHING clauses to ensure that multiple service instances or rapid update cycles do not result in duplicate entries.

gittest/: A specialized test harness that bootstraps a complete environment for integration testing. It creates a real Git repository with deterministic timestamps, initializes a test database, and provides a pre-populated set of hashes. This ensures that logic involving time-to-hash mapping can be tested without flakiness.
The following diagram illustrates how the module synchronizes the database with the remote repository:
[ Background Poller ]        [ SQL Database ]        [ Git Provider ]
          |                         |                        |
          |--- 1. Get Most Recent ->|                        |
          |    Commit from DB       |                        |
          |<---- (Hash, Number) ----|                        |
          |                         |                        |
          |--- 2. Fetch New Commits ------------------------>|
          |    since <Hash>         |                        |
          |<------------------------ (Stream of Commit Objs) |
          |                         |                        |
          |--- 3. Extract Number    |                        |
          |    (Regex or Incr)      |                        |
          |                         |                        |
          |--- 4. INSERT INTO ----->|                        |
          |    Commits              |                        |
          |    (Hash, Meta, No.)    |                        |
- provider/: Defines the low-level interface for fetching raw Git data.
- providers/: A factory module that selects between git_checkout (local CLI) and gitiles (REST API) based on the instance configuration.
- schema/: Defines the database table structure used to persist commit metadata.
- mocks/: Provides autogenerated mocks for unit testing components that depend on the Git interface.

The gittest module provides a high-level test harness for the Perf system's Git integration. It is designed to bootstrap a realistic environment for integration tests, bridging the gap between raw Git repositories and the Perf service's data structures.
The primary goal of this module is to abstract away the repetitive setup required to test Git-based performance monitoring. Testing the Perf Git logic requires a complex state: a real Git repository with known history, a test database with matching metadata, and a configured Git provider linking the two.
By providing a single constructor (NewForTest), this module ensures that tests across the Perf codebase use a consistent dataset, making it easier to verify algorithms that traverse history or map timestamps to commit hashes.
The core of the module is the NewForTest function. It manages the orchestration of several distinct subsystems:
- Uses testutils.GitBuilder to initialize a temporary Git repository. It populates this repository with a predefined sequence of commits starting at StartTime (Unix 1680000000), spaced exactly one minute apart. This predictability allows test authors to write assertions based on relative time offsets.
- Creates a test database via sqltest.NewSpannerDBForTests. This provides the persistence layer where Perf stores metadata associated with Git commits.
- Constructs a git_checkout.Provider. This is the component responsible for actually interacting with the Git binary and the local filesystem, ensuring that the test environment behaves identically to a production deployment.
- Registers t.Cleanup to ensure that temporary directories, database connections, and background processes (like the Git builder) are torn down immediately after a test completes, preventing resource leaks in the test runner.

When NewForTest is called, the following process occurs:
[ GitBuilder ] --(Creates Repo)--> [ Local .git Dir ]
| |
| (Commits files at 1min intervals) |
v v
[ Commit Hashes ] <----------- [ git_checkout.Provider ]
| |
| (Configuration Links Them) |
v v
[ InstanceConfig ] <---------- [ Spanner DB Instance ]
- The builder creates a series of commits, each adding files (e.g., foo.txt, bar.txt).
- An InstanceConfig is generated, pointing the GitRepoConfig.URL to the GitBuilder's directory and setting a temporary path for the local checkout.
- The git_checkout provider is initialized, which effectively “clones” the builder's repo into the temporary directory.

The returned hashes slice is critical for testing. Since Git commit hashes are non-deterministic (based on authorship and exact time of creation), the module returns the generated hashes in chronological order. Tests use these hashes to verify that the Git provider correctly identifies the “revision” associated with specific performance data points.
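The deterministic timeline makes time-based assertions easy to write. Assuming the documented StartTime of Unix 1680000000 and one-minute spacing, a test can compute the expected timestamp of the i-th commit; commitTime below is a hypothetical helper, not part of the gittest API:

```go
package main

import (
	"fmt"
	"time"
)

// StartTime matches the fixed epoch described above (Unix 1680000000).
var StartTime = time.Unix(1680000000, 0).UTC()

// commitTime returns the deterministic timestamp of the i-th test commit,
// given the one-minute spacing used by the harness.
func commitTime(i int) time.Time {
	return StartTime.Add(time.Duration(i) * time.Minute)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(commitTime(i).Format(time.RFC3339))
	}
}
```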
The go.skia.org/infra/perf/go/git/mocks module provides a mock implementation of the Git interface used within the Perf system. This module is essential for unit testing components that interact with Git history, commit data, and repository metadata without requiring a live Git repository or network access to a Git provider.
The primary goal of this module is to enable predictable, isolated testing of Perf's business logic. In the Perf system, the Git interface (defined in perf/go/git/provider) acts as the bridge between performance data and the source code history. Many operations—such as calculating regression ranges, mapping timestamps to commits, or identifying when specific files changed—rely on this interface.
By using these mocks, developers can test business logic in isolation, scripting predictable commit data without any network access or on-disk repository.
git.Git Struct: The core of this module is the Git struct found in Git.go. It is an autogenerated mock produced by mockery, utilizing the testify/mock framework. It implements every method required by the Perf system's Git provider interface, including:
- Translating between types.CommitNumber (Perf's internal sequential index), Git hashes, and timestamps (e.g., CommitNumberFromGitHash, CommitNumberFromTime).
- Retrieving commit details, individually or in ranges (CommitFromCommitNumber, CommitSliceFromTimeRange).
- Auditing file changes across history (CommitNumbersWhenFileChangesInCommitNumberRange).
- Managing the update lifecycle (Update, StartBackgroundPolling).

When writing a test, you instantiate the mock using NewGit, which automatically registers cleanup functions to verify that all expected calls were made before the test finishes.
Test Setup Phase
----------------
1. Call mocks.NewGit(t)
2. Define expectations using .On(...).Return(...)

Execution Phase
---------------
3. Pass the mock into the component being tested
4. Component calls Git methods (e.g., GitHashFromCommitNumber)
5. Mock returns the pre-defined values

Verification Phase
------------------
6. Test finishes
7. Cleanup function calls AssertExpectations
The mock is tightly coupled with:
- go.skia.org/infra/perf/go/types: For internal Perf types like CommitNumber.
- go.skia.org/infra/perf/go/git/provider: For the Commit data structure and the interface definition.
- github.com/stretchr/testify/mock: For the underlying mock engine.

Because the code is generated, the logic within Git.go focuses on checking types and returning values provided during the “Setup” phase of a test. If a method is called that wasn't expected, or if a return value wasn't specified for a called method, the mock will trigger a panic to alert the developer of an incomplete test configuration.
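To illustrate the pattern without the generated testify code, here is a hand-rolled stand-in for one method of the mocked interface. CommitNumber, gitHashSource, and fakeGit are simplifications invented for this sketch; the generated mock panics on unexpected calls, whereas this version returns an error to stay self-contained:

```go
package main

import (
	"errors"
	"fmt"
)

// CommitNumber mirrors Perf's sequential commit index type.
type CommitNumber int32

// gitHashSource is a pared-down slice of the Git interface, just enough
// to show the mocking pattern; the real interface has many more methods.
type gitHashSource interface {
	CommitNumberFromGitHash(hash string) (CommitNumber, error)
}

// fakeGit pre-registers expected calls, like .On(...).Return(...) does.
type fakeGit struct {
	byHash map[string]CommitNumber
}

func (f *fakeGit) CommitNumberFromGitHash(hash string) (CommitNumber, error) {
	n, ok := f.byHash[hash]
	if !ok {
		return 0, errors.New("unexpected call: " + hash)
	}
	return n, nil
}

func main() {
	var g gitHashSource = &fakeGit{byHash: map[string]CommitNumber{"a1b2c": 5}}
	n, err := g.CommitNumberFromGitHash("a1b2c")
	fmt.Println(n, err) // 5 <nil>
}
```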
The provider module establishes a uniform abstraction layer for interacting with Git repositories within the Skia infrastructure. Rather than forcing downstream consumers to handle the specifics of local Git CLI operations versus remote Gitiles API calls, this module defines a common interface and data structure for retrieving commit history, metadata, and file-specific changes.
By decoupling the data source from the data consumption, the module allows Perf and other services to remain agnostic about how repository data is physically fetched or stored.
The primary design goal is to provide a consistent view of repository history that is optimized for consumption by performance monitoring systems.
- The CommitProcessor pattern used in CommitsFromMostRecentGitHashToHead is designed for efficiency when processing large ranges of history. Instead of loading thousands of commits into memory simultaneously, the provider streams commits to the caller. This minimizes memory overhead during initial repository indexing or catch-up tasks.
- The Provider interface is intentionally minimal. It assumes that the underlying implementation (whether it's a local git checkout or a network-based service like Gitiles) handles the complexities of authentication, caching, and network protocols.
- The Commit struct serves as a bridge between Git's raw output and the internal database schema. It includes CommitNumber (a monotonic offset used for indexing in Perf) and utilizes JSON annotations that maintain compatibility with legacy CommitDetail structures.
- Within the Commit struct, the Body field is kept for parsing metadata (such as extracting specific commit numbers or Gerrit footers), but is explicitly noted as not being intended for database storage. This saves significant storage space while still providing the necessary context during the ingestion phase.

The Provider interface, defined in provider.go, outlines the essential operations required to sync and query repository state:
- Update ensures the local view of the repository is current.
- CommitsFromMostRecentGitHashToHead enables “incremental” ingestion. By passing the last known Git hash, the provider can determine the delta between the database and the current HEAD, processing only new commits in chronological order.
- GitHashesInRangeForFile allows the system to filter noise by identifying exactly when specific configuration or data files were modified within a range, rather than scanning every commit in the repository.

The Commit struct represents the canonical version of a Git commit within the Skia ecosystem. It includes helper methods for human-readable output:
- Display: Generates a standardized short-form string (e.g., 7abc123 - 2 days ago - Fix memory leak) used in UI logs and CLI outputs.
- HumanTime: Leverages the go/human package to convert Unix timestamps into relative durations, providing a more intuitive sense of “when” a change occurred compared to raw epoch values.

The typical interaction between a consumer and the provider follows a “sync-and-stream” pattern to keep internal databases up to date with the remote repository.
[ Consumer ]                [ Provider ]                [ Git Backend ]
     |                           |                            |
     |--- 1. Update() ---------->|                            |
     |                           |--- 2. git pull / API ----->|
     |                           |<-- 3. New Commits ---------|
     |                           |                            |
     |--- 4. CommitsFrom(lastHash, callback) --->|            |
     |                           |                            |
     |                           |--- 5. Parse Commits        |
     |                           |                            |
     |<-- 6. Invoke callback(Commit) [Repeated] -|            |
     |                           |                            |
     |--- 7. Store in DB         |                            |
This workflow ensures that the consumer only processes what is necessary and that the logic for “what has changed” remains encapsulated within the provider implementation.
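The callback-driven streaming described above can be sketched as follows. Commit, CommitProcessor, and streamSince are simplified stand-ins for the provider package's types, invented for this example:

```go
package main

import "fmt"

// Commit is a trimmed version of the provider's commit record.
type Commit struct {
	GitHash string
	Subject string
}

// CommitProcessor mirrors the callback style used by
// CommitsFromMostRecentGitHashToHead: commits are streamed one at a
// time instead of being accumulated in a slice.
type CommitProcessor func(c Commit) error

// streamSince is a toy provider: it replays commits newer than lastHash
// to the callback in chronological order.
func streamSince(history []Commit, lastHash string, cb CommitProcessor) error {
	start := 0
	for i, c := range history {
		if c.GitHash == lastHash {
			start = i + 1
			break
		}
	}
	for _, c := range history[start:] {
		if err := cb(c); err != nil {
			return err // a callback error aborts the stream
		}
	}
	return nil
}

func main() {
	history := []Commit{{"aaa", "first"}, {"bbb", "second"}, {"ccc", "third"}}
	var stored []string
	_ = streamSince(history, "aaa", func(c Commit) error {
		stored = append(stored, c.GitHash) // stand-in for the DB insert
		return nil
	})
	fmt.Println(stored) // [bbb ccc]
}
```

Because each commit is handed to the callback and then discarded, memory use stays constant regardless of how far behind the consumer is.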
The providers module serves as a factory and abstraction layer for obtaining Git data in the Perf system. Its primary purpose is to instantiate a provider.Provider based on the system's configuration.
By abstracting the source of Git information, the rest of the Perf application can remain agnostic of whether it is interacting with a local disk-based repository or a remote web-based Gitiles instance. This flexibility allows Perf to scale across different infrastructure environments—from high-performance local setups to cloud-native deployments where persistent disk management is undesirable.
The module implements a single factory function, New, which encapsulates the logic for selecting and initializing the appropriate Git provider. This centralizes the dependency management for Git access, ensuring that the calling code does not need to know about authentication scopes, HTTP clients, or local filesystem paths.
The choice of provider is driven by the GitRepoConfig.Provider setting in the InstanceConfig:
- CLI (git_checkout): Selected if the provider is explicitly set to CLI or left empty (the default). This uses a local Git binary and a directory on disk.
- Gitiles (gitiles): Selected when the provider is set to Gitiles. This bypasses the local disk and interacts with repositories via the Gitiles Web API.

A key responsibility of this module is preparing the necessary credentials for remote communication.
- For Gitiles, the factory initializes a google.DefaultTokenSource with the auth.ScopeGerrit scope. It then wraps this in a standard httputils client before passing it to the Gitiles implementation. This ensures that the provider is ready to perform authenticated API calls immediately upon creation.
- For local checkouts, authentication is handled within the git_checkout module (via gitauth and cookie files), but the factory ensures the correct environment configuration is passed down.

builder.go: This is the entry point for the module. It manages the imports for all supported backend implementations (git_checkout and gitiles). It acts as the “glue” that translates configuration strings into functional Go objects.
provider.Provider Interface: While defined in an external package (//perf/go/git/provider), this interface is the “contract” that this module fulfills. Any provider returned by the factory is guaranteed to support the same syncing, streaming, and history-query operations, regardless of backend.
The following diagram illustrates how the New function determines which implementation to return:
[ InstanceConfig ]
        |
        v
Check GitRepoConfig.Provider
        |
        +--- [ empty ] or "CLI" ----> [ Initialize git_checkout ]
        |                                       |
        |                                       v
        |                          Check/Clone local directory
        |                                       |
        |                                       v
        |                            Return git_checkout.Impl
        |
        +--- "Gitiles" -------------> [ Initialize Gitiles ]
        |                                       |
        |                             Create OAuth2 Token
        |                                       |
        |                             Setup HTTP Client
        |                                       |
        |                                       v
        |                             Return gitiles.Gitiles
        |
        +--- [ Other ] -------------> Return Error (Invalid Type)
The git_checkout module provides a Git repository provider for the Perf system. It implements the provider.Provider interface by wrapping a local Git checkout and executing git commands via system calls. This module is designed for environments where a persistent, on-disk Git clone is preferred for performance or local tool integration, allowing Perf to synchronize with a remote repository and query its history.
The primary design choice is “shelling out” to the system's git executable rather than using a pure Go Git implementation. This ensures full compatibility with all Git features, including complex authentication schemes (like Gerrit/git-cookie) and standard performance optimizations that the native Git binary provides.
The module manages a local directory (specified in the InstanceConfig).
- If the configured directory does not already contain a checkout, the provider performs a git clone.
- The Update method performs a git pull to bring the local checkout up to date with the remote tracking branch.
- History processing respects the startCommit configuration. This acts as a logical “horizon”; the provider can be configured to ignore history preceding this commit, which is useful for large repositories where only recent history is relevant to performance tracking.

The module integrates with Google Cloud's google.DefaultTokenSource to handle Gerrit authentication. When enabled, it uses gitauth to manage a /tmp/git-cookie file, ensuring that the background git processes have the necessary credentials to interact with protected remote repositories.
To avoid loading large amounts of git history into memory at once, the module uses a streaming parser (parseGitRevLogStream). It pipes the output of git rev-list directly into a scanner that processes commits one-by-one, invoking a callback for each. This design allows the system to process thousands of commits with a constant memory footprint.
Impl Struct: The core implementation of the provider.Provider interface. It maintains the absolute path to the Git executable and the repository's location on disk.
CommitsFromMostRecentGitHashToHead: Retrieves new commits since a given hash. It utilizes git rev-list with a range (e.g., hash..HEAD) and specific formatting flags to extract the author, subject, and Unix timestamp.
- If no starting hash is available, the scan begins at the configured startCommit.
- Otherwise, it processes everything from the given hash up to HEAD.

GitHashesInRangeForFile: Finds all commits within a range that modified a specific file. This is crucial for Perf's “blame” or “trace” features where changes to specific configuration or data files need to be tracked. It translates the request into a git log --format=%H -- <filename> command.
LogEntry: Provides the full, human-readable commit message and metadata for a specific hash using git show -s. This is typically used for UI display when a user inspects a specific point on a performance trace.
The following diagram illustrates the workflow when New() is called:
Config -> [ Auth Check ] -> ( Gerrit Auth via gitauth/git-cookie )
              |
              v
      [ Find Git Binary ] -> ( Resolve Absolute Path )
              |
              v
      [ Repo Check ] ------> ( Directory Exists? )
              |                      |
              | No                   | Yes
              v                      v
      ( Run git clone )     ( Use existing dir )
              |                      |
              +----------+----------+
                         |
                         v
                   Return Impl{}
The workflow for identifying new work typically follows this pattern:
Caller -> Update() -> [ git pull ]
   |
   +-> CommitsFromMostRecentGitHashToHead(last_hash)
          |
          +-> [ git rev-list last_hash..HEAD --pretty ]
                 |
                 +-> ( Stdout Pipe )
                        |
                        v
              [ parseGitRevLogStream ] -> ( Callback for each Commit )
The gitiles module provides an implementation of the provider.Provider interface that interacts with Git repositories via the Gitiles Web API. In the context of the Perf system, this module is responsible for discovering new commits, retrieving commit metadata, and filtering history for specific files without requiring a local checkout of the repository.
By using Gitiles, the system can operate in environments where local disk space is at a premium or where maintaining a constantly updated local git clone is inefficient. It acts as a bridge between the high-level performance tracking logic and the remote version control system.
Gitiles StructThe core of the module is the Gitiles struct, which encapsulates the logic for communicating with a remote Gitiles instance. It stores configuration such as the repository URL, the target branch, and an optional starting commit to limit the scope of history processing.
The primary responsibility of this module is to stream commits from a known point up to the current HEAD. This is handled by CommitsFromMostRecentGitHashToHead.
- Batching: It uses LogFnBatch to fetch commits in batches (defaulting to 100). This reduces round-trip overhead while maintaining a manageable memory footprint.
- Ordering: It requests logs with the gitiles.LogReverse() option. This ensures that commits are processed in chronological order (oldest to newest), which is critical for the Perf system to build its internal representation of history linearly.
- Branch handling: For the main branch, it queries HEAD. For other branches, it uses the fully qualified branch name and a starting commit offset to ensure it tracks the correct line of development.

The GitHashesInRangeForFile method allows the system to query history for specific paths. This is used when the system needs to determine which commits actually modified a specific configuration file or test suite, allowing it to skip irrelevant commits during analysis.
The LogEntry method provides a standardized way to retrieve a formatted string containing commit details (Author, Date, Subject, Body). This is used for displaying commit information in the Perf UI.
When the Perf system needs to update its view of the world, it triggers a discovery process through this provider:
[Perf System] -> Call CommitsFromMostRecentGitHashToHead(last_known_hash)
|
v
[Gitiles Provider] -> Determine branch expression (e.g., "refs/heads/main")
|
v
[Gitiles Provider] -> Request batch of commits from Gitiles API (Reversed)
|
+--< [Batch Received]
| |
| v
| [CommitProcessor Callback] -> (Perf system stores/indexes commit)
| |
+----------+ (Repeat until HEAD reached)
|
v
[Perf System] -> Update complete
- The struct asserts its conformance to the provider.Provider interface at compile time: var _ provider.Provider = (*Gitiles)(nil).
- The Update method is a no-op in this provider. Unlike local git providers that need to perform a git fetch to update local state, the Gitiles provider is naturally up-to-date as it queries the remote API directly for every request.
- The module relies on go.skia.org/infra/go/skerr to wrap errors from the Gitiles client, providing context on whether a failure occurred during batch loading, log retrieval, or callback processing.

The schema module defines the foundational data structures and database schema for tracking Git commits within the Perf system. This module serves as the “source of truth” for how commit metadata is persisted and mapped between the relational database and Go types.
A core requirement of the Perf system is the ability to map linear Git history to a continuous range of integers, referred to as CommitNumber. While Git natively identifies commits via non-linear hashes, the Perf database requires a strictly increasing integer key to efficiently handle time-series data, range queries, and regressions.
The Commit struct acts as the bridge between these two worlds. It pairs the immutable Git metadata (hash, author, timestamp) with a monotonically increasing CommitNumber that defines the commit's position in the Perf system's timeline.
The Commit struct is designed to be used directly with an SQL-based ORM or schema generator. Its fields are chosen to satisfy the requirements of both the UI (showing author and subject) and the backend analytical engines (filtering by time or commit range).
- The primary key of the table is the CommitNumber.

The schema facilitates a workflow where raw Git data is ingested and “indexed” into the Perf database:
Git Repository          Ingestion Process                Perf Database (Schema)
+------------+       +------------------------+       +---------------------------+
| SHA: a1b2c | ----> | 1. Assign CommitNumber | ----> | PK: CommitNumber (e.g. 5) |
| Author: .. |       | 2. Extract Metadata    |       | Hash: a1b2c               |
| Subject:.. |       | 3. Insert into DB      |       | Timestamp: 1672531200     |
+------------+       +------------------------+       +---------------------------+
The struct includes JSON annotations designed to maintain serialization parity with the legacy cid.CommitDetail types. This implementation choice allows the backend to transition to the new schema-driven database approach without breaking existing frontend consumers that expect a specific JSON shape when requesting commit details.
The graphsshortcut module provides the core data structures and interfaces for managing graph shortcuts within the Perf system. A “shortcut” is a persistent snapshot of a user's dashboard configuration—including multiple graph definitions, queries, and formulas—represented by a unique, content-addressed ID.
This module acts as the domain layer for the “permalink” and “multigraph” features, allowing complex visualizations to be shared via short URLs. By decoupling the shortcut definition from the specific storage implementation, it enables the system to support diverse backends like SQL databases for production and in-memory caches for local development.
The module implements a deterministic ID generation strategy in the GetID() method. The ID is an MD5 hash derived from the contents of the GraphsShortcut object.
- The hash is computed after sorting the Queries and Formulas within each GraphConfig. This ensures that two shortcuts representing the same data, but created with different UI selection orders, remain functionally identical and receive the same ID.
- The order of the Graphs array itself is preserved in the hash. This is a deliberate choice because the sequence of graphs on a dashboard is part of the user's intended layout.

The module defines a Store interface rather than a concrete implementation. This allows the core logic of Perf to remain agnostic of the underlying database technology. It provides a contract for two primary operations:
- InsertShortcut: Storing a configuration and returning its unique ID.
- GetShortcut: Retrieving a configuration based on its ID.

graphsshortcut.go:
- GraphConfig: Represents the parameters for a single visualization. It bundles Queries (trace filters), Formulas (mathematical transformations), and Keys (specific trace identifiers).
- GraphsShortcut: A container for one or more GraphConfig objects. This allows a single shortcut to represent an entire dashboard of multiple graphs rather than just a single trace.

The Store interface serves as the gateway to persistence. Implementations of this interface (found in the sibling graphsshortcutstore module) handle the complexities of JSON serialization and database interactions. By keeping the interface in this base module, the system avoids circular dependencies between the storage logic and the domain objects.
The following diagram illustrates how the module ensures that shortcut IDs are generated consistently, regardless of how the user interacted with the UI to create the queries.
[ User Input ]               [ GraphsShortcut.GetID() ]          [ Result ]
      |                                 |                            |
      | 1. Create Shortcut              |                            |
      |    Graph A:                     |                            |
      |      - arch=arm                 |                            |
      |      - config=8888              |                            |
      +-------------------------------->|                            |
      |                                 | 2. Sort Queries/Formulas   |
      |                                 |    (arch, config)          |
      |                                 |------------------+         |
      |                                 |                  |         |
      |                                 |<-----------------+         |
      |                                 |                            |
      |                                 | 3. MD5 Hash Content        |
      |                                 |------------------+         |
      |                                 |                  |         |
      |                                 |<-----------------+         |
      |                                 |                            |
      |                                 | 4. Return Hex String       |
      |                                 +--------------------------->| "c21e3c..."
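The key property of this ID scheme, sort inside each graph but preserve graph order, can be sketched as follows. The struct fields and the exact bytes fed to the hash are simplified relative to the real GetID:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// GraphConfig and GraphsShortcut mirror the shapes described above;
// only the fields relevant to ID generation are included.
type GraphConfig struct {
	Queries  []string
	Formulas []string
}

type GraphsShortcut struct {
	Graphs []GraphConfig
}

// GetID sketches the content-addressed scheme: queries and formulas are
// sorted inside each graph (UI selection order is irrelevant), but the
// order of the graphs themselves is preserved (layout is meaningful).
func (s GraphsShortcut) GetID() string {
	h := md5.New()
	for _, g := range s.Graphs {
		q := append([]string(nil), g.Queries...)
		f := append([]string(nil), g.Formulas...)
		sort.Strings(q)
		sort.Strings(f)
		fmt.Fprintln(h, strings.Join(q, ","), strings.Join(f, ","))
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	a := GraphsShortcut{Graphs: []GraphConfig{{Queries: []string{"arch=arm", "config=8888"}}}}
	b := GraphsShortcut{Graphs: []GraphConfig{{Queries: []string{"config=8888", "arch=arm"}}}}
	fmt.Println(a.GetID() == b.GetID()) // true: selection order does not matter
}
```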
The graphsshortcutstore module provides implementations for persisting and retrieving graph shortcuts in the Perf system. A graph shortcut is essentially a saved state of multiple graphs—such as trace filters, queries, and display settings—that can be referenced via a unique ID.
This module fulfills a critical role in the “permalink” and “multigraph” features of Perf, allowing users to share or revisit complex dashboard configurations without encoding the entire state into a URL.
The module is designed around the graphsshortcut.Store interface, with two distinct implementations tailored for different environment constraints:
- The cache-backed implementation, cacheGraphsShortcutStore, uses an in-memory or distributed cache rather than a database. This was specifically designed to solve the “breakglass” problem: when developers connect a local instance to a production database for debugging, they often lack write permissions to the production SQL tables. By routing shortcut writes to a local cache, developers can still use features like “multigraph” and shortcut generation without needing elevated database privileges.
- Both implementations serialize the GraphsShortcut struct into JSON before storage. This avoids the need for a complex relational schema for the graph configurations themselves, which are frequently subject to UI-driven changes. By storing them as blobs, the store remains agnostic to the internal structure of the graph data.

Located in graphsshortcutstore.go, this is the primary SQL-backed implementation. It manages the lifecycle of shortcuts using two main operations:
- Writes use INSERT ... ON CONFLICT (id) DO NOTHING. The “Do Nothing” strategy is used because shortcut IDs are typically derived from the hash of their content; if the ID already exists, the content is identical, and no update is necessary.

Located in cachegraphsshortcutstore.go, this implementation wraps a cache.Cache interface. It mirrors the logic of the SQL store—serializing data to JSON—but directs the output to a cache. This is the preferred implementation for local development environments.
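A minimal in-memory sketch of the store contract, mirroring the JSON-serialization approach of both implementations. memStore and its string-keyed map are illustrative stand-ins for cache.Cache, and the context parameters of the real interface are omitted:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// GraphsShortcut stands in for the real struct; any JSON-serializable
// shape works, since the store treats the value as an opaque blob.
type GraphsShortcut struct {
	Queries []string `json:"queries"`
}

// memStore keys serialized shortcuts by ID, like the cache-backed store.
type memStore struct {
	mu   sync.Mutex
	data map[string]string
}

func (s *memStore) InsertShortcut(id string, sc GraphsShortcut) error {
	b, err := json.Marshal(sc)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[id]; ok {
		return nil // content-addressed IDs: an existing entry is identical
	}
	s.data[id] = string(b)
	return nil
}

func (s *memStore) GetShortcut(id string) (GraphsShortcut, error) {
	s.mu.Lock()
	raw, ok := s.data[id]
	s.mu.Unlock()
	if !ok {
		return GraphsShortcut{}, errors.New("shortcut not found: " + id)
	}
	var sc GraphsShortcut
	err := json.Unmarshal([]byte(raw), &sc)
	return sc, err
}

func main() {
	s := &memStore{data: map[string]string{}}
	_ = s.InsertShortcut("c21e3c", GraphsShortcut{Queries: []string{"arch=arm"}})
	got, _ := s.GetShortcut("c21e3c")
	fmt.Println(got.Queries) // [arch=arm]
}
```

Note that a lookup of a missing ID returns an error rather than an empty object, matching the behavior the contract tests (GetNonExistent) require of real implementations.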
The module utilizes a suite of subtests defined in graphsshortcuttest. This allows the SQL implementation to be verified against a real database instance (via sqltest.NewSpannerDBForTests) while ensuring it adheres to the standard behavior expected by the rest of the Perf application.
The following diagram illustrates how a shortcut moves from the application into persistent storage:
[ Perf UI / Logic ]        [ graphsshortcutstore ]        [ Storage (SQL or Cache) ]
        |                            |                              |
        | 1. Create GraphsShortcut   |                              |
        |--------------------------->|                              |
        |                            | 2. Serialize to JSON         |
        |                            |------------------+           |
        |                            |                  |           |
        |                            |<-----------------+           |
        |                            |                              |
        |                            | 3. Store (ID, JSON)          |
        |                            |----------------------------->|
        | 4. Return ID (Hash)        |                              |
        |<---------------------------|                              |
        |                            |                              |
        | 5. Request ID (Permalink)  |                              |
        |--------------------------->|                              |
        |                            | 6. Fetch JSON by ID          |
        |                            |<-----------------------------|
        |                            |                              |
        |                            | 7. Deserialize JSON          |
        |                            |------------------+           |
        |                            |                  |           |
        | 8. Return Shortcut Object  |<-----------------+           |
        |<---------------------------|                              |
The schema module defines the structural contract for persisting shortcut data within the Graphs Shortcut Store. Its primary purpose is to provide a unified Go representation of the underlying SQL table structure used to store and retrieve serialized graph configurations.
In the context of the Perf system, a “shortcut” is a durable reference to a specific state or collection of graphs. This module ensures that both the application code and the database schema remain synchronized regarding how these shortcuts are identified and stored.
The schema is designed around a simple key-value paradigm optimized for content-addressable storage or unique identifier lookups.
- The ID field serves as the unique handle for a set of graphs. By using a TEXT type with a UNIQUE NOT NULL PRIMARY KEY constraint, the system enforces data integrity at the database level, preventing collisions and ensuring that every shortcut can be retrieved with $O(1)$ complexity via its primary key.
- The Graphs field is defined as a TEXT type rather than a structured relational set of tables. This implementation choice favors flexibility and performance: the serialized configurations can evolve with the UI without requiring schema migrations.

The core structure GraphsShortcutSchema utilizes struct tags (sql:"...") to bridge the gap between Go types and the SQL dialect used by the persistence layer.
- ID: Acts as the immutable identifier for the shortcut. In practice, this is often a hash of the content or a generated UUID, allowing the application to generate permalinks to specific graph views.
- Graphs: Contains the payload of the shortcut. This is the “source of truth” for the graph configurations, including parameters like trace filters, time ranges, or specific visualization settings.

The following diagram illustrates how the schema acts as the intermediary between the application logic and the physical storage:
[ Application Logic ]       [ schema.GraphsShortcutSchema ]       [ SQL Database ]
         |                                |                              |
         | 1. Construct Schema Object     |                              |
         |------------------------------->|                              |
         | (Set ID and Serialized JSON)   |                              |
         |                                | 2. Execute INSERT/SELECT     |
         |                                |----------------------------->|
         |                                | (Uses struct tags for SQL)   |
         |                                |                              |
         | 3. Receive Hydrated Object     |<-----------------------------|
         |<-------------------------------|                              |
         | (Deserialize Graphs field)     |                              |
The graphsshortcuttest module provides a standardized test suite for validating implementations of the graphsshortcut.Store interface. By centralizing these tests, the system ensures that different storage backends (e.g., SQL-based, In-memory) behave consistently regarding data persistence, normalization, and error handling.
The module is designed around the concept of “contract testing.” Instead of each implementation of a Store writing its own basic functional tests, they import and run the SubTests defined here. This approach ensures:
- Every storage backend is validated against the same expectations, including that the Store acts as a canonicalization layer.
- New implementations gain coverage simply by passing their Store instance to this suite.

The primary entry point is the SubTests map. It maps descriptive test names to SubTestFunction signatures. This structure allows implementation-specific tests to iterate over the map and run each test as a subtest:
For each name, func in SubTests: t.Run(name, func(t, myStoreInstance))
InsertGet: The InsertGet function validates the primary lifecycle of a shortcut. A key implementation detail tested here is query normalization. When a GraphsShortcut is provided with queries in an arbitrary order, the Store is expected to return them in a sorted, deterministic state. This is crucial for deduplication and predictable UI rendering.
Input Shortcut          Storage Backend          Output Shortcut
+--------------+        +--------------+        +--------------+
| Queries:     |        |              |        | Queries:     |
|  - arch=x86  | ---->  |  Persist &   | ---->  |  - arch=arm  |
|  - arch=arm  |        |  Normalize   |        |  - arch=x86  |
+--------------+        +--------------+        +--------------+
GetNonExistent: This test ensures that the Store implementation correctly propagates errors when a requested ID does not exist, rather than returning an empty or partially initialized object.
- Tests are written against the SubTestFunction type, which abstracts the testing.T and graphsshortcut.Store dependency, allowing the tests to be decoupled from the actual storage driver.
- The suite uses testify/assert and testify/require to enforce strict equality between what is sent to the store and what is retrieved, ensuring that no fields (like Keys or the list of Graphs) are dropped or corrupted during the serialization/deserialization process.

The /go/graphsshortcut/mocks module provides automated mock implementations of the graphsshortcut.Store interface. These mocks are designed to facilitate unit testing of components that depend on graph shortcut persistence without requiring a live database or storage backend.
The module utilizes testify/mock to provide a flexible, programmable implementation of the storage layer. This approach allows developers to:
The code is autogenerated using mockery, ensuring that the mock implementation remains strictly synchronized with the graphsshortcut.Store interface definition. This eliminates the maintenance overhead of manually updating test doubles when the primary interface changes.
This file defines the Store struct, which embeds mock.Mock. It provides mock implementations for the primary persistence operations:
- GetShortcut(ctx, id): Simulates retrieving a serialized graph configuration. It allows tests to return a specific graphsshortcut.GraphsShortcut object or an error based on the provided ID.
- InsertShortcut(ctx, shortcut): Simulates the creation of a new shortcut. In a test environment, this is typically used to return a pre-defined ID string, allowing the caller to proceed as if a database write succeeded.

The NewStore function is the entry point for utilizing these mocks. It integrates directly with the Go testing lifecycle by registering a cleanup function that automatically asserts expectations.
+-------------------+ +-----------------------+ | Unit Test | | Mocks.Store (Mock) | +---------+---------+ +-----------+-----------+ | | | 1. NewStore(t) | +-------------------------------->| | | | 2. On("GetShortcut").Return(...)| +-------------------------------->| | | | 3. Invoke System Under Test | +-------------------------------->| | | | 4. AssertExpectations (Auto) | |<--------------------------------+
By using NewStore(t), the mock is bound to the test's lifespan. If the code under test fails to call a method that was “set up” or calls it with the wrong arguments, the test will fail during the Cleanup phase.
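The lifecycle can be illustrated with a hand-rolled stand-in. The real code is generated by mockery on top of testify/mock; here the expectation bookkeeping is spelled out with stdlib types only, and all names are illustrative.

```go
package main

import "fmt"

// fakeStore records which programmed expectations were exercised.
type fakeStore struct {
	returns map[string]string // id -> shortcut payload to return
	called  map[string]bool   // which expectations were exercised
}

func (f *fakeStore) GetShortcut(id string) (string, error) {
	s, ok := f.returns[id]
	if !ok {
		return "", fmt.Errorf("unexpected call: GetShortcut(%q)", id)
	}
	f.called[id] = true
	return s, nil
}

// assertExpectations mirrors the automatic check that NewStore registers via
// t.Cleanup: every expectation that was set up must have been called.
func (f *fakeStore) assertExpectations() bool {
	for id := range f.returns {
		if !f.called[id] {
			return false
		}
	}
	return true
}

func main() {
	f := &fakeStore{
		returns: map[string]string{"abc123": `{"graphs":[]}`},
		called:  map[string]bool{},
	}
	got, err := f.GetShortcut("abc123") // the system under test calls this
	fmt.Println(got, err == nil, f.assertExpectations())
}
```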
The go/ingest module serves as the primary entry point and configuration layer for the Skia Perf ingestion system. It provides the high-level logic to instantiate and connect the various sub-modules—filtering, formatting, parsing, and processing—into a cohesive pipeline that transforms raw benchmark files into indexed, searchable performance traces.
The module's main responsibility is to bridge the gap between human-readable configuration (usually provided via JSON or command-line flags) and the specialized internal engines that handle data. It defines the Config structure, which acts as the blueprint for an ingestion instance, specifying everything from where data is sourced (e.g., Google Cloud Storage) to how it should be validated and where it should be stored.
The design is heavily configuration-driven, centered around the Config struct. This allows a single binary to support vastly different ingestion workflows (e.g., internal Skia benchmarks vs. external Chrome performance tests) simply by changing the configuration file. This decoupling ensures that the core logic in process or parser remains agnostic of the specific environment.
Because ingestion is the “front door” of the Perf system, the module is designed for high reliability:
- Fail-fast validation: NewConfig and subsequent component initializers validate inputs (like regex patterns or database connection strings) early, preventing the system from starting in a broken state.
- Observability: metrics are reported via metrics2 across all sub-components, providing real-time visibility into ingestion rates, error frequencies, and latency.

The module doesn't just pass data; it manages the lifecycle of dependencies. For example, it coordinates the setup of the git.Git connector used by the process module to resolve hashes, ensuring that the local git cache is initialized before workers start processing files.
The central definition of an ingestion instance is the Config struct in config.go. It categorizes configuration into several key areas:
- Input: the SourceConfig (GCS bucket, file prefixes) and PubSubConfig for event-driven ingestion.
- Selection: the Filter (which files to ignore) and the Parser (which branches to accept).
- Output: the TraceStore (where data goes) and the Git repository (how data is mapped to commits).

The following diagram illustrates how the ingest module assembles the sub-modules into a functional pipeline:
[ Config File / JSON ] | v +------------------------+ | Ingest Module | | (Initialization) | +-----------+------------+ | +--> [ Filter ] (Rules for file selection) | +--> [ Parser ] (Rules for data transformation) | +--> [ Git ] (Connector for commit mapping) | v +------------------------+ | Process Module | | (The Execution) | +-----------+------------+ | +----[ Workers ]<----( Source: GCS/PubSub ) | | | +--> [ Parse & Map ] | | | +--> [ Write to TraceStore ] | | | +--> [ Notify Downstream ] v [ Persistent Storage ]
- filter: Invoked early in the process to discard irrelevant files before they consume CPU time in the parser.
- format: Provides the structural definitions and JSON schemas that the parser uses to validate incoming blobs.
- parser: Utilized by the workers in the process module to turn raw bytes into standardized trace IDs and values.
- process: The active engine started by the ingest module to manage the actual flow of data and interaction with databases.

The go/ingest/filter module provides a mechanism for determining whether a file should be processed or ignored during ingestion based on its name. This is a critical component for performance monitoring pipelines that ingest data from large-scale storage (like Google Cloud Storage), where filtering out irrelevant files or transaction logs early prevents unnecessary resource consumption and reduces processing noise.
The filtering logic is built around two optional regular expressions: accept and reject. The design follows a “deny-by-default” approach when an accept pattern is provided, and an “allow-by-default” approach otherwise, provided the file doesn't match a reject pattern.
The evaluation logic follows these rules:
- If an accept regex is defined, the filename must match it; failure to match results in immediate rejection.
- If a reject regex is defined, the filename must not match it; a match results in immediate rejection.

The Filter struct caches the compiled *regexp.Regexp objects so that high-volume ingestion, where thousands of filenames may be evaluated against the same ruleset, stays fast.
The following diagram illustrates the decision flow when Filter.Reject(name) is called:
[ Input Filename ] | v +-------------------+ | Is Accept Regex |---- No ----+ | Defined? | | +---------+---------+ | | Yes | v | +---------+---------+ | | Does it Match? |---- Yes ---+ +---------+---------+ | | No | v | [ REJECTED (true) ] | v +-------------------+ | Is Reject Regex |---- No ----+ | Defined? | | +---------+---------+ | | Yes | v | +---------+---------+ | | Does it Match? |---- No ----+ +---------+---------+ | | Yes | v v [ REJECTED (true) ] [ ACCEPTED (false) ]
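The decision flow above can be sketched with the stdlib regexp package. This is a minimal sketch consistent with the described semantics, not the module's actual source.

```go
package main

import (
	"fmt"
	"regexp"
)

// Filter caches the compiled patterns so Reject stays cheap when thousands
// of filenames are evaluated against the same ruleset.
type Filter struct {
	accept *regexp.Regexp
	reject *regexp.Regexp
}

// New compiles the optional patterns, failing fast on invalid syntax.
func New(accept, reject string) (*Filter, error) {
	f := &Filter{}
	var err error
	if accept != "" {
		if f.accept, err = regexp.Compile(accept); err != nil {
			return nil, err
		}
	}
	if reject != "" {
		if f.reject, err = regexp.Compile(reject); err != nil {
			return nil, err
		}
	}
	return f, nil
}

// Reject returns true if the file should be discarded.
func (f *Filter) Reject(name string) bool {
	if f.accept != nil && !f.accept.MatchString(name) {
		return true // deny-by-default when an accept pattern exists
	}
	return f.reject != nil && f.reject.MatchString(name)
}

func main() {
	f, _ := New(`\.json$`, `^tx_log/`)
	fmt.Println(f.Reject("results.json"), f.Reject("tx_log/a.json"), f.Reject("notes.txt"))
}
```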
This file contains the core Filter implementation.
- New(accept, reject string): Validates and compiles the provided regex strings. It returns an error if the regex syntax is invalid, ensuring that ingestion processes fail fast during configuration rather than at runtime.
- Reject(name string) bool: The primary interface for the module. It returns true if the file should be discarded and false if it should be processed. Returning true for a “reject” action allows callers to write clean guard clauses like if filter.Reject(name) { continue }.

The go/ingest/format module defines the data structures and validation logic for performance data files ingested into the Perf system. It serves as the formal specification for how external processes and test runners should format their results to be correctly indexed and visualized.
The primary goal of this module is to provide a flexible yet strictly validated schema that maps raw performance measurements to Trace IDs. A Trace ID is a comma-separated string of key-value pairs (e.g., ,arch=x86,config=8888,test=draw_circle,units=ms,stat=min,) used by Perf to identify a unique time series of data points.
The module supports two formats:
- The modern, versioned “Version 1” format, which is schema-validated.
- A legacy format produced by nanobench (Skia's internal microbenchmarking tool), which relies on nested maps.

The design favors a flat key-value structure for identification. The Format struct and its sub-components (Result, SingleMeasurement) are structured so that keys defined at the top level (global to the file) are merged with keys defined at the result level and the specific measurement level. This allows for efficient data representation where common metadata (like git_hash or arch) is defined once, while specific metrics (like min, max, median) are defined locally.
A single test run often produces multiple related values (e.g., different statistical aggregations of the same test). The Result struct allows a single entry to contain multiple measurements. This avoids duplicating the entire metadata block for every statistical variation of a single test, reducing file size and improving readability.
To prevent “schema drift” between the Go implementation and external data producers, the module uses an embedded JSON Schema (formatSchema.json). This schema is programmatically generated from the Go structs (via the generate submodule). This ensures that validation logic used during ingestion is identical to the documentation provided to contributors.
The primary entry point for modern ingestion is format.go. It defines the Format struct, which includes:
- Versioning fields: GitHash, Issue (CL), and Patchset, which associate results with specific code versions.
- A global Key map containing parameters that apply to every measurement in the file (e.g., hardware configuration).
- A list of Result objects. Each result can be either a simple Measurement (float32) or a complex Measurements map.

The following diagram illustrates how keys are aggregated from different levels of the Format struct to form a final Trace ID:
File Level:
{"key": {"arch": "x86", "config": "8888"}}
|
v
Result Level:
{"key": {"test": "draw_circle", "units": "ms"}}
|
v
Measurement Level:
{"measurements": {"stat": [{"value": "min", "measurement": 1.2}]}}
|
+---------------------------------------+
| Resulting Trace ID: |
| ,arch=x86,config=8888,stat=min,test=draw_circle,units=ms, |
+---------------------------------------+
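The aggregation above can be sketched as a merge of key maps followed by a deterministic sort. This is a minimal sketch; the helper name is illustrative and the real code lives in the parser.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// traceID merges file-, result-, and measurement-level key maps (later
// levels winning on conflict) into one sorted, comma-delimited trace ID.
func traceID(levels ...map[string]string) string {
	merged := map[string]string{}
	for _, level := range levels {
		for k, v := range level {
			merged[k] = v
		}
	}
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering is what makes the ID unique
	var b strings.Builder
	b.WriteString(",")
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s,", k, merged[k])
	}
	return b.String()
}

func main() {
	fmt.Println(traceID(
		map[string]string{"arch": "x86", "config": "8888"},      // file level
		map[string]string{"test": "draw_circle", "units": "ms"}, // result level
		map[string]string{"stat": "min"},                        // measurement level
	))
}
```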
The module provides robust utilities to ensure data integrity:
- Parse: Decodes JSON into the Format struct and enforces version checking.
- Validate: Performs a two-pass check. First, it ensures the JSON is syntactically valid and matches the internal Go types. Second, it validates the blob against the embedded JSON Schema to catch logic errors (e.g., missing required fields like git_hash).
- GetLinksForMeasurement: A helper function that resolves all URLs associated with a specific trace. It merges global links (file-level) with measurement-specific links, allowing users to jump from a specific data point in the Perf UI to external logs or artifacts.

Legacy support lives in legacyformat.go, which maintains compatibility with older Skia tooling. It defines the BenchData struct, which uses a more nested structure: Results -> [Config/State] -> [TestName] -> [Metric]. Unlike the standard format, which is versioned and schema-validated, the legacy format is handled as a map[string]interface{} to accommodate the highly dynamic nature of older benchmark outputs.
The schema file, formatSchema.json, is embedded into the Go binary using //go:embed. This allows the Validate function to perform schema validation without requiring external file dependencies at runtime. It defines the strict requirements for the Version 1 format, such as mandatory fields and allowed data types for measurements.
The generate module is a utility designed to maintain consistency between the Go-based implementation of the Perf ingestion format and its external documentation. Its primary responsibility is to act as a code-to-schema compiler that ensures the structural definition of data ingested into Perf remains synchronized across all tools and external data producers.
The core design decision behind this module is to treat the Go source code as the single source of truth for the ingestion protocol. In a complex data pipeline, the format used to describe performance results can evolve. Manually maintaining a separate JSON Schema file is error-prone and leads to “schema drift,” where the documentation or validation rules fail to match the actual parsing logic in the Go codebase.
By programmatically generating the schema from the format.Format struct, the system guarantees that the published, machine-readable specification can never drift from the Go parsing logic.
The module leverages the go.skia.org/infra/go/jsonschema package to perform structural reflection on the format.Format struct.
- It reflects over the json tags of the target struct.
- It writes the result to the formatSchema.json file located in the parent directory. This output is used by other parts of the system to validate incoming JSON blobs before they are processed by the ingestion engine.

The following diagram illustrates how this module fits into the development workflow:
+--------------------------+
| perf/go/ingest/format/ |
| (Go Structs) |
+------------+-------------+
|
| Source of Truth
v
+------------+-------------+
| /format/generate/main.go | <-- This Module
+------------+-------------+
|
| Reflection & Generation
v
+------------+-------------+
| formatSchema.json |
| (Machine-readable Spec) |
+------------+-------------+
|
+----------------------------> External Data Producers
|
+----------------------------> Validation Middlewares
main.go is the entry point that executes the generation logic. It specifically references the format.Format struct and directs the output to a static file path relative to the module. Its simplicity is intentional: it acts strictly as a bridge between the internal type system and the file system.

The parser module provides the logic for transforming raw performance data files into a standardized format suitable for storage in a trace-based database. It serves as the translation layer between external benchmarking tools—which may produce data in various JSON schemas—and the internal Perf system.
The module is designed to handle both a “Legacy” format and a modern “Version 1” format, ensuring backward compatibility while supporting newer features like explicit commit positions and complex measurement maps.
The parser's implementation is guided by the need for data integrity and system stability in a high-volume ingestion pipeline:
- Format fallback: the Parser attempts to decode files using the Version 1 schema first. If that fails, it falls back to the Legacy parser. This allows a single ingestion pipeline to handle a heterogeneous mix of data sources.
- Key sanitization: incoming keys and values may contain characters (such as , or =) that would break the internal string-based representation of traces. The parser uses configurable regular expressions and a “force valid” approach to replace illegal characters, preventing database corruption or query errors.
- Noise filtering: keys prefixed with GL_ (internal OpenGL constants) are skipped in legacy files, as these are considered too verbose for high-level performance tracking.

The central coordinator of the module, defined in parser.go, maintains the state necessary for ingestion, including:
- The invalidParamCharRegex used to sanitize incoming metadata.
- Branch filtering, signalled to callers via ErrFileShouldBeSkipped.
- Counters from metrics2 to track successful parses, failures, and files with no data, providing operational visibility into the ingestion pipeline.

The Version 1 path handles the modern schema, which supports:
- The CP:nnnnnn prefix in the git_hash field, which treats the value as a sequential commit position rather than a Git hash.
- The Measurements map, which allows a single result entry to contain multiple named metrics (e.g., min_ms, max_rss) without duplicating the common metadata.

The legacy path maintains compatibility with older benchmarking outputs. Its primary task is flattening deeply nested JSON structures (Test Name -> Config -> Results) into a flat list of parameters and float values. It also handles the extraction of “Samples” (multiple runs of the same test), which are specifically aggregated for the min_ms sub-result.
Functions like buildInitialParams and getParamsAndValuesFromVersion1Format are responsible for merging “Global” keys (describing the machine/environment) with “Local” keys (describing the specific test run). This creates the unique identity for every performance trace.
The following diagram illustrates how the Parse method processes a file from raw input to standardized trace data:
Input: file.File (Name, Contents) | V [ Read all contents into memory ] (Allows multiple passes for format detection) | V [ Try Version 1 Extraction ] ---- Success? ----> [ Sanitize Keys ] | | Fail? | | | [ Try Legacy Extraction ] ------- Success? ----> [ Sanitize Keys ] | | Fail? | | | [ Return Error ] [ Filter by Branch ] | [ Skip if excluded? ] | V Standardized Output: - []paramtools.Params (Trace IDs) - []float32 (Values) - Hash (Commit ID)
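The “Sanitize Keys” step above can be sketched as a regex replacement. The pattern and the underscore replacement are assumptions for illustration; in the real parser both are configurable.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidParamChar matches characters that would collide with the trace-ID
// delimiters. The "force valid" approach replaces them instead of rejecting
// the whole file.
var invalidParamChar = regexp.MustCompile(`[,=]`)

func forceValid(v string) string {
	return invalidParamChar.ReplaceAllString(v, "_")
}

func main() {
	fmt.Println(forceValid("model=Pixel 6, rev2"))
}
```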
- parser.go: Contains the primary Parser implementation and logic for both schema versions.
- parser_test.go: Defines the behavioral contract of the parser using a wide array of test fixtures to ensure stability across edge cases like malformed JSON or special character collisions.
- testdata/: An authoritative collection of JSON fixtures representing different data scenarios (success, failure, different schemas) used to validate the parser's logic.

The /go/ingest/parser/testdata module serves as the authoritative collection of test fixtures for the performance data ingestion system. Its primary role is to define the operational boundaries of the ingestion logic, ensuring that the system remains resilient across schema evolutions, handles data corruption gracefully, and correctly identifies performance metrics from various benchmarking sources.
The directory is structured to separate data by schema version (legacy and version_1). This separation reflects a fundamental design choice to maintain strict backward compatibility while allowing for the evolution of the ingestion format.
The fixtures are designed around three core testing principles:
- Robustness: handling of special characters (e.g., , and =) that might otherwise collide with the internal delimiters used by the time-series database.
- Filtering: exclusion of noise (such as GL_ prefixes or experimental branch data) through negative test cases, ensuring that only relevant metrics enter long-term storage.

The files within the /legacy component represent a historical, more permissive JSON schema. The parser's responsibility here is heavily focused on traversal and filtering. Because the legacy format lacks strict enforcement, the test data validates the parser's ability to:
- Traverse the nested hierarchy of results -> configuration -> metrics.
- Use fixtures such as unknown_branch.json to verify that data from specific development paths is discarded.

The /version_1 fixtures represent the modern, more structured ingestion format. The focus shifts from filtering noise to identifying metadata and handling special cases. Key responsibilities demonstrated here include:
- Recognizing the CP: prefix in git_hash fields to distinguish between traditional Git SHAs and sequential commit numbers, as seen in with_commit_number.json.
- Handling delimiter collisions in parameter values (e.g., with_comma_in_param.json).

The following diagram illustrates how the ingestion logic uses these fixtures to transform raw JSON input into a standardized record:
Raw JSON File (Test Data) | V [ Format Detection ] -----------------------+ | | +--> [ Legacy Parser ] +--> [ Version 1 Parser ] | (Filters GL_ prefixes, | (Handles CP: prefixes, | Maps nested metrics) | Resolves Identity Keys) | | V V [ Key Normalization ] <---------------------+ | |-- Check for delimiter collisions (",", "=") |-- Merge global and local key blocks | V [ Value Extraction ] | |-- Convert numeric strings to float64 |-- Validate sample arrays (ignore non-numeric) | V Standardized Ingestion Record (Used for Database Write)
These files are not merely static examples; they are the inputs for the parser_test.go suite. The system compares the output of the parser against “golden” expectations derived from these files.
- Positive fixtures (e.g., success.json) ensure that valid data is correctly parsed into the internal data model.
- Negative fixtures (e.g., invalid.json, invalid_commit_number.json) ensure that the parser returns explicit errors or handles exceptions without crashing, which is critical for a high-volume ingestion pipeline where malformed data is a common occurrence.

The /go/ingest/parser/testdata/legacy directory serves as a comprehensive suite of test fixtures designed to validate the ingestion and parsing logic for legacy performance result formats. These files are used to ensure that the parser correctly handles various edge cases, data structures, and validation rules inherent in the older JSON schema used by benchmarking systems.
The primary goal of these data files is to define the expected boundaries of the legacy ingestion system. Because legacy formats often lack strict schema enforcement, these files document through example how the parser should interpret nested objects, handle missing keys, and filter out noise.
Key design considerations reflected in these files include:
- Nested traversal of the hierarchy results -> test_name -> configuration -> metrics.
- one_measurement.json and zero_measurement.json verify the parser's ability to extract single data points and handle boundary values like zero, which might otherwise be misinterpreted as missing data.
- samples_success.json demonstrates how the system handles arrays of raw performance measurements (samples) alongside aggregated statistics like min_ms. It also tests the parser's ability to ignore non-numeric values within these arrays.
- Files like success.json showcase complex results containing a mix of key (identifying the environment), options (contextual metadata), and meta (test-specific metrics like max_rss_mb).

Several files contain keys specifically named SHOULD_NOT_APPEAR_IN_RESULTS... (e.g., in samples_success.json and unknown_branch.json). These are used to test the parser's filtering logic:
- Keys are filtered when prefixed with GL_.
- unknown_branch.json tests the ingestion engine's ability to ignore data from specific development branches (e.g., “ignoreme”).
- invalid.json provides a baseline for how the parser handles non-JSON content.
- no_results.json and samples_no_results.json ensure the system doesn't fail when valid metadata is present but no actual performance metrics are included.

The following diagram illustrates how the ingestion logic typically processes a file from this test data set:
JSON Input File | V [ Schema Validation ] ----> If Invalid (invalid.json) -> Reject | V [ Extract Global Keys ] --> gitHash, issue, patchset, system | V [ Iterate Results ] | +-- [ Filter Keys ] --> Ignore "GL_" prefixes or "ignoreme" branches | +-- [ Parse Metrics ] | | | +-- Numeric values (min_ms, samples) -> Store | +-- Non-numeric/Strings in Metrics -> Discard | +-- [ Map Metadata ] -> Map 'options' and 'meta' to result tags | V Processed Ingestion Record
- success.json / samples_success.json: The gold standard for valid legacy data, covering diverse configurations (8888, 565, gpu) and metric types.
- invalid.json: Tests resilience against syntax errors.
- no_results.json / samples_no_results.json: Tests handling of empty result sets with valid headers.
- one_measurement.json / zero_measurement.json: Tests specific numerical edge cases.
- unknown_branch.json: Tests environmental filtering logic.

This directory serves as a comprehensive suite of test cases for the “Version 1” ingestion format. It contains JSON files designed to validate the robustness, edge-case handling, and schema compliance of the ingestion parser.
The primary goal of these data samples is to define the boundaries of what the parser should accept, reject, or transform. The design of these files reflects real-world ingestion scenarios where data might be messy, incomplete, or formatted using specific conventions (such as commit position markers).
By providing these samples, the module pins down exactly how the parser should behave in each of these scenarios.
The test data can be categorized into three functional groups:
These files demonstrate the flexibility of the Version 1 schema:
- Basic cases (success.json, one_measurement.json): Demonstrate the standard structure. Results can contain a top-level measurement or a nested measurements map containing arrays of values (e.g., different configs like 8888 or gpu).
- Commit positions (with_commit_number.json): Shows the use of the CP: prefix in the git_hash field to represent a “Commit Position” rather than a standard Git SHA.
- Edge cases: zero-valued measurements (zero_measurement.json) or entirely empty results lists (no_results.json), which the parser must handle without crashing.

Ingested data often contains characters that could conflict with internal storage formats (like key-value pair delimiters).
- Special characters (with_special_chars.json): Tests a wide range of symbols (e.g., !~@#$%^&*()) within both keys and values.
- Delimiter collisions (with_comma_in_param.json, with_equal_in_param.json): Specifically tests strings containing , and =, which are often used as separators in time-series databases. These files verify that the parser correctly escapes or encapsulates these values.

The invalid fixtures define the failure modes of the parser:
- Malformed content (invalid.json): Plain text that is not valid JSON.
- Malformed identifiers (invalid_commit_number.json): Covers cases where fields like git_hash contain malformed prefixes (e.g., CP:727A901, where the number format is incorrect).

The following diagram illustrates how the parser uses these files to determine the final identity of a performance measurement:
JSON Input File | V +-----------------+ +-----------------------+ | Global Keys |----->| Common Metadata | | (arch, os, etc) | | (applied to all) | +-----------------+ +-----------+-----------+ | V +-----------------+ +-----------------------+ +--------------------+ | Result Keys |----->| Unique Series ID |----->| Final Ingested | | (test, config) | | (Global + Result Keys)| | Data Point | +-----------------+ +-----------------------+ +--------------------+ | +------------------------------+ | V +-----------------+ +-----------------------+ | Measurements |----->| Value (float64) | | (single or map) | | | +-----------------+ +-----------------------+
- git_hash / version: Every valid file includes these to identify the schema version and the point in time the data represents.
- key block: Found at both the root level (global params) and within individual results (test-specific params). The parser must merge these to create a full set of dimensions for the data.
- links: Demonstrated in with_commit_number.json, showing how external references (like documentation or build logs) are attached to the ingestion record.

The process module is the core execution engine of the Skia Perf ingestion pipeline. It coordinates the lifecycle of performance data from its raw state in a source (like Google Cloud Storage or a local directory) to its indexed state within a TraceStore.
The module provides a multi-threaded ingestion worker system that handles parsing, commit mapping, data normalization, and storage. It is designed to be resilient, utilizing retries for database operations and supporting Google Cloud Pub/Sub for both input signaling and downstream event notifications.
The entry point is the Start function, which initializes the necessary infrastructure components—source monitors, trace stores, metadata stores, and git connectors—and launches a configurable number of parallel worker goroutines.
The system operates using a producer-consumer model:
- A file.Source (e.g., a GCS bucket listener) produces file.File objects onto a channel.
- Worker goroutines consume from the channel, using a parser.Parser to handle the transformation of raw bytes into structured performance data.

Ingested files typically contain a Git hash. The process module is responsible for resolving this hash into a monotonic types.CommitNumber.
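The producer-consumer model can be sketched with channels and a WaitGroup. This is a minimal sketch: the real pipeline carries file.File values, and each worker runs parse, commit lookup, and a TraceStore write rather than just counting.

```go
package main

import (
	"fmt"
	"sync"
)

// ingest fans file names out to `workers` goroutines via a channel.
func ingest(files []string, workers int) int {
	ch := make(chan string)
	go func() { // producer: stands in for the GCS/PubSub file.Source
		for _, f := range files {
			ch <- f
		}
		close(ch)
	}()

	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range ch { // consumer: stands in for parse + store
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return processed
}

func main() {
	fmt.Println(ingest([]string{"a.json", "b.json", "c.json"}, 2))
}
```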
Workers use the git.Git interface to look up commit numbers. Once parsed and mapped to a commit, the data is prepared for the tracestore.TraceStore:
- Trace values are written along with an updated ParamSet, which serves as the index for searching traces.
- File-level metadata is recorded in the MetadataStore.

After a successful write, the module can notify other services via a Pub/Sub topic defined in FileIngestionTopicName.
To optimize downstream processing (like clustering), the module includes logic to prune redundant trace IDs. Many performance tests report multiple statistics for the same logical test (e.g., test_name, test_name_avg, test_name_min).
The getTraceIdsForClustering function implements a “canonicalization” check:
If a trace name ends in _avg, _min, _max, or _count, the system looks for a “canonical” version of that same trace (the name without the suffix and a matching stat key), and prunes the suffixed variant when the canonical one exists.

[ Source (GCS/Dir) ] | v [ file.File Channel ] | +----[ Worker 1 ]----[ Parser ]----> (Parsed Data) | | | +---------[ Git ]-------> (Commit Number) | | | +---------[ TraceStore ]-> [ Persistent Storage ] | | | +---------[ Pub/Sub ]----> [ Ingestion Events Topic ] | +----[ Worker 2 ]---- ... | +----[ Worker N ]---- ...
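The canonicalization check can be sketched as follows. This is a simplified illustration: the real getTraceIdsForClustering also matches on the stat key in the full trace ID, whereas this sketch only compares suffixes.

```go
package main

import (
	"fmt"
	"strings"
)

// pruneForClustering drops a trace whose name ends in a statistic suffix
// when the suffix-free "canonical" trace is also present in the file.
func pruneForClustering(traceIDs []string) []string {
	present := map[string]bool{}
	for _, id := range traceIDs {
		present[id] = true
	}
	suffixes := []string{"_avg", "_min", "_max", "_count"}
	var out []string
	for _, id := range traceIDs {
		redundant := false
		for _, s := range suffixes {
			if strings.HasSuffix(id, s) && present[strings.TrimSuffix(id, s)] {
				redundant = true // a canonical version exists, skip this one
				break
			}
		}
		if !redundant {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	fmt.Println(pruneForClustering([]string{"load_page", "load_page_avg", "load_page_min", "decode"}))
}
```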
The module supports “Dead Letter” semantics via Pub/Sub Nacks. If DeadLetterCollection is enabled in the configuration, processing failures will trigger a Nack(), allowing the message to be redelivered or moved to a dead-letter queue by the infrastructure. If disabled, or if the error is unrecoverable (like a bad git hash), the message is Ack()-ed to clear the pipeline.
A defaultDatabaseTimeout of 60 minutes is applied to file processing. This high threshold accounts for large files that might contain thousands of traces requiring significant indexing time in the database, while still providing a circuit breaker for stalled connections.
The number of parallel ingesters is configurable. This allows the system to scale horizontally based on the CPU/Memory available to the container and the IOPS capacity of the underlying TraceStore.
The ingestevents module provides a standardized data contract and serialization format for communication between Perf ingesters and regression detection components. Within the Perf architecture, ingesters process raw performance data files and store them in the database. Once a file is successfully processed, the system must trigger downstream tasks—specifically regression detection—to analyze the newly arrived data.
This module facilitates this “Event-Driven Alerting” by defining the IngestEvent structure and providing utilities to pass this data through Google Cloud PubSub efficiently.
A key challenge in event-driven architectures is balancing the richness of the event data against transport limits. PubSub has a maximum message size (10MB), and high-volume performance data can easily exceed this if not handled carefully.
To address this, the module implements a mandatory compression strategy:
All IngestEvent payloads are Gzipped before being sent to PubSub and must be decompressed upon receipt. This ensures that even files containing thousands of TraceIDs or complex ParamSets remain well below the transport limits.

The IngestEvent struct is the core data transfer object. It contains three primary pieces of information that allow downstream clusterers to perform regression detection without re-querying the database for metadata:
- TraceIDs: A slice of unencoded trace identifiers found in the ingested file. This tells downstream consumers exactly which series of data points have been updated.
- ParamSet: A summary of the parameters (key-value pairs) associated with the TraceIDs. This provides immediate context about the hardware, benchmarks, and configurations affected by the new data.
- Filename: The source of the data, useful for auditing and tracking the ingestion pipeline.

The module provides two primary functions to manage the lifecycle of an event:
- CreatePubSubBody: Orchestrates the encoding process. It uses a bytes.Buffer coupled with a gzip.Writer to transform an IngestEvent into a compressed byte slice ready for PubSub publishing.
- DecodePubSubBody: The inverse operation. It handles the Gzip decompression and JSON decoding, returning a pointer to the original IngestEvent.

The following diagram illustrates how this module fits into the broader Perf data pipeline:
[ Raw Data File ] | v [ Ingester Service ] ----> ( Writes to Database ) | | ( Creates IngestEvent ) v [ ingestevents.CreatePubSubBody ] | | ( Gzipped JSON ) v [ Google Cloud PubSub ] | v [ Clusterer / Detection Service ] | | ( Receives Message ) v [ ingestevents.DecodePubSubBody ] | | ( Result: TraceIDs, ParamSet ) v [ Regression Detection Logic ]
The implementation leverages go.skia.org/infra/go/util for safe Gzip stream handling and go.skia.org/infra/go/skerr for structured error wrapping. This ensures that failures during decompression or decoding (e.g., due to malformed PubSub messages) provide enough context to identify where the pipeline stalled.
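The encode/decode round trip can be sketched with the stdlib. This is a minimal sketch of what CreatePubSubBody and DecodePubSubBody do (JSON inside a Gzip stream); the struct here carries only the fields named above, and the error handling in the real module uses skerr wrapping.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// IngestEvent mirrors the fields described above.
type IngestEvent struct {
	TraceIDs []string
	ParamSet map[string][]string
	Filename string
}

// encode JSON-encodes the event into a Gzip stream.
func encode(ev IngestEvent) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(ev); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip trailer
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode, returning the original event.
func decode(b []byte) (*IngestEvent, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var ev IngestEvent
	if err := json.NewDecoder(zr).Decode(&ev); err != nil {
		return nil, err
	}
	return &ev, nil
}

func main() {
	body, _ := encode(IngestEvent{
		TraceIDs: []string{",arch=x86,test=draw_circle,"},
		Filename: "gs://bucket/2024/01/01/12/results.json",
	})
	ev, _ := decode(body)
	fmt.Println(ev.Filename, len(ev.TraceIDs))
}
```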
The initdemo module provides a utility for bootstrapping a local development environment for Skia Perf. Its primary purpose is to automate the creation of the required database and the application of the current schema, ensuring developers have a consistent and functional backend for local testing.
This utility is designed for idempotency and speed in local development. Rather than relying on complex migration tools or manual database setup steps, initdemo uses the direct Spanner-compatible schema defined within the Perf codebase.
The choice to use a simple Go binary for this task reflects a preference for:
The core logic resides in main.go. It performs two distinct phases of setup:
- Database creation: It creates the target database (named demo by default) and gracefully handles cases where the database already exists, allowing the tool to be run repeatedly without side effects.
- Schema application: It applies the schema from the perf/go/sql/spanner package. This creates all necessary tables, indices, and constraints required for Skia Perf to function.

The module depends on go.skia.org/infra/perf/go/sql/spanner. This dependency is critical because it acts as the “Source of Truth” for the database structure. By importing spanner.Schema, initdemo guarantees that the local environment is always in sync with the production-ready schema definitions used by the main Perf application.
The following diagram illustrates the sequence of operations performed by the utility:
[ Developer ]
     |
     V
+--------------+
| initdemo run |
+--------------+
     |
     | 1. Connects to local DB instance (e.g., CockroachDB/Spanner Emulator)
     V
+-----------------------+
| CREATE DATABASE demo; |----( If exists, log and continue )
+-----------------------+
     |
     | 2. Fetch Schema from /perf/go/sql/spanner
     V
+-----------------------+
| Apply SQL Statements  |----( Create Tables, Indices, etc. )
+-----------------------+
     |
     V
+-----------------------+
| Success / Exit        |
+-----------------------+
The module supports customization through command-line flags, primarily allowing the user to point the utility at a specific database instance or rename the target database.
- database_url: Defines the connection string. Although the tool is used for Spanner-compatible schemas, it utilizes the pgx library, reflecting the common local development pattern of using CockroachDB or similar PostgreSQL-wire-compatible emulators.
- databasename: Allows the user to override the default “demo” name.

The issuetracker module provides a high-level abstraction for interacting with the Google Issue Tracker (Buganizer) API, specifically tailored for the Skia Perf ecosystem. Its primary goal is to automate the lifecycle of performance regressions: from initial detection to filing bugs and updating them with diagnostic data.
By wrapping the lower-level issuetracker/v1 API, this module handles the complexities of authentication, data formatting (Markdown), and the mapping of Perf-specific entities (anomalies, regressions, and subscriptions) into actionable bug reports.
Unlike a simple API client, FileBug relies heavily on internal state from the regression.Store. When a bug is filed, the module does not simply trust the parameters passed from the frontend; instead, it queries the database to:
- Retrieve the authoritative Subscription linked to the regression.

The implementation includes a “test run” mechanism (checkTestRun). If a regression is not linked to a specific internal testing email (e.g., sergeirudenkov@google.com), the module defaults the bug status to NEW, clears the assignee, and removes CCs. This prevents automated systems from accidentally spamming production engineering teams during development or misconfiguration.
The module handles the “Long URL” problem inherent in web-based analysis tools. Performance reports often involve hundreds of individual regression keys. To prevent breaking Issue Tracker or browser limits, the module calculates the length of the generated graph URL. If it exceeds a safe threshold (~2000 characters), it swaps the direct link for a “Link by Bug ID” (e.g., /u?bugID=12345), which leverages Perf's ability to look up regressions associated with a specific tracker ID.
The module supports two distinct operating modes:
- Production mode: It uses go/secret to fetch API keys from GCP Secret Manager and initializes an OAuth2 authorized client.
- Development mode: When devMode is active, it redirects traffic to a local mockhost (port 8081) and bypasses authentication, allowing for end-to-end UI testing without real API credentials.

The primary entry point is defined in issuetracker.go. It defines the contract for filing bugs, adding comments, and querying issues. The implementation (issueTrackerImpl) coordinates with several sub-systems:
- regression.Store: Provides the underlying data for detected performance shifts.
- userissue.Store: Tracks issues manually filed by users to prevent duplicates and maintain a history of user-driven triage.
- anomalygroup/service: Used to rank and select the “Top Anomalies” to include in the bug's summary, ensuring the most impactful data is presented first.

The module dynamically constructs Markdown descriptions. The workflow for generating a bug body follows this logic:
[ Regression IDs ] --> [ Fetch Subscription Details ] --> [ Determine P/S Level ]
                                                                   |
                                                                   v
                       [ Fetch Regression Data ] -----> [ Aggregate Bot Names ]
                                                                   |
                                                                   v
                       [ Rank Anomalies ] ------------> [ Format Top 10 List ]
                                                                   |
                                                                   v
                       [ Final Markdown ] <------------ [ Construct Graph Links ]
While FileBug is often automated, FileUserIssue handles cases where a user manually identifies a regression on a specific trace. This workflow is simpler but critical for manual triage; it creates a bug with a standardized title containing the Trace ID and Commit Position, then persists this relationship in the userissue.Store.
When multiple regressions are grouped into a single bug filing request, the module must decide which metadata to use. It iterates through all associated subscriptions and selects the one with the highest priority (lowest numerical value) and highest severity.
Regressions: [R1, R2, R3]
        |
        +--> Sub A (P2, S2)
        +--> Sub B (P1, S3)
        |
[ Selection: Sub B ]  (Priority P1 wins over P2)
Once a bug is created, the module immediately posts a follow-up comment. This comment contains a specialized Perf URL that uses the newly created IssueId as a query parameter. This ensures that anyone viewing the bug can immediately jump back into the Perf UI to see the live, filtered graph of the relevant regressions.
The mockhost module provides a lightweight, standalone HTTP server that emulates a subset of the Issue Tracker API. Its primary purpose is to facilitate local development and testing of services that interact with the issue tracking system, allowing developers to verify request/response handling without requiring access to a live production API or complex authentication setups.
The module is designed for simplicity and predictability. It implements a RESTful interface using the chi router, mimicking the endpoint structure expected by clients of the issuetracker/v1 library.
Instead of maintaining a complex state or an in-memory database, the mock host uses a “static response” strategy. It accepts valid API requests, logs the incoming parameters for visibility during debugging, and returns pre-defined JSON payloads that conform to the issuetracker data structures. This approach ensures that the mock remains low-maintenance and deterministic.
The server handles three primary operations, mapping HTTP methods to specific Issue Tracker behaviors:
[ Client ] [ mockhost (:8081) ]
| |
| GET /v1/issues |
|-------------------->| Log query -> Return static issue list
| |
| POST /v1/issues |
|-------------------->| Decode body -> Return new issue with ID 98765
| |
| POST /v1/issues/{id}/comments
|-------------------->| Parse {id} -> Return comment confirmation
The main.go file acts as the central coordinator. It initializes a chi router and maps specific URL patterns to handler functions. The server listens on port :8081 by default. Its main responsibility is routing and ensuring the HTTP server's lifecycle is managed.
The handlers, also defined in main.go, encapsulate the logic for simulating the Issue Tracker API:
- listIssuesHandler: Simulates searching for issues. It extracts the query parameter from the URL to log what the client is searching for, then returns a ListIssuesResponse containing a single mock issue (ID 12345). This allows clients to test list-parsing logic.
- fileBugHandler: Simulates the creation of a new bug. It decodes the incoming Issue object from the request body to reflect the submitted title back to the client, while assigning a static mock ID (98765) to simulate the backend's ID generation.
- createCommentHandler: Simulates adding a comment to an existing issue. It validates that the issueId in the URL is a valid integer and echoes the comment text back in the response. This is useful for verifying that clients are correctly targeting the right issue resources.

The module relies on //go/issuetracker/v1:issuetracker for its data models. By using the same structures as the production client, the mock ensures that the JSON serialization and deserialization remain perfectly compatible with the real service.
Integration with go/sklog ensures that all interactions with the mock host are recorded to the console. This allows developers to inspect the payloads being sent by their services in real-time by simply watching the mockhost output.
The issuetracker/mocks module provides an automated mocking implementation of the IssueTracker interface. Its primary purpose is to facilitate unit testing for components within the Perf system that interact with external issue tracking services without requiring actual network calls or authentication against a real Issue Tracker API.
The module utilizes mockery to generate code based on the issuetracker.IssueTracker interface. This approach ensures that the mock stays in sync with the actual interface definition found in /perf/go/issuetracker. By using the testify/mock framework, it allows developers to:
- Program canned return values without making real network calls.
- Assert that specific methods (e.g., FileBug or CreateComment) were invoked with the expected parameters.

The generated file contains the IssueTracker struct, which embeds mock.Mock. It implements the standard operations required for Perf's integration with bug tracking:
- Bug filing (FileBug, FileUserIssue): These methods simulate the creation of new issues. In a test environment, they allow the SUT to receive a mock Issue ID (int) to verify that the ID is correctly stored or referenced in the Perf database.
- Commenting (CreateComment): Mocks the addition of comments to existing issues, used for updating status or providing additional data on detected regressions.
- Querying (ListIssues): Simulates querying the tracker for existing issues, returning a slice of v1.Issue objects. This is crucial for testing logic that prevents duplicate bug filing.

The mock is designed to be instantiated within a test suite using NewIssueTracker(t). This constructor automatically registers cleanup functions to assert that all defined expectations were met before the test finishes.
+-----------+ +----------------------+ +-----------------+
| Test | | System Under Test | | Mock (this) |
| Routine | | (e.g., Alerter) | | IssueTracker |
+-----------+ +----------------------+ +-----------------+
| | |
|-- 1. Setup Expectations ->| |
| (On FileBug return 123) | |
| | |
|-- 2. Trigger Action ----->| |
| |-- 3. Call FileBug(ctx, req) ->|
| | |
| |<-- 4. Return (123, nil) ------|
| | |
|-- 5. Assert Result -------| |
| | |
|-- 6. Cleanup/Verify (Auto)|----------------------------->|
| (Was FileBug called?)
- perf/go/issuetracker: Defines the request and response structures (e.g., FileBugRequest) that the mock must handle.
- go.skia.org/infra/go/issuetracker/v1: Provides the underlying data models for the issues themselves.
- github.com/stretchr/testify/mock: The engine driving the programmatic responses and assertions.

The kmeans module provides a flexible, generic implementation of Lloyd's Algorithm for k-means clustering. Rather than being tied to a specific data format like 2D coordinates or high-dimensional vectors, it uses a set of interfaces that allow it to cluster any data type where a distance metric and a centroid calculation can be defined.
This is particularly useful in the context of performance monitoring (Perf), where clustering might be applied to different types of trace data or experimental results.
The implementation decouples the clustering logic from the mathematical specifics of the data. This is achieved through three primary abstractions:
- Clusterable: An empty interface (interface{}) representing the data points (observations) to be clustered.
- Centroid: An interface representing the “center” of a cluster. It must provide a Distance method to calculate how far a Clusterable is from itself and an AsClusterable method to allow the centroid to be treated as a data point in results.
- CalculateCentroid: A function type responsible for generating a new Centroid from a slice of Clusterable observations. This encapsulates the logic of how to “average” a specific group of data points.

By using interfaces, the module avoids hardcoding Euclidean distance or vector arithmetic. For example, if clustering time-series data, the Centroid implementation could use Dynamic Time Warping (DTW) for distance, while a categorical dataset might use Hamming distance. The core algorithm remains unchanged regardless of these implementation details.
The module executes the standard iterative k-means process. Each iteration (performed by the Do function) follows these steps:
1. Assign each observation to its closest centroid, as measured by the Distance metric.
2. Recompute each cluster's centroid from its assigned observations using the CalculateCentroid function.

Initial Centroids + Observations
        |
        v
+-----------------------------+
| Do() Iteration Loop         | <-----------+
| 1. Find closest centroid    |             |
| 2. Group observations       |             |  Repeat N times
| 3. Calculate new centroids  |             |  (iters)
+-----------------------------+             |
        |             |                     |
        |             +---------------------+
        v
Final Centroids + Grouped Clusters
The GetClusters function organizes the final output. It produces a two-dimensional slice where each inner slice represents a cluster. By convention, the first element of each inner slice is the Centroid itself (converted via AsClusterable), followed by all observations belonging to that cluster. This provides a clear, grouped view of the algorithm's output.
kmeans.go: This is the core of the module.
- Do: Implements a single iteration of the algorithm. It is designed to be called repeatedly. Note that it returns a new slice of centroids and may return fewer than the input if clusters become empty.
- KMeans: A convenience wrapper that runs Do for a fixed number of iterations.
- TotalError: A utility to calculate the sum of distances from all observations to their respective centroids, providing a measure of how well the clusters fit the data.

The tests in kmeans_test.go serve as the primary documentation for how to implement the required interfaces. They demonstrate a concrete 2D implementation (myObservation) where the same struct satisfies both Clusterable and Centroid interfaces, and a corresponding calculateCentroid function that computes the arithmetic mean of X and Y coordinates.
The maintenance module serves as the central orchestration point for all long-running background processes and administrative tasks within a Skia Perf instance. Instead of handling user requests, this module is responsible for database health, data synchronization, schema migrations, and cache warming.
In a distributed system like Skia Perf, various tasks must occur outside the critical path of the web UI or ingestion engine. The maintenance module consolidates these tasks into a single entry point, managing the lifecycle of background goroutines that handle schema migrations, Git repository polling, LUCI config imports, cache warming, and data-retention cleanup.
The primary responsibility of this module is the Start function, defined in maintenance.go. It acts as a switchboard, using configuration flags (MaintenanceFlags) and instance settings to decide which background services to initialize.
Design decisions in this coordinator include:
- Long-running by design: The Start function is designed to run indefinitely (ending in a select {}). This is intended for use in a dedicated “maintenance” microservice or container that runs alongside the main Perf application.
- Centralized periodicity: The update intervals (e.g., gitRepoUpdatePeriod, deletionPeriod) are defined here. By centralizing these, developers can easily reason about the total background load on the database.

The module ensures the database environment is ready before starting other services. It utilizes expectedschema to validate and migrate the core schema. It also handles specialized migrations, such as moving regression data between table formats, which are executed in small, controlled batches (regressionMigrationBatchSize) to avoid locking the database or exhausting resources.
Through the deletion submodule, the maintenance process enforces a data retention policy. It targets “Shortcuts” (temporary trace groupings) and “Regressions” that have aged out (currently 18 months).
To prevent the first user of the day from experiencing slow queries, the maintenance module performs “cache warming.” It initializes a ParamSetRefresher which scans the TraceStore and populates Redis. This ensures that the available query parameters (keys and values) are always pre-calculated and ready for the UI.
When the maintenance service starts, it follows a specific sequence to ensure dependencies are met before background loops begin:
Start(ctx, flags, config)
  |
  |-- 1. Initialize Tracing (Observability)
  |-- 2. Connect to Database & Validate/Migrate Schema
  |-- 3. Initialize Git Provider & Start Polling
  |
  |-- 4. Launch Concurrent Goroutines (if enabled):
  |     |--> [Migration] Periodic Regression Migration
  |     |--> [Config]    LUCI Config Import Routine
  |     |--> [Cache]     Redis ParamSet Refresh Routine
  |     |--> [Deletion]  Data Retention / TTL Cleanup
  |
  |-- 5. Block (select {})
Because the maintenance service runs separately from the frontend or ingest processes, heavy operations (like schema migration or massive deletions) do not steal CPU or IO cycles from user-facing requests.

The deletion module provides a background maintenance service responsible for enforcing data retention policies within the Skia Perf system. It specifically targets the cleanup of aged regression data and their associated shortcuts to ensure the database remains performant and focused on relevant recent history.
In the Perf system, regressions (detections of performance changes) and shortcuts (references to specific sets of traces) accumulate over time. To maintain database health, this module implements a Time-To-Live (TTL) policy. Currently, the system is hardcoded to an 18-month retention period.
The module operates by periodically scanning the database for regressions older than this TTL, identifying the specific database keys (commit numbers and shortcut IDs), and removing them in atomic batches.
deleter.go)The Deleter is the central coordinator. It interacts with both the regression.Store and the shortcut.Store. Its primary responsibility is to bridge the two stores; since shortcuts are often referenced by regression entries, they should be cleaned up together to prevent orphaned data or broken references in the UI.
The deletion logic uses the timestamp of the “step point” (the point in time where a performance shift occurred) within a regression's cluster summary to determine eligibility. If the timestamp of a regression's Low or High cluster is older than 18 months relative to the current time, it is marked for deletion.
Instead of a single massive delete operation—which could lock database tables and degrade performance—the module uses a “batching” approach.
Deletion candidates are accumulated until the configured shortcutBatchSize is met.

The RunPeriodicDeletion method establishes a long-running goroutine. It uses a ticker to trigger DeleteOneBatch at a regular iterationPeriod. This allows the maintenance to run continuously in the background at a slow, steady pace, eventually catching up to the TTL window without causing spikes in database load.
The following diagram illustrates how the background process manages the steady cleanup of data:
RunPeriodicDeletion(period, batchSize)
  |
  | (Wait for 'period')
  |-----> DeleteOneBatch(batchSize)
            |
            |-- 1. Get Oldest Commit Number
            |-- 2. Scan Range [oldest, oldest + batchSize]
            |-- 3. Filter for regressions older than 18 months
            |-- 4. If Batch not full, extend range and repeat step 2
            |
            |-- 5. Open Database Transaction
            |-- 6. Delete Regressions by Commit ID
            |-- 7. Delete Shortcuts by ID
            |-- 8. Commit Transaction
  |
  | (Wait for next 'period')
  |-----> ...
- deleter.go: Contains the core logic for calculating the 18-month cutoff, scanning the regression store, and executing the transactional deletes.
- deleter_test.go: Provides integration tests using a test database (Spanner) to verify that only data older than the TTL is removed and that the batching logic correctly identifies eligible records.

The notify module is a high-level orchestration layer responsible for transforming detected performance regressions into human-readable alerts and delivering them to various destinations. It decouples the “what” of a regression (statistical data and commit history) from the “how” (formatting and transport).
The notification system follows a pipeline where raw detection data is first gathered into a common metadata format, then passed to a provider to be formatted into a specific message (e.g., HTML or Markdown), and finally handed off to a transport layer for delivery (e.g., Email or Issue Tracker).
This modular design allows the Perf instance to support diverse workflows, from rich HTML emails to automated issue-tracker filings.
The Notifier interface, defined in notify.go, is the primary entry point. The defaultNotifier implementation manages the flow of data. When a regression is found (or goes missing), it:
1. Gathers the statistical and commit context into a RegressionMetadata object.
2. Queries the tracestore and filesystem to find “source” links (e.g., links to the raw JSON or log files that produced the data point).
3. Uses the NotificationDataProvider to turn metadata into a subject and body.
4. Hands the formatted message off to a Transport.
- NotificationDataProvider: Determines what fields are available for the message. A specialized provider (android_notification_provider.go) adds logic for Android-specific metadata, such as extracting Build IDs from commit subjects and formatting test class/method strings.
- Formatter: Handles the template rendering.
  - HTML (html.go): Used primarily for rich emails.
  - Markdown (markdown.go): Used for issue trackers and includes custom template functions like buildIDFromSubject to parse specific URL structures.
- Email (email.go): Sends multi-part emails. It supports “threading references,” allowing “Regression Missing” notifications to appear as replies to the original “Regression Found” alert.
- Issue Tracker (issuetracker.go): Creates and updates bugs via the Google Issue Tracker (Buganizer) API. It automatically sets priorities, severities, and components based on the alert configuration.
- Chromeperf (chromeperfnotifier.go): A specialized transport that doesn't send a message to a human, but instead reports the anomaly to the Chromeperf service for cross-platform tracking.
- No-op (noop.go): A null-object pattern implementation for environments where notifications should be suppressed.

This workflow illustrates how a statistical anomaly becomes a developer-facing bug:
[ Detection Engine ] -> RegressionFound(commit, alert, cluster)
        |
        v
[ defaultNotifier ]
        |-- getRegressionMetadata() --> Fetches Git hashes & source links
        |-- GetNotificationData()   --> Executes Go Templates (HTML/Markdown)
        |-- SendNewRegression()     --> Calls Transport (Email/API)
        v
[ Transport Layer ]
        |-- IssueTracker: Creates Bug #1234
        |-- Email: Sends message with Message-ID <abc@perf>
        v
[ Persistence ] --> Notification ID (#1234 or <abc@perf>) is saved to track history
To avoid “alert fatigue” and keep histories clean, the system uses a threadingReference.
1. Initial Regression Found -> Transport returns "ID-123"
2. Performance recovers -> RegressionMissing(threadingReference="ID-123")
3. Transport uses ID-123 -> Adds a comment to Bug #123 OR Sends a Reply-To Email
The use of Go's text/template and html/template allows instance administrators to customize notification content without changing Go code. The config.NotifyConfig allows specifying custom body and subject templates in the instance's JSON configuration.
Because different projects use different git mirrors (e.g., Gerrit, GitHub, internal Gitiles), the commitrange.go logic uses a configurable commitRangeURITemplate. This allows the notification to link to a side-by-side diff (using {begin} and {end} placeholders) rather than just a single commit landing page.
By defining common structures in the /common submodule, the system ensures that the detection logic remains pure and doesn't need to know if the final output is a Markdown table or an HTML list. This also simplifies testing, as mocks can return standard NotificationData regardless of the transport being tested.
The notify/common module defines the core data structures used across the Perf regression notification system. It acts as a bridge between the detection engine and the various notification delivery mechanisms (such as email, issue trackers, or chat platforms).
By centralizing these structures, the system ensures that different notification formatters have access to a consistent set of metadata regardless of the specific alert configuration or the final destination of the message.
The RegressionMetadata structure is the primary data container passed to notification formatters. It is designed to encapsulate the full context of a detected performance change, allowing for the generation of rich, actionable reports.
The inclusion of both the RegressionCommit and the PreviousCommit is critical for providing a “diff” view, enabling users to see exactly what changed in the codebase to cause the regression.
Key components of the metadata include:
- InstanceUrl provides a direct path back to the Perf instance for deep-flow analysis.
- Cl (Cluster Summary) and Frame (UI Frame Response) contain the statistical backing for the regression, allowing notifications to include high-level summaries of the data points involved.
- TraceID and specific commit links allow notifications to pinpoint exact changes in high-cardinality data environments.

While RegressionMetadata contains the raw information about a performance change, NotificationData represents the output of the formatting process. It separates the presentation layer from the delivery layer.
The common module facilitates the transition of data through the following conceptual pipeline:
[ Detection Engine ]
        |
        | Identifies anomaly and collects:
        |   - Alert Config
        |   - Commit Range
        |   - Cluster Data
        v
[ RegressionMetadata ]  <--- (Defined in notify/common)
        |
        | Passed to a Formatter (e.g., HTML/Markdown)
        v
[ NotificationData ]    <--- (Defined in notify/common)
        |
        | Passed to a Transport (e.g., Email/Issue Tracker)
        v
[ Final Recipient ]
This separation ensures that the logic for what a regression is (metadata) is kept distinct from how it is described to a human (notification data), allowing the system to easily support new notification channels by simply implementing new formatters that consume these common structures.
The go/notify/mocks module provides a suite of autogenerated mock implementations for the core interfaces used within the Perf notification system. These mocks are built using testify/mock and are designed to facilitate unit testing of components that handle regression alerts, data formatting, and message delivery without requiring live connections to external services (like email servers or issue trackers).
The notification system in Perf follows a decoupled architecture where data retrieval, message construction, and transport delivery are handled by distinct components. This mock package allows developers to:
- Verify regression-alert handling without live connections to external services.
- Test a Notifier implementation by mocking the Transport layer.

The module mirrors the primary interfaces found in the parent notify package:
The Notifier mock simulates the high-level orchestration of notifications. It is responsible for deciding what content should be sent based on regression events.
It mocks RegressionFound and RegressionMissing, which typically involve complex arguments such as ClusterSummary, FrameResponse, and Commit data. This allows tests to verify that the notification system receives the correct metadata when a performance anomaly is detected.

The Transport mock represents the delivery mechanism (e.g., Email, Monorail/Issue Tracker).
Tests can inspect the body, subject, and threadingReference (used for message chaining/threading) to ensure the outgoing message is formatted correctly.

The NotificationDataProvider mock handles the assembly of the data payload required for a notification.
It returns canned NotificationData based on RegressionMetadata. This is crucial for testing how different regression scenarios (found vs. missing) are transformed into user-facing information.

The following diagram illustrates how these mocks are typically used in a unit test for a component that manages regression life cycles:
[ Test Suite ]
      |
      | 1. Setup expectations on Notifier Mock
      v
[ System Under Test (e.g., Regression Detector) ]
      |
      | 2. Detects anomaly -> Calls RegressionFound()
      v
[ Notifier Mock ]
      |
      | 3. Returns a canned "NotificationID"
      v
[ Test Suite ]
      |
      | 4. Assert that Notifier was called with
      |    the expected Commit and Alert objects.
All mocks in this package include a New[InterfaceName] helper function. These helpers automatically register a cleanup function with the *testing.T instance, ensuring that AssertExpectations is called at the end of the test to verify that all defined mock calls were actually executed.
The notifytypes module serves as the central source of truth for defining how the Perf system communicates regression alerts and performance data to external consumers. Rather than scattering string constants or logic throughout the codebase, this module provides a typed schema that dictates both the medium of notification (the “how”) and the context of the data being sent (the “what”).
The module is built around two primary type definitions that decouple the notification logic from the underlying alert detection systems.
The Type abstraction defines the destination and format of a notification. This is used by the system to instantiate the correct notification client. The design supports a variety of delivery methods:
- HTMLEmail and MarkdownIssueTracker cater to human consumption, specifying not just the destination but the markup language required for clear presentation.
- ChromeperfAlerting and AnomalyGrouper represent automated workflows. Instead of sending a message to a human, these types signal the system to push structured data into external tracking services or internal grouping logic for further automated analysis.
- The None type allows for a “dry-run” or silenced state where regressions are detected and logged but no external side effects are triggered.

While the Type defines the transport, the NotificationDataProviderType defines the source-specific schema of the data.
In a multi-tenant environment like Perf, different projects (e.g., standard Skia vs. Android) require different metadata to be included in an alert. For example, an AndroidNotificationProvider might bundle specific build IDs or device characteristics that are irrelevant to other projects. By using this type, the notification engine can select the appropriate data formatter to bridge the gap between generic regression data and project-specific requirements.
The constants in this module act as the glue between alert configuration and the notification dispatcher:
[ Alert Configuration ]
          |
          v
[ Notification Dispatcher ] <--- [ notifytypes.Type ]
          |                        (e.g., HTMLEmail)
          |
          +---------------------> [ Notification Data Provider ]
          |                        (e.g., AndroidNotificationProvider)
          v
[ External Systems ]
  (Email, Issue Tracker, Chromeperf)
By centralizing these types, the system ensures that adding a new notification destination or a new specialized data provider only requires an update to this registry, providing a consistent interface for all alerting components in the Perf ecosystem.
The perf-tool is a comprehensive command-line interface (CLI) designed for administrative and diagnostic interactions with Skia Perf. It serves as the primary tool for managing Perf instances, providing capabilities that span database maintenance, data lifecycle management, and infrastructure provisioning.
The tool bridges the gap between local configuration files and remote cloud resources (GCS, PubSub, CockroachDB), allowing developers and SREs to perform complex operations like re-ingesting historical data, migrating alerts between instances, and debugging specific trace data without needing to write custom scripts.
The project is structured to separate the CLI definition (routing and flags) from the business logic.
- Core business logic lives in the application module. By defining an Application interface, the CLI implementation in main.go remains clean and highly testable. This abstraction allows the CLI to focus on flag parsing and environment setup while delegating complex workflows to the application layer.
- Every command takes a --config_filename flag. The tool is designed to treat the InstanceConfig (JSON/TOML) as the definitive source of truth for the environment it is interacting with. This ensures that operations like PubSub creation or Database restores are always scoped to the correct instance.
- Backups use structured encoding (gob inside .zip files) rather than raw SQL dumps, providing versioned, structured backups that include the necessary metadata.

CLI Entry Point (main.go)

This file defines the user interface of the tool using the urfave/cli framework. Its responsibilities include:
- Initializing logging (sklog) and instantiating the TraceStore or InstanceConfig based on the provided flags.
- Delegating execution to the application module.

Application Logic (/application)

This module contains the heavy lifting for all functional areas:
- Database management: backup and restore of Alerts, Shortcuts, and Regressions. It manages the complexity of batching large datasets and maintaining referential integrity (e.g., ensuring a regression backup includes the shortcuts it references).
- Ingestion: a validate sub-command to check ingestion files against the schema and parser logic locally.
- Trace access: direct queries against the TraceStore. This allows users to list trace IDs matching a specific query or export raw performance data for specific commit ranges into JSON files for external analysis.

This workflow demonstrates how the tool extracts data from the storage layer for external use.
[ CLI: traces export ] -> [ Instance Config ] -> [ TraceStore (BigTable/CockroachDB) ]
| | |
|-- 1. Parse Query -| |
| |
|------- 2. Query Commits [Begin, End] ------->|
|
[ Local JSON File ] <--- 4. Encode & Write <--- 3. Retrieve Trace Values
When setting up a new Perf instance or updating an existing one, the tool synchronizes the cloud environment.
[ InstanceConfig ] [ Google Cloud PubSub ] [ Local State ]
| | |
1. Read Topics Config ------------>| |
| | |
2. Check Existence <---------------| |
| |
3. Create Missing Topics/Subscriptions ------------------------>|
| |
4. Set Dead Letter Policies/ACK Deadlines --------------------->|
The application module serves as the central orchestration layer for the perf-tool CLI. It encapsulates the high-level business logic and complex workflows required to manage a Skia Perf instance, acting as a bridge between the command-line interface and the underlying storage, ingestion, and cloud infrastructure systems.
By centralizing these operations, the module ensures that administrative tasks—such as database migrations, data re-ingestion, and trace debugging—are executed consistently and safely across different environments (local vs. production).
The module is designed around the Application interface, which promotes testability and provides a clean abstraction for the CLI handlers.
- Backups use gob encoding wrapped in .zip archives. This choice allows for versioned, structured backups that include necessary metadata and allow for targeted restoration.
- The IngestForceReingest logic uses hourly directory partitioning to efficiently scan large buckets, and it leverages PubSub to trigger the standard ingestion pipeline, ensuring that "forced" data follows the same processing path as live data.
- The IngestValidate component performs a two-stage check: first against the schema to ensure structural correctness, and second through the actual parser to verify that keys, measurements, and links are generated as expected before a user commits to a large-scale ingestion.
- Database operations are managed through functions like DatabaseBackup* and DatabaseRestore*; these components interact with builders to instantiate the appropriate stores (Alert, Shortcut, or Regression) based on the provided InstanceConfig.
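The gob-inside-zip strategy can be illustrated with a small round-trip. The Alert struct and the file layout inside the archive are simplified stand-ins for the real stores:

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/gob"
	"fmt"
)

// Alert is a stand-in for the real Alert configuration struct.
type Alert struct {
	ID    int64
	Query string
}

// backupAlerts sketches the gob-inside-zip encoding strategy: each logical
// table becomes one file inside the archive, holding a gob stream of records.
func backupAlerts(alerts []Alert) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	f, err := zw.Create("alerts")
	if err != nil {
		return nil, err
	}
	enc := gob.NewEncoder(f)
	for _, a := range alerts {
		if err := enc.Encode(a); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// restoreAlerts reads the gob stream back out of the archive.
func restoreAlerts(data []byte) ([]Alert, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var alerts []Alert
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		dec := gob.NewDecoder(rc)
		var a Alert
		for dec.Decode(&a) == nil { // read records until EOF
			alerts = append(alerts, a)
		}
		rc.Close()
	}
	return alerts, nil
}

func main() {
	data, _ := backupAlerts([]Alert{{ID: 1, Query: "arch=arm"}})
	restored, _ := restoreAlerts(data)
	fmt.Println(len(restored), restored[0].Query)
}
```

Because gob is self-describing, records written by an older build of the tool remain readable after struct fields are added, which is part of the motivation for avoiding raw SQL dumps.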
The module provides tools to inspect the TraceStore directly from the command line.
- TracesList: Performs queries against specific tiles to debug trace IDs and values.
- TracesExport: Facilitates data extraction for external analysis. It maps query strings to internal trace names and exports the resulting values as JSON, supporting both file output and standard output.
- ConfigCreatePubSubTopicsAndSubscriptions: Automates the creation of the ingestion infrastructure. It handles complex configurations like Dead Letter Policies and acknowledgement deadlines, ensuring the cloud environment matches the local configuration file.
- IngestForceReingest: Allows for "time-traveling" data. By scanning GCS objects within a date range and republishing their metadata to the ingestion topic, it triggers the system to re-process historical data (e.g., after a parser bug fix).

This workflow illustrates how the module triggers the reprocessing of historical performance data.
[ User Input ] [ GCS Bucket ] [ PubSub Topic ] [ Perf Ingestor ]
| | | |
1. Start/End Dates --------> | | |
| 2. List Objects | |
| <--------------------| | |
| | |
3. Path Filter Apply ----> (Filter Files) | |
| | |
4. Publish Message (Object Metadata) --------------> | |
| ---- 5. Notify ----> |
|
6. Re-parse File
Backing up regressions requires a “lookup and include” strategy for shortcuts.
[ Regression Store ] [ Perf Git ] [ Shortcut Store ] [ ZIP Archive ]
| | | |
1. Fetch Regressions (Batch) | | |
| ---- 2. Get Dates -> | | |
| <--------------------| | |
| | |
3. Extract Shortcut IDs --------------------------------> | |
| | ---- 4. Fetch -----> |
| | <--------------------|
| |
5. Encode Regressions + Encoded Shortcuts -------------------------------------> |
The /go/perf-tool/application/mocks module provides mock implementations of the core application logic interfaces for the perf-tool CLI. These mocks are generated using mockery and are built upon the testify framework, facilitating unit testing of command-line interactions and high-level workflows without requiring a live database, cloud infrastructure, or real file system mutations.
The primary goal of this module is to decouple the CLI's user interface (command parsing and flag handling) from the actual execution of heavy operations like database backups, trace exports, and ingestion management.
By using mocks, developers can:
- Verify that command-line flags (e.g., --start, --stop, or --dryrun) are correctly parsed and passed to the underlying application logic.
- Run tests for perf-tool management commands in milliseconds, avoiding the overhead of connecting to BigTable or SQL backends.

The Application Mock (Application.go)

The Application struct is the central mock in this package. It mirrors the interface used by the perf-tool application layer, covering several functional domains of the Perf system:
- Ingestion: IngestForceReingest and IngestValidate. This is critical for testing the logic that triggers data reprocessing across specific time ranges or validates ingestion file formats.
- Traces: TracesExport and TracesList. These facilitate testing how the tool queries TraceStore and writes results to output files or standard output, utilizing types.CommitNumber and types.TileNumber for range-based logic.
- Configuration: The ConfigCreatePubSubTopicsAndSubscriptions mock allows testing the initialization commands that provision Google Cloud PubSub resources based on the provided InstanceConfig.

When testing a new command in perf-tool, the mock is used to intercept calls from the command-line handlers.
[ CLI Command ] ----> [ Application Interface (Mock) ] ----> [ Test Assertions ]
| | |
1. User runs: 2. Mock records call: 3. Test verifies:
"perf-tool ingest..." "IngestForceReingest(true, ...)" - Was it called?
- Were flags correct?
The NewApplication function simplifies this by automatically registering the mock with the testing.T cleanup routine, ensuring that AssertExpectations is called when the test finishes to verify that all expected calls were made.
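The recording pattern behind this setup can be sketched without the real mockery output: a hand-rolled mock stands in for the generated Application mock, and the CLI handler shown is hypothetical:

```go
package main

import "fmt"

// Application is a simplified stand-in for the perf-tool interface; the real
// interface has many more methods and different signatures.
type Application interface {
	IngestForceReingest(dryRun bool, bucket string) error
}

// MockApplication records calls, mimicking what a mockery-generated
// testify mock does under the hood.
type MockApplication struct {
	Calls []string
}

func (m *MockApplication) IngestForceReingest(dryRun bool, bucket string) error {
	m.Calls = append(m.Calls, fmt.Sprintf("IngestForceReingest(%v, %q)", dryRun, bucket))
	return nil
}

// runIngestCommand is a hypothetical CLI handler that delegates to the
// Application interface, so tests never touch real infrastructure.
func runIngestCommand(app Application, dryRun bool, bucket string) error {
	return app.IngestForceReingest(dryRun, bucket)
}

func main() {
	m := &MockApplication{}
	runIngestCommand(m, true, "gs://perf-data")
	fmt.Println(m.Calls[0])
}
```

A real test would instead call mocks.NewApplication(t) and set expectations with testify's On/Return, but the flow — handler calls interface, test inspects recorded calls — is the same.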
The perfclient module provides a standardized interface for sending performance benchmarking data to Skia's Perf ingestion system. It functions as a specialized wrapper around Google Cloud Storage (GCS), abstracting the complexities of file naming conventions, data compression, and directory structuring required by the Perf ingestion engine.
The module is designed around the principle of deterministic, time-series organization. The Perf ingestion system expects data to be organized in GCS using a specific hierarchy based on time and task metadata. By centralizing this logic in perfclient, different Skia services can ensure that their performance results are stored in a way that the ingestion service can automatically discover and process them.
Key implementation choices include:
- Transparent compression: payloads are gzip-compressed and uploaded with a Content-Encoding: gzip header, allowing the data to be served uncompressed if requested while remaining compressed at rest.
- Time-based partitioning: objects are written into a YYYY/MM/DD/HH folder structure. This allows the ingestion engine to poll specific time-based slices of data efficiently rather than scanning the entire bucket.

The primary entry point is the ClientInterface. It defines the contract for pushing data to Perf. This abstraction allows other modules to use a MockPerfClient during unit testing, avoiding actual GCS network calls.
The concrete implementation of the interface. It holds a reference to a gcs.GCSClient and a basePath (the root directory in the bucket where all performance data should reside).
The PushToPerf method executes the following logic:
1. Serializes the format.BenchData struct into JSON.
2. Calls objectPath to determine the exact destination in GCS.
3. Uploads the gzip-compressed payload with the appropriate headers (Content-Encoding and Content-Type).

Data Flow:
[BenchData Struct]
|
v
[JSON Marshaling] -> [MD5 Hashing]
| |
v v
[GZIP Compression] -> [Path Construction]
| |
+----------+---------+
|
v
[GCS Upload (with gzip headers)]
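The compression step of this flow can be sketched as follows; gzipBytes is an illustrative helper, and the upload options mentioned in the comment are how the gzip headers are conventionally expressed on GCS objects:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipBytes sketches the compression step from the flow above: the JSON
// payload is gzipped in memory before upload, and the stored object is later
// served with a Content-Encoding: gzip header.
func gzipBytes(b []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(b); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	payload := []byte(`{"bench": {"frame_time_ms": 4.2}}`)
	gz, _ := gzipBytes(payload)
	// A hypothetical upload call would then set object metadata such as
	//   ContentEncoding: "gzip", ContentType: "application/json"
	fmt.Println("payload bytes:", len(payload), "compressed bytes:", len(gz))
}
```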
Path Construction (objectPath)

This function is critical for maintaining compatibility with the Perf ingestion system. It constructs paths following this pattern: [basePath]/[YYYY]/[MM]/[DD]/[HH]/[folderName]/[filePrefix]_[hash]_[timestamp].json
The perfresults module is a Go library and set of tools designed to bridge the gap between Chromium's distributed build/test infrastructure (LUCI) and the Skia Perf ingestion system. Its primary responsibility is the automated discovery, retrieval, and parsing of performance benchmark results—typically stored as JSON files in Content Addressed Storage (CAS)—produced by Swarming tasks.
The module provides a unified interface to navigate the hierarchy of Buildbucket builds and Swarming tasks to extract telemetry data for long-term storage and trend analysis.
The architecture follows a “discovery-to-normalization” pipeline. Instead of requiring a direct path to a result file, the module starts with a high-level Build ID and programmatically resolves the underlying storage locations.
[ Buildbucket ID ]
        |
        | (Lookup Build Metadata)
        v
[ Swarming Parent Task ]
        |
        | (Identify Shards/Children)
        v
[ Child Task IDs ]
        |
        | (Query CAS Outputs)
        v
[ RBE CAS Digests ]
        |
        | (Fetch & Merge JSONs)
        v
[ Internal PerfResults ] ----> [ Ingestion / CLI / Workflows ]
The Loader (perf_loader.go)

The loader is the central orchestrator. It encapsulates the logic required to communicate with multiple LUCI services in the correct sequence.
- It uses an rbeProvider to generate RBE clients on the fly based on the specific CAS instance identified in the task metadata, ensuring it can fetch data across different infrastructure silos (e.g., chrome-swarming vs chromium-swarm).

The Parser (perf_results_parser.go)

This component handles the "Histogram Set" JSON format. Because these files can be large (10MB+), the parser is designed for efficiency:
- Streaming: it uses a json.Decoder to process entries one by one, reducing memory footprint compared to loading the entire file into a byte slice.
- Normalization: parsed entries are collapsed into the PerfResults struct, which maps a TraceKey (comprising Chart, Unit, Story, Architecture, and OS) to a Histogram (a collection of raw sample values).
- Aggregation: each histogram exposes statistics such as mean, max, min, std, and count.

Service Clients (buildbucket.go, swarming.go, rbecas.go)

These files provide specialized wrappers around LUCI and RBE protocol buffer clients:
- buildbucket.go: Retrieves BuildInfo, including the Git revision and "Machine Group" (e.g., ChromiumPerf). This metadata is critical for placing the results on the correct timeline in Skia Perf.
- swarming.go: Locates the child tasks (shards) of a build via parent_task_id.
- rbecas.go: Fetches result files named perf_results.json, even if they are nested within benchmark-specific subdirectories.

The project is extended by several specialized submodules that handle specific parts of the performance lifecycle:
- ingest: Translates the internal PerfResults structures into the specific JSON schema and Google Cloud Storage (GCS) path hierarchy required by the Skia Perf ingester.
- cli: A command-line tool that allows developers or CI scripts to manually trigger the loading and transformation of results for a given Buildbucket ID.
- workflows: Contains Temporal workflow definitions for managing long-running, fault-tolerant ingestion jobs. It ensures that if a network call fails during the multi-step discovery process, the job can resume without losing state.
- testdata: A comprehensive suite of recorded gRPC/HTTP interactions and sample JSON files, allowing for deterministic testing of the entire pipeline without live infrastructure access.

Notable design decisions:

- Shard merging: when multiple shards produce results, the Loader automatically merges them. If two histograms share the same TraceKey, their SampleValues are concatenated. This treats sharded test execution as a single logical benchmark run.
- Secure by default: in the ingest submodule, if a builder's configuration cannot be explicitly verified as "public," the system defaults to storing results in non-public internal buckets to prevent accidental data leaks.
- Trace identity: the TraceKey includes Architecture and OSName because these are derived from the Swarming bot dimensions. This ensures that even if two different machines run the same benchmark story, their results are stored as distinct traces if their hardware/OS profiles differ.

The perfresults/cli module provides a command-line tool designed to bridge the gap between Buildbucket task execution and the Skia Perf ingestion system. Its primary purpose is to retrieve raw performance data associated with a specific Buildbucket build, transform it into a standardized format suitable for Skia Perf, and persist it as local JSON files.
This tool is particularly useful in CI/CD pipelines where performance benchmarks are executed as sub-tasks of a main build, and those results need to be extracted and prepared for long-term storage and analysis.
The CLI acts as an orchestrator between the perfresults loading logic and the ingest formatting logic. The design favors a “pull and transform” model:
- Loading is delegated to the perfresults package to abstract the complexity of communicating with Buildbucket and locating relevant benchmark artifacts.
- Transformation is delegated to the ingest package.

[ Buildbucket ID ]
        |
        v
+--------------+      +-----------------------+
| perfresults  |----->| Raw Benchmark Results |
|   Loader     |      | (Memory Structures)   |
+--------------+      +-----------------------+
        |
        v
+--------------+      +-----------------------+      +-----------------+
|   ingest     |<-----| Add Metadata:         |      | Output Files:   |
|  Converter   |      |  - Git Revision       |----->| bench_123.json  |
+--------------+      |  - Buildbucket Link   |      | bench_456.json  |
                      +-----------------------+      +-----------------+
The Entry Point (main.go)

The entry point handles command-line flag parsing and coordinates the execution flow. It is responsible for:
- Invoking the results loader (perfresults.Loader) with specific benchmark data.
- Writing the generated filenames to stdout, so the CLI allows parent scripts or automation tools to easily identify and process the resulting JSON files.

The CLI serves as the glue between several specialized modules:
- perf/go/perfresults: Provides the Loader which handles the heavy lifting of finding and downloading artifacts from Buildbucket.
- perf/go/perfresults/ingest: Contains the logic to translate internal Go structures into the specific JSON schema required by the Skia Perf ingestion pipeline.

The perfresults/ingest module provides the logic necessary to transform raw performance results into a structured format suitable for the Skia Perf ingestion pipeline and determines the appropriate storage locations within Google Cloud Storage (GCS).
It acts as a bridge between the data structures defined in the perfresults module (which represent the raw telemetry/benchmark output) and the format.Format expected by the Skia Perf ingester.
The ingestion process involves two primary responsibilities:
The perfresults format often contains a collection of sample values for a single measurement. However, for charting and time-series analysis, these samples need to be reduced to specific statistical points (e.g., mean, max, min).
Instead of choosing a single representative value, the module utilizes perfresults.AggregationMapping to generate multiple traces for a single histogram. Each aggregation (like “avg” or “std”) is converted into a format.SingleMeasurement. This allows users to toggle between different statistical views of the same benchmark data in the Perf UI.
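A sketch of how one histogram's samples fan out into several measurements; the mapping contents here are illustrative, not the exact perfresults.AggregationMapping:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// aggregationMapping mirrors the idea described above: every statistic
// becomes its own trace. The set of statistics is illustrative.
var aggregationMapping = map[string]func([]float64) float64{
	"mean": func(v []float64) float64 {
		s := 0.0
		for _, x := range v {
			s += x
		}
		return s / float64(len(v))
	},
	"max": func(v []float64) float64 {
		m := math.Inf(-1)
		for _, x := range v {
			m = math.Max(m, x)
		}
		return m
	},
	"count": func(v []float64) float64 { return float64(len(v)) },
}

// expand turns one histogram's samples into one measurement per statistic,
// which is what produces multiple traces for a single benchmark chart.
func expand(samples []float64) map[string]float64 {
	out := map[string]float64{}
	for name, fn := range aggregationMapping {
		out[name] = fn(samples)
	}
	return out
}

func main() {
	m := expand([]float64{1, 2, 3})
	var keys []string
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%v\n", k, m[k])
	}
}
```

Each key of the resulting map would become the statistic suffix on a distinct trace, letting the UI toggle between views of the same samples.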
Security and visibility are handled at the path generation level. The module distinguishes between “public” and “non-public” buckets based on the builder name.
- If a builder cannot be verified as public in bot_configs, the module defaults to the internal bucket (chrome-perf-non-public). This "secure-by-default" approach prevents accidental exposure of sensitive performance data.

Format Conversion (json.go)

This file handles the structural mapping between the perfresults package and the ingest/format package.
- ConvertPerfResultsFormat: This is the entry point for data transformation. It maps histogram keys (Chart, Unit, Story, Arch, OS) into the Key metadata map used by the ingester for filtering.
- toMeasurement: Processes the raw SampleValues from a histogram. It filters out invalid numerical values (Inf, NaN) before they reach the ingestion pipeline to ensure database integrity.

Path Generation (gcs.go)

This file defines the organizational hierarchy of the performance data in GCS. The path structure is designed to be easily browsable and predictable for the ingester:
gs://<bucket>/ingest/<YYYY>/<MM>/<DD>/<HH>/<MachineGroup>/<BuilderName>/<Benchmark>
- The convertTime function flattens the precision to the hour, grouping results into hourly "buckets" to optimize file discovery and ingestion batching.
- Sensible fallbacks are applied (ChromiumPerf as the default Machine Group and BuilderNone for missing builders) to ensure the path remains valid and consistent.

The typical flow of data through this module can be visualized as follows:
[Raw PerfResults] -> ConvertPerfResultsFormat() -> [format.Format Object]
|
v
[Build Metadata] -> convertPath() --------------> [GCS URI Destination]
+ |
[Timestamp] v
(Ready for Upload/Ingest)
The resulting JSON object and GCS path are then used by higher-level services to write the data to GCS, where the Skia Perf ingester will eventually pick it up for processing into the trace database.
This module serves as a centralized repository of deterministic test inputs and recorded network interactions used to verify the functionality of performance result processing. It enables the testing of complex workflows—such as fetching task metadata, parsing performance histograms, and merging result sets—without requiring active connections to external services like Buildbucket or Swarming.
The data within this module is structured to support three primary testing objectives:
API Replay and Service Mocking: The module contains recorded pRPC/gRPC interactions (captured in .json and .rpc files). These files allow the perfresults clients to simulate communication with infrastructure services. By providing pre-recorded request/response pairs, the tests can verify how the system handles various states—such as a successful build lookup, a non-existent task ID, or a complex task hierarchy—under stable, repeatable conditions.
Data Schema and Parsing Verification: Files like full.json, empty.json, and valid_histograms.json represent the expected internal schema for performance data. These are used to ensure that parsers correctly translate raw JSON inputs into internal Go structures (e.g., GenericSet, DateRange, and Histogram objects) and that diagnostic metadata is correctly associated with specific samples.
Aggregation and Logic Validation: Specialized datasets like merged.json and merged_diff.json are designed to test higher-level logic. These files provide the “before” and “after” states required to validate that the module can successfully combine multiple results or calculate differences between distinct performance runs.
The test data is organized into functional groups to reflect the multi-stage nature of performance result processing:
- Discovery mocks (FindTaskID_..., SwarmingClient_...): These files mock the discovery phase. They contain the specific metadata—such as Swarming instance names and CAS (Content Addressed Storage) digests—needed to locate where performance results are actually stored after a build completes.
- Loading edge cases (LoadPerfResults_...): These datasets cover the edge cases of the loading logic. This includes scenarios where a build exists but contains no performance data ("NoChildRuns") or where the build identification is entirely invalid.
- Performance histograms (the perftest group): These files represent the final performance metrics. They include complex diagnostic maps that link specific measurements to bot IDs, operating systems, and benchmark versions.

The data in this module facilitates the testing of the following automated discovery and parsing pipeline:
[ Build ID ] --> ( Mock Buildbucket ) --> [ Swarming Task ID ]
|
v
[ CAS Digest ] <-- ( Mock Swarming ) <--- [ Task Result ]
|
+--------> ( Mock RBE/CAS ) ------> [ histogram.json ]
|
v
[ Internal Result Set ] <--------------- ( Parser Logic )
By providing static files for every step in this chain, the module ensures that logic changes in the parser or client can be verified for correctness and backward compatibility with historical data formats.
The perfresults/workflows module contains the business logic for orchestrating the ingestion and processing of performance data within the Skia infrastructure. It leverages the Temporal framework to manage long-running, distributed tasks that require strong guarantees on state persistence and fault tolerance.
Performance data ingestion is a multi-step process involving data retrieval, validation, storage in the trace store, and triggering downstream analysis (such as regression detection). The module is built on principles of durable state persistence and fault tolerance across these steps.
The module is structured to support both the high-level workflow definitions and the granular activities they invoke.
The relationship between the orchestrating workflow and the underlying infrastructure is depicted below:
          [ Trigger Event ]
                 |
                 v
 +-----------------------------+
 |      Temporal Workflow      |
 |   (Orchestration & State)   |
 +--------------+--------------+
                |
       +--------+--------+
       |        |        |
       v        v        v
 +----------+ +----------+ +----------+
 | Activity | | Activity | | Activity |
 | (Fetch)  | | (Parse)  | | (Store)  |
 +----+-----+ +----+-----+ +----+-----+
      |            |            |
      +------------+------------+
                   |
                   v
       [ Result Finalization ]
While the workflows module defines the logic, it relies on the worker submodule to provide the execution environment. The workflows are registered with the worker at startup, allowing the worker to “claim” tasks from the Temporal task queue that match the workflow and activity names defined here. This decoupling allows the workflow logic to be updated and deployed independently of the worker's infrastructure configuration, provided the interfaces remain compatible.
The perf-upload-worker serves as the execution engine for Temporal workflows related to performance result ingestion and processing. In the context of the Perf results subsystem, this worker acts as the bridge between the Temporal orchestration engine and the actual execution of tasks, such as uploading data to storage or triggering indexing processes.
The worker is designed around the principle of externalized orchestration. Instead of embedding business logic directly into a monolith, the worker provides the compute resources to execute workflows defined elsewhere. This decoupling allows workflow definitions and worker infrastructure to be scaled, updated, and deployed independently.
The primary responsibility of this module is the lifecycle management of the Temporal worker process, managed within main.go.
- Connection: the worker connects to Temporal using flags (--host_port and --namespace). It uses a single, heavyweight client.Client instance to minimize resource overhead, as per Temporal best practices.
- Task routing: work is claimed from the queue named by --task_queue. By default, it dynamically generates a queue name based on the current system user (e.g., localhost.username), which facilitates local development and testing without interfering with production workflows.
- Observability: it installs a MetricsHandler to export Temporal-specific SDK metrics to Prometheus. This is crucial for monitoring the health of the workflow execution environment, such as worker pollers and activity execution rates.

The worker operates in a continuous loop, polling the Temporal server for work. The high-level interaction between the worker and the broader system is illustrated below:
+------------------+      +------------------+      +-----------------------+
| Temporal Server  |<--1->|   Perf Worker    |<--2->|  Workflow/Activity    |
| (Orchestrator)   |      |   (Execution)    |      |  Implementations      |
+------------------+      +------------------+      +-----------------------+
                                   |
                                   | 3. Export Metrics
                                   v
                          +------------------+
                          | Prometheus/Skia  |
                          |    Monitoring    |
                          +------------------+
- A Prometheus metrics endpoint is exposed on a local port (e.g., :8000).

The worker is packaged as a containerized application (perf_upload_worker). It relies on command-line flags to determine its environment:
- --task_queue: Defines which set of tasks this specific worker fleet will handle.
- --namespace: Segregates workflow execution within the Temporal cluster (e.g., separating "prod" from "staging").

The perfserver module provides a unified entry point for all long-running processes required to operate a Skia Perf instance. It is designed as a multi-command CLI tool that encapsulates disparate operational roles—web serving, data ingestion, maintenance, and regression detection—into a single binary.
The design of perfserver follows a “sidecar” or “micro-service” compatible architecture where a single codebase can fulfill different roles depending on the command-line arguments. This approach simplifies deployment and configuration management: instead of managing multiple distinct binaries, the same container image or executable is deployed across different service tiers (e.g., Kubernetes deployments), with only the entrypoint command changing.
The functionality is divided into several sub-commands, each targeting a specific area of the Perf lifecycle:
The Web UI (frontend)

The frontend command launches the primary web server. It is responsible for serving the user interface and handling API requests for data visualization. While it primarily focuses on the "read" path of the system, it acts as the central hub for user interaction with performance traces.
Ingestion (ingest)

The ingest command starts the data processing pipeline. Its responsibility is to monitor configured sources (such as cloud storage buckets), parse incoming performance files, and populate the TraceStore.
Regression Detection (cluster)

Despite its name, the cluster command is essentially a specialized instance of the frontend logic configured specifically for background analysis. It focuses on the "alerting" path—continuously scanning newly arrived data against configured alert definitions to identify performance regressions or improvements.
Maintenance (maintenance)

The maintenance command runs background tasks that are required for the long-term health of the database and application state. These tasks are typically "singleton" operations, meaning only one instance of the maintenance process should run per Perf instance to avoid data contention or redundant processing.
The perfserver coordinates these components through a shared configuration validation logic. Every sub-command (excluding documentation generators) follows a similar initialization pattern:
- Parse and validate the configuration, then delegate to the relevant subsystem package (e.g., perf/go/frontend or perf/go/ingest/process).

[ Configuration File ]
|
v
+-----------------------------+
| perfserver |
+-----------------------------+
|
+--- [ frontend ] ----> Serves Web UI & API
|
+--- [ ingest ] ------> Monitors Storage -> Populates TraceStore
|
+--- [ cluster ] ------> Runs Alerting & Regression Detection
|
+--- [ maintenance ] --> Database Cleanup & Singleton Tasks
- CLI generation: the server uses go/urfavecli to map configuration structures directly to command-line flags. This ensures that the CLI interface stays in sync with the underlying configuration objects defined in perf/go/config.
- Shared infrastructure: logging (via sklog), error handling (via skerr), and metrics initialization are applied consistently across all roles of the Perf system.
- Early validation: before starting ingest or maintenance, the server uses validate.InstanceConfigFromFile to catch configuration errors early, preventing partial failures in production.

The pinpoint module provides a Go client for interacting with Pinpoint, the performance regression analysis service used by Chrome and Skia. It abstracts the complexities of communicating with legacy Chromeperf and Pinpoint endpoints, allowing Skia Perf to programmatically trigger "Try Jobs" (to test specific patches) and "Bisect Jobs" (to identify the root cause of performance regressions).
This module acts as a bridge between Skia Perf and the Pinpoint service. Its primary responsibility is to translate high-level requests—such as “bisect this anomaly” or “run a try job with this patch”—into the specific URL-encoded POST requests required by Pinpoint's legacy API.
The client manages:
- Authentication: it uses auth.ScopeUserinfoEmail to authorize requests.

The Client (pinpoint.go)

The Client struct is the central entry point. It wraps an http.Client configured with necessary OAuth2 credentials and holds telemetry counters.
- CreateTryJob: Initiates a job to compare a base commit/patch against an experimental commit/patch. This is typically used to verify if a proposed fix actually improves performance before landing.
- CreateBisect: Initiates a bisection to find the specific commit that introduced a performance change. It supports two different backend paths depending on whether the configuration indicates the anomaly was fetched from the new SQL-based system.
- doPostRequest: A private helper that handles the low-level HTTP execution, response body reading, and error extraction. It specifically knows how to parse Pinpoint's error JSON format to provide actionable error messages.
- TryJobCreateRequest: Captures details like BaseGitHash, ExperimentPatch, Benchmark, and Story.
- BisectJobCreateRequest: Captures regression-specific data like StartGitHash, EndGitHash, ComparisonMagnitude, and AlertIDs.

Pinpoint's legacy API consumes parameters via URL query strings even for POST requests. The module handles this through several "build" functions that ensure required fields are present and formatted correctly.
[ Skia Perf ] --(Request Struct)--> [ Client.CreateBisect ]
                                             |
                                   [ getBisectRequestURL ]
                                      /             \
                              (New Anomaly?)   (Legacy Anomaly?)
                                    /                 \
                      [ buildPinpointURL ]     [ buildChromeperfURL ]
                                    \                 /
                           [ buildBisectRequestParams ]
                                          |
                          (dotify stories) (add "skia_perf" tags)
                                          |
                           [ Pinpoint API ] <---(POST with Params)
The module intentionally targets the legacy Pinpoint API endpoints (/api/new and /pinpoint/new/bisect). This decision necessitates the use of URL query parameter encoding for POST bodies, as seen in buildTryJobRequestURL and buildBisectRequestParams.
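A sketch of the "build params" approach using the standard library; the parameter names, validation, and endpoint shown are illustrative rather than the module's exact fields:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildParams sketches how the legacy API's POST parameters are assembled
// as URL query values before the request is issued.
func buildParams(benchmark, story, startHash, endHash string) (url.Values, error) {
	if benchmark == "" || startHash == "" || endHash == "" {
		return nil, fmt.Errorf("missing required field")
	}
	v := url.Values{}
	v.Set("benchmark", benchmark)
	v.Set("story", story)
	v.Set("start_git_hash", startHash)
	v.Set("end_git_hash", endHash)
	v.Set("tags", `{"origin":"skia_perf"}`)
	return v, nil
}

func main() {
	v, _ := buildParams("speedometer3", "news.site", "aaa111", "bbb222")
	// The encoded values are appended to the endpoint URL for the POST.
	fmt.Println("https://pinpoint.example/pinpoint/new/bisect?" + v.Encode())
}
```

Validating required fields before encoding is what lets the client fail fast with a clear error instead of a cryptic 400 from the legacy endpoint.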
Story Name Transformation (dotify)

Pinpoint internally expects story names to use dot notation (e.g., story.name) rather than underscores (e.g., story_name), which are common in other parts of the Skia ecosystem. The dotify function automatically handles this transformation to prevent job submission failures.
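A minimal version of the described transformation (the real dotify may apply more nuanced rules than a blanket replacement):

```go
package main

import (
	"fmt"
	"strings"
)

// dotifySketch rewrites underscores in story names to Pinpoint's expected
// dot notation, mirroring the behavior described above.
func dotifySketch(story string) string {
	return strings.ReplaceAll(story, "_", ".")
}

func main() {
	fmt.Println(dotifySketch("story_name_v2"))
}
```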
Instead of returning generic HTTP errors, extractErrorMessage attempts to parse the JSON response from Pinpoint to find a specific error field. If Pinpoint returns a 400 or 500 status code with a message like {"error": "benchmark not found"}, this module ensures that specific string is propagated back to the caller.
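The fallback behavior described here can be sketched as follows; the helper name is illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractErrorSketch mimics the described behavior: prefer the "error" field
// from Pinpoint's JSON error body, falling back to the raw body otherwise.
func extractErrorSketch(body []byte) string {
	var resp struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal(body, &resp); err == nil && resp.Error != "" {
		return resp.Error
	}
	return string(body)
}

func main() {
	fmt.Println(extractErrorSketch([]byte(`{"error": "benchmark not found"}`)))
	fmt.Println(extractErrorSketch([]byte(`internal server error`)))
}
```

Surfacing "benchmark not found" instead of a bare HTTP 400 is what makes these failures actionable for callers.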
The function getBisectRequestURL uses the config.Config.FetchAnomaliesFromSql toggle to decide which legacy endpoint to hit. This allows the system to support a transition period between old Chromeperf-managed anomalies and newer SQL-managed anomalies without breaking the bisection workflow.
The pivot module provides functionality to transform and aggregate Performance DataFrames. It allows users to group traces by specific keys and apply mathematical operations to summarize data, similar to a “Pivot Table” in a spreadsheet or a GROUP BY clause in SQL.
In Perf, data is typically represented as a series of traces (floating-point arrays) identified by a set of parameters (e.g., arch=arm, config=8888). The pivot module allows you to “collapse” these traces based on a subset of those parameters.
For example, if you have traces for various configurations across different architectures, you can pivot by arch to see the aggregate performance of arm vs. intel, regardless of the specific configuration.
The transformation is governed by a Request struct which defines three things:
The module supports several mathematical operations for both grouping and summarization:
The pivoting process follows a structured pipeline:
The module identifies all unique combinations of the keys provided in GroupBy that exist in the DataFrame. It then maps every existing trace in the input DataFrame to one of these groups. If a trace does not contain all the keys specified in GroupBy, it is excluded from the result.
For each group, the Operation (e.g., Sum) is applied across all traces in that group. This results in one trace per group. The trace ID for this new trace contains only the keys specified in the GroupBy list.
Input Traces:
  {arch: arm,   config: 8888} -> [1, 0, 0]
  {arch: arm,   config: 565 } -> [0, 2, 0]
  {arch: intel, config: 8888} -> [1, 1, 1]

Pivot (GroupBy: ["arch"], Operation: Sum):
  {arch: arm}   -> [1+0, 0+2, 0+0] -> [1, 2, 0]
  {arch: intel} -> [1, 1, 1]
If Summary operations are provided, the module further transforms the aggregated traces. Instead of a trace representing values over multiple commits, the resulting “trace” contains one value for each operation listed in Summary.
Intermediate Grouped Trace (from above):
  {arch: arm} -> [1, 2, 0]

Summary (Summary: [Avg, Max]):
  {arch: arm} -> [1, 2]  // Avg is 1, Max is 2
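The two stages above (GroupBy with Sum, then a Summary of Avg and Max) can be sketched with plain maps and slices. `groupSum` and `summarize` are illustrative stand-ins, not the module's API:

```go
package main

import "fmt"

type trace struct {
	params map[string]string
	values []float32
}

// groupSum groups traces by the given key and sums values element-wise,
// sketching the GroupBy + Sum stage. Traces missing the key are
// skipped, matching the exclusion rule described above.
func groupSum(traces []trace, key string) map[string][]float32 {
	out := map[string][]float32{}
	for _, t := range traces {
		g, ok := t.params[key]
		if !ok {
			continue
		}
		acc, ok := out[g]
		if !ok {
			acc = make([]float32, len(t.values))
		}
		for i, v := range t.values {
			acc[i] += v
		}
		out[g] = acc
	}
	return out
}

// summarize reduces a grouped trace to [avg, max], sketching the
// Summary stage with the operations Avg and Max.
func summarize(vals []float32) []float32 {
	sum, max := float32(0), vals[0]
	for _, v := range vals {
		sum += v
		if v > max {
			max = v
		}
	}
	return []float32{sum / float32(len(vals)), max}
}

func main() {
	traces := []trace{
		{map[string]string{"arch": "arm", "config": "8888"}, []float32{1, 0, 0}},
		{map[string]string{"arch": "arm", "config": "565"}, []float32{0, 2, 0}},
		{map[string]string{"arch": "intel", "config": "8888"}, []float32{1, 1, 1}},
	}
	grouped := groupSum(traces, "arch")
	fmt.Println(grouped["arm"], summarize(grouped["arm"]))
}
```

Running this against the worked example reproduces the grouped trace [1 2 0] for arm and the summary [1 2].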
- The module uses an opMap to link Operation enums to specific implementation functions from the go/calc and go/vec32 packages. This ensures consistency between how data is grouped and how it is summarized.
- Pivoting rebuilds the ParamSet and updates the DataFrame headers. If a summary is performed, the headers are replaced with simple offsets representing the summary columns.
- The implementation relies on query.MakeKeyFast and query.ParseKeyFast for efficient trace ID manipulation and supports context cancellation for long-running aggregations on large datasets.

The playground module serves as an interactive experimentation hub for performance data analysis. It provides a web-accessible sandbox where developers and performance engineers can validate detection algorithms, test regression logic against synthetic or real-world traces, and fine-tune sensitivity parameters without impacting production systems or persistent storage.
The primary goal of the playground is to decouple the analysis logic from the data storage and ingestion infrastructure. In the standard Perf production environment, anomaly detection is often part of a large, automated pipeline that reads from BigTable and writes to SQL databases. The playground bypasses these dependencies by accepting raw data via HTTP and processing it in-memory.
This design enables:
- Comparing different detection algorithms (e.g., AbsoluteStep vs. OriginalStep) on the same data set.

The module is structured to separate the API lifecycle from the specific mathematical analysis being performed.
Anomaly Detection (/anomaly)

This is the primary functional area of the playground. It implements a sliding-window approach to identify shifts in time-series data.
- Sliding window: the detector moves a window of size 2 * radius + 1 across the data. This localization allows the regression package to focus on finding a single “best” step within a small context, which is more robust against long-term trends or multiple shifts in a single trace.
- Dummy dataframes: the regression and dataframe packages used by Skia Perf expect complex structures (trace sets, headers, paramsets). The playground's anomaly logic constructs transient, “dummy” dataframes to wrap raw float slices, allowing the production-grade regression code to run as if it were processing a standard database query.

The following diagram illustrates how data flows from a user request through the detection engine:
[User Request (JSON)]
    | (Trace, Threshold, Radius, Algorithm)
    v
[HTTP Handler]
    |
    +-----> [Data Cleaning] (Remove missing data sentinels)
    |
    +-----> [Sliding Window Loop]
    |            |
    |            v
    |       [regression.StepFit] <--- (Analyzes N points)
    |            |
    |            +--> [Threshold Check] (Is regression > threshold?)
    |
    +-----> [Optional: Grouping/Suppression]
    |            | (Merges adjacent hits, keeps max score)
    v
[Enriched Response] (Indices, Medians Before/After, Regression Scores)
- Each detected anomaly is enriched with MedianBefore and MedianAfter values. These are calculated using vec32.RemoveMissingData to ensure that gaps in telemetry do not result in “NaN” or skewed medians, providing the user with a clean delta of the performance change.
- The detection algorithm is selected via the request's Algorithm field. This allows the playground to support any algorithm registered in the regression package without changing the API schema, making it extensible as new detection methods are developed.
- By using the PlaygroundTraceName constant, the module satisfies internal requirements for named traces while maintaining the abstraction that this data is ephemeral and not tied to a real hardware bot or test suite.

The anomaly playground module provides a sandbox environment for testing and tuning anomaly detection algorithms on performance data. It exposes an HTTP interface that allows users to submit individual data traces and receive a list of detected anomalies based on configurable parameters like window size and sensitivity thresholds.
This module acts as a bridge between the frontend and the core regression detection logic, allowing developers and users to experiment with detection settings without modifying production configurations or underlying databases.
The module implements a Sliding Window Step Fit approach. Instead of analyzing a whole trace at once, it moves a window across the data to identify localized “steps” or shifts in value.
- Handler receives a DetectRequest containing the raw trace data, a Radius (determining window size), a Threshold (sensitivity), and the specific Algorithm to use (e.g., AbsoluteStep, OriginalStep).
- The slidingWindowStepFit function iterates through the trace. At each index i, it creates a window of size 2 * radius + 1.
- Each window is wrapped in a dataframe.DataFrame and passed to regression.StepFit. This leverages the existing production logic used by the Perf service.
- If GroupAnomalies is false, every index flagged by the algorithm is returned.
- If GroupAnomalies is true, the module performs non-maximum suppression. It groups consecutive indices flagged as anomalies and only returns the one with the highest absolute regression score (the “most significant” point in the cluster).

[Trace Data]
    |
    v
[Windowing] ----> [Sub-trace (i - radius to i + radius)]
    |                   |
    |                   v
    |         [regression.StepFit Analysis]
    |                   |
    |                   v
    |<--- [Is it a "High" or "Low" Step?]
    |
    v
[Candidate Anomalies]
    |
    +--- (If GroupAnomalies=true) ---> [Non-maximum Suppression]
    |                                  (Pick best in group)
    v
[JSON Response (Anomalies)]
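A condensed sketch of the windowing and suppression steps. To stay self-contained, it uses a simple mean-difference score in place of regression.StepFit (which fits an actual step function), so all names and thresholds here are illustrative:

```go
package main

import "fmt"

// stepScore is a stand-in for regression.StepFit: the mean of points
// at and after the window midpoint minus the mean before it.
func stepScore(window []float64, mid int) float64 {
	var before, after float64
	for i, v := range window {
		if i < mid {
			before += v
		} else {
			after += v
		}
	}
	return after/float64(len(window)-mid) - before/float64(mid)
}

func abs(x float64) float64 {
	if x < 0 {
		return -x
	}
	return x
}

// detect slides a window of 2*radius+1 across trace, flags indices
// whose score exceeds threshold, then keeps only the strongest index
// in each run of consecutive hits (non-maximum suppression).
func detect(trace []float64, radius int, threshold float64) []int {
	type hit struct {
		idx   int
		score float64
	}
	var hits []hit
	for i := radius; i < len(trace)-radius; i++ {
		s := stepScore(trace[i-radius:i+radius+1], radius)
		if s >= threshold || s <= -threshold {
			hits = append(hits, hit{i, s})
		}
	}
	var out []int
	for i := 0; i < len(hits); {
		j, best := i, i
		for ; j < len(hits) && hits[j].idx == hits[i].idx+(j-i); j++ {
			if abs(hits[j].score) > abs(hits[best].score) {
				best = j
			}
		}
		out = append(out, hits[best].idx)
		i = j
	}
	return out
}

func main() {
	// A clean step from level 1 to level 5 at index 4.
	trace := []float64{1, 1, 1, 1, 5, 5, 5, 5}
	fmt.Println(detect(trace, 2, 1.0)) // [4]
}
```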
anomaly.go

Contains the core logic for the playground:
- DetectRequest / DetectResponse: Defines the JSON API. The request allows choosing the algorithm and whether to group nearby anomalies to reduce noise.
- slidingWindowStepFit: The engine that breaks the trace into windows. It constructs dummy dataframe headers to satisfy the requirements of the regression package's API, simulating a real data environment.
- Handler: The HTTP entry point. It manages the lifecycle of a request, invokes the detection, calculates metadata for each detected anomaly (like MedianBeforeAnomaly and MedianAfterAnomaly), and performs the optional grouping logic.

anomaly_test.go

Provides functional tests for the detection logic. It verifies that the playground correctly identifies simple steps (up/down), handles empty or flat traces, and correctly implements the grouping suppression logic to ensure only the most relevant points are reported.
- The module uses vec32.RemoveMissingDataSentinel when calculating medians to ensure that gaps in performance data (common in real-world traces) do not skew the statistical summary of the detected anomaly.
- Anomaly significance is measured by the Regression value returned by the step-fit algorithm. When grouping anomalies, the absolute value is used to determine which point in a cluster represents the most significant shift.
- The PlaygroundTraceName constant is used to identify traces within the temporary dataframes created for analysis, ensuring compatibility with internal Perf logic that expects named traces.

The preflightqueryprocessor module provides specialized logic for handling complex trace queries in Skia Perf, particularly during the “preflight” stage of data exploration. It manages the aggregation of parameters and trace counts across multiple data tiles, with specific support for “missing value” logic that standard database queries cannot easily represent.
In Skia Perf, a query typically filters traces based on a set of key-value pairs (e.g., benchmark=V8_Flash). However, users often need to perform “preflight” checks to understand the shape of their data before running a full analysis. This module facilitates:
- Support for the __missing__ sentinel, which allows users to query for traces that lack a specific key.

The Missing Value Sentinel (__missing__)

Standard database backends often struggle to query for the absence of a key alongside specific values in a single pass. To solve this, the module implements a “fetch-and-filter” strategy:
- When a query parameter contains the MissingValueSentinel (__missing__), the processor removes that key from the actual database query. This ensures a superset of traces (including those missing the key) is fetched from the store.
- The FilterParams function then applies the logic manually: a trace matches if the key is missing OR if the key's value is within the user's explicitly allowed set.

Preflight queries often run across multiple goroutines (one per data tile). To handle this efficiently:
- The preflightQueryBaseProcessor holds a shared sync.Mutex and paramtools.ParamSet. All subqueries and the main query share these instances to build a single unified result.
- The preflightSubQueryProcessor collects values into a local slice for each tile and then performs a single batch update to the shared state once the tile is fully processed.

The module distinguishes between the primary query and supplementary subqueries.
The following diagram illustrates how a query is handled when a sentinel value is present:
User Query: [config=gpu, arch=__missing__]
    |
    v
PrepareQueryWithSentinel()
    |-- 1. Create FilterMap: { "arch": { "allowed": [] } }
    |-- 2. Strip "arch" from Query -> Backend Query: [config=gpu]
    v
Fetch Traces from Tiles (Parallel)
    |
    |--> Tile 1 Results ----> FilterParams() ----> If Match: Add to Shared ParamSet
    |--> Tile 2 Results ----> FilterParams() ----> If Match: Add to Shared ParamSet
    v
Finalize()
    |-- Subqueries move collected values into the final shared ParamSet.
    |-- Result: Total Count + Aggregated ParamSet.
ParamSetAggregator & PreflightQueryResultCollector

These interfaces define how trace data is consumed. The main processor implements both (tracking count and params), while the subquery processor only implements aggregation.
preflightMainQueryProcessor

The primary coordinator. It uses a map[string]bool of unique trace IDs to ensure the total count is accurate even if traces overlap across tiles. It also supports SetKeysToDetectMissing, which forces the aggregator to record an empty string if a specific expected key is absent from a trace.
preflightSubQueryProcessor

Optimized for “discovery” queries. It tracks values in a local filteredValuesFromTiles map during tile processing and only populates the shared ParamSet during the Finalize phase to reduce lock overhead.
PrepareQueryWithSentinel and FilterParams

The logic engine for the __missing__ value. PrepareQueryWithSentinel modifies the query object in place (after cloning) to ensure the backend returns a broad enough dataset for the manual FilterParams logic to work correctly.
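The fetch-and-filter contract can be sketched as follows. `filterParamsSketch` is a hypothetical simplification in which an empty allowed set stands for a key that was stripped because the user asked for the __missing__ sentinel:

```go
package main

import "fmt"

// filterParamsSketch reports whether a trace's params satisfy a filter
// map. An empty allowed set means "match only when the key is absent",
// mirroring the sentinel semantics: a trace matches if the key is
// missing OR its value is in the explicitly allowed set.
func filterParamsSketch(params map[string]string, filter map[string][]string) bool {
	for key, allowed := range filter {
		val, present := params[key]
		if !present {
			continue // missing key satisfies the sentinel
		}
		ok := false
		for _, a := range allowed {
			if a == val {
				ok = true
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	// User asked for arch=__missing__, so "arch" was stripped from the
	// backend query and carries an empty allowed set here.
	filter := map[string][]string{"arch": {}}
	fmt.Println(filterParamsSketch(map[string]string{"config": "gpu"}, filter)) // true
	fmt.Println(filterParamsSketch(map[string]string{"arch": "x86"}, filter))   // false
}
```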
The progress module provides a standardized mechanism for tracking and reporting the status of long-running backend tasks in the Perf application. It bridges the gap between asynchronous server-side processes (like complex data queries or “dry runs”) and the user interface, which needs real-time feedback on task advancement.
The module is built around a “push-pull” architecture:
- Push: background workers update a Progress object with its current state, messages, and eventually, results.
- Pull: the UI polls over HTTP; the Tracker intercepts these requests and returns a serialized snapshot of the task's progress.

To ensure consistency and prevent logical errors in task reporting, the state machine is strictly enforced. Once a task transitions out of the Running state (to Finished or Error), any further attempt to modify its state or messages will result in a panic. This encourages developers to handle task finalization at the outermost calling level, ensuring a clean lifecycle.
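A minimal sketch of the enforced lifecycle. `progressSketch` is a hypothetical stand-in for the real Progress implementation, showing only the panic-on-late-update behavior described above:

```go
package main

import (
	"fmt"
	"sync"
)

type status string

const (
	running  status = "Running"
	finished status = "Finished"
)

// progressSketch panics on updates after the task leaves the Running
// state, surfacing lifecycle misuse immediately.
type progressSketch struct {
	mu      sync.Mutex
	status  status
	message string
}

func (p *progressSketch) Message(msg string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.status != running {
		panic("progress: update after task completed")
	}
	p.message = msg
}

func (p *progressSketch) Finished() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.status = finished
}

func main() {
	p := &progressSketch{status: running}
	p.Message("Step 1")
	p.Finished()
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked as expected:", r)
		}
	}()
	p.Message("too late") // panics: task already finished
}
```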
The Progress interface is the core unit of state. It manages:
- Status: whether the task is Running, Finished, or Error.
Key responsibilities include:
- Registration: when a Progress is added to the Tracker, it is assigned a UUID and a corresponding polling URL based on a configured basePath.
- Serving state: the Tracker provides a standard http.Handler that extracts the task ID from the URL path, retrieves the task from the cache, and serializes its state to JSON.
- Cleanup: the Tracker runs a background goroutine that evicts completed or failed tasks from the cache after a set duration (defaulting to 5 minutes).

The following diagram illustrates the interaction between an HTTP handler, a background worker, and the Tracker.
HTTP Handler              Background Worker                Tracker / UI
------------              -----------------                ------------
1. Create Progress ---->  2. Start Goroutine
3. Add to Tracker  ---------------------------------->     4. Return JSON (Initial URL) -> UI starts polling
                               |
                          5. update Message()  -------->   UI sees "Step 1"
                               |
                          6. update Results()
                               |
                          7. call Finished()   -------->   UI sees "Finished" & fetches Results
                               |
                          [ 5 minutes later ]  -------->   Tracker evicts task
- Each Progress uses a sync.Mutex to ensure that concurrent updates from a worker and read requests from the Tracker handler are thread-safe.
- The SerializedProgress struct is designed to be easily consumed by TypeScript frontends, using go2ts compatible tags.
- When Error(msg) is called, the status is updated, and the error message is automatically stored in the messages list under a reserved Error key.

The psrefresh module is responsible for maintaining and providing an up-to-date ParamSet for a Perf instance. A ParamSet is a collection of all keys and values (metadata) for all traces stored in the system.
In a high-volume performance monitoring system, querying the underlying database to discover which parameters are available (e.g., “Which benchmarks ran on this specific bot?”) can be expensive. This module solves that by background-loading metadata into memory and optionally caching filtered results to ensure that the user interface remains responsive when users build queries.
The module retrieves metadata by looking at “tiles”—chunks of time-series data. The defaultParamSetRefresher is designed to aggregate metadata from a configurable number of the most recent tiles (typically the two most recent). This ensures that the ParamSet reflects currently active traces while ignoring stale parameters from deleted or very old data.
There are two primary ways this module serves data:
- In-memory (defaultParamSetRefresher): Keeps the full, global ParamSet in memory. This is updated on a periodic background tick.
- Cached (CachedParamSetRefresher): Wraps the default refresher. It pre-calculates and stores filtered ParamSet results in a cache (like Redis or local memory) based on specific “Level 1” and “Level 2” keys defined in the configuration. This is a performance optimization for UI components that drill down through common hierarchies (e.g., Benchmark -> Bot).

The refresher uses a sync.Mutex to protect the in-memory ParamSet during background updates, ensuring that readers never see a partially constructed set. During a refresh (oneStep), the system is designed to be tolerant of failures. While failing to fetch the latest tile results in an error, failures to fetch older supplementary tiles are logged as warnings rather than crashing the refresh cycle, allowing the system to provide “mostly complete” data rather than no data at all.

psrefresh.go

Contains the core logic for the defaultParamSetRefresher.
- OPSProvider Interface: Abstracts the data source (usually a TraceStore). It requires two methods: identifying the latest tile and retrieving a ParamSet for a specific tile.
- oneStep(): The atomic unit of work that fetches the latest tile ID, iterates backward to collect the requested number of tiles, merges their metadata, and “freezes” the result into a read-only structure.

cachedpsrefresh.go

Implements the CachedParamSetRefresher, which adds a caching layer over the standard refresher.
- The component exposes PopulateCache() to proactively execute “Preflight” queries. It iterates through values of a primary key (Level 1) and optionally a secondary key (Level 2), storing the resulting ParamSet and trace count in the cache.
- When GetParamSetForQuery is called, the component checks if the query matches the cached levels (e.g., exactly 1 or 2 specific keys). If it matches, it serves from the cache; otherwise, it falls back to a real-time database query.

The standard refresher maintains the global state of available parameters.
[ Timer Tick ]
|
V
[ oneStep() ] ----------------------> [ TraceStore (OPSProvider) ]
| |
| <--- Get Latest Tile ID -------------|
| |
| <--- Get ParamSet for Tile N --------|
| <--- Get ParamSet for Tile N-1 ------|
|
[ Merge & Normalize ]
|
[ Lock Mutex ] -> [ Update pf.ps ] -> [ Unlock Mutex ]
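The refresh loop above can be sketched as follows. `paramSet`, `merge`, and `getTile` are simplified stand-ins for paramtools.ParamSet and the OPSProvider methods; the real code also normalizes and freezes the result:

```go
package main

import (
	"fmt"
	"sync"
)

// paramSet maps keys to observed values, standing in for
// paramtools.ParamSet.
type paramSet map[string][]string

// merge folds src into dst, deduplicating values.
func merge(dst, src paramSet) {
	for k, vals := range src {
		for _, v := range vals {
			seen := false
			for _, existing := range dst[k] {
				if existing == v {
					seen = true
				}
			}
			if !seen {
				dst[k] = append(dst[k], v)
			}
		}
	}
}

// refresher sketches the oneStep flow: fetch the N most recent tiles'
// param sets, merge them, and swap the result in under a mutex so
// readers never see a partially built set.
type refresher struct {
	mu sync.Mutex
	ps paramSet
}

func (r *refresher) oneStep(latest, numTiles int, getTile func(int) paramSet) {
	merged := paramSet{}
	for t := latest; t > latest-numTiles && t >= 0; t-- {
		merge(merged, getTile(t))
	}
	r.mu.Lock()
	r.ps = merged
	r.mu.Unlock()
}

func main() {
	tiles := map[int]paramSet{
		9: {"bot": {"pixel_6"}},
		8: {"bot": {"pixel_6", "mac_m1"}},
	}
	r := &refresher{}
	r.oneStep(9, 2, func(t int) paramSet { return tiles[t] })
	fmt.Println(len(r.ps["bot"])) // 2
}
```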
When the UI requests a filtered ParamSet (e.g., selecting a benchmark to see available bots), the cached refresher determines the most efficient data path.
[ Request: GetParamSetForQuery(Query) ]
|
|-- If Query has Level1/Level2 keys only? --+
| |
| [ YES ] [ NO ]
V V
[ Check Cache ] [ Real-time DB Query ]
| |
|-- Cache Hit? --+ |
| | |
[ YES ] [ NO ] |
V V V
[ Return Data ] [ Fetch from DB ] --------> [ Return Data ]
The go/psrefresh/mocks module provides mock implementations of interfaces used by the ParamSet Refresher (psrefresh) system within the Perf service. Its primary purpose is to facilitate unit testing for components that depend on an OPSProvider (Ordered ParamSet Provider), allowing developers to simulate data retrieval from the underlying storage layer without requiring a live database or complex setup.
The module relies on testify/mock and is generated via mockery. This choice ensures that the mocks are strictly typed and consistent with the actual interfaces they represent. By using generated mocks, the project maintains a clear separation between the logic being tested and the data-providing infrastructure.
A key design aspect of these mocks is the abstraction of tile-based data access. In the Perf system, data is organized into “tiles” (chunks of time-series data). The OPSProvider mock allows tests to control exactly what a component perceives as the “latest tile” or what “ParamSet” (a collection of key-value pairs representing trace metadata) exists within a specific tile.
This file contains the OPSProvider struct, which mocks the interface responsible for bridging the refresher logic and the actual data store. It manages two primary responsibilities in a test environment:
- Tile discovery: tests define what the mock returns from GetLatestTile. This is crucial for testing how the refresher reacts when new data is added or when the system is already up to date.
- Metadata retrieval: through the GetParamSet mock function, developers can inject specific paramtools.ReadOnlyParamSet objects into the workflow. This allows for fine-grained testing of how the Perf system indexes metadata and how it handles potential errors during data retrieval.

The mock includes a NewOPSProvider constructor that automatically handles test cleanup and expectation assertions, ensuring that tests fail if the code under test does not interact with the provider as expected.
The typical workflow for using this module involves setting up expectations within a unit test to simulate the lifecycle of a ParamSet refresh operation:
[ Test Setup ]
|
V
[ Mock OPSProvider ] <--- Define Return: GetLatestTile (e.g., Tile #500)
|
V
[ Mock OPSProvider ] <--- Define Return: GetParamSet(ctx, 500) (e.g., custom ParamSet)
|
V
[ Component Under Test ] --- Calls GetLatestTile() ---> [ Mock ]
| |
|<--- Returns Tile #500 ----------------------------|
|
[ Component Under Test ] --- Calls GetParamSet(500) ---> [ Mock ]
| |
|<--- Returns ParamSet -----------------------------|
|
V
[ Assertions ] <--- Verify component processed ParamSet correctly
The go/redis module provides the integration layer between Skia Perf and Google Cloud Memorystore (Redis). Its primary role is to manage the discovery of Redis instances and facilitate data caching to improve the performance of the Perf query UI.
The module bridges two distinct domains: the management of Google Cloud Platform (GCP) resources and the application-level interaction with Redis data structures.
The module is designed around the RedisWrapper interface, which abstracts the complexity of GCP infrastructure management. This abstraction allows for clean separation between the logic that locates a database and the logic that consumes it, while also enabling the automated mocking found in the mocks sub-module.
Key design decisions include:
- Asynchronous discovery: the module uses a background routine (StartRefreshRoutine) to periodically poll the state of Redis. This ensures that the application has up-to-date metadata about the target Redis instance without blocking the main execution path.
- Thread safety: the RedisClient uses a sync.Mutex during cache updates. This prevents race conditions when multiple refresh cycles or concurrent operations attempt to modify the client's internal state or interaction logic simultaneously.

The RedisClient is the primary implementation of the RedisWrapper. It acts as a coordinator between three major dependencies: the GCP Cloud Redis client (for infrastructure), the TraceStore (the source of data), and the Redis data client (for caching).
- Instance discovery: via StartRefreshRoutine, the client manages a ticker-based loop. It searches for a specific Redis instance name (provided via configuration) within a target GCP project and zone.
- Pagination: the ListRedisInstances method handles the pagination and iteration logic required by the GCP API, converting the stream of instances into a usable slice of redispb.Instance objects.
- Cache refresh: the RefreshCachedQueries method (and its associated workflows) is responsible for the actual data movement. It establishes a connection to the discovered host and port and performs the necessary Redis commands to update the cache. This ensures the Query UI can retrieve pre-computed results instead of performing expensive lookups on the primary TraceStore.

The following diagram illustrates how the module moves from a configuration state to an active cached state:
[ StartRefreshRoutine ]
|
| (Every refreshPeriod)
v
[ ListRedisInstances ] <---- Queries GCP Cloud Redis API
|
| Filters by config.Instance Name
v
[ Target Instance Found? ] -- No --> [ Log Error/Wait ]
|
Yes (Extract Host/Port)
v
[ RefreshCachedQueries ]
|
| 1. Create redis.NewClient(Host:Port)
| 2. Lock Mutex
| 3. Update cache keys (e.g., "FullPS")
| 4. Unlock Mutex
v
[ Cache Updated ]
This workflow ensures that even if the Redis instance is recreated or its internal IP changes, the Perf system will automatically rediscover the new endpoint and resume caching operations without requiring a manual restart.
The go/redis/mocks module provides automated mock implementations of the Redis management interfaces used within the Perf system. Its primary purpose is to enable unit testing of components that interact with Google Cloud Redis instances without requiring a live cloud environment or complex integration setups.
The module is built around the RedisWrapper mock, which is generated using mockery. The decision to use generated mocks rather than manual stubs ensures that the testing layer stays in sync with the actual RedisWrapper interface used in production.
The implementation utilizes the testify/mock framework. This allows developers to:
- Simulate both successful responses (such as lists of redispb.Instance objects) and error conditions (such as context timeouts or API failures).

This file contains the RedisWrapper struct, which mocks the interface responsible for high-level Redis lifecycle and discovery operations. It focuses on two primary responsibilities:
- Instance discovery: via ListRedisInstances, the mock simulates the retrieval of Redis instance metadata from a specific project or region. This is critical for testing logic that needs to dynamically discover Redis endpoints based on cloud configurations.
- Background routines: the StartRefreshRoutine method mocks the behavior of long-running background processes that handle the periodic refreshing of Redis configurations or connections. In a test environment, this allows callers to verify that the refresh cycle is initiated with the correct duration and configuration parameters (config.InstanceConfig) without actually spawning persistent goroutines.

The typical usage pattern involves initializing the mock within a test and injecting it into the higher-level service that requires a RedisWrapper.
[ Test Case ]
|
| 1. Initialize NewRedisWrapper(t)
v
[ RedisWrapper Mock ] <------- 2. Setup expectations (On("ListRedisInstances").Return(...))
|
| 3. Inject mock into Perf Component
v
[ Component Under Test ] ----> 4. Calls ListRedisInstances()
|
| 5. Test ends, mock automatically asserts expectations
v
[ Assertions Passed/Failed ]
This workflow ensures that components responsible for Perf data storage and caching can be validated in isolation, ensuring that logic governing instance selection and maintenance routines is robust against various infrastructure states.
The regression module is the core analytical engine of Skia Perf. It is responsible for detecting, refining, and persisting performance regressions (shifts in metric data) across the commit history.
Performance regressions are identified by comparing metric values before and after a specific commit. The module operates by fetching “frames” of data (a window of commits), applying statistical algorithms to identify clusters of traces that show similar shifts, and then triaging these shifts based on user-defined alert configurations.
The system supports two primary detection methodologies:

- K-Means clustering: groups traces with similar shapes and evaluates the resulting clusters for shifts.
- Individual analysis (StepFit): fits a step function to each trace independently, looking for a shift at the commit of interest.
The module splits the lifecycle of a regression into distinct stages:
- Detection (detector.go): Executes the heavy mathematical lifting (K-Means or StepFit). It is intentionally “greedy,” finding all statistical anomalies within the provided data frame.
- Refinement (refiner/): A post-processing layer that filters the raw detection results. It applies business logic, such as minimum trace thresholds and directionality (UP/DOWN/BOTH), to ensure only actionable regressions are reported.

Instead of scanning every individual commit, the detector uses a “Domain” (a range of commits). It leverages dfiter.DataFrameIterator to efficiently slide a window across the data. For each step in the iteration, the target commit is placed at the center of the window (the “midpoint”), allowing the algorithms to compare a stable “before” baseline against an “after” state.
To prevent small regressions from being “drowned out” by the noise of a large dataset, the module supports GroupBy. If an alert is configured to group by a parameter (e.g., device), the detector doesn't run one large query. Instead, allRequestsFromBaseRequest expands the base request into multiple sub-queries, one for each unique value of that parameter. This ensures high-granularity detection.
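The expansion step can be sketched as follows. `expandByGroup` is a hypothetical simplification of allRequestsFromBaseRequest, with queries reduced to plain string maps:

```go
package main

import "fmt"

// expandByGroup emits one copy of the base query per value of the
// group-by key, each narrowed to that value. Real queries are richer
// than string maps; this only illustrates the fan-out.
func expandByGroup(base map[string]string, key string, values []string) []map[string]string {
	var out []map[string]string
	for _, v := range values {
		q := map[string]string{}
		for k, val := range base {
			q[k] = val
		}
		q[key] = v
		out = append(out, q)
	}
	return out
}

func main() {
	subs := expandByGroup(
		map[string]string{"benchmark": "motion_mark"},
		"device",
		[]string{"pixel_6", "mac_m1"},
	)
	fmt.Println(len(subs), subs[0]["device"], subs[1]["device"])
}
```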
The Store interface supports a transition from legacy to modern schemas:
- In both schemas, the ClusterSummary and the FrameResponse (the actual data points) are stored as JSON blobs. This provides the flexibility to update detection algorithms without requiring database schema migrations.

detector.go, stepfit.go

The ProcessRegressions function is the entry point for detection. It manages a pool of workers to process multiple alert configurations in parallel.
- tooMuchMissingData: A critical heuristic that filters out traces with more than 50% missing data on either side of the midpoint. This prevents “gaps” in data from being falsely identified as performance drops.
- StepFit: Implements individual trace analysis. It looks for a “Turning Point” in the trace and calculates the magnitude of the shift.

regression.go

The Regression struct tracks the state of a detected anomaly. A single Regression object can track both a High (regression) and a Low (improvement) cluster for the same commit and alert ID. This allows the UI to present a unified view of all shifts occurring at a specific point in time.
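The 50% heuristic described for tooMuchMissingData can be sketched as below. The sentinel constant and function name here are illustrative, not the module's actual identifiers:

```go
package main

import "fmt"

// missing is an illustrative stand-in for the missing-data sentinel
// value used in Perf trace vectors.
const missing = 1e32

// tooMuchMissing reports whether more than half the points on either
// side of the midpoint are missing, mirroring the heuristic that
// rejects gap-heavy traces before step-fit analysis.
func tooMuchMissing(trace []float32) bool {
	mid := len(trace) / 2
	count := func(vals []float32) int {
		n := 0
		for _, v := range vals {
			if v == missing {
				n++
			}
		}
		return n
	}
	return count(trace[:mid])*2 > mid || count(trace[mid:])*2 > len(trace)-mid
}

func main() {
	// Two of three points before the midpoint are missing -> rejected.
	fmt.Println(tooMuchMissing([]float32{missing, missing, 1, 2, 3, 4})) // true
	fmt.Println(tooMuchMissing([]float32{1, 2, 3, 4, 5, 6}))             // false
}
```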
sqlregressionstore, sqlregression2store

The module provides different implementations of the Store interface:
- sqlregression2store: The modern implementation optimized for Spanner. It supports advanced features like NudgeAndResetAnomalies (moving a regression to a more accurate commit) and multi-source bug tracking (Manual, Auto-Triage, and Bisection).
- migration/: Orchestrates the background movement of data from the legacy V1 store to the V2 store, using a transactional “pull and mark” strategy to ensure no data is lost during the transition.

The following diagram shows how a detection request is transformed into a confirmed regression.
[ RegressionDetectionRequest ]
    |
    v
[ detector.ProcessRegressions ]
    |-- allRequestsFromBaseRequest (Expand GroupBy)
    |-- DataFrameIterator (Fetch window of commits)
    |
    +--> [ Algorithm: KMeans or StepFit ]
    |         |
    |         v
    |    [ ClusterSummaries Generated ]
    |
    v
[ refiner.Process ]
    |-- Validate Midpoint Match
    |-- Filter by Direction (UP/DOWN/BOTH)
    |-- Filter by Minimum Trace Count
    |
    v
[ Store.SetHigh / SetLow ]
    |-- SQL UPSERT (Atomic Read-Modify-Write)
    |-- Persist JSON Cluster & Frame
continuous/

The continuous submodule acts as the driver for the detection engine. It monitors for new data ingestion (via PubSub) or timer triggers, identifies which Alert configs match the new data, and dispatches them to the detector. It also handles the logic for deduplicating notifications so users aren't alerted multiple times for the same evolving regression.
The continuous module is responsible for the background detection of performance regressions in the Skia Perf ecosystem. It acts as an orchestrator that monitors incoming data and configuration changes to identify shifts in performance metrics without requiring manual user intervention.
Continuous regression detection is the “auto-pilot” of Skia Perf. While users can perform ad-hoc analysis in the UI, this module ensures that every commit and every ingested data point is evaluated against predefined Alerts (regression detection configurations).
The module supports two primary operational modes:

- Periodic: a timer-driven loop that re-evaluates the configured Alerts on a schedule.
- Event-driven: PubSub notifications from the ingestion pipeline trigger re-evaluation of only the affected alerts.
Historically, regression detection was a heavy background process that scanned many commits for all configurations. The implementation of RunEventDrivenClustering addresses scalability by moving toward an incremental model. By leveraging PubSub notifications from the ingestion pipeline, the system can pinpoint exactly which alerts need to be re-evaluated, reducing the lag between data ingestion and regression notification.
Regression detection is computationally expensive, involving trace fetching and clustering algorithms. The module employs a hierarchical worker pool strategy to maintain performance:
- processAlertConfigsWorkerCount: Distributes different Alert configurations across parallel goroutines.
- processAlertConfigForTracesWorkerCount: Within a single configuration (especially in event-driven mode), individual traces are processed in parallel.

To prevent “alert fatigue” and ensure precision, the module dynamically refines queries. If an alert has a GroupBy setting (e.g., grouping by device), the continuous detector doesn't just run the generic alert query. Instead, it generates specific sub-queries for each group or trace ID found in the incoming data. This allows the system to detect regressions that might be “smothered” by noise in a larger dataset.
The following diagram illustrates how a new data point travels from ingestion to a potential notification:
[Data Ingestion]
      |
      v
[PubSub Message] -> Received by buildTraceConfigsMapChannelEventDriven
      |
      |-- Decode IngestEvent (contains Trace IDs)
      |-- Match Trace IDs against all Alert Configs
      |
      v
[Config/Trace Map] -> Dispatched to Workers
      |
      |-- Parallel Processing: ProcessAlertConfig
      |     |-- Fetch Dataframe (commits within radius)
      |     |-- Run Clustering / Step Fit
      |
      v
[Regression Found?]
      |
      |-- YES: reportRegressions
      |     |-- Store in Regression Store
      |     |-- Send Notification (Email/Bug/etc.)
      |     |-- Update existing notifications if direction matches
      |
      |-- NO: Continue
Continuous struct: The central coordinator that holds references to data stores (regression.Store, shortcut.Store), the git provider, and the notification system. It maintains the lifecycle of background detection.
continuous.go: Contains the core logic for the detection loops:
- reportRegressions: evaluates clustering results. It specifically looks for the “Step Point”—the exact commit where the performance shift occurred—and validates whether the magnitude and direction (UP, DOWN, BOTH) meet the Alert's criteria.
- updateStoreAndNotification: handles the deduplication of alerts. It checks whether a regression for a specific commit and alert ID already exists. If it is new, it triggers the notifier; if it exists but has changed, it updates the existing notification.
- getQueryWithDefaultsIfNeeded: a utility that merges global instance defaults with specific alert queries, ensuring that filters like stat=value are applied even if omitted in a specific alert configuration.

ProcessAlertConfig: This function transforms an alerts.Alert into a regression.RegressionDetectionRequest. It calculates the “Domain” (the range of commits to analyze) based on flags and configures the regression package to execute the heavy lifting of data fetching and mathematical analysis.
The behavior of this module is heavily influenced by the InstanceConfig and FrontendFlags:
This module provides a mechanism for migrating regression data from the legacy regressions table schema to the updated regressions2 table schema. It is designed to facilitate a smooth transition between storage formats while ensuring data integrity and allowing for incremental, background processing.
The migration is handled by the RegressionMigrator struct, which orchestrates the transfer of data between the legacy sqlregressionstore and the modern sqlregression2store. The primary motivation for this migration is to move toward a more robust schema that includes additional fields and better indexing, as supported by the regressions2 table.
The migrator is designed to be run either as a one-off batch process or as a periodic background task that slowly drains the legacy table without impacting system performance.
- RegressionMigrator: the central coordinator. It holds references to both the legacy and new stores and manages the transactional logic required to ensure that a regression is either fully migrated or not at all.
- Legacy store (sqlregressionstore): used to identify regressions that haven't been migrated yet (via GetRegressionsToMigrate) and to mark them as completed once they are successfully stored in the new schema.
- New store (sqlregression2store): handles the conversion of legacy regression objects into the new format and persists them to the updated database schema.

The migration process follows a “pull and mark” strategy to allow for resumes and to prevent data loss.
- Batching: regressions are fetched in configurable batches (batchSize). This prevents memory exhaustion when dealing with large historical datasets and allows the migration to be interleaved with standard production traffic.
- Write ordering: write to regressions2, then update the migration status in regressions.
- Format conversion: records retain their identifying fields (such as AlertId and CommitNumber). The sqlregression2store handles the necessary transformations to ensure the data is compatible with the new format before writing.
- Stale records: the GetRegressionsToMigrate logic in the legacy store identifies these “stale” records so they can be re-synced to the new store.

The migrator can run a background loop that periodically checks for work.
[ Timer Trigger ]
        |
        v
[ RunOneMigration ] --------------------------+
        |                                     |
        v                                     |
[ Fetch Batch from Legacy Store ]             | Timeout
        |                                     | (1 Minute)
        v                                     |
[ For each Regression in Batch ]              |
        |                                     |
        +--> [ Start Transaction ]            |
        |          |                          |
        |          v                          |
        |    [ Write to New Store ]           |
        |          |                          |
        |          v                          |
        |    [ Mark Legacy as Migrated ]      |
        |          |                          |
        |          v                          |
        |    [ Commit Transaction ]           |
        |                                     |
        +-------------------------------------+
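The “pull and mark” loop can be sketched with in-memory stores. Everything here (`memLegacy`, `memNew`, `runOneMigration`) is an illustrative stand-in under the assumption that the new store is written before the legacy record is marked, so an interrupted run re-processes records rather than losing them.

```go
package main

import "fmt"

// Regression is a minimal stand-in for the legacy regression record.
type Regression struct{ ID string }

// memLegacy models the legacy sqlregressionstore side of the
// "pull and mark" strategy with an in-memory pending list.
type memLegacy struct{ pending []Regression }

func (m *memLegacy) GetRegressionsToMigrate(batchSize int) []Regression {
	if batchSize > len(m.pending) {
		batchSize = len(m.pending)
	}
	// Return a copy so callers can iterate safely while marking records.
	return append([]Regression(nil), m.pending[:batchSize]...)
}

func (m *memLegacy) MarkMigrated(id string) {
	for i, r := range m.pending {
		if r.ID == id {
			m.pending = append(m.pending[:i], m.pending[i+1:]...)
			return
		}
	}
}

// memNew models the destination sqlregression2store.
type memNew struct{ rows []Regression }

func (m *memNew) Write(r Regression) { m.rows = append(m.rows, r) }

// runOneMigration moves at most batchSize records, writing each to the
// new store before marking it migrated in the legacy store.
func runOneMigration(legacy *memLegacy, dst *memNew, batchSize int) int {
	batch := legacy.GetRegressionsToMigrate(batchSize)
	for _, r := range batch {
		dst.Write(r)
		legacy.MarkMigrated(r.ID)
	}
	return len(batch)
}

func main() {
	legacy := &memLegacy{pending: []Regression{{"a"}, {"b"}, {"c"}}}
	dst := &memNew{}
	// Drain the legacy store batch by batch, as the timer loop would.
	for runOneMigration(legacy, dst, 2) > 0 {
	}
	fmt.Println(len(dst.rows), len(legacy.pending)) // 3 0
}
```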
To initialize the migrator, the New function sets up the required dependencies, including an AlertConfigProvider. This is necessary because the new regression store requires context regarding alerts that the legacy store did not strictly enforce or link in the same manner.
The /go/regression/mocks module provides a set of autogenerated mock implementations for the regression storage layer in the Perf system. These mocks are built using mockery and are based on the testify framework, specifically designed to facilitate unit testing of components that interact with regression data without requiring a live database or complex setup.
The primary component in this module is the Store mock. In the production environment, a regression store is responsible for persisting and retrieving performance regression data, handling triage statuses, and linking regressions to bug tracking systems. By providing a mock version of this store, the system allows developers to:
- Test components that consume regression data in isolation, without a live database or complex setup.
- Verify that critical methods—such as SetBugID or TriageHigh—are called with the expected parameters during a test execution.

The Store.go file defines the Store struct, which embeds mock.Mock. It replicates the interface used by the actual regression storage layer, covering a wide range of operations:
- Retrieval: GetRegression, GetByIDs, and Range allow tests to simulate the retrieval of regression clusters based on commit numbers, alert IDs, or time ranges.
- Triage: TriageHigh, TriageLow, and SetHigh/SetLow enable the simulation of user actions or automated processes that mark regressions as “triaged” or “ignored”.
- Bug tracking: SetBugID and GetBugIdsForRegressions facilitate testing the integration between Perf regressions and external issue trackers.
- Maintenance: DeleteByCommit and NudgeAndResetAnomalies support testing cleanup and data migration logic.

When writing a test for a service that consumes the regression store (e.g., an alerting service), the standard workflow involves initializing the mock and defining expected behaviors:
Test Setup phase:
+-------------------------+      +--------------------------+
| 1. Create Mock Store    | ---> | 2. Define Expectations   |
|    (mocks.NewStore)     |      |    (store.On(...).Return)|
+-------------------------+      +--------------------------+
                                             |
                                             v
Execution phase:
+--------------------------+      +-------------------------+
| 4. Run Business Logic    | <--- | 3. Inject Mock into     |
|                          |      |    System Under Test    |
+--------------------------+      +-------------------------+
             |
             v
Verification phase:
+-------------------------+
| 5. Assert Expectations  |
|    (Automatic Cleanup)  |
+-------------------------+
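The same create / inject / verify flow can be shown without external dependencies. The real mocks are generated by mockery and use testify's `On(...)`/`AssertExpectations` API; this dependency-free sketch uses a hand-rolled `fakeStore` that simply records calls, and `fileBug` is a hypothetical system under test.

```go
package main

import "fmt"

// Store is a narrowed stand-in for the regression Store interface.
type Store interface {
	SetBugID(regressionID string, bugID int) error
}

// fakeStore records calls so a test can verify interactions, mirroring
// what the generated testify mock does with On(...)/AssertExpectations.
type fakeStore struct {
	calls []string
}

func (f *fakeStore) SetBugID(regressionID string, bugID int) error {
	f.calls = append(f.calls, fmt.Sprintf("SetBugID(%s,%d)", regressionID, bugID))
	return nil
}

// fileBug is the "system under test": business logic that should link a
// regression to a bug via the store.
func fileBug(s Store, regressionID string, bugID int) error {
	return s.SetBugID(regressionID, bugID)
}

func main() {
	store := &fakeStore{}            // 1. create the test double
	_ = fileBug(store, "abc123", 42) // 3-4. inject it and run the logic
	fmt.Println(store.calls)         // 5. verify the recorded interaction
}
```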
- Generated code: the mocks are autogenerated with mockery. This decision ensures that the mock remains in sync with the actual Store interface defined in the regression package. If a new method is added to the store interface, regenerating the mock prevents compilation errors in tests.
- Testify integration: built on github.com/stretchr/testify/mock, the mocks provide a fluent API for setting up return values and verifying calls.
- Transaction support: the mocks accept pgx.Tx (PostgreSQL transactions) in methods like DeleteByCommit, allowing tests to simulate transactional integrity without a real database connection.

The refiner module provides logic to validate and filter potential performance regressions detected by the Skia Perf system. It acts as a post-processing stage that transforms raw detection responses into confirmed regressions by applying specific criteria defined in alert configurations.
In the Skia Perf pipeline, regression detection identifies clusters of data points that exhibit significant changes in performance. However, not every statistical anomaly constitutes a regression of interest according to a user's specific alerting rules.
The refiner module implements the regression.RegressionRefiner interface to bridge the gap between “statistical detection” and “actionable alert.” It evaluates detection summaries against alert parameters such as the direction of the change (improvement vs. regression) and the magnitude of the impact (number of traces affected).
The refiner’s primary responsibility is to ensure that a detected cluster aligns with the user’s intent. It uses the stepfit status (HIGH or LOW) to determine the direction of the performance shift.
- Direction filtering: alerts configured for a single direction discard clusters with non-matching stepfit statuses. For example, if an alert is configured only for “DOWN” (typically representing a performance drop in specific metrics), any “HIGH” step-fit clusters are discarded.

To reduce noise from insignificant or flaky data, the refiner enforces a MinimumNum threshold. This represents the minimum number of keys (traces) that must be part of a cluster for it to be promoted to a “Confirmed Regression.” Clusters failing to meet this count are filtered out of the final summary.
Located in default_regression_refiner.go, this is the standard implementation of the refinement logic. It processes a slice of RegressionDetectionResponse objects and returns a slice of ConfirmedRegression objects.
The refinement workflow follows these steps:
1. Calculate the midpoint commit from the DataFrame header, ensuring the analysis targets the correct point in time.
2. Check each cluster's step point offset, minimum trace count, and direction against the Alert config.
3. Build a filtered ClusterSummary and wrap it in a response if any clusters survived the filtering process.

Raw Detection Responses
        |
        v
+-----------------------------+
| Calculate Midpoint Commit   | <--- Ensures we are looking at the
| (from DataFrame Header)     |      correct point in time.
+-----------------------------+
        |
        v
+-----------------------------+
| Iterate through Clusters    |
|                             |
| 1. Check StepPoint Offset   | ---- Fail ----> [ Discard Cluster ]
| 2. Check Min Trace Count    | ---- Fail ----> [ Discard Cluster ]
| 3. Check Direction Match    | ---- Fail ----> [ Discard Cluster ]
+-----------------------------+
        |
        | Pass
        v
+-----------------------------+
| Build Filtered Summary      |
+-----------------------------+
        |
        v
Confirmed Regressions
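The direction and minimum-count checks can be sketched as a small filter. This is an assumption-laden sketch: `Cluster` and `filterClusters` are hypothetical stand-ins, assuming “UP” alerts match HIGH step-fits and “DOWN” alerts match LOW step-fits, as described above.

```go
package main

import "fmt"

// Cluster is a minimal stand-in for clustering2.ClusterSummary.
type Cluster struct {
	StepFitStatus string // "HIGH" or "LOW"
	NumKeys       int    // number of traces in the cluster
}

// filterClusters keeps only clusters that match the alert's configured
// direction ("UP", "DOWN", or "BOTH") and meet the MinimumNum threshold.
func filterClusters(clusters []Cluster, direction string, minimumNum int) []Cluster {
	var kept []Cluster
	for _, c := range clusters {
		if c.NumKeys < minimumNum {
			continue // too few traces: likely noise or flake
		}
		if direction != "BOTH" &&
			!(direction == "UP" && c.StepFitStatus == "HIGH") &&
			!(direction == "DOWN" && c.StepFitStatus == "LOW") {
			continue // shift direction does not match the alert's intent
		}
		kept = append(kept, c)
	}
	return kept
}

func main() {
	clusters := []Cluster{
		{"HIGH", 10}, // kept for an "UP" alert
		{"LOW", 10},  // discarded: wrong direction
		{"HIGH", 1},  // discarded: below MinimumNum
	}
	fmt.Println(len(filterClusters(clusters, "UP", 3))) // 1
}
```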
The regressiontest module provides a standardized suite of functional tests for implementations of the regression.Store interface. By centralizing these tests, the project ensures that different storage backends (e.g., SQL-based, memory-based, or Datastore) behave consistently and adhere to the expected contract of the Perf regression system.
The primary goal of this module is to enforce a uniform behavior across various regression storage implementations. Instead of duplicating test logic for every new storage driver, developers can import this package and run their implementation against the SubTests suite.
This approach ensures that:
- Behavioral consistency: every regression.Store implementation is validated against the same contract, so storage backends remain interchangeable.
- Serialization fidelity: stores correctly persist and restore complex domain objects such as frame.FrameResponse and clustering2.ClusterSummary.

The module exports a SubTests map, which associates descriptive names with SubTestFunction signatures. This allows implementation-specific test files to iterate over the map and run each test within their own environment (e.g., using a local emulator or a real database instance).
The tests within regressiontest.go cover the lifecycle of a regression record:
- SetLowAndTriage verifies the “happy path” of creating a regression, detecting whether it is new versus an update, and updating its triage status.
- Write ensures that multiple regressions can be persisted efficiently in a single operation, while DeleteByCommit verifies the cleanup logic.
- Range_Exact validates boundary conditions for commit-based lookups, and GetOldestCommit ensures the store can correctly identify the earliest point in its history, which is critical for background cleanup tasks.

When a new storage backend is implemented for regressions, it follows this interaction pattern with the regressiontest module:
[ New Store Implementation ] [ regressiontest Module ]
| |
|-- Provides initialized Store ------>|
| |
| <---------- Executes SubTests -----|
| (SetLow, Range, Write, etc.)
| |
|-- Returns Success/Failure -------->|
| |
[ Validation Complete ]
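The shared-suite pattern can be sketched as a map of named subtest functions run against any implementation. This sketch is dependency-free and illustrative: the real suite passes a `*testing.T` rather than returning an `error`, and `Store`/`memStore` here are hypothetical stand-ins.

```go
package main

import "fmt"

// Store is a trivial stand-in for the regression.Store interface.
type Store interface{ Name() string }

type memStore struct{}

func (memStore) Name() string { return "mem" }

// SubTestFunction mirrors the shared-suite pattern: each subtest receives
// the store under test and reports failures (here via an error).
type SubTestFunction func(s Store) error

// SubTests maps descriptive names to contract checks, as the
// regressiontest module does for SetLow, Range, Write, etc.
var SubTests = map[string]SubTestFunction{
	"Store_HasName": func(s Store) error {
		if s.Name() == "" {
			return fmt.Errorf("store must report a name")
		}
		return nil
	},
}

// runSuite executes every shared subtest against one implementation,
// the way an implementation-specific test file iterates the map.
func runSuite(s Store) map[string]error {
	results := map[string]error{}
	for name, fn := range SubTests {
		results[name] = fn(s)
	}
	return results
}

func main() {
	for name, err := range runSuite(memStore{}) {
		fmt.Println(name, "passed:", err == nil)
	}
}
```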
The module relies on several key data structures from the Perf domain:
- regression.Store: the interface being validated.
- types.CommitNumber: used as the primary key for organizing regressions.
- frame.FrameResponse & clustering2.ClusterSummary: these are passed to the store to ensure that implementation-specific serialization (like JSON blobs in a database) correctly preserves the data needed for the UI.

The sqlregression2store module provides a Spanner-backed implementation of the regression.Store interface. Its primary purpose is to persist and manage performance regressions detected within the Skia Perf system. It handles the storage of statistical metadata, triage states, and the raw data frames that justify a regression's existence.
This module is the “V2” storage layer, designed to be more flexible and descriptive than previous iterations by consolidating alert metadata, subscription links, and multi-source triage information (manual, auto-triage, and auto-bisect) into a unified relational schema.
A key responsibility of this module is handling the different ways regressions are identified based on the alerting algorithm:
- Clustering-based algorithms: when a new cluster is found for an existing <commit, alert> pair, the store updates the existing record with more accurate clustering summaries.
- Individual-trace algorithms: depending on the AllowMultipleRegressionsPerAlertId configuration, the store can either treat the alert as a single entity or allow multiple distinct regression records for the same alert if they involve different traces.

To support the transition from older schemas and maintain data integrity during concurrent updates, the store utilizes a readModifyWriteCompat pattern.
Writes are performed using an UPSERT (INSERT ... ON CONFLICT) pattern.

The store relies heavily on JSONB columns (specifically for frame and cluster_summary). This allows the database to store complex, nested Go structures from the ui/frame and clustering2 packages without requiring a rigid table schema for every statistical detail. This choice prioritizes flexibility in the analysis pipeline over relational normalization for these specific fields.
The module implements a sophisticated bug-tracking resolution logic in GetBugIdsForRegressions. It doesn't just store a single bug ID; it joins across the AnomalyGroups and Culprits tables to provide a comprehensive view of manually filed bugs, auto-triage bug IDs, and auto-bisect culprit bugs.
The store then sorts these bugs by a priority rank (Manual > Auto-Triage > Auto-Bisect) to ensure the most relevant context is presented to the user first.
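The priority ranking can be sketched as a stable sort over a rank table. `Bug`, `rank`, and this `sortBugs` are illustrative stand-ins for the store's internal sorting, assuming only the Manual > Auto-Triage > Auto-Bisect order stated above.

```go
package main

import (
	"fmt"
	"sort"
)

// Bug pairs a bug ID with the source that produced it.
type Bug struct {
	ID     int
	Source string // "manual", "auto-triage", or "auto-bisect"
}

// rank encodes the priority order: Manual > Auto-Triage > Auto-Bisect.
// Lower rank sorts first.
var rank = map[string]int{"manual": 0, "auto-triage": 1, "auto-bisect": 2}

// sortBugs orders bugs by source priority; a stable sort preserves the
// original order within each source.
func sortBugs(bugs []Bug) {
	sort.SliceStable(bugs, func(i, j int) bool {
		return rank[bugs[i].Source] < rank[bugs[j].Source]
	})
}

func main() {
	bugs := []Bug{{3, "auto-bisect"}, {1, "manual"}, {2, "auto-triage"}}
	sortBugs(bugs)
	fmt.Println(bugs[0].ID, bugs[1].ID, bugs[2].ID) // 1 2 3
}
```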
The main struct implementing regression.Store. It manages a pool of database connections and maintains a cache of prepared SQL statements generated from templates. It also tracks metrics for high/low regression detections.
Instead of static strings, the module uses Go's text/template to build SQL queries. This allows for dynamic column injection based on the spanner schema definitions, ensuring that the Go code and SQL schema stay in sync regarding field names.
The store provides specialized methods for the operational lifecycle of an anomaly:
- NudgeAndResetAnomalies allows moving a regression's commit range (e.g., if a developer identifies a more accurate culprit range) while resetting its triage status.
- IgnoreAnomalies and ResetAnomalies provide bulk updates to triage states, transitioning records between untriaged, negative, and ignored.

The workflow for recording a newly detected performance shift:
Detection Logic -> SetHigh/SetLow()
        |
        v
GetAlertConfig() <------- [Check Algo: KMeans vs StepFit]
        |
        v
readModifyWriteCompat() (Transaction Start)
        |
  +-----------+-----------+
  |                       |
[Existing Match]    [New Regression]
  |                       |
Apply UpdateFunc    Initialize UUID
  |                 Populate Medians
  |                 Set PrevCommit
  +-----------+-----------+
        |
        v
writeSingleRegression() (UPSERT into DB)
        |
(Transaction Commit)
How the store aggregates different sources of truth for a regression:
Request: GetBugIdsForRegressions(ids)
        |
        v
1. Load Manual Bug IDs (from Regressions2 table)
2. JOIN AnomalyGroups (on regression_id)   -> Get Auto-Triage IDs
3. JOIN Culprits (on anomaly_group_id)     -> Get Auto-Bisect IDs
        |
        v
sortBugs(Manual, AutoTriage, AutoBisect)
        |
        v
Return enriched Regression objects
The schema package defines the structured SQL representation for performance regressions within the Skia Perf system. It serves as the source of truth for the Regression2Schema table, which is designed to persist regression data, triage states, and associated metadata.
This module bridges the gap between Go data structures (like clustering2.ClusterSummary and frame.FrameResponse) and the relational database, ensuring that complex performance analysis results are searchable and durable.
Unlike earlier iterations of regression storage, this schema focuses on consolidating all aspects of a regression event—its location in time (commits), its statistical significance (medians), and its operational status (triage, bugs, alerts)—into a single relational structure. This allows for complex querying and reporting without needing to join across high-volume telemetry tables.
The schema uses JSONB for the ClusterSummary and Frame fields. This is a deliberate choice to:
- Avoid schema migrations: if the Go structs in clustering2 or frame evolve, the database schema does not require a migration, provided the data remains JSON-serializable.

Regressions are tracked via two specific commit points: CommitNumber and PrevCommitNumber. This allows the system to define the exact range where a performance shift occurred. Additionally, the inclusion of IsImprovement (boolean) and ClusterType (string) allows the UI and automated tools to quickly filter out noise or focus specifically on regressions vs. improvements.
The schema defines several composite and single-column indexes to support common query patterns in the Perf UI and alerting pipelines:
- by_commit_alert supports checking whether an alert has already fired for a specific commit.
- by_sub_name_creation_time is optimized for showing the most recent regressions for a specific subscription (e.g., a “Regression Dashboard” view).
- by_commit_and_prev_commit is tailored for the GetByRevision workflow, allowing the system to quickly retrieve regressions that fall within specific git ranges.

Regression2Schema is the primary struct in schema.go. It utilizes Go struct tags to define the DDL (Data Definition Language) for the underlying SQL table.
- Identity: uses a UUID (ID) for global uniqueness and links to the alerting subsystem via AlertID and SubName.
- Statistics: stores MedianBefore and MedianAfter as REAL (float32) values. These are critical for calculating the “magnitude” of a regression without re-processing the raw trace data.
- Triage state: includes TriageStatus, TriageMessage, and BugID. These fields represent the human-in-the-loop component of the performance monitoring workflow, tracking whether a regression has been acknowledged or associated with a bug tracker entry.

The following diagram illustrates how the fields in this schema represent the lifecycle of a detected regression:
Discovery Phase         Analysis Phase          Operational Phase
(Detection Logic)       (Statistical Data)      (User Intervention)
-----------------       ------------------      -------------------
AlertID          -----> MedianBefore            TriageStatus
CommitNumber     -----> MedianAfter      -----> TriageMessage
SubName          -----> ClusterSummary          BugID
ClusterType      -----> Frame                   CreationTime
The sqlregressionstore module provides a persistent implementation of the regression.Store interface using a SQL database backend. It is responsible for storing, retrieving, and updating performance regressions detected by the Perf system.
The storage strategy is built on a hybrid approach: relational indexing for metadata and JSON serialization for payload.
- Relational indexing: a composite primary key of commit_number and alert_id. This ensures that for any given commit, a specific alert configuration can only produce one regression record, enforcing data integrity at the database level.
- JSON payload: serializing the bulk of the data allows the regression.Regression Go struct to evolve—adding or removing fields—without requiring expensive and risky database migrations for every change in the detection algorithms.

SQLRegressionStore: Located in sqlregressionstore.go, this is the primary struct implementing the regression.Store interface. It manages the lifecycle of regression data:
- Write path: translates API calls (such as SetHigh, SetLow, or TriageHigh) into SQL statements. It handles the mapping between string-based Alert IDs used in the UI/API and the integer-based Alert IDs used in the database.
- Atomic updates (readModifyWrite): this internal method ensures that updates to a regression record are atomic. It begins a transaction, locks the row (if the database supports it), deserializes the JSON, applies a callback function to modify the data, and serializes it back to the database.
- Range queries: the Range method allows for efficient retrieval of all regressions across a span of commits, which is a common requirement for rendering the Perf dashboard or generating reports.

The module includes specific logic to support data evolution. It tracks a migrated status and a regression_id. This allows the system to background-migrate records from this “legacy” store to newer iterations of the regression schema (e.g., regression2) without downtime.
- GetRegressionsToMigrate retrieves batches of unmigrated records.
- MarkMigrated updates the record status once it has been successfully moved to the new store.

The following diagram illustrates how a regression update (e.g., updating a “high” regression) flows through the store:
[ Caller: SetHigh ]
        |
        v
[ SQLRegressionStore.readModifyWrite ]
        |
        |---- 1. BEGIN TRANSACTION
        |---- 2. SELECT regression (JSON) FROM Regressions WHERE commit_number AND alert_id
        |---- 3. JSON Unmarshal -> regression.Regression (Go Struct)
        |---- 4. Execute Callback: Update HighStatus, ClusterSummary, etc.
        |---- 5. JSON Marshal -> Updated JSON string
        |---- 6. UPDATE Regressions SET regression = $1, migrated = false
        |---- 7. COMMIT TRANSACTION
        |
        v
[ Success/Error ]
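The JSON-in-the-middle portion of this flow (steps 3-5) can be sketched in isolation. This is a stand-alone sketch: the real `readModifyWrite` wraps these steps in a database transaction keyed by (commit_number, alert_id), while here the "row" is just a byte slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Regression is a minimal stand-in for regression.Regression.
type Regression struct {
	HighStatus string `json:"high_status"`
}

// readModifyWrite decodes the stored JSON blob, lets the callback mutate
// the struct, and re-encodes it for writing back.
func readModifyWrite(row []byte, cb func(*Regression)) ([]byte, error) {
	var r Regression
	if len(row) > 0 {
		if err := json.Unmarshal(row, &r); err != nil {
			return nil, err
		}
	}
	cb(&r)
	return json.Marshal(&r)
}

func main() {
	row := []byte(`{"high_status":"untriaged"}`)
	updated, err := readModifyWrite(row, func(r *Regression) {
		r.HighStatus = "negative" // the caller's update, e.g. a triage
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(updated)) // {"high_status":"negative"}
}
```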
- Database portability: using pool.Pool and standard SQL syntax allows the store to be portable across different SQL backends supported by the infra.
- Metrics: the store updates perf_regression_store_found counters (partitioned by “high” or “low” direction), providing visibility into the frequency of regression detection and storage activity.
- Deliberate gaps: some methods (such as GetRegressionsBySubName or GetByIDs) are explicitly left unimplemented in this module. These features are offloaded to the newer regression2 store, reinforcing this module's role as a stable, primary storage for established regression workflows while supporting the transition to more advanced querying capabilities.

The sqlregressionstore/schema module defines the relational database structure used to persist regression data within the Perf system. It serves as the formal bridge between the Go-based regression.Regression objects and their storage representation in SQL.
The schema is designed around a composite primary key consisting of commit_number and alert_id. This reflects the operational reality of the Perf system: a regression is uniquely identified by where it happened (the commit) and why it was detected (the specific alert configuration).
By using a composite key instead of a generic auto-incrementing integer, the schema enforces data integrity at the database level, preventing duplicate regression entries for the same alert on the same commit.
The primary structure in this module, RegressionSchema, defines the columns for the Regressions table. Its fields reflect a balance between structured querying and flexible data storage:
- Identity columns (CommitNumber, AlertID): these fields are extracted from the regression object to allow the database to perform efficient filtering and joins. Storing the AlertID as a first-class column allows the system to quickly retrieve all regressions associated with a specific detection configuration.
- Payload column (Regression): instead of normalizing every possible attribute of a regression (which might change as detection algorithms evolve), the bulk of the regression data is stored as a JSON string. This “schemaless-within-schema” approach provides flexibility for future changes to the regression.Regression Go struct without requiring database migrations.
- Migration columns (Migrated, RegressionId): these fields are specifically included to handle the lifecycle of data evolution. The Migrated boolean and the temporary RegressionId facilitate the movement of records between different iterations of the schema (e.g., transitioning to a “regression2” table) while ensuring no data is lost or duplicated during the transition.

When a regression is detected or updated, the system maps the high-level Go objects into this schema for persistence:
[ Go Regression Object ]
        |
        | 1. Extract Identity
        v
+------------------------+    2. Serialize Remainder
| commit_number (Key)    | <--------------------------+
| alert_id (Key)         |                            |
| migrated (Status)      |      +----------------------+
| regression (JSON)      | <--- | { "low":  ...,       |
+------------------------+      |   "high": ...,       |
        |                       |   "frame": ... }     |
        |                       +----------------------+
        v
[ SQL Persistent Storage ]
This architecture ensures that while the database can efficiently index and manage the lifecycle of a regression, the complex details of the detection results remain encapsulated within the JSON blob, maintaining a clean separation between indexing concerns and data representation.
The samplestats module provides tools for performing statistical analysis on performance metrics. It is designed to compare two sets of samples (typically “before” and “after” a change) to determine if there is a statistically significant difference between them. This is primarily used within the Perf system to detect regressions or improvements in traces.
The core functionality revolves around taking two maps of trace data and producing a structured analysis. The module handles the heavy lifting of statistical testing, outlier detection, and result ordering, allowing callers to focus on high-level performance trends rather than raw data manipulation.
analyze.go: The Analyze function is the primary entry point. It correlates “before” and “after” samples based on their Trace ID. For every pair of samples found, it:
1. Runs the configured statistical test (UTest or TTest).
2. Computes the Delta (percentage change in mean) only if the result is statistically significant ($p < \alpha$).

metrics.go: Before analysis, raw values are transformed into Metrics objects. This step handles:
- Outlier removal: when enabled in the Config, it calculates the 25th and 75th percentiles and discards values outside $1.5 \times IQR$.
- Relative variance: it computes the Percent field (Standard Deviation / Mean), which helps in understanding the relative volatility of a specific trace.

sort.go: Since analysis can involve thousands of traces, the module provides a flexible sorting mechanism. Results can be ordered by Trace Name or by the magnitude of the Delta. It specifically handles NaN values (representing insignificant changes) by grouping them together during the sort process.
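The IQR filter can be sketched directly. `percentile` and `removeOutliers` are illustrative stand-ins (the real module delegates statistics to go-moremath), using simple linear interpolation between the sorted values.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (0..1) of sorted values using
// linear interpolation; good enough for this sketch.
func percentile(sorted []float64, p float64) float64 {
	idx := p * float64(len(sorted)-1)
	lo := int(idx)
	if lo >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	frac := idx - float64(lo)
	return sorted[lo]*(1-frac) + sorted[lo+1]*frac
}

// removeOutliers drops values outside 1.5*IQR of the [Q1, Q3] range,
// mirroring the optional filtering step described above.
func removeOutliers(values []float64) []float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	q1 := percentile(sorted, 0.25)
	q3 := percentile(sorted, 0.75)
	iqr := q3 - q1
	lo, hi := q1-1.5*iqr, q3+1.5*iqr
	var kept []float64
	for _, v := range values { // preserve original sample order
		if v >= lo && v <= hi {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	// The 100 is far outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] and is dropped.
	fmt.Println(removeOutliers([]float64{10, 11, 12, 11, 10, 11, 100}))
	// [10 11 12 11 10 11]
}
```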
The following diagram illustrates the data flow from raw samples to a sorted analysis result:
[Raw Samples] -> [IQR Filter (Optional)] -> [Statistical Test] -> [Delta Calculation]
      |                   |                        |                       |
(Before/After)     (Remove Outliers)      (Compare P vs Alpha)   (% Change if P < Alpha)
                                                                          |
                                                                          v
[Final Result] <---------- [Sort Results] <--------- [Collection of Rows]
- Configuration: the Config struct allows users to toggle the statistical test type, set the Alpha threshold (defaulting to 0.05), and enable outlier removal.
- Results: each trace comparison yields a Row, containing the calculated Delta, the P value, and the underlying Metrics. If a test fails (e.g., all samples are identical), the error is captured in the Note field rather than crashing the analysis.
- Dependencies: the module relies on go-moremath/stats for robust implementations of the Mann-Whitney and Welch's T-test algorithms.

The sheriffconfig module is the management layer for Skia Perf's alerting and subscription system. It provides the mechanism for defining “Sheriff Configurations”—version-controlled rules that dictate how the performance monitoring engine should identify anomalies and which teams should be notified.
This module acts as a bridge between human-readable configuration (stored as code in LUCI Config) and the operational database (Spanner) that drives the Perf alerting engine. It ensures that performance monitoring is “Configuration as Code,” allowing teams to manage their alert thresholds, bug-filing metadata, and trace selections through standard code review processes.
The design of sheriffconfig shifts the responsibility of alert management from manual UI interactions to automated, versioned workflows.
/proto: The module uses Protocol Buffers to define the structure of a SheriffConfig. This schema decouples the detection intent (e.g., “watch for a 10% shift in memory usage”) from the implementation details of the detection algorithms. It supports a strategy pattern where users can combine different statistical methods (Step) with grouping strategies (Algo).
/validate: The validation logic acts as a strict gatekeeper. It enforces business rules and structural integrity before any configuration is persisted.
- Regex safety: every regex pattern (prefixed with ~) is compiled during validation. This prevents runtime crashes in the detection engine caused by malformed regex in a config file.

/service: The service layer manages the lifecycle of configuration data. It polls external configuration sources and reconciles them with the internal SubscriptionStore and AlertStore.
- Fan-out: a single AnomalyConfig in a proto can expand into multiple Alert objects in the database. This allows a user to define a single logical subscription that applies to multiple distinct sets of telemetry traces.

The following diagram shows the path a configuration takes from a Git repository to the Perf alerting database:
[ Git Repository ] -> [ LUCI Config ] -> [ sheriffconfig/service ]
|
v
[ sheriffconfig/validate ]
(Check: Names, Regex, Fields)
|
v
[ Spanner DB ] <--- (Atomic Transaction) --- [ Transformation ]
| (Build Queries, Map Priorities)
|
+--> [ Perf Alerting Engine ]
(Identify Regressions based on stored Alerts)
Instead of a proprietary query language, the module utilizes standard URL query strings for trace matching.
- Standard parsing: this leverages Go's built-in net/url parsers and provides a format that is easily testable and familiar to developers.
- Query normalization: the buildQueryFromRules function in the service layer transforms these human-readable rules into normalized query strings used by the backend database to filter trace data efficiently.

When a new configuration file is ingested, the service uses a single database transaction to replace alerts.
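The normalization idea can be sketched with the standard library. `buildQuery` is a hypothetical, simplified stand-in for `buildQueryFromRules` (the real function also handles exclusion rules); the point is that `url.Values.Encode` yields a canonical, sorted form.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQuery parses a match rule expressed as a URL query string and
// re-encodes it into a normalized (key-sorted) form.
func buildQuery(rule string) (string, error) {
	v, err := url.ParseQuery(rule)
	if err != nil {
		return "", err // malformed rules are rejected, as in validation
	}
	return v.Encode(), nil
}

func main() {
	q, err := buildQuery("bot=pixel_6&benchmark=motion_mark")
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // benchmark=motion_mark&bot=pixel_6
}
```

Because two rules that differ only in key order normalize to the same string, the backend can compare and index queries directly.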
The service is designed to be “instance-aware.” A single large configuration file might contain subscriptions for v8, chrome, and skia.
- Instance filtering: each sheriffconfig service instance is configured with an instance identifier. It filters the incoming global configuration, only processing and storing the subscriptions that match its assigned instance. This allows for centralized configuration files without leaking cross-project data or overloading specific service instances.

The /go/sheriffconfig/proto module serves as the foundational definition for the Skia Perf alerting and configuration system. It manages the lifecycle of performance monitoring by defining how users describe “what to watch” and “how to react” when performance changes. This module provides the core data structures that bridge the gap between human-readable configurations and the automated backend engines responsible for anomaly detection and issue tracking.
The architecture is built around a centralized configuration model. Instead of hard-coding detection logic or scattering alert settings across various services, this module consolidates the entire “intent” of a performance sheriff into a structured format.
The implementation favors a strategy pattern for anomaly detection. Rather than defining a single detection path, the system allows sheriffs to combine different statistical methods (defined via Step) with different grouping strategies (defined via Algo). This decoupling allows the system to scale from simple “threshold exceeded” alerts to complex “cluster-based” analysis where multiple related traces must shift together to trigger an alert.
The selection logic is designed to handle the vast scale of Skia Perf data. By utilizing a rule-based system for trace selection, the module allows broad sets of traces to be matched with a small number of rules rather than enumerating every trace individually.
While the v1 subdirectory contains the active implementation, the root module acts as the container for these definitions. The use of Protocol Buffers ensures that the configuration is both language-agnostic and forward-compatible. This is critical for Skia Perf, where configuration may be stored in Git or a database for long periods while the backend software evolves.
The module defines the transition from detection to action. The implementation choices here reflect a desire to reduce “alert fatigue”:
- Using the AnomalyConfig.group_by and Algo settings, the system determines if multiple anomalies should be consolidated into a single report.
- Depending on the Action defined, the system either silently logs the event, creates a manual triage entry, or triggers an automated bisection.

The following diagram illustrates how the components defined in these protos interact to process performance data:
[ Perf Data ] -> [ Rule Matching ] -> [ Detection Engine ] -> [ Action Dispatcher ]
| | |
(Uses Match/Exclude) (Uses Algo/Step) (Uses Action/CC)
| | |
v v v
Identify relevant Apply statistical Create Bug or
metric traces analysis window trigger Bisect
- v1/: This subdirectory contains the versioned definitions of the API. By isolating the versioned protos, the project allows for breaking changes in the configuration schema while maintaining compatibility for existing sheriff configurations.
- v1/sheriff_config.proto: The definitive source for the data model. It encodes the business logic of how subscriptions, detection rules, and alerting metadata relate to one another.
- v1/sheriff_config.pb.go: The compiled Go representation of the configuration. This is the primary interface used by the Skia Perf backend to interact with the configuration data.

The go/sheriffconfig/proto/v1 module defines the data structures and serialization format for Skia Perf's anomaly detection and alerting system. It uses Protocol Buffers to specify how performance metrics are selected, how regressions (anomalies) are detected within those metrics, and how the system should respond (e.g., filing bugs or initiating bisections).
This module acts as the contract between the configuration stored in the system and the Perf engine that processes incoming data.
The configuration hierarchy is designed to support multi-tenant monitoring where different teams (Sheriffs) can track specific subsets of performance data with customized detection logic.
SheriffConfig
└── [Subscription]
└── [AnomalyConfig]
└── Rules (Metric Selection)
Each subscription can define multiple AnomalyConfig objects to apply different detection logic to different sets of metrics. Traces are selected using a query-string format: {key1}={value1}&{key2}={value2}.
- The match field uses a wildcard-by-default approach. If a key is omitted, it matches everything. It supports regex-style matching (e.g., bot=~lacros-.*-perf).
- The exclude field allows for fine-grained removal of specific noisy traces.
- Multiple match strings are combined with an OR operation, while keys within a single string and exclusion rules are treated as AND operations.

The module defines several strategies for identifying regressions through the Step and Algo enums:
- Step detection supports absolute changes (ABSOLUTE_STEP), percentage-based changes (PERCENT_STEP), and advanced statistical tests like COHEN_STEP or MANN_WHITNEY_U.
- Traces can be analyzed individually (STEPFIT) or grouped together using KMEANS to identify collective shifts in performance across multiple bots or benchmarks.
- radius: Controls the window of commits analyzed around a potential change point.
- direction: Allows sheriffs to ignore “improvements” (e.g., a speed increase) and only alert on regressions.
- group_by: A powerful field that allows splitting the analysis across specific keys, ensuring that anomalies are only grouped if they share common attributes.

The Action enum within AnomalyConfig determines the lifecycle of a detected anomaly:
Each detected anomaly is tracked against its owning Subscription.

- sheriff_config.proto: The primary source of truth defining the messages and enums. It contains extensive documentation on the expected string formats for rules and the behavior of detection enums.
- sheriff_config.pb.go: The generated Go code providing the structures used by the Perf service to parse and process configurations.
- generate.go: Contains the go:generate directives used to keep the Go code in sync with the protobuf definitions.

The sheriffconfig/service module acts as the synchronization engine between externalized “Sheriff Configurations” stored in LUCI Config and the internal database used by Skia Perf to track subscriptions and trigger alerts. By treating these configurations as code, the service allows teams to manage anomaly detection rules, bug filing metadata, and ownership via version-controlled repositories.
The primary role of this service is to fetch, validate, transform, and persist configurations. It bridges the gap between the high-level, human-readable Protobuf definitions (SheriffConfigs) and the low-level SQL structures required by the Perf alerting engine.
To minimize database churn and ensure consistency, the service uses a revision-checking mechanism. Before processing a subscription, it queries the subscriptionStore to see if a subscription with the same name and revision already exists.
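The revision-skip check can be sketched as follows. This is a hypothetical, in-memory stand-in for the subscriptionStore lookup described above; the real store and method names differ.

```go
package main

import "fmt"

// subKey identifies a subscription by the pair the service checks:
// its name and the LUCI Config revision it came from.
type subKey struct{ Name, Revision string }

// memStore is a toy stand-in for the real subscriptionStore.
type memStore struct{ seen map[subKey]bool }

func (s *memStore) Exists(name, rev string) bool { return s.seen[subKey{name, rev}] }
func (s *memStore) Put(name, rev string)         { s.seen[subKey{name, rev}] = true }

// shouldImport mirrors the pattern described above: skip processing when
// the same name+revision already exists, otherwise persist the new revision.
func shouldImport(s *memStore, name, rev string) bool {
	if s.Exists(name, rev) {
		return false // no database churn for unchanged configs
	}
	s.Put(name, rev)
	return true
}

func main() {
	s := &memStore{seen: map[subKey]bool{}}
	fmt.Println(shouldImport(s, "chrome", "abc123")) // first import proceeds
	fmt.Println(shouldImport(s, "chrome", "abc123")) // unchanged revision is skipped
}
```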
A single LUCI Config file may contain subscriptions for multiple Perf instances (e.g., “chrome-internal”, “v8”, “skia”).
Each sheriffconfigService is initialized with a specific instance string. During the processConfig phase, it discards any subscription defined in the Protobuf that does not match its assigned instance. This allows centralized management of alerts across a project while maintaining instance-specific execution.

Sheriff configurations use a rule-based system (match and exclude lists) to define which telemetry traces an anomaly config should monitor.
The buildQueryFromRules function transforms these rules into URL-style query strings. It handles exclusion logic by prefixing values with !. These queries are then stored in the Alert objects, which the Perf engine uses to filter incoming data.

When importing a config file, the service wraps the insertion of both subscriptions and alerts (via ReplaceAll) into a single database transaction.
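A minimal sketch of the rule-to-query transformation: one match rule plus a set of exclude rules, with excluded values prefixed by !. The function shape is an assumption; the real buildQueryFromRules may differ in signature and edge-case handling.

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery sketches the transformation described above: the match
// rule passes through unchanged, and each exclude (a single key=value
// pair) is appended with its value negated via a "!" prefix.
func buildQuery(match string, excludes []string) string {
	parts := []string{match}
	for _, ex := range excludes {
		kv := strings.SplitN(ex, "=", 2)
		if len(kv) == 2 {
			parts = append(parts, kv[0]+"=!"+kv[1])
		}
	}
	return strings.Join(parts, "&")
}

func main() {
	fmt.Println(buildQuery("benchmark=motion_mark", []string{"bot=noisy-bot"}))
}
```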
The service typically runs as a background routine (StartImportRoutine), polling LUCI Config at a defined interval.
[ LUCI Config ] --(Fetch Project Configs)--> [ service.ImportSheriffConfig ]
|
[ validate.ValidateConfig ]
|
+----------------------------------------------+----------------------------------------------+
| | |
[ Filter by Instance ] [ Transform to Entities ] [ Check Revision ]
(Drop if mismatch) (Map Protos to DB Models) (Skip if exists)
| | |
+----------------------------------------------+----------------------------------------------+
|
[ DB Transaction (Spanner) ]
|-- Insert Subscriptions
|-- Replace All Alerts
+---------------------------> [ Success/Commit ]
The sheriffconfigService implementation manages the dependency injection of stores (Alert, Subscription) and the LUCI Config API client. It also defines the mapping constants for algorithm types (e.g., STEPFIT, KMEANS) and action types (e.g., TRIAGE, BISECT).

The service performs significant data translation to bridge the two domains:
- Optional fields fall back to fixed defaults (e.g., 2) if they are omitted in the configuration.
- Each AnomalyConfig inside a subscription can generate multiple Alert objects—one for each match rule provided. This expansion allows a single subscription to monitor several distinct sets of traces with different detection parameters (like radius or threshold).

The validate module provides the logic necessary to ensure the integrity and correctness of Sheriff Configurations used in the Perf tool. It acts as a gatekeeper, verifying that configuration files (typically managed via LUCI Config) adhere to structural and business rules before they are processed by the system.
Sheriff configurations define how anomalies (regressions) are assigned to different teams or “subscriptions.” This module takes raw data—usually base64-encoded prototext from an external configuration service—deserializes it into Go protocol buffer objects, and runs a battery of validation checks.
The validation logic is hierarchical, mirroring the structure of the SheriffConfig proto:
The module uses the standard URL query format (e.g., key1=val1&key2=val2) to define match and exclude patterns.
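A sketch of pattern validation under those rules, assuming the checks described in this section: the pattern must parse as a URL query, must contain at least one key, and any value prefixed with ~ must compile as a Go regular expression. Error messages and the function name are illustrative.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// validatePattern sketches the checks described above using the
// standard library: url.ParseQuery for structure, regexp.Compile for
// "~"-prefixed values.
func validatePattern(pattern string) error {
	values, err := url.ParseQuery(pattern)
	if err != nil {
		return fmt.Errorf("invalid query %q: %v", pattern, err)
	}
	if len(values) == 0 {
		return fmt.Errorf("pattern must have at least one key")
	}
	for key, vals := range values {
		for _, v := range vals {
			if strings.HasPrefix(v, "~") {
				// Compile eagerly to catch bad regexes at validation
				// time rather than during anomaly matching.
				if _, err := regexp.Compile(v[1:]); err != nil {
					return fmt.Errorf("key %q: bad regex %q: %v", key, v, err)
				}
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePattern("bot=~lacros-.*-perf&benchmark=motion_mark"))
	fmt.Println(validatePattern("bot=~[unclosed") != nil)
}
```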
- Patterns are parsed with the standard library (net/url.ParseQuery), providing a familiar and robust syntax for users to define trace filters without requiring a custom DSL parser.
- Values prefixed with ~ are treated as regular expressions. The validator explicitly compiles these during the validation phase to catch syntax errors early, preventing runtime failures during actual anomaly matching.

The DeserializeProto function specifically handles Base64 decoding followed by Prototext unmarshaling.
Core validation (validate.go): This is the core of the module. It implements a top-down validation strategy:
- ValidateConfig: The entry point for validating a full SheriffConfig. It ensures that the config is not empty and that every subscription has a unique name, which is critical for identifying subscriptions in logs and UI.
- validateSubscription: Ensures that every subscription is actionable. It mandates a Name, ContactEmail, BugComponent, and Instance. A subscription without these cannot effectively track or report anomalies.
- validateAnomalyConfig: Focuses on the rules of the subscription. It requires at least one Match pattern, as a configuration that matches nothing is considered a configuration error.
- validatePattern: The most granular validation step.

The typical workflow for a configuration string being processed by this module is:
[ Base64 String ]
|
v
[ DeserializeProto ] ---------------------> [ Decode Base64 ]
| |
| v
| [ Unmarshal Prototext ]
v |
[ *SheriffConfig Proto ] <-------------------------/
|
v
[ ValidateConfig ]
|
+--> [ validateSubscription ]
|
+--> [ validateAnomalyConfig ]
|
+--> [ validatePattern ] (Match)
|
+--> [ validatePattern ] (Exclude, singleField=true)
| Level | Constraint |
|---|---|
| Global | Subscription names must be unique. |
| Subscription | Must contain Name, ContactEmail, BugComponent, and Instance. |
| Anomaly Config | Must have at least one Match pattern. |
| Pattern (Match) | Must be a valid URL query string; must have at least 1 key. |
| Pattern (Exclude) | Must have exactly 1 key. |
| Pattern (Values) | Values starting with ~ must be valid Go Regex. |
The shortcut module provides a unified interface and core logic for managing “shortcuts” within the Perf application. A shortcut is a persistent, shareable identifier that represents a collection of performance trace IDs. Instead of passing around large lists of trace keys in URLs or API requests, the system generates a compact hash-based ID that can be used to retrieve the original set of keys.
A fundamental design choice in this module is the use of content-addressable storage. The ID of a shortcut is not a random UUID or an auto-incrementing integer; instead, it is a deterministic hash of the trace keys it contains.
The IDFromKeys function implements this logic: the trace keys are first sorted into a canonical order, and the sorted list is then hashed to produce the ID.
This approach ensures that identical sets of traces are automatically deduplicated in the underlying storage, as they will always resolve to the same primary key.
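A minimal sketch of the content-addressable scheme: sort the keys into canonical order, then hash the joined result. The digest and ID prefix here are assumptions for illustration; the real IDFromKeys may use a different hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// idFromKeys sketches the scheme described above: identical sets of
// keys always hash to the same ID regardless of input order, so
// duplicate shortcuts collapse to one row in storage.
func idFromKeys(keys []string) string {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	sum := sha256.Sum256([]byte(strings.Join(sorted, "\n")))
	return fmt.Sprintf("X%x", sum[:8])
}

func main() {
	a := idFromKeys([]string{",benchmark=b,bot=a,", ",benchmark=b,bot=c,"})
	b := idFromKeys([]string{",benchmark=b,bot=c,", ",benchmark=b,bot=a,"})
	fmt.Println(a == b) // same set, same ID
}
```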
The module defines a Store interface that abstracts the persistence layer. This allows the application to remain agnostic of whether shortcuts are stored in a relational database, an in-memory cache, or a cloud-native solution.
The interface supports:
- Insertion of new shortcuts, either from a structured Shortcut object (InsertShortcut) or directly from an io.Reader (Insert), which is useful for processing JSON payloads from HTTP requests.
- Streaming retrieval: the GetAll method returns a channel of shortcuts. This design decision facilitates large-scale data migrations or maintenance tasks without loading the entire shortcut database into memory, preventing OOM (Out-Of-Memory) errors.

The core package defines the Shortcut data structure (a simple wrapper around a slice of strings) and the Store interface, and contains the logic for ID generation and normalization. Supporting packages build on it:

- mocks: autogenerated mocks of the Store interface. These are used across the Perf codebase to test components that depend on shortcuts (like the dashboard or alerting systems) without requiring a live database.
- shortcuttest: every implementation of the Store interface (e.g., for a new database backend) uses this suite to verify it correctly handles edge cases, such as key normalization and asynchronous retrieval.
- sqlshortcutstore: the production implementation of Store. It maps the Go interface to a SQL backend (PostgreSQL/Spanner), handling the serialization of trace keys into JSON blobs for efficient storage and retrieval.

The following diagram illustrates how data flows through the module from creation to retrieval:
Input Keys                 shortcut Module               Storage Backend
==========                 ===============               ===============
    |                             |                            |
    | 1. Create Shortcut          |                            |
    |---------------------------->|                            |
    |                             | 2. Sort Keys & Hash        |
    |                             | 3. Generate ID ("X...")    |
    |                             |                            |
    |                             | 4. Persist (ID, Keys)      |
    |                             |--------------------------->|
    | <------- Return ID ---------|                            |
    |                             |                            |
    | 5. Get(ID)                  |                            |
    |---------------------------->|                            |
    |                             | 6. Fetch by ID             |
    |                             |--------------------------->|
    | <---- Return Keys ----------|                            |
This module is typically used by the Perf frontend when a user wants to “pin” a specific view of traces or share a link to a complex query. The frontend sends the list of trace IDs to the backend, which uses this module to generate and store the shortcut, returning a short ID that is then embedded in the URL.
The go/shortcut/mocks module provides a set of autogenerated mock implementations for the shortcut package, specifically targeting the Store interface. These mocks are designed to facilitate unit testing of components that depend on persistent shortcut storage without requiring a live database or complex setup.
The primary motivation for this module is to decouple the business logic of the Perf application from its storage layer during testing. By using mocks, developers can simulate various database behaviors, such as:
The mocks are generated using mockery and are based on the testify/mock framework. This allows for a declarative style of testing where expectations are set up at the beginning of a test case.
This file contains the Store struct, which implements the shortcut.Store interface. It provides mockable versions of all standard CRUD operations required for shortcut management:
- Reads (Get, GetAll): Allows tests to return predefined shortcut objects or channels. GetAll is particularly useful for testing batch processing or migration scripts that iterate over all stored shortcuts.
- Writes (Insert, InsertShortcut): Enables testing of how the system handles new shortcut creation. The Insert method handles raw io.Reader input, while InsertShortcut handles structured objects, reflecting the dual ways shortcuts might be ingested.
- Deletion (DeleteShortcut): Supports testing of cleanup routines and transaction handling, as it accepts a pgx.Tx parameter to simulate behavior within a database transaction.

A typical testing workflow using this module involves initializing the mock, setting expectations, and then injecting the mock into the consumer service.
+-------------------+ +-----------------------+ +-------------------------+
| Unit Test | | Mock Store | | Consumer Service |
+---------+---------+ +-----------+-----------+ +------------+------------+
| | |
| 1. NewStore(t) | |
+---------------------------->| |
| | |
| 2. On("Get").Return(...) | |
+---------------------------->| |
| | |
| 3. Call Method Under Test | |
+-----------------------------|--------------------------->|
| | |
| | 4. Get(ctx, id) |
| |<---------------------------+
| | |
| | 5. Return Mock Data |
| +--------------------------->|
| | |
| 6. AssertExpectations() | |
+---------------------------->| |
The NewStore function simplifies this process by automatically registering cleanup functions that assert all defined expectations were met before the test finishes, reducing boilerplate code in the test suite.
The shortcuttest module provides a standardized compliance suite for validating implementations of the shortcut.Store interface. By centralizing test logic, the module ensures that different storage backends (e.g., SQL-based, in-memory, or cloud-native) exhibit consistent behavior regarding data persistence, normalization, and error handling.
The primary goal of shortcuttest is to enforce the contract of the shortcut.Store interface. A key design decision in the Perf system is that shortcuts—which are collections of keys representing trace sets—should be idempotent and normalized.
The test suite enforces the following behaviors across all implementations:
Every check is expressed against the public shortcut.Store interface.

The module exports a SubTests map, which associates descriptive names with SubTestFunction signatures. This allows developers implementing a new shortcut.Store to run the entire suite against their implementation using a standard Go sub-test pattern:
Test Runner (External)
          |
   +---- Loop over SubTests ----+
   |                            |
   v                            v
[ InsertGet ]              [ GetAll ]
Verifies ID generation     Validates stream-based
and key normalization.     retrieval via channels.
Instead of providing a single monolithic test, the module breaks down requirements into specific functional checks:
- InsertGet: This function validates both Insert (via io.Reader) and Get. It specifically checks that the shortcut.Shortcut retrieved from the store has its Keys slice sorted alphabetically, even if the input was unsorted. This ensures that the “Shortcut” concept remains a canonical set of trace keys.
- GetAll: Validates the asynchronous retrieval pattern used for maintenance or migration tasks. It ensures that the store can correctly stream all existing shortcuts into a channel.
- DeleteShortcut: Confirms that the store correctly handles the removal of data and that subsequent Get calls reflect the deletion.
- GetNonExistent: Ensures that the store returns an error (rather than crashing or returning an empty object) when queried with a missing or invalid ID.

The module relies on the testify library to provide clear assertions. Because it is a testing utility, it resides in its own package to avoid introducing testing dependencies (like testify) into the production shortcut package.
When implementing a new store, the developer typically creates a test in their local package that spins up the required infrastructure (like a local SQL instance), creates the store instance, and passes it to the functions defined in shortcuttest.
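The SubTests-map pattern can be sketched as follows. In the real module the functions take *testing.T and a shortcut.Store; here a toy map-backed store and an error return stand in, so the names and signatures are illustrative.

```go
package main

import "fmt"

// store is a toy stand-in for a shortcut.Store implementation.
type store map[string][]string

// subTestFunction approximates the suite's per-check signature.
type subTestFunction func(s store) error

// subTests maps descriptive names to checks, mirroring the exported
// SubTests map described above.
var subTests = map[string]subTestFunction{
	"InsertGet": func(s store) error {
		s["id1"] = []string{"a", "b"}
		if got := s["id1"]; len(got) != 2 {
			return fmt.Errorf("InsertGet: got %v", got)
		}
		return nil
	},
	"GetNonExistent": func(s store) error {
		if _, ok := s["missing"]; ok {
			return fmt.Errorf("expected missing id to be absent")
		}
		return nil
	},
}

// runSuite mirrors the sub-test loop an implementer would write with
// t.Run(name, func(t *testing.T) { fn(t, myStore) }).
func runSuite(s store) error {
	for name, fn := range subTests {
		if err := fn(s); err != nil {
			return fmt.Errorf("%s failed: %v", name, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(runSuite(store{}))
}
```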
The sqlshortcutstore module provides a production-grade implementation of the shortcut.Store interface using a SQL backend (compatible with PostgreSQL and Spanner). It facilitates the persistence, retrieval, and management of “shortcuts”—compact, shareable identifiers that represent collections of performance trace IDs.
This module acts as the concrete bridge between the high-level Perf shortcut logic and the underlying relational database, ensuring that complex query definitions can be saved and referenced by a simple hash-based key.
The store utilizes a content-addressable approach for shortcut IDs. When a shortcut is inserted, the ID is generated based on the hash of the trace keys it contains (via shortcut.IDFromKeys).
While the backend is a SQL database, the trace IDs themselves are stored as a single JSON-encoded string in a TEXT column.
During an INSERT statement, the shortcut.Shortcut Go struct is marshaled into a JSON blob; upon retrieval, this blob is unmarshaled back into the struct.

The GetAll method returns a Go channel (<-chan *shortcut.Shortcut) rather than a slice.
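The JSON-blob strategy and the central statements map can be sketched together. The SQL text and struct shape here are illustrative approximations of the store's internals, not the exact queries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statements mirrors the pattern of a central SQL map; the exact text
// of the real queries may differ.
var statements = map[string]string{
	"insert": `INSERT INTO Shortcuts (id, trace_ids) VALUES ($1, $2) ON CONFLICT (id) DO NOTHING`,
	"get":    `SELECT trace_ids FROM Shortcuts WHERE id = $1`,
}

// shortcut is a simplified stand-in for shortcut.Shortcut.
type shortcut struct {
	Keys []string `json:"Keys"`
}

// encodeBlob marshals the struct into the TEXT blob stored in SQL.
func encodeBlob(s *shortcut) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// decodeBlob reverses the process on retrieval.
func decodeBlob(blob string) (*shortcut, error) {
	var s shortcut
	err := json.Unmarshal([]byte(blob), &s)
	return &s, err
}

func main() {
	blob, _ := encodeBlob(&shortcut{Keys: []string{"a", "b"}})
	s, _ := decodeBlob(blob)
	fmt.Println(blob, s.Keys, statements["insert"] != "")
}
```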
SQLShortcutStore: Located in sqlshortcutstore.go, this is the primary struct implementing the storage logic. It encapsulates a pool.Pool to communicate with the database.
- Incoming trace keys are validated against the expected query format. This prevents malformed data from polluting the database.
- The DeleteShortcut method optionally accepts a pgx.Tx (transaction) object. This allows deletion operations to be part of a larger atomic unit of work, which is useful when cleaning up related resources.

The module uses a central statements map to define its SQL queries. This separates the SQL syntax from the Go logic, making the code easier to maintain and ensuring that queries like ON CONFLICT (id) DO NOTHING are handled consistently.
The following diagram demonstrates the lifecycle of a shortcut being stored and retrieved:
Application Code           SQLShortcutStore                  SQL Database
================           ================                  ============
    |                             |                               |
1. Insert(Reader) ------> [ Decode JSON   ]                       |
                          [ Validate Keys ]                       |
                          [ Generate ID   ]                       |
                          [ Encode JSON   ]                       |
                             |--- INSERT (id, blob) ------------> |
  <----- Return ID ----------|    (ON CONFLICT IGNORE)            |
    |                        |                                    |
2. Get(ID) -------------> [ Query Row ]                           |
                             | <-------- SELECT blob ------------ |
                          [ Decode JSON ]                         |
  <--- Return Struct --------|                                    |
- The table definition lives in the schema sub-module. It defines the Shortcuts table with an id as the Primary Key and trace_ids for the data payload.
- sqlshortcutstore_test.go leverages sqltest to spin up ephemeral Spanner instances, ensuring the store is tested against real database engines rather than mocks. It runs the standard suite of shortcut tests defined in shortcuttest to ensure interface compliance.

The schema module defines the structural contract for persisting performance trace shortcuts in a SQL database. A shortcut in this context is a persistent mapping between a unique identifier and a collection of Trace IDs, allowing users to reference complex sets of performance data via a compact, shareable key.
The schema is intentionally kept minimal, prioritizing serialization flexibility and retrieval speed over database-level normalization.
The ShortcutSchema struct serves as the single source of truth for the database table structure. Its design reflects two primary requirements:
- The ID field is defined as a TEXT UNIQUE NOT NULL PRIMARY KEY. This ensures that every shortcut has a permanent, collision-free anchor. The use of a string-based ID (typically a hash) allows the ID itself to be a representation of the content it points to, facilitating deduplication before insertion.
- The TraceIDs field is stored as a TEXT column intended to hold a serialized shortcut.Shortcut JSON object.

The decision to store trace IDs as a serialized JSON blob rather than in a relational junction table (e.g., a many-to-many mapping of shortcut_id to trace_id) was driven by the access patterns of the Perf system:
By treating TraceIDs as an opaque JSON blob at the database layer, the internal structure of the shortcut.Shortcut Go struct can evolve without requiring a database migration.

The following diagram illustrates how the schema facilitates the lifecycle of a shortcut:
Application Layer            Schema Layer (SQL)           Database Storage
=================            ==================           ================
1. Create Shortcut  ------>  [ ID (Hash)       ] ------>  INSERT INTO Shortcuts
   (List of IDs)             [ TraceIDs (JSON) ]          (id, trace_ids)
                                                               |
                                                               v
2. Request Shortcut <------  [ ID (Primary Key)] <------  SELECT trace_ids
   (via ID)                  [ TraceIDs (JSON) ]          WHERE id = ?
        |
        v
3. Deserialize JSON  ------> Result: List of IDs
schema.go: Defines the ShortcutSchema struct. This file is the authoritative reference for SQL migration tools and ORM-like mappers used elsewhere in the sqlshortcutstore parent module. It ensures that the Go representation of a shortcut's persistence layer remains synchronized with the actual SQL table constraints (e.g., PRIMARY KEY, UNIQUE).

The /go/sql module is the central authority for the database schema within the Skia Perf application. It implements a “Schema-as-Code” methodology, where Go struct definitions serve as the single source of truth for the underlying Google Cloud Spanner database structure.
In a high-throughput performance monitoring system, database consistency across distributed components (ingesters, frontends, and maintenance tasks) is paramount. This module provides a unified interface for defining, generating, migrating, and testing the database schema.
Instead of manually maintaining DDL (Data Definition Language) files, developers modify Go structs. The module then provides tooling to project these definitions into SQL strings, Go constants for type-safe querying, and serialized JSON files used for environment validation.
The architecture is built on the principle that the application code should dictate the database structure, not the other way around.
Every table the application relies on (e.g., Alerts or Favorites) is defined here to ensure persistence.

The master registry (tables.go): The Tables struct in tables.go acts as the master registry. It aggregates schema definitions from various sub-packages across the Perf project (e.g., alerts, regressions, trace stores). This centralized struct is used by reflection-based tools to understand the entire database landscape.
Code generators (tosql and exportschema): These sub-modules transform Go code into deployable artifacts:
- tosql: A CLI tool that parses the Go structs and generates go/sql/spanner/schema_spanner.go. This generated file contains the raw SQL DDL strings and Go slices of column names used by the application at runtime.
- exportschema: A utility that serializes the schema into a standardized schema.Description (JSON). This artifact is used for comparing the “expected” state against the “live” state of a production database.

Migration management (expectedschema): This component manages the transition of the database over time. It embeds the expected JSON descriptions into the binary and provides the ValidateAndMigrateNewSchema logic.
- Migrations are applied using hand-written DDL statements (FromLiveToNextSpanner).
- The TraceParams table is managed dynamically. Since the keys in performance data change as new benchmarks are added, this module dynamically adds or drops generated columns and indexes in Spanner to maintain query performance.

The following diagram illustrates the lifecycle of a schema change:
[ Developer ]
     |
     v
[ Modify Go Structs ] ----> (e.g., add field to TraceValuesSchema)
     |
     v
[ Run 'tosql' ] ----------> (Updates schema_spanner.go constants)
     |
     v
[ Run 'exportschema' ] ---> (Generates schema_spanner.json)
     |
     v
[ Deployment ] -----------> [ Maintenance Task ]
     |
     v
[ Validate & Migrate ]
     |
+--------------------+--------------------+
|                    |                    |
(Match Prev?)   (Match Next?)     (Match Neither?)
|                    |                    |
[ Run Migration ] [ Do Nothing ]   [ Panic/Error ]
|                    |
+----------+---------+
           |
           v
[ Update Dynamic TraceParams ]
(Add/Drop Generated Columns)
The sqltest sub-module and sql_test.go provide the infrastructure for integration testing.
The expectedschema module manages the lifecycle and validation of the database schema for Skia Perf. It serves as the authoritative source for what the database structure should look like at any given version of the software, and provides the mechanism to transition the database from a previous state to the current one.
In a distributed system where multiple services (frontend, ingesters, maintenance tasks) share the same database, schema synchronization is critical. This module ensures that:
Dynamic portions of the schema (e.g., traceparams) are adjusted based on the actual data flowing through the system to optimize query performance.

The module implements an “n-1” compatibility strategy. It tracks two versions of the schema:
- schema_prev_spanner.json: The schema as it existed in the previous version of the application.
- schema_spanner.json: The desired “next” schema for the current version.

This approach is chosen because Perf components are deployed simultaneously. When a new version is rolled out, the maintenance task upgrades the schema. If the frontend or ingester starts before the migration, they check the schema; if it matches neither “prev” nor “next”, they panic. This ensures that the system only runs on a known, supported database state.
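The n-1 decision logic can be sketched as a three-way comparison. The function name and string-based schema comparison are simplifications; the real code compares structured schema.Description values.

```go
package main

import "fmt"

// checkSchema sketches the n-1 validation described above: compare the
// live schema against the embedded "next" and "prev" descriptions and
// decide whether to proceed, migrate, or fail.
func checkSchema(live, prev, next string) (action string, err error) {
	switch live {
	case next:
		return "none", nil // already migrated, nothing to do
	case prev:
		return "migrate", nil // run FromLiveToNextSpanner
	default:
		return "", fmt.Errorf("live schema matches neither prev nor next; refusing to start")
	}
}

func main() {
	a, _ := checkSchema("v2", "v1", "v2")
	b, _ := checkSchema("v1", "v1", "v2")
	_, err := checkSchema("v0", "v1", "v2")
	fmt.Println(a, b, err != nil)
}
```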
Embedded schemas (embed.go): The module uses Go's embed package to include JSON representations of the Spanner schema directly into the binary. This makes the schema definition portable and easily accessible for comparison against the live database.
- Load(): Retrieves the current expected schema.
- LoadPrev(): Retrieves the previous version's schema.

Migration logic (migrate.go): This file contains the logic for transitioning the database. It defines two raw SQL strings that must be manually updated by developers whenever a schema change is introduced:
- FromLiveToNextSpanner: The DDL commands to apply the new change.
- FromNextToLiveSpanner: The DDL commands to revert the change (primarily used for testing and local development).

The ValidateAndMigrateNewSchema function performs the core logic:
Dynamic columns (traceparams_schema.go): Unlike static tables, the traceparams table uses Spanner's generated columns and indexes to optimize filtering. Since the keys in performance data (params) change over time, this module dynamically manages these columns.
UpdateTraceParamsSchema performs the following workflow:
- Inspect the generated columns currently present on the traceparams table.
- Use a SQL template (traceParamsUpdateTemplate) to generate and execute DDL that adds missing columns/indexes and drops obsolete ones.

The following diagram illustrates how the maintenance task synchronizes the database during a deployment:
[ Start Maintenance Task ]
            |
            v
[ Fetch Live Schema from DB ] <----------+
            |                            |
            +---- matches Next? -------> [ Success: No action needed ]
            |
            +---- matches Prev? -------> [ Execute FromLiveToNextSpanner ]
            |
            +---- matches neither? ----> [ Error: Inconsistent State ]
            |
            v
[ Update Dynamic TraceParams ]
            |
            +--> [ Get keys from recent tiles ]
            +--> [ Add/Drop Generated Columns ]
            +--> [ Add/Drop Indexes ]
            |
            v
        [ Done ]
migrate_spanner_test.go provides a suite to verify that migrations correctly transition a database from the “prev” state to the “next” state and that dynamic column generation works as expected.

The exportschema module provides a command-line utility designed to bridge the gap between Go-defined database schemas and their serialized representations. In the context of the Perf system, it acts as a generator that translates internal Go struct definitions and Spanner schema configurations into a standardized schema.Description format. This serialized output is primarily used for schema verification, migrations, and ensuring consistency across different deployment environments.
The primary motivation for this module is to treat the database schema as a “source of truth” defined within the Go codebase rather than in disparate SQL files. By using a Go-based tool to export the schema:
The module is a thin wrapper that orchestrates the extraction of schema metadata. It leverages the generic exportschema_lib to perform the actual serialization while providing the Perf-specific schema definitions as inputs.
The following diagram illustrates how the tool transforms internal Go definitions into an external schema description:
+-----------------------+        +-----------------------+
|  perf/go/sql/spanner  |        |  perf/go/sql          |
| (Schema Definitions)  |        | (Table Structs/Tags)  |
+-----------+-----------+        +-----------+-----------+
            |                                |
            |       +----------------+       |
            +-----> |  exportschema  | <-----+
                    |     (Main)     |
                    +-------+--------+
                            |
                            v
             +----------------------------+
             | go/sql/schema/exportschema |
             |   (Serialization Engine)   |
             +-------------+--------------+
                           |
                           v
             +----------------------+
             | .json / .sql output  |
             | (schema.Description) |
             +----------------------+
- main.go: This is the entry point. It defines the CLI interface, accepting a -databaseType to determine the target dialect and an -out path for the resulting file. It explicitly imports perf/go/sql/spanner to access the Schema object, which contains the specific table layouts and column types required by the Perf application.
- sql.Tables{}: The tool passes an empty instance of the Perf SQL tables to the exporter. This allows the reflection-based serialization engine to inspect the struct tags (such as sql:"...") used throughout the Perf module to understand how Go objects map to database columns.

The module is responsible for:
- Selecting which schema definition (spanner.Schema) should be exported.

The go/sql/spanner module serves as the authoritative source for the Google Cloud Spanner database schema used by the Skia Perf application. It contains the DDL (Data Definition Language) statements required to initialize the database environment and provides Go constants that represent the table structures, ensuring type safety and consistency when interacting with the database.
The primary file, schema_spanner.go, is generated by an external tool (//go/sql/exporter/). This approach ensures that the Spanner schema remains synchronized with the internal Go structures used across the Perf application. Manual edits to this file are discouraged to prevent drift between the application logic and the database state.
The schema is optimized for the high-volume time-series data typical of performance monitoring.
- Tables such as Alerts and SourceFiles use bit_reversed_positive sequences. This is a specific Spanner optimization to prevent hotspots during high-throughput inserts by distributing primary key values across the keyspace.
- Several tables carry a createdat column and a TTL policy of 1095 days (3 years). This automates data retention and prevents unbounded storage growth for ephemeral performance traces and logs.
- TraceValues and TraceValues2: Store the actual measurement values associated with a specific trace and commit. TraceValues2 provides more granular dimensions (benchmark, bot, test, subtests) for improved querying.
- Postings and ParamSets: Facilitate the “inverted index” style search used in Perf, allowing the system to quickly find traces based on key-value pairs (e.g., finding all traces where cpu=arm64).
- TraceParams: Stores the full set of parameters for a trace ID in a JSONB column, balancing structured searching with flexible metadata storage.

The schema also defines a sophisticated relationship between performance regressions and their remediation, linking the Regressions, AnomalyGroups, and Culprits tables.
This file contains a single large string constant, Schema, which includes the full set of CREATE TABLE, CREATE INDEX, and CREATE SEQUENCE statements. It also exports slice variables (e.g., var Alerts, var Commits) that list the column names for each table, providing a programmatic way to reference table structures without hardcoding strings in the application logic.
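To make the column-slice idea concrete, here is a minimal sketch of how such exported slices can be consumed to build SQL without hardcoded column strings. The Commits column names below are hypothetical placeholders, not the actual Perf schema:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical column list, mirroring the exported slice variables
// (e.g., var Alerts, var Commits) described above.
var Commits = []string{"commit_number", "git_hash", "commit_time", "author", "subject"}

// selectAll builds a SELECT statement from a table name and its exported
// column slice, so application code never hardcodes column strings.
func selectAll(table string, columns []string) string {
	return fmt.Sprintf("SELECT %s FROM %s", strings.Join(columns, ", "), table)
}

func main() {
	fmt.Println(selectAll("Commits", Commits))
}
```

Because the slices are generated from the same source as the DDL, a schema change automatically propagates into every query built this way.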
The schema supports a workflow where incoming performance data is transformed into searchable traces.
Incoming Data File
       |
       v
[SourceFiles] <------- [Metadata] (Links to external logs)
       |
       +-----> [TraceValues]  (Value at Commit X)
       |
       +-----> [TraceParams]  (The "What": bot=linux, test=draw)
       |
       v
[Postings]  (Inverted index for searching)
[ParamSets] (Summary of available search terms)
- The ingested file is recorded in SourceFiles.
- Measurement values are written to TraceValues or TraceValues2.
- Trace parameters are indexed in Postings and ParamSets, enabling the Perf UI to populate search filters and quickly locate relevant trace_ids.
- Detected anomalies are tracked through Regressions, AnomalyGroups, and Culprits.

The sqltest module provides standardized utilities for initializing and managing database instances during unit testing. It is specifically designed to facilitate integration testing against Spanner-compatible PostgreSQL interfaces using local emulators.
Testing database logic requires a consistent, reproducible, and isolated environment. This module automates the orchestration of ephemeral databases to ensure that tests do not interfere with one another and that they run against a schema identical to production.
The implementation relies on two primary architectural choices:
- Schema parity: The module applies the production schema (from the spanner package) before returning a connection, ensuring that the code under test interacts with the expected table structures.

The primary entry point is NewSpannerDBForTests. This function handles the entire lifecycle of a test database:
The module does not return a raw database driver connection. Instead, it returns a pool.Pool interface wrapped in a timeout validator:
- Timeout enforcement: By wrapping the pgxpool connection with timeout.New, the module ensures that every database operation performed during the test includes a context with a defined timeout. This prevents tests from hanging indefinitely if a deadlock or performance issue occurs in the underlying logic.
- Driver abstraction: By returning the pool.Pool interface, it allows the rest of the application to remain agnostic of the underlying driver implementation (PostgreSQL vs. Spanner-via-PGAdapter).

The following diagram illustrates the sequence of operations when a test requests a new database connection:
Test Invocation
      |
      v
[ Check Emulators ] ----> (Require Spanner & PGAdapter running)
      |
      v
[ Generate Name ] ------> (Prefix + Random ID)
      |
      v
[ Connect Pool ] -------> (Establish connection to PGAdapter)
      |
      v
[ Apply Schema ] <------- (Loop: Try applying spanner.Schema)
      |                   (until success or 10s timeout)
      v
[ Wrap Connection ] ----> (Inject timeout enforcement wrapper)
      |
      v
 Return Pool
- sqltest.go: Contains the logic for connecting to the PostgreSQL-compatible endpoint provided by the emulator. It handles the string formatting for connection strings (e.g., postgresql://root@...) and manages the integration between the pgx library and the project's internal pool abstractions.

The tosql module provides a command-line utility designed to maintain a “Go-first” approach to database schema management. It serves as a bridge between high-level Go struct definitions and the concrete SQL schema required by the database engine, specifically targeting Google Cloud Spanner for the Perf application.
The primary design goal of this module is to ensure that Go code remains the single source of truth for the database schema. Rather than manually maintaining .sql files and trying to keep Go structs in sync with them, tosql automates the generation of SQL schema strings and column constants directly from Go definitions.
This approach offers several advantages:
The module's entry point is main.go. Its responsibility is to orchestrate the conversion process by:
- Loading the table definitions from the Perf SQL package (//perf/go/sql).
- Invoking the exporter (from //go/sql/exporter) to translate Go types and tags into Spanner-compatible SQL dialects.
- Writing the generated output to the destination file (spanner/schema_spanner.go).

The module defines specific transformation policies for the Perf database. A notable implementation choice is the handling of Time To Live (TTL). The generator explicitly excludes certain tables—such as Alerts, Favorites, Subscriptions, and TraceParams—from automated TTL policies. This reflects a design decision to treat configuration and user-created entities as permanent, while allowing raw performance data to be eligible for lifecycle management.
The following diagram illustrates how tosql fits into the development lifecycle:
[ Go Structs ] --> [ tosql ] --> [ Generated Go Code ] --> [ Application ]
(Source of Truth)     |          (Schema Strings)          (Type-safe SQL)
                      |          (Column Constants)
                      v
          [ SQL Exporter Logic ]
            (Spanner Dialect)
            (TTL Exclusions)
- A developer modifies the perf/go/sql package to add a new column or table.
- Running the tosql tool triggers the exporter, which regenerates the schema file.

The stepfit module provides algorithms for detecting and quantifying “steps” or shifts in time-series data (traces). In the context of performance monitoring, these steps represent regressions (performance degradation) or improvements.
The core functionality revolves around taking a slice of telemetry data and determining if a significant change in value occurs at a specific point. The module evaluates these changes using several different statistical and heuristic methods, allowing the caller to choose the best detection strategy for their specific data type (e.g., noisy vs. stable benchmarks).
The primary entry point is GetStepFitAtMid, which analyzes a trace centered around a specific index to determine if a step exists at that “turning point.”
The StepFit struct is the result of an analysis. It contains:
- Status: One of HIGH (step up/potential regression), LOW (step down/improvement), or UNINTERESTING (no significant change).

The module supports multiple algorithms defined via types.StepDetection:
- OriginalStep: Computes the Regression score as StepSize / LSE. It is effective for identifying clear shifts while accounting for noise.
- MannWhitneyU: The Regression value is the p-value of the test, and the Status is determined by whether this p-value meets the “interesting” threshold.

The following diagram illustrates the general process within GetStepFitAtMid:
Input Trace [x0, x1, ..., xN]
            |
            v
+-----------------------+
|   Pre-processing      |  (Normalization or
|                       |   Length Adjustment)
+-----------+-----------+
            |
            v
+-----------+-----------+
| Split Trace at Middle |  -> [Left Half] | [Right Half]
+-----------+-----------+
            |
            v
+-----------+-----------+
|   Apply Algorithm     |  (Original, Cohen, U-Test, etc.)
|   (Calculate Means,   |
|    StdDev, or Ranks)  |
+-----------+-----------+
            |
            v
+-----------+-----------+
| Calculate Regression  |  (Score representing
|   and Step Size       |   change magnitude)
+-----------+-----------+
            |
            v
+-----------+-----------+
|   Determine Status    |  (Compare Regression
|                       |   to Interesting Threshold)
+-----------+-----------+
            |
            v
     Result: StepFit
For the OriginalStep algorithm, the module performs normalization using vec32.Norm. This ensures that traces with different scales can be compared using a uniform “interesting” threshold. A stddevThreshold is used to prevent division by zero or extreme amplification of noise in very flat traces.
The interesting parameter passed to GetStepFitAtMid is polymorphic in its meaning depending on the algorithm:
- For OriginalStep, AbsoluteStep, CohenStep, and PercentStep, a higher interesting value makes the detector less sensitive (requires a larger shift).
- For MannWhitneyU, where the regression score is a p-value, a lower interesting value (e.g., 0.05) makes the detector less sensitive (requires higher statistical confidence).

The module requires a minimum trace size (defined as 3). For most algorithms, it expects the trace provided to be a window around a specific point. If not using the OriginalStep algorithm, the module truncates the last element of the trace to ensure symmetry (2*N length) for the split-at-mid logic.
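A simplified, self-contained sketch of the split-at-mid idea described above. This is not the real GetStepFitAtMid (which also normalizes, guards against near-zero standard deviation, and supports multiple algorithms); it only shows the shared shape: split the window, compare the halves, classify the result:

```go
package main

import "fmt"

// stepAtMid is an illustrative mean-difference detector. Function name,
// types, and thresholding are simplified assumptions for this sketch.
func stepAtMid(trace []float64, interesting float64) (stepSize float64, status string) {
	n := len(trace)
	if n < 3 {
		return 0, "UNINTERESTING" // below the minimum trace size
	}
	if n%2 != 0 {
		trace = trace[:n-1] // truncate to 2*N length for a symmetric split
		n--
	}
	mid := n / 2
	mean := func(xs []float64) float64 {
		sum := 0.0
		for _, x := range xs {
			sum += x
		}
		return sum / float64(len(xs))
	}
	// Positive stepSize means the right half is higher than the left.
	stepSize = mean(trace[mid:]) - mean(trace[:mid])
	switch {
	case stepSize >= interesting:
		status = "HIGH" // step up: potential regression
	case stepSize <= -interesting:
		status = "LOW" // step down: improvement
	default:
		status = "UNINTERESTING"
	}
	return stepSize, status
}

func main() {
	_, status := stepAtMid([]float64{10, 10, 10, 20, 20, 20}, 5)
	fmt.Println(status) // HIGH
}
```

Here the interesting threshold behaves like the first family above: raising it demands a larger mean shift before a step is reported.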
The subscription module provides the data management layer for “Subscriptions” within the Skia Perf ecosystem. In this context, a Subscription is a configuration object that defines how the system should react when a performance anomaly is detected. It acts as a bridge between the detection of a regression and the filing of an actionable bug report, containing metadata such as target bug components, priority levels, and point-of-contact information.
This module defines the standard Store interface for persisting these configurations and provides the underlying Protocol Buffer definitions that ensure consistency across the backend services.
A core design principle of the subscription system is revision-based tracking. Subscriptions are not simply overwritten; they are versioned by a combination of their name and a revision (typically a Git hash or unique identifier from the configuration source).
The module is structured to decouple the schema of a subscription from its persistence and its testing:
The Store Interface (store.go)

The Store interface is the primary contract for subscription data access. It supports two main modes of operation:
- Active lookups: GetActiveSubscription and GetAllActiveSubscriptions are used by the live regression detection pipeline to find the most recent rules for filing bugs.
- Historical lookups: GetSubscription(name, revision) allows the system to reference the exact configuration that was in place when a specific anomaly was detected, even if the subscription has since been updated.

Protocol Buffer Definitions (/proto)

The v1.Subscription message is the source of truth for what constitutes a subscription. It includes:
- Routing metadata: bug_component, bug_cc_emails, and contact_email.
- Triage metadata: bug_labels, hotlists, bug_priority, and bug_severity.
- Identity: The name field serves as the unique identifier for a specific monitoring rule.

The SQL Implementation (/sqlsubscriptionstore)

The standard implementation of the Store interface. It manages the SQL lifecycle of subscription records, handling the translation between Go structs and database rows, and enforcing the “soft-deactivation” logic during updates.
The following diagram illustrates how a subscription moves from a configuration file into the database and is eventually used during an anomaly detection event:
[ Config Source ] ----> [ Subscription Manager ]
   (Git/Repo)                |
                             | 1. Parse & Validate
                             v
[ SQL Store ] <--------- [ Store Interface ]
      |                      | 2. InsertSubscriptions(new_set, tx)
      |                      |    - Set old records is_active = false
      |                      |    - Insert new records is_active = true
      v
[ Database ]
      |
      | 3. GetAllActiveSubscriptions()
      v
[ Anomaly Detector ] ----> [ External Issue Tracker ]
                           4. File bug using labels/components from Subscription
- store.go: Defines the Store interface which abstracts the underlying persistence mechanism.
- proto/v1/subscription.proto: The definitive schema for subscription data, used for both storage and cross-service communication.
- sqlsubscriptionstore/sqlsubscriptionstore.go: The SQL implementation of the store, containing the logic for versioned updates and retrieval.
- mocks/Store.go: An autogenerated mock of the Store interface for use in unit tests.

The subscription/mocks module provides autogenerated mock implementations of the Store interface used in Perf subscription management. This module is designed to facilitate unit testing for components that depend on the subscription storage layer without requiring a live database connection or complex setup.
By utilizing these mocks, developers can simulate various database states, verify that the application logic calls the storage layer with the expected parameters, and test error-handling scenarios in a predictable, isolated environment.
The implementation relies on testify/mock and is generated via the mockery tool. This approach ensures that the mock interface remains synchronized with the actual Store interface defined in the subscription package.
Key design choices include:
- Isolation: By mocking Store, the business logic governing subscriptions (such as validation or processing) can be tested independently of the underlying PostgreSQL implementation (facilitated by the pgx dependency).
- Transaction support: Methods accept pgx.Tx as an argument (e.g., InsertSubscriptions), allowing tests to verify transactional logic even within a mocked context.
- Automatic verification: The NewStore constructor automatically registers a cleanup function on the testing object. This ensures that AssertExpectations is called at the end of every test, enforcing that all expected calls were made and preventing “silent” test failures where code logic skips necessary database interactions.

This is the primary file containing the Store mock struct. It mirrors the capabilities of the real subscription storage engine:
- Read operations: GetActiveSubscription, GetAllActiveSubscriptions, GetAllSubscriptions, and GetSubscription. These allow tests to simulate the presence or absence of specific subscription configurations (represented by v1.Subscription protos).
- Write operations: The InsertSubscriptions mock enables verification of how the system writes or updates subscription data, including support for bulk operations and database transactions.

The following diagram illustrates how the mock interacts with a consumer (e.g., a Subscription Manager) and a test suite:
+-----------+        +-----------------------+        +--------------+
|   Test    |        | Subscription Manager  |        |  Mock Store  |
+-----------+        +-----------------------+        +--------------+
      |                         |                            |
      | 1. Setup Expectation    |                            |
      |    (On "GetSubscription")                            |
      |---------------------------------------------------->|
      |                         |                            |
      | 2. Trigger Action       |                            |
      |------------------------>|                            |
      |                         | 3. Call GetSubscription()  |
      |                         |--------------------------->|
      |                         |                            |
      |                         | 4. Return Mocked Proto/Error
      |                         |<---------------------------|
      | 5. Assert Result        |                            |
      |<------------------------|                            |
      |                         |                            |
      | 6. Automatic Cleanup    |                            |
      |    (AssertExpectations) |                            |
      |---------------------------------------------------->|
In this workflow, the Mock Store allows the Test to define exactly what the Subscription Manager should receive when it queries for a subscription, ensuring the manager handles the returned data (or error) correctly according to the system's design requirements.
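Setting mockery and testify aside, the same workflow can be illustrated with a hand-rolled fake using only the standard library. All names below (fakeStore, notifyOwner) are hypothetical stand-ins, not the generated mock's API:

```go
package main

import (
	"errors"
	"fmt"
)

// fakeStore is a hand-rolled stand-in for the generated mockery mock.
// Tests preload canned responses and later inspect the recorded calls.
type fakeStore struct {
	subscriptions map[string]string // name -> contact_email (canned data)
	calls         []string          // record of method invocations
}

func (f *fakeStore) GetSubscription(name string) (string, error) {
	f.calls = append(f.calls, "GetSubscription("+name+")")
	email, ok := f.subscriptions[name]
	if !ok {
		return "", errors.New("subscription not found")
	}
	return email, nil
}

// notifyOwner is the "business logic" under test: it looks up the
// subscription and decides who to contact.
func notifyOwner(s *fakeStore, name string) (string, error) {
	email, err := s.GetSubscription(name)
	if err != nil {
		return "", err
	}
	return "notify " + email, nil
}

func main() {
	store := &fakeStore{subscriptions: map[string]string{"chrome-perf": "owner@example.com"}}
	msg, _ := notifyOwner(store, "chrome-perf")
	fmt.Println(msg)              // notify owner@example.com
	fmt.Println(len(store.calls)) // 1
}
```

The generated mock adds expectation matching and automatic AssertExpectations on cleanup, but the underlying idea is the same: canned responses in, recorded calls out.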
The go/subscription/proto module defines the foundational data structures used for anomaly notification routing and issue tracking within the Skia Perf ecosystem. This module serves as the contract between the performance analysis engines—which detect regressions—and the reporting services—which notify stakeholders.
The design of the proto definitions in this module reflects a transition toward automated, template-based issue management.
- Fields like bug_component, hotlists, and bug_labels are first-class citizens, ensuring that when an anomaly is detected, the resulting ticket is pre-triaged and routed to the correct engineering queue.

Defined in subscription.proto, the Subscription message is the primary data model. It acts as a routing rulebook for performance regressions.
- Routing: The bug_component and bug_cc_emails fields define the destination of the alert. This ensures that the right team is notified immediately without manual triage.
- Tagging: The bug_labels and hotlists fields allow the system to tag issues with relevant metadata (e.g., “Chromium-Perf-Regression” or “Milestone-110”). This is critical for automated dashboards that track the health of specific product releases.
- Ownership: The contact_email field is mandatory to ensure every subscription has an owner who can be reached if the alerting rules become noisy or obsolete.

The module includes the generated Go code (subscription.pb.go) to provide a type-safe interface for the Perf backend.
- generate.go: This file encapsulates the logic for invoking the protocol buffer compiler. By including this in the module, the project ensures that the Go structs remain in sync with the proto definitions, preventing runtime errors during the serialization or deserialization of subscription configurations.

The following diagram demonstrates how the proto definitions facilitate the transition from a detected performance dip to an actionable engineering task:
[ Regression Detector ]
        |
        | (A) Detects significant change in trace
        v
[ Subscription Manager ] <---- [ Proto-based Config Files ]
        |                      (Defines Name, Component, Priority)
        |
        | (B) Matches trace to "Subscription" name
        v
[ Reporting Service ]
        |
        | (C) Maps Proto fields to API Call:
        |     - Labels    -> bug_labels
        |     - Component -> bug_component
        v
[ External Issue Tracker ]
- v1/subscription.proto: The source of truth for the subscription data model. It defines the structure used by both the configuration files and the internal Go services.
- v1/subscription.pb.go: The auto-generated Go implementation of the proto. It contains the structs and methods used by the Perf service to manipulate and pass subscription data.
- v1/generate.go: A utility script used to trigger the code generation process, ensuring the Go bindings are updated whenever the proto definition is modified.

The subscription.proto module defines the schema for anomaly alerting configurations within the Skia Perf ecosystem. Its primary purpose is to decouple the logic of detecting performance regressions from the logic of reporting them. By providing a structured data format, it allows the system to determine exactly how and where to route notifications when an anomaly is identified.
The module is centered around the Subscription message, which acts as a template for issue creation. The design follows several key principles:
- Auditability: The revision field indicates that subscriptions are likely managed as “Configuration as Code.” This allows the system to track which version of an internal configuration repository was used to generate or update the subscription, ensuring that changes to alerting rules are auditable.
- Actionability: Structured fields like bug_component, bug_priority, and bug_severity (constrained to a 0-4 range) ensure that filed bugs are immediately actionable and correctly categorized without manual intervention.
- Ownership: The contact_email field ensures that every automated alert has a human owner responsible for the subscription's validity, preventing “zombie” alerts that fire into unmonitored components.

The Subscription message is the core entity. It bridges the gap between a detected event and an external tracking system.
- Identity: name is the unique key used by the Perf service to look up reporting rules. The contact_email identifies the team or individual maintaining the alert.
- Categorization: bug_labels and hotlists allow for fine-grained filtering within issue trackers, enabling teams to organize anomalies by sub-project or release milestone.
- Routing and urgency: bug_component defines the destination, while bug_priority and bug_severity define the urgency. The use of repeated strings for bug_cc_emails allows for cross-team visibility on critical regressions.

The subscription.pb.go file provides the concrete implementation of these structures for use in Go services. This ensures type safety when the Perf backend processes subscription data retrieved from storage or configuration files.
The following diagram illustrates how the Subscription proto is utilized during an anomaly event:
[ Perf Detection Engine ]
        |
        | 1. Anomaly Found
        v
[ Subscription Lookup ] <--- Uses "name" to find Subscription proto
        |
        | 2. Extract Bug Metadata (Component, CCs, Labels)
        v
[ Issue Tracker API ] ----> Creates Bug with:
                              - Component: bug_component
                              - CCs:       bug_cc_emails
                              - Labels:    bug_labels
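A sketch of what consuming these fields might look like. The struct below is a plain Go placeholder mirroring the documented proto fields, not the generated subscription.pb.go type, and all values in main are hypothetical:

```go
package main

import "fmt"

// Subscription mirrors the documented proto fields for illustration only.
type Subscription struct {
	Name         string
	Revision     string
	ContactEmail string
	BugComponent string
	BugPriority  int32 // constrained to 0-4
	BugSeverity  int32 // constrained to 0-4
	BugLabels    []string
	Hotlists     []string
	BugCCEmails  []string
}

// validate enforces the documented constraints: a mandatory owner and
// priority/severity within the 0-4 range.
func validate(s Subscription) error {
	if s.ContactEmail == "" {
		return fmt.Errorf("subscription %q has no contact_email owner", s.Name)
	}
	if s.BugPriority < 0 || s.BugPriority > 4 || s.BugSeverity < 0 || s.BugSeverity > 4 {
		return fmt.Errorf("priority/severity must be in [0, 4]")
	}
	return nil
}

func main() {
	sub := Subscription{
		Name:         "chrome-perf-regressions", // hypothetical values
		Revision:     "a1b2c3d",
		ContactEmail: "owner@example.com",
		BugComponent: "12345",
		BugPriority:  2,
		BugLabels:    []string{"Chromium-Perf-Regression"},
	}
	fmt.Println(validate(sub) == nil) // true
}
```

A reporting service would copy these fields directly into the issue-tracker API call, which is why keeping them structured (rather than free-form text) pays off.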
- subscription.proto: The source of truth definition for the subscription data structure.
- subscription.pb.go: The compiled Go code used by internal services to handle subscription data.
- generate.go: Contains the automation logic for regenerating the Go code when the proto definition changes, ensuring consistency between the schema and the implementation.

The sqlsubscriptionstore module provides a persistent SQL-based implementation of the subscription.Store interface. It is responsible for storing, versioning, and retrieving configurations that define how the Perf system should handle anomalies, specifically focusing on bug filing metadata such as components, labels, and priority.
The store implements a “deactivate-then-insert” pattern for updates. When new subscriptions are inserted via InsertSubscriptions, the store wraps the operation in a transaction that first marks all existing subscriptions as inactive before inserting the new set as active.
This design choice ensures that:
- Versioning: Because entries are keyed by (name, revision), the store maintains a full lineage of how a subscription's metadata (like its bug component or CC list) has changed over time, keyed to specific infrastructure Git revisions.
- Soft deletion: The is_active flag allows the system to distinguish between the current production configuration and historical records without physical data removal.

The module is designed to map directly to the requirements of issue tracking systems (like Monorail or Buganizer). Implementation details such as storing BugLabels and Hotlists as string arrays, and BugPriority/BugSeverity as integers, allow the Perf service to programmatically construct bug reports that adhere to specific team triage workflows without needing complex transformation logic at the application layer.
Located in sqlsubscriptionstore.go, this is the primary struct implementing the data access logic. It wraps a pool.Pool to interact with the underlying database (typically Spanner or PostgreSQL).
- The InsertSubscriptions method explicitly accepts a pgx.Tx (transaction) object. This allows the caller to coordinate subscription updates with other database operations, ensuring that configuration updates are atomic across the system.

The underlying table structure (defined in the schema submodule) enforces the immutability of specific revisions. Fields like bug_cc_emails and contact_email are stored to ensure that the notification engine knows exactly who to alert when an anomaly is detected under a specific subscription's criteria.
The following diagram illustrates the process of updating subscriptions within the store, highlighting the transition of active states.
1. Caller starts Transaction (tx)
2. InsertSubscriptions(ctx, new_subs, tx)
|
v
+---------------------------------------+
| SQL: UPDATE Subscriptions |
| SET is_active = false | <-- Archive existing configs
| WHERE is_active = true |
+---------------------------------------+
|
v
+---------------------------------------+
| SQL: INSERT INTO Subscriptions |
| (name, revision, ..., is_active=true) | <-- Activate new configs
+---------------------------------------+
|
v
3. Caller commits Transaction
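The deactivate-then-insert pattern can be sketched with an in-memory stand-in. This is an illustration of the semantics only; the real store issues the two SQL statements above inside a pgx transaction:

```go
package main

import "fmt"

// record is a simplified subscription row for this sketch.
type record struct {
	Name, Revision string
	IsActive       bool
}

// memStore simulates the versioned subscription table in memory.
type memStore struct{ rows []record }

// InsertSubscriptions archives every currently active row, then inserts
// the new set as active, mirroring the two SQL statements in the diagram.
func (m *memStore) InsertSubscriptions(subs []record) {
	for i := range m.rows {
		m.rows[i].IsActive = false // UPDATE ... SET is_active = false
	}
	for _, s := range subs {
		s.IsActive = true // INSERT ... is_active = true
		m.rows = append(m.rows, s)
	}
}

// activeCount reports how many rows are currently active.
func (m *memStore) activeCount() int {
	n := 0
	for _, r := range m.rows {
		if r.IsActive {
			n++
		}
	}
	return n
}

func main() {
	s := &memStore{}
	s.InsertSubscriptions([]record{{Name: "a", Revision: "r1"}})
	s.InsertSubscriptions([]record{{Name: "a", Revision: "r2"}, {Name: "b", Revision: "r2"}})
	// Three rows total survive (full lineage), but only the newest set is active.
	fmt.Println(len(s.rows), s.activeCount()) // 3 2
}
```

Note how no row is ever deleted: historical revisions stay queryable by (name, revision) while the active flag always reflects the latest configuration set.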
The store provides multiple ways to access data based on the caller's context:
- GetSubscription(name, revision) retrieves a specific historical version of a config.
- GetActiveSubscription(name) or GetAllActiveSubscriptions() retrieves only the configurations currently marked as active, used by the live alerting engine.
- GetAllSubscriptions() returns the entire database contents, including inactive versions.

The schema module defines the data structure and database layout for storing subscriptions within the Perf system. It serves as the single source of truth for the SQL table definitions used by the sqlsubscriptionstore, ensuring that subscription metadata is persisted consistently and can be queried efficiently.
The schema defines a primary key composed of both name and revision.
PRIMARY KEY(name, revision)
This design choice facilitates versioning and traceability. Instead of overwriting an existing subscription when configurations change, the system records a new entry tied to a specific infra_internal Git hash (revision). This allows the system to audit exactly which configuration version was in effect at any point in time.
A significant portion of the schema is dedicated to bug metadata (labels, hotlists, components, priority, and severity). The implementation uses STRING ARRAY types for fields like bug_labels and hotlists to provide flexibility, allowing a single subscription to categorize bugs across multiple workstreams without requiring complex relational mapping tables.
The inclusion of bug_priority and bug_severity as integers (constrained to 0-4) maps directly to standard issue tracking priorities (e.g., P0 through P4), ensuring that the Perf system can programmatically set triage urgency based on the subscription configuration.
Located in schema.go, this struct defines the mapping between Go objects and the SQL database. Its responsibilities include:
- Identity: The Name and Revision fields uniquely identify the configuration.
- Notification: Fields such as BugCCEmails and ContactEmail ensure that the correct stakeholders are alerted when the subscription triggers.
- Lifecycle: The IsActive boolean allows for soft-deactivation of subscriptions, enabling users to pause monitoring without deleting the historical configuration or metadata.

The following diagram illustrates how the schema supports the transition from a configuration defined in code/Git to a persisted database record used for bug filing.
[ Git Revision ] ----> [ Subscription Config ]
| |
| (Name + Revision used as Key)
| |
| v
| +-----------------------+
+----------->| SQL: Subscriptions |
+-----------------------+
| name: "Chrome_Perf" |
| revision: "a1b2c3d" | <--- Ensures auditability
| bug_component: 12345 |
| is_active: true |
+-----------+-----------+
|
v
+-----------------------+
| Bug Filing Process |
+-----------------------+
| CCs: bug_cc_emails |
| Labels: bug_labels |
+-----------------------+
- schema.go: Contains the SubscriptionSchema struct definition with SQL tags that define the column types and constraints for the underlying database engine.

The tracecache module provides a specialized caching layer for Perf trace identifiers. It is designed to bridge the gap between high-level user queries and the underlying data tiles, reducing the computational overhead of repeatedly resolving complex queries against the same dataset.
In the Perf system, data is organized into “tiles.” When a user executes a query, the system must identify which traces match that query within a specific tile. This resolution process can be expensive, especially for broad queries or large datasets.
TraceCache addresses this by memoizing the results of query resolutions. It maps a combination of a TileNumber and a query.Query to a list of matching paramtools.Params (trace identifiers). This allows the system to bypass the query engine for subsequent requests for the same data, significantly improving performance for dashboard loading and data exploration.
The cache's efficiency relies on its key generation strategy. The module uses a composite key: [TileNumber]_[QueryString]
- Temporal isolation: By including the TileNumber in the key, the cache automatically invalidates or isolates results as time progresses and new tiles are created. This ensures that query results are always contextually tied to the specific temporal bucket of data they represent.
- Canonicalization: The query.Query object is converted to its KeyValueString() representation. This ensures that queries with the same parameters result in the same cache key, maximizing the hit rate.

Trace identifiers are stored as JSON blobs within the cache backend. While JSON introduces a small overhead for marshaling and unmarshaling, it provides a stable, human-readable format that simplifies debugging and ensures compatibility regardless of the underlying cache provider (e.g., in-memory, Redis, or Memcache).
The TraceCache struct does not implement a caching engine itself. Instead, it wraps an implementation of the cache.Cache interface. This decoupling allows the tracecache module to remain agnostic of the storage backend, enabling the use of local in-memory caches for development and distributed caching systems for production environments.
The primary coordinator of the module. Its responsibilities include:
- Wrapping the injected cache.Cache client.
- Translating structured identifiers (TileNumber and Query) into flat string keys.
- Marshaling paramtools.Params arrays into JSON and back.
- On lookup, returning the cached paramtools.Params; if the key is missing (a cache miss), it returns nil, signaling that the caller must perform the query resolution manually.

The typical lifecycle of a trace lookup using this module follows this pattern:
User Query + Tile
      |
      v
[ TraceCache.GetTraceIds ] ----(Key: TileID_Query)----> [ Cache Backend ]
      |                                                        |
      +<-----------( JSON Result / Miss )----------------------+
      |
      | If Miss:
      |   1. Execute Query against Tile
      |   2. [ TraceCache.CacheTraceIds ] ----------> [ Cache Backend ]
      |   3. Return Results
      |
      | If Hit:
      |   1. Unmarshal JSON
      |   2. Return Results
The tracefilter module provides a specialized tree-based data structure designed to identify and isolate “leaf” traces within a hierarchical path structure. In the context of performance monitoring and trace management, data often arrives with overlapping prefixes or hierarchical relationships. This module allows for the filtering of redundant parent nodes, ensuring that only the most specific (deepest) traces are processed.
The primary goal of tracefilter is to resolve hierarchical dependencies between trace paths. When multiple paths are added to the filter, some may be prefixes of others. For example, if both root/cpu/usage and root/cpu/usage/core1 are registered, the latter is a more specific leaf node.
By modeling these paths as a tree, the module can efficiently determine which traces represent actual data endpoints versus those that are merely architectural containers for more granular metrics. This is particularly useful for deduplicating metrics or ensuring that aggregations don't double-count data that exists at multiple levels of a hierarchy.
The Core Data Structure (TraceFilter)

The core of the module is the TraceFilter struct, which functions as a recursive node in a prefix tree (trie). Each node stores:
- value: The specific path segment string (e.g., “p1”).
- traceKey: An identifier associated with that specific path.
- children: A map of sub-paths to nested TraceFilter nodes.

Path Insertion (AddPath)

The AddPath method builds the tree incrementally. It accepts a slice of strings representing the hierarchy and a traceKey. As paths are added, the module creates the necessary branch nodes. If a path is added that extends an existing branch, the tree grows deeper.
Leaf Extraction (GetLeafNodeTraceKeys)

This is the central logic of the module. It performs a recursive depth-first search to find nodes that have no children.
The implementation logic follows a “specificity wins” rule:
- If a node has children, its traceKey is ignored, and the search continues into its children.
- If a node has no children (a leaf), its traceKey is collected and returned.

This ensures that if a parent key is added and later a child of that parent is added, only the child's key (the more specific one) will be returned in the final result set.
Consider a scenario where various metrics are registered. The tree filters out the intermediate “p2” and “p3” keys because more specific children exist.
Input Paths:
  1. ["root", "p1", "p2"]               Key: "key_parent"
  2. ["root", "p1", "p2", "p3"]         Key: "key_intermediate"
  3. ["root", "p1", "p2", "p3", "t1"]   Key: "key_leaf_A"
  4. ["root", "p1", "p2", "p4"]         Key: "key_leaf_B"

Tree Construction:

  root
  └── p1
      └── p2 (key_parent)
          ├── p3 (key_intermediate)
          │   └── t1 (key_leaf_A)   <-- Leaf
          └── p4 (key_leaf_B)       <-- Leaf

Resulting Leaf Keys: ["key_leaf_A", "key_leaf_B"]
In this example, “key_parent” and “key_intermediate” are discarded by GetLeafNodeTraceKeys because the filter assumes that the presence of deeper nodes makes the higher-level nodes redundant for the specific filtering task.
The tracesetbuilder module provides a high-performance, concurrent mechanism for aggregating disparate trace data fragments into a unified TraceSet and a corresponding ParamSet. This is primarily used in Perf to consolidate data fetched from multiple storage tiles or shards into a single contiguous representation suitable for visualization or analysis.
In the Perf system, performance data is often stored and retrieved in chunks (tiles). When a user requests data over a large time range, the system must fetch multiple tiles and stitch them together. TraceSetBuilder manages this stitching process efficiently.
The design prioritizes performance and thread safety by using a “sharded worker” architecture. Instead of protecting a shared result set with a global mutex—which would cause significant contention when processing thousands of traces—the builder distributes the work across a pool of independent worker routines.
The builder uses a pipeline pattern to process incoming trace data:
Add() is called, the builder iterates over the provided traces. It calculates a CRC32 checksum of each trace key to determine which worker should handle that specific trace.TraceSet and ParamSet without any internal locking.CommitNumber to an output index, allowing it to place data points at the correct temporal position regardless of the order in which tiles are processed.Build() is invoked, the builder waits for all workers to finish their queues and then merges the independent results from each worker into a final consolidated set.Add(traces) Worker 1 (Keys A, D) Build() | +-----------+ | |--- Hash(A) --->| TraceSet1 |--- Merged -| | +-----------+ | |--- Hash(B) ---. |--> Final TraceSet |--- Hash(C) --.| Worker 2 (Keys B, C) |--> Final ParamSet | +-----------+ | `--- Hash(D) --->| TraceSet2 |--- Merged -' +-----------+
**`TraceSetBuilder` (`tracesetbuilder.go`)**: The primary coordinator. It initializes a pool of 64 workers (defined by `numWorkers`) and a `sync.WaitGroup` to track pending work. It is designed for a single lifecycle: you `Add()` data, `Build()` the result, and then `Close()` the builder. It cannot be reused after `Build()` is called.
**Workers (`tracesetbuilder.go`)**: Internal workers that maintain their own state. Each worker listens on a buffered channel for request objects.
Each worker pre-allocates a `types.Trace` filled with sentinel values (missing data), then populates the trace at specific indices based on the commit mapping provided in the request. It also updates its `paramtools.ParamSet` to reflect the dimensions and values present in the traces it has processed.

The request object is the unit of work passed to workers. It contains:
- The trace data with its pre-parsed `Params` (to avoid redundant parsing in the workers).
- The `commitNumberToOutputIndex` map, which defines exactly where each data point in the input should land in the final output trace.

The public API follows the builder's single-use lifecycle:

- `New(size int)` requires the total length of the resulting traces (e.g., the number of commits in the requested range).
- `Add()` is non-blocking to the extent of the channel buffers. It distributes traces to workers and increments the internal `WaitGroup`.
- `Build()` blocks until all workers have finished processing their queues. It then performs the final merge of the 64 worker-local maps into the return values.
- `Close()` must be called to shut down the worker goroutines and release resources.

The `tracestore` module provides the core abstractions and interfaces for storing, retrieving, and querying performance trace data within the Skia Perf system. It acts as the bridge between raw performance metrics (time-series data) and the storage backends, ensuring that high-cardinality data can be queried efficiently.
In the Skia Perf ecosystem, a “trace” is a series of floating-point values associated with a specific set of parameters (e.g., ,arch=x86,config=8888,). The tracestore module defines how these traces are organized into “Tiles”—fixed-size blocks of commits—and provides the interfaces for performing complex queries across these tiles.
The module is built around three primary interfaces:
- `TraceStore`: The main interface for reading and writing trace data, calculating tile offsets, and executing queries.
- `TraceParamStore`: Specifically handles the mapping between a trace's unique identifier (an MD5 hash) and its human-readable parameters.
- `MetadataStore`: Manages “sidecar” information, such as links to source files or diagnostic data associated with the ingestion process.

To handle years of performance data without degradation, `tracestore` utilizes a tiling system.
The `TraceStore` interface exposes methods like `TileNumber` and `CommitNumberOfTileStart` to translate between absolute commit numbers and their positions within specific storage blocks.

The design separates the storage of the numeric values (the “what”) from the parameters (the “who”).
Each trace is identified by a `trace_id` (an MD5 hash). The `TraceParamStore` maintains the lookup table for these IDs, while the `TraceStore` focuses on the high-volume numeric values and commit associations.

**`TraceStore` (`tracestore.go`)**: This is the central entry point for the module. It defines the contract for how the rest of the Perf system (like the `dfbuilder` for creating DataFrames) interacts with performance data.
Key Responsibilities:
- Querying: `QueryTraces` and `QueryTracesIDOnly` provide the mechanism to search millions of traces based on parameter matches (e.g., finding all traces where `cpu=arm64`).
- Reading: tile-based reads (`ReadTraces`) and arbitrary commit range reads (`ReadTracesForCommitRange`).
- Writing: `WriteTraces` is responsible for committing new data points into the store, ensuring that the associated `ParamSet` (the global index of all known keys and values) is updated.

**`TraceParamStore` (`traceparamstore.go`)**: This interface manages the lifecycle of trace identities.
- It maps a `traceId` to the `paramtools.Params` object.
- Implementations may keep an in-memory cache (such as `InMemoryTraceParams` found in the SQL implementation) to speed up the translation from IDs back to human-readable strings.

**`MetadataStore` (`metadatastore.go`)**: This interface provides context to the raw numbers.
When a user inspects a data point, the system consults the `MetadataStore` to find exactly which file generated that specific value.

While this module defines the interfaces, the `sqltracestore` submodule provides a concrete implementation designed for CockroachDB and Spanner. It implements specialized logic for tiled storage, trace-ID hashing, and query optimization.
The following diagram shows how the tracestore components interact when a user requests data for a specific graph:
```
UI / API Request
      |
      v
[ TraceStore.QueryTraces ]
      |
      |-- 1. Identify matching TraceIDs via Query
      |
      |-- 2. Fetch Values (TraceStore implementation)
      |       [ SQL TraceValues Table ]
      |
      |-- 3. Fetch Params (TraceParamStore)
      |       [ SQL TraceParams Table ]
      |
      |-- 4. Fetch Source Info (MetadataStore)
      |       [ SQL SourceFiles Table ]
      v
Combined TraceSet + Metadata
```
The tracestore/mocks module provides autogenerated mock implementations of the core interfaces used for storing and retrieving performance trace data within the Perf system. These mocks are generated using mockery and are based on the testify framework, facilitating unit testing of components that depend on tracestore and metadatastore.
In the Perf architecture, the TraceStore and MetadataStore are critical abstractions for interacting with time-series data and its associated metadata (such as source file links). Because these stores often interact with external databases (like BigTable or SQL backends), using real implementations in unit tests is often impractical.
This module provides:
- `TraceStore` mock: Simulates the primary data store for performance traces, supporting operations like querying by parameters, reading by commit range, and tile management.
- `MetadataStore` mock: Simulates the storage used for mapping source file names to additional metadata, such as links or IDs.

The `TraceStore` mock file contains the mock for the `tracestore.TraceStore` interface. It is designed to allow developers to simulate complex data retrieval scenarios without a running database.
Key Capabilities:
- Tile management: `TileNumber`, `CommitNumberOfTileStart`, and `TileSize` allow tests to verify how components handle Perf's “tiled” data architecture.
- Querying: `QueryTraces` and `QueryTracesIDOnly` can be configured to return specific `TraceSet` results or stream parameters, enabling tests for the UI and alerting logic.
- Writing: `WriteTraces` can be mocked to ensure that ingestion pipelines are correctly formatting and sending data to the store.

The `MetadataStore` mock file provides the mock for the `MetadataStore` interface, focusing on the association between raw trace data and its origin files.
Key Capabilities:
- `GetMetadata` and `GetMetadataMultiple` allow testing of features like the “Source File” links in the Perf UI.
- `GetMetadataForSourceFileIDs` supports performance-sensitive batch lookups.

The mocks follow the `testify/mock` pattern. When a new mock is created via `NewTraceStore(t)` or `NewMetadataStore(t)`, it automatically registers a cleanup function that asserts expectations when the test finishes.
This diagram illustrates how a test uses the mock to verify a component that processes trace data:
```
Test Logic                 Component Under Test         TraceStore Mock
    |                              |                          |
    |-- 1. On("QueryTraces") -------------------------------->|
    |      .Return(myTraceSet)     |                          |
    |                              |                          |
    |----- 2. RunAction() -------->|                          |
    |                              |-- 3. QueryTraces() ----->|
    |                              |                          |
    |                              |<----- 4. myTraceSet -----|
    |                              |                          |
    |<---- 5. Verify Results ------|                          |
    |                              |                          |
    |-- 6. Cleanup/Assert ----------------------------------->|
```
This pattern lets the test assert both that the component processed the returned `TraceSet` correctly and that it called `QueryTraces` with the expected arguments.

This module provides a high-performance, SQL-backed implementation of the `tracestore.TraceStore` interface for Skia Perf. It is designed to store and query high-cardinality time-series performance data, primarily targeting databases like CockroachDB or Spanner.
The implementation focuses on optimizing two primary workloads:
- Querying: matching parameter filters (e.g., `arch=x86`) to find relevant traces.
- Ingestion: writing new trace values efficiently and without redundant work.

To prevent indices from growing indefinitely and to facilitate data aging and management, data is organized into “Tiles.” Each tile represents a fixed number of commits (e.g., 256). This allows the system to partition lookups and optimize the ParamSets table by only querying the keys and values relevant to the specific time range being viewed.
Trace names are structured keys (e.g., ,arch=x86,config=565,). Storing these long strings repeatedly in the TraceValues table would be storage-inefficient and slow for indexing. Instead, the module uses an MD5 hash of the trace name as a BYTEA (or BYTES) primary key (trace_id).
The `TraceParams` table stores the mapping from `trace_id` back to the original JSONB parameter map.

**In-memory search (`InMemoryTraceParams`)**: While SQL is powerful, querying millions of traces based on complex parameter combinations (including regex and exclusions) can be slow in a pure SQL environment.
TraceParams table into an in-memory, integer-encoded columnar structure.arch=x86 & config=~.*8888 are resolved in-memory by scanning bitsets or integer arrays, which then produces a list of trace_ids to be used in a highly optimized SQL IN clause against the TraceValues table.SQLTraceStore (sqltracestore.go)The central orchestrator. It manages the lifecycle of traces, handles the conversion between human-readable trace names and SQL-friendly hashes, and coordinates with caches. It uses Go templates to generate dynamic SQL queries for batch operations.
**`InMemoryTraceParams` (`inmemorytraceparams.go`)**: An in-memory search engine for trace metadata.
- Uses parallel loading (splitting the `trace_id` keyspace into 16 partitions) to rapidly load metadata from SQL into RAM.
- Encodes parameter keys and values as `int32` identifiers to minimize memory footprint and speed up comparison logic.

**`SQLTraceParamStore` (`sqltraceparamstore.go`)**: Handles the durable storage of the trace identity.
- Maps the `trace_id` to the full `paramtools.Params` (JSON).

**`SQLMetadataStore` (`sqlmetadatastore.go`)**: Stores “sidecar” information about the ingestion process.
- Maps a `source_file_id` (an integer) to external links or diagnostic metadata. This keeps the primary `TraceValues` table focused strictly on performance metrics.

**Intersection logic (`intersect.go`)**: A utility for combining results from multiple search channels. It uses a binary tree of Go channels to efficiently find the intersection of ordered `trace_id` sets without the overhead of reflection.
The following diagram illustrates how a user query for “config=565” across a specific tile is resolved:
```
User Query: "config=565" for Tile 176
      |
      v
[ InMemoryTraceParams ]  <--- (Scans encoded columns in RAM)
      |
      | Result: List of matching TraceIDs (MD5 hashes)
      v
[ SQL Database: TraceValues ]
      |
      | SQL: SELECT val FROM TraceValues
      |      WHERE trace_id IN (...) AND commit_number BETWEEN 45056 AND 45311
      v
[ TraceSet Result ] ----> (UI/Graphing)
```
Ingestion prioritizes atomicity and avoiding redundant writes:
```
Incoming Data: {Commit: 100, Params: {arch: x86}, Value: 1.2, Source: "file.json"}
      |
      | 1. Update SourceFiles: Get/Create ID for "file.json"
      | 2. Update ParamSets:   Ensure "arch=x86" is registered for the tile
      | 3. Hash Trace:         ",arch=x86," -> MD5 TraceID
      | 4. Write TraceParams:  Store {TraceID: Params} (ON CONFLICT DO NOTHING)
      v
[ SQL Database: TraceValues ]
      |
      | INSERT INTO TraceValues (trace_id, commit, val, source_id)
      | ON CONFLICT (trace_id, commit) DO UPDATE ...
```
The sqltracestore/schema module defines the foundational data structures used to map Go types to SQL table definitions for Skia Perf's trace storage. It acts as the “source of truth” for the database schema, utilizing struct tags to define column types, primary keys, and indices.
The schema is designed to handle high-cardinality time-series data (performance metrics) while maintaining fast lookups for both specific trace values and metadata.
The core performance data is stored in TraceValuesSchema and its successor TraceValues2Schema.
The primary key is the composite (`trace_id`, `commit_number`). This ensures that for any given metric (trace), data points are physically ordered by time (commit number), optimizing range scans for graphing.

To facilitate searching across millions of traces based on arbitrary parameters, the module defines a `PostingsSchema`:
- Postings are sharded by `tile_number`. This sharding strategy prevents the posting indices from growing indefinitely, allowing the system to query only relevant time ranges.
- Each `key_value` (representing a `key=value` pair) is indexed against a `trace_id`. This allows the system to quickly resolve a query like `device=pixel6` into a set of trace IDs.
- The full parameter map for each trace is stored as JSONB. This is used when the system needs to reconstruct the full identity of a trace after it has been located via an index.

The following diagram illustrates how these entities relate during data ingestion and retrieval:
```
[ SourceFiles ] <---------- [ TraceValues ] ----------> [ TraceParams ]
 (Maps filename to ID)       (The actual metrics)        (Full key/value map)
                                   |
                                   | (linked by trace_id)
                                   v
[ ParamSets ]   <---------- [ Postings ]
 (All possible keys/vals)    (Search index for traces)
```
- The schema structs define the core tables: `TraceValues`, `Postings`, and `TraceParams`.
- A secondary index (`by_source_file_id`) supports administrative workflows, such as identifying all data points associated with a corrupted or updated source file.

The `perf/go/tracing` module serves as a specialized wrapper for initializing distributed tracing within Perf applications. It bridges the gap between the generic infrastructure-level tracing utilities and the specific configuration requirements of a Perf instance.
Its primary purpose is to ensure that performance data and request flows across Perf services are captured and exported consistently to a tracing backend (typically Google Cloud Trace) without requiring each sub-service to manually manage initialization logic or environment-specific metadata.
The module abstracts the complexity of OpenCensus initialization. By consolidating this in one place, the project ensures that all Perf components—such as the frontend, ingestion service, and query engine—use identical sampling logic and metadata tagging. This consistency is crucial for correlating traces across different service boundaries.
A key design choice in Init is the automatic injection of contextual metadata into every trace.
- Pod identification: by reading the `MY_POD_NAME` environment variable (injected via Kubernetes templates), the module allows developers to pinpoint exactly which container instance handled a specific request.
- Instance identification: the instance name is included in the trace attributes to allow for easy filtering in the tracing dashboard.

Tracing is intentionally bypassed when running in local mode. This prevents development environments from attempting to authenticate with cloud-based tracing exporters or polluting production trace data with local testing noise.
The module utilizes TraceSampleProportion from the InstanceConfig. This allows for dynamic control over the volume of traces generated. High-traffic instances can set a lower proportion to manage costs and overhead, while smaller or more critical instances can increase the sample rate for higher visibility.
**`tracing.go`**: This is the core of the module, responsible for the following:
- It wraps the `go/tracing` infrastructure package but pre-configures it with Perf-specific defaults.
- It collects values from the `InstanceConfig` and system environment variables into a structured map of attributes that are attached to every trace span.

The following diagram illustrates how the tracing configuration flows from the application startup into the global tracing state.
```
Application Startup
      |
      | (local flag, InstanceConfig)
      v
+--------------------------+
|  perf/go/tracing.Init()  |
+--------------------------+
      |
      |-- Check local flag (Return nil if true)
      |-- Extract InstanceName from Config
      |-- Fetch MY_POD_NAME from OS
      |
      v
+-----------------------------------+
|   infra/go/tracing.Initialize()   |  <--- Global Trace Exporter
+-----------------------------------+
      |
      |-- Sets Sampling Rate (Proportion)
      |-- Configures Project ID (Auto-detect)
      |-- Attaches {podName, instance} Attributes
      v
Tracing Ready
```
The go/ts module is a utility program designed to bridge the gap between the Go backend and the TypeScript frontend in the Perf application. Its primary responsibility is to ensure type safety across the network boundary by automatically generating TypeScript interfaces and types from Go structs that are serialized into JSON for the web UI.
The module addresses the “fragile base class” problem in web development: when a Go struct used in a JSON response changes, the frontend code often breaks silently if its TypeScript definitions are out of sync.
Instead of manually maintaining duplicate type definitions, this module uses reflection (via the go2ts package) to inspect Go structs and produce a source-of-truth TypeScript file. This ensures that:
- With `GenerateNominalTypes = true`, the generator treats specific Go types as distinct in TypeScript, preventing logic errors where structurally similar but semantically different types might be confused.

**`main.go`**: The core of the module is a CLI tool that configures a `go2ts.Go2TS` generator. The execution follows a specific sequence:
- It configures the handling of `nil` values for specific mapping types like `paramtools.Params` to prevent unnecessary optionality in TypeScript.
- It registers the structs and constants from the relevant Perf packages (e.g., `pivot`, `progress`, `ingest`).
- It writes the generated definitions into the frontend source tree (`modules/json/index.ts`).

Go doesn't have a native “Union” type similar to TypeScript's. The module uses a helper function, `addMultipleUnions`, to map collections of Go constants to TypeScript union types. This is critical for states, statuses, and configuration options (e.g., `regression.Status` or `alerts.ConfigState`), ensuring the frontend can only use valid, predefined values.
```
[Go Source Code]          [go/ts/main.go]            [TypeScript Output]
      |                          |                          |
      |-- (Reflects on) -------->|                          |
      |   Structs & Constants    |                          |
      |                          |-- (Converts to TS) ----->|
      |                          |                          |-- index.ts
      |                          |                          |   (Interfaces,
      |                          |                          |    Namespaces,
      |                          |                          |    Unions)
```
The module acts as a central registry, importing almost every major data-holding package in the Perf system to expose their structures:
- `perf/go/frontend/api`: Defines the shapes of requests and responses for the web API.
- `perf/go/alerts` & `perf/go/regression`: Core domain objects for alerting logic and anomaly detection.
- `perf/go/clustering2`: Data structures representing results from clustering algorithms.
- `perf/go/types` & `go/paramtools`: Low-level primitives for trace keys and parameter sets.
- `perf/go/chromeperf` & `perf/go/pinpoint`: Structures for interacting with external Chromeperf and Pinpoint services.

The module is intended to be run via `go generate`. When a developer modifies a Go struct that is sent to the frontend, they should trigger the generator to update the TypeScript definitions, which are then checked into version control. This maintains a synchronized state between the two languages.
The go/types module serves as the central repository for core domain types and shared constants used throughout the Skia Perf system. It establishes a common language for time-series data, versioning, and anomaly detection configurations, ensuring consistency across data ingestion, storage, and analysis.
The system handles large-scale time-series data by indexing it against repository commits.
The module provides conversion logic to navigate between these two coordinate systems:
```
CommitNumber ----(TileSize)----> TileNumber
  [0, 255]        / 256             0
  [256, 511]      / 256             1
```
A Trace is the fundamental unit of measurement data, represented as a slice of float32 values.
Gaps in measurement are represented by a sentinel value (`vec32.MISSING_DATA_SENTINEL`), allowing the system to distinguish between a zero value and no data.

The module defines the enums and types that control how the system identifies changes in performance:
One enum defines how traces are aggregated before analysis.

Another determines the mathematical approach used to identify a regression within a single trace or cluster centroid:
- Statistical approaches: `CohenStep` (effect size) and `MannWhitneyU` (rank-sum test) for robust change detection.
- Threshold approaches: `PercentStep`, `AbsoluteStep`, and `Const` for simpler magnitude-based thresholds.

The module also specifies the lifecycle of a detected anomaly via `AlertAction`.
- By using a simple `int32` for `CommitNumber`, the system prioritizes performance and simplicity in indexing over the complexity of a full Git DAG.
- `TraceSourceInfo` uses an internal `sync.RWMutex`. This design choice acknowledges that source information is often updated concurrently during data ingestion while being read by the UI or analysis engines.
- The use of `BadCommitNumber` (-1) and `BadTileNumber` (-1) provides a standard way to handle errors or uninitialized references without relying on Go's zero value (0), which is a valid index.

The `/go/ui` module serves as the primary backend orchestration layer for the Perf UI. Its main purpose is to bridge the gap between high-level user interactions (like clicking a “shortcut” link or requesting a custom dashboard) and the underlying data storage and processing systems. It acts as a coordinator, delegating specific tasks to specialized submodules like `frame` for data processing or `shortcuts` for state persistence.
A key design choice in the Perf UI is to avoid massive, complex URLs. Instead of encoding an entire UI state (queries, zoom levels, formula transformations) into the URL, the module uses a “Shortcut” system.
When a shortcut link is visited, the `/go/ui` layer retrieves the original stored state and hydrates the UI.

The module is designed around the concept of a Frame. A “Frame” is not just raw data, but a structured package containing trace values, metadata, anomaly markers, and display instructions.
This assembly is delegated to the `frame` submodule, which acts as the “brain” for transforming raw trace queries into a format the UI can immediately consume.

Because performance data can span millions of points and take seconds to process, the UI backend implementation prioritizes progress tracking.
The following diagram shows how the ui module coordinates a request to view data, starting from a short URL:
```
Browser (URL with Shortcut ID)
    |
    |-- 1. Get ID ----> [ /go/ui/shortcuts ] (Retrieve State)
    |                        |
    |<- 2. UI State ---------'
    |
    |-- 3. Request Data --> [ /go/ui/frame ]
    |      (State ID)           |
    |                           |-- a. Query Tracestore
    |                           |-- b. Run Calculations
    |                           |-- c. Attach Anomalies
    |                           |-- d. Link Commits/Source
    |                           V
    |<-- 4. DataFrame <---------'
    |
    (Render Graph/Table)
```
**`/go/ui/frame`**: The heavy lifter of the module. It handles the `FrameRequest` lifecycle. It is responsible for:
- Invoking the `calc` engine to process mathematical formulas on the fly.

**`/go/ui/shortcuts`**: Manages the lifecycle of “Shortcuts” (short IDs that map to complex UI states).
Beyond the submodules, the root /go/ui package often contains the logic for global UI settings and navigation. It determines which features are enabled based on the instance configuration (e.g., whether to show anomaly detection features or specific repository links).
The /go/ui/frame module is responsible for orchestrating the transition from a user's high-level data request (represented as queries, formulas, or shortcuts) into a rich, structured DataFrame suitable for visualization in the Perf frontend. It acts as the “brain” of the Explore page, managing the complexity of parallel data fetching, calculation, pivoting, and metadata enrichment.
The primary entry point is ProcessFrameRequest, which manages the lifecycle of a FrameRequest. A single request can contain multiple data sources—queries, mathematical formulas, and pre-saved shortcut keys—all of which must be aggregated into a unified view.
The module follows a structured workflow to build a response:
- It uses the `DataFrameBuilder` to fetch raw trace data based on the provided queries or shortcuts.
- It invokes the `go/calc` engine to perform transformations (like `sum()` or `filter()`) on the fetched traces.
- It can reshape the data via the `pivot` module, aggregating traces by specific parameters.
- It updates a `progress.Progress` object to give the frontend real-time status updates.
One specific design choice is the use of the preflightqueryprocessor (pqp). Before fetching data, the module prepares queries with “sentinels” (e.g., __missing__). This allows the system to handle complex queries where a user specifically wants to find traces that lack a certain parameter, which is then enforced through an in-memory filterTraceSet pass after the raw data is loaded.
The getMetadataForTraces and populateTraceMetadataLinksBasedOnConfig functions implement logic to generate human-readable commit ranges. Instead of just showing a raw hash, it compares the current commit to the previous one in the trace and generates a +log/prev..current link if a change is detected. This is specifically tuned for major repositories like Chromium, V8, and WebRTC via configuration.
The module determines how the frontend should render the data by analyzing the FrameRequest.
Depending on the pivot configuration, the response is marked as `DisplayPivotTable`, `DisplayPivotPlot`, or `DisplayPlot`.

The following diagram illustrates how the `frameRequestProcess` coordinates different sub-systems:
```
User Request (FrameRequest)
    |
    |---- Queries ----> [ DataFrameBuilder ] ----.
    |                                            |
    |---- Keys -------> [ ShortcutStore ] -------|--> [ Combined DataFrame ]
    |                                            |              |
    |---- Formulas ---> [ calc.Eval ] <----------'              |
    |                                                           |
    .-----------------------------------------------------------'
    |
    |---- [ pivot.Pivot ]      (Optional Reshaping)
    |
    |---- [ anomalies.Store ]  (Attach Anomaly Markers)
    |
    |---- [ MetadataStore ]    (Attach Source Links)
    |
    V
Final Response (FrameResponse)
```
- `FrameRequest` / `FrameResponse`: The JSON-serializable structures that define the API between the frontend (Explore page) and the backend logic.
- `frameRequestProcess`: A private struct that maintains the state of a single request, including progress counters and references to required stores (Git, Shortcuts, Tracestore).
- `doSearch` / `doCalc` / `doKeys`: Internal methods that isolate the logic for different data retrieval strategies. `doCalc` is notable for providing callback functions (`rowsFromQuery`, `rowsFromShortcut`) to the calculation engine, allowing formulas to recursively fetch data.
- Helpers such as `addRevisionBasedAnomaliesToResponse` bridge the gap between the trace data and the anomaly detection system, ensuring that points on a graph can be highlighted if they represent performance regressions.

The `urlprovider` module is a utility component within the Skia Perf system designed to programmatically generate deep-link URLs for various Perf UI pages. It centralizes the logic for constructing complex query parameters, ensuring that links to the Explore page, MultiGraph page, and Group Reports are consistent across the application.
The primary goal of this module is to abstract the transformation of internal state—such as commit numbers, trace parameters, and shortcut IDs—into URL strings that the Perf frontend can interpret.
A key design choice in this module is the integration with perfgit.Git. Because the Perf UI relies on Unix timestamps for time-range filtering rather than raw commit numbers, the URLProvider uses the Git service to resolve commit numbers into their corresponding timestamps. This ensures that generated URLs point to the correct temporal window even as the underlying data evolves.
Defined in urlprovider.go, this is the main stateful component. It requires an instance of perfgit.Git to perform commit-to-timestamp lookups.
- Time ranges are emitted as `begin` and `end` URL parameters. A specific implementation choice made here is to shift the end time forward by one day (`AddDate(0, 0, 1)`). This is done to ensure that the data points or anomalies associated with the final commit are clearly visible on the rendered graph and not cut off at the edge of the display.
- The `Explore` method constructs URLs for the `/e/` endpoint. It handles the nesting of trace queries by encoding trace parameters into a single `queries` parameter.
- The `MultiGraph` method targets the `/m/` endpoint, utilizing a shortcut ID (representing a saved set of traces) rather than a raw query string.
- Both methods support the `disableFilterParentTraces` flag (translated to the `disable_filter_parent_traces` query parameter) and allow for arbitrary additional query parameters via a `url.Values` argument.

The `GroupReport` function is a stateless utility. It generates URLs for the `/u/` endpoint, which is typically used for viewing anomaly groups or specific bugs.
It accepts identifiers such as `anomalyGroupID`, `anomalyIDs`, `bugID`, `rev`, and `sid`.

The following diagram illustrates how the `URLProvider` orchestrates data from internal services to produce a frontend URL:
```
Input Parameters                 Git Service (perfgit)
(Commit Nums, Query)                      |
    |                                     |
    v                                     |
+--------------------------+              |
|  URLProvider.Explore()   |              |
+------------+-------------+              |
    |                                     |
    |---- Request Timestamps ------------>|
    |<--- Return Unix Timestamps ---------|
    |
    | (Internal Logic)
    |  - Add 1 day buffer to End Time
    |  - Encode Query parameters
    |  - Append Optional Filters
    v
Result: "/e/?begin=123&end=456&queries=..."
```
- `urlprovider.go`: Contains the logic for calculating time ranges, encoding parameters, and building the final URL paths for Explore, MultiGraph, and Group Report pages.
- `urlprovider_test.go`: Validates that the URL generation correctly handles escaping, timestamp calculation, and optional parameter injection. It uses a mockable or test instance of the Git service to verify the integration between commit numbers and timestamps.

The `userissue` module provides the core abstractions and storage logic for associating external issue tracker IDs (specifically Buganizer) with specific performance data points in the Perf system.
By linking a “trace key” and a “commit position” to a specific issue ID, the system allows developers and automated tools to contextualize performance anomalies with human-reported bugs. This enables the Perf UI to overlay bug information directly onto graphs, helping users understand if a regression or change is already being tracked.
The module is designed around the concept of a point-in-time association. Because a trace represents a series of data points over time, an issue is not just linked to a trace, but to a specific moment in that trace's history (the commit position).
- `UserIssue` struct: Represents the core domain model. It contains the identity of the user who created the association, the `TraceKey`, the `CommitPosition`, and the external `IssueId`.
- `Store` interface: Defines a storage-agnostic contract for persisting and retrieving these associations. This abstraction allows the system to swap underlying database implementations (e.g., switching to a different SQL dialect or a NoSQL provider) without affecting the business logic or the UI handlers.

The module follows a clean separation between the interface definition and its concrete implementations:
- Interface (`store.go`): Defines the required operations:
  - `Save`: Persists a new association.
  - `Delete`: Removes an association based on the unique combination of trace and commit.
  - `GetUserIssuesForTraceKeys`: A bulk retrieval method designed for high-performance graph rendering, fetching all issues related to a set of traces within a specific range of commits.
- SQL implementation (`sqluserissuestore`): A production-ready implementation that uses SQL (compatible with Spanner). It utilizes dynamic SQL templating to handle bulk queries efficiently, ensuring that complex filters over varying numbers of trace keys remain performant.
- Mocks (`mocks`): Provides automated mocks for testing. This allows other parts of the Perf system (like the alert service or the API layer) to simulate database interactions, error conditions, and specific data scenarios without requiring a live database connection.

When a user identifies a performance change on a graph and links it to a bug, the data flows through the following path:
```
User Interaction             Perf Backend                Store Implementation
    |                             |                              |
    |-- 1. Create Link ---------->|                              |
    |   (Trace, Commit, ID)       |-- 2. Call Save() ----------->|
    |                             |                              |-- 3. SQL INSERT
    |                             |                              |   (last_modified set)
    |                             |<--- 4. Confirmation ---------|
    |                             |                              |
    |-- 5. Refresh Graph -------->|                              |
    |                             |-- 6. Bulk Fetch ------------>|
    |                             |   (for all traces in view)   |-- 7. SQL Template
    |                             |                              |   (IN clause generated)
    |                             |<--- 8. Data with IDs --------|
```
The module relies on a composite primary key consisting of the `trace_key` and the `commit_position`. This design ensures that each association is uniquely identified by its point in a trace's history.
The userissue/mocks module provides automated mock implementations of the interfaces defined in the userissue package. Its primary purpose is to facilitate unit testing for components that depend on user issue persistence without requiring a live database or a complex manual setup.
This module utilizes test-double generation via mockery. By generating mocks based on the Store interface, the project ensures that the testing utilities stay in lockstep with the actual production code.
The decision to provide a dedicated mocks package serves two main purposes:
- **Decoupling**: Dependent components can be written and tested against the abstract `Store` rather than a concrete database.
- **Interaction verification**: Using the `testify/mock` framework, developers can write assertive tests that verify not just the output of a function, but also that the interactions with the storage layer (e.g., "was the correct trace key deleted?") happened exactly as expected.

**The mock `Store` (`Store.go`)**: The `Store` struct is the central component of this module. It implements the `userissue.Store` interface, providing mockable versions of the following operations:
- **Write operations (`Save`, `Delete`)**: These allow tests to verify that the application correctly attempts to write or remove user issue metadata associated with specific traces and commit positions.
- **Bulk retrieval (`GetUserIssuesForTraceKeys`)**: This mimics the complex querying of user issues over a range of commits. In a test environment, this is crucial for simulating "found" vs. "not found" states when rendering performance graphs or dashboards.

The typical workflow involves initializing the mock within a test suite, setting expectations for method calls, and then injecting the mock into the high-level business logic.
Test Suite                 Component Under Test          Mock Store
    |                              |                         |
    |---- 1. Setup Mock ---------->|                         |
    |     (NewStore)               |                         |
    |---- 2. Expect Save() --------------------------------->|
    |                              |                         |
    |---- 3. Call Business Logic   |                         |
    |     (e.g. CreateIssue) ----->|---- 4. Call Save() ---->|
    |                              |                         |
    |                              |<--- 5. Return nil/err --|
    |<--- 6. Verify Result --------|                         |
    |                              |                         |
    |---- 7. Assert Expectations   |                         |
    |     (Check if Save was called)                         |
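A minimal, hand-rolled sketch of this workflow, using a stdlib-only stand-in for the generated testify mock (the `CreateIssue` helper is hypothetical business logic, not a real Perf function):

```go
package main

import "fmt"

// mockStore records calls so a test can assert on interactions,
// mimicking what the generated testify mock does.
type mockStore struct {
	savedKeys []string
}

func (m *mockStore) Save(traceKey string, commit int64, issueID int64) error {
	m.savedKeys = append(m.savedKeys, traceKey)
	return nil
}

// CreateIssue is a hypothetical piece of business logic under test:
// it validates input and delegates persistence to the store.
func CreateIssue(s interface {
	Save(traceKey string, commit int64, issueID int64) error
}, traceKey string, commit, issueID int64) error {
	if traceKey == "" {
		return fmt.Errorf("trace key required")
	}
	return s.Save(traceKey, commit, issueID)
}

func main() {
	m := &mockStore{}
	if err := CreateIssue(m, ",test=encode,", 100, 42); err != nil {
		panic(err)
	}
	// Assert the interaction happened exactly as expected.
	if len(m.savedKeys) != 1 || m.savedKeys[0] != ",test=encode," {
		panic("Save was not called with the expected trace key")
	}
	fmt.Println("mock interactions verified")
}
```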
- **Keeping mocks current**: When behavior changes, the `Store` interface in the parent `userissue` package should be updated and the mock regenerated.
- **Automatic verification**: The `NewStore` constructor automatically registers a cleanup function using `t.Cleanup`. This ensures that `AssertExpectations` is called at the end of every test, preventing "silent failures" where a test passes even if a predicted database call never actually occurred.

The `sqluserissuestore` module provides an SQL-backed implementation of the `userissue.Store` interface. Its primary purpose is to persist and retrieve associations between performance anomalies (identified by a trace and a specific commit) and external issue tracking IDs (specifically Buganizer).
By storing these relationships, the Perf system can contextualize automated performance data with human-reported issues, allowing the UI to overlay bug information directly onto graphs and alerts.
A key requirement of the store is fetching issue associations across a variable number of trace keys within a specific commit range.
The store uses Go's `text/template` package to dynamically construct SQL queries for the `GetUserIssuesForTraceKeys` method. Because `IN` clauses require a specific number of placeholders corresponding to the length of the input key slice, templating allows the store to generate the correct number of `$n` placeholders at runtime while maintaining compatibility with prepared-statement parameters to prevent SQL injection.

The implementation relies on the database schema's composite primary key (trace key + commit position) to enforce data integrity.
- **Insert-only `Save`**: The `Save` method does not use "upsert" (insert-or-update) logic. Instead, it performs a standard `INSERT`. If an association already exists for a specific trace at a specific commit, the database returns a constraint violation, which the store wraps as an error. This ensures that users do not inadvertently overwrite existing issue mappings without an explicit deletion or update workflow.
- **Checked `Delete`**: The `Delete` operation performs a lookup before executing the `DELETE` statement. This ensures the system can provide feedback if a user attempts to remove a record that doesn't exist, preventing silent failures in the UI.
- **Store-side timestamps**: The store captures the current system time during the `Save` operation and persists it to the `last_modified` column. This centralizes the "last modified" logic within the store implementation, ensuring that even if the client doesn't provide a timestamp, the database reflects when the association was actually created or modified.
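These semantics can be mirrored by a tiny in-memory stand-in (the real store executes SQL against Spanner; this sketch only illustrates the insert-only, checked-delete, and store-side timestamp behaviors):

```go
package main

import (
	"fmt"
	"time"
)

type issue struct {
	IssueId      int64
	LastModified time.Time
}

type rowKey struct {
	TraceKey       string
	CommitPosition int64
}

// memStore mimics the store's semantics described above.
type memStore struct{ rows map[rowKey]issue }

func (s *memStore) Save(traceKey string, commit, issueID int64) error {
	k := rowKey{traceKey, commit}
	if _, ok := s.rows[k]; ok {
		// Mirrors the primary-key constraint violation: no silent overwrite.
		return fmt.Errorf("association already exists for %v", k)
	}
	// The timestamp is set by the store, not supplied by the client.
	s.rows[k] = issue{IssueId: issueID, LastModified: time.Now()}
	return nil
}

func (s *memStore) Delete(traceKey string, commit int64) error {
	k := rowKey{traceKey, commit}
	if _, ok := s.rows[k]; !ok {
		// Mirrors the lookup-before-DELETE feedback.
		return fmt.Errorf("no association found for %v", k)
	}
	delete(s.rows, k)
	return nil
}

func main() {
	s := &memStore{rows: map[rowKey]issue{}}
	if err := s.Save(",test=encode,", 100, 42); err != nil {
		panic(err)
	}
	if err := s.Save(",test=encode,", 100, 43); err == nil {
		panic("duplicate Save should fail")
	}
	if err := s.Delete(",test=encode,", 999); err == nil {
		panic("deleting a missing row should fail")
	}
	fmt.Println("semantics verified")
}
```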
**`UserIssueStore`**: Located in `sqluserissuestore.go`, this is the central struct that satisfies the `userissue.Store` interface. It wraps a `pool.Pool` connection to the database. Its methods translate high-level domain objects into SQL commands:
- `Save` inserts a new `UserIssue` record.

**`listUserIssues` template**: This SQL template handles the most complex query in the module. It filters the `UserIssues` table by a set of trace keys and a closed interval of commit positions (`>= Begin` and `<= End`).
When the Perf UI renders a graph containing multiple traces, it needs to know which data points have associated bugs. The data flows as follows:
[ Perf UI ]
    |
    | Request (Traces: ["A", "B"], Commit Range: 100-200)
    v
[ UserIssueStore.GetUserIssuesForTraceKeys ]
    |
    |-- 1. Generate SQL Template: "SELECT ... WHERE trace_key IN ($1, $2) AND ..."
    |-- 2. Execute Query with trace keys and range parameters
    v
[ SQL Database ]
    |
    |-- 3. Filter UserIssues table by PK components
    v
[ UserIssueStore ]
    |
    |-- 4. Map SQL Rows to []userissue.UserIssue
    v
[ Perf UI ]
(Displays bug icons on relevant graph points)
- `sqluserissuestore.go`: Contains the logic for CRUD operations and the SQL templates used to interact with the database.
- `schema/`: (Referenced by the store) Defines the table structure, ensuring that `trace_key` and `commit_position` act as the unique identifier for any given issue association.
- `sqluserissuestore_test.go`: Validates the store's behavior against a real SQL instance (typically Spanner for tests), ensuring that constraints are respected and queries return accurate data.

The `schema` module defines the structural contract for persisting user-reported issue associations within the Perf backend. It serves as the single source of truth for the SQL table structure used by the `sqluserissuestore`.
The primary goal of this schema is to bridge the gap between performance anomalies (represented by a specific trace at a specific point in time) and external issue trackers (Buganizer). By maintaining this mapping, the system can overlay human-provided context onto automated performance graphs and reports.
The schema uses a composite primary key consisting of trace_key and commit_position. This choice reflects the functional requirement that an issue association is unique to a specific data point.
This uniqueness applies per data point while still allowing the same `IssueId` to be linked to multiple regressions across the system. The `UserId` field is explicitly included to capture the email of the person who created the association.
This value is supplied by the uber-proxy authentication layer. It provides an audit trail and allows the system to identify who is responsible for specific manual annotations. The `LastModified` field utilizes the `TIMESTAMPTZ` type with a default of `now()`.
**`UserIssueSchema`**: Located in `schema.go`, this struct defines the layout of the `UserIssues` table. It maps Go types to SQL definitions:
- `TraceKey` (string) and `CommitPosition` (int) define where and when the issue occurred.
- `IssueId` (int) links the record to the external Buganizer ticket.
- `UserId` and `LastModified` provide context on the origin and age of the data.

When a user identifies a performance change in the UI and associates it with a bug, the data flows as follows:
[ User Action ]
      |
      | (Auth: User Email)
      v
[ Perf Frontend ] ----> [ sqluserissuestore ]
                              |
                              | (Maps struct to SQL)
                              v
                        [ SQL Database ]
          +---------------------------------------+
          | Table: UserIssues                     |
          | PK: (trace_key, commit_position)      |
          | Data: issue_id, user_id, last_modified|
          +---------------------------------------+
This schema ensures that if a user updates an existing association for the same trace and commit, the record is updated (or rejected depending on the store's upsert logic) rather than duplicated, maintaining a clean 1:1 mapping between data points and their primary associated issue.
The go/workflows module serves as the public interface and contract definition for Skia Perf's automated orchestration system. It defines the entry points for complex, long-running processes—such as performance bisection and culprit analysis—that are executed via the Temporal workflow engine.
The primary purpose of this module is to decouple the workflow callers from the workflow implementations. By providing standardized parameter structures and string-based workflow identifiers, it allows various parts of the Skia infrastructure to trigger orchestration logic without needing to import the heavy dependencies of the internal activity and workflow implementations.
The module defines constants for workflow names (e.g., ProcessCulprit, MaybeTriggerBisection). This design choice is critical for Temporal-based systems:
- Callers can start workflows by name without importing the `internal/` implementation code, which typically includes gRPC clients, Gerrit connectors, and complex business logic.

Rather than passing loose variables, the module defines explicit `Param` and `Result` structs for every workflow.
The `Param` structs also carry the service endpoints the workflow should use (e.g., `AnomalyGroupServiceUrl`). This pushes the responsibility of service location to the caller or the configuration layer, keeping the workflows themselves more generic and testable across different environments.

**Workflow contracts (`workflows.go`)**: This file acts as the "API header" for the orchestration layer. It defines two primary workflows:
- `MaybeTriggerBisection`: Its result carries the `JobId` (typically a Pinpoint Job ID) if a bisection was successfully triggered.
- `ProcessCulprit`: Its result carries the `CulpritIds` and `IssueIds` generated during the persistence and notification phase.

The following diagram illustrates how this module fits into the broader system architecture, acting as the bridge between the service triggering the work and the workers executing it:
[ Caller Service ]           [ Temporal Cluster ]           [ Perf Worker ]
        |                            |                            |
        | 1. Start Workflow          |                            |
        |    (using Param struct)    |                            |
        +--------------------------->|                            |
        |                            | 2. Schedule Task           |
        |                            +--------------------------->|
        |                            |                            | 3. Execute Implementation
        |                            |                            |    (defined in /internal)
        |                            |<---------------------------+
        |                            | 4. Return Result           |
        |                            |    (using Result struct)   |
        |<---------------------------+                            |
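The contract style can be sketched as follows (the constant values and the `AnomalyGroupId` field are illustrative assumptions; only `MaybeTriggerBisection`, `ProcessCulprit`, `JobId`, `CulpritIds`, `IssueIds`, and `AnomalyGroupServiceUrl` come from the description above):

```go
package main

import "fmt"

// String identifiers let callers start workflows by name, without
// importing the implementations registered under these names.
const (
	MaybeTriggerBisection = "perf.maybe_trigger_bisection" // illustrative value
	ProcessCulprit        = "perf.process_culprit"         // illustrative value
)

// MaybeTriggerBisectionParam carries everything the workflow needs,
// including the service location, keeping the workflow environment-agnostic.
type MaybeTriggerBisectionParam struct {
	AnomalyGroupId         string // hypothetical field
	AnomalyGroupServiceUrl string
}

// MaybeTriggerBisectionResult reports the triggered Pinpoint job, if any.
type MaybeTriggerBisectionResult struct {
	JobId string
}

// ProcessCulpritResult reports what was persisted and filed.
type ProcessCulpritResult struct {
	CulpritIds []string
	IssueIds   []string
}

func main() {
	p := MaybeTriggerBisectionParam{AnomalyGroupId: "group-1", AnomalyGroupServiceUrl: "anomaly-group-service:8005"}
	fmt.Println(MaybeTriggerBisection, p.AnomalyGroupServiceUrl)
}
```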
- `internal/`: Contains the actual Go logic for the workflows and activities. This is where the gRPC calls to Gerrit, Anomaly Group services, and Culprit services are implemented. It handles the "wait-and-retry" logic and the 30-minute aggregation period for anomalies.
- `worker/`: The executable entry point. It registers the implementations from `internal/` against the names defined in `workflows.go` and listens on the Temporal Task Queue for incoming work.

This module contains the internal Temporal workflow and activity implementations for the Perf orchestration system. It is responsible for the automated lifecycle of performance anomalies—from grouping and initial triage to triggering bisections and notifying users.
The module acts as the “glue” between various Skia Perf and Pinpoint services. It orchestrates complex, long-running processes that involve waiting for data stability, interacting with external gRPC services (Anomaly Group, Culprit, and Gerrit), and managing child workflows for performance bisection.
By leveraging Temporal, these workflows provide durability and fault tolerance for operations that can take hours or even days to complete (such as a Pinpoint bisection).
This is the primary entry point for processing a newly detected anomaly group. It manages the decision logic for how to handle performance regressions.
Based on the `GroupAction` type of the anomaly group, it branches. For bisection, it launches the `CulpritFinderWorkflow` as a child workflow, specifically using the `PARENT_CLOSE_POLICY_ABANDON` policy to ensure bisections continue even if the triggering workflow completes.

**`ProcessCulpritWorkflow`**: Invoked after a bisection successfully identifies a culprit, this workflow handles the "aftermath" of a find: persisting the culprits and notifying the relevant users.
Activities wrap gRPC client calls to external services, providing retry logic and timeout management defined in options.go.
- Logic in `maybe_trigger_bisection.go` (e.g., `benchmarkStoriesNeedUpdate`) mimics legacy Catapult dashboard behavior. It handles special cases for "System Health" benchmarks where story names require character replacement (e.g., `_` to `:`) to remain compatible with Pinpoint expectations.
- Statistic suffixes (e.g., `max`, `std`) are parsed out of the chart name string. This is necessary because the Perf database often stores these as a combined string, while Pinpoint requires them as separate parameters.
- `options.go` defines strict 1-minute timeouts for gRPC-based activities to ensure the system doesn't hang on network issues, while allowing up to 12 hours for child workflows to accommodate the long compile and execution times of performance tests.

The following diagram illustrates the flow of the `MaybeTriggerBisectionWorkflow`:
[ Start ]
    |
    v
[ Sleep (30m) ]  <-- Wait for more anomalies to group
    |
    v
[ Load Anomaly Group ]
    |
    +----( GroupAction == BISECT? )----> [ Resolve Git Hashes ]
    |                                        |
    |                                        v
    |                                    [ Trigger Pinpoint ]
    |                                        |
    |                                        v
    |                                    [ Update Group w/ JobID ]
    |
    +----( GroupAction == REPORT? )----> [ Fetch Top 10 Anomalies ]
    |                                        |
    |                                        v
    |                                    [ Notify User / Create Bug ]
    |                                        |
    |                                        v
    |                                    [ Update Group w/ IssueID ]
    v
[ End ]
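The two name-translation steps noted above can be sketched as pure helpers (the function names and the suffix list are illustrative; the real logic lives in `maybe_trigger_bisection.go`):

```go
package main

import (
	"fmt"
	"strings"
)

// splitChartAndStat separates a trailing statistic suffix from a chart
// name, since Pinpoint expects chart and statistic as separate parameters.
func splitChartAndStat(chart string) (string, string) {
	// Illustrative suffix list; the real set of statistics may differ.
	for _, stat := range []string{"max", "min", "std", "avg"} {
		if strings.HasSuffix(chart, "_"+stat) {
			return strings.TrimSuffix(chart, "_"+stat), stat
		}
	}
	return chart, ""
}

// fixStoryName mirrors the legacy Catapult behavior of replacing
// underscores with colons in System Health story names.
func fixStoryName(story string) string {
	return strings.ReplaceAll(story, "_", ":")
}

func main() {
	chart, stat := splitChartAndStat("total_memory_max")
	fmt.Println(chart, stat) // total_memory max
	fmt.Println(fixStoryName("load_search_google"))
}
```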
The go/workflows/worker module implements the executable entry point for the Temporal worker responsible for executing Skia Perf's backend automation workflows. It serves as the bridge between the Temporal orchestration engine and the specific business logic required for anomaly detection, bisection triggering, and culprit management.
The primary design goal of this module is to provide a scalable, stateless execution environment. By decoupling the workflow definitions from the service that triggers them, the worker can be scaled independently to handle varying loads of performance analysis tasks.
The worker is designed as a long-running daemon that connects to a Temporal cluster. It registers a set of Workflows (stateful orchestrations) and Activities (idempotent units of work) and then listens to a specific task queue for instructions.
The worker establishes a connection to the Temporal service via a client.Dial. This connection is configured with custom metrics handling to export Temporal-specific telemetry to Prometheus, ensuring visibility into worker health, task latency, and execution success rates.
The worker registers several domain-specific activities and workflows. The registration process maps internal Go functions to string-based identifiers used by the Temporal cluster to route tasks.
- **Workflows**: `ProcessCulprit` and `MaybeTriggerBisection`. These functions orchestrate complex, long-running processes that might involve waiting for external signals or timers.
- **Activities**: Service-backed activities (`CulpritServiceActivity`, `AnomalyGroupServiceActivity`, `GerritServiceActivity`). These are the "muscles" of the system, performing side-effect-heavy operations such as querying databases or interacting with Gerrit for code reviews.

**`main.go`**: This file acts as the lifecycle manager for the worker process. Its responsibilities include:
- Binding the `worker` instance to the specific logic defined in the `internal` package. This creates a clear separation between the "runner" (this module) and the "logic" (the `internal` module).

The following diagram illustrates how the worker interacts with the broader system:
+----------------+ +-------------------+ +-----------------------+
| Temporal Cloud | ----> | Worker Process | ----> | Internal Services |
| (Task Queue) | | (worker/main.go) | | (internal/activities) |
+----------------+ +---------+---------+ +-----------+-----------+
| |
| 1. Polls for Tasks |
|---------------------------->|
| |
| 2. Executes Activity/WF |
| <---------------------------|
| |
| 3. Reports Completion |
|---------------------------->|
The worker is packaged as a container (skia_app_container) named grouping_workflow. This naming reflects its primary responsibility: managing the lifecycle of anomaly groups and the resulting workflows that process potential performance culprits. In a production environment, this worker typically runs within Kubernetes, connecting to a centralized Temporal service.
The /images module serves as the centralized repository for graphical assets and brand identity markers used across the project. It provides a single source of truth for logos, icons, and UI-specific graphics, ensuring visual consistency across different sub-modules and user-facing components.
The module prioritizes scalability and cross-platform compatibility by primarily utilizing the SVG (Scalable Vector Graphics) format. This choice allows assets to be rendered at any resolution without loss of quality, which is critical for high-DPI displays and varied UI contexts.
A notable implementation pattern within this module is the use of SVG wrappers for raster data. Files such as androidx.svg, flutter.svg, and fuchsia.svg contain Base64-encoded PNG data embedded within an SVG <image> tag. This approach was chosen for several reasons:
- The `viewBox` and `preserveAspectRatio` attributes on the SVG wrapper ensure that raster logos are displayed consistently, regardless of the container's constraints.
- The bitmap payload travels embedded (alongside native vector assets such as `skia.svg` or `widevine.svg`) directly within the asset file.

While SVG is the preferred format for logos, the module includes other formats based on specific use cases:
- Raster formats (e.g., `germanium.webp` or `alpine.png`) for content where vectorization would be inefficient or impossible.
- Simple icons like `line-chart.svg` use pure path data for lightweight, performant UI decorations.

The module is responsible for organizing assets into three functional categories:
- UI icons such as `line-chart.svg` that are used for internal data visualization or navigational markers.

Assets are exposed to the rest of the build system through a central configuration that designates which files are available for external reference. This prevents accidental internal dependency on draft or temporary assets.
[ Feature Module ] ----> [ Request: v8.svg ]
        |
        v
[ /images Module ] <--- [ BUILD.bazel Exports ]
        |
        +-- (SVG Vector Processing) --> Rendered Icon
        |
        +-- (SVG Raster Decoding) --> Embedded Bitmap
The use of the exports_files directive in the module's build configuration facilitates this, allowing other packages to consume these specific images as labels without needing access to the entire directory.
The /integration module provides a controlled set of performance data used to verify the Perf ingestion pipeline and integration features. It serves as a bridge between raw performance results and the high-level analysis tools by providing a predictable, historical baseline of metrics tied to a specific demonstration repository.
The module is designed around the principle of traceable performance evolution. Rather than providing static benchmarks, it provides a sequential history that mirrors a real-world software lifecycle.
- By anchoring data to real `git_hash` values from the `perf-demo-repo`, the module allows the system to test its ability to identify performance shifts across commits.
- Providing `min` and `max` values for a single metric allows for testing variance detection (jitter), while splitting memory metrics into `kb` (size) and `num` (count) allows for testing the detection of different types of resource leaks.

**The generator (`generate_data.go`)**: This utility is responsible for maintaining the consistency of the integration test suite. It programmatically generates the JSON artifacts to ensure they adhere to the `format.Format` schema used by the Perf ingester.
**The dataset (`/data`)**: The data directory acts as a mock "filestore" that an ingester of type `dir` would monitor.
- Valid files pin each result to a hardware/software configuration (e.g., `arch: x86`, `config: 8888`) and a specific functional test (`test: encode`).
- The set deliberately includes `malformed.json` and files with unknown git hashes (e.g., `ffff...`) to verify that the system correctly identifies and reports data quality issues without halting the ingestion of subsequent valid files.

The following diagram shows how this module interacts with the broader Perf ecosystem during an integration test:
[ generate_data.go ]
|
| (creates)
v
[ /integration/data/ ] <---------- [ Ingester ('dir' type) ]
| |
| (scans filesystem) | (parses & validates)
v v
[ Malformed/Bad Hash ] [ Valid Commit Data ]
| |
+--> Log Error +--> Map to Git History
+--> Continue Processing +--> Update Trace Store
+--> Detect Regressions
The data structure within this module follows a specific hierarchy to support complex queries:
- The top-level `Key` (e.g., `arch`, `config`) defines the environment. Design-wise, this allows the system to separate "what" was tested from "where" it was tested.
- Each result carries its own key (e.g., `test: encode`), allowing one file to contain multiple independent benchmarks.
- Measurements are grouped by type (e.g., `ns`, `alloc`). Each type contains an array of `SingleMeasurement` objects, distinguishing between different units or statistical bounds (`min`, `max`, `count`) for that specific metric.

This module serves as a historical repository of performance benchmarks and system metrics, indexed by Git commit hashes. It provides the ground-truth data necessary for detecting regressions, analyzing performance trends over time, and verifying the integration pipeline's ability to handle various data states.
The data is structured to facilitate automated comparison between software iterations. By decoupling the performance results from the source code and storing them as static JSON artifacts, the system achieves several design goals:
- Every data file is tied to a specific `git_hash`. This allows the integration engine to map performance spikes or memory leaks directly to specific changes in the codebase.
- The `key` object (containing fields like `arch` and `config`) ensures that measurements are not analyzed in a vacuum. It acknowledges that performance is hardware- and configuration-dependent, allowing the consumer to filter results for "apples-to-apples" comparisons.
- Metrics are grouped into categories such as `alloc` (memory footprint) and `ns` (timing). Each category supports multiple values (e.g., `min`, `max`, `kb`, `num`), enabling a nuanced view of system behavior, such as identifying increased jitter even if average latency remains stable.
- The `version` field at the root level allows the integration logic to evolve. If the measurement format changes, the parser can handle legacy data files (like those found in this module) without breaking the analysis pipeline.

The module's contents represent a sequence of snapshots (`demo_data_commit_1.json` through `demo_data_commit_10.json`) showing the evolution of a specific test case, such as the "encode" operation.
Measurements are stored in nested arrays to allow for extensibility. For instance, the alloc measurement tracks both the size of memory used (kb) and the count of allocations (num). This distinction is critical for identifying “death by a thousand cuts” scenarios where total memory usage is low, but high allocation frequency causes CPU overhead.
The presence of malformed.json is a deliberate implementation choice for integration testing. It serves as a negative test case to ensure that any data ingestion service or parser can gracefully handle and report syntax errors in the data stream without crashing the monitoring pipeline.
The following diagram illustrates how the data in this module is intended to be consumed by an integration or monitoring service:
[ Git Commit ] ----> [ Run Benchmarks ] ----> [ Generate JSON ]
|
v
[ Integration Data ] <----------------------- [ /integration/data/ ]
|
+--> Compare current git_hash results against previous hashes
+--> Validate "measurements" (e.g., did "num" of allocs increase?)
+--> Trigger alerts if metrics exceed defined thresholds
- Within the `results` array, per-result keys define the specific functional area being tested (e.g., `test: encode`). This allows a single commit file to store data for multiple distinct sub-systems.
- By recording both `min` and `max` values for nanosecond (`ns`) timing, the data supports variance analysis. A significant widening of the gap between `min` and `max` across commits (as seen between commit 1 and commit 10) indicates decreasing stability in the code path.
- The `links` field is reserved for cross-referencing external artifacts, such as detailed profiling traces or build logs, though it remains null in the baseline demo sets.

The `/jupyter` module provides an interface for performing advanced data analysis and visualization of Skia performance data. By leveraging Jupyter Notebooks, it allows developers to move beyond the standard Skia Perf web UI to perform complex calculations, statistical modeling, and custom plotting using the Python data science stack.
The primary goal of this module is to bridge the gap between the Skia Performance monitoring system (perf.skia.org) and the analytical power of tools like Pandas, NumPy, and Matplotlib.
While the standard Perf UI is excellent for discovering regressions and viewing individual traces, it is not designed for “bulk” analysis—such as calculating the ratio of GPU to CPU performance across hundreds of tests or finding which hardware models exhibit the most noise (coefficient of variation). This module provides the glue code to fetch data from Perf's backend and load it into a Pandas DataFrame for such tasks.
The implementation centers around an asynchronous request-and-poll pattern to interact with the Skia Perf API.
Accessing data follows a specific sequence to ensure the notebook remains responsive and handles the potentially large datasets stored in Perf:
1. **Initialization**: The notebook first fetches the `/_/initpage/` endpoint. This is a design choice to automatically discover the current "window" of data (the most recent commits) and the available paramset (all valid keys and values like `model`, `test`, `device`, etc.).
2. **Query start**: The query or formula is POSTed to `/_/frame/start`. This does not return data immediately; instead, it triggers a long-running query on the server and returns a unique ID.
3. **Polling**: The notebook polls `/_/frame/status/<id>` until the server reports success. This prevents notebook timeouts during heavy calculations on the server side.
4. **Sanitization**: Sentinel values for missing data (e.g., `1e32`) are handled by converting them to `NaN` (Not a Number), ensuring that standard statistical functions like `.mean()` or `.std()` work correctly without being skewed by invalid data points.

**Entry points (`Perf+Query.ipynb`)**: The logic is encapsulated in two primary entry points that abstract away the HTTP communication:
- `perf_query(query)`: Used for selecting raw traces based on metadata (e.g., `source_type=skp&sub_result=min_ms`). This is the programmatic equivalent of the "Query" dialog in the Perf UI.
- `perf_calc(formula)`: Used for server-side processing using Skia Perf's functional query language. This allows the server to perform operations like `ave()`, `count()`, or `ratio()` before sending the result to the notebook, which is more bandwidth-efficient than downloading all raw data.

**Environment setup (`README.md`)**: Because data science dependencies (like `scipy` and `matplotlib`) can be sensitive to system-level Python versions, the module advocates for a Virtualenv-based deployment. This ensures that the analytical environment remains isolated from the system's Python installation and that all required libraries are pinned to versions compatible with the provided notebooks.
This diagram illustrates how data flows from the Skia Perf servers into a local visualization.
[ Jupyter Notebook ]                  [ Skia Perf Server ]
        |                                      |
        |--- 1. POST (Query/Formula) --------->|
        |                                      |-- 2. Process Request
        |<-- 3. Return Query ID ---------------|
        |                                      |
        |--- 4. GET (Poll Status) ------------>|
        |<-- 5. "Still Working" ---------------|
        |                ...                   |
        |--- 6. GET (Poll Status) ------------>|
        |<-- 7. "Success" ---------------------|
        |                                      |
        |--- 8. GET (Fetch Results) ---------->|
        |<-- 9. JSON Traceset -----------------|
        |                                      |
[ Parse JSON to Pandas ]
        |
[ Generate Matplotlib Plot ]
The module provides pre-configured examples for common "why" questions, such as comparing GPU to CPU performance across tests or ranking hardware models by noise.
The /lint module provides a specialized reporting interface for static code analysis, specifically designed to integrate with JSHint. Its primary purpose is to bridge the gap between raw analysis data and a human-readable, machine-parseable terminal output. Instead of relying on default verbose formats, this module implements a custom reporting logic that prioritizes clarity and precision in locating syntax errors or stylistic inconsistencies.
The core design philosophy behind this module is minimalist observability. In many build environments, linting output can become cluttered with metadata that obscures the actual location of a bug. The implementation in /lint/reporter.js focuses on a “one-line-per-error” strategy.
By standardizing the output format to `file:line:character reason`, the module ensures that every reported problem can be located precisely and that the output stays easy to parse for both humans and tooling.
**`reporter.js`**: The module exports a single reporter function expected by the JSHint API. Its responsibility is to iterate through a collection of error objects and transform them into a cohesive string buffer.
- Rather than calling `console.log` for every individual error—which can lead to performance bottlenecks and interleaved output in asynchronous environments—the module aggregates all results into a single string.
- The final report is emitted with a single `process.stdout.write`. This choice avoids the trailing newline logic inherent in `console.log`, allowing the module complete control over the vertical spacing of the final report.

The following diagram illustrates how data flows from the static analysis tool through this module to the user's terminal:
[ JSHint Engine ]
        |
        | (Raw Result Array)
        v
[ /lint/reporter.js ]
        |
        |-- For Each Result:
        |     Extract: { file, line, character, reason }
        |     Format:  "file:line:char reason"
        |
        |-- Finalize:
        |     Append Total Count Summary
        v
[ System Stdout ] -> (Displayed to Developer)
- `reporter.js`: Functions as the primary entry point. It contains the transformation logic that maps JSHint's internal error representation to the standardized string format used across the project. It is responsible for the final presentation layer of the linting process.

The `/modules` directory contains the frontend architecture of the Skia Perf application. It is built as a collection of modular Custom Elements (using Lit) and specialized utility libraries that coordinate performance data querying, time-series visualization, and anomaly triage.
The module architecture is designed to handle massive-scale performance telemetry by separating data management from visual presentation. The system revolves around three core pillars:
- **Shareable state**: `DataFrame` objects (time-series) anchor the data model, and application state (queries, zoom levels) is reflected in the browser URL via `stateReflector`, ensuring that every view—including specific zooms, selected traces, and active filters—is shareable via a deep link.
- **Responsive data flow**: Long-running backend queries use a `progress` polling mechanism to keep the UI responsive, while Lit context providers share `DataFrame` and `AnomalyMap` data across deeply nested component trees without "prop-drilling."
- **Composition**: Top-level pages (like Explore or Triage) are composed of smaller, reusable primitives (like `query-chooser-sk` or `triage-status-sk`).

The charting infrastructure is split between raw data processing and visual rendering.
- `dataframe`: Manages the lifecycle of performance data. It handles joining multiple data chunks, padding missing values with `MISSING_DATA_SENTINEL`, and providing the data to charts via context.
- `plot-google-chart-sk`: The primary interactive chart. It uses a layered approach where the lines are SVG-based (Google Charts), but interactive elements like anomalies and bug icons are HTML overlays to maintain performance during panning.
- `explore-simple-sk`: The central orchestrator for data exploration. It combines the chart, a navigation summary (`plot-summary-sk`), and the query interface.
- `plot-summary-sk`: Provides a "bird's-eye view" of long-range data. It implements min-max downsampling to ensure peaks and valleys remain visible even when thousands of points are condensed into a small sparkline.

These modules facilitate the transition from identifying a "spike" to resolving a performance regression.
- `anomalies-table-sk`: A sophisticated management table that groups related anomalies (e.g., by benchmark or revision range) to allow for bulk triaging.
- `triage-menu-sk`: A contextual popup used to "nudge" anomaly boundaries, ignore false positives, or initiate bug filing.
- `new-bug-dialog-sk` & `existing-bug-dialog-sk`: Specialized modals that automate the boilerplate of reporting issues by pre-filling titles and metadata derived from the anomaly's trace parameters.
- `bisect-dialog-sk` & `pinpoint-try-job-dialog-sk`: Integration points for Chrome-specific debugging tools, allowing users to trigger A/B bisections directly from a regression point.

Navigating millions of traces is handled through hierarchical and summary-based components.
- `test-picker-sk`: A "drill-down" interface that guides users through valid parameter combinations (e.g., selecting a benchmark reveals only the bots that ran it).
- `query-sk` & `query-chooser-sk`: The standard multi-select filter interface used to build complex trace queries.
- `paramset-sk`: A read-only visualization of a query, used to summarize what data an alert or a graph is currently showing.
- `perf-scaffold-sk`: The master template providing navigation, theme switching (dark/light mode), and authentication integration. It supports both a "Legacy" sidebar and a modern "V2" header layout.
- `telemetry`: A buffered reporting system that tracks application performance and user actions, flushing data in batches to minimize network overhead.
- `common`: A utility layer containing the `ShortcutRegistry` for global hotkeys (e.g., `p` for positive triage) and `plot-builder` for transposing backend data into chartable formats.

The system uses a reactive loop to update visualizations as users filter data.
```
[ User Interaction ] -> [ test-picker-sk ] -> [ Update URL State ]
        |                                            |
        v                                            v
[ explore-simple-sk ] <--- [ query string ] <--- [ stateReflector ]
        |
        |-- 1. requestFrame() (via DataService)
        |-- 2. startRequest() (Polling progress)
        |-- 3. merge results into [ DataFrameRepository ]
        |-- 4. render [ plot-google-chart-sk ]
```
Sheriffs move from high-level alerts to specific code changes through integrated navigation.
```
[ triage-page-sk ] (Matrix of commits vs alerts)
        |
        |-- Click Status Icon --> [ cluster-summary2-sk ] (View Centroid)
        |
+-------+-------------------------+-----------------------------------+
|                                 |                                   |
[ Triage Action ]          [ Investigation ]               [ External Link ]
(Mark Pos/Neg)            (View on Dashboard)             (Link to Gitiles)
|                                 |                                   |
v                                 v                                   v
[ POST /_/triage/ ]       [ explore-simple-sk ]           [ Source Browser ]
```
Because the UI combines SVG charts and HTML overlays, many modules (like chart-tooltip-sk and plot-google-chart-sk) perform “Pixel to Data” translations.
```
[ Mouse Hover X/Y ]
        |
        v
[ ChartLayoutInterface ] -> [ Data Value (Commit/Date) ]
        |
        v
[ lookupCids() ] ----------> [ Git Hash / Author / Message ]
        |
        v
[ commit-range-sk ] -------> [ HTML Link to Source ]
```
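The first step of that pipeline, mapping a mouse coordinate to a data value, is essentially linear interpolation over the chart area. A hedged sketch, with hypothetical names (`ChartArea`, `pixelToCommit`) standing in for the real layout interface:

```typescript
// Hypothetical "pixel to data" translation: given the chart area's pixel
// bounds and the commit range it displays, map a mouse X coordinate to the
// nearest commit number by linear interpolation.
interface ChartArea {
  leftPx: number;      // pixel X of the chart's left edge
  widthPx: number;     // pixel width of the plotting area
  firstCommit: number; // commit number at the left edge
  lastCommit: number;  // commit number at the right edge
}

function pixelToCommit(area: ChartArea, mouseX: number): number {
  const frac = (mouseX - area.leftPx) / area.widthPx;
  // Clamp so hovers in the margins snap to the nearest edge commit.
  const clamped = Math.max(0, Math.min(1, frac));
  return Math.round(area.firstCommit + clamped * (area.lastCommit - area.firstCommit));
}
```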
- `json`: Contains the “Source of Truth” for data structures, automatically generated from the Go backend to ensure type safety across the network.
- `cid`: A specialized resolution service that translates sequential `CommitNumber`s (used for storage efficiency) into full `Commit` metadata.
- `themes`: A delta-based styling layer that extends the shared Skia infrastructure with Perf-specific color palettes and spacing resets.
- `errorMessage`: A global utility that captures both application errors and network failures, displaying them in a persistent `<error-toast-sk>` until dismissed.

The `alert-config-sk` module provides a comprehensive configuration interface for managing performance regression alerts in the Perf system. It allows users to define which traces to monitor, how to detect anomalies, and what actions to take when a regression is identified.
This element serves as the primary editor for Alert configurations. It maps complex JSON configuration objects to a user-friendly form, handling the conditional logic required by different detection algorithms and notification strategies.
The design emphasizes data binding between the UI and a central Alert object. Changes in the UI immediately update the underlying object, which can then be persisted to the backend.
The module's primary inputs are the `config` (an `Alert` definition) and the `paramset` (a `ParamSet` describing the available keys and values in the performance database).
- `config`: An object implementing the `Alert` interface. The element provides setters/getters that ensure default values (like `radius` or `interesting` thresholds) are populated from global settings if missing.
- `paramset`: Used to populate the `query-chooser-sk` and the “Group By” multi-select options, allowing the user to filter traces based on actual metadata present in the system.

The complexity of regression detection is managed through two coordinated selections:
- Clustering (via `algo-select-sk`): Determines if traces are clustered (K-Means) before analysis or if each trace is analyzed individually.
- Step Detection (via `select-sk`): Allows the user to choose the mathematical model for finding regressions (e.g., Cohen's d, Mann-Whitney U, or absolute magnitude).

The UI dynamically updates the Threshold label and units based on the selected Step Detection algorithm using a `thresholdDescriptors` map. This ensures users provide inputs that make sense for the chosen math (e.g., “standard deviations” for Cohen's d vs. “alpha” for Mann-Whitney).
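A `thresholdDescriptors`-style lookup might look like the sketch below. The exact keys, labels, and units are assumptions for illustration, not the module's actual table:

```typescript
// Illustrative sketch: each step-detection algorithm maps to the label and
// units shown next to the Threshold input. Keys and wording are assumed.
interface ThresholdDescriptor {
  label: string;
  units: string;
}

const thresholdDescriptors: Record<string, ThresholdDescriptor> = {
  cohen: { label: 'Step size', units: 'standard deviations' },
  mannwhitneyu: { label: 'Significance', units: 'alpha' },
  percent: { label: 'Change', units: 'percent' },
  absolute: { label: 'Change', units: 'measurement units' },
};

function describeThreshold(algo: string): ThresholdDescriptor {
  // Fall back to a generic descriptor for unknown algorithms.
  return thresholdDescriptors[algo] ?? { label: 'Threshold', units: '' };
}
```

Centralizing the descriptors in one map keeps the render logic free of per-algorithm branching: the template just reads `describeThreshold(config.step)`.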
The element's layout changes based on the global window.perf configuration and user selections:
- If `window.perf.notifications` is set to `html_email` or `markdown_issuetracker`, the element displays either email recipient fields or Issue Tracker component IDs.
- If `window.perf.need_alert_action` is enabled, it exposes options for automated behaviors like filing bugs or triggering Pinpoint bisections.
- Test endpoints (`/_/alert/bug/try` and `/_/alert/notify/try`) let users verify notifications before saving the config.

The element uses a “top-down” data flow for configuration and “bottom-up” for updates via event listeners:
```
[Parent Component]
   | (sets .config and .paramset)
   v
[alert-config-sk]
   |
   +-- @input / @change events --> [Updates internal _config]
   |
   +-- [query-chooser-sk] --------> (updates _config.query)
   |
   +-- [algo-select-sk] ----------> (updates _config.algo)
```
- `alert-config-sk.ts`: Contains the main logic for the Lit-based element, including the conditional rendering logic and API calls for testing templates.
- `alert-config-sk.scss`: Defines the layout, ensuring that nested controls (like spinners and labels) are indented and styled consistently with the Perf theme.
- `alert-config-sk-demo.ts`: Provides a sandbox for testing various UI states (e.g., toggling “Group By” or switching between email/issue tracker notifications) without a full backend.
- The element reads `window.perf` for environment-specific flags. This allows the same UI component to behave differently across different Perf instances (e.g., some instances might not support bisection).
- IDs are validated against a pattern (`\d+`), and an `errorMessage` toast is triggered on invalid input to prevent malformed data from being sent to the server.
- `connectedCallback` uses `_upgradeProperty` for `config` and `paramset`. This ensures that if the properties were set before the custom element was defined, the values are correctly captured and rendered.

The `alerts-page-sk` module provides a comprehensive interface for managing performance alert configurations within the Perf application. It allows users to view, create, edit, and archive alert rules that monitor trace data for anomalies.
The module is designed around a centralized management table. It acts as a bridge between the backend alert storage and the alert-config-sk component, which handles the complex logic of individual alert parameterization.
Key design choices include:
- Uses the `alogin-sk` module to determine if a user has the “editor” role. Actions like “New”, “Edit”, and “Delete” are restricted or disabled for non-editors.
- Editing happens in a native `<dialog>` element. This dialog wraps the `alert-config-sk` element, ensuring a consistent experience between creating a brand-new alert and modifying an existing one.
- Labels adapt to `window.perf` configurations (e.g., changing “Alert” to “Component” if issue tracker integration is enabled).
- `alerts-page-sk.ts`: The core logic of the page. It manages the lifecycle of the alert list, including fetching data from `/_/alert/list/`, handling the state of the editing dialog, and performing CRUD operations via `fetch` requests to the backend.
- `alerts-page-sk.scss`: Defines the layout for the management table, specifically handling overflow and ellipsis for long query strings to ensure the table remains readable even with complex alert rules.
- `alerts-page-sk-demo.ts`: Provides a robust mocked environment for development, simulating various backend responses for alert lists, login statuses, and trace counts.

When a user interacts with the alert list, the module manages the state transition from a read-only list to an interactive configuration form.
```
[Alerts Table] --(Click Edit)--> [Fetch Current Config]
                                          |
                                          v
[List View] <---(Cancel)--- [Modal Dialog (alert-config-sk)]
     ^                                    |
     |                        (Modify & Accept)
     |                                    |
     +-------(Post to /update) <----------+
```
The module supports deep linking. If the page is loaded with a search query (e.g., /a/?5646874153320448), the openOnLoad method automatically identifies the matching alert and opens the edit dialog immediately upon data retrieval.
Every alert in the table includes a “Dry Run” link. This utilizes the dryrunUrl helper to convert the alert's configuration into a URL query string, redirecting the user to the Explore page (/d/) to visualize exactly what data the alert would trigger on before saving changes.
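A `dryrunUrl`-style helper boils down to serializing the relevant config fields into `/d/` query parameters. This is a hedged sketch; the `AlertLike` shape and the parameter names (`queries`, `algo`, `radius`) are assumptions, not the page's actual contract:

```typescript
// Hedged sketch of converting an alert configuration into an Explore-page
// URL. Field and query-parameter names here are illustrative.
interface AlertLike {
  query: string;  // e.g. 'benchmark=motion_mark&bot=pixel_6'
  algo: string;   // clustering algorithm
  radius: number; // commits on each side of a point
}

function dryrunUrl(alert: AlertLike): string {
  const params = new URLSearchParams();
  // The trace query is itself key=value pairs, so it must be encoded
  // as a single parameter value.
  params.set('queries', alert.query);
  params.set('algo', alert.algo);
  params.set('radius', String(alert.radius));
  return `/d/?${params.toString()}`;
}
```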
- `alert-config-sk`: Used as the internal editor for alert details.
- `paramset-sk`: Used in the table to provide a summarized view of the alert's query.
- `/_/alert/list/{showDeleted}`: Retrieves the set of alerts.
- `/_/alert/new`: Fetches a default skeleton for a new alert.
- `/_/alert/update`: Saves a modified or new alert.
- `/_/alert/delete/{id}`: Archives an alert.

The `algo-select-sk` module provides a custom UI component for choosing between different anomaly detection or clustering algorithms in Perf. It acts as a specialized wrapper around the generic `select-sk` component, providing a type-safe and domain-specific interface for algorithm selection.
The module is designed to bridge the gap between low-level UI selection (indexes) and high-level application logic (algorithm names).
The component uses the `algo` attribute/property as its source of truth. It supports the two algorithms defined in the `ClusterAlgo` type: `kmeans` (cluster traces before analysis) and `stepfit` (analyze each trace individually).
To ensure robustness, the component implements a fallback mechanism. Any invalid string provided to the algo attribute is automatically coerced to kmeans via the internal toClusterAlgo utility.
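The coercion described above is small enough to show directly. A minimal sketch, matching the behavior described (any unknown string falls back to `kmeans`):

```typescript
// Coerce an arbitrary string to a valid ClusterAlgo, falling back to
// 'kmeans' for anything unrecognized.
type ClusterAlgo = 'kmeans' | 'stepfit';

function toClusterAlgo(value: string): ClusterAlgo {
  return value === 'kmeans' || value === 'stepfit' ? value : 'kmeans';
}
```

Because attributes arrive as untyped strings from HTML, this kind of narrowing at the boundary keeps the rest of the component working with the `ClusterAlgo` union only.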
Instead of exposing the raw select-sk child, algo-select-sk encapsulates the selection logic. It listens for selection-changed events from its internal select-sk element, maps the selected index to a ClusterAlgo value, and dispatches a domain-specific algo-change event.
[ User Clicks ] -> [ select-sk (index) ] -> [ algo-select-sk (mapping) ] -> [ algo-change Event ]
Located in algo-select-sk.ts, this is the main class for the element.
- Reflects the `algo` state. Updating the property updates the attribute and triggers a re-render.
- Uses `lit` to render a `select-sk` containing two options, with the `?selected` directive synchronizing the internal state of the options with the component's `algo` property.
- The `_selectionChanged` method translates the numerical index from the underlying selector into a string value (`kmeans` or `stepfit`) by querying the `value` attribute of the child `div` elements.
- The event detail is typed as `AlgoSelectAlgoChangeEventDetail`: `{ algo: 'kmeans' | 'stepfit'; }`
- `algo-select-sk-demo.html` and `.ts` show the component in various states (default, pre-selected, and dark mode) and log event details to the screen when selections change.
- `algo-select-sk_test.ts` validates the attribute-to-property reflection, the fallback logic for invalid inputs, and the correct dispatching of events.
- `algo-select-sk_puppeteer_test.ts` performs visual regression testing using Puppeteer to ensure the component renders correctly and responds to clicks in a real browser environment.

The `anomalies-table-sk` module provides a comprehensive, interactive table for visualizing, grouping, and triaging performance anomalies detected in the Perf system. It serves as a central hub for users to review regression and improvement alerts, manage associated bugs, and navigate to detailed graphical reports.
The primary component, AnomaliesTableSk, renders a list of anomalies and provides tools to manipulate their presentation. Rather than a flat list, the table utilizes a sophisticated grouping logic to combine related anomalies, reducing visual clutter and allowing bulk actions.
Logic is split across dedicated controllers (`SelectionController`, `AnomalyGroupingController`, `ReportNavigationController`) to manage complexity.

`AnomaliesTableSk` (`anomalies-table-sk.ts`): The main UI element. It orchestrates the rendering of table rows, handles keyboard shortcuts (like `p` for filing a bug or `g` for graphing), and manages the “Triage Selected” popup. It delegates data processing to sub-controllers while maintaining the visual state of the table (expanded/collapsed groups).
`AnomalyGroupingController` (`anomaly-grouping-controller.ts`): Manages how anomalies are aggregated into table rows. It persists user preferences for grouping (e.g., “Group by Benchmark” or “Exact Revision Match”) in `localStorage`.
The grouping logic follows a specific hierarchy:
- `EXACT`: Ranges must be identical.
- `OVERLAPPING`: Ranges that share any commit.
- `ANY`: All anomalies are considered a single group.
- Groups can additionally be keyed by `BENCHMARK`, `BOT`, or `TEST`.

`ReportNavigationController` (`report-navigation-controller.ts`): Handles the transition from the table to the “Explore” (graphing) pages. It manages:
- Calls to the `/_/anomalies/group_report` API to obtain a Session ID (SID) which represents the collection.

`AnomalyTransformer` (`anomaly-transformer.ts`): A utility class responsible for converting raw data into displayable strings and determining summary values for collapsed groups.
- Collapses differing test-path segments into a wildcard `*` (e.g., `test1/sub1` and `test1/sub2` become `test1/sub*`).

`AnomaliesGroupingSettingsSk` (`anomalies-grouping-settings-sk.ts`): A configuration panel embedded within the table header that allows users to toggle the grouping criteria and revision modes described above.
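Two of the rules above are small enough to sketch: the revision-range matching modes and the wildcard summarization. This is a hedged TypeScript illustration; the type and function names are assumptions, not the controllers' actual API:

```typescript
// Sketch of the revision-range modes: EXACT requires identical ranges,
// OVERLAPPING requires at least one shared commit, ANY always matches.
type RevisionMode = 'EXACT' | 'OVERLAPPING' | 'ANY';

interface CommitRange {
  start: number;
  end: number;
}

function rangesMatch(a: CommitRange, b: CommitRange, mode: RevisionMode): boolean {
  switch (mode) {
    case 'EXACT':
      return a.start === b.start && a.end === b.end;
    case 'OVERLAPPING':
      return a.start <= b.end && b.start <= a.end;
    case 'ANY':
      return true;
  }
}

// Sketch of the wildcard summarization: segments that agree across all
// paths are kept; a segment that differs is collapsed to its common
// prefix plus '*' (test1/sub1 + test1/sub2 -> test1/sub*).
function summarizePaths(paths: string[]): string {
  const split = paths.map((p) => p.split('/'));
  const len = Math.max(...split.map((s) => s.length));
  const out: string[] = [];
  for (let i = 0; i < len; i++) {
    const segs = split.map((s) => s[i] ?? '');
    if (segs.every((s) => s === segs[0])) {
      out.push(segs[0]);
    } else {
      // Keep the longest common prefix of the differing segments.
      let prefix = segs[0];
      for (const s of segs) {
        while (!s.startsWith(prefix)) prefix = prefix.slice(0, -1);
      }
      out.push(prefix + '*');
    }
  }
  return out.join('/');
}
```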
The table uses a SelectionController to track which anomalies are currently active. Selection state flows from the UI to the controller, which then triggers a re-render to update checkbox states (including indeterminate states for partially selected groups).
```
User Interaction (Checkbox Click)
          |
          v
SelectionController updates Set<Anomaly>
          |
          v
LitElement (Table) requestsUpdate()
          |
          +-----> Update Header Checkbox (All/None/Indeterminate)
          +-----> Update Group Summary Checkboxes
          +-----> Update Action Buttons (Triage/Graph Enabled State)
```
When a user triages a group or selection, the table interacts with the TriageMenuSk.
```
[Select Anomalies] -> [Click Triage Selected] -> [triage-menu-sk appears]
                                                          |
             +--------------------------------------------+--------+
             |                      |                               |
     [File New Bug]          [Existing Bug]               [Ignore Anomaly]
             |                      |                               |
      Opens Dialog        Lists Associated               Sends 'RESET' or
                          Issues from API                'IGNORE' to backend
```
Clicking the “Chart” icon or the “Graph Selected” button initiates a navigation workflow:
- Building a `timerange_map` for the selected anomalies.
- The `ReportNavigationController` calls `/_/shortcut/update` to store the specific graph configurations.
- Redirecting to `/m/?shortcut=[id]&begin=[start]&end=[end]`.

The `anomaly-playground-sk` module provides an interactive environment for testing and tuning anomaly detection algorithms within the Perf ecosystem. It serves as a “sandbox” where developers and data scientists can input arbitrary trace data, apply various statistical detection methods, and visualize the results in real-time without needing to modify production alerts or wait for new data ingestion.
This module bridges the gap between algorithm development and visualization. It wraps a specialized instance of the explore-simple-sk component to provide a familiar graphing interface, while adding a control panel for manual data entry and parameter manipulation.
The primary goal is to allow users to answer questions like:
- “What would the `mannwhitneyu` algorithm with a threshold of 3.0 flag in this data?”

Unlike the main Explore page, which queries a backend database for historical traces, the playground allows for direct manual input via a comma-separated list of values. This design choice facilitates rapid prototyping of edge cases. When a user inputs data:
- The values are parsed into a synthetic `DataFrame`.
- Synthetic `CommitNumber` and `TimestampSeconds` headers are generated for each data point to satisfy the requirements of the graphing engine.
- A fixed trace key (e.g., `,name=playground,`) is attached to the data.

The module leverages `explore-simple-sk` as its visualization engine rather than re-implementing graphing logic. To make it behave like a “playground” rather than a search tool, several features of the child component are programmatically disabled or hidden:
- `openQueryByDefault` is set to false (no need to search a database).
- `showHeader` and `navOpen` are disabled to maximize space for the playground controls.
- `disablePointLinks` is enabled because synthetic data points do not link to real Git commits.

The component uses `stateReflector` to sync the current playground configuration (the trace string, algorithm, radius, threshold, etc.) with the URL's query parameters. This allows researchers to share a specific “scenario” by simply copying and pasting the URL.
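The manual-input step described above can be sketched as follows. The `SyntheticFrame` shape is a simplified stand-in for the real `DataFrame` types, and the starting commit/timestamp values are arbitrary assumptions:

```typescript
// Hedged sketch: turn a comma-separated input string into a synthetic
// trace, fabricating sequential commit numbers and timestamps so the
// graphing engine has the headers it expects.
interface SyntheticFrame {
  traceKey: string;
  commits: number[];
  timestamps: number[];
  values: number[];
}

function parsePlaygroundInput(
  raw: string,
  startCommit = 1,
  startTs = 1700000000
): SyntheticFrame {
  const values = raw
    .split(',')
    .map((v) => parseFloat(v.trim()))
    .filter((v) => !Number.isNaN(v)); // silently drop empty/garbage entries
  return {
    traceKey: ',name=playground,',
    commits: values.map((_, i) => startCommit + i),
    // One fabricated timestamp per point, one second apart.
    timestamps: values.map((_, i) => startTs + i),
    values,
  };
}
```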
The workflow follows a standard Input -> Configure -> Request -> Visualize cycle:
```
[ User Input ] ----> [ Input Parser ] ----> [ Local Plotting ]
                                                    |
                                                    v
                                      [ explore-simple-sk Graph ]
                                                    ^
[ Param Controls ]                                  |
(Algo, Radius, etc)                                 |
        |                                           |
        v                                           |
[ "Detect" Click ] --> [ Backend API Request ] -----+
                       (/_/playground/anomaly/v1/detect)
```
- The request carries the synthetic `DataFrame`; the response contains `Anomaly` objects.
- The component reads the returned `anomalymap`, determines whether each anomaly is an “improvement” based on the selected direction, and calls `UpdateWithFrameResponse` on the graph to render the familiar red/grey circles on the trace.
- `anomaly-playground-sk.ts`: The main logic hub. It manages the lifecycle of the synthetic `DataFrame`, handles synchronization between the UI inputs and the URL state, and coordinates communication with the detection API.
- `explore-simple-sk` (external dependency): While not in this directory, it is the primary visual dependency. The playground acts as a controller for this component, feeding it data and anomaly maps manually.
- `anomaly-playground-sk-demo.ts`: Provides a mocked environment for local development, simulating the backend responses for detection and frame updates.

The module supports several detection algorithms via the `StepDetection` type:
- Step detection algorithms: `absolute`, `const`, `percent`, `cohen`, `mannwhitneyu`.
- Direction: whether an increase (`UP`) or decrease (`DOWN`) in value is treated as a regression or an improvement.

The `bisect-dialog-sk` module provides a specialized modal dialog used in the Perf UI to initiate performance bisection jobs (Pinpoint) for Chrome performance regressions. It captures necessary metadata from a performance anomaly—such as test paths and revision ranges—and submits a request to create a bisection job to identify the root cause of a regression.
The bisection logic within this module is specifically tailored for the Chrome performance testing ecosystem. This is reflected in how it parses “test paths” and maps them to specific bisection parameters like benchmark, configuration, and story.
A significant portion of the logic in bisect-dialog-sk.ts involves decomposing a single testPath string into the structured fields required by the Pinpoint bisection API.
- Test paths follow the structure `Master/Bot/Benchmark/Chart/Story`.
- The parser looks for a trailing statistic (`avg`, `max`, `std`). If found, it separates the statistic from the chart name to ensure the bisection job monitors the correct metric.
- Colons (`:`) are replaced with underscores (`_`) in the story field. This choice was made to reduce errors when querying test paths in legacy data tables.

Bisection is a resource-intensive operation. The module utilizes the `alogin-sk` infrastructure to verify the user's identity and roles before allowing a submission. If a user is not logged in or lacks the necessary permissions, the dialog prevents the request and surfaces an error message.
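The decomposition rules above can be sketched as a small parser. This is an illustrative version under stated assumptions: the `BisectParams` field names and the statistic list are hypothetical, and only the parsing rules described here are implemented:

```typescript
// Hedged sketch of decomposing a Chrome test path into bisect fields:
// split on '/', peel a trailing statistic off the chart name, and
// replace colons with underscores in the story.
interface BisectParams {
  bot: string;
  benchmark: string;
  chart: string;
  statistic: string;
  story: string;
}

const STATISTICS = ['avg', 'max', 'min', 'std', 'count'];

function parseTestPath(testPath: string): BisectParams {
  // Expected shape: Master/Bot/Benchmark/Chart/Story
  const [, bot, benchmark, rawChart, rawStory = ''] = testPath.split('/');
  let chart = rawChart;
  let statistic = '';
  for (const stat of STATISTICS) {
    if (rawChart.endsWith('_' + stat)) {
      chart = rawChart.slice(0, -(stat.length + 1));
      statistic = stat;
      break;
    }
  }
  // Colons are replaced with underscores in the story field.
  const story = rawStory.replace(/:/g, '_');
  return { bot, benchmark, chart, statistic, story };
}
```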
This is the primary implementation file. It defines the BisectDialogSk class, which handles:
- Form state for `startCommit`, `endCommit`, `bugId`, and the resulting `jobUrl`.
- The `setBisectInputParams` method allows parent components (like a chart tooltip or an anomaly list) to populate the dialog with context-specific data before opening.
- Submitting a `POST` request to `/_/bisect/create` and handling the asynchronous response, displaying a direct link to the created Pinpoint job upon success.

The UI is built using lit-html and styled with Scss. It provides a clean, form-based layout within a `<dialog>` element.
A `spinner-sk` provides visual feedback during the bisection request.

The typical lifecycle of a bisection request through this module is as follows:
```
[ External Component ] --(testPath, revisions)--> [ bisect-dialog-sk ]
                                                          |
                                                  [ .open() called ]
                                                          |
                                          <-- User edits/reviews form -->
                                                          |
                                                  [ .postBisect() ]
                                                          |
        +-------------------------+-------------------------+
        |                         |                         |
[ Validation Fails ]     [ Network Request ]         [ Auth Fails ]
        |                         |                         |
[ Show error-sk ]       [ /_/bisect/create ]         [ Show error ]
                                  |
                     +------------+------------+
                     |                         |
             [ Success (200) ]         [ Failure (5xx) ]
                     |                         |
      [ Display Pinpoint Link ]       [ Show error-sk ]
```
- Call `setBisectInputParams(params: BisectPreloadParams)` to populate the dialog.
- Call `open()` to display the modal to the user.
- Failures are surfaced through the `errorMessage` utility.

The `bug-tooltip-sk` module provides a specialized custom element designed to display a summary of bugs (typically regressions) associated with a data point or alert. It balances a minimal UI footprint with quick access to detailed external bug tracking information.
The module is built as a hover-triggered informational component. Instead of cluttering the main interface with long lists of bug IDs, it displays a concise count and reveals a detailed list only when the user expresses interest by hovering over the element.
Key implementation choices include:
- The element overrides `createRenderRoot() { return this; }`, meaning it renders directly into the light DOM. This choice simplifies global styling and ensures that the absolute positioning of the tooltip behaves predictably relative to its parent containers in the Perf UI.
- Visibility is driven by CSS `:hover` states on the `.bug-count-container` rather than JavaScript event listeners. This reduces the overhead of the component and ensures high performance during rapid UI interactions.
- Bug links use the `http://b/` shortcut, optimized for internal issue tracking workflows.

This file defines the `BugTooltipSk` LitElement. Its primary responsibility is to transform an array of `RegressionBug` objects into a readable summary.
- Renders based on the `bugs` property. If the list is empty, the entire component is hidden via the `hidden` attribute to save space.
- The `totalLabel` property allows consumers to change the suffix of the count (e.g., “with 2 regressions” vs. “with 2 total”), making it reusable across different alert types.

The stylesheet manages the complex positioning and transition logic for the tooltip.
- The tooltip is positioned at `bottom: 125%` of the container, ensuring it pops up above the text.
- The list has a `max-height` and features `overflow-y: auto`. This prevents the tooltip from expanding beyond the viewable area of the Perf content pane.
- A `0.7s` opacity transition provides a smooth “fade-in” effect when the user hovers over the bug count.

This file provides the Page Object (`BugTooltipSkPO`) for the module. It abstracts the DOM structure for integration tests, allowing tests to verify:
The following diagram illustrates how the component handles user interaction to reveal bug data:
```
[Data Input] -> [bugs: RegressionBug[]]
        |
        v
+----------------------+
| Is bugs.length > 0?  | -- No --> [Render Nothing]
+----------------------+
        | Yes
        v
+------------------------+
| Render: "with X total" |
+------------------------+
        |
  [User Hover Action]
        |
        v
+------------------------+
| CSS: opacity 0 -> 1    |
| CSS: visibility: vis   |
+------------------------+
        |
+------------------------+
| Rendered List:         |
| - ID (Link to b/ID)    |
| - Type (Source)        |
+------------------------+
```
The component expects the bugs property to conform to the RegressionBug interface (imported from the central JSON definitions), which requires:
- `bug_id`: The numeric identifier for the bug.
- `bug_type`: A string indicating the origin or category of the bug (e.g., “monorail”).

The `calendar-input-sk` module provides a hybrid date-selection component. It combines a manual text input field with a graphical calendar picker, ensuring that users can either type a specific date quickly or browse a calendar for context.
The component is designed around the principle of flexibility and validation. It acknowledges that while calendar pickers are user-friendly for relative date selection (e.g., “next Thursday”), manual entry is often faster for absolute date entry (e.g., “1995-03-12”).
- A manual `<input type="text">` restricted by a regex pattern (`YYYY-MM-DD`). This provides a lightweight way to enter dates without requiring the heavy overhead of native browser date pickers, which can vary significantly in behavior and styling across platforms.
- An icon button (`date-range-icon-sk`) that activates the graphical selection interface.
- A native `<dialog>` element containing a `calendar-sk` component. Using a native dialog allows the component to leverage built-in browser features for modal behavior, such as focus trapping and the “Esc” key to close.

The component synchronizes state between the text field and the calendar widget:
```
[ User Input ] --> [ Regex Validation ] --(valid)--> [ Update Internal Date ] --(emit)--> [ input event ]
      |                                                        ^
      |                                                        |
[ Click Icon ] --> [ Open Dialog ] --> [ Select Date in Calendar ] --(close)--> [ Update Input Value ]
```
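The validate-then-parse step can be sketched as below. This is a minimal illustration, assuming a `YYYY-MM-DD` regex gate followed by a local-time parse; the function name is hypothetical:

```typescript
// Minimal sketch of YYYY-MM-DD validation and parsing. Parsing is done
// in local time, matching the component's avoidance of UTC offsets.
const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

function parseDisplayDate(text: string): Date | null {
  if (!DATE_PATTERN.test(text)) return null;
  const [year, month, day] = text.split('-').map(Number);
  const d = new Date(year, month - 1, day); // local time; month is 0-based
  // Reject impossible dates like 2024-02-31 that Date would roll over.
  if (d.getFullYear() !== year || d.getMonth() !== month - 1 || d.getDate() !== day) {
    return null;
  }
  return d;
}
```

The round-trip check at the end catches calendar overflow that the regex alone cannot, since `new Date(2024, 1, 31)` silently rolls into March.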
- The text field listens for the `input` event and validates the string against the required pattern. If valid, it parses the date and updates the internal state.
- `calendar-input-sk` manages the dialog interaction using a Promise-based approach. The `openHandler` awaits a Promise that is resolved when a date is picked in the sub-component (`calendar-sk`) or rejected if the user cancels.
- Keyboard events are forwarded to the `calendar-sk` element, allowing users to navigate the calendar grid using arrow keys even though the dialog has focus.
- Unlike `type="date"`, which often forces a specific UI localized by the browser, this component uses `type="text"` with a pattern. This ensures a consistent look and feel across all browsers while still providing immediate feedback via CSS (using the `:invalid` pseudo-class) when the format is incorrect.
- The `displayDate` property acts as the single source of truth. Setting this property triggers a re-render of the input value and updates the state of the underlying `calendar-sk` widget.
- The component emits an `input` event containing the selected `Date` object in the `detail` field. This mirrors the standard input behavior while providing a rich data type to the consumer. It explicitly stops propagation of internal native `input` events to prevent confusing them with the component's own semantic “date changed” event.

The component uses scoped CSS to handle validation states. When the input's regex pattern is not met, an “invalid” indicator (an “✘” mark) is displayed via CSS transitions:
- The `input:invalid + .invalid` selector allows for a CSS-only toggle of error messages, minimizing the amount of manual DOM manipulation required in the TypeScript logic.
- The styles import `perf/modules/themes` variables to ensure the dialog and input colors remain consistent with the overall application theme (supporting both light and dark modes).

The `calendar-sk` module provides a custom web component that displays an accessible, themeable, and localized calendar for selecting a single date. It is designed to overcome the limitations of the native `<input type="date">` (specifically Safari compatibility and lack of styling) and other third-party libraries that may be inaccessible or difficult to theme.
- A `CalendarDate` helper class and local time manipulation ensure that date selection is predictable and avoids the common pitfalls of UTC vs. local time offsets.
- Accessibility is built in: `aria-live` regions for month/year changes, `aria-selected` states for the current selection, and a robust focus management system.
- Localization is handled via the `Intl.DateTimeFormat` API. This allows the calendar to automatically adapt its labels (e.g., “January” vs. “一月”) and weekday headers based on the provided `locale` property.
- Keyboard handling is opt-in via the `keyboardHandler` method. This allows the parent application to decide when the calendar should respond to input (e.g., only when a specific dialog is open).

`calendar-sk.ts`: This is the core logic of the module. It defines the `CalendarSk` class, which extends `ElementSk` and uses lit-html for rendering.
- Maintains a private `_displayDate` which determines which month is currently visible.

`calendar-sk.scss`: Styles the calendar using CSS variables for themeability (e.g., `--background`, `--secondary`, `--surface-1dp`). It ensures that the calendar buttons are uniform and that the “today” and “selected” states are visually distinct.
The component communicates state changes to the rest of the application via a standard DOM event:
- `change`: Fired whenever a user selects a date. The `detail` property of the event contains the selected `Date` object.

The user can navigate through time using UI buttons or keyboard shortcuts. When a date is selected, the component updates its internal state and notifies listeners.
```
User Action              Component Logic                 UI Update
-----------              ---------------                 ---------
Click "Next Month"  ->   incMonth() ---------------->    Re-renders table grid
Press "ArrowRight"  ->   keyboardHandler(incDay()) ->    Updates focus & aria-selected
Click a Day Button  ->   dateClick() -------------->     Dispatches 'change' event
```
When the keyboardHandler is active, the following shortcuts are supported:
| Key | Action |
|---|---|
| `ArrowLeft` / `ArrowRight` | Move back/forward one day |
| `ArrowUp` / `ArrowDown` | Move back/forward one week |
| `PageUp` / `PageDown` | Move back/forward one month |
In a parent component or page, you can initialize the calendar and hook into its events:
```
const calendar = document.querySelector('calendar-sk');

// Set initial date and locale
calendar.displayDate = new Date();
calendar.locale = 'en-US';

// Listen for selection
calendar.addEventListener('change', (e) => {
  console.log('New date selected:', e.detail);
});

// Proxy keyboard events from a container
window.addEventListener('keydown', (e) => calendar.keyboardHandler(e));
```
The chart-tooltip-sk module provides a rich, interactive tooltip designed for performance charts. It serves as the primary interface for users to inspect specific data points, view commit metadata, triage anomalies, and initiate debugging workflows like bisections.
Unlike a simple text tooltip, chart-tooltip-sk is a complex orchestrator that aggregates data from multiple sources (dataframes, anomaly maps, and backend CID handlers). It is designed to be dynamically positioned over a chart and provides contextual actions based on the nature of the selected data point (e.g., whether it is a single point, a range, or a detected anomaly).
- Embeds `triage-menu-sk` for filing bugs or ignoring the regression.
- Launches bisections (`bisect-dialog-sk`) and Pinpoint try jobs (`pinpoint-try-job-dialog-sk`).
- Uses `commit-range-sk` to show the range of commits associated with a point and provides direct links to the source repository.

Positioning (`moveTo`): The tooltip implements custom positioning logic instead of relying on standard CSS hover tooltips. This is necessary because it must stay within the viewport and the chart boundaries.
- It measures itself via `getBoundingClientRect()` and automatically flips to the left of the cursor if it would overflow the right edge of the screen.

Why a single `load` method? Rather than using many individual attributes, the module uses a comprehensive `load()` method. This decision ensures that all interrelated properties (anomaly data, commit info, color, and triage state) are updated atomically before a render is triggered. This prevents “flicker” where the tooltip might show an old anomaly's data with a new point's coordinates.
The UI is highly reactive to the global window.perf configuration and the specific data passed to it:
- Shows a `user-issue-sk` component to track non-anomaly bugs.

| Component | Role within Tooltip |
|---|---|
| `commit-range-sk` | Displays and links the range of revisions for the selected point. |
| `triage-menu-sk` | Provides the UI for filing new bugs or associating the anomaly with an existing one. |
| `point-links-sk` | Renders custom links based on the specific trace configuration (e.g., V8 or WebRTC specific dashboards). |
| `bisect-dialog-sk` | A dialog triggered from the tooltip to start a performance bisection. |
| `json-source-sk` | Displays the underlying data source when enabled via `show_json_file_display`. |
When a user interacts with a chart, the following process occurs to populate the tooltip:
```
Chart Event (Hover/Click)
        |
        v
explore-simple-sk (or parent) calls .load(...)
        |
        +--> Update internal state (index, anomaly, commit_info)
        +--> Determine Trace Color
        +--> Configure sub-components (CommitRange, UserIssue)
        |
        v
    ._render()
        |
        +--> logic: Is this an anomaly?
        |        |-- YES: Show anomalyTemplate() (Medians, Triage Menu)
        |        +-- NO: Show user-issue-sk
        |
        +--> logic: Is always_show_commit_info true?
                 |-- YES: Show Author/Message/Hash
                 +-- NO: Hide commit info if it's a range
```
The tooltip facilitates the transition from “seeing a spike” to “taking action”:
- The parent passes the `Anomaly` object to the tooltip.
- If the anomaly is untriaged (`bug_id === 0`), the `triage-menu-sk` appears.
- After triage, an `anomaly-changed` event refreshes the display to show the new Bug ID.

The `cid` module provides a centralized client-side interface for resolving internal commit identifiers—represented as `CommitNumber` types—into rich commit metadata.
In the Perf system, performance data is often indexed by a sequential CommitNumber (also known as an offset) to optimize storage and time-series lookups. However, for human readability and integration with version control systems, these numbers must be translated back into git hashes, timestamps, authors, and commit messages. This module abstracts that translation process.
The decision to use a dedicated RPC endpoint (/_/cid/) rather than embedding commit metadata directly into performance data streams is driven by bandwidth efficiency. Performance results often contain thousands of data points; including full commit details for every point would result in massive payloads. Instead, the UI receives lightweight CommitNumber integers and uses this module to batch-resolve only the specific commits needed for display (e.g., when hovering over a point in a chart or viewing a table of regressions).
The module is designed around batching. The lookupCids function accepts an array of CommitNumbers, allowing the frontend to resolve an entire range of commits in a single HTTP POST request. This minimizes network overhead and reduces latency when populating large data views.
The core functionality is encapsulated in the lookupCids function in cid.ts. It acts as the bridge between the frontend and the Perf backend's CID handler.
- Input: an array of CommitNumber values.
- Transport: a single HTTP POST to the /_/cid/ endpoint.
- Output: a CIDHandlerResponse object containing a commitSlice (an array of detailed commit objects) and an optional logEntry for debugging or context.

The use of jsonOrThrow ensures that the calling code doesn't have to manually check HTTP status codes for common failure modes, streamlining error handling in the UI components that consume this data.
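A minimal sketch of this call shape, assuming the field names described above (commitSlice, logEntry) and jsonOrThrow-style error behavior; the exact signatures in cid.ts may differ:

```typescript
// Hypothetical sketch of the batch commit resolution described above.
type CommitNumber = number;

interface Commit {
  offset: CommitNumber;
  hash: string;
  author: string;
  message: string;
  ts: number; // commit timestamp in seconds
}

interface CIDHandlerResponse {
  commitSlice: Commit[];
  logEntry?: string;
}

async function lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponse> {
  // One POST resolves the whole batch of CommitNumbers.
  const resp = await fetch('/_/cid/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(cids),
  });
  if (!resp.ok) {
    // jsonOrThrow in the real module raises on non-2xx responses.
    throw new Error(`cid lookup failed: ${resp.status}`);
  }
  return (await resp.json()) as CIDHandlerResponse;
}
```

Because the UI typically resolves a visible range of commits at once, batching keeps the request count at one per view update rather than one per point.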
The following diagram illustrates how a UI component uses this module to transform raw data into a human-readable format:
+----------------+          +------------------+          +-----------------+
|  UI Component  |          |    CID Module    |          |  Perf Backend   |
|  (e.g. Chart)  |          |     (cid.ts)     |          |  (/_/cid/ RPC)  |
+-------+--------+          +--------+---------+          +--------+--------+
        |                            |                             |
        | 1. Request Resolution      |                             |
        |    [101, 102, 105]         |                             |
        +--------------------------->|                             |
        |                            | 2. POST /_/cid/             |
        |                            |    JSON([101, 102, 105])    |
        |                            +---------------------------->|
        |                            |                             |
        |                            | 3. Return Metadata          |
        |                            |    (Hashes, Msgs, etc.)     |
        |                            |<----------------------------+
        | 4. Update View             |                             |
        |<---------------------------+                             |
        |                            |                             |
The module relies heavily on types defined in /perf/modules/json, specifically:
- Commit objects containing the hash, author, timestamp, and message.

The cluster-lastn-page-sk module provides a comprehensive interface for testing, dry-running, and saving performance alert configurations. It allows developers and performance engineers to "test drive" anomaly detection algorithms against historical data before committing them as active monitors.
At its core, this module acts as a sandbox for the Perf clustering and regression detection system. It enables users to define parameters for an alert (such as the algorithm, threshold, and data query), run that configuration against a specific range of commits, and inspect the resulting clusters to verify if the alert is too noisy or missing real regressions.
The module heavily relies on alert-config-sk for defining the detection logic.
- URL state: uses stateReflector to synchronize the current alert configuration with the URL. This allows users to share a specific "dry-run" setup by simply copying the browser address.
- Persistence: saves accepted alerts via the /_/alert/update endpoint. It dynamically changes its UI (e.g., button labels) based on whether the user is creating a new alert or updating an existing one.

The "Run" process is an asynchronous operation that leverages a progress-tracking API.
- The page POSTs a RegressionDetectionRequest to /_/dryrun/start, containing the alert configuration and the commit range (defined by domain-picker-sk).
- It then polls a progress endpoint (via startRequest) to receive intermediate updates. This allows the UI to display real-time status messages and partial results.

The module doesn't just list regressions; it provides deep-dive capabilities into the clusters found:
- It embeds triage-status-sk within the results table to show the current status of detected anomalies.
- Clicking a cluster opens cluster-summary2-sk. This allows users to see the specific traces contributing to a cluster without navigating away from the dry-run page.
- Via the open-keys event, the module can open the Explore page in a new tab, pre-populated with the specific trace keys and time range associated with a detected regression.

User Configures Alert -> Clicks "Run"
        |
        V
POST /_/dryrun/start (Alert Params + Domain)
        |
        +--<-- Polling /_/progress/ -> Updates Status UI
        |
        V
Results Received -> Render Table (Commits x Clusters)
        |
        +-- Click Cluster -> Open Triage Dialog (Internal Inspection)
        |
        +-- Click "Accept" -> Save Alert to Database
The module uses a domain-picker-sk to define the “where” and “when” of the test. Users can specify:
The use of <dialog> elements for alert-config-sk and cluster-summary2-sk ensures that the main dry-run context (the results table and run settings) remains visible and persistent in the background while the user fine-tunes parameters or inspects specific data points.
The module distinguishes between transient request errors and configuration errors. Error messages from the dry-run process are captured and displayed within a dedicated <pre> block to preserve formatting (like stack traces or detailed engine logs), while authentication or persistence errors are routed through a global error-toast-sk.
The cluster-page-sk module provides a comprehensive interface for performing regression detection and trace clustering within the Skia Perf framework. It allows users to identify groups of performance traces that exhibit similar behavior—such as a coordinated step or shift—around a specific commit.
The primary purpose of this module is to give developers and performance engineers a way to “cluster” traces based on their statistical properties. Instead of looking at individual traces, users can identify patterns across hundreds or thousands of benchmarks.
The workflow typically involves:
Clustering is a computationally expensive operation that can take a significant amount of time. To prevent UI blocking and handle potential timeouts, the module uses a “start-and-poll” pattern.
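The start-and-poll pattern can be sketched generically. The endpoint path matches the text, but the ProgressStatus shape and the helper below are illustrative simplifications, not the actual progress utility API:

```typescript
// Illustrative sketch of the "start-and-poll" pattern described above.
interface ProgressStatus {
  status: 'Running' | 'Finished' | 'Error';
  url: string;         // where to poll next, or where to fetch final results
  messages?: string[]; // intermediate status messages for the UI
}

async function startAndPoll<T>(
  startUrl: string,
  body: unknown,
  onUpdate: (s: ProgressStatus) => void,
  intervalMs = 300,
): Promise<T> {
  // Kick off the long-running job on the server.
  let resp = await fetch(startUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  let state = (await resp.json()) as ProgressStatus;

  // Poll until the server reports a terminal state, surfacing
  // intermediate messages to the UI without blocking it.
  while (state.status === 'Running') {
    onUpdate(state);
    await new Promise((r) => setTimeout(r, intervalMs));
    resp = await fetch(state.url);
    state = (await resp.json()) as ProgressStatus;
  }

  if (state.status === 'Error') {
    throw new Error(state.messages?.join('\n') ?? 'clustering failed');
  }
  onUpdate(state);

  // Fetch the final results from the URL reported in the terminal state.
  const results = await fetch(state.url);
  return (await results.json()) as T;
}
```

A caller would use something like `startAndPoll('/_/cluster/start', request, updateSpinner)`, where `updateSpinner` refreshes the status display on each poll.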
- The page POSTs the clustering request to /_/cluster/start.
- It then uses the progress utility to poll for status updates.

To support bookmarking and sharing of specific clustering configurations, the module uses stateReflector. Parameters such as the selected commit offset, the query string, algorithm choice (k, radius, etc.), and "interestingness" thresholds are automatically mirrored in the URL's hash.
The page is composed of several specialized sub-elements, each handling a distinct part of the clustering lifecycle:
- commit-detail-picker-sk: Handles the complex logic of searching for and selecting a specific commit.
- algo-select-sk: Provides the UI for switching between different clustering strategies (like K-Means vs. Step-Fit).
- query-sk & query-count-sk: Allow users to filter the multi-million trace dataset down to a specific subset while getting immediate feedback on the number of matches.
- cluster-summary2-sk: Visualizes the output of the clustering engine, showing centroids and statistical summaries of the discovered groups.

The internal State class defines the schema for what makes a clustering request unique. This includes:
- offset: The commit index to analyze.
- radius: How many commits before and after the offset to include in the window.
- k: The number of clusters to find (0 allows the server to auto-calculate).
- interesting: A threshold score; clusters with a regression score below this are ignored.
- sparse: A boolean flag to skip traces that lack data in the requested range.

The start() method is the core logic driver. It gathers the current state into a RegressionDetectionRequest and manages the lifecycle of the network request.
[ User Input ] -> [ State Reflector ] -> [ URL Updated ]
|
[ Click "Run" ]
|
v
[ POST /_/cluster/start ] ----> [ Server starts Job ]
| |
|<-------[ Poll Progress ] <----+
| |
v |
[ Update Spinner ] <+
[ Show Messages ] <+
|
v
[ GET Final Results ] -> [ Map to cluster-summary2-sk ]
- commitSelected: Listens for selection events from the commit picker to update the target offset.
- openKeys: When a user clicks on a specific cluster summary, this handler constructs a URL for the Explore page (/e/) using a shortcut to the traces in that cluster, allowing for deeper drill-down analysis.
- queryChanged: Dynamically updates the paramset-sk and triggers a re-count of matching traces to help the user gauge the scope of their request before running it.

Once the results are returned, they are rendered as a list of cluster-summary2-sk elements. The module includes a sort-sk component that allows users to re-order these results.
The cluster-summary2-sk module provides a comprehensive UI component for visualizing and triaging performance regressions in the Perf system. It represents a “cluster” of traces that exhibit similar behavior (usually a step-up or step-down) at a specific point in time.
This component serves as a detailed view for an anomaly. It combines several data dimensions into a single interface:
A key design challenge in Perf is that different step detection algorithms (e.g., mannwhitneyu, cohen, percent) produce statistically different outputs. Rather than a generic “Value” label, cluster-summary2-sk uses a mapping strategy (labelsForStepDetection) to provide context-aware labels and formatting.
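A condensed sketch of this mapping idea follows; the labels and formatters shown here are illustrative, and the real labelsForStepDetection in cluster-summary2-sk covers more algorithms and richer formatting:

```typescript
// Hedged sketch: map each step-detection algorithm to a display label
// and a number formatter appropriate for its statistic.
type StepDetection = 'mannwhitneyu' | 'cohen' | 'percent';

interface StepLabels {
  regression: string;            // label shown next to the score
  format: (n: number) => string; // how to render the raw value
}

const labelsForStepDetection: Record<StepDetection, StepLabels> = {
  // A p-value needs several decimal places to be meaningful.
  mannwhitneyu: { regression: 'p:', format: (n) => n.toFixed(4) },
  // An effect size reads naturally with two decimals.
  cohen: { regression: "Cohen's d:", format: (n) => n.toFixed(2) },
  // A ratio is clearest as a percentage.
  percent: { regression: 'Percentage Change:', format: (n) => `${(100 * n).toFixed(1)}%` },
};
```

Swapping the whole label/formatter bundle when the alert changes keeps the template itself algorithm-agnostic.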
- A given score may be highly significant as a p-value (mannwhitneyu) but potentially insignificant for a "Percentage Change."
- The component accepts an alert property. When an alert is set, it updates the internal labels object, changing UI strings (e.g., "p:" vs "Percentage Change:") and the corresponding number formatters (percent vs decimal).

The component uses a CSS Grid layout to manage a complex set of child elements, ensuring that critical information remains visible even as secondary tools (like the Word Cloud) are toggled.
+-------------------------------------------+
|   [Regression Status Banner (High/Low)]   |
+----------------------+--------------------+
| [Stats Row]          | [Triage Controls]  |
+----------------------+                    |
| [Google Chart Plot]  |                    |
+----------------------+--------------------+
|           [Commit Detail Panel]           |
+-------------------------------------------+
| [Action Buttons: Dashboard / Word Cloud]  |
+-------------------------------------------+
|       [Collapsible Word Cloud Area]       |
+-------------------------------------------+
The component consumes a FullSummary object, which is a composite of the ClusterSummary (the statistics) and the FrameResponse (the raw data for the plot).
- Plot data: it converts the centroid (an array of numbers) and the dataframe header (commit info) into a format suitable for plot-google-chart-sk. It also places an "x-bar" (vertical line) at the exact commit where the regression was detected.
- Permissions: it checks login status via alogin-sk. If the user lacks the editor role, triage controls are visually disabled to prevent unauthorized state changes.

cluster-summary2-sk.ts is the main logic engine. It handles:
- Events: dispatching triaged when a user updates the status and open-keys when a user wants to explore the cluster on the main Perf dashboard.
- Commit lookup: using the lookupCids method to fetch commit metadata when a user clicks a point on the graph.
- Read-only mode: supporting a notriage attribute to hide triage UI in read-only contexts.

The component acts as a coordinator for several other specialized modules:
- plot-google-chart-sk: Renders the trend line of the cluster's centroid.
- triage2-sk: Provides the dropdown/selection for anomaly status (Untriaged, Positive, Negative, etc.).
- word-cloud-sk: Visualizes the param_summaries2 data, helping users identify which dimensions (like arch or config) are most common in the cluster.
- commit-detail-panel-sk: Displays the git log/author information for the selected point in the regression.
- commit-range-sk: Allows users to inspect the range of commits around the regression point.

When a developer identifies a regression, the following interaction occurs:
- The user selects a new status in triage2-sk and optionally enters a message.
- The component propagates the decision: User Click -> update() -> dispatchEvent('triaged', {columnHeader, triageStatus})

If the summary is insufficient, the "View on Dashboard" button facilitates a deep dive: Click "View on Dashboard" -> openShortcut() -> dispatchEvent('open-keys', ...). This event carries a shortcut ID, which the Explorer page uses to reload the exact set of traces and time range represented by the cluster.
A Custom Element that displays a list of commits in a table format. Each entry in the table is rendered using a commit-detail-sk element, providing a consistent view of commit metadata across the Perf application.
The commit-detail-panel-sk acts as a container and controller for a collection of commit summaries. It is designed to be versatile, supporting both purely informational displays and interactive selection workflows (e.g., choosing a specific commit from a list associated with a performance anomaly).
- Renders an array of Commit objects into a vertical list of detailed rows.
- Tracks the highlighted row via a selected attribute.
- Emits commit-selected events for parent components.
- Propagates context, such as trace_id, down to individual commit-detail-sk children to ensure they have the necessary context for rendering or linking.

The component uses a selectable boolean attribute to toggle between two distinct behaviors:
When selectable is set, styling applies cursor: pointer to rows, and the component responds to user clicks by updating its state and broadcasting the selection; otherwise the list is purely informational. This dual-mode design allows the same component to be used in a read-only dashboard as well as in a "point-and-click" triage workflow.
The component implements a click listener on the top-level <table> rather than attaching individual listeners to every row. This is more efficient for large lists.
When a click occurs, it uses the findParent utility to locate the nearest TR element. This ensures that even if a user clicks a nested link or span inside the commit-detail-sk child, the panel correctly identifies which index in the details array was targeted.
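A simplified sketch of this delegation follows; `findParent` here is a local stand-in for the infra-sk utility, and the `data-id` attribute name is an assumption for illustration:

```typescript
// Walk up the tree from the clicked node to the enclosing row.
// Stand-in for the findParent utility described above.
function findParent(el: Element | null, tagName: string): Element | null {
  while (el !== null && el.tagName !== tagName) {
    el = el.parentElement;
  }
  return el;
}

// Single delegated handler on the <table>: resolve any click (even on a
// nested link or span) to the index of the row in the details array.
function onTableClick(e: Event, selectable: boolean): number {
  if (!selectable) return -1; // read-only mode ignores clicks
  const row = findParent(e.target as Element, 'TR');
  if (row === null) return -1;
  // Hypothetical: each row carries its index via a data attribute.
  return Number((row as HTMLElement).dataset.id ?? -1);
}
```

One listener on the table replaces one listener per row, which matters when the panel renders long commit lists.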
The following diagram illustrates how a user interaction is converted into an application-level event:
User Click
    |
    v
[Table Click Handler] --(Check selectable)--> [Exit if false]
    |
    | (findParent 'TR')
    v
[Extract data-id] ---------------------------> [Update 'selected' attribute]
    |                                                      |
    v                                                      v
[Construct Event Detail]                  [Trigger CSS :selected highlight]
    | (author, message, commit)
    v
[Dispatch 'commit-selected']
The core logic of the element. It utilizes lit-html for efficient rendering.
- details: The source array of Commit objects.
- selectable: Enables/disables interaction.
- selected: The index of the currently highlighted commit.
- hide: When true, prevents the list from rendering any rows, effectively clearing the view without losing the underlying data.
- trace_id: A string passed to children to provide context for the specific performance trace being inspected.

The stylesheet defines the visual state of the panel. It leverages CSS variables for theme support (light/dark mode).
- Highlighted rows are styled via the tr[selected] selector.
- Hover styling is gated on the selectable state to provide a visual hint of whether the component is interactive.

A separate Page Object (PO) implementation is used for automated testing. It abstracts the DOM structure (tables, rows) into a set of asynchronous methods like clickRow(index) and getSelectedRow(), allowing tests to interact with the component at a functional level rather than a DOM level.
Fired when a user clicks a row while the selectable attribute is present.
Its detail payload includes the selected Commit object.

The commit-detail-picker-sk module provides a specialized UI component for searching and selecting a specific commit from a repository's history. It is designed to handle the discovery of commits by allowing users to browse within a configurable time window and view detailed information before making a selection.
The component acts as a high-level wrapper around three key functional areas: a trigger (a button showing the current selection), a search/filter interface (date range selection), and a results display (commit-detail-panel-sk).
The module implements a “modal picker” pattern to keep the main UI clean while providing a rich interface for selection when needed.
User Interaction          State Management & Fetching        Sub-components
+--------------+          +---------------------------+      +-------------------+
| Click Button | -------->| Open <dialog>             |      |                   |
+--------------+          |                           |      |                   |
                          | Fetch Commits (_/cidRange)|      |                   |
                          | (Filtered by Date Range)  |      |                   |
                          +-------------+-------------+      |                   |
                                        |                    |                   |
                                        v                    |                   |
                          +---------------------------+      |  commit-detail-   |
                          | Update .details Property  |----->|  panel-sk         |
                          +---------------------------+      |                   |
                                        |                    |                   |
+--------------+          +---------------------------+      |                   |
| Select Commit| <--------| Emit 'commit-selected'    |<-----|  (Item Clicked)   |
+--------------+          | Close <dialog>            |      |                   |
                          +---------------------------+      +-------------------+
- commit-detail-picker-sk.ts: The core logic handler. It manages the state of the modal (open/closed), the current date range for searching, and the retrieval of commit data from the server.
- Data fetching: commits come from the /_/cidRange/ POST endpoint. It sends a RangeRequest containing a start and end timestamp and an optional offset. This allows the picker to populate its list based on user-defined time windows.
- When the selection property (a CommitNumber) is set externally, the component automatically triggers a fetch to ensure the details of that commit are loaded and displayed in the button label.
- commit-detail-panel-sk: Used within the dialog to render the list of commits. It handles the actual rendering of commit messages, authors, and hashes, and provides the selection mechanism within the list.
- day-range-sk: Provides the UI for the user to modify the search window. Changing the date range in this component triggers a new fetch in the picker to refresh the available commits.
- dialog (HTML5): Used to overlay the picker interface. This keeps the commit browsing experience contextual without navigating the user away from their current task.

The component communicates the user's choice to the rest of the application via a custom event:
- commit-selected: Emitted when a user clicks a commit in the panel. The detail of the event contains the selected index and commit information, following the structure defined by CommitDetailPanelSkCommitSelectedDetails.
- On initialization, the picker defaults its search window to a range ending at the current time (Date.now()). It fetches the commits for this range to populate the internal panel.
- Users can then adjust day-range-sk to update the begin and end timestamps, which causes the picker to re-query the backend.

The commit-detail-sk module provides a specialized web component for displaying concise information about a single Git commit within the Perf application. It serves as a navigational bridge, allowing users to move from a specific commit to various analysis views such as data exploration, clustering, or triage.
The element is designed to be a compact, action-oriented summary. It doesn't just display metadata; it contextualizes the commit based on the user's current interaction state.
A key design choice is the conditional behavior of the Explore functionality. The component can navigate to one of two destinations depending on whether a trace_id is provided:
- If a trace_id is present, the component assumes the user is interested in how that specific trace performed around the time of the commit. It automatically calculates a time window of +/- 4 days around the commit timestamp to provide immediate visual context in the Explore view.
- If no trace_id is present, it falls back to the generic per-commit Explore view.

commit-detail-sk.ts is the core implementation file. It defines the CommitDetailSk class, which manages the following properties:
- cid: A Commit object containing the hash, author, timestamp, message, and URL.
- trace_id: An optional string identifying a specific performance trace.

The component uses Lit for rendering and follows a reactive pattern where updates to cid or trace_id trigger a re-render. It utilizes diffDate from the infra-sk library to display human-readable relative timestamps (e.g., "3 days ago").
The component renders a set of standard actions, all of which open in a new browser tab:
- Explore: /e/ (contextual) or /g/e/ (generic).
- Cluster: /g/c/[hash], used for analyzing performance clusters associated with that commit.
- Triage: /g/t/[hash], used for managing alerts or regressions at that point in time.

The following diagram illustrates how the component determines the destination of the "Explore" button click:
User Clicks "Explore"
|
v
Is trace_id set?
/ \
[Yes] [No]
| |
| v
| Navigate to:
| /g/e/{commit_hash}
v
Calculate Time Range:
[ts - 4 days] to [ts + 4 days]
|
v
Construct Query Object:
{ keys: trace_id, begin, end, ... }
|
v
Navigate to:
/e/?{serialized_query}
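The branch above can be sketched as follows; the URL paths come from the text, while the query parameter names (keys, begin, end) are illustrative:

```typescript
// Sketch of the Explore destination logic described above.
const FOUR_DAYS_S = 4 * 24 * 60 * 60; // +/- 4 day window, in seconds

interface Commit {
  hash: string;
  ts: number; // commit timestamp in seconds
}

function exploreUrl(cid: Commit, traceId: string | null): string {
  if (traceId === null || traceId === '') {
    // No trace context: generic per-commit Explore view.
    return `/g/e/${cid.hash}`;
  }
  // Trace context: show the trace in a window around the commit.
  const q = new URLSearchParams({
    keys: traceId,
    begin: String(cid.ts - FOUR_DAYS_S),
    end: String(cid.ts + FOUR_DAYS_S),
  });
  return `/e/?${q.toString()}`;
}
```

The window is centered on the commit so the user immediately sees behavior both before and after the change.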
The module includes commit-detail-sk.scss, which imports both standard color variables and Perf-specific themes. It supports a dark mode and ensures that links and buttons remain accessible and consistent with the broader Skia infrastructure design language. Material Web Components (md-outlined-button) are used for the action triggers to provide a consistent look and feel with other modern Skia modules.
The commit-range-sk module provides a custom element designed to display and link to specific commits or ranges of commits within a repository. It is primarily used in the Perf UI to help users navigate from a data point or a regression in a trace directly to the relevant code changes in a source control browser (e.g., Gitiles/Googlesource or GitHub).
The element is designed to be reactive and data-driven, relying on trace data and column headers to resolve human-readable commit numbers into machine-readable Git hashes.
- The URL template comes from global configuration (window.perf.commit_range_url). This allows the same component to work across different repository hosting services by providing templates like .../range/{begin}/{end} or .../commit/{end}.
- It is lazy: the component calls the cid (Commit ID) lookup service only when a link needs to be rendered.

commit-range-sk.ts is the core logic of the component. It manages the internal state of the link text and URL.
- Range resolution: it walks the trace array and the commitIndex to find the "previous" valid commit. It skips over MISSING_DATA_SENTINEL values to ensure the user is directed to the full range of potential changes.
- URL construction: it substitutes the {begin} and {end} placeholders in the configured URL. It also includes specific logic for "Googlesource" style URLs, converting range logs (+log/begin..end) into single commit views (+/end) when the range size is one.
- Notification: it dispatches a commit-range-changed event when a new link is successfully generated, allowing parent components (like tooltips or info panels) to react.

The following diagram illustrates how the component transforms a user selection into a functional link:
User Selects Point (index N)
        |
        v
Find Previous Valid Point (index M < N) in Trace
        |
        v
Lookup Commit Numbers (Offsets) for M and N in Header
        |
        v
[Network Request] -> lookupCids(OffsetM, OffsetN)
        |
        v
Apply Hashes to window.perf.commit_range_url Template
        |
        v
Render <a> link with text "OffsetM+1 - OffsetN" (if range)
or "OffsetN" (if single commit)
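A sketch of the template substitution and the Googlesource single-commit special case described above; the template string is an example shape for window.perf.commit_range_url:

```typescript
// Fill {begin}/{end} placeholders, or collapse a one-commit range log
// into a single commit view on Googlesource-style hosts.
function commitRangeUrl(
  template: string,
  begin: string,
  end: string,
  isRange: boolean,
): string {
  if (!isRange && template.includes('+log/')) {
    // .../+log/{begin}..{end}  ->  .../+/{end}
    return template.split('+log/')[0] + `+/${end}`;
  }
  return template.replace('{begin}', begin).replace('{end}', end);
}
```

Collapsing the one-commit case avoids sending the user to a log page that contains a single entry.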
Provides the Page Object for testing. It encapsulates the logic for retrieving the link's href and the displayed text, shielding tests from the underlying DOM structure (which alternates between an <a> tag when the URL is ready and a <span> while loading).
Contains mock data structures that simulate the header and trace objects produced by the Perf backend, ensuring consistent testing of the range-finding logic.
- A span is only treated as a true range when start_commit + 1 < end_commit. If they are adjacent, isRange() returns false, and the UI simplifies the display to a single hash or commit number.
- The element renders into light DOM (createRenderRoot() returning this) rather than Shadow DOM, which is a common pattern in this project for elements that need to inherit global styles or be easily accessible by parent tooltips.

This module serves as the foundational utility layer for Skia Perf. It centralizes shared logic for data visualization, anomaly handling, keyboard interaction, and testing infrastructure.
The core responsibility of this module is to bridge the gap between raw backend trace data and the frontend charting engine (Google Charts).
The module handles the complex task of “transposing” trace data. Backend data typically arrives organized by trace keys, whereas charting libraries require data organized by rows (time/commit positions) with traces as columns.
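A minimal sketch of this transposition, using the sentinel value from /modules/const; the actual plot-builder.ts supports multiple domains and color assignment on top of this:

```typescript
// Traces arrive keyed by trace ID; charts want one row per commit
// position with one column per trace.
const MISSING_DATA_SENTINEL = 1e32;

type TraceSet = { [key: string]: number[] };

// Each row: [commitPosition, valueForTraceA | null, valueForTraceB | null, ...]
function transpose(traces: TraceSet, commits: number[]): (number | null)[][] {
  const keys = Object.keys(traces).sort(); // stable column order
  return commits.map((commit, i) => {
    const row: (number | null)[] = [commit];
    for (const k of keys) {
      const v = traces[k][i];
      // Gaps become null so the chart breaks the line instead of plotting 1e32.
      row.push(v === MISSING_DATA_SENTINEL ? null : v);
    }
    return row;
  });
}
```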
- plot-builder.ts: Contains the logic for this transformation. It supports different domains (commits, dates, or both) and handles missing data using a MISSING_DATA_SENTINEL. It also generates consistent color palettes for charts.
- plot-util.ts: Provides higher-level utilities to create ChartData objects, specifically managing the integration of anomaly markers into the data points so they can be rendered on the graph.

To ensure that performance graphs remain readable when comparing similar builds, the module implements a deterministic coloring strategy:
Trace variants that differ only by a build label (e.g., ref or pgo builds) are assigned colors derived from the base trace. If a collision is detected in the color space, the variants are mathematically shifted to guaranteed distinct colors.
- anomaly-data.ts: Defines the data structure for a point on a graph that represents an anomaly, including its coordinates and highlight state.
- anomaly.ts: Contains formatting logic for numeric changes (percentages) and human-readable links to bug trackers. It handles specialized bugId values like "Invalid" or "Ignored" alerts.
- ShortcutRegistry: A singleton that manages categories of shortcuts (Triage, Navigation, Report, General).
- handleKeyboardShortcut: A global handler that maps physical key presses to specific method calls on components (e.g., onTriagePositive, onZoomIn), while intelligently ignoring events originating from input fields or textareas.
- buttons.scss: Defines a mixin (perf-button) that enforces a strict visual design system. It uses !important to ensure that Perf-specific buttons maintain their identity even when embedded in components with conflicting global styles or Shadow DOM boundaries.

This module provides extensive infrastructure for both unit and integration testing.
- test-util.ts: A comprehensive mock environment for demo pages and unit tests. It includes setUpExploreDemoEnv, which mocks the entire Perf backend API (anomalies, trace data, login status, and shortcut persistence) using fetch-mock.
- puppeteer-test-util.ts: Provides helper functions for E2E testing, such as polling for DOM states, waiting for Google Charts to finish rendering (waitForReady), and validating ParamSet selections.
[Raw TraceSet] ----> [plot-util.ts] <---- [Anomaly Map]
| |
| (Match anomalies to points)
| |
V V
[plot-builder.ts] <--- [ChartData]
|
(Transpose to Rows)
|
V
[Google Chart Engine]
| File | Responsibility |
|---|---|
anomaly.ts | Formatting and calculation utilities for performance anomalies. |
buttons.scss | Standardized visual styling for buttons across the application. |
graph-config.ts | Logic for managing graph state and generating persistent shortcut URLs. |
keyboard-shortcuts.ts | Central registry and event handler for application-wide hotkeys. |
plot-builder.ts | Logic for transposing dataframes and managing chart color palettes. |
plot-util.ts | High-level helpers for merging traces and anomalies into chartable data. |
test-util.ts | Backend API mocking and dummy data generation for development. |
puppeteer-test-util.ts | Synchronization and validation helpers for browser-based tests. |
The const module (/modules/const) serves as a centralized source of truth for shared values used throughout the Perf UI. Its primary purpose is to ensure data consistency between the backend services (written in Go) and the frontend visualization layers, particularly regarding how incomplete or special data states are represented.
A significant challenge in performance monitoring is the representation of gaps in time-series data. The design of this module focuses on providing stable “sentinel” values that allow the UI to distinguish between valid data points and missing measurements without relying on non-standard JSON types.
The backend storage and processing layers (specifically //go/vec32/vec) utilize a specific float32 value to denote missing samples. Because the standard JSON specification does not support NaN or Infinity, the frontend must use a value that is:
- valid in standard JSON (a plain number), and
- representable as a float32.

MISSING_DATA_SENTINEL (set to 1e32) satisfies these requirements. When the UI encounters this value within a trace, it interprets the point as a gap rather than a zero or a legitimate data spike, allowing graphing components to break lines or omit points appropriately.
For categorical data or metadata fields where a value might be absent or undefined, the module provides MISSING_VALUE_SENTINEL. Using an explicit string (__missing__) instead of an empty string or null prevents ambiguity during filtering and grouping operations, ensuring that "missing data" can be treated as its own distinct category in the UI.
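A sketch of how the two sentinels are consumed downstream; the constant values match the text (1e32 and __missing__), while the helper names are illustrative:

```typescript
// Sentinel values shared with the Go backend, per /modules/const.
const MISSING_DATA_SENTINEL = 1e32;
const MISSING_VALUE_SENTINEL = '__missing__';

// Numeric traces: break the line at a gap instead of plotting 1e32.
function toPlottable(trace: number[]): (number | null)[] {
  return trace.map((v) => (v === MISSING_DATA_SENTINEL ? null : v));
}

// Categorical params: treat absent values as their own distinct category.
function groupByParam(values: (string | undefined)[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const v of values) {
    const key = v ?? MISSING_VALUE_SENTINEL;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```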
| Constant | Purpose |
|---|---|
MISSING_DATA_SENTINEL | A numeric float used to mark holes in time-series traces. |
MISSING_VALUE_SENTINEL | A string used to represent the absence of a value in metadata or parameters. |
The following diagram illustrates how these constants act as a bridge between the raw data ingestion and the final visualization:
[ Backend Trace ] -> [ JSON Serialization ] -> [ UI Data Fetching ] -> [ Plotting Logic ]
        |                     |                        |                       |
Uses MissingDataSentinel  Converts to 1e32      Imports MISSING_DATA_   Detects 1e32 and
        (Go)               (Valid JSON)           SENTINEL (TS)          renders a gap.
The csv module provides utilities for transforming performance data represented as a DataFrame into a Comma-Separated Values (CSV) format. This functionality is essential for allowing users to export trace data from the Perf system into spreadsheet software or external analysis tools.
The primary design goal of this module is to flatten high-dimensional trace data—which is structured as a collection of key-value pairs (parameters) and time-series arrays—into a two-dimensional grid. To achieve this, the module dynamically generates a schema based on the unique set of parameter keys present in the provided traces.
A challenge in converting DataFrame objects to CSV is that different traces may have different sets of parameters (e.g., one trace might have an os parameter while another has a config parameter).
To ensure a consistent grid structure:
- Header row: column headers come from the sorted parameter keys plus the timestamps in DataFrame.header. Timestamps, which are stored internally as seconds, are converted to ISO 8601 strings to ensure they are human-readable and correctly interpreted by external tools.
- Missing data: the converter detects MISSING_DATA_SENTINEL values and converts them to empty strings in the CSV output to maintain the numeric integrity of the rest of the column.
- Filtering: the export skips trace IDs beginning with special_. These are internal synthetic traces (like averages or benchmarks) that do not conform to standard parameter schemas and would clutter the export.

The CSV generation process follows a linear transformation path:
[ DataFrame ]
|
| 1. Extract Trace IDs
V
[ Trace IDs ] --> [ Parse to Params ] --> [ Identify & Sort Unique Keys ]
|
+--------------------------------------------+
|
V
[ Generate Header Row ] (Sorted Param Keys + ISO Timestamps)
|
V
[ Iterate Traces ]
|
+--> [ Map Params to Columns ] ----+
| |--> [ Concatenate Row ]
+--> [ Map Data Points to Columns ]--+
|
V
[ Final CSV String ]
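A condensed sketch of this pipeline under simplifying assumptions: trace IDs are in the standard ",key=value,..." form, timestamps arrive pre-formatted, and the local fromKey below stands in for the paramtools helper:

```typescript
const MISSING_DATA_SENTINEL = 1e32;

// Stand-in for paramtools fromKey: ",os=linux,arch=arm," -> {os, arch}.
function fromKey(id: string): { [key: string]: string } {
  const params: { [key: string]: string } = {};
  for (const pair of id.split(',').filter((s) => s !== '')) {
    const [k, v] = pair.split('=');
    params[k] = v;
  }
  return params;
}

function toCSV(traces: { [id: string]: number[] }, timestamps: string[]): string {
  // Skip synthetic special_ traces, per the filtering rule above.
  const ids = Object.keys(traces).filter((id) => !id.startsWith('special_'));
  const allParams = ids.map(fromKey);
  // Union of all param keys gives a consistent grid for heterogeneous traces.
  const keys = [...new Set(allParams.flatMap((p) => Object.keys(p)))].sort();
  const lines = [[...keys, ...timestamps].join(',')];
  ids.forEach((id, i) => {
    const p = allParams[i];
    const cells = keys.map((k) => p[k] ?? ''); // absent params -> empty cell
    const values = traces[id].map((v) =>
      v === MISSING_DATA_SENTINEL ? '' : String(v), // gaps -> empty cell
    );
    lines.push([...cells, ...values].join(','));
  });
  return lines.join('\n');
}
```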
- dataFrameToCSV (index.ts): The main entry point. It orchestrates the collection of keys, the formatting of headers, and the row-by-row serialization of trace data.
- parseIdsIntoParams & allParamKeysSorted (index.ts): Helper functions that handle the translation between the serialized trace ID format and a structured, normalized set of columns. They rely on fromKey from the paramtools module to decompose the string-based identifiers.

The data-service module provides a centralized, singleton-based interface for interacting with the Perf backend. It abstracts the complexities of HTTP communication, error handling, and long-running asynchronous operations into a clean API used by the frontend components.
The primary role of the DataService class is to act as the single source of truth for backend data fetching. By encapsulating all network logic, it ensures consistent headers, error reporting (via DataServiceError), and behavior across the application. It specifically manages:
The main implementation (data-service.ts) follows the Singleton pattern, accessible via DataService.getInstance(). This ensures that shared configurations (like local development overrides) are consistent across all callers.
- getShortcut, getDefaults, and shift wrap standard POST or GET requests. They use a private fetchJson helper that integrates with jsonOrThrow to standardize how the frontend handles malformed or failed responses.
- updateShortcut: Maps a complex GraphConfig array to a short ID.
- createShortcut: Maps a simple list of trace keys to an ID.

The shortcut methods include logic to skip execution during local development if perf.disable_shortcut_update is set, preventing unnecessary 500 errors from unconfigured local proxies.

The sendFrameRequest method handles one of the most complex workflows in the system: requesting data frames (collections of trace data for graphs). Because frame generation can be slow, the backend uses a "start-and-poll" pattern.
DataService leverages the progress module to manage this lifecycle:
- It exposes callbacks (onStart, onProgress, onMessage, onSettled) to allow UI components to update their loading states or progress bars.
- Polling is delegated to startRequest, which communicates with the /_/frame/start endpoint and waits for a "Finished" status.
- Failures are surfaced as a DataServiceError.

The following diagram illustrates the lifecycle of a frame request through the DataService:
Component          DataService          Progress Module          Backend
    |                   |                     |                     |
    |--sendFrameReq()-->|                     |                     |
    |                   |---startRequest()--->|                     |
    |                   |                     |---- /frame/start -->|
    |                   |                     |               (Processing)
    |                   |                     |<-- HTTP 200 (ID) ---|
    |                   |                     |                     |
    |                   |                     |  [Loop]             |
    |                   |                     |--- Check Status --->|
    |<--onProgress()----|<--updateProgress()--|<-- SerializedMsg ---|
    |                   |                     |                     |
    |                   |                     |--- Check Status --->|
    |                   |                     |<-- Status: Fin -----|
    |<--FrameResponse---|                     |                     |
The module defines a specialized DataServiceError. This class extends the native Error but includes an optional status property (the HTTP status code). This allows calling components to distinguish between network-level failures (e.g., 404, 500) and application-level errors (e.g., failed data processing messages returned within a valid HTTP 200 response).
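The distinction described above can be sketched with a minimal error type carrying an optional HTTP status; this is an illustrative sketch, not the canonical DataServiceError declaration:

```typescript
// Sketch of an Error subclass with an optional HTTP status, letting
// callers tell network-level failures apart from application-level ones.
class DataServiceError extends Error {
  constructor(
    message: string,
    public readonly status?: number // HTTP status code, if any
  ) {
    super(message);
    this.name = 'DataServiceError';
  }
}

// Example of how a caller might branch on the status field.
function describeError(e: DataServiceError): string {
  return e.status !== undefined
    ? `network failure (HTTP ${e.status})`
    : 'application-level error';
}
```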
The dataframe module provides the core data structures and management logic for handling performance trace data in the Perf system. It manages the lifecycle of performance data—from fetching raw JSON responses to maintaining a local cache and transforming data for visualization.
The module centers around the concept of a DataFrame, which represents a set of performance traces (time series data) sharing a common horizontal axis (commits or timestamps). The primary goal of this module is to provide a consistent way to query, extend, merge, and visualize these traces along with their associated metadata, such as anomalies and user-reported issues.
index.ts: This file defines the fundamental logic for manipulating DataFrame objects. It is the TypeScript equivalent of the backend Go implementation.
- The join function allows two DataFrames to be combined into one. It handles cases where headers (commit ranges) overlap or differ by recalculating a unified header and padding missing data with a MISSING_DATA_SENTINEL.
- findSubDataframe and generateSubDataframe allow for extracting specific slices of data based on commit offsets or timestamps.
- Merging logic for AnomalyMap structures ensures that anomaly data from different requests are combined correctly for the same traces.

dataframe_context.ts: The DataFrameRepository (implemented as <dataframe-repository-sk/>) acts as the state manager for performance data within a frontend application. It utilizes Lit context to provide data to descendant components.
- The extendRange method allows the UI to request more data (forward or backward in time) while maintaining the current ParamSet.
- It provides several contexts to descendant components:
  - dataframeContext: The raw DataFrame.
  - dataTableContext: A google.visualization.DataTable prepared for google-chart.
  - dataframeAnomalyContext: Current known anomalies.
  - dataframeUserIssueContext: Buganizer issues associated with specific trace points.
  - dataframeLoadingContext: A boolean flag indicating if a fetch operation is in progress.

traceset.ts: Trace keys in Perf are typically comma-separated strings of key-value pairs (e.g., ,benchmark=JetStream2,bot=MacM1,). This component provides utilities to parse these keys for UI display.
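A minimal sketch of that key parsing, assuming the ,key=value, format shown above (the real decomposition lives in the paramtools module; the function name here is illustrative):

```typescript
// Decompose a serialized trace key such as ",benchmark=JetStream2,bot=MacM1,"
// into a flat params object. Empty segments from the leading/trailing
// commas are skipped.
function parseTraceKey(traceKey: string): { [key: string]: string } {
  const params: { [key: string]: string } = {};
  for (const pair of traceKey.split(',')) {
    if (pair === '') continue; // leading/trailing commas
    const eq = pair.indexOf('=');
    if (eq < 0) continue; // malformed segment
    params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return params;
}
```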
It also strips wrapping functions (e.g., norm()) from keys to ensure that data lookup for issues and anomalies remains consistent even if the data is being transformed for display.

When a user requests to "extend" a chart, the following process occurs:
UI Action (e.g., "Scroll Left")
|
v
DataFrameRepository.extendRange(offset)
|
+--> calculate deltaRange (new time window)
+--> sliceRange into chunks (e.g., 30-day blocks)
+--> concurrent DataService.sendFrameRequest()
|
v
Receive multiple FrameResponses
|
+--> Sort responses by commit offset
+--> Merge ColumnHeaders (unified X-axis)
+--> Map old/new trace indices to unified header
+--> Pad missing points with SENTINEL
|
v
Update Lit Contexts
|
+--> dataframeContext (raw data)
+--> dataTableContext (Google Charts format)
+--> UI components re-render
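The header-merge and sentinel-padding steps above can be sketched as follows. The trace representation here is deliberately simplified (a map from commit to value per trace) relative to the real DataFrame objects:

```typescript
// Sentinel used to mark commits where a trace has no data point.
const MISSING_DATA_SENTINEL = 1e32;

type Traces = { [key: string]: Map<number, number> }; // commit -> value

// Merge two trace sets onto a unified, sorted commit header, padding each
// trace with the sentinel where it has no data for a commit.
function mergeTraces(
  a: Traces,
  b: Traces
): { header: number[]; traces: { [key: string]: number[] } } {
  const commits = new Set<number>();
  const all = Object.entries(a).concat(Object.entries(b));
  all.forEach(([, series]) => series.forEach((_, c) => commits.add(c)));
  const header = Array.from(commits).sort((x, y) => x - y);
  const traces: { [key: string]: number[] } = {};
  all.forEach(([key, series]) => {
    traces[key] = header.map((c) =>
      series.has(c) ? (series.get(c) as number) : MISSING_DATA_SENTINEL
    );
  });
  return { header, traces };
}
```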
The module handles the logic of turning complex trace keys into readable labels:
Trace A: ,benchmark=V8,test=Total,arch=arm,
Trace B: ,benchmark=V8,test=Total,arch=x86,

Logic:
1. Common: benchmark=V8, test=Total  ==> Title: "V8/Total"
2. Unique: arch=arm vs arch=x86      ==> Legend: ["arm", "x86"]
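That title/legend derivation can be sketched as a standalone function (names are illustrative; the real logic lives in traceset.ts):

```typescript
// Given the parsed params of several traces, derive a shared title from
// the key=value pairs common to all traces, and a per-trace legend from
// the values that differ.
function titleAndLegend(paramsList: { [key: string]: string }[]): {
  title: string;
  legend: string[];
} {
  const first = paramsList[0];
  const commonKeys = Object.keys(first).filter((k) =>
    paramsList.every((p) => p[k] === first[k])
  );
  const title = commonKeys.map((k) => first[k]).join('/');
  const legend = paramsList.map((p) =>
    Object.keys(p)
      .filter((k) => commonKeys.indexOf(k) < 0)
      .map((k) => p[k])
      .join(' ')
  );
  return { title, legend };
}
```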
- Once a chart is bound to a ParamSet, extensions (paging through data) use that same set of parameters. To change the query itself, a full resetTraces is required. This simplifies the merging logic by ensuring the "vertical" dimension (the traces) remains relatively stable while the "horizontal" dimension (time) grows.
- Rather than having every consumer convert the DataFrame, the DataFrameRepository performs a centralized conversion to the Google Visualization DataTable format. This ensures that expensive data transformations happen once per update.
- Anomalies are indexed by TraceKey and then CommitPosition. This allows for efficient O(1) lookups when rendering points on a chart, rather than iterating through lists of anomalies for every data point.

The day-range-sk module provides a custom element for selecting a time range, defined by a beginning and an ending timestamp. It is designed to simplify date range selection in applications, such as performance monitoring dashboards, where users need to filter data within specific historical boundaries.
The module is built using Lit and extends the ElementSk base class. Its primary responsibility is to synchronize two separate date inputs and expose their combined state as a single range.
Unlike standard HTML date inputs that often use strings or millisecond-based timestamps, day-range-sk standardizes on seconds since the Unix epoch. This decision aligns with the data formats typically used in backend storage systems (like Prometheus or BigTable) within the Perf ecosystem, reducing the need for repeated conversions in the application logic.
The element acts as a composite wrapper around two calendar-input-sk elements:
The internal state is managed through the begin and end properties, which are mirrored to attributes. This mirroring allows the element to be initialized or manipulated via declarative HTML or imperative JavaScript.
When a user interacts with either of the internal calendars, the element processes the change and propagates it upward.
[ User Interface ]         [ day-range-sk ]         [ Consumer/App ]
        |                          |                        |
        |--- Change Begin Date --->|                        |
        |                          |--- Calculate Seconds   |
        |                          |--- Update 'begin' Attr |
        |                          |--- Dispatch Event ---->|
        |                          |    (day-range-change)  |
- The element wraps two calendar-input-sk components.
- It listens for the @input event from each calendar.
- The Date object from the calendar is converted into a floor-rounded second-based timestamp.
- A day-range-change event is dispatched, containing both the updated and the stationary timestamp in its detail object.

The element is designed to be "ready to use" immediately upon being added to the DOM.
- If the begin or end attributes are not provided, the component defaults to a 24-hour window ending at the current time.
- _upgradeProperty calls in the connectedCallback ensure that if the properties were set on the DOM element before the custom element definition was loaded, those values are correctly captured and reflected.
- Through observedAttributes, the element automatically re-renders whenever the begin or end attributes are changed externally, ensuring the visual calendar inputs always match the underlying data.

The day-range-change event is the primary event emitted by the module. It bubbles up the DOM, allowing parent components to listen for any changes to the range.
The detail property of the event implements the DayRangeSkChangeDetail interface:
{
  begin: number; // Seconds since epoch
  end: number;   // Seconds since epoch
}
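The floor-rounded conversion from a calendar's Date to this second-based detail can be sketched as (a minimal sketch; the element performs an equivalent conversion internally):

```typescript
// Convert a Date to whole seconds since the Unix epoch, rounding down.
function toSeconds(d: Date): number {
  return Math.floor(d.getTime() / 1000);
}

// Build the detail payload for a day-range-change event.
function makeDetail(begin: Date, end: Date): { begin: number; end: number } {
  return { begin: toSeconds(begin), end: toSeconds(end) };
}
```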
The element's layout is controlled via day-range-sk.scss, which ensures the labels and inputs are displayed as block elements with consistent spacing. It integrates with the global theme system by using CSS variables like --on-surface and --surface for the input borders and backgrounds, supporting both light and dark modes.
The domain-picker-sk module provides a specialized UI component for selecting time and data ranges, specifically tailored for the Skia Perf ecosystem. Its primary purpose is to define the “domain” (the X-axis) of a performance data request, allowing users to switch between absolute time ranges and relative commit counts.
The component provides two distinct modes for querying data, known as the request_type:
The component manages its internal state via a DomainPickerState object. This interface acts as the contract between the picker and its parent application (typically a dashboard or query page).
- Dates are entered via calendar-input-sk for human-readable interaction.
- The full state is exposed through a state getter/setter, facilitating easy integration with URL-backed state management or "Reset" buttons.

force_request_type: An architectural choice was made to allow parent components to "lock" the picker into a specific mode using the force_request_type attribute.
- When set to range or dense, the component hides the radio-sk selection buttons entirely.
- domain-picker-sk.ts: The core logic. It utilizes Lit for templating and manages the conditional rendering logic that swaps between the "Begin Date" picker (Range mode) and the "Points" numeric input (Dense mode).
- domain-picker-sk.scss: Defines the layout, ensuring that the calendar-input-sk and various labels align correctly. It uses standard element-sk variables to support theme switching (light/dark mode).
- calendar-input-sk: Rather than implementing date logic itself, the picker delegates date selection to this specialized sub-component, maintaining a consistent date-picking experience across the infra.

The following diagram illustrates how the component resolves its output state based on user interaction or attribute overrides:
        User Input / State Set
                 |
                 v
+-----------------------+  YES  +-------------------------+
| force_request_type?   |------>| Override request_type   |
+-----------+-----------+       | (Hide Radio Buttons)    |
            | NO                +------------+------------+
            v                                |
+-----------------------+                    |
| Render Radio Buttons  |                    |
| (Range vs Dense)      |                    |
+-----------+-----------+                    |
            |                                |
            +-------------------+------------+
                                |
                                v
                    +-----------------------+
                    | Render Common "End"   |
                    | Calendar Input        |
                    +-----------+-----------+
                                |
                +---------------+---------------+
                |                               |
      [request_type: RANGE]           [request_type: DENSE]
                |                               |
      +---------+---------+           +---------+---------+
      | Render "Begin"    |           | Render "Points"   |
      | Calendar Input    |           | Numeric Input     |
      +-------------------+           +-------------------+
The state property manages the following object:
| Field | Type | Description |
|---|---|---|
| begin | number | Unix timestamp (seconds) for the start of the range. |
| end | number | Unix timestamp (seconds) for the end of the range. |
| num_commits | number | Count of points to retrieve (used in Dense mode). |
| request_type | number | 0 for Range, 1 for Dense. |
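Expressed as a TypeScript interface, the table above looks roughly like this (a sketch based on the field list, not the canonical declaration; the default values are illustrative assumptions):

```typescript
// State contract between domain-picker-sk and its parent, per the table.
interface DomainPickerState {
  begin: number;        // Unix timestamp (seconds), start of range.
  end: number;          // Unix timestamp (seconds), end of range.
  num_commits: number;  // Number of points to retrieve (Dense mode).
  request_type: number; // 0 = Range, 1 = Dense.
}

// Hypothetical default: a 24-hour Range window ending at `now`.
const defaultState = (now: number): DomainPickerState => ({
  begin: now - 24 * 60 * 60,
  end: now,
  num_commits: 50,
  request_type: 0,
});
```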
The errorMessage module provides a standardized utility for surfacing application errors to the user and, optionally, tracking those errors through telemetry. It acts as a wrapper around the core elements-sk error messaging system, tailored for the specific requirements of the Perf application.
The primary goal of this module is to ensure that critical errors are not missed by the user while maintaining observability for developers.
By default, the message display duration is set to 0. This forces the error message to remain visible until the user manually dismisses it, ensuring that transient network failures or complex logic errors are acknowledged.

index.ts: The module exports two primary functions for handling errors:
- errorMessage: A simplified wrapper that dispatches an error-sk event. This event is typically caught by a global <error-toast-sk> element (or similar) to display a notification. Its main contribution is overriding the default display duration to infinite (0).
- errorMessageWithTelemetry: Extends the standard error notification by incrementing a metric counter before showing the UI toast. It accepts a TelemetryErrorOptions object to categorize the error by source (e.g., a specific API endpoint) and error code (e.g., "404" or "500").

The telemetry functionality is designed to categorize errors using a specific metric source (defined by CountMetric). This allows the team to create dashboards based on the "source" and "errorCode" labels, providing a clear picture of application health.
The following diagram illustrates how an error propagates from a functional call through to the UI and the monitoring backend:
[ Function Call ]
        |
        V
[ errorMessageWithTelemetry(msg, dur, options) ]
        |
        +--( If options.countMetricSource exists )--> [ telemetry.increaseCounter ]
        |                                                        |
        |                                                        V
        |                                            [ External Metrics System ]
        |
        +-------------------------------------------> [ elementsErrorMessage ]
                                                                 |
                                                                 V
                                                     [ Dispatch "error-sk" event ]
                                                                 |
                                                                 V
                                                     [ UI Toast Component ]
When using errorMessageWithTelemetry, the source field in TelemetryErrorOptions should be specific enough to identify the feature or component failing, while errorCode should represent the category of failure. If these are omitted, they default to 'default' and '500' respectively.
Both functions handle various message formats—including strings, objects with a message property, and raw Response objects—consistent with the underlying elements-sk implementation.
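The message normalization just described can be sketched as a small helper (illustrative only; the real handling lives in elements-sk):

```typescript
// Reduce the supported message formats (string, object with a `message`
// property, anything else) to a single display string.
function toDisplayMessage(msg: unknown): string {
  if (typeof msg === 'string') return msg;
  if (msg && typeof msg === 'object' && 'message' in msg) {
    return String((msg as { message: unknown }).message);
  }
  return String(msg);
}
```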
The existing-bug-dialog-sk module provides a modal dialog designed to associate performance anomalies (alerts) with an existing bug in a tracking system (e.g., Monorail/Issues). It is a key component of the Perf triage workflow, allowing users to consolidate multiple related regressions under a single bug ID.
When a user identifies a performance regression, they may want to link it to an ongoing investigation rather than filing a new bug. This module manages the UI for inputting a Bug ID, selecting the target project, and viewing other bugs already associated with the same group of anomalies.
The primary responsibility of the module is to communicate with the triage backend to establish a link between anomaly keys and a bug ID.
- The association is submitted to the backend endpoint /_/triage/associate_alerts.
- On success, the dialog updates the local anomalies data to reflect the new association and dispatches an anomaly-changed event. This event ensures that other UI components (like charts or lists) stay in sync without requiring a full page reload.

To help users avoid duplicating work, the dialog can fetch and display a list of bugs already associated with the anomalies being triaged.
- Associated bugs are fetched from /_/anomalies/group_report. If the backend returns a "SID" (Session ID), the component handles the additional fetch step to resolve the full list of anomalies in that group.
- It then calls /_/triage/list_issues to fetch the human-readable titles of these bugs, providing better context for the user before they commit the association.
- The component overrides createRenderRoot() { return this; } to render directly into the custom element, allowing it to leverage global Perf themes and styles (SASS) defined in the project.
- It shows a spinner-sk and disables the submit button during active network requests to prevent duplicate submissions and provide visual feedback.

existing-bug-dialog-sk.ts: The core logic of the dialog. It manages:
- Internal state such as _projectId, _busy status, and the bugIdTitleMap (linking bug IDs to their descriptive titles).
- The anomalies and traceNames properties, which are used to construct the payload for the triage backend.
- Dispatching the anomaly-changed event to notify the rest of the application of state changes.

existing-bug-dialog-sk.scss: Defines the layout of the dialog, ensuring it occupies a reasonable portion of the screen (25% width) and handles long lists of associated bugs via a scrollable container (#associated-bugs-table).
existing-bug-dialog-sk_po.ts: Provides an abstraction for testing the component. It encapsulates the selectors for the input fields and buttons, allowing Puppeteer and Karma tests to interact with the dialog without being brittle to internal DOM changes.
User Actions              Component Logic                     Backend API
--------------------------------------------------------------------------------
Open Dialog  ----->  Fetch Associated Bugs   ----->  /_/anomalies/group_report
                            |
                            v
                     Fetch Bug Titles        ----->  /_/triage/list_issues
                            |
                     Render List + Form
                            |
Input Bug ID ----->         |
                            |
Click Submit ----->  Validate & Send         ----->  /_/triage/associate_alerts
                            |
                     Update Local State
                            |
                     Dispatch 'anomaly-changed'
                            |
                     Close Dialog
- existing-bug-dialog-sk-demo.ts: Provides a mocked environment to test the dialog's behavior and layout in isolation.
- test_data.ts: Contains sample Anomaly objects used for both documentation and testing.

The explore-multi-sk module provides a comprehensive interface for visual data exploration in Perf, allowing users to view and interact with multiple graphs simultaneously. It acts as an orchestrator for multiple explore-simple-sk instances, synchronizing their states (such as time ranges and X-axis scaling) to facilitate comparative analysis across different data dimensions.
The module serves two primary exploration modes:
The module utilizes stateReflector to persist the exploration state in the URL. To keep URLs manageable and logic simple, explore-multi-sk only tracks properties necessary to reconstruct the graphs.
- The time range is persisted as begin/end timestamps in the URL. If missing, it falls back to a dayRange (e.g., "last 7 days"). If both are missing, it uses global defaults.
- Graph configurations are persisted via a shortcut ID. This ID maps to a collection of graph configurations in the backend database, allowing complex multi-graph layouts to be shared via a short link.

The module ensures a unified experience across multiple internal elements through event-driven synchronization:
- range-changing-in-multi and selection-changing-in-multi events trigger an update across all other graphs.
- The module syncs the domain state for all instances, ensuring the X-axis remains comparable.
- Certain settings are persisted via localStorage.

To prevent browser performance degradation when loading dozens of graphs (e.g., splitting by a parameter with many values), the module implements Chunked Loading:
[User Clicks Plot]
|
V
[Calculate Groups] -> (e.g., 20 different OS values)
|
V
[Load Chunk 1] ----> (Load first 5 graphs, request range data)
|
[Load Chunk 2] ----> (Load next 5 graphs)
|
...
|
[Final Load] ------> (Fetch extended range data for all graphs in one batch)
This approach allows the UI to become interactive incrementally while minimizing the total number of expensive backend requests for historical data.
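The chunking itself is simple to sketch; the chunk size of 5 matches the example above, and the function name is illustrative:

```typescript
// Split a list of graph configs into fixed-size chunks so each chunk can
// be loaded and rendered before the next begins.
function chunk<T>(items: T[], size = 5): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

Each returned chunk would then be handed to the loading step in sequence, keeping the UI responsive between chunks.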
- explore-multi-sk.ts: The core logic coordinator. It handles the State object, manages the lifecycle of explore-simple-sk elements, and implements the splitting/merging logic.
- explore-multi-sk.scss: Provides layout styling, ensuring that graphs are sized appropriately (e.g., shrinking height when multiple graphs are displayed) and handling the visibility of UI components like the Test Picker.
- explore-multi-sk_po.ts: A Page Object for Puppeteer testing, providing a clean API to interact with the multi-graph container and its children during integration tests.
- test-picker-sk: The module heavily relies on the Test Picker for selecting data. In "Split Mode", the picker's state is used to determine how to partition the data into individual charts.

When a user selects a "Split By" parameter (e.g., os):
Separate explore-simple-sk instances are created for each group, one per parameter value.

Data can be removed in two ways:
- Removing a single trace: The module locates the trace in the TraceSet, updates the internal data models, and tells the relevant graphs to re-render without the specific trace.
- Removing a graph: The module destroys the explore-simple-sk instance and updates the URL shortcut. In Split Mode, removing the last trace of a graph typically results in the removal of the entire graph instance.

The explore-simple-sk module provides a comprehensive data exploration interface for the Perf tool. It serves as the primary component for querying, visualizing, and triaging performance traces, allowing users to interact with large datasets through charts, tables, and integrated triage tools.
explore-simple-sk is designed to be a versatile “explorer” that can operate in multiple modes (plotting, pivot tables, or simple querying). It manages the state of a data exploration session—including the time range, active queries, formulas, and selected data points—and reflects this state in the browser URL for shareability.
The module acts as a coordinator for several specialized sub-components:
- It uses the DataFrameRepository to manage the underlying trace data and anomaly maps.
- It uses plot-google-chart-sk for the main interactive graph and plot-summary-sk for long-range navigation.
- It uses query-sk and pivot-query-sk to allow users to filter data.
- It provides a chart-tooltip-sk that facilitates bug filing, anomaly nudging, and bisection.

The module utilizes a State class to track all parameters of the current view (e.g., begin, end, queries, formulas, domain).
It uses a state_changed event mechanism and a useBrowserURL method to sync internal variables with URL search parameters.

To maintain performance when exploring large datasets, the module implements incremental fetching.
Newly fetched data is merged into the existing DataFrame in memory.

Users can toggle the X-axis between "Commit Position" and "Date."
explore-simple-sk.ts: The main class responsible for the lifecycle of the explorer.
It handles user queries (via addFromQueryOrFormula) and manages the response display mode (Graph vs. Pivot Table).

nudge-util: A specialized utility for handling anomaly "nudges."
It filters out gaps (MISSING_DATA_SENTINEL values) and calculates a list of NudgeEntry objects. This ensures that when an anomaly is moved, it always lands on a commit that actually contains data for that specific trace.

explore-simple-sk_po.ts: A Page Object (PO) implementation for Puppeteer testing.
The following diagram illustrates how a user query is transformed into a visual plot:
User Input (Query/Formula)
        |
        V
addFromQueryOrFormula() ----> Validates Query
        |
        V
requestFrame() -------------> DataService (Backend API)
        |                            |
        |<---------------------------|
        V
UpdateWithFrameResponse()
        |
        +-----> DataFrameRepository (Stores & Merges Data)
        |
        +-----> plot-google-chart-sk (Renders Main Graph)
        |
        +-----> plot-summary-sk (Renders Navigation Bar)
        |
        +-----> paramset-sk (Updates Metadata Panel)
When a user interacts with a data point on the chart:
Chart Click Event
        |
        V
onChartSelect()
        |
        +-----> enableTooltip()
        |
        +-----> Fetches Commit Links (Gitiles/Issue Tracker)
        |
        +-----> nudge-util (Calculates Nudge Steps)
        |
        +-----> Displays chart-tooltip-sk
                    |
                    +---[File Bug]---> NewBugDialog
                    +---[Nudge]------> Update Anomaly Map
                    +---[Bisect]-----> BisectDialog
The module uses a flexible layout defined in explore-simple-sk.scss that adapts based on the displayMode (e.g., .display_query_only, .display_plot). It uses CSS classes to hide/show components like the spinner, the pivot table, or the plot summary based on the current operation, ensuring a clean UI regardless of the data being explored.
The explore-sk module provides the primary entry point and high-level container for the Skia Perf data exploration interface. It acts as an orchestrator that integrates several complex sub-components—most notably the core graphing engine and the test selection tools—into a unified user experience.
The purpose of explore-sk is to provide a cohesive environment where users can query performance data, visualize traces, and interact with the resulting graphs. While the actual plotting and state management logic reside in sub-modules, explore-sk handles the high-level layout, global event routing, and the initialization of environment-specific defaults.
The module is structured as a custom element (explore-sk.ts) that manages the lifecycle and communication between several key pieces:
- <explore-simple-sk>: This is the "heavy lifter" of the module. It handles the actual data fetching, state management for queries, and the rendering of the performance charts. explore-sk acts as its parent, passing down configuration and reflecting its state to the URL.
- <test-picker-sk>: A specialized UI component for building queries. It allows users to select specific parameters (like architecture, config, or test name) from dropdowns or autocomplete fields. explore-sk dynamically initializes this component based on the backend's configuration.
- stateReflector: Ensures that the complex state of the exploration (selected traces, time ranges, etc.) is synchronized with the browser's URL. This allows users to share specific views or bookmarks of their performance analysis.
- alogin-sk: Determines the user's login status. This is used to conditionally enable features like "Favorites," which require a user identity to persist data.

Instead of implementing plotting and querying logic directly, explore-sk serves as a thin wrapper. This design allows explore-simple-sk to remain focused on the core data/charting logic, while explore-sk manages the layout and the integration of optional UI elements like the test-picker-sk.
The module implements logic to switch between different querying interfaces. It checks for backend defaults and local storage flags (like v2_ui) to decide whether to show the traditional query dialog or the newer test-picker-sk. This allows for a staged rollout of new UI features without breaking the core exploration workflow.
To provide a consistent “app-like” feel, explore-sk captures global keyboard events (like the ? key for help) and delegates them to the appropriate child component (explore-simple-sk). This ensures that shortcuts work regardless of which sub-element currently has focus.
When the element is attached to the DOM, it follows a specific sequence to configure the environment:
[ explore-sk ]
|
|-- 1. Fetch /_/defaults/ --------> [ Backend ]
| | |
| <------- JSON Config --------|
|
|-- 2. Check Auth Status ---------> [ alogin-sk ]
| | |
| <------- Login Status -------|
|
|-- 3. Initialize State ----------> [ stateReflector ]
| | |
| <------- URL Params ---------|
|
|-- 4. Setup TestPicker (if enabled)
|
'-- 5. Pass state & defaults to [ explore-simple-sk ]
When a user interacts with the test-picker-sk, the communication flows through events:
1. The user selects parameters in test-picker-sk and clicks "Plot".
2. test-picker-sk emits a plot-button-clicked event.
3. explore-sk catches this event, extracts the query string from the picker, and calls the addFromQueryOrFormula method on explore-simple-sk.
4. explore-simple-sk fetches the data and updates the chart.

If a user is looking at a graph and wants to refine their query based on a specific trace:
1. explore-sk receives the trace key.
2. It directs test-picker-sk to update its fields to match that specific trace, allowing the user to easily pivot their search.

The extra-links-sk module provides a specialized custom element designed to display a curated list of external resources, documentation, or related tools. It serves as a dynamic landing area or sidebar within the Perf application, allowing administrators to surface relevant links that might otherwise be buried in external documentation sites.
The module is built on the principle of configuration-driven UI. Rather than hardcoding links or managing them through complex state management within the element itself, it leverages the global environment to determine its content.
The element relies on the window.perf.extra_links configuration object. This design choice decouples the UI component from the backend API calls. By assuming that the global window.perf object (typically populated at page load or via a global configuration fetch) contains the necessary metadata, the element remains lightweight and reactive to the environment it is placed in.
Using lit, the element implements a declarative template that handles two primary states:
- When window.perf.extra_links is populated, it renders a structured table featuring link titles and descriptions.
- When the configuration is absent or empty, it renders nothing.

extra-links-sk.ts: This file defines the ExtraLinksSk class, which extends ElementSk. Its primary responsibility is the lifecycle management and rendering of the link table.
- It renders the configured ExtraLink objects (containing text, href, and description) into a tabular format.
- It renders on attachment (connectedCallback), ensuring that the links are visible as soon as the component is attached.

The styling is scoped to the extra-links-sk element. It utilizes CSS variables (like --primary and --on-surface) to maintain theme consistency with the rest of the application. The layout uses border-collapse: separate and specific padding to ensure the links are easily readable and touch-friendly.
The following diagram illustrates how data flows from the global configuration into the rendered component:
[ Global Scope ] [ extra-links-sk ] [ Browser DOM ]
| | |
| 1. Set window.perf. ---|------------------------------> |
| extra_links = {...} | |
| | |
| | 2. connectedCallback() |
| | <---------------------------- |
| | |
| | 3. Read window.perf |
| | loop through links |
| | |
| | 4. Generate HTML Table |
| | -----------------------------> |
| | |
The component expects the configuration to follow this structure within the global window.perf object:
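A sketch of that shape, inferred from the ExtraLink fields described above (text, href, description); the example values and the `perfConfig` name are illustrative assumptions:

```typescript
// Hypothetical shape of the global configuration consumed by the element.
interface ExtraLink {
  text: string;        // Link label shown in the table.
  href: string;        // Destination URL.
  description: string; // Short explanation of the resource.
}

const perfConfig: { extra_links: ExtraLink[] } = {
  extra_links: [
    {
      text: 'Perf documentation',
      href: 'https://example.com/docs',
      description: 'User guide for the dashboard.',
    },
  ],
};
```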
The favorites-dialog-sk module provides a modal dialog designed for creating and editing user “favorites” within the Perf application. It encapsulates a form for capturing a name, description, and URL, and handles the asynchronous communication with the backend API to persist these changes.
The module is built as a LitElement and utilizes the native HTML <dialog> element for modal behavior. The design focuses on a Promise-based workflow for the calling component, allowing the parent to react differently depending on how the dialog was closed.
Instead of relying on external events to communicate success, the open() method returns a Promise. This allows the caller to await the user's action:
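A hedged usage sketch of that Promise-based workflow; the dialog type is reduced to the one method needed here, and the argument names are assumptions based on the surrounding text:

```typescript
// Minimal view of the dialog: open() resolves on save, rejects on cancel.
type FavoritesDialog = { open: (id: string, name: string) => Promise<void> };

async function editFavorite(dialog: FavoritesDialog): Promise<string> {
  try {
    await dialog.open('fav-123', 'My favorite'); // resolves when saved
    return 'saved'; // refresh the favorites list here
  } catch {
    return 'cancelled'; // user closed the dialog or the save failed
  }
}
```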
The component distinguishes between “create” and “edit” modes based on the presence of a favId.
favId is empty, the component defaults the URL to the current window location and targets the /_/favorites/new endpoint.favId is provided, the component populates the fields with existing data and targets the /_/favorites/edit endpoint.The following diagram illustrates the interaction between the UI and the backend:
[Parent Component]          [favorites-dialog-sk]            [Backend API]
        |                            |                             |
        |---- .open(id, name) ------>|                             |
        |                            |-- (User edits fields)       |
        |                            |                             |
        |                            |---- Click "Save" ---------->|
        |                            |      POST /_/favorites/     |
        |                            |<------ 200 OK / Error ------|
        |                            |                             |
        |<--- Resolve / Reject ------|                             |
favorites-dialog-sk.ts: This is the core logic of the module.
- It validates inputs and shows an errorMessage toast if validation fails.
- It tracks an updatingFavorite state to toggle a <spinner-sk> and disable action buttons while a network request is in flight.
- It maintains a nextUniqueId counter to ensure that HTML id and for attributes are unique across multiple instances on the same page, maintaining accessibility and correct label-to-input binding.

favorites-dialog-sk.scss: Defines the visual presentation using the Perf theme variables. It handles the layout of the form elements, specifically positioning the close icon and styling the input fields to occupy the standard modal width (500px for inputs).
favorites-dialog-sk-demo.ts: Provides a reference implementation for how to trigger the dialog for both "New" and "Edit" scenarios. It demonstrates the use of the open() method and how to pass initial parameters.
The module interacts with the following endpoints:
- POST /_/favorites/new: Used when creating a new favorite. The body includes name, description, and url.
- POST /_/favorites/edit: Used when updating existing favorites. The body includes the original id along with the updated fields.

Errors from the API are captured and displayed to the user via the errorMessage utility, while the dialog remains open to allow the user to correct the issue or try again.
The favorites-sk module provides a specialized dashboard interface for managing and viewing bookmarked links within the Perf application. It distinguishes between global system-wide favorites and user-specific links, allowing for personal organization of performance data views.
The module is built around a centralized configuration fetched from the backend. The primary design goal is to provide a unified view where users can see pre-configured links (such as project-wide dashboards) alongside their own curated list of performance traces or search queries.
Upon mounting (connectedCallback), the element fetches the favorites configuration from /_/favorites/. This configuration drives the entire UI. The module uses an “optimistic-style” refresh pattern: whenever a change occurs (like a deletion or an edit), the component re-fetches the entire configuration to ensure the UI is synchronized with the server's state.
The implementation applies different business rules based on the section name:
- Uses favorites-dialog-sk to facilitate complex editing of link metadata (names, descriptions, and URLs).

The deletion process includes a safety check to prevent accidental data loss:
[User Clicks Delete]
|
v
[Browser Confirm Dialog] --(Cancel)--> [Abort]
|
(OK)
v
[POST to /_/favorites/delete]
|
[Success?] --(No)--> [Show Error Message]
|
(Yes)
v
[Re-fetch /_/favorites/]
|
[Re-render]
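The delete-then-refetch flow above can be sketched as follows. The endpoint paths come from the diagram; confirmFn and fetchFn are injected here so the flow can be exercised outside a browser, which is an illustrative choice rather than the component's actual structure.

```typescript
// Minimal sketch of the favorites deletion flow.
type MinimalFetch = (
  url: string,
  init?: { method?: string; body?: string }
) => Promise<{ ok: boolean }>;

async function deleteFavorite(
  id: string,
  confirmFn: (msg: string) => boolean,
  fetchFn: MinimalFetch
): Promise<'aborted' | 'error' | 'refetched'> {
  // Safety check: a browser confirm dialog guards the deletion.
  if (!confirmFn('Delete this favorite?')) return 'aborted';
  const resp = await fetchFn('/_/favorites/delete', {
    method: 'POST',
    body: JSON.stringify({ id }),
  });
  if (!resp.ok) return 'error';
  // Re-fetch the full config so the UI matches server state.
  await fetchFn('/_/favorites/');
  return 'refetched';
}
```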
- favorites-sk.ts: The core logic container. It manages the state of the favoritesConfig and handles the asynchronous interactions with the backend API. It uses lit for templating, dynamically generating tables based on the presence of user-specific links.
- favorites-dialog-sk: While imported from a sibling module, it is a critical dependency for this module's “Edit” workflow. favorites-sk acts as the orchestrator, passing existing link data into this dialog and waiting for a resolution to refresh the view.
- favorites-sk.scss: Defines the layout for the favorites tables. It uses a spacious design with border-spacing and specific styling for primary links to ensure the dashboard remains readable even with a high density of saved traces.

The module relies on the ElementSk base class for standard component lifecycle management. For user interactions:
- Editing invokes the dialog's .open() method, passing the id, text, description, and href.
- Network calls use the jsonOrThrow and errorMessage utilities to handle failures gracefully, ensuring that server-side errors are surfaced to the user via a consistent UI toast/notification system.

The gemini-side-panel-sk module provides a slide-out interface for interacting with a Gemini-powered AI assistant. It is designed as a persistent UI overlay that can be integrated into any page to provide contextual help or a general-purpose chat interface without navigating away from the current view.
The module is implemented as a Lit element, leveraging reactive properties to manage the chat state and visibility.
Slide-out Transition

The panel is positioned using position: fixed with a negative right offset. This design choice allows the panel to exist in the DOM but remain hidden off-screen until activated. By toggling the open attribute, the CSS transitions the right property to 0, providing a smooth visual entry. This approach is preferred over display: none because it allows for CSS-driven animations and ensures the element's internal state remains preserved while hidden.
State Management and UI Feedback

The element manages three primary pieces of state:
- messages: An array of chat objects. This acts as the single source of truth for the conversation history.
- isLoading: A boolean that controls the visibility of a <spinner-sk>. This provides immediate visual feedback to the user during network latency.
- input: A string tracked via the live() directive. Using live() ensures that the input field remains synchronized with the internal state even if the DOM is updated externally or during rapid typing.

API Interaction

The component communicates with a backend via a POST request to /_/chat. It sends the user's query as a JSON body and expects a JSON response containing the assistant's reply. The implementation includes robust error handling that captures both HTTP error codes (e.g., 500) and network-level failures, surfacing these errors directly in the chat history to keep the user informed.
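The /_/chat round trip can be sketched as follows. The ChatMessage shape and the response field name are assumptions modeled on the description; errors are appended to the history rather than thrown, matching the documented behavior.

```typescript
// Chat history entry; roles mirror the user/model split described
// for the history log.
interface ChatMessage {
  role: 'user' | 'model';
  text: string;
}

// Narrow fetch-like type so the sketch is testable without a server.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json: () => Promise<unknown> }>;

async function sendChat(
  history: ChatMessage[],
  query: string,
  fetchFn: FetchLike
): Promise<ChatMessage[]> {
  const next: ChatMessage[] = [...history, { role: 'user', text: query }];
  try {
    const resp = await fetchFn('/_/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query }),
    });
    if (!resp.ok) {
      // HTTP-level failure surfaces in the chat history.
      return [...next, { role: 'model', text: `Error: HTTP ${resp.status}` }];
    }
    const json = (await resp.json()) as { response: string };
    return [...next, { role: 'model', text: json.response }];
  } catch (e) {
    // Network-level failure is also kept in-history.
    return [...next, { role: 'model', text: `Error: ${e}` }];
  }
}
```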
GeminiSidePanelSk (gemini-side-panel-sk.ts)

This is the core logic and UI controller. It encapsulates the styling, the chat history log, and the input footer. It exposes a public toggle() method and an open property/attribute, allowing parent components or global scripts to programmatically control its visibility.
Chat History Log

The history is rendered as a list of message bubbles. The implementation distinguishes between user and model roles using CSS classes to align messages to the right or left, respectively. It uses aria-live="polite" on the history container to ensure that screen readers announce new incoming messages from the AI assistant.
Input Handling

The footer contains a text input and a send button. To optimize user experience, the component listens for the Enter key on the input field, allowing for a standard messaging flow. The input is automatically cleared and focused upon a successful message submission.
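The send decision can be sketched with a small helper. shouldSend is hypothetical; it combines the documented rules (send on Enter, never send an empty message).

```typescript
// Returns true when a keypress in the input field should submit
// the message: Enter with non-empty, non-whitespace content.
function shouldSend(key: string, input: string): boolean {
  return key === 'Enter' && input.trim().length > 0;
}
```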
The following diagram illustrates the data flow when a user sends a message:
[ User Input ]
      |
      v
[ Update 'messages' (User) ]
      |
      v
[ Set 'isLoading' = true ]
      |
      v
[ POST /_/chat ]
      |
      v
[ Receive JSON Response ]
      |
      v
[ Update 'messages' (Model) ]
      |
      v
[ Clear Input ]
      |
      v
[ Set 'isLoading' = false ]
The module includes two layers of testing:
- Unit tests (gemini-side-panel-sk_test.ts): These tests use fetch-mock to simulate backend responses. They verify the internal logic, such as ensuring empty messages aren't sent, verifying that the input clears after sending, and checking that error messages are correctly appended to the history.
- Puppeteer tests (gemini-side-panel-sk_puppeteer_test.ts): These tests focus on the visual and behavioral aspects, such as confirming the CSS transitions move the panel the correct number of pixels and verifying that the Shadow DOM elements (input, icons) are accessible and interactive.

The graph-title-sk module provides a specialized header component designed for performance graphs. It dynamically translates a set of metadata (key-value pairs) into a structured, readable title, handling the complexity of displaying many parameters without cluttering the UI.
The primary purpose of this component is to provide context for a graph. Since performance data often involves many dimensions (e.g., bot name, benchmark, test, subtest, configuration), a simple string is insufficient. The component is designed to:
This is the core custom element, implemented using Lit.
- Data is pushed via the set(titleEntries: Map<string, string> | null, numTraces: number) method. This approach allows the parent component to push data updates efficiently.
- It sets the title attribute on values, allowing users to hover over truncated text to see the full value.
- It keeps an internal flag (showShortTitle) to toggle between a collapsed view (limited by MAX_PARAMS, currently 8) and a full view.
- If numTraces > 0 but the titleEntries map is empty, it renders a generic <h1> header indicating the number of traces.

The styling uses a flexbox-based grid system.
- The #container uses flex-wrap: wrap, ensuring that if the title is too long for the horizontal space, it flows naturally into subsequent rows.
- Each column renders the parameter name (.param) styled smaller and lighter above the bolded value (.hover-to-show-text).

The workflow for updating the title typically involves a parent graph-container or dashboard page:
[ Parent Component ]
|
| 1. Gathers metadata (e.g., from a trace ID or API)
| 2. Calls .set(map, count)
V
[ graph-title-sk ]
|
| 3. Checks numTraces (if 0, hide container)
| 4. Filters empty entries
| 5. Truncates list if > MAX_PARAMS
V
[ Rendered HTML ]
When the metadata exceeds the limit, the component provides an interactive expansion:
[ User Clicks "Show Full Title" ]
|
V
[ showFullTitle() ] sets showShortTitle = false
|
V
[ render() ] re-runs getTitleHtml() without the MAX_PARAMS limit
|
V
[ UI Updates ] All columns are revealed; button disappears
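The filtering and truncation steps above can be sketched as a pure function. MAX_PARAMS mirrors the documented limit of 8; the helper name is illustrative, not the component's actual implementation.

```typescript
// Documented collapsed-view limit for graph-title-sk.
const MAX_PARAMS = 8;

// Returns the title entries that should be rendered: empty values
// are dropped, and the short view truncates to MAX_PARAMS columns.
function visibleEntries(
  titleEntries: Map<string, string>,
  showShortTitle: boolean
): [string, string][] {
  const entries = [...titleEntries].filter(([, v]) => v !== '');
  return showShortTitle ? entries.slice(0, MAX_PARAMS) : entries;
}
```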
The module includes a Page Object (graph-title-sk_po.ts) to simplify integration and end-to-end testing. This PO abstracts the internal structure (selectors for params, values, and the “show more” button), allowing tests to verify title content without being brittle to changes in the internal DOM structure.
This module serves as the central repository for shared TypeScript type definitions and interfaces used across the Perf application. It acts as the “Source of Truth” for the data structures exchanged between the Go backend and the TypeScript frontend.
The primary goal of this module is to ensure type safety and consistency across the network boundary. Instead of manually maintaining duplicate type definitions in both Go and TypeScript, this module contains an automatically generated index.ts file. This file reflects the structures defined in the backend, providing a robust contract for API requests, responses, and internal data processing.
The module also implements Nominal Typing for primitive types to prevent logical errors (e.g., accidentally using a TimestampSeconds where a CommitNumber is expected), even though both are represented as numbers at runtime.
The module defines the fundamental entities of the Perf system:
- DataFrame, TraceSet, and Trace represent the time-series data fetched for visualization. A DataFrame contains the actual values, the headers (commits/timestamps), and the paramset describing the metadata.
- Anomaly and Regression define the shape of detected performance changes, including statistical metadata (median before/after, p-value) and triage status.
- The Alert interface defines the configuration for regression detection, including the query to monitor and the algorithm parameters used.
- FrameRequest and FrameResponse encapsulate the complex parameters needed to query the performance database and the resulting data structure used to render plots.

To improve type safety, the module uses a “branding” pattern for common primitives. This forces developers to explicitly cast or use constructor functions when assigning values to these types, ensuring that the developer has consciously verified the data source.
Value (number) -> Constructor Function -> Branded Type (CommitNumber)
                                               |
                                               +--> Logic error if passed to a TimestampSeconds function
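A minimal sketch of the branding pattern, assuming the documented type names (the exact brand-field spelling is an illustration):

```typescript
// The brand field exists only at the type level, so branded values
// are still plain numbers at runtime.
type CommitNumber = number & { __commitNumber: never };
type TimestampSeconds = number & { __timestampSeconds: never };

// Constructor functions are the sanctioned way to brand a value.
function commitNumber(n: number): CommitNumber {
  return n as CommitNumber;
}
function timestampSeconds(n: number): TimestampSeconds {
  return n as TimestampSeconds;
}

// A function that demands TimestampSeconds will reject a
// CommitNumber at compile time, even though both are numbers.
function commitAge(now: TimestampSeconds, commitTs: TimestampSeconds): number {
  return now - commitTs;
}
// commitAge(commitNumber(5), timestampSeconds(0)) // does not type-check
```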
Key branded types include:
- CommitNumber: Represents an offset in the commit history.
- TimestampSeconds: Represents a Unix timestamp.
- Params and ParamSet: Specific dictionary shapes for metadata.

Certain domains are grouped into namespaces to reflect their specific context within the application:
- pivot: Definitions related to the “Pivot Table” functionality, including operations like sum, avg, and count.
- progress: Interfaces for long-running backend tasks that provide status updates (e.g., Running, Finished).
- ingest: Data formats for the file ingestion pipeline, defining how measurement results are structured before being stored.

The index.ts file is marked with DO NOT EDIT. This choice ensures that the frontend types are never out of sync with the backend. Changes to the data contract must be initiated in the Go code and propagated here via the generation tool (e.g., go2ts).
Interfaces are used for complex objects (like Alert or Anomaly) to allow for potential extension and to provide clearer error messages in IDEs. Type aliases are reserved for unions (like Status or ClusterAlgo) and the aforementioned branded nominal types.
The interfaces strictly define which fields are optional (?) and which can be null. This forces frontend components to handle missing data explicitly, reducing runtime TypeError exceptions when processing API responses.
The json-source-sk module provides a specialized UI component for the Perf application that allows developers and analysts to inspect the raw JSON data associated with a specific data point in a performance trace. It acts as a bridge between high-level trace visualizations and the underlying ingested source files.
The primary responsibility of this module is to fetch and display the original JSON metadata and results for a given trace at a specific commit. Because performance traces can be backed by large amounts of data, the module provides options to view either the full ingested file or a “short” version (typically excluding voluminous results) to improve load times and readability.
The component remains hidden by default and only reveals its controls when a valid traceid and cid (Commit ID) are provided, ensuring it only occupies screen space when actionable data is available.
json-source-sk.ts

The core custom element. It manages the state of the retrieved JSON, the visibility of the modal dialog, and the communication with the backend.
- The component tracks _cid and _traceid. When either property is updated via setters, the internal JSON cache is cleared, and the component re-renders. This ensures that the user never sees stale data from a previous trace point.
- The _loadSourceImpl method encapsulates the logic for interacting with the /_/details/ endpoint. It uses a POST request containing the commit and trace identifiers. It also handles the results=false query parameter when the “Short” view is requested.
- It integrates spinner-sk to indicate background loading activity and uses the errorMessage utility to bubble up fetch failures to the application's global error reporting system.

The component uses a <dialog> element for displaying the JSON content. This choice allows the JSON to be viewed in an overlay, preserving the user's context in the main performance graph or table.
- The “Short” view passes the results=false flag, useful for inspecting metadata without the overhead of every individual measurement.
- The dialog contains a <pre> block for formatted JSON display and a sticky close button for easy navigation.

The following diagram illustrates the lifecycle of a data request within the component:
User Interaction          JSONSourceSk Component        Backend Server
      |                            |                          |
      |-- Click "View Json" ----->|                          |
      |                            |-- Show Spinner           |
      |                            |                          |
      |                            |-- POST /_/details/ ----->|
      |                            |   {cid, traceid}         |
      |                            |                          |
      |                            |<--------- JSON Response -|
      |                            |                          |
      |                            |-- Hide Spinner           |
      |                            |-- Format JSON string     |
      |<--- Open Modal Dialog ----|                          |
      |     with <pre> content    |                          |
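The request in the diagram can be sketched as follows. The body fields and the results=false toggle follow the text; the function names are illustrative, not the component's actual private API.

```typescript
// Narrow fetch-like type so the sketch is testable without a server.
type DetailsFetch = (
  url: string,
  init: { method: string; body: string }
) => Promise<{ json: () => Promise<unknown> }>;

// The "Short" view omits individual measurement results.
function detailsUrl(short: boolean): string {
  return short ? '/_/details/?results=false' : '/_/details/';
}

async function loadSource(
  cid: number,
  traceid: string,
  short: boolean,
  fetchFn: DetailsFetch
): Promise<string> {
  const resp = await fetchFn(detailsUrl(short), {
    method: 'POST',
    body: JSON.stringify({ cid, traceid }),
  });
  const json = await resp.json();
  // Re-serialize with indentation so minified wire JSON reads well.
  return JSON.stringify(json, null, '  ');
}
```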
- Requests are only issued for identifiers that pass validKey(traceid). This prevents the component from attempting to fetch data using malformed or incomplete trace identifiers.
- Responses are re-serialized with JSON.stringify(json, null, ' ') before display. This ensures that regardless of the wire format (which is often minified), the user sees a human-readable, indented structure.
- On closeJsonDialog, the internal _json string is cleared. This is a memory management choice to avoid keeping potentially large strings in the DOM when they are not actively being viewed.
- The styles constrain the spinner-sk dimensions and use a flexbox layout for controls to maintain a compact footprint within the Perf UI toolbars.

The module includes a Page Object (JsonSourceSkPO) located in json-source-sk_po.ts. This encapsulates the internal DOM structure (selectors for buttons, the dialog, and the pre-formatted text), allowing Puppeteer and Karma tests to interact with the component without being brittle to internal HTML changes. This is particularly important for testing the modal's visibility and the content of the fetch results.
The keyboard-shortcuts-help-sk module provides a standardized UI component for displaying available keyboard shortcuts to the user. It functions as a discovery mechanism, ensuring that keyboard-driven workflows are accessible and documented within the application interface itself.
The core design decision behind this module is to decouple the definition of shortcuts from their presentation. Instead of hard-coding a list of keys into a help dialog, this component acts as a consumer of the ShortcutRegistry (from perf/modules/common:keyboard-shortcuts_ts_lib).
This approach ensures that:
- Shortcuts are filtered against the active KeyboardShortcutHandler. If a shortcut is associated with a specific method that is not present on the current handler, it is hidden from the user, preventing confusion about unavailable actions.

The KeyboardShortcutsHelpSk class is a Lit-based custom element that wraps a Material Design dialog (md-dialog). Its primary responsibilities include:
- Querying the ShortcutRegistry to retrieve all registered shortcuts, grouped by category.
- Filtering against the handler property. When rendering, it iterates through registered shortcuts and checks if the handler actually implements the method associated with that shortcut. This ensures the help menu is relevant to the user's current context (e.g., different shortcuts for a graph view versus a table view).

The following diagram illustrates how the component retrieves and filters data for display:
[ ShortcutRegistry ] <------- (1) Request Shortcuts
|
v
[ KeyboardShortcutsHelpSk ] <--- (2) Check 'handler' property
|
|--- (3) For each Shortcut:
| IF (shortcut.method exists AND handler lacks method)
| THEN: Skip
| ELSE: Add to Render List
|
v
[ md-dialog Content ] <------- (4) Render Table Rows
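The filtering step in the diagram can be sketched as follows. The Shortcut shape is an assumption modeled on the description; the real registry and element APIs may differ.

```typescript
// A registered shortcut; method names the handler method it invokes,
// or is absent for global shortcuts.
interface Shortcut {
  keys: string;
  description: string;
  method?: string;
}

// Keeps global shortcuts, plus shortcuts whose method the current
// handler actually implements; everything else is hidden.
function visibleShortcuts(
  shortcuts: Shortcut[],
  handler: object | null
): Shortcut[] {
  return shortcuts.filter((s) => {
    if (!s.method) return true;
    return (
      handler !== null &&
      typeof (handler as Record<string, unknown>)[s.method] === 'function'
    );
  });
}
```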
- keyboard-shortcuts-help-sk.ts: Contains the logic for the Lit element, including the filtering logic and the open()/close() API for controlling the dialog programmatically.
- keyboard-shortcuts-help-sk.scss: Defines the layout for the shortcut table, ensuring consistent spacing and visual cues for keys and categories using the application's theme variables.
- keyboard-shortcuts-help-sk_test.ts: Validates that the component correctly pulls data from the ShortcutRegistry and renders the expected HTML structure.

The new-bug-dialog-sk module provides a specialized modal dialog for the Perf triage workflow. It allows users to file Buganizer issues for one or more detected anomalies (performance regressions or improvements) directly from the Perf UI.
When a sheriff or developer identifies an untriaged anomaly in a performance chart, they need a streamlined way to report it. This module automates the boilerplate of bug creation by pre-populating fields based on the selected anomaly's metadata, such as the test path, the magnitude of the change, and the affected revision range.
The dialog is designed to minimize manual data entry. It implements logic to parse Anomaly objects and automatically generate:
The dialog supports filing a single bug for a collection of anomalies. This is common when a single underlying commit causes regressions across multiple related metrics. The implementation handles this by:
- The dialog implements drag handlers (onMousedown, onMouseMove, onMouseUp) to allow users to move the dialog. This is helpful if the user needs to see the underlying chart data while filling out the bug report.
- A secondary <dialog> (#loading-popup) is used to provide visual feedback during the asynchronous fetch request to the backend.
- It calls LoggedIn() to identify the current user and automatically adds them to the CC list.

This is the primary logic hub. It manages the internal state of the form and interacts with the Perf backend.
- Methods like getBugTitle(), getPercentChangeForAnomaly(), and getSuiteNameForAlert() contain the business logic for translating raw anomaly data into human-readable bug reports, mimicking legacy Chromeperf behavior.
- The fileNewBug() method gathers data from the form (including dynamically generated checkboxes and radios), sends a POST request to /_/triage/file_bug, and processes the response.
- On success, it dispatches an anomaly-changed event. This event notifies other components (like charts or lists) that the anomaly's bug_id has been updated and they should re-render to reflect the triaged status.

[ User Clicks 'File Bug' ]
|
v
[ open() called: Fetch login status, show modal ]
|
v
[ UI populates Title, Labels, Components from Anomalies ]
|
[ User adjusts form & clicks 'Submit' ]
|
v
[ fileNewBug() ]----------------------> [ Server: /_/triage/file_bug ]
| |
[ Show Loading Popup ] [ Create Buganizer Issue ]
| |
[ Receive Bug ID ] <------------------------------'
|
v
[ 1. Update local Anomaly objects with Bug ID ]
[ 2. Dispatch 'anomaly-changed' event ]
[ 3. Open https://issues.chromium.org/issues/{ID} ]
[ 4. Close Dialogs ]
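The percent-change figure used when pre-populating the bug title can be sketched as below. The field names and the formula are assumptions based on the median before/after metadata described for Anomaly, not the module's exact implementation.

```typescript
// Minimal shape carrying the before/after medians of an anomaly.
interface AnomalyMedians {
  medianBefore: number;
  medianAfter: number;
}

// Relative change of the anomaly, as a percentage of the
// pre-anomaly median. Positive values indicate an increase.
function percentChange(a: AnomalyMedians): number {
  return ((a.medianAfter - a.medianBefore) / a.medianBefore) * 100;
}
```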
The styling ensures the dialog fits the Perf theme. It uses a flexible layout for the textarea and ensures the closeIcon is pinned to the top-right for easy dismissal.
Provides a Page Object for automated testing. This encapsulates the selectors for the title, description, assignee, and CC inputs, allowing Puppeteer tests to interact with the dialog without being brittle to internal DOM changes.
The paramtools module provides a suite of utility functions for manipulating and transforming “Structured Keys,” Params, and ParamSets. It acts as a client-side mirror of the Go implementation found in /infra/go/paramtools, enabling the frontend to handle Perf trace identifiers and query parameters consistently with the backend.
The module is designed around the concept of a Structured Key: a string representation of key-value pairs used to uniquely identify data traces (e.g., ,arch=x86,config=8888,os=linux,).
The implementation prioritizes:
- Consistency between internal structures (Params) and external formats like URL query strings.

Two core types are used throughout:

- Params: A flat key-value map (e.g., { "os": "linux" }). Represents a specific point or trace.
- ParamSet: A map of keys to lists of values (e.g., { "os": ["linux", "windows"] }). Represents a collection of possible values for various keys.

The module provides logic to move between string identifiers and structured objects:
- makeKey: Converts a Params object into a canonical structured key. It sorts the keys alphabetically and wraps the result in leading and trailing commas to ensure unambiguous matching.
- fromKey: Parses a structured key back into a Params object. It includes logic to strip away “Special Functions” (like norm()) that might wrap a key during calculation phases.
- validKey: A simple validator that checks for the standard ,key=value, format, primarily used to distinguish between raw trace IDs and calculated traces.

Functions in this category handle the merging and expansion of parameter collections:
- addParamsToParamSet: Merges a single Params instance into an existing ParamSet. This is useful when building a global index of available dimensions from a list of specific traces.
- addParamSet: Merges two ParamSet objects, ensuring that values remain unique within each key.
- paramsToParamSet: A convenience function to lift a single Params object into the ParamSet type.
- queryFromKey: Converts a structured key directly into a URL-encoded query string (e.g., a=1&b=2). This is essential for synchronization between the application state (trace keys) and the browser's URL for deep-linking.

This workflow illustrates how a trace identifier from the backend is prepared for use in a frontend search query.
Structured Key: ",arch=arm,os=android,"
|
v
[ fromKey() ] --> Params: { arch: "arm", os: "android" }
|
v
[ queryFromKey() ] -> String: "arch=arm&os=android"
This workflow shows how individual trace IDs are aggregated to populate a user interface with all available filtering options.
Trace A: ",config=565," Trace B: ",config=888,"
| |
+----------+----------+
|
v
[ addParamsToParamSet() ]
|
v
ParamSet: { config: ["565", "888"] }
|
v
(Used to render dropdown menus)
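The two workflows above can be sketched together. The canonical ,key=value, form and the sorting rule come from the text; this is a simplified sketch that omits the documented “Special Function” stripping in fromKey.

```typescript
type Params = { [key: string]: string };
type ParamSet = { [key: string]: string[] };

// Canonical structured key: keys sorted alphabetically, result
// wrapped in leading and trailing commas.
function makeKey(params: Params): string {
  const keys = Object.keys(params).sort();
  return ',' + keys.map((k) => `${k}=${params[k]}`).join(',') + ',';
}

// Inverse of makeKey for plain (non-calculated) keys.
function fromKey(key: string): Params {
  const params: Params = {};
  key.split(',').forEach((pair) => {
    if (!pair) return; // skip the empty leading/trailing segments
    const [k, v] = pair.split('=');
    params[k] = v;
  });
  return params;
}

// Merge one trace's Params into a growing ParamSet index,
// keeping values unique per key.
function addParamsToParamSet(ps: ParamSet, p: Params): void {
  Object.entries(p).forEach(([k, v]) => {
    if (!ps[k]) ps[k] = [];
    if (!ps[k].includes(v)) ps[k].push(v);
  });
}
```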
The perf-scaffold-sk module provides the foundational layout and shell for all Skia Performance Monitoring (Perf) web pages. It serves as a master template, providing consistent navigation, branding, error handling, and a unified look-and-feel across the application.
This module defines the PerfScaffoldSk custom element, which acts as a wrapper for every page in the Perf application. Its primary responsibilities include:
- It embeds core application elements: alogin-sk (authentication), theme-chooser-sk (dark/light mode), and error-toast-sk (global error notifications).
- It reads the window.perf object to customize the UI per instance.

The scaffold currently supports two distinct UI layouts: Legacy UI and V2 UI. The implementation allows for a phased transition between styles, controlled by both global configuration and user preference.
The choice of layout is determined at render time based on the following hierarchy:
1. A user preference persisted in localStorage under the key v2_ui.
2. The window.perf.enable_v2_ui boolean provided by the server.

Users can manually toggle between these versions via a “Try V2 UI” button in the Legacy sidebar or a “Back to Legacy UI” button in the V2 header. This toggle action updates localStorage and triggers a page reload to re-initialize the scaffold.
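The selection hierarchy can be sketched as a pure function: the localStorage override wins when present, otherwise the server-provided flag applies. The helper name is illustrative.

```typescript
// storedValue is localStorage.getItem('v2_ui') (null when unset);
// serverDefault is window.perf.enable_v2_ui.
function useV2Ui(storedValue: string | null, serverDefault: boolean): boolean {
  if (storedValue !== null) return storedValue === 'true';
  return serverDefault;
}
```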
+-----------------------------------------------------------+
| perf-scaffold-sk                                          |
| +-------------------------------------------------------+ |
| | app-sk (Legacy or V2)                                 | |
| | +---------------------------------------------------+ | |
| | | header (Top Bar)                                  | | |
| | |  - Logo & Title                                   | | |
| | |  - Auth & Theme Chooser                           | | |
| | +---------------------------------------------------+ | |
| | | aside (Sidebar - Legacy) OR nav (Header - V2)     | | |
| | |  - Links (Explore, Alerts, Triage, etc.)          | | |
| | +---------------------------------------------------+ | |
| | | main (perf-content)                               | | |
| | |  - User-provided child content injected here      | | |
| | +---------------------------------------------------+ | |
| | | gemini-side-panel-sk (V2 Only)                    | | |
| | +---------------------------------------------------+ | |
| | | footer                                            | | |
| | |  - error-toast-sk                                 | | |
| | |  - Build/Version tags                             | | |
| | +---------------------------------------------------+ | |
| +-------------------------------------------------------+ |
+-----------------------------------------------------------+
perf-scaffold-sk.ts

The core logic resides in PerfScaffoldSk. It manages the lifecycle of the application shell. A key feature is the content redistribution process:
- Child content passed into the scaffold is moved into the #perf-content container within the <main> tag.
- It looks for elements marked with sidebar_help and moves them into a specialized help area (a sidebar section in Legacy, or a dropdown menu in V2).

perf-scaffold-sk.scss

The styles utilize CSS Grid and Flexbox to create responsive layouts.
- The Legacy layout uses a fixed sidebar (aside#sidebar).

The scaffold displays the current application version in the footer. It intelligently formats the version string:
- It retrieves the build tag via getBuildTag() from the window module.

window.perf

The scaffold is highly data-driven, relying on the window.perf object for:
- header_image_url: Custom instance logos (with a fallback to an “alpine” logo).
- instance_name / instance_url: Displaying the instance identity.
- chat_url / feedback_url: Linking to support channels.
- show_triage_link: Conditionally hiding the Triage navigation item.

Key files:

- perf-scaffold-sk.ts: The main TypeScript definition for the Lit-based custom element, containing the template logic for both UI versions.
- perf-scaffold-sk.scss: Theme-aware styles that define the grid layouts for both Legacy and V2 shells.
- perf-scaffold-sk-demo.ts & perf-scaffold-sk-v2-demo.ts: Demo entry points that mock the window.perf configuration to showcase the scaffold's capabilities in various states.
- perf-scaffold-sk_puppeteer_test.ts: Integration tests ensuring that layout transitions, version rendering, and content redistribution work as expected.

The picker-field-sk module provides a specialized multi-selection component designed for choosing values from a pre-defined list. It wraps a Vaadin multi-select combo box with additional logic for bulk selection and data organization, specifically tailored for complex filtering workflows (such as performance test pickers).
The primary goal of picker-field-sk is to simplify the management of large sets of options while providing visual cues and high-level controls for common selection patterns.
Rather than being a generic text field, it addresses specific needs of hierarchical or categorized data:
- Behavior that varies with the field's position in a hierarchy (its index property).

The underlying selection mechanism is handled by @vaadin/multi-select-combo-box. This provides the chip-based UI for selected items and the searchable dropdown. The module styles this component to integrate with the local theme, including specific “dark mode” transitions.
The component features a set of checkbox-sk elements located in a “split-by-container” above the main field. These controls appear based on the following logic:
- Bulk selection is offered only for non-primary fields (index > 0). It allows selecting the entire list or resetting to the first item.
- Toggling “Split” dispatches a split-by-changed event. This is used by parent components to decide if a visualization should be broken down by the attribute represented by this field.
- options: Setting this property automatically triggers the filtering of primaryOptions and recalculates the dropdown width.
- selectedItems: Controls which chips are currently visible.
- index: Determines whether the bulk action checkboxes should be visible.

The following diagram illustrates how user interaction flows through the component to notify the rest of the application:
User Interaction          picker-field-sk           External App
+----------------+        +-------------------+     +-------------------+
| Click "All"    |------->| Update _selected  |     |                   |
| Checkbox       |        | Items & Render    |     |                   |
+----------------+        +---------+---------+     +-------------------+
                                    |
                                    v
+----------------+        +-------------------+     +-------------------+
| Select Item in |------->| onValueChanged()  |---->| Listen for        |
| Dropdown       |        |                   |     | 'value-changed'   |
+----------------+        +---------+---------+     +-------------------+
                                    |
                                    v
+----------------+        +-------------------+     +-------------------+
| Toggle "Split" |------->| splitOnValue()    |---->| Listen for        |
| Checkbox       |        |                   |     | 'split-by-changed'|
+----------------+        +-------------------+     +-------------------+
The component uses a vertical flex layout where the label and selection checkboxes sit atop the combo box.
- Field width is calculated in ch (character) units to ensure the dropdown menu scales with the content length.
- Inputs use direction: rtl to handle long strings gracefully within the limited horizontal space of the input field.
- Styles are based on //perf/modules/themes, utilizing CSS variables for background colors, focus states, and transitions to ensure a consistent look across different UI modes.

The module includes a Page Object (PickerFieldSkPO) located in picker-field-sk_po.ts. This encapsulates the complexity of interacting with the Shadow DOM of both the picker-field-sk and the underlying Vaadin components. It provides high-level methods for:
The pinpoint-try-job-dialog-sk module provides a modal dialog designed to trigger Pinpoint A/B “Try jobs.” In the context of the Perf application, its primary purpose is to allow developers to request additional performance traces for specific benchmark runs to debug regressions or verify improvements.
This module is specifically tailored for the “Debug Traces” use case. It acts as a bridge between the Perf UI and the Pinpoint performance analysis system. Rather than being a general-purpose Pinpoint job creator, it focuses on taking existing performance data contexts—such as a specific trace found in a chart—and prepopulating a request to gather more detailed diagnostic information (e.g., Chrome trace categories).
Key design decisions include:
- The dialog exposes setTryJobInputParams, which extracts necessary metadata (bot, benchmark, story) from a “test path” string commonly used in Perf.
- It defaults to common trace categories (toplevel, toplevel.flow, etc.) but allows users to override them to gather specific category data.

pinpoint-try-job-dialog-sk.ts

The main class extending ElementSk. It manages the internal state of the dialog, including the commit hashes, story names, and the resulting Pinpoint job URL.
- It integrates with alogin-sk to identify the current user. This is crucial as Pinpoint requires a user email to associate with the created job.
- State is seeded via baseCommit, endCommit, and testPath. The testPath is specifically parsed during the submission process to identify the configuration (bot) and benchmark.
- The postTryJob method handles the transformation of UI fields into a TryJobCreateRequest. It maps the user's input into the specific JSON structure expected by the /_/try/ endpoint.

The dialog is rendered using lit-html and styled to match the Perf theme. It utilizes standard HTMLDialogElement functionality for modal behavior and includes a spinner-sk to provide visual feedback during the asynchronous submission process.
The typical lifecycle of the dialog involves an external component passing in performance context before the user interacts with the form.
External Component Dialog Component Pinpoint API
| | |
|-- setTryJobInputParams ->| |
| (commits, testPath) | |
| | |
|------- open() -------->| |
| | (User modifies args) |
| | |
| |------- POST /_/try/ ------>|
| | |
| |<------ { jobUrl } ---------|
| | |
| |-- Updates UI with Link ----|
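The submission step maps the human-readable test path onto Pinpoint request fields; a minimal sketch, with hypothetical helper names, based on the documented master/bot/benchmark/... layout:

```typescript
// Fields derived from a Perf test path for a TryJobCreateRequest.
interface TryJobFields {
  configuration: string;
  benchmark: string;
  story: string;
}

// Splits e.g. "master/linux-perf/blink_perf.ext/test_case":
// segment 1 is the bot configuration, segment 2 the benchmark,
// and the final segment the story.
function parseTestPath(testPath: string): TryJobFields {
  const parts = testPath.split('/');
  return {
    configuration: parts[1],
    benchmark: parts[2],
    story: parts[parts.length - 1],
  };
}

// Job name format described in the text.
function tryJobName(f: TryJobFields): string {
  return `Tracing Debug on ${f.configuration}/${f.benchmark}/${f.story}`;
}
```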
When a user submits the form, the module performs a specific mapping from the human-readable “test path” to the Pinpoint API fields:
- The test path master/linux-perf/blink_perf.ext/test_case is split on “/”.
- The second segment becomes the configuration (linux-perf).
- The third segment becomes the benchmark (blink_perf.ext).
- The traceArgs input is wrapped into a JSON string under --extra-chrome-categories and passed within extra_test_args.
- The job name is generated as Tracing Debug on <config>/<benchmark>/<story>.

Key files:

- pinpoint-try-job-dialog-sk.ts: Contains the logic for form validation, API interaction, and the Lit template.
- pinpoint-try-job-dialog-sk.scss: Defines the layout, specifically ensuring the dialog handles long input strings (like commit hashes and trace arguments) gracefully.
- pinpoint-try-job-dialog-sk_test.ts: Validates that the form correctly parses the input parameters and that the fetch request sent to the backend contains the expected payload structure.

The pivot-query-sk module provides a specialized UI component for configuring data transformation requests, specifically for pivoting and aggregating performance trace data. It allows users to define how data should be grouped, what primary mathematical operation to perform on those groups, and which additional summary statistics should be calculated.
The primary purpose of this component is to build and edit a pivot.Request object. This object is used by the Perf backend to reshape time-series data into a tabular or grouped format. The component provides a high-level interface for three specific pivoting dimensions:
- **Group by**: which keys (from the `ParamSet`) should be used to cluster traces together.
- **Operation**: the primary mathematical operation to perform on each group.
- **Summary**: the additional summary statistics to calculate for each group.

The component requires a `ParamSet` to populate the "Group By" options. A key design choice in `pivot-query-sk.ts` is the handling of the intersection between the current `pivot.Request` and the provided `ParamSet`.
The allGroupByOptions() method merges keys from the current request with those in the ParamSet. This ensures that if a user loads a saved pivot request containing keys that are not present in the current data's ParamSet, the selection is preserved rather than silently dropped. This “additive” approach prevents data loss when switching between different data contexts.
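The additive merge described above can be sketched as follows; the simplified `ParamSet` shape and the exact return convention are assumptions for illustration, not the actual `pivot-query-sk` implementation:

```typescript
// A simplified ParamSet: each key maps to its possible values.
type ParamSet = { [key: string]: string[] };

interface PivotRequest {
  group_by: string[];
}

// Union of the keys present in the current data's ParamSet and any keys
// already selected in a (possibly older) pivot request, so that loaded
// selections are preserved rather than silently dropped.
function allGroupByOptions(req: PivotRequest, paramset: ParamSet): string[] {
  const keys = new Set<string>(Object.keys(paramset));
  req.group_by.forEach((k) => keys.add(k));
  return Array.from(keys).sort();
}
```

The key point is that the merge is a union, not an intersection: a saved request's `legacy_key` survives even when the current data no longer contains it.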
Because the component uses ARIA attributes (aria-labelledby) to maintain accessibility, it implements a uniqueId system. Each instance of pivot-query-sk on a page increments a static counter. This ensures that internal element IDs (like group_by-0, group_by-1) remain unique, preventing label collisions when multiple query builders are present in the same view.
The component acts as a controlled input. It exposes a pivotRequest getter that utilizes validatePivotRequest from ../pivotutil. If the current internal state is invalid, the getter returns null. This forces consuming components to handle invalid states gracefully before attempting to dispatch a backend request.
The main class (pivot-query-sk.ts) manages the state and rendering logic. It uses lit-html for templating and leverages existing elements like multi-select-sk and select-sk to handle the heavy lifting of UI interactions.
- State is held in `_pivotRequest` and `_paramset`. Any change to these properties via setters triggers a re-render.
- User edits dispatch a `pivot-changed` event containing the updated (and potentially `null`, if invalid) `pivot.Request`.

The following diagram illustrates how data flows from user interaction to a valid request:
User Action (Click/Select)
|
v
Internal Event Handler (e.g., groupByChanged)
|
+--> Updates internal _pivotRequest
|
+--> Validation Check (via pivotutil)
|
v
Dispatches "pivot-changed" Event
|
+--> Parent component receives pivot.Request OR null
- `pivot-query-sk_po.ts`: provides a Page Object (PO) for testing. It abstracts the complexity of interacting with multiple `multi-select-sk` elements, allowing tests to select options by text content rather than implementation details.
- `pivot-query-sk.scss`: handles the layout, ensuring that the selection lists are presented in a flexible, readable grid with scrollable areas for large `ParamSet` keys.

`pivot-table-sk` is a custom element designed to display aggregated Performance (Perf) data in a tabular format. While traditional DataFrames in the Perf system are often visualized as time-series plots, this module handles cases where data has been "pivoted" and summarized into discrete values (e.g., averages, sums, or standard deviations).
The element transforms complex trace data—where keys are comma-separated parameter strings—into a human-readable table. It allows users to explore multi-dimensional data by grouping by specific parameters and viewing various statistical summaries side-by-side.
The core challenge the module solves is translating a DataFrame (optimized for storage and plotting) into a grid.
- **Trace key parsing**: trace IDs are comma-separated parameter strings (e.g., `,arch=x86,config=8888,`). The module uses `keyValuesFromTraceSet` to parse these strings and extract only the values corresponding to the `group_by` parameters requested by the user.
- **Column ordering**: columns are derived from the `pivot.Request`. The "Key" columns (parameters) appear first, followed by the "Summary" columns (statistical operations).
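A minimal sketch of the trace-key parsing step, assuming the structured-key format shown above; the helper name here is a simplified stand-in for `keyValuesFromTraceSet`:

```typescript
// Parse a Perf structured trace key of the form ",key1=value1,key2=value2,"
// and return only the values for the requested group_by keys, in order.
function keyValuesForGroupBy(traceID: string, groupBy: string[]): string[] {
  const params = new Map<string, string>();
  traceID
    .split(",")
    .filter((pair) => pair.includes("="))
    .forEach((pair) => {
      const [k, v] = pair.split("=");
      params.set(k, v);
    });
  // Only the values for the requested group_by keys become "Key" columns.
  return groupBy.map((k) => params.get(k) ?? "");
}
```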
- **Encodable state**: the entire sort history can be encoded as a compact string (e.g., `dk2-us1`). This allows the sorting state to be reflected in URL parameters, making table views shareable and bookmarkable.

Because the component relies on a specific relationship between the DataFrame and the `pivot.Request`, it includes a validation layer. It uses `validateAsPivotTable` to ensure the incoming request is compatible with a tabular display (e.g., checking that the necessary grouping and summary fields are present) before attempting to render.
The main custom element. It manages the lifecycle of the data, reacting to changes in the df (DataFrame) and req (Request) properties. It uses lit for efficient rendering and manages internal state for the sortHistory and the resulting compare function used by the JavaScript native Array.sort().
These classes encapsulate the multi-column sorting logic.
- `SortSelection` handles the metadata for a single column: its index, its type (`keyValues` vs `summaryValues`), and its direction.
- `SortHistory` manages an array of selections. It provides the `buildCompare` method, which generates a complex comparison function that iterates through the history stack until a non-zero comparison result is found.

The Page Object (`PivotTableSkPO`), located in `pivot-table-sk_po.ts`, provides an abstraction for testing. It allows internal tests (Puppeteer) to interact with the table (clicking headers, reading cell values) without being coupled to the specific DOM structure or CSS classes.
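The chained comparison that `buildCompare` produces can be sketched like this; the types and field names are simplified assumptions, not the real classes:

```typescript
// A table row is a flat list of key/summary cells.
type Row = (string | number)[];

interface SortSelection {
  column: number;
  dir: "up" | "down";
}

// Chain the selections: the most recent sort is first, and earlier sorts
// break ties, mimicking spreadsheet behavior.
function buildCompare(history: SortSelection[]): (a: Row, b: Row) => number {
  return (a: Row, b: Row): number => {
    for (const sel of history) {
      const x = a[sel.column];
      const y = b[sel.column];
      let cmp: number;
      if (typeof x === "number" && typeof y === "number") {
        cmp = x - y;
      } else {
        cmp = String(x).localeCompare(String(y));
      }
      if (sel.dir === "down") cmp = -cmp;
      if (cmp !== 0) return cmp; // first non-zero result wins
    }
    return 0; // fully tied across the whole history
  };
}
```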
The following diagram illustrates how data flows into the component and results in a sorted display:
[ DataFrame ] + [ pivot.Request ]
       |               |
       v               v
+-----------------------------+
| willUpdate()                |
|  1. Extract KeyValues       | <--- Maps Trace IDs to grouped params
|  2. Init/Update SortHistory | <--- Restores state from encoded string
|  3. Build Compare Function  |
+-----------------------------+
       |
       v
+-----------------------------+
| render()                    |
|  1. Validate Request        |
|  2. Sort Keys via CompareFn |
|  3. Generate <table> rows   |
+-----------------------------+
       |
       +--> [ User Clicks Header ] --+
       ^                             |
       |                             v
[ Re-run willUpdate() ] <-- [ Emit 'change' event ]
- The `change` event's `detail` contains the serialized `SortHistory` string, allowing parent components to sync the UI state with the application URL.

The `pivotutil` module provides a set of client-side utilities designed to facilitate the configuration and validation of pivot operations within the Perf system. Its primary role is to bridge the gap between raw pivot request data structures defined in the backend and the user interface, ensuring that pivot configurations are both semantically valid and human-readable before being processed.
Pivot operations in Perf allow users to transform multi-dimensional trace data into aggregated summaries or reorganized table views. Because a pivot request involves several dependent parameters—such as grouping keys, summary operations, and aggregation methods—it is prone to configuration errors that could lead to empty results or server-side failures.
This module centralizes the logic for:
- Mapping terse operation identifiers (e.g., `geo`, `avg`) to user-friendly labels (e.g., "Geometric Mean", "Mean") for consistent display across the UI.
- Validating `pivot.Request` objects to ensure they contain the minimum necessary information to be actionable.

The module distinguishes between a "valid pivot request" and a "valid pivot table." This distinction is necessary because the Perf system supports different ways of visualizing pivoted data:
- A valid pivot request requires at least one key in the `group_by` field. Without grouping, there is no dimension along which to pivot the data.
- A valid pivot table additionally requires at least one `summary` operation. The `validateAsPivotTable` function enforces this stricter requirement, ensuring that the UI does not attempt to render an empty summary table when the user has only defined groupings.

The `operationDescriptions` map serves as the single source of truth for how pivot operations are presented to the user. By centralizing these strings in `pivotutil`, the system ensures that different UI components (such as dropdowns, table headers, or chart legends) remain consistent in their terminology.
`index.ts` is the core of the module. It exports the validation functions and the description mappings used by UI components to interpret `pivot.Request` objects.
- `operationDescriptions`: a lookup table mapping `pivot.Operation` types to their display names. It covers standard statistical aggregations like sum, mean (arithmetic and geometric), standard deviation, count, and extrema.
- `validatePivotRequest`: checks for the existence of the request and ensures the `group_by` array is populated.
- `validateAsPivotTable`: extends the basic validation by verifying that the `summary` field is also populated, which is a prerequisite for generating a statistical summary table.

The typical workflow for utilizing this module involves a UI component gathering user input and validating it before dispatching a network request or updating a visualization.
[ User Input ] ----> [ pivotutil: validatePivotRequest ]
|
+----------------+----------------+
| |
[ Returns Error Msg ] [ Logic Proceeds ]
| |
[ UI displays Alert ] [ pivotutil: validateAsPivotTable ]
|
+------------------+------------------+
| |
[ Returns Error Msg ] [ Success ]
| |
[ UI hides Table View ] [ UI renders Pivot Table ]
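The two validation levels above can be sketched under simplified types; the exact `pivot.Request` fields and the return convention (empty string means valid) are assumptions for illustration:

```typescript
// A pared-down pivot.Request for illustration only.
interface PivotRequest {
  group_by: string[] | null;
  summary: string[] | null;
}

// Basic check: any pivot needs at least one grouping dimension.
function validatePivotRequest(req: PivotRequest | null): string {
  if (!req) return "Pivot request is missing.";
  if (!req.group_by || req.group_by.length === 0) {
    return "Pivot requests must have at least one group_by key.";
  }
  return "";
}

// Stricter check: a pivot *table* also needs summary operations,
// otherwise there are no statistics to display.
function validateAsPivotTable(req: PivotRequest | null): string {
  const msg = validatePivotRequest(req);
  if (msg) return msg;
  if (!req!.summary || req!.summary.length === 0) {
    return "Pivot tables must have at least one summary operation.";
  }
  return "";
}
```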
The `plot-google-chart-sk` module provides a high-performance, interactive charting component built on top of the Google Visualization API (`google-chart`). It is specifically designed to handle time-series data, anomalies, and user-defined issues within the Perf framework.
Beyond simple data visualization, this module implements specialized interaction modes—such as panning, delta-Y calculations, and dual-axis zooming—to support deep analysis of performance regressions and improvements.
Rendering thousands of data points alongside complex icons (anomalies, regressions, bug icons) directly within the Google Chart SVG can lead to significant performance degradation during interactions like panning or resizing.
- The `google-chart` element renders the lines, while anomalies and user issues are rendered as absolutely positioned HTML overlays in separate `div` containers (`.anomaly`, `.userissue`).

The module leverages `@lit/context` to synchronize state across a complex hierarchy of components without prop-drilling.
- It consumes `dataTableContext` and `dataframeAnomalyContext` to reactively update the view when the underlying performance data changes.
- Trace colors are shared via `traceColorMapContext`. This ensures that if a trace is assigned "Blue" in the chart, the same color is used in the `side-panel-sk` (legend) and any associated tooltips.

The module distinguishes between three primary mouse navigation modes to avoid UI clutter:
- **Panning**: the default drag behavior, which moves the visible range.
- **Delta-Y measurement**: uses `v-resizable-box-sk` to measure the vertical distance (raw and percentage) between two points on the Y-axis.
- **Drag-to-zoom**: uses `drag-to-zoom-box-sk` to select a specific sub-region for zooming. This supports both horizontal and vertical zooming depending on the global `isHorizontalZoom` state.

`plot-google-chart-sk.ts` is the primary element that orchestrates the charting logic.
- It builds the `google.visualization.DataTable`, handles the "Domain" toggle (switching between Commit Position and Date), and coordinates the positioning of overlays.
- It uses a `google.visualization.DataView` to filter which traces are currently visible based on user selections in the side panel.

`side-panel-sk.ts` implements a collapsible legend and control interface.
- It lists the visible traces, grouped by their parameters (`test`, `arch`, etc.).

`v-resizable-box-sk.ts` implements a specialized selection box for vertical measurements.
`drag-to-zoom-box-sk.ts` implements a transparent selection rectangle.
- When the user completes a selection, the chart's `viewWindow` is updated to the chosen region.

The rendering pipeline:

Data Update -> willUpdate() -> updateDataView()
      |
      v
Create google.visualization.DataView
      |
      v
Assign Colors to Traces
      |
      v
updateOptions() (Scale/Axis)
      |
      v
plot.redraw() -> onChartReady()
      |
      v
drawAnomaly() & drawUserIssues()
Because the overlays are standard HTML elements, the module frequently translates “Data Values” (Commits/Dates/Values) into “Pixel Coordinates” using the Google Chart Layout Interface:
[Data Value: Commit 1234]
      |
      v
[Chart Layout Interface] -> getXLocation(1234)
      |
      v
[CSS Absolute Position] -> element.style.left = `${x}px`
- `selection-changed`: dispatched when the user finishes panning or zooming, providing the new range and domain.
- `plot-data-select`: dispatched when a specific data point is clicked, returning the `tableRow` and `tableCol`.
- `side-panel-toggle`: dispatched when the legend panel is opened or closed.

The `plot-summary-sk` module provides a high-level "bird's-eye view" of performance data. It is designed to act as a navigation and overview tool for large datasets, allowing users to see trends across a wide time or commit range and select specific sub-sections to investigate in more detail.
This component renders a simplified area chart of performance traces using Google Charts. Its primary purpose is to facilitate range selection. Unlike a primary data plot, it focuses on performance and visual density rather than granular data point interaction.
It solves the problem of “information overload” when dealing with thousands of data points by implementing automatic downsampling and providing a specialized UI for horizontal range manipulation.
When the input DataTable contains a large number of rows (exceeding 1000), the component automatically applies a Min-Max bucketing algorithm.
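The Min-Max bucketing idea can be sketched as follows; the bucket sizing and threshold handling here are illustrative, not the component's exact algorithm:

```typescript
// Downsample a series by splitting it into buckets and keeping only the
// minimum and maximum sample from each bucket (in their original relative
// order), preserving the visual envelope of the trace.
function minMaxDownsample(values: number[], maxPoints: number): number[] {
  if (values.length <= maxPoints) return values;
  // Each bucket contributes two points (its min and its max).
  const bucketSize = Math.ceil(values.length / (maxPoints / 2));
  const out: number[] = [];
  for (let i = 0; i < values.length; i += bucketSize) {
    const bucket = values.slice(i, i + bucketSize);
    const lo = Math.min(...bucket);
    const hi = Math.max(...bucket);
    const loFirst = bucket.indexOf(lo) < bucket.indexOf(hi);
    out.push(loFirst ? lo : hi);
    out.push(loFirst ? hi : lo);
  }
  return out;
}
```

Keeping both extremes per bucket means spikes and dips survive the downsampling, which matters for a plot whose job is to reveal regressions at a glance.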
The selection logic is separated into a sub-component called h-resizable-box-sk.
`h-resizable-box-sk` overlays the chart and translates raw pixel coordinates from mouse events into relative, percentage-based ranges. `plot-summary-sk` then converts these relative positions into domain-specific values (timestamps or commit offsets) using the Google Chart `ChartLayoutInterface`.

To ensure visual consistency between the summary plot and the main detail plots (e.g., `plot-google-chart-sk`), this module uses a shared utility (`getTraceColor`) to assign colors based on the trace name. This allows a user to identify the same trace across different UI components by color alone.
The main element acts as the controller for the summary view.
- It receives `DataTable` objects via Lit context (`dataTableContext`) and converts them into a `DataView` optimized for the summary (filtering columns based on `selectedTrace`).
- It interacts with the `DataFrameRepository` to extend the available data range.

`h-resizable-box-sk` is a specialized UI primitive for horizontal range selection.
- It supports four interaction modes: `draw` (creating a new selection), `drag` (moving an existing selection), `left` (resizing the start), and `right` (resizing the end).
- It uses a `clamp` utility to ensure the selection box never leaves the boundaries of the parent container and maintains a `minWidth` to prevent the selection from becoming unclickable.

The following diagram illustrates how a user interaction is transformed into a system-wide range update:
User Action (Mouse) -> [h-resizable-box-sk]
|
(Pixel Range)
|
v
[plot-summary-sk]
|
(Convert Pixels via ChartLayout)
|
v
[summary_selected Event]
|
(Contains: {begin, end, domain})
When the underlying data changes, the component goes through the following lifecycle:
1. `data`, `selectedTrace`, or `domain` is updated.
2. `updateDataView` checks the row count; if it exceeds 1000, buckets are created.
3. On `google-chart-ready`, the `h-resizable-box-sk` is repositioned to match the `cachedSelectedValueRange`, as the axis scaling might have changed.

The `summary_selected` event's `detail` contains a range object with `begin` and `end` values in the current domain (a UNIX timestamp for dates, or an integer offset for commits).

This module provides a specialized UI component, `PointLinksSk`, designed to display context-sensitive links associated with specific data points in Perf. These links are typically sourced from ingestion files and represent metadata such as commit hashes, build logs, or trace artifacts.
The primary purpose of point-links-sk is to bridge the gap between a raw data point and the external systems that provide more context about it. It doesn't just list static URLs; it dynamically calculates ranges and cleans up metadata keys to present a user-friendly interface for navigating between performance results and source control or build systems.
A significant feature of this module is its ability to compare the selected commit with the previous commit to generate “diff” or “log” links.
- When the two commits differ, it builds a range link (using the `+log/start..end` syntax) to show all commits in that range.
- It queries `googlesource.com` (using `isRange`) to determine whether a range actually contains multiple commits, ensuring the UI text accurately reflects whether the user is looking at a single change or a list.
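A hedged sketch of the link construction, assuming a Gitiles-style repository URL; the repository URL and the helper name are placeholders, not the component's real internals:

```typescript
// Build either a direct commit link or a "+log/start..end" range link,
// depending on whether the selected point and the previous point resolve
// to the same commit hash.
function commitLink(repo: string, prevHash: string, currHash: string): string {
  if (prevHash === currHash) {
    // Single commit: link directly to it.
    return `${repo}/+/${currHash}`;
  }
  // Multiple commits may lie in between: link to the log of the range.
  return `${repo}/+log/${prevHash}..${currHash}`;
}
```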
- It first attempts to fetch metadata from `/_/details/?results=false`.
- If that fails, it falls back to the `/_/links/` endpoint.
- The component accepts an existing list of `CommitLinks` as an argument to its `load` method. This allows the caller to provide cached data, preventing redundant network requests when switching back and forth between points.
- Some sources embed links as Markdown (e.g., `[Build Log](url)`); the module extracts the URL for proper anchor tag generation.

The following diagram illustrates the process when a user selects a point and `load()` is called:
User selects point
        |
        v
[load(commit, prev_commit, trace_id, ...)]
        |
        +--> Check cache? --(Found)--> Render cached links
        |         |
        |     (Missing)
        |         v
        +--> Fetch current point links (/_/details/ or /_/links/)
        |
        +--> Fetch previous point links (if range keys requested)
        |
        +--> Compare hashes
        |       |-- Same:      Create direct commit link
        |       '-- Different: Create +log/ range link
        |
        +--> Filter for "Useful Links" (Build logs, etc.)
        |
        +--> Normalize keys and extract URLs (e.g., Fuchsia regex)
        |
        '--> [ _render() ] Update Lit-html template
`load(...)` is the main entry point for the component. It triggers the data fetching and comparison logic, and returns the updated list of `CommitLinks` (including any newly fetched data) so the parent component can maintain an up-to-date cache.
- `displayUrls`: a map of human-readable keys to their calculated destination URLs.
- `displayTexts`: a map of keys to the text that should appear inside the link (e.g., the short hash range `f052b8c4 - 47f420e8`).
- `commitPosition`: tracks the current commit number being inspected to ensure the UI stays synchronized with the user's selection.
- An `AbortController` cancels pending network requests if a user rapidly clicks different points, preventing race conditions where old data might overwrite new data.
- The `until` directive shows "Loading..." placeholders for individual link rows while asynchronous range validation (`isRange`) is performed.

The `progress` module provides a standardized mechanism for triggering, monitoring, and retrieving results from long-running server-side tasks. It abstracts the complexity of asynchronous polling into a lifecycle-aware utility, allowing the frontend to handle heavy operations (like database queries or complex data processing) without blocking the main UI thread or timing out on a single HTTP request.
The module is designed around a state-machine approach where the server dictates the flow of the operation. Rather than the client guessing when a task is finished, the server provides a SerializedProgress object containing the current status and a URL for the next update.
The core function, startRequest, manages the following transition logic:
1. It issues a `POST` request to a starting URL with a JSON body.
2. It inspects the `status` field in the response.
3. If the status is `Running`, it schedules a subsequent `GET` request, after a configurable interval, to the URL provided in the previous response.
4. If the status is anything other than `Running` (e.g., `Finished`), it resolves the promise with the final data.

This design decouples the UI from the specific endpoint of the task; as long as the server follows the `progress.SerializedProgress` schema, the client can follow the task through any number of intermediate steps or URL changes.
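The server-driven transition logic reduces to a pure step function, sketched here under an assumed (simplified) `SerializedProgress` shape; the `Error` status and the `Step` type are illustrative additions, not the exact schema:

```typescript
// Simplified stand-in for the progress.SerializedProgress schema.
interface SerializedProgress {
  status: "Running" | "Finished" | "Error";
  messages: { key: string; value: string }[];
  results?: unknown;
  url: string; // where to poll next while Running
}

type Step =
  | { kind: "poll"; url: string }
  | { kind: "done"; results: unknown }
  | { kind: "error"; message: string };

// Given one server response, decide what the client does next. A real
// polling loop would call this after every response and sleep for the
// polling interval between "poll" steps.
function nextStep(p: SerializedProgress): Step {
  if (p.status === "Running") return { kind: "poll", url: p.url };
  if (p.status === "Error") {
    const err = p.messages.find((m) => m.key === "Error");
    return { kind: "error", message: err ? err.value : "unknown" };
  }
  return { kind: "done", results: p.results };
}
```

Keeping the transition logic pure makes the polling loop trivial to test without a server.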
The primary entry point (defined in `progress.ts`) is `startRequest`. It is built to be flexible through a `RequestOptions` object, which provides hooks into the various stages of the request lifecycle:
- `onStart`: useful for updating UI state (e.g., showing a spinner) before the first network call is made.
- `onProgressUpdate`: triggered every time the server returns a response while the task is still `Running`. This allows for real-time progress bars or status message updates.
- `onSuccess`: triggered specifically when the task reaches a terminal successful state.
- `onSettled`: acts like a `finally` block, executing when the process ends regardless of success or failure, making it ideal for cleanup tasks like hiding loading indicators.

Since server responses often include a list of key-value pairs (`progress.Message[]`) to describe internal state, the module provides utilities to normalize this data for UI display:
- `messagesToErrorString`: prioritizes extracting a message with the key `Error`. If absent, it concatenates all available messages into a single string. This ensures that even if the server doesn't provide a specific error field, the user receives some context about what went wrong.
- `messageByName`: a safe lookup utility that extracts a specific value from the message array by its key, providing a fallback to prevent UI breakage.

The following diagram illustrates the lifecycle of a long-running request managed by this module:
[ Client ]                     [ Server ]
    |                              |
    |---- POST (Start URL) ------->|
    |                              |-- [ Task Initiated ]
    |<--- 200 (Running) -----------|
    |     (JSON: status, url)      |
    |                              |
    | [ Wait pollingInterval ]     |
    |                              |
    |---- GET (Poll URL) --------->|
    |                              |-- [ Task Processing... ]
    |<--- 200 (Running) -----------|
    |                              |
    | [ Wait pollingInterval ]     |
    |                              |
    |---- GET (Poll URL) --------->|
    |                              |-- [ Task Finished ]
    |<--- 200 (Finished) ----------|
    |                              |
[ Resolve Promise ]
The module treats non-ok HTTP statuses (like 4xx or 5xx) as terminal failures, rejecting the promise immediately and triggering the onSettled callback. It does not automatically retry on network failure; it assumes that if the polling chain is broken, the caller should decide whether to restart the entire process.
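The message-normalization helpers described earlier might look roughly like this; the `Message` shape mirrors the key/value pairs in the text, and everything else is a simplified illustration:

```typescript
// One entry from a progress.Message[] list.
interface Message {
  key: string;
  value: string;
}

// Prefer an explicit "Error" message; otherwise concatenate everything so
// the user still gets some context about what went wrong.
function messagesToErrorString(messages: Message[]): string {
  const err = messages.find((m) => m.key === "Error");
  if (err && err.value) return err.value;
  return messages.map((m) => `${m.key}: ${m.value}`).join(", ");
}

// Safe lookup with a fallback, so a missing key cannot break the UI.
function messageByName(messages: Message[], key: string, fallback = ""): string {
  const found = messages.find((m) => m.key === key);
  return found ? found.value : fallback;
}
```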
The query-chooser-sk module provides a compact, interactive UI component for building and displaying search queries based on a set of parameters (a ParamSet). It acts as a high-level wrapper around the more complex query-sk component, offering a “summary-first” workflow that keeps the UI clean while allowing for detailed query editing.
In many data-heavy applications, users need to filter large datasets using multiple keys and values. Displaying a full query builder at all times occupies significant screen real estate. query-chooser-sk solves this by:
- Showing a concise, read-only summary of the current query via `paramset-sk`.
- Hiding the full `query-sk` interface within a toggleable dialog.
- Using `query-count-sk` to show how many items match the current selection as the user modifies it.

The main file is the primary entry point and defines the custom element. Its responsibilities include:
- Managing `current_query` (a URL-formatted query string) and the `paramset` (the available options).
- Toggling the visibility of the `#dialog` element, which contains the editing tools.
- Listening for `query-change` events from the internal `query-sk` component, updating its own state, re-rendering the summary, and propagating information to the parent application.

The functionality of `query-chooser-sk` is composed of several specialized elements:
- `paramset-sk`: used in the main view to display a concise, non-interactive summary of the active query filters.
- `query-sk`: the core interactive builder revealed when editing. It handles the logic of selecting keys and values from the `ParamSet`.
- `query-count-sk`: situated inside the edit dialog, it performs asynchronous lookups (via the `count_url` attribute) to provide real-time counts of data points matching the user's current selection.

The component operates in a cycle of viewing and refining:
+---------------------------------------+
| [Edit] Key1: Val1, Val2               |  <--- (Summary View: paramset-sk)
+-----+---------------------------------+
      |
      | (Click Edit)
      v
+---------------------------------------+
| [Close]                               |
| +-----------------------------------+ |
| | (query-sk)                        | |  <--- (Edit View: User selects filters)
| |  Key1: [x]Val1 [x]Val2 [ ]Val3    | |
| +-----------------------------------+ |
| Matches: 1,245                        |  <--- (Live Count: query-count-sk)
+---------------------------------------+
1. The summary view renders the current selection via `paramset-sk`.
2. Clicking "Edit" invokes the `_editClick` handler, which adds a CSS class to display the hidden dialog.
3. As the user changes selections in `query-sk`, the `_queryChange` handler updates the `current_query` property. This update is reactive:
   - `query-count-sk` sees the new query and fetches a new count.
   - The `paramset-sk` summary updates to reflect the latest selections.

Additional design notes:

- The component uses standard URL query encoding (`key1=val1&key1=val2`) as the primary data exchange format for `current_query`. This makes it trivial to sync the component state with the browser's address bar or use it directly in API requests.
- The dialog is scoped to the element, so multiple `query-chooser-sk` instances can exist on one page without managing z-index or global state conflicts.
- `connectedCallback` utilizes `_upgradeProperty` for attributes like `paramset` and `key_order`. This ensures that if the properties are set on the DOM element before the custom element definition is loaded, the values are correctly captured and processed.

| Attribute | Property | Description |
|---|---|---|
| `current_query` | `current_query` | The current selection formatted as a URL query string. |
| `count_url` | `count_url` | The endpoint URL used by `query-count-sk` to fetch match counts. |
| N/A | `paramset` | (Property only) The object containing all available keys and values. |
| `key_order` | `key_order` | An array of strings determining the order in which keys appear in the query builder. |
The query-count-sk module provides a specialized UI component designed to report the number of data points or traces that match a specific query string within the Perf system. It serves as a live feedback mechanism, allowing users to understand the scope of their selection (e.g., in a query builder) before executing a full search or visualization.
The component is built using Lit and leverages the @lit/task package to manage asynchronous data fetching. This architecture ensures that the component reacts efficiently to property changes while maintaining a responsive UI.
The core of the component is a Task that monitors two primary inputs:
- `url`: the endpoint to which the count request is sent.
- `current_query`: the query string to be evaluated.

Whenever either of these properties changes, the task automatically triggers a `POST` request. The component is designed to handle rapid changes; if a new query is provided while a previous fetch is still in flight, the previous request is aborted via an `AbortSignal` to prevent race conditions and unnecessary network traffic.
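The abort-on-new-request pattern can be sketched as follows; the class name and shape are hypothetical, not the component's real internals:

```typescript
// Each new request aborts the previous in-flight one before creating a
// fresh AbortController, so only the latest query's response can win.
class AbortableRequests {
  private controller: AbortController | null = null;

  // Call at the start of every new request. The returned signal would be
  // passed to fetch(url, { signal }) so a superseded request rejects early
  // with an AbortError instead of racing the newer one.
  begin(): AbortSignal {
    this.controller?.abort(); // a newer query supersedes any in-flight one
    this.controller = new AbortController();
    return this.controller.signal;
  }
}
```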
Unlike a simple display widget, query-count-sk performs two roles upon receiving a successful response from the server:
1. It extracts the `count` and displays it.
2. It dispatches a `paramset-changed` custom event. The response from the server includes a `paramset` (the set of all possible keys and values matching the current query), which the component bubbles up to notify parent components that the available filter options may have changed based on the current selection.
Property Change       Fetch Task              Server          DOM / Parent
(current_query)           |                      |                  |
      |---- triggers ---->|                      |                  |
      |                   |---- POST (query) --->|                  |
      |                   |                      |                  |
      |                   |<--- JSON Response ---|                  |
      |                   |    (count, params)   |                  |
      |                   |                      |                  |
      |                   |---- updates count in render() --------->|
      |                   |---- Dispatch Event (paramset-changed) ->|
Contains the element definition. It uses a spinner-sk to provide visual feedback during the loading state. To maintain consistency with legacy behaviors, the displayed count is reset to 0 whenever a new fetch task is initiated or pending.
Provides a Page Object (PO) for testing. This file is crucial for integration and end-to-end tests, offering an abstraction layer to query the internal state of the component (like the numeric value of the count or the visibility of the spinner) without coupling tests to the internal DOM structure.
- `current_query`: a string representing the query to count.
- `url`: the destination for the `CountHandlerRequest`.
- `paramset-changed`: dispatched when the server returns a new `ReadOnlyParamSet`. This allows other UI components (like dropdowns or filters) to update their available options dynamically.

The component sends a `CountHandlerRequest` which includes a time range (defaulting to the last 24 hours). This design choice assumes that the "count" of a query is most relevant within the context of recent data, though the time window is currently hardcoded within the task logic. Error handling is integrated with `errorMessage`, which surfaces toast notifications to the user if the backend fails to process the query.
The `regressions-page-sk` module provides a specialized dashboard for performance regression management. It allows "Sheriffs" (users responsible for monitoring performance) to view, filter, and triage anomalies associated with specific subscription configurations.
This module acts as a centralized interface for reviewing performance anomalies detected by the system. It connects to backend endpoints to fetch a list of active subscriptions (Sheriff configurations) and then retrieves the specific anomalies (regressions) associated with a selected subscription.
The page is designed to handle large datasets through pagination (via cursors) and provides filtering capabilities to distinguish between new regressions, triaged issues, and performance improvements.
`regressions-page-sk.ts` is the main entry point and logic controller for the page. It is a `LitElement` that manages the following:
- The data source: anomalies are fetched from SQL when `fetch_anomalies_from_sql` is enabled in the global perf configuration.
- `<subscription-table-sk>`: displays metadata about the selected sheriff configuration (labels, components, CC list).
- `<anomalies-table-sk>`: displays the actual list of detected regressions.

The module uses a combination of URL parameters and `localStorage` to ensure a consistent user experience.
- URL parameters persist `showTriaged` and `selectedSubscription` so users can share links to specific views.
- `localStorage` keeps `perf-last-selected-sheriff` so that when a user returns to the page, their last worked-on subscription is automatically reselected.
[ Select Sheriff ] -> [ Fetch Subscription Metadata ] -> [ Fetch Anomalies ]
        |                         |                             |
        v                         v                             v
[ Update URL/LS ]   [ Render Subscription Table ]   [ Render Anomalies Table ]
                                                                |
                                                      +---[ Show More ]---+
                                                      |                   |
                                                      +<--[ Append Data ]-+
The component supports two types of pagination depending on the backend:
- **Cursor-based**: the page looks for an `anomaly_cursor` in the JSON response; if present, it displays a "Show More" button and passes the cursor back in the next request.
- **Offset-based**: the page computes a `pagination_offset` based on the current length of the `cpAnomalies` array.
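Choosing between the two strategies can be sketched as below; the response and parameter shapes are simplified assumptions built around the `anomaly_cursor` and `pagination_offset` fields named above:

```typescript
// Minimal stand-ins for the backend response and the next request's
// pagination parameters.
interface AnomalyListResponse {
  anomalies: unknown[];
  anomaly_cursor?: string;
}

interface NextPageParams {
  anomaly_cursor?: string;
  pagination_offset?: number;
}

// Decide how to ask for the next page, or return null when done.
function nextPageParams(
  resp: AnomalyListResponse,
  loadedSoFar: number,
): NextPageParams | null {
  if (resp.anomaly_cursor) {
    // Cursor-based backend: echo the cursor back on the next request.
    return { anomaly_cursor: resp.anomaly_cursor };
  }
  if (resp.anomalies.length > 0) {
    // Offset-based backend: skip everything already appended to the list.
    return { pagination_offset: loadedSoFar };
  }
  return null; // nothing more to fetch
}
```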
The module implements a dual-spinner strategy:
- `anomaliesLoadingSpinner`: an "upper" spinner that activates during initial loads or filter changes, signaling a full data refresh.
- `showMoreLoadingSpinner`: a localized spinner within the "Show More" section, indicating that the page is appending more data to the existing list rather than replacing it.

Testing is split across two layers:

- Unit tests (`regressions-page-sk_test.ts`): focus on state transitions, URL parameter parsing, and ensuring the correct API calls (with correct query strings) are made when filters are toggled.
- Browser tests (`regressions-page-sk_puppeteer_test.ts`): use Page Objects (`regressions-page-sk_po.ts`) to simulate user interactions like selecting a sheriff from a dropdown and verifying that the resulting tables render correctly via screenshots and DOM inspection.

The `report-page-sk` module provides a comprehensive reporting view for performance anomalies. It serves as a centralized dashboard where users can review a list of detected regressions (anomalies), visualize them through interactive graphs, and inspect the shared commit history associated with those regressions.
The primary purpose of this module is to consolidate triage workflows. Instead of looking at anomalies in isolation, report-page-sk groups related issues together, allowing a developer to see how multiple performance shifts might be tied to the same set of commits.
The page logic is driven by URL parameters (such as bugID, anomalyIDs, or sid), which determine which anomalies are fetched from the backend and which graphs are automatically generated upon page load.
The module uses an internal AnomalyTracker class to maintain the state of all anomalies currently being viewed. This tracker manages the relationship between:
- The Anomaly data.
- The ExploreSimpleSk graph instance associated with that specific anomaly.
- The Timerange relevant to the regression.

This separation ensures that the page can efficiently add or remove graphs from the DOM as the user toggles checkboxes in the list without losing the underlying data context.
The anomalies-table-sk component (referenced as anomaly-table) displays the metadata for each regression.
- The table dispatches an anomalies_checked event; report-page-sk listens for this to dynamically mount or unmount graph components.

Graphs are rendered using multiple instances of explore-simple-sk. To prevent browser UI freezes when a large number of anomalies are reported (e.g., a massive regression affecting dozens of tests), the module implements chunked loading:
- Graphs are loaded in batches, and the page waits for the data-loaded event from the current batch before starting the next.

Each graph is configured to show a “buffer” of one week before and after the anomaly's time range to help users determine if a regression has already been mitigated or if it represents a recurring pattern.
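The chunked-loading idea can be sketched as below: render graphs a few at a time and await each batch before starting the next. The batch size and the loadGraph callback are illustrative, not the real implementation:

```typescript
// Split a list of work items into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Load graphs batch by batch; each batch must finish (i.e., every graph
// fires its 'data-loaded' equivalent) before the next batch is mounted,
// keeping the main thread responsive.
async function loadInChunks<T>(
  items: T[],
  size: number,
  loadGraph: (item: T) => Promise<void>
): Promise<void> {
  for (const batch of chunk(items, size)) {
    await Promise.all(batch.map(loadGraph));
  }
}
```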
Since a report often contains multiple graphs representing the same time period or the same commit range, report-page-sk synchronizes user interactions across all visible explore-simple-sk instances.
If the instance uses integer-based commit numbers, the module calculates the intersection of commit ranges for all displayed anomalies.
```
URL Params -> Fetch /_/anomalies/group_report
            |
            v
    Load AnomalyTracker <-----------+
            |                       |
  +---------+----------+            |
  |                    |            |
Populate Table     Find Commits     |
(anomalies-table-sk) (lookupCids)   |
  |                    |            |
  +---------+----------+            |
            |                       |
   Load Graphs in Chunks <----------+
   (explore-simple-sk)
```
```
User Clicks Checkbox
        |
        v
anomalies-table-sk dispatches "anomalies_checked"
        |
        v
report-page-sk receives event
        |
        +----[ If Checked ]----> Create explore-simple-sk
        |                        Initialize with Anomaly Query
        |                        Append to #graph-container
        |
        +---[ If Unchecked ]---> Find graph in AnomalyTracker
                                 Remove from DOM
                                 Unset in Tracker
```
The revision-info-sk module provides a specialized component for investigating performance anomalies associated with specific source control revisions. It serves as a bridge between a revision ID and the various performance tests (benchmarks, bots, and test cases) that may have been impacted around that point in time.
When a regression or improvement is detected in the Skia Perf system, it is often tied to a range of revisions. This module allows users to input a specific revision ID and retrieve a comprehensive list of all anomalies and performance data associated with it.
Beyond simple display, the module facilitates deep-dive analysis by allowing users to select multiple performance traces and navigate to a multi-graph view. This allows for side-by-side comparison of different tests that were affected by the same revision.
This file contains the core logic for the custom element. It handles several distinct responsibilities:
- State Management: Uses stateReflector to sync the current revision ID with the URL query parameters. This ensures that a specific search state can be bookmarked or shared.
- Data Fetching: Calls the backend endpoint (/_/revision/) to fetch RevisionInfo objects.

The stylesheet defines the layout for the results table and the loading indicator. It ensures the spinner is positioned consistently relative to the text and that the data table is readable.
Provides a mock environment for the component. It uses fetch-mock to simulate backend responses, allowing for UI development and testing without a running Perf server.
The choice to use stateReflector is driven by the need for deep-linking. Performance analysis is iterative; users often need to jump between the revision info page and graph pages. By keeping the revisionId in the URL, the component supports standard browser navigation (back/forward) and collaborative debugging.
The implementation of getMultiGraphUrl handles the complexity of “joining” different performance queries. Since each anomaly might belong to a different master, bot, or test, the component generates a GraphConfig for each row. Because these combined queries can result in extremely long URLs that exceed browser limits, it uses updateShortcut to store the configuration on the server and use a short ID in the resulting link.
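The “one GraphConfig per row” idea can be sketched as follows. GraphConfig and AnomalyRow here are simplified stand-ins for the real types, and the query format is an assumption; the actual getMultiGraphUrl then hands these configs to updateShortcut rather than embedding them in the URL:

```typescript
// Simplified stand-ins for the real Perf types.
interface GraphConfig {
  queries: string[];
}

interface AnomalyRow {
  master: string;
  bot: string;
  test: string;
}

// Each selected row becomes its own graph config, so anomalies from
// different masters/bots/tests are not merged into a single query.
function getGraphConfigs(rows: AnomalyRow[]): GraphConfig[] {
  return rows.map((r) => ({
    queries: [`master=${r.master}&bot=${r.bot}&test=${r.test}`],
  }));
}
```

In the real module, the resulting array is POSTed to the shortcut endpoint and only the returned short ID appears in the final link, which is what keeps the URL under browser length limits.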
The following diagram illustrates the flow of data within the module:
```
User Input (Revision ID)
        |
        v
[ stateReflector ] <------> URL (?rev=123)
        |
        v
[ getRevisionInfo() ] ----> API Request (/_/revision/)
        |
        |<---------------- Response (JSON)
        v
[ Render Table ] ---------> User selects checkboxes
        |
        v
[ viewMultiGraph() ]
        |
        +--> [ getGraphConfigs() ]
        +--> [ updateShortcut() ] ----> API Request (/_/shortcut/update)
        |                                       |
        |<--------------------------------------+
        v
[ Window Redirect ] ------> /m/?shortcut=ABC&begin=...&end=...
```
The split-chart-menu-sk module provides a specialized UI component designed to facilitate the logical partitioning of performance data. It presents a list of available trace attributes (such as benchmark, story, or subtest) to the user, allowing them to select a criterion for splitting a unified data visualization into multiple, more granular charts.
The module is built on the principle of reactive data binding via context consumption. Instead of requiring manual property passing, the component integrates directly with the Perf application's data layer:
- The component consumes dataframeContext and dataTableContext. This ensures that the menu options are always synchronized with the current dataset being viewed. If the underlying data changes, the list of attributes available for splitting updates automatically.

split-chart-menu-sk.ts: This is the core implementation file. It manages the following responsibilities:

- Attribute Extraction: Uses the getAttributes utility from the traceset module to parse the DataFrame and extract a unique list of keys present in the trace set.
- Menu State: Manages the open state (menuOpen) of the dropdown interface, ensuring a standard Material Design interaction pattern.
- Event Dispatch: Emits the split-chart-selection event. This event carries the SplitChartSelectionEventDetails interface, containing the selected attribute string.

The component uses @material/web components (md-outlined-button, md-menu, and md-menu-item) to provide a consistent look and feel with the rest of the application. The styling in split-chart-menu-sk.css.ts focuses on ensuring the menu anchors correctly within relative layout containers and follows the system-level color palette.
The following diagram illustrates how data flows through the component to result in a user action:
```
[ Data Layer ]            [ split-chart-menu-sk ]       [ Parent Component ]
      |                             |                            |
      | DataFrame (via context)     |                            |
      | -------------------------->|                             |
      |                             |-- extract attributes       |
      |                             |-- render md-menu           |
      |                             |                            |
      |     User Interaction ------>|                            |
      |     (Select "benchmark")    |                            |
      |                             |-- dispatch CustomEvent     |
      |                             |   "split-chart-selection"  |
      |                             | -------------------------->|
      |                             |                            |-- Handle Split
      |                             |                            |-- Update Layout
```
While functional, this component is marked as deprecated in favor of “Split Checkboxes.” This suggests a transition in the UI design from a single-selection dropdown model to a multi-selection or checkbox-based model for defining chart splits. External modules should use this component with the understanding that its replacement offers different interaction semantics.
The subscription-table-sk module provides a specialized custom element designed to display metadata and configuration details for Perf subscriptions and their associated anomaly detection alerts. It serves as a read-only summary view, typically used in dashboards or report pages where users need to verify the settings governing performance monitoring and bug filing.
The component is built using LitElement and follows a reactive property model. It accepts two primary data structures: a Subscription object containing bug-filing metadata (owner, component, priority, etc.) and an array of Alert objects defining the statistical parameters for anomaly detection.
When data is loaded via the subscription and alerts properties, the component renders a summary “details” card. The detailed alerts table is hidden by default to keep the UI clean, but can be expanded by the user to inspect technical detection parameters like the algorithm (e.g., stepfit, mannwhitneyu), radius, and “interestingness” thresholds.
```
[ Data Source ] -> ( Subscription & Alert Objects )
        |
        v
+-----------------------------+
|    subscription-table-sk    |
|-----------------------------|
| [ Summary Card ]            | <--- Formats emails, components,
|                             |      and Gerrit revisions as links.
|                             |
| [ Toggle Button ]           | <--- Manages "showAlerts" state.
|                             |
| [ Hidden/Visible Table ]    | <--- Renders Alert params and
+-----------------------------+      uses <paramset-sk> for queries.
```
The module is responsible for transforming raw subscription JSON into a user-friendly summary. It implements specific formatting logic for:
When expanded, the component displays a dense table of alert parameters. A key implementation detail is the integration with paramset-sk. Since alert queries are often complex URL-encoded strings (e.g., source_type=image&sub_result=min_ms), the component utilizes the toParamSet utility to parse these strings into structured key-value pairs, which are then rendered by the paramset-sk element for better readability.
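A minimal re-implementation of what a toParamSet-style helper does with such a query string is sketched below; the real utility lives in infra-sk and may handle more edge cases:

```typescript
// A ParamSet maps each key to the list of values seen for it.
type ParamSet = { [key: string]: string[] };

// Parse a URL-encoded alert query into structured key/value lists,
// accumulating repeated keys into one array.
function toParamSetSketch(query: string): ParamSet {
  const out: ParamSet = {};
  for (const [key, value] of new URLSearchParams(query)) {
    (out[key] ??= []).push(value);
  }
  return out;
}
```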
The visibility of the alerts table is managed via internal @state. Whenever a new subscription is loaded (via property assignment or the load() method), the table visibility is reset to false. This ensures that switching between different subscriptions provides a consistent initial view.
- subscription-table-sk.ts: The main element logic. It handles property updates, state transitions for the toggle button, and the template generation for both the summary card and the alert table.
- subscription-table-sk.scss: Provides scoped styling, specifically handling the layout of the details card and ensuring the configuration table adheres to a compact, “small” font size suitable for technical parameters.
- infra-sk/modules/paramset-sk: An external dependency used within the table to render the breakdown of the query parameters that define what data the alert is monitoring.

The telemetry module provides a centralized mechanism for capturing and reporting frontend performance metrics and user interaction data. It is designed to provide visibility into the health and performance of the application without significantly impacting network performance or reliability.
The module facilitates the tracking of two primary types of data:
Rather than sending a network request for every individual event—which would be chatty and inefficient—the module buffers events locally and flushes them in batches.
To minimize the overhead on the user's browser and the backend, metrics are held in a local buffer.
A common challenge with frontend telemetry is losing data when a user closes a tab or navigates away before a scheduled flush occurs.
- The module listens for the visibilitychange event. When the page state becomes hidden, it immediately triggers a flush of all pending metrics, bypassing the 5-second timer.

telemetry.ts: This file contains the core logic and exports a singleton instance of the Telemetry class. This singleton ensures that all parts of the application share the same buffer and timing cycle.

- CountMetric & SummaryMetric Enums: These serve as a “source of truth” for all supported metric names. Adding a new metric requires updating these enums, which provides type safety across the codebase.
- increaseCounter(name, tags): The primary method for incrementing a counter. It automatically sets the value to 1.
- recordSummary(name, value, tags): The method used for performance timing or recording specific sizes/counts.
- sendBufferedMetrics(): An internal asynchronous method that handles the fetch request to the /_/fe_telemetry endpoint. It handles the cloning and clearing of the buffer to prevent race conditions during the network request.

The following diagram illustrates how an event triggered by a user eventually reaches the backend.
```
User Action / Event
        |
        v
telemetry.increaseCounter()  <-- Application code calls this
        |
        +-----> [ Buffer (Array) ]
                    |
                    | (Wait 5s OR Visibility Hidden)
                    v
        +----------+-----------+
        |  sendBufferedMetrics |
        +----------+-----------+
                    |
                    v
        POST /_/fe_telemetry --> [ Backend Server ]
                    |
                    {Success? Yes} ----> [ Clear local copy ]
                    |
                    {Success? No } ----> [ Re-queue metrics ]
```
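The buffering strategy can be sketched as below. Names and shapes are illustrative; the real singleton wires the sender to a fetch of the /_/fe_telemetry endpoint and to a 5-second timer plus the visibilitychange handler:

```typescript
// A single buffered metric; tags provide filtering dimensions.
interface Metric {
  name: string;
  value: number;
  tags: Record<string, string>;
}

class TelemetrySketch {
  private buffer: Metric[] = [];

  // The sender is injected so the batching logic stays testable.
  constructor(private send: (batch: Metric[]) => Promise<boolean>) {}

  increaseCounter(name: string, tags: Record<string, string> = {}): void {
    this.buffer.push({ name, value: 1, tags });
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    // Swap the buffer out first so metrics recorded during the network
    // request are neither lost nor double-sent.
    const batch = this.buffer;
    this.buffer = [];
    const ok = await this.send(batch);
    // On failure, re-queue the batch so the next flush retries it.
    if (!ok) this.buffer = batch.concat(this.buffer);
  }
}
```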
To instrument a new part of the application:
1. Add the new metric name to the enums in telemetry.ts.
2. Import the telemetry singleton and call the relevant method.
3. Pass a tags object to provide dimensions (e.g., specific sub-component names or error types) that allow for more granular filtering in dashboards.

The test-picker-sk module provides a specialized UI component for exploring and selecting performance traces. It enforces a hierarchical selection process, guiding users through large datasets by dynamically fetching valid options for subsequent parameters based on previous choices.
The primary goal of test-picker-sk is to ensure users build valid queries for the Perf database. Rather than presenting all possible parameters at once, which could lead to empty results, the component reveals fields sequentially.
As a user selects values in one field (e.g., “Benchmark”), the component queries the backend to find available values for the next parameter in the hierarchy (e.g., “Bot”). This “drill-down” approach prevents invalid combinations and provides immediate feedback on the number of matching traces found.
The component relies on an ordered list of parameters (e.g., ['benchmark', 'bot', 'test']). This order is critical because it defines the dependency chain for data fetching. When a value is changed at index $i$ in the hierarchy, all fields at index $i+1$ and greater are invalidated and removed. This ensures that the state of the picker always represents a valid path through the data tree.
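The invalidation rule can be sketched as follows; FieldInfo is reduced to two fields here for illustration:

```typescript
// Reduced stand-in for the module's FieldInfo state object.
interface FieldInfoSketch {
  param: string;
  value: string | null;
}

// Changing the field at changedIndex discards every deeper field:
// anything at changedIndex + 1 and beyond no longer represents a
// valid path through the data tree.
function invalidateFrom(
  fields: FieldInfoSketch[],
  changedIndex: number
): FieldInfoSketch[] {
  return fields.slice(0, changedIndex + 1);
}
```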
To prevent performance degradation on both the client and server, the component enforces a PLOT_MAXIMUM (defaulting to 200 traces).
In the Perf database, traces may not have a value for every possible parameter. The component maps these empty strings to a “Default” label in the UI. Internally, these are translated to a sentinel value (__missing__) when constructing queries, allowing users to explicitly select traces that lack a specific attribute.
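The two-way mapping described above can be sketched as below; the '__missing__' sentinel comes from the text, while the helper names are illustrative:

```typescript
// Sentinel sent to the backend for traces lacking a parameter value.
const MISSING_SENTINEL = '__missing__';

// Empty parameter values are shown as 'Default' in the UI...
function toDisplayValue(raw: string): string {
  return raw === '' ? 'Default' : raw;
}

// ...and translated back to the sentinel when building queries.
function toQueryValue(display: string): string {
  return display === 'Default' ? MISSING_SENTINEL : display;
}
```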
State (FieldInfo): The internal state is managed via an array of FieldInfo objects. Each object tracks, among other things, the associated PickerFieldSk element instance.

When a user makes a selection that narrows the results, the component initiates the following process:
```
User Selects Value
        |
        v
callNextParamList() ----> POST /_/nextParamList/ (with current query)
        |                     |
        |                     v
        |<---- Returns {paramset: {next_param: [options]}, count: N}
        v
addChildField()
        |
        |--> Create new PickerFieldSk
        |--> Populate with options (mapping '' to 'Default')
        |--> Attach 'value-changed' listeners
        |--> Update match count UI
```
The component supports complex “trigger” rules through applyConditionalDefaults. This allows the UI to automatically pre-select values in subsequent fields based on specific selections in earlier ones. For example, selecting a specific metric might automatically select a preferred stat (like ‘avg’), streamlining the user experience for common workflows.
Users can “split” the graph by a specific parameter. The component ensures only one parameter is split at a time. If a user enables “split” on a field, the component disables the split checkbox on all other fields and dispatches a split-by-changed event to notify the parent application to adjust the graph's grouping logic.
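The “only one split at a time” rule can be sketched as a pure state transition; the shape below is illustrative, not the component's real internal representation:

```typescript
// Reduced stand-in for a field's split-related state.
interface SplitState {
  param: string;
  split: boolean;
}

// Enabling split on one parameter clears it on every other field,
// so at most one split checkbox is ever active.
function applySplit(fields: SplitState[], splitParam: string): SplitState[] {
  return fields.map((f) => ({ ...f, split: f.param === splitParam }));
}
```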
- test-picker-sk.ts: The main logic for the hierarchical picker, state management, and event handling.
- test-picker-sk.scss: Styles the layout, specifically the “drill-down” field container and the match count indicator.
- test-picker-sk_po.ts: Page Object for automated testing, providing methods to interact with the fields and wait for async loading states.
- test-picker-sk-demo.ts: Provides a mock environment with a simulated backend (/_/nextParamList/) used for development and visual testing.

The component dispatches the following events:

- plot-button-clicked: Dispatched when the user clicks “Plot”. Detail contains the full query string.
- add-to-graph / remove-trace: Dispatched during “Auto-add” mode to incrementally update an existing visualization.
- split-by-changed: Dispatched when the “Split” toggle is flipped on any field.

The /modules/tests module contains end-to-end (E2E) integration tests for the Perf application. These tests leverage Puppeteer to simulate user interactions and verify the visual and functional integrity of the application's core pages.
The primary goal of this module is to provide a “sanity check” for the production-facing UI. Unlike unit tests that focus on individual components, these tests ensure that the integration between the frontend and the backend (or a mock representation of it) remains stable.
The tests are designed to be “perf-blocking,” meaning they represent the critical paths a user takes. If these tests fail, it indicates a high probability that a real user will encounter a broken experience.
The module uses a specialized testing infrastructure built around Puppeteer and a mock server:
- Mock Server: Tests run against a frontend_mock_server as the sk_demo_page_server. This allows the tests to run against a predictable and stable backend environment, decoupling UI verification from actual database state or external network flakiness.
- Cached Test Beds: Tests use the loadCachedTestBed pattern. This optimizes execution by reusing browser instances where possible, reducing the overhead of spinning up a fresh Puppeteer instance for every test suite.
- Screenshot Baselines: Key states are captured via takeScreenshot. These screenshots serve as a baseline to prevent accidental regressions in layout, CSS, or initial rendering.

This component is responsible for verifying that the primary entry points of the application load correctly. It targets:

- /e (Explore Page)
- /m (Multigraph Page)
- /a (Regressions/Alerts Page)

It includes logic to handle common UI overlays, such as cookie consent banners, ensuring that screenshots represent the actual application state rather than transient UI elements.
Beyond simple loading, this component tests specific user workflows within the Multigraph and Explore views. It validates complex UI components (like Vaadin multi-select combo boxes) to ensure that event listeners, data binding, and dropdown behaviors are functioning correctly in a browser environment.
The following diagram illustrates how a typical test in this module interacts with the infrastructure:
```
[ Test Suite Start ]
        |
        V
[ loadCachedTestBed() ]  <---- Reuses browser instance for efficiency
        |
        V
[ beforeEach() ] -----------+
        |                   |
        | Set Viewport Size |
        | Navigate to Target URL (e.g., /m)
        |                   |
        V                   |
[ it() Test Case ] <--------+
        |
        +---> [ Interaction ] (Click, Type, Wait for Selector)
        |
        +---> [ Screenshot ]  (Capture state for visual diffing)
        |
        V
[ Test Suite End ]
```
Because these tests aim to simulate real user visits, they must account for global UI elements that might obscure the application. The acceptCookieBanner helper is a design choice to ensure that “noise” from the base platform doesn't cause false positives in screenshot comparisons or block element visibility during functional tests.
The themes module serves as the centralized styling foundation for the project. Rather than defining an entirely new design system from scratch, it acts as a customization layer that bridges the project's specific aesthetic requirements with the base design tokens provided by the shared infra-sk infrastructure.
The primary design principle for this module is to maintain a minimal footprint. It is structured to follow a “delta-based” approach, where the styles defined here only represent deviations from the global shared themes or essential overrides for base HTML elements.
This approach was chosen to:
- Inherit automatically: By depending on infra-sk/themes, the project automatically inherits updates to the core design system (such as color palettes, spacing units, and typography) without manual intervention.

The module is responsible for pulling in the necessary external typography and iconography. Currently, it integrates the Material Icons library, making these glyphs globally available across all web components in the project. It also serves as the bridge to the shared SASS library, ensuring that variables and mixins from the infrastructure are accessible to local stylesheets.
The themes.scss file handles the “Sanitization” or “Reset” logic for the application's root. It enforces a consistent body configuration (zeroing out default browser margins/padding) to ensure that top-level layout components (like nav bars or sidebars) align perfectly with the viewport boundaries.
The module houses structural utilities that facilitate specific UX behaviors. A notable example is the #bottom-spacer implementation.
Workflow: Scroll Buffer Management
```
[ Viewport ]
|-------------------|
|  Content Area     |
|                   |
|  [Element A]      |
|  [Element B]      |
|                   |
|-------------------| <--- End of content
|  #bottom-spacer   | <--- Provides 500px of "breathing room"
|-------------------|
```
The inclusion of a large bottom spacer is a deliberate implementation choice to ensure that users can scroll past the final pieces of interactive content, preventing UI elements (like floating action buttons or footer overlays) from obscuring the last items in a list or terminal output.
The module is exposed as a sass_library. This allows other modules in the project to depend on themes_sass_lib, ensuring that the global styles and infrastructure dependencies are bundled correctly during the Sass compilation process. By depending on //infra-sk:themes_sass_lib, it ensures that the dependency graph correctly resolves the cascading nature of the styles.
The trace-details-formatter module provides a standardized way to translate internal trace data (parameters and keys) into human-readable strings and, conversely, to reconstruct query parameters from those strings.
Trace IDs in Perf are often complex sets of key-value pairs. Depending on the specific domain (e.g., standard Skia traces vs. Chrome-specific performance benchmarks), the desired visual representation of these traces and the logic required to query them varies significantly. This module abstracts those differences behind a common interface.
The module defines a central TraceFormatter interface that ensures consistency across different formatting implementations:
- formatTrace(params: Params): Converts a dictionary of trace parameters into a displayable string.
- formatQuery(trace: string): Parses a formatted trace string back into a URL query string compatible with the Perf backend.

The module selects an implementation at runtime based on the global window.perf.trace_format configuration.
Used when no specific format is defined. It provides a fallback by simply returning the unique Trace ID (the joined key-value pairs). It does not support converting strings back into queries.
Designed specifically for Chrome's hierarchical performance data. It handles the mapping between the legacy Chrome “test path” structure and Skia's parameter-based system:

- The path hierarchy consists of master, bot, benchmark, test, and three levels of subtest.
- formatTrace produces a slash-delimited string (e.g., master/bot/benchmark/...).
- formatQuery splits these paths back into their constituent keys.

A significant responsibility of this module is handling the transition from Chromeperf-style “test paths” to Skia's “stat” parameters. In the Chrome ecosystem, statistical aggregations (like averages or maximums) are often encoded as suffixes in the test name.
When enable_skia_bridge_aggregation is active, the ChromeTraceFormatter automatically extracts these suffixes and maps them to standard Skia stat values:
| Suffix | Skia Stat Value |
|---|---|
| avg | value |
| std | error |
| max, min, count, sum | (remain the same) |
If a test name lacks a known suffix, the formatter defaults the stat parameter to value to prevent the system from accidentally loading all available statistical variations (which would result in 6x more data being fetched than intended).
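The suffix-to-stat mapping above can be sketched as below. The mapping rows come from the table; the function name and the assumption that suffixes are underscore-separated are illustrative:

```typescript
// Map a Chromeperf-style test name to a base test name plus a Skia
// 'stat' value, per the suffix table above. Unknown or missing
// suffixes default to 'value' so every stat variant is not fetched.
function statFromSuffix(testName: string): { test: string; stat: string } {
  const known: Record<string, string> = {
    avg: 'value',
    std: 'error',
    max: 'max',
    min: 'min',
    count: 'count',
    sum: 'sum',
  };
  const idx = testName.lastIndexOf('_');
  if (idx !== -1) {
    const suffix = testName.slice(idx + 1);
    if (suffix in known) {
      return { test: testName.slice(0, idx), stat: known[suffix] };
    }
  }
  return { test: testName, stat: 'value' };
}
```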
The following diagram illustrates how the module resolves a formatter and processes data:
```
[ Global Config ]
        |
        | window.perf.trace_format
        v
[ GetTraceFormatter() ]
        |
        +---- "chrome" ----> [ ChromeTraceFormatter ]
        |                           |
        |                           +-- formatTrace: join(keys, '/')
        |                           +-- formatQuery: split('/') + Stat Mapping
        |
        +---- (default) ---> [ DefaultTraceFormatter ]
                                    |
                                    +-- formatTrace: makeKey(params)
```
- traceformatter.ts: Contains the TraceFormatter interface, the concrete implementations for Chrome and Default styles, and the factory function GetTraceFormatter.
- traceformatter_test.ts: Validates the logic for path splitting and the conditional application of statistical mappings based on global window settings.

The triage-menu-sk module provides a unified interface for managing and triaging performance anomalies in bulk. It serves as a central control point within the Perf UI, allowing users to categorize detected regressions or improvements by filing bugs, associating them with existing reports, ignoring them, or “nudging” their detected revision range.
Instead of implementing bug-filing logic directly, triage-menu-sk acts as an orchestrator. It encapsulates and manages two specialized dialog components: new-bug-dialog-sk and existing-bug-dialog-sk. This separation of concerns allows the menu to focus on the high-level workflow (selecting anomalies and choosing an action) while delegating complex form handling and bug-tracker integration to the specific dialog modules.
One unique feature of this module is the “Nudge” functionality. Anomalies are detected over a revision range, but the detection might not perfectly align with the actual point of regression. The “Nudge” buttons (typically ranging from -2 to +2) allow users to shift the anomaly's revision boundaries.
- Local update: Nudging updates the AnomalyData (coordinates x and y) locally and dispatches an event so the parent chart can immediately reflect the shift without a full page reload.
- Backend update: The component calls /_/triage/edit_anomalies with the NUDGE action, ensuring the database reflects the refined revision range.

The component relies heavily on an event-driven model to maintain synchronization with the rest of the application:
- anomaly-changed Event: This is the primary output of the module. Whenever an anomaly is ignored, nudged, or associated with a bug, this event is dispatched. It carries the updated anomaly details and trace IDs, signaling to parent components (like graphs or tables) that they need to invalidate their caches and re-render.
- anomalies and traceNames properties: By calling setAnomalies(), a parent component can dynamically update which data points the menu is currently acting upon.

The main element contains the core logic of the module. It handles:

- Opening new-bug-dialog-sk.
- Opening existing-bug-dialog-sk.
- Sending the IGNORE request to the backend. It sets the bug_id to -2 (a convention for ignored anomalies) and displays a confirmation toast.
- Calling /_/triage/edit_anomalies. This endpoint is polymorphic, handling IGNORE, RESET, and NUDGE actions based on the provided body.
- Using the telemetry module to track which triage actions are most frequently taken by users.

```
User Interaction
        |
        V
[ Triage Menu ] --------------------------+
        |                                 |
        | (Action: New/Existing)          | (Action: Ignore/Nudge)
        V                                 V
[ Dialog Components ]            [ Backend API Call ]
(new-bug-dialog-sk)              (/_/triage/edit_anomalies)
(existing-bug-dialog-sk)                  |
        |                                 |
        +------------> [ Success ] <------+
                           |
                           V
              [ Dispatch anomaly-changed ]
              [ Show Toast / Update UI ]
```
A data structure used to represent potential “nudge” states. It maps a display index (e.g., +1) to specific revision ranges (start_revision, end_revision) and UI coordinates (x, y). This allows the menu to render a sequence of buttons that correspond to valid shifts in the data.
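A hedged sketch of a NudgeEntry-style structure and the request body sent for a nudge is shown below; the field names follow the text, but the exact wire format of /_/triage/edit_anomalies is an assumption:

```typescript
// Illustrative NudgeEntry: one candidate shift of the anomaly range.
interface NudgeEntry {
  display: number;        // e.g. -2 .. +2, the button label offset.
  start_revision: number; // Shifted range boundaries.
  end_revision: number;
  x: number;              // UI coordinates for the shifted point.
  y: number;
}

// Assumed request body for the polymorphic edit endpoint when the
// user confirms a nudge.
function buildNudgeRequest(anomalyIds: number[], entry: NudgeEntry) {
  return {
    action: 'NUDGE',
    keys: anomalyIds,
    start_revision: entry.start_revision,
    end_revision: entry.end_revision,
  };
}
```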
Provides the Page Object for testing. It abstracts the internal DOM structure, including the nested dialogs and the ignore toast, allowing Puppeteer tests to interact with the triage flow without being coupled to the specific HTML structure or CSS classes.
The triage-page-sk module provides a comprehensive dashboard for reviewing and triaging performance regressions in the Perf system. It allows users to scan a matrix of commits and alerts, visualize clusters of data, and record triage decisions (e.g., positive, negative, or untriaged).
The page is designed as a high-density “triage queue.” It presents a grid where rows represent commits and columns represent different configured alerts. This layout allows a developer or performance engineer to quickly identify which commits caused regressions across multiple benchmarks or metrics.
The primary workflow involves:
The component uses stateReflector to sync its internal state (time range, subset filters, alert filters) with the URL. This ensures that triage views can be bookmarked or shared.
- updateRange(): This is the core data-fetching method. It sends a RegressionRangeRequest to the /_/reg/ endpoint. The server responds with a RegressionRangeResponse containing the headers (alerts) and the table data (commits and their regression status).
- calc_all_filter_options(): This logic processes the categories returned by the server to populate the “Which alerts to display” filter, allowing users to focus on specific teams or components.

The grid is constructed dynamically based on the server's response:
- Rows use commit-detail-sk to show the commit hash, author, and message.
- Cells contain triage-status-sk elements. If a cluster is found, it shows the current triage status. If no cluster is found, it displays a “∅” symbol, which links to the generic cluster view for that commit/query combination.

When a user clicks a status in the grid, the triage_start event is captured, opening a <dialog> containing a cluster-summary2-sk.
- cluster-summary2-sk: Responsible for rendering the actual plot and summary statistics of the regression.
- triaged(): When a triage decision is made inside the dialog, this method handles the POST request to /_/triage/. If the triage results in a new bug, it automatically opens the bug reporting URL in a new window.

The following diagram illustrates how a user moves from the high-level grid to a specific data investigation:
```
[ Triage Page Grid ]
        |
        | (User clicks a 'triage-status-sk')
        v
[ triage_start event ] --------------------------+
        |                                        |
        v                                        |
[ Open <dialog> ]                                |
        |                                        |
        +--> [ cluster-summary2-sk ] <-----------+
                    |
                    | (User analyzes plot)
                    |
        +-----------+-----------+
        |                       |
  (Press 'p' / 'n')        (Press 'g')
        |                       |
        v                       v
[ Update Status ]        [ Open Dashboard ]
        |                (Full Explore View)
        v
[ POST /_/triage/ ]
        |
        +--> (Optional: Open Bug Tracker)
```
To facilitate rapid triaging, the module implements KeyboardShortcutHandler. When the triage dialog is open, the following shortcuts are available:
- p: Mark the current regression as Positive.
- n: Mark the current regression as Negative.
- g: Go to the full Explore page for this cluster to perform a deeper analysis of the underlying traces.
- ?: Open the keyboard shortcuts help overlay.

These shortcuts are managed via handleKeyboardShortcut in the keyDown listener, ensuring they only trigger when the user is not actively typing in an input field.
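The “not typing” guard can be sketched as below; the real handleKeyboardShortcut may check more element types and conditions:

```typescript
// Decide whether a keydown should be treated as a triage shortcut.
// Shortcuts only apply while the triage dialog is open, and never
// while focus is in a text-entry element.
function shouldHandleShortcut(targetTagName: string, dialogOpen: boolean): boolean {
  if (!dialogOpen) return false;
  const typingElements = ['INPUT', 'TEXTAREA', 'SELECT'];
  return !typingElements.includes(targetTagName.toUpperCase());
}
```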
- Columns are shown or hidden based on the direction of the alert (UP, DOWN, or BOTH). This minimizes horizontal scrolling by only showing “High” or “Low” columns when relevant to that specific alert's configuration.
- The subset parameter (all, regressions, untriaged) allows the server to prune the data significantly, which is critical for performance when viewing large time ranges (e.g., several weeks of data across hundreds of alerts).

The triage-status-sk module provides a visual indicator and interaction point for the triage state of performance clusters. It serves as a compact UI element that communicates the current classification of a detected anomaly (e.g., untriaged, positive, or negative) and initiates the workflow for modifying that state.
In the Perf system, data anomalies are grouped into clusters that require human intervention to determine if they represent actual regressions or false positives. This module encapsulates the visual representation of that status.
Rather than managing the complex logic of the triage dialog itself (which involves data visualization and form inputs), triage-status-sk acts as a trigger. When a user interacts with the component, it broadcasts the necessary context—including the current triage state, the associated alert configuration, and the cluster summary—to be handled by a parent container or a global dialog manager.
triage-status-sk.ts: The primary class is an ElementSk that renders a stylized button. Its responsibilities include:
- It displays the TriageStatus (status and message) provided via its properties. The visual state is driven by CSS classes that correspond to the status strings (positive, negative, untriaged).
- It holds the alert (the configuration that triggered the detection), full_summary (the statistical data of the cluster), and cluster_type (indicating if the cluster represents a high or low change).
- Clicking the button dispatches the start-triage custom event.

triage-status-sk.scss: The module uses a specialized icon component (tricon2-sk) inside the button. The styling logic is tightly coupled with the status:
- Colors are sourced from theme CSS variables (--positive, --negative, --untriaged) to ensure consistency across the application.

The following diagram illustrates how the component interacts with the rest of the application to start a triage action:
[ User ]
   |
   | (Clicks Button)
   v
[ triage-status-sk ]
   |
   |-- 1. Bundles: { triage, alert, full_summary, cluster_type }
   |-- 2. Dispatches 'start-triage' Event
   v
[ Parent / Dialog Manager ]
   |
   |-- 3. Receives Event Detail
   |-- 4. Opens Triage Dialog for User Input
   v
[ Backend API ]
   (Updated via parent)
The start-triage event is the primary output of this module. Its detail object contains:
| Property | Description |
|---|---|
| triage | The current TriageStatus object. |
| full_summary | The FullSummary data structure representing the cluster statistics. |
| alert | The Alert configuration associated with this detection. |
| cluster_type | Whether the regression is ‘high’ or ‘low’. |
| element | A reference to the TriageStatusSk instance that fired the event, allowing the receiver to update the element directly upon successful triage. |
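The detail shape above can be sketched with illustrative types; the concrete TriageStatus, FullSummary, and Alert definitions live in the Perf JSON schema and are simplified placeholders here:

```typescript
// Simplified stand-ins for the real Perf types (illustrative only).
interface TriageStatus { status: string; message: string; }

interface StartTriageDetail<TSummary = unknown, TAlert = unknown> {
  triage: TriageStatus;
  full_summary: TSummary;
  alert: TAlert;
  cluster_type: 'high' | 'low';
  element: unknown; // the TriageStatusSk instance that fired the event
}

// Bundles component state into the event detail (step 1 of the diagram).
function buildStartTriageDetail(
  triage: TriageStatus,
  fullSummary: unknown,
  alertConfig: unknown,
  clusterType: 'high' | 'low',
  element: unknown,
): StartTriageDetail {
  return {
    triage,
    full_summary: fullSummary,
    alert: alertConfig,
    cluster_type: clusterType,
    element,
  };
}
```

A parent would pass this object as the detail of a bubbling CustomEvent named start-triage.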
The triage2-sk module provides a specialized UI component for managing the classification of data points or alerts within the Perf system. It allows users to toggle between three distinct states: positive, negative, and untriaged.
This component is designed to be a compact, intuitive control that provides immediate visual feedback on the current classification of an item while making it easy to change that state with a single click.
The module is built as a custom element using the Lit library and extends ElementSk.
The primary state of the component is driven by its value attribute. The component synchronizes this attribute with a property of the same name. To ensure data integrity, it uses a guard function (isStatus) to validate that any assigned value conforms to the Status type (defined in perf/modules/json).
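A minimal sketch of that guard, with the three known states inlined (the real Status type is generated from perf/modules/json):

```typescript
// The three valid triage states.
type Status = 'positive' | 'negative' | 'untriaged';

// Type guard: narrows an arbitrary string to Status.
function isStatus(value: string): value is Status {
  return ['positive', 'negative', 'untriaged'].includes(value);
}

// Invalid or missing attribute values fall back to 'untriaged',
// matching the component's documented default.
function normalizeStatus(value: string | null): Status {
  return value !== null && isStatus(value) ? value : 'untriaged';
}
```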
The internal logic follows a reactive pattern:
- The value attribute is updated programmatically.
- attributeChangedCallback triggers a re-render and dispatches a change event.
- The template toggles the ?selected attribute on the internal buttons to highlight the active state, which is then styled via CSS.

The component consists of a group of three buttons, each containing a specific icon:
- check-circle-icon-sk: Represents a positive result.
- cancel-icon-sk: Represents a negative (false positive) result.
- help-icon-sk: Represents an untriaged (unknown) state.

The styling is implemented in triage2-sk.scss with specific support for “light” and “dark” modes. It relies on CSS variables (e.g., --surface, --on-disabled) to integrate seamlessly with the broader Perf application themes. Deselected buttons are intentionally dimmed to draw focus to the currently selected state.
The following diagram illustrates how an interaction with the UI flows through the component to notify the parent application:
User Click           Component Property     DOM Attribute       Parent Application
----------           ------------------     -------------       ------------------
    |                        |                    |                      |
[Click .positive] ------> [set value]            |                      |
    |                        |                    |                      |
    |                        | ---------------> [attr: value]           |
    |                        |                    |                      |
    |                        |            [attributeChanged]            |
    |                        |                    |                      |
    |                  [_render()] <--------------|                      |
    |                        |                    |                      |
    |                        | ----------------------------------> [Event: 'change']
- triage2-sk.ts: Contains the TriageSk class logic. It handles the attribute observation, property synchronization, and the dispatching of custom events when the triage status changes.
- triage2-sk.scss: Defines the visual representation. It uses sophisticated selectors to handle both legacy color schemes and modern theme-based variables, ensuring the icons are appropriately colored (green for positive, red for negative, brown for untriaged) and that “raised” or “hover” states provide tactile feedback.
- index.ts: The entry point that defines the custom element in the global customElements registry.
- value (attribute/property): Reflects the current status. Defaults to untriaged if not set or if set to an invalid value.
- change (event): Dispatched whenever the status changes. The detail property of the event contains the new Status value string.

The tricon2-sk module provides a specialized UI component for displaying triage states. It translates semantic status strings—“positive”, “negative”, or “untriaged”—into consistent visual indicators (icons and colors) used across the application to represent the state of performance regressions or test results.
The component is designed around a single point of truth: the value attribute. By mirroring this attribute to a JavaScript property, the component ensures that updates made via HTML or direct property assignment trigger a re-render.
The implementation uses a declarative template approach. Instead of manually manipulating the DOM to swap icons, the TriconSk class uses a switch statement within its Lit template to determine which underlying icon element to mount:
Value ("positive") ------> <check-circle-icon-sk>
Value ("negative") ------> <cancel-icon-sk>
Value (default) ------> <help-icon-sk>
This design simplifies the component's internal state management, as the visual output is a pure function of the value property.
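That switch can be sketched as a pure function from the value property to the icon element's tag name (the tag names match the icons listed earlier; the function name is illustrative):

```typescript
// Pure mapping: the rendered icon is a function of the value property.
function iconTagForValue(value: string): string {
  switch (value) {
    case 'positive':
      return 'check-circle-icon-sk';
    case 'negative':
      return 'cancel-icon-sk';
    default:
      return 'help-icon-sk'; // untriaged / unknown falls through here
  }
}
```

Because the output depends only on the input, the template needs no cached DOM state and re-rendering is always safe.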
The styling logic in tricon2-sk.scss is decoupled into three distinct layers to ensure legibility across different UI contexts:
- Base layer: relies on application-wide CSS variables (--green, --red) for basic integration.
- Themed layer (.body-sk): Provides specific hex code overrides to ensure that the triage colors meet brand and contrast requirements when the application's standard theme is applied.

By encapsulating these color mappings within the component's SCSS, the module prevents “color leak” and ensures that a “positive” icon always appears in the correct shade of green regardless of where it is placed in the application.
tricon2-sk.ts: The primary class extending ElementSk. It manages the lifecycle of the element and observes the value attribute. It is responsible for importing and registering the specific icon elements from elements-sk needed for the three states.
tricon2-sk.scss: Rather than relying on the parent container to style the icons, this file explicitly defines the fill properties for the internal icon components (check-circle-icon-sk, cancel-icon-sk, and help-icon-sk). This ensures that the semantic meaning of the icon (success, failure, or unknown) is always tied to its visual representation.
The module includes a demo page (tricon2-sk-demo.html) that showcases the component in all three triage states across light and dark modes. This is used by the Puppeteer test suite (tricon2-sk_puppeteer_test.ts) to perform visual regression testing, ensuring that the icons render correctly and maintain their color associations during UI changes.
The user-issue-sk module provides a custom LitElement component designed to manage the association between performance data points (traces at specific commit positions) and external bug tracking system (Buganizer) issues. It acts as a bridge between the Perf monitoring UI and the issue management backend, allowing users to view, link, create, and remove bug references directly within the context of a performance trace.
The component’s appearance is heavily dictated by the user’s authentication state and the presence of an existing bug.
- Using the alogin-sk module, the component detects if a user is logged in. If a user is anonymous, they are restricted to viewing existing bug links; they cannot add, delete, or modify issue associations.
- The bug_id property serves as a state indicator. A value of -1 hides the element entirely, 0 indicates no bug is currently associated, and a positive integer indicates an active link to an external issue.

The module implements a specific workflow for adding issues (findOrAddIssue) that prioritizes data integrity:
- New bugs are filed through the /_/triage/file_bug endpoint before saving the association. This ensures that every “Add Issue” action results in a valid, tracked entity in the bug host.

Rather than managing the global state of the application, user-issue-sk utilizes a user-issue-changed Custom Event. When an issue is saved or deleted, the component dispatches this bubbling event. This allows parent components or data providers to react to changes (e.g., refreshing a list of anomalies or updating a graph) without the user-issue-sk component needing deep knowledge of the application's architecture.
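The bug_id state mapping described earlier (-1 hidden, 0 no bug, positive an active link) can be sketched as a small pure function; the mode names are illustrative:

```typescript
// Hypothetical display modes for the three documented bug_id states.
type IssueDisplayMode = 'hidden' | 'add-button' | 'issue-link';

function issueDisplayMode(bugId: number): IssueDisplayMode {
  if (bugId === -1) return 'hidden';     // element not rendered at all
  if (bugId === 0) return 'add-button';  // no bug associated yet
  return 'issue-link';                   // positive id: link to the bug
}
```

Centralizing the interpretation of the sentinel values keeps the template logic from scattering magic numbers across render branches.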
user-issue-sk.ts: This is the primary implementation file containing the UserIssueSk class. Its responsibilities include:
- Holding properties such as user_id, bug_id, trace_key, and commit_position to contextualize the issue.
- Calling the backend endpoints:
  - /_/user_issues: To query existing associations.
  - /_/user_issue/save: To persist a link between a trace point and a bug.
  - /_/user_issue/delete: To remove an association.
  - /_/triage/file_bug: To programmatically create a new bug in Buganizer.

user-issue-sk_test.ts: The test suite ensures the component reacts correctly to different state combinations. It mocks the global window.perf configuration and API responses to verify that:
- Bug links are constructed from the configured bug_host_url.

The following diagram illustrates the logic flow when a logged-in user interacts with the “Add Issue” button:
[ Click "Add Issue" ] -> ( Show Input Field )
            |
     [ Enter Bug ID ]
            |
    [ Click Checkmark ]
            |
            V
  { Check /_/user_issues }
            |
    ________/ \________
   |                   |
(Issue Exists?)  (Issue Not Found?)
   |                   |
[ Set bug_id ]   [ Call /_/triage/file_bug ]
   |                   |
   |             [ Receive New bug_id ]
   \________   ________/
            \ /
             V
 [ Call /_/user_issue/save ]
            |
[ Dispatch 'user-issue-changed' ]
            |
  ( Update UI to Link View )
[ Click Close-Icon ] -> [ Call /_/user_issue/delete ]
            |
[ Dispatch 'user-issue-changed' ]
            |
[ Reset bug_id = 0, exists = false ]
            |
  ( Update UI to "Add" Button )
The window module provides type definitions for global configuration and utility functions for parsing build-specific metadata from the browser's global environment. It serves as the primary bridge between the backend-injected configuration and the frontend application logic.
A central responsibility of this module is extending the global Window interface to include the perf property. This property holds the SkPerfConfig, which is typically populated by the server-side template during the initial page load.
By defining this in a centralized window.ts file, the project ensures type safety across all frontend modules when accessing environment-specific settings, such as the current image tag or deployment configuration.
The module implements logic to parse image tags used in the Skia Perf infrastructure. This is necessary for displaying versioning information to developers and operators, allowing them to quickly identify which specific build of the software is currently running.
The implementation focuses on identifying three distinct deployment patterns:
- Git-pinned builds, identified by the tag:git- prefix. The logic extracts the first seven characters of the git hash to provide a short, recognizable revision identifier.
- Louhi builds, identified by the -louhi- string. These represent specific automated build pipeline outputs, where the logic extracts the specific build hash following the “louhi” marker.
- Plain tags (e.g., tag:latest or tag:v1.0), where the prefix is stripped to show the human-readable label.

The getBuildTag function follows a specific sequence to normalize the raw image string provided by the backend:
Raw Tag String (e.g., image@tag:git-12345...)
  |
  +--- Split by '@' to isolate the tag portion
  |      |
  |      [No '@' found] --> Return 'invalid'
  |
  +--- Check if starts with 'tag:'
  |      |
  |      [False] ------> Return 'invalid'
  |
  +--- Pattern Matching
         |
         |-- Starts with 'tag:git-'? ----> [type: 'git'] (7-char hash)
         |
         |-- Contains '-louhi-'? --------> [type: 'louhi'] (build hash)
         |
         |-- Else -----------------------> [type: 'tag'] (full tag value)
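The sequence above can be sketched as a simplified re-implementation; the canonical logic lives in window.ts, and the exact louhi hash extraction there may differ from this approximation:

```typescript
interface BuildTag {
  type: 'git' | 'louhi' | 'tag' | 'invalid';
  tag: string;
}

// Simplified sketch of the normalization sequence diagrammed above.
function getBuildTag(image: string): BuildTag {
  const parts = image.split('@');
  if (parts.length < 2) return { type: 'invalid', tag: '' };
  const raw = parts[1];
  if (!raw.startsWith('tag:')) return { type: 'invalid', tag: '' };
  const value = raw.slice('tag:'.length);
  if (value.startsWith('git-')) {
    // Shorten the full git hash to a 7-character revision id.
    return { type: 'git', tag: value.slice(4, 11) };
  }
  const louhi = value.indexOf('-louhi-');
  if (louhi !== -1) {
    // Take the build hash following the '-louhi-' marker (approximation).
    return { type: 'louhi', tag: value.slice(louhi + '-louhi-'.length) };
  }
  return { type: 'tag', tag: value };
}
```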
- window.ts: Contains the global type augmentation for the Window object and the logic for getBuildTag. It imports SkPerfConfig from the JSON schema definitions to ensure the global state remains synchronized with the backend data structures.
- window_test.ts: Validates the parsing logic against various real-world container image tag formats, ensuring that changes to the deployment pipeline's tagging convention do not break version reporting in the UI.

The word-cloud-sk module provides a specialized data visualization component designed to represent the distribution and frequency of key-value pairs within a dataset, such as a cluster of performance traces. Despite its name, it renders data as a structured table with integrated bar charts rather than a randomized “cloud” of text, prioritizing legibility and precise comparison of relative frequencies.
The primary role of this module is to take a collection of data points (values and their associated percentages) and render them in a format that allows users to quickly identify dominant traits within a selected group. It is specifically designed for the Skia Perf UI to show which metadata keys or configurations (e.g., arch=x86, config=8888) are most prevalent in a given performance cluster.
The module follows a declarative pattern using lit for rendering and ElementSk as a base class.
- The component accepts a list of ValuePercent objects. Each object contains a value (string) and a percent (number). The choice of a percentage-based input simplifies the rendering logic, as the component does not need to calculate totals or handle raw counts; it assumes the data is pre-processed.
- It renders a standard <table> for layout. This ensures that labels remain aligned while the distribution is visualized via “micro-bars”—div elements whose widths are set directly to the percentage value.
- Styling adapts to the surrounding context (.body-sk or .darkmode), ensuring consistent integration with the rest of the application’s UI.

Data Binding and Rendering: When the items property is updated on the element, it triggers a re-render cycle. The component maps the data into table rows where the percentage is represented both numerically and visually.
[Data Update]
      |
      v
[setter: items(val)] -> updates private _items
      |
      v
[_render()] calls [WordCloudSk.template]
      |
      +--> [WordCloudSk.rows] maps items to:
             |
             +-- <td> {value} </td>                   (The label)
             +-- <td> {percent}% </td>                (Numeric value)
             +-- <td> [---bar (width: Xpx)---] </td>  (Visual representation)
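The row mapping above can be sketched as a pure function. With a fixed 100px bar track the percentage maps 1:1 to a pixel width; the Row shape here is illustrative:

```typescript
interface ValuePercent { value: string; percent: number; }

// Illustrative row model: one entry per table row.
interface Row { label: string; percentText: string; barWidthPx: number; }

function toRows(items: ValuePercent[]): Row[] {
  return items.map((it) => ({
    label: it.value,
    percentText: `${it.percent}%`,
    barWidthPx: it.percent, // 1% == 1px on the fixed 100px track
  }));
}
```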
- word-cloud-sk.ts: Contains the logic and template for the custom element. It handles property shadowing for items and manages the rendering of the table rows.
- word-cloud-sk.scss: Defines the layout and theme-aware styling. It uses a fixed width for the percentage bars (100px) so that the percentage value maps 1:1 to a pixel width, providing a consistent scale across different instances.
- word-cloud-sk-demo.ts/html: Provides a sandbox for testing the component in different CSS contexts (standard vs. themed), demonstrating how the component adapts to its environment.

nanostat is a command-line utility used to perform statistical comparisons between two sets of benchmark results generated by Skia's nanobench. It identifies whether performance changes (deltas) are statistically significant or merely the result of measurement noise.
In performance engineering, comparing the means of two benchmark runs is often insufficient because execution environments are noisy. A small change in the mean might be significant if the variance is low, while a large change might be statistically meaningless if the variance is high.
nanostat addresses this by applying statistical hypothesis testing to the raw samples collected during benchmarking. It consumes two JSON files (typically an “old” baseline and a “new” experimental run), performs a comparative analysis, and outputs a formatted table showing the magnitude of change alongside a p-value to indicate confidence.
The tool focuses on the “why” of a performance change by evaluating the probability that the observed difference happened by chance.
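A minimal sketch of such a significance gate, assuming the conventions nanostat uses (insignificant deltas rendered as a tilde, a default alpha of 0.05); the function name and formatting are illustrative, and computing the p-value itself is out of scope here:

```typescript
// Renders a delta only when it clears the significance threshold;
// otherwise returns '~' so noise is not mistaken for a real change.
function formatDelta(
  oldMean: number,
  newMean: number,
  p: number,
  alpha = 0.05,
): string {
  if (p > alpha) return '~';
  const pct = ((newMean - oldMean) / oldMean) * 100;
  return `${pct >= 0 ? '+' : ''}${pct.toFixed(2)}%`;
}
```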
- By default, significance is assessed with the Mann-Whitney U test (implemented in the samplestats package). This is a non-parametric test, meaning it does not assume the benchmark samples follow a normal distribution, which is ideal for performance data that often contains outliers or asymmetrical distributions. Users can optionally switch to a Welch’s T-test for normally distributed data.
- The default significance threshold (alpha) is 0.05. If the calculated p-value is greater than this threshold, the change is considered “insignificant” and is represented by a tilde (~) instead of a percentage delta to prevent developers from chasing ghosts in the noise.
- The --iqrr flag enables the Interquartile Range Rule to strip outliers from the sample sets before analysis. This is a design choice to provide a “cleaner” look at the core performance characteristics of the code, independent of transient system spikes.

nanostat doesn't just compare files line-by-line; it understands the structure of Skia benchmark data.
- It uses the paramtools and parser packages to group samples. It identifies benchmarks by their parameters (e.g., config, test, name). It automatically detects which parameters vary across the dataset and includes them in the output columns so the user can distinguish between different test configurations (e.g., gl vs gles).
- It relies on format.ParseLegacyFormat to maintain compatibility with the standard JSON output format used by nanobench.

main.go: The core entry point manages the lifecycle of a comparison:
- Flags are collected into a samplestats.Config object, defining the statistical “rules” for the session.
- Both input files are parsed into parser.SamplesSet structures. Each set contains the raw execution times (samples) for every benchmark identified in the file.
- The two sets are handed to samplestats.Analyze. This produces a set of “Rows,” where each row represents a unique benchmark found in both files.
- The formatRows function dynamically determines which metadata (like config or arch) is relevant. If all results share the same arch, that column is hidden to reduce clutter; if they differ, it is shown.

[ File A (Old) ]          [ File B (New) ]
       |                         |
       v                         v
[ Parse Samples ]         [ Parse Samples ]
       |                         |
       +------------+------------+
                    |
       [ Match by Parameters ]   (e.g., name, config)
                    |
       [ Apply Outlier Filter ]  (Optional: IQRR)
                    |
       [ Run Statistical Test ]  (Mann-Whitney U or T-Test)
                    |
      [ Filter by Significance ] (p < Alpha?)
                    |
                    v
[ Format Table: Mean, StdDev, Delta, p-value, Metadata ]
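The optional IQRR step in the pipeline above can be sketched as follows. This is an illustration of the Interquartile Range Rule, not nanostat's Go implementation; quartiles here use linear interpolation, and other conventions differ slightly at the margins:

```typescript
// Linear-interpolation quantile over an already-sorted array.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Drops samples outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
function iqrrFilter(samples: number[]): number[] {
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return samples.filter((s) => s >= q1 - 1.5 * iqr && s <= q3 + 1.5 * iqr);
}
```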
The tool uses a tabwriter to produce aligned, human-readable terminal output. A key implementation detail in formatRows is the calculation of the “Important Keys.” The tool scans all results and identifies which parameters (keys) have multiple values. It then prioritizes these keys in the output string, ensuring that the user sees exactly what differentiates one row from another without redundant information.
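That "Important Keys" scan can be sketched as a pure function: a key becomes a column only when its value varies across the result rows (the function name is illustrative):

```typescript
// Returns the parameter keys that take more than one value across rows.
function importantKeys(rows: Record<string, string>[]): string[] {
  const seen = new Map<string, Set<string>>();
  for (const row of rows) {
    for (const [k, v] of Object.entries(row)) {
      if (!seen.has(k)) seen.set(k, new Set());
      seen.get(k)!.add(v);
    }
  }
  return [...seen.entries()]
    .filter(([, values]) => values.size > 1)
    .map(([k]) => k);
}
```

Keys whose value is constant across every row (like a shared arch) are omitted, which is exactly what lets the table drop redundant columns.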
The /nanostat/testdata module serves as the regression testing suite and ground-truth repository for the nanostat tool. It contains paired performance benchmark results and the corresponding expected output (golden files) used to verify the tool's statistical analysis, formatting, and filtering logic.
The data in this directory is structured to facilitate end-to-end testing of how nanostat processes nanobench JSON output. The primary goal is to ensure that the statistical comparisons (Mann-Whitney U tests, p-values, and percentage deltas) remain accurate across code changes.
The design relies on “Golden File Testing”:
- nanostat compares these files using various flags (e.g., sorting, significance thresholds).
- The resulting text output is checked against a .golden file to ensure the results match expectations down to the whitespace and p-value calculation.
- The input JSON files contain raw sample arrays keyed by benchmark identifiers (e.g., desk_googledocs.skp_1_1000_1000). nanostat uses these raw sample arrays to calculate the mean, standard deviation, and statistical significance of the change between the “old” and “new” datasets.

These files represent the expected CLI output under different operational modes. They contain fixed-width tables with columns for baseline (old), current (new), percentage delta, statistical significance (p-value and sample size), and test identification.
- Statistically insignificant changes are rendered as a tilde (~).
- Golden files for the outlier-filtered mode show different sample counts (n) and p-values compared to the standard test.

The data in this module illustrates how raw performance samples are transformed into human-readable insights:
[ nanobench_old.json ] [ nanobench_new.json ]
| |
| Comparison |
+------------> (v) <----------+
|
[ Mann-Whitney U Test ]
[ Mean / StdDev Calc ]
|
v
[ Formatting & Filtering ] ----> (Compare against .golden)
The test data specifically includes varied scenarios to stress-test the tool:
- Benchmarks that improved (e.g., desk_googleimagesearch).
- Benchmarks that regressed (e.g., desk_googledocs).
- Noisy samples (± 15%) which result in different p-values and significance classifications.
- Non-timing metrics such as max_rss_mb and sksl_compiler bytes to verify that the tool handles different units of measurement correctly.

The /pages directory serves as the entry point layer for the Perf application's web interface. It defines the structure, styling, and composition of individual HTML pages by orchestrating high-level custom elements (defined in /modules) into functional views.
The module follows a “Thin Page” architecture. Rather than containing complex business logic, each page acts as a declarative shell. This approach ensures:
- Consistency: Every page is wrapped in the perf-scaffold-sk component, providing a unified navigation, header, and footer across the entire application.
- Separation of concerns: Business logic lives in the custom elements (e.g., explore-sk, alerts-page-sk). This makes the pages easy to maintain and the components easy to test in isolation.
- Server-injected context: The backend injects configuration into the global window.perf object, allowing the TypeScript modules to access instance-specific context (like git_repo_url or Nonce for security) immediately upon load.

Most files in this directory follow a strict triplet pattern:
- .html: Defines the DOM structure, typically wrapping a single major functional element inside a scaffold.
- .ts: Handles the side-effect of importing the necessary custom element definitions so they are registered with the browser.
- .scss: Imports global styles (like body.scss) and applies page-specific layout tweaks.

When a user navigates to a Perf page, the following initialization process occurs:
[ Backend ] -> Injects JSON context into <script>
|
v
[ HTML Page ] -> Renders <perf-scaffold-sk>
|
+--------> Sets window.perf = { ... } (Context)
|
v
[ TS Entry ] -> Imports Modules -> Custom Elements Registered
|
v
[ Browser ] -> Upgrades <page-element-sk> -> Fetches data using window.perf
- Exploration (newindex.ts, multiexplore.ts): The primary interfaces for data visualization. newindex hosts the main explore-sk component for deep-diving into individual traces, while multiexplore allows for side-by-side comparisons. These pages also include sidebar help for keyboard and mouse navigation (Zoom/Pan/Delta).
- Alerting and triage (alerts.ts, triage.ts, regressions.ts): These pages manage the lifecycle of performance anomalies. They wrap components that allow users to view configured alerts, triage new regressions, and track existing performance issues.
- Analysis (clusters2.ts, dryrunalert.ts, playground.ts): Focused on the statistical side of Perf. The “Playground” is specifically designed for experimenting with anomaly detection algorithms on sample data without affecting production configurations.
- Utility (revisions.ts, favorites.ts, help.ts): Support pages that provide context. The help.html page is unique as it uses Go templates to dynamically iterate over and describe available query functions (.Funcs) directly from the backend documentation.
The /res module is the core resource management hub for the application. It serves as the single source of truth for all non-programmatic assets, including user interface layouts, string constants, graphical elements, and configuration values.
The primary design philosophy of this module is the decoupling of UI definition from application logic. By externalizing these resources, the system achieves two critical architectural goals:
- Resource indirection (the R class): To bridge the gap between static files and executable code, the build system maps every resource in this directory to a unique integer ID. This allows logic files to reference resources via a typesafe “alias” (e.g., R.layout.main) rather than error-prone string paths.
- Type-based organization (layout/, values/, drawable/): This design choice ensures that the resource compiler can optimize asset processing (like shrinking unused images or pre-compiling XML) and provides a predictable mental model for developers.
- Localization readiness: By externalizing strings into values/, the project is architected for global deployment. Translating the application requires adding a qualified subdirectory (e.g., values-es/) rather than refactoring code.
- Layouts (/layout): Responsible for defining the structural arrangement of the UI. These files determine where components are placed and how they behave in relation to one another.
- Graphics (/drawable & /mipmap): Manage visual content. /drawable handles standard UI graphics (vectors, bitmaps, shapes), while /mipmap is specifically reserved for launcher icons to ensure they are available at the highest possible resolution regardless of the device's default density.
- Values (/values): A critical repository for primitive types and styles. It typically contains:
  - strings.xml: All user-facing text.
  - colors.xml: The application's color palette.
  - styles.xml: Reusable UI attribute sets that ensure visual consistency.
- Other assets (/raw & /menu): /raw holds arbitrary files (like audio or JSON config) that are needed in their original format, while /menu defines the structure of navigation and context menus.
[ App Logic ]        [ R.java / ID ]       [ /res System ]      [ Active Config ]
      |                     |                     |                     |
      | 1. Request Asset    |                     |                     |
      |---(e.g. R.string.ok)-->                   |                     |
      |                     | 2. Lookup ID        |                     |
      |                     |-------------------->|                     |
      |                     |                     | 3. Query State      |
      |                     |                     |-------------------->|
      |                     |                     | 4. Match (e.g. Locale=FR)
      |                     |                     |<--------------------|
      |                     | 5. Return Value     |                     |
      |<---("D'accord")-----|                     |                     |
When a change is made to a resource, the build system automatically updates the reference mapping. This ensures that the application logic remains stable even as the visual or textual content of the /res module evolves.
The /res/img directory serves as the centralized repository for static image assets used across the application's user interface. Rather than being a mere storage bucket, this module is organized to ensure that brand identity elements and UI-specific graphics are decoupled from the application logic and styling code.
The primary design goal for this module is asset consistency. By centralizing images here, the project avoids duplication, simplifies path resolution within stylesheets and components, and ensures that updates to visual branding (such as logos or icons) propagate throughout the entire system from a single source of truth.
The architecture of this module favors a flat or shallow hierarchy to minimize path complexity in imports. The choice of file formats follows standard web optimization practices:
By isolating these assets into /res/img, the project implements a “Resource-Based” separation of concerns. This allows developers to reference assets via consistent aliases or relative paths, reducing the risk of broken links during refactoring of component or page structures.
The directory is categorized by the functional role of the images rather than just their file types:
- Browser identity assets (favicon.ico): Specifically responsible for the application's presence outside the viewport, such as browser tabs, bookmark bars, and shortcut icons. The inclusion of favicon.ico in this directory ensures that the “brand at a glance” is managed alongside other visual resources.
[ /res/img/ ]          [ Styles/Components ]          [ Client Browser ]
      |                          |                            |
      | 1. Asset Definition      |                            |
      |---- logo.svg ----------->|                            |
      |                          | 2. URL Reference           |
      |                          | (e.g., background-image)   |
      |                          |--------------------------->|
      |                          |                            | 3. Fetch & Render
      |<-------------------------------[ GET /res/img/logo.svg ]
When a visual change is required, the workflow focuses on replacing the file within /res/img while maintaining the filename. This allows the application to update its visual state without requiring changes to CSS selectors or JSX/HTML templates.
The samplevariance module provides a command-line tool designed to analyze the stability and noise levels of performance benchmarks. Specifically, it processes “nanobench” results stored in Google Cloud Storage (GCS), where each benchmark run typically contains multiple samples (e.g., 10 repetitions).
By calculating the ratio between the median and the minimum values of these samples, the tool helps engineers identify “flaky” or high-variance benchmarks that may yield inconsistent performance data.
The tool is built as a high-throughput data processing pipeline that fetches, parses, and analyzes large sets of JSON telemetry data.
The core metric used is the ratio of median to minimum.
The sampleInfo struct captures these metrics alongside the traceid, which uniquely identifies the specific benchmark configuration (e.g., test name, device, OS).
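The core metric can be sketched as a small function (an illustration of the ratio, not the Go implementation, which uses go-moremath): the minimum approximates the benchmark's best achievable time, and a median far above it signals a noisy benchmark.

```typescript
// Ratio of median sample to minimum sample; values near 1.0 mean a
// stable benchmark, larger values mean high run-to-run variance.
function medianMinRatio(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  const median = n % 2 === 1
    ? sorted[(n - 1) / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  return median / sorted[0]; // sorted[0] is the minimum
}
```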
To handle thousands of JSON files efficiently, the tool employs a worker pool pattern using golang.org/x/sync/errgroup.
GCS Bucket ----> List Files ----> [Filename Channel]
                                          |
       +----------------------------------+----------------------------------+
       |                                  |                                  |
  [Worker 1]                         [Worker 2]                         [Worker n]
Download & Parse                  Download & Parse                  Download & Parse
       |                                  |                                  |
       +----------------------------------+----------------------------------+
                                          |
                                 [Mutex Protected]
                                          |
                                  [Global Slice] ----> Sort ----> CSV Output
main.go:

- initialize(): Handles command-line flags and sets up GCS clients and output writers. It defaults to a rolling 24-hour window if no prefix is provided.
- filenamesFromBucketAndObjectPrefix(): Uses an attribute-selection query to fetch only the names of files, reducing metadata overhead.
- traceInfoFromFilename(): The core logic unit. It integrates with perf/go/ingest/parser to extract raw sample values and uses the go-moremath library for statistical calculations.
- writeCSV(): Formats the final report, supporting truncation via the --top flag to focus only on the most problematic benchmarks.

The tool leverages the common go/query package. This allows users to pass complex filters via the --filter flag using a URL-query-like syntax (e.g., arch=arm64&config=8888). Only traces matching these key-value pairs are included in the variance analysis.
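The matching semantics can be sketched as follows, using URL-query parsing as an approximation of the go/query behavior (the real package also supports richer predicates):

```typescript
// A trace is included only when every key=value pair in the filter
// matches the trace's params exactly.
function matchesFilter(
  params: Record<string, string>,
  filter: string,
): boolean {
  const wanted = new URLSearchParams(filter);
  for (const [key, value] of wanted.entries()) {
    if (params[key] !== value) return false;
  }
  return true;
}
```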
The module includes a Makefile that simplifies common operations, such as running the tool against specific GCS paths or piping the results directly to temporary files for quick inspection.
The /scripts module provides a collection of administrative and developer tools designed to manage the lifecycle of data within the Perf ecosystem. This includes seeding local environments for development, migrating production-scale data for safe experimentation, and manually triggering the ingestion pipeline.
The module serves three primary purposes: seeding local environments, migrating production-scale data for safe experimentation, and manually triggering ingestion.
A significant portion of this module is dedicated to moving data between Spanner instances. The implementation chooses PGAdapter over native Spanner SDKs for data movement. This decision allows the migration logic to treat Spanner as a PostgreSQL-compatible database, enabling the use of the COPY protocol via the pgx library. This is significantly more efficient than standard INSERT statements for bulk data, as it streams raw data directly into the database's ingestion buffer.
To handle the multi-terabyte scale of production tables like tracevalues, the implementation avoids direct time-based filtering on the values themselves, which would be prohibitively slow. Instead, it utilizes a JOIN on the sourcefiles table, filtering by the file's creation timestamp to isolate a manageable subset of data for migration.
The scripts incorporate safeguards to prevent destructive operations:

- Hardcoded checks refuse to target known production instances as a destination.
- Before copying, the tooling verifies that the destination table does not already contain data for the requested time range, aborting that table's copy if it does.

For bulk writes, the scripts enable the `PARTITIONED_NON_ATOMIC` DML mode on destination instances. This allows Spanner to handle massive bulk operations across multiple partitions without hitting transaction size limits.

For local development, the module provides automated SQL seeding. Rather than relying on manual database entry, the scripts use PostgreSQL here-documents to inject complex JSON structures (such as Alert configurations) into local databases. This lets developers quickly replicate specific bug states or UI layouts with consistent data.
`copy_data_to_experimental_db`

This sub-module manages the flow of data from production to development environments. It coordinates the lifecycle of the PGAdapter containers and the Go-based streaming logic.
[ Production Spanner ]             [ Experimental Spanner ]
          |                                   |
          | (Spanner Protocol)                | (Spanner Protocol)
          v                                   v
[ PGAdapter :5432 ]                [ PGAdapter :5433 ]
          |                                   |
          +-----[ copy_data (Go Binary) ]-----+
                | (PostgreSQL Protocol)
                1. Query source (via JOIN on sourcefiles)
                2. Stream rows via CopyFrom interface
                3. Apply to destination with Partitioned DML
`add_demo_alert_to_demo_db.sh`

This script automates the population of the Alerts and Subscriptions tables. It is designed to create a “ready-to-use” state for the Perf UI by:

- Inserting an Alert JSON payload that covers common regression detection parameters (e.g., stepfit algorithm, absolute step).
- Using `EXTRACT(EPOCH FROM NOW())` to keep time-sensitive fields current, preventing immediate expiration of demo data.

`upload_extracted_json_files.sh`

This utility bridges local performance test results to the cloud ingestion pipeline. It enforces the strict directory structure required by the Perf ingester:

- Files are uploaded under `gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH`.

This module provides a utility for copying data from a production Google Cloud Spanner database to an experimental or development Spanner instance. It is designed to facilitate testing and debugging with real-world data volumes and distributions without risking the integrity of production environments.
The migration process leverages PGAdapter, a proxy that allows Cloud Spanner to be accessed via the PostgreSQL wire protocol. By running two instances of PGAdapter, the migration script can treat both the source (Production) and the destination (Experimental) as standard PostgreSQL databases, using the pgx library's efficient CopyFrom functionality to stream data between them.
Rather than using Spanner-specific SDKs for row-by-row manipulation, the module uses PGAdapter to expose Spanner through a PostgreSQL interface. This allows the use of the COPY protocol, which is significantly faster for bulk data movement than individual INSERT statements.
To prevent accidental corruption of production or existing experimental data:

- The `run_two_spanners.sh` script includes hardcoded checks to prevent known production instances from being targeted as the destination.
- `copy_data.go` checks whether the destination table already contains data for the requested time range. If data exists, the script aborts for that table to avoid duplication.

The `tracevalues` table in Perf is typically massive (multi-terabyte), and standard time-based filtering on it is inefficient. To address this, the script implements a specific optimization:

- Performing a `JOIN` with the `sourcefiles` table.
- Filtering on the `createdat` timestamp of the source file rather than the trace value itself, allowing manageable subsets of data to be migrated based on ingestion time.

The script sets `SPANNER.AUTOCOMMIT_DML_MODE='PARTITIONED_NON_ATOMIC'` on the destination connection. This is a Spanner-specific optimization for bulk operations that allows the database to execute changes across multiple partitions independently, avoiding the overhead of a single massive transaction that would exceed Spanner's mutation limits.
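The query shape and the session setup can be sketched as below. This is a hedged illustration: `buildCopyQuery` and the column names (`source_file_id`) are hypothetical, and the real `copy_data.go` derives its queries from the `tableToColumns` schema map rather than hardcoded strings.

```go
package main

import "fmt"

// buildCopyQuery assembles an export query for tracevalues that restricts the
// copy by the ingestion time recorded in sourcefiles, instead of scanning the
// multi-terabyte tracevalues table directly. Column names are illustrative.
func buildCopyQuery(hours int) string {
	return fmt.Sprintf(
		"SELECT tv.* FROM tracevalues AS tv "+
			"JOIN sourcefiles AS sf ON tv.source_file_id = sf.source_file_id "+
			"WHERE sf.createdat > NOW() - INTERVAL '%d hours'", hours)
}

// destinationSetup is the session statement issued on the destination
// connection so that bulk DML runs partitioned and non-atomically.
const destinationSetup = "SET SPANNER.AUTOCOMMIT_DML_MODE='PARTITIONED_NON_ATOMIC'"

func main() {
	fmt.Println(destinationSetup)
	fmt.Println(buildCopyQuery(24))
}
```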
`run_two_spanners.sh`

This bash script manages the environment setup. It launches two Docker containers running PGAdapter:

- One on port 5432, pointing at the source production instance (e.g., `chrome_int`).
- One on port 5433, pointing at the destination experimental instance.

`copy_data.go`

The core logic of the migration. It is responsible for:

- Maintaining a `tableToColumns` map which defines the schema for Perf tables (e.g., `regressions2`, `commits`, `postings`).
- Implementing the `pgx.CopyFromSource` interface to pipe rows directly from the source query results into the destination's `CopyFrom` command.

`BUILD.bazel`

Defines the Go binary and library dependencies. Notably, it links to `//perf/go/sql/spanner`, ensuring that the script uses the same column definitions as the main Perf application.
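The streaming pattern can be sketched without a live database. The `copyFromSource` interface below mirrors the shape of pgx's `CopyFromSource`; the row data and helper names are hypothetical, and `copyRows` stands in for the destination's `CopyFrom` call.

```go
package main

import "fmt"

// copyFromSource mirrors the shape of pgx's CopyFromSource interface: an
// iterator that the COPY machinery pulls rows from one at a time.
type copyFromSource interface {
	Next() bool
	Values() ([]interface{}, error)
	Err() error
}

// sliceSource adapts an in-memory [][]interface{} (standing in for rows
// streamed off the source query) to the iterator interface.
type sliceSource struct {
	rows [][]interface{}
	idx  int
}

func (s *sliceSource) Next() bool                     { s.idx++; return s.idx <= len(s.rows) }
func (s *sliceSource) Values() ([]interface{}, error) { return s.rows[s.idx-1], nil }
func (s *sliceSource) Err() error                     { return nil }

// copyRows drains the source, standing in for conn.CopyFrom on the
// destination; it returns the number of rows "copied".
func copyRows(src copyFromSource) (int, error) {
	n := 0
	for src.Next() {
		if _, err := src.Values(); err != nil {
			return n, err
		}
		n++
	}
	return n, src.Err()
}

func main() {
	src := &sliceSource{rows: [][]interface{}{
		{"trace-1", 12.5}, // hypothetical (trace_id, value) rows
		{"trace-2", 9.8},
	}}
	n, _ := copyRows(src)
	fmt.Println(n) // 2
}
```

Because the rows never materialize as a full slice inside the COPY machinery, this pattern keeps memory flat regardless of how many rows the source query returns.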
The following diagram illustrates how data flows from the production Spanner instance to the experimental one through the proxy layer:
[ Production Spanner ]             [ Experimental Spanner ]
          |                                   |
          | (Spanner Protocol)                | (Spanner Protocol)
          |                                   |
[ PGAdapter :5432 ]                [ PGAdapter :5433 ]
          |                                   |
          +-------< copy_data.go >------------+
                  (PostgreSQL Protocol)
                  1. Query source for recent data
                  2. Stream rows via CopyFrom interface
                  3. Insert into destination
The migration follows a three-step logic:

1. Determine the scope: the time window is controlled by `--duration`. If `--table-name all` is used, the script iterates through all known tables in the `tableToColumns` map; otherwise, it targets a specific table.
2. Query the source for rows within that window (via the `sourcefiles` JOIN for `tracevalues`).
3. Stream the rows into the destination via the `CopyFrom` interface.

The /secrets module provides a set of automated tools for managing the sensitive credentials, service accounts, and OAuth tokens required by various Skia Perf components. It ensures that services—such as data ingestion, email alerting, and database backups—have the necessary permissions to interact with Google Cloud Platform (GCP) resources and external APIs securely.
The module is designed to handle three primary types of sensitive information:

- Service account credentials granting access to GCP resources: GCS buckets (e.g., `skia-perf`), Pub/Sub topics, and Cloud Trace.
- OAuth2 tokens used for email alerting.
- Kubernetes secrets that deliver these credentials to running services.

The module heavily relies on automated scripting to ensure reproducible and consistent permission sets. Most scripts (e.g., `create-perf-ingest-sa.sh`, `create-perf-sa.sh`) follow a standardized workflow:

- Creating the service account and assigning the specific IAM roles (e.g., `roles/pubsub.editor`, `roles/cloudtrace.agent`) required for a service to function.
- Sourcing `../bash/ramdisk.sh` to perform sensitive operations in memory, ensuring that temporary secret files or JSON keys are never written to physical disk.

The `create-email-secrets.sh` script manages the complex process of authorizing Perf to send emails. It bridges the gap between Google's OAuth2 requirements and Kubernetes secrets:

- It consumes a `client_secret.json` (obtained from the GCP Console) and then executes a local Go tool (`three_legged_flow`) to generate a `client_token.json`.
- Secret names are sanitized for Kubernetes (converting `@` and `.` to `-`).

The following diagram illustrates the lifecycle of a service account creation within this module:
Local Script Execution
        |
        v
[ RAMDISK Creation ] --------> [ GCP IAM API ]
        |                           |
        |                           +-- Create Service Account
        |                           +-- Assign IAM Roles (PubSub, GCS, Trace)
        |                           +-- Bind Workload Identity (K8s <-> GCP)
        v
[ Generate JSON Key ] (Optional)
        |
        v
[ kubectl create secret ] ----> [ Kubernetes Cluster ]
        |
        v
[ RAMDISK Cleanup ]
- `create-email-secrets.sh`: Handles OAuth2 token generation for Gmail integration. Specifically creates secrets for the alertserver.
- `create-perf-ingest-sa.sh`, `create-perf-sa.sh`: Configure permissions for the core Perf processes to read from GCS buckets (`skia-perf`, `cluster-telemetry-perf`) and write to Pub/Sub for data processing.
- `create-flutter-perf-service-account.sh`: Tailored permissions for the Flutter-specific Perf instance.
- `create-perf-cockroachdb-backup-service-account.sh`: Minimalist account with `roles/storage.objectAdmin` specifically for database backup cronjobs.

The smoke_tests module provides a suite of high-level integration tests for the Perf application. These tests use Puppeteer to automate a headless (or headed) Chrome browser, simulating real user interactions to ensure that critical pages and components load correctly and function as expected in a live environment.
The primary goal of these tests is to verify the “health” of the system rather than exhaustive feature testing. They focus on:

- Critical pages loading without console, page, or network errors.
- Main functional components rendering within a fixed time budget.
Most tests interact with instances protected by Google Identity-Aware Proxy (IAP) or a local auth-proxy.
- `alerts_nodejs_test.ts` and `cluster_nodejs_test.ts` use the `google-auth-library` to fetch an ID token. This token is injected into the Puppeteer page's extra HTTP headers, allowing the automated browser to bypass the IAP login screen.
- Local runs target an auth-proxy at `http://localhost:8003`. The `utils.ts` file manages the `PERF_BASE_URL` and provides helper functions to apply standard test configurations (like cookies and logging).

The module supports different execution modes depending on the developer's needs:
- Headless (default): used for automated runs.
- Debug via CRD: if the `DEBUG_VIA_CRD` environment variable is set, the tests can launch a browser visible through Chrome Remote Desktop. This introduces a startup delay to allow the developer to switch windows and watch the interaction.

+------------------+       +-------------------+       +-----------------------+
| Bazel / Node | | Puppeteer/Utils | | Target Perf Instance |
+------------------+ +-------------------+ +-----------------------+
| 1. Launch Test | ----> | 2. Launch Browser | | |
| | | 3. Auth Injection | ----> | 4. Request Page |
| | | | | 5. Return HTML/JS |
| 7. Check Result | <---- | 6. Wait for Select| <---- | |
+------------------+ +-------------------+ +-----------------------+
| |
+---- (On Failure) -------+----> [ Take Screenshot & Log Errors ]
`utils.ts`

The `utils.ts` file centralizes common logic to keep individual test files clean:

- `applyPageDefaults`: Attaches event listeners to the Puppeteer page to capture `console`, `pageerror`, and network failures. It also sets a `puppeteer=true` cookie, which signals the Perf frontend to disable non-deterministic behaviors like animations or simulated RPC latency.
- `browserForSmokeTest`: Abstractly handles the creation of the browser instance, switching between headless and debug modes based on environment variables.

Regression tests (`regression_page_nodejs_test.ts`) are more complex than simple load tests. They navigate to specific subscription views (e.g., V8 or Fuchsia) and use `Promise.race` to wait for either a populated anomaly table or a “clear” message. This handles the non-deterministic nature of production data, where anomalies may or may not exist at any given time.

Load tests like `perf-chrome-public-load-a_nodejs_test.ts` verify that the primary routing endpoints (`/a`, `/m`, `/e`) render their main functional components (like `#anomaly-table` or `#test-picker`) within the 5-second budget.

Tests are tagged as `manual` in the `BUILD.bazel` file, indicating they are typically run against a specific local or development instance rather than a hermetic build environment.
The Makefile provides shorthand commands for running these tests:
- `test-regressions`: Runs the standard regression suite.
- `test-regressions-crd`: Runs the suite with settings optimized for debugging via Chrome Remote Desktop, streaming the output to the terminal.

Developers can point the suite at any instance by overriding the `PERF_BASE_URL` environment variable during the Bazel invocation.