Module: /

Skia Perf Technical Documentation

1. High-Level Overview

Project Objectives: Skia Perf is a performance monitoring system designed to ingest, store, analyze, and visualize performance data for various projects, with a primary focus on Skia and related systems (e.g., Flutter, Android, Chrome). Its core objectives are:

  1. Centralized Performance Data Storage: Provide a robust and scalable repository for performance metrics collected from diverse benchmark runs.
  2. Interactive Data Exploration: Offer web-based dashboards that allow users to query, explore, and visualize performance trends over time, across different configurations and code revisions.
  3. Automated Regression Detection: Implement algorithms to automatically identify statistically significant performance regressions (and improvements) as new data is ingested.
  4. Alerting and Notification: Notify relevant stakeholders (developers, performance engineers) about detected regressions.
  5. Triage and Investigation Support: Provide tools to help users triage regressions, associate them with code changes, and track their resolution.
  6. Integration with Developer Workflows: Connect performance data with version control systems (Git) and issue trackers.

Functionality: Perf consists of several key components that work together:

  • Data Ingestion: Processes performance data files (typically JSON) uploaded to Google Cloud Storage (GCS). These files are parsed, validated, and their metrics are stored against corresponding Git commit hashes.
  • Data Storage: Uses SQL databases (primarily CockroachDB, with support for Spanner) to store metadata (commits, alert configurations, regression statuses) and trace data in a specialized tiled layout.
  • Clustering & Regression Detection: Employs k-means clustering and step detection algorithms on time-series trace data to identify regressions or significant performance shifts. This can be run continuously or triggered by new data events.
  • Frontend UI: A web application (built with Go on the backend and TypeScript/Lit/Web Components on the frontend) that provides interactive dashboards for:
    • Plotting performance metrics over commit ranges.
    • Querying traces based on various parameters.
    • Configuring alerts.
    • Triaging detected regressions.
    • Viewing commit details and associated performance changes.
  • Alerting System: Allows users to define alerts based on specific queries and thresholds. When regressions matching these alerts are found, notifications can be sent (e.g., email, issue tracker integration).
  • Command-Line Tools: Provides perfserver (to run the different services) and perf-tool (for administrative tasks, data inspection, and database backups/restores).

2. Project-Specific Terminology

  • Trace: A single time series of performance measurements for a specific test under a specific configuration (e.g., memory usage for draw_a_circle on arch=x86,config=8888). Trace IDs are structured key-value strings like ,arch=x86,config=8888,test=draw_a_circle,units=ms,.
  • CommitNumber: An internal, monotonically increasing integer assigned by Perf to each Git commit as it's processed. This provides a linear sequence for ordering data.
  • Tile: A logical grouping of commits. Trace data is stored in relation to these tiles. The tile_size (number of commits per tile) is configurable and affects how data is sharded and queried.
  • ParamSet: A collection of unique parameter key-value pairs observed in the data within a certain commit range (or tile range). Used to populate UI query builders.
  • DataFrame: A tabular data structure, similar to R's dataframes or Pandas DataFrames, used on the backend and frontend. It holds trace values indexed by commit and trace ID, along with header information (commit details).
  • Cluster / Clustering: The process of grouping similar traces together using k-means clustering. This is a core part of regression detection, as a significant change in the centroid of a cluster can indicate a regression.
  • Regression (Statistic): A numerical value (StepSize / LeastSquaresError) calculated for a cluster's centroid after fitting a step function. It measures how much the centroid's behavior resembles a step change. High absolute values are “Interesting.” A minimal sketch of this calculation appears after this list.
  • Alert (Configuration): A user-defined configuration that specifies a query to select traces, a detection algorithm, grouping parameters, and notification settings for finding regressions.
  • Ingestion Format: A specific JSON structure (documented in FORMAT.md) that Perf expects for input data files.
  • Shortcut: A saved URL or configuration, often represented by a short, hashed ID, for quickly accessing a specific view or set of traces.
  • Triage: The process of reviewing a detected regression, determining if it's a genuine issue or an expected change/noise, and marking it accordingly (e.g., “Bug,” “Ignore”).
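
The following is a minimal, self-contained sketch of the Regression statistic described above: split a (normalized) trace or centroid at its midpoint, fit a two-level step function, and divide the step size by the least-squares error of the fit. The midpoint split and helper names are illustrative assumptions; the actual logic lives in /go/stepfit.

```go
package main

import (
	"fmt"
	"math"
)

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// regressionStat fits a two-level step function to trace, split at the
// midpoint, and returns StepSize / LeastSquaresError. Large absolute values
// mark the trace (or cluster centroid) as "Interesting".
func regressionStat(trace []float64) float64 {
	mid := len(trace) / 2
	before, after := trace[:mid], trace[mid:]
	mBefore, mAfter := mean(before), mean(after)
	stepSize := mAfter - mBefore

	// Least-squares error of the fitted step function.
	sse := 0.0
	for _, x := range before {
		sse += (x - mBefore) * (x - mBefore)
	}
	for _, x := range after {
		sse += (x - mAfter) * (x - mAfter)
	}
	lse := math.Sqrt(sse)
	if lse == 0 {
		lse = 1e-9 // Avoid division by zero on perfectly flat halves.
	}
	return stepSize / lse
}

func main() {
	// A normalized trace with an obvious step up at the midpoint.
	trace := []float64{-1, -1.1, -0.9, -1, 1, 1.1, 0.9, 1}
	fmt.Printf("Regression = %.2f\n", regressionStat(trace))
}
```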

3. Overall Architecture

Perf follows a services-oriented architecture, where the main perfserver executable can run in different modes (frontend, ingest, cluster, maintenance). Data flows from external benchmark systems into Perf, where it's processed, stored, analyzed, and finally presented to users.

Data Flow and Main Components:

External Benchmark Systems
        |
        V
[Data Files (JSON) in Perf Ingestion Format]
        | (Uploaded to Google Cloud Storage - GCS)
        V
GCS Bucket (e.g., gs://skia-perf/nano-json-v1/)
        | (Pub/Sub event on new file arrival)
        V
Perf Ingest Service(s) (`perfserver ingest` mode)
        |   - Parses JSON files (see /go/ingest/parser)
        |   - Validates data (see /go/ingest/format)
        |   - Associates data with Git commits (see /go/git)
        |   - Writes trace data to TraceStore (SQL, tiled) (see /go/tracestore)
        |   - Updates ParamSets (for UI query builders)
        |   - (Optionally) Emits Pub/Sub events for "Event Driven Alerting"
        V
SQL Database (CockroachDB / Spanner)
    |   - Trace Data (values, parameters, indexed by commit/tile)
    |   - Commit Information (hashes, timestamps, messages)
    |   - Alert Configurations
    |   - Regression Records (details of detected regressions, triage status)
    |   - Shortcuts, User Favorites, etc.
    |
    +<--> Perf Cluster Service(s) (`perfserver cluster` or `perfserver frontend --do_clustering` mode)
    |       - Loads Alert configurations
    |       - Queries TraceStore for relevant data
    |       - Performs clustering (k-means) (see /go/clustering2, /go/ctrace2)
    |       - Fits step functions to cluster centroids (see /go/stepfit)
    |       - Calculates Regression statistic
    |       - Stores "Interesting" clusters/regressions in the database
    |       - Sends notifications (email, issue tracker) (see /go/notify)
    |
    +<--> Perf Frontend Service (`perfserver frontend` mode)
    |       - Serves HTML, CSS, JS (see /pages, /modules)
    |       - Handles API requests from the UI (see /go/frontend, /API.md)
    |       - Queries database for trace data, alert configs, regressions
    |       - Formats data for UI display (often as DataFrames)
    |       - Manages user authentication (via X-WEBAUTH-USER header)
    |
    +<--> Perf Maintenance Service (`perfserver maintenance` mode)
            - Git repository synchronization
            - Database schema migrations (see /migrations)
            - Old data cleanup
            - Cache refreshing (e.g., ParamSet cache)

Rationale for Key Architectural Choices:

  • Decoupled Ingestion via GCS and Pub/Sub:
    • Why: This decouples data producers from Perf's internal processing. Producers only need to drop files in a GCS bucket. Pub/Sub provides a scalable and reliable way to notify ingesters about new files, allowing multiple ingester instances to pull work.
    • How: Ingesters subscribe to a Pub/Sub topic. GCS is configured to publish a message to this topic when a new file is finalized in the designated ingestion bucket/prefix.
  • SQL Database for Structured Data:
    • Why: SQL databases like CockroachDB and Spanner provide transactional consistency, scalability, and powerful querying capabilities needed for metadata, alert configurations, and regression tracking. CockroachDB offers PostgreSQL compatibility, which is widely used. Spanner provides horizontal scalability for very large datasets.
    • How: Go's database/sql package is used, with schema defined and managed by /go/sql and migration scripts in /migrations.
  • Specialized TraceStore:
    • Why: Performance trace data is time-series and can be voluminous. A generic relational model might not be optimal for the typical queries (fetching traces over commit ranges for specific parameter sets). The tiled approach with inverted indexes for parameters is designed for more efficient retrieval.
    • How: The TraceStore (/go/tracestore) implementation uses SQL tables but structures them to represent tiles of commits. ParamSets and Postings tables act as inverted indexes for fast lookup of traces matching specific key-value parameters. A posting-list sketch appears after this list.
  • Monolithic Executable (perfserver) with Modes:
    • Why: Simplifies deployment and reduces the number of distinct binaries to manage. A single executable can be configured to run as a frontend, an ingester, a clusterer, or a maintenance task.
    • How: perfserver uses command-line flags and subcommands to determine its operational mode. Configuration files (/configs/*.json) further dictate behavior within each mode.
  • K-Means Clustering for Regression Detection:
    • Why: K-means is a well-understood clustering algorithm suitable for grouping traces with similar performance characteristics. Changes in these groups over time can signal regressions. Traces are normalized before clustering to make them comparable despite different scales.
    • How: Implemented in /go/clustering2 and /go/kmeans. ctrace2 handles trace normalization.
  • Frontend/Backend Separation:
    • Why: Standard practice for web applications. Allows independent development and scaling of the UI and the backend logic.
    • How: Backend (Go) serves JSON APIs. Frontend (TypeScript/Lit) consumes these APIs to render interactive views.
  • Event-Driven Alerting (Optional):
    • Why: For very large and sparse datasets (like Android), continuous clustering over all alerts can be resource-intensive and slow. Event-driven alerting processes only the data relevant to recently updated traces, reducing latency and computational load.
    • How: Ingesters publish Pub/Sub events containing IDs of updated traces. Clusterers subscribe to these events and run relevant alert configurations only for the affected data.
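
As a concrete illustration of the inverted-index rationale above (see the Specialized TraceStore item), the sketch below resolves a query like config=8888 and arch=x86 by intersecting sorted posting lists of trace IDs. The in-memory map and integer trace IDs are deliberate simplifications, not the SQLTraceStore layout.

```go
package main

import "fmt"

// postings maps "key=value" to a sorted list of trace IDs (integers within a
// tile) that contain that key/value pair. In the real store this is a SQL
// table keyed by (tile, param_key, param_value).
var postings = map[string][]int{
	"config=8888": {1, 2, 5, 7, 9},
	"arch=x86":    {2, 3, 5, 9, 11},
}

// intersect returns the trace IDs present in both sorted lists.
func intersect(a, b []int) []int {
	out := []int{}
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	// "config=8888 AND arch=x86" resolves to the intersection of the two
	// posting lists: [2 5 9].
	fmt.Println(intersect(postings["config=8888"], postings["arch=x86"]))
}
```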

4. Module Responsibilities and Key Components

This section focuses on significant modules beyond simple file/directory descriptions.

  • /go/config:

    • Responsibility: Defines and validates the structure for instance configuration files (InstanceConfig). This is the central place where all settings for a Perf deployment (database, ingestion sources, Git repo, UI features, notification settings) are specified.
    • Why: Configuration files allow Perf to be deployed for different projects with different data sources and requirements without code changes. A strongly-typed Go struct ensures that configurations are well-defined and can be validated.
    • How: InstanceConfig is a Go struct with fields for various aspects of the system. JSON files in /configs are unmarshaled into this struct. The module provides functions to load and validate these configurations.
  • /go/ingest:

    • Responsibility: Orchestrates the entire data ingestion pipeline. This includes watching for new files, parsing them according to the format.Format specification, extracting performance metrics and metadata, associating them with Git commits, and writing the data to the TraceStore.
    • Why: This module is the entry point for all performance data into the Perf system. It needs to be robust, handle various data formats (though primarily the standard JSON format), and ensure data integrity.
    • Key Sub-components:
    • ingest/format: Defines the expected structure of input JSON files (format.Format Go struct) and provides validation. This ensures data consistency.
    • ingest/parser: Contains logic to parse the format.Format structure and extract individual trace measurements and their associated parameters.
    • ingest/process: Coordinates the steps: reading from a source (e.g., GCS via /go/file), parsing, resolving commit information (via /go/git), and writing to the TraceStore.
    • Workflow:
    • A Source (e.g., GCSSource via PubSub) indicates a new file.
    • process reads the file.
    • parser and format validate and extract Results.
    • For each Result, its git_hash is resolved to a CommitNumber using /go/git.
    • Traces are constructed and written to /go/tracestore.
  • /go/tracestore:

    • Responsibility: Manages the storage and retrieval of performance trace data. This is a critical component for efficient querying.
    • Why: Trace data is time-series and multi-dimensional. The TraceStore is designed to efficiently retrieve trace values for specific parameter combinations over ranges of commits.
    • How: It uses a “tiled” storage approach. Commits are grouped into tiles.
    • TraceValues table: Stores the actual metric values, often sharded by tile.
    • ParamSets table: Stores unique key-value pairs found in trace identifiers within each tile.
    • Postings table: An inverted index mapping (tile, param_key, param_value) to a list of trace IDs that contain that key-value pair within that tile. This structure allows queries like “get all traces where config=8888 and arch=x86” to be resolved efficiently by intersecting posting lists. SQLTraceStore is the primary implementation using the SQL database.
  • /go/git:

    • Responsibility: Interacts with Git repositories to fetch commit information (hashes, authors, timestamps, messages). It also caches this information in the SQL database to avoid repeated Git operations.
    • Why: Perf needs to correlate performance data with specific code changes. This module provides the link between git_hash values in ingested data and Perf's internal CommitNumber sequence.
    • How: It can use either a local Git checkout (via git CLI) or a Gitiles service API. It maintains a Commits table in the SQL database, mapping commit hashes to CommitNumbers and storing other metadata. It periodically updates its local Git repository clone or queries Gitiles for new commits.
  • /go/regression:

    • Responsibility: Handles the detection, storage, and management of performance regressions.
    • Why: This is a core function of Perf. It provides the logic to identify when performance has changed significantly and to track the status of these findings.
    • How:
    • It uses clustering results (from /go/clustering2) and step-fit analysis (from /go/stepfit) to identify “Interesting” clusters.
    • Store interface (implemented by sqlregression2store): Persists information about detected regressions, including the cluster summary, owning alert, commit hash, regression statistic, and triage status (New, Ignore, Bug).
    • The “Alerting” algorithm described in DESIGN.md (comparing new interesting clusters with existing ones based on trace fingerprints) is implemented here to manage the lifecycle of a regression.
    • Key Workflow for Alerting/Regression Tracking:

      Run clustering (e.g., hourly or event-driven)
        |
        V
      Identify "Interesting" new clusters (high |Regression| score)
        |
        V
      For each new Interesting Cluster:
        Compare fingerprint (top N traces) with existing relevant Clusters in DB
        |
        +-- No match?    --> New Regression: Store in DB with status "New".
        |
        +-- Match found? --> Update existing Regression if new one has better
                             |Regression| score. Keep triage status of existing.
  • /go/frontend:

    • Responsibility: Implements the backend for the Perf web user interface. It handles HTTP requests, interacts with data stores (TraceStore, AlertStore, RegressionStore, etc.), processes data, and serves JSON responses to the frontend.
    • Why: This module connects the user's browser interactions to Perf's data and analytical capabilities.
    • How: It uses Go's standard net/http package to define HTTP handlers for various API endpoints (e.g., fetching data for plots, listing alerts, updating triage statuses). It authenticates users based on the X-WEBAUTH-USER header. It often fetches data, converts it into DataFrame structures, and then serializes these to JSON for the frontend.
  • /modules (Frontend TypeScript):

    • Responsibility: Contains the TypeScript source code for all frontend custom elements (web components) and UI logic. These modules are compiled into JavaScript and CSS that run in the user's browser.
    • Why: This is where the user interface is built. Modularity (one component per file/directory) makes the frontend codebase manageable. Custom elements (often using Lit) provide encapsulation and reusability.
    • How: Each subdirectory typically defines one or more custom elements (e.g., plot-simple-sk, alert-config-sk, query-sk). These elements handle rendering, user interaction, and making API calls to the Go backend.
    • perf-scaffold-sk: Provides the main page layout (header, sidebar, content area).
    • explore-simple-sk / explore-sk: Core components for querying data and displaying plots.
    • json/index.ts: Contains TypeScript interfaces mirroring Go backend structs for type-safe API communication. This is crucial for ensuring frontend and backend data structures are compatible. It's often generated from Go source using /go/ts/ts.go.
  • /pages:

    • Responsibility: Defines the top-level HTML structure for each distinct page of the Perf application (e.g., alerts page, exploration page).
    • Why: These files serve as the entry points for specific views. They are kept minimal, primarily including the perf-scaffold-sk and the main page-specific custom element.
    • How: Each HTML file (e.g., alerts.html) includes the perf-scaffold-sk and the relevant page element (e.g., <alerts-page-sk>). An associated TypeScript file (e.g., alerts.ts) imports the necessary custom element definitions. Server-side Go templates inject initial context data (window.perf = {{ .context }};) into the HTML.
  • DESIGN.md:

    • Significance: This document is the primary source for understanding the high-level architecture, design rationale, and core algorithms of Perf, particularly for clustering and alerting.
    • Key Concepts Explained:
    • Clustering: Details the use of k-means clustering on normalized traces, the Euclidean distance metric, and the calculation of the “Regression” statistic (StepSize / LeastSquaresError) to identify “Interesting” clusters.
    • Alerting Algorithm: Explains how Perf identifies and tracks unique regressions over time by fingerprinting clusters and comparing new interesting clusters to existing ones. It outlines the schema for the clusters table (though the actual schema is in /go/sql and may have evolved to Regressions table).
    • Event Driven Alerting: Describes an alternative to continuous clustering, triggered by PubSub events when new data arrives. This is beneficial for large, sparse datasets.
  • FORMAT.md:

    • Significance: Defines the precise JSON structure that Perf ingesters expect for input data files.
    • Key Elements: Specifies fields like git_hash, key (for global parameters), and results (an array of measurements). Each result can have its own key (for test-specific parameters like test name and units) and either a single measurement or a more complex measurements object for statistics (min, max, median). This document is crucial for data producers who need to integrate with Perf. A hedged example input file appears after this list.
  • BUILD.bazel (Root):

    • Significance: Defines how the Perf application is built using Bazel. It specifies container images (perfserver, backendserver) that package the Go executables and necessary static resources (configs, frontend assets).
    • How: Uses skia_app_container rules to assemble Docker images. It copies the perfserver and perf-tool binaries, configuration files from /configs, and compiled frontend assets (HTML, JS, CSS from /pages built output) into the image. The entrypoint for the perfserver image is the perfserver executable itself.
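
To make the FORMAT.md description above more tangible, here is a hedged sketch that unmarshals a small input file into an illustrative Go struct. Only the fields mentioned in this document (git_hash, key, results with a per-result key and measurement) are modeled; the authoritative schema is format.Format in /go/ingest/format and FORMAT.md itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// These types mirror only the fields called out above; the canonical
// definition is format.Format in /go/ingest/format.
type result struct {
	Key         map[string]string `json:"key"`
	Measurement float64           `json:"measurement"`
}

type inputFile struct {
	GitHash string            `json:"git_hash"`
	Key     map[string]string `json:"key"`
	Results []result          `json:"results"`
}

// A made-up example file; the git_hash is a placeholder.
const sample = `{
  "git_hash": "0123456789abcdef0123456789abcdef01234567",
  "key": {"arch": "x86", "config": "8888"},
  "results": [
    {"key": {"test": "draw_a_circle", "units": "ms"}, "measurement": 22.3}
  ]
}`

func main() {
	var f inputFile
	if err := json.Unmarshal([]byte(sample), &f); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("commit %s: %d result(s), first = %v %s\n",
		f.GitHash, len(f.Results), f.Results[0].Measurement, f.Results[0].Key["units"])
}
```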

5. Key Workflows Illustrated (Pseudographic Diagrams)

A. New Alert Creation via UI and API:

User (in Perf UI, e.g., on /alerts page)
  |
  | Fills out Alert configuration form (<alert-config-sk> element)
  | Clicks "Save"
  |
  V
Frontend JS (<alert-config-sk>)
  |
  | 1. If new alert, GET /_/alert/new
  |    (Server responds with a pre-populated Alert JSON with id: -1)
  |
  | 2. Modifies this Alert JSON based on form input
  |
  | 3. POST modified Alert JSON to /_/alert/update
  |    (Authorization: Bearer token if auth is enabled)
  |
  V
Perf Backend (`/go/frontend/service.go` - UpdateAlertHandler)
  |
  | Receives Alert JSON
  | If alert.ID == -1, it's a new alert.
  | Validates Alert configuration
  | Persists Alert to SQL Database (via `alerts.Store`)
  | Responds 200 OK
  |
  V
SQL Database (Alerts Table)
  |
  | New Alert record is created or existing one updated.

Rationale:

  • The GET /_/alert/new step is a convenience. It provides the frontend with a valid Alert structure, including any instance-default values, simplifying new alert creation logic on the client.
  • Using id: -1 to signify a new alert during the POST to /_/alert/update is a common pattern to allow a single endpoint to handle both creation and updates. The backend inspects the ID to determine the correct action.
  • The API interactions are documented in API.md; a minimal client sketch follows.
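
A minimal sketch of this create-then-update flow from a client's perspective, assuming the endpoints behave as described above. The Alert JSON is treated as an opaque map, the base URL is a placeholder, and the modified field name is illustrative; consult API.md and /go/alerts for the real structure.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

const baseURL = "https://perf.example.com" // Placeholder instance URL.

func main() {
	// 1. Fetch a pre-populated Alert (id: -1 marks it as new).
	resp, err := http.Get(baseURL + "/_/alert/new")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var alert map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&alert); err != nil {
		log.Fatal(err)
	}

	// 2. Modify it as the <alert-config-sk> form would. The field name here is
	//    illustrative; see the Alert struct in /go/alerts for the real ones.
	alert["query"] = "config=8888&test=draw_a_circle"

	// 3. POST the modified Alert back; the backend creates it because id == -1.
	body, _ := json.Marshal(alert)
	req, _ := http.NewRequest(http.MethodPost, baseURL+"/_/alert/update", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	// If auth is enabled, an Authorization: Bearer token would be added here.

	updateResp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer updateResp.Body.Close()
	fmt.Println("status:", updateResp.Status)
}
```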

B. Data Ingestion and Event-Driven Regression Detection:

Benchmark System
  |
  | Produces performance_data.json (Perf Ingestion Format)
  | Uploads to GCS: gs://[bucket]/[path]/YYYY/MM/DD/HH/performance_data.json
  |
  V
Google Cloud Storage
  |
  | File "OBJECT_FINALIZE" event
  | Publishes message to PubSub Topic (e.g., "perf-ingestion-topic")
  |
  V
Perf Ingest Service(s) (Subscribed to "perf-ingestion-topic")
  |
  | 1. Receives PubSub message (contains GCS file path)
  | 2. Downloads performance_data.json from GCS
  | 3. Parses JSON, validates data (see /go/ingest/format, /go/ingest/parser)
  | 4. Looks up git_hash in /go/git to get CommitNumber
  | 5. Writes trace data to TraceStore (SQL tables)
  | 6. If Event Driven Alerting enabled for this instance:
  |    Constructs a list of Trace IDs updated by this file
  |    Publishes message (containing gzipped Trace IDs) to another PubSub Topic (e.g., "trace-update-topic")
  |
  V
Perf Cluster Service(s) (Subscribed to "trace-update-topic")
  |
  | 1. Receives PubSub message (with updated Trace IDs)
  | 2. For each Alert Configuration (/go/alerts):
  |    If Alert's query matches any of the updated Trace IDs:
  |      Run clustering & regression detection for THIS Alert,
  |      focusing on the commit range and data relevant to the updated traces.
  |      (Reduces scope compared to full continuous clustering)
  | 3. If regressions found:
  |    Store in SQL Database (Regressions table)
  |    Send notifications (email, issue tracker)

Rationale:

  • GCS as Entry Point: As described in FORMAT.md and DESIGN.md, GCS is the standard way data enters Perf. The YYYY/MM/DD/HH path structure is a convention.
  • Pub/Sub for Decoupling and Scalability: Ingesters don't need to poll GCS. Pub/Sub handles event delivery, and multiple ingesters can process files in parallel.
  • Event-Driven Clustering Optimization: DESIGN.md explicitly states this is for large/sparse datasets. Sending only updated Trace IDs significantly narrows the scope of clustering for each event, making it faster and less resource-intensive than re-clustering everything. PubSub's 10MB message limit is considered for gzipped trace ID lists. A publishing sketch follows.
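
A hedged sketch of the "publish updated trace IDs" step, using the Google Cloud Pub/Sub Go client and gzip as described above. The project and topic names are placeholders and the newline-separated payload encoding is an assumption; the real encoding lives in /go/ingestevents.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"context"
	"log"
	"strings"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	// Trace IDs touched by the file that was just ingested.
	traceIDs := []string{
		",arch=x86,config=8888,test=draw_a_circle,units=ms,",
		",arch=arm,config=gles,test=draw_a_circle,units=ms,",
	}

	// Gzip the IDs so the payload stays well under Pub/Sub's message size
	// limit. Newline separation is an illustrative encoding, not the real one.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(strings.Join(traceIDs, "\n"))); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}

	// Publish to the topic the clusterers subscribe to (names are placeholders).
	client, err := pubsub.NewClient(ctx, "my-gcp-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	res := client.Topic("trace-update-topic").Publish(ctx, &pubsub.Message{Data: buf.Bytes()})
	id, err := res.Get(ctx) // Block until the server acknowledges the message.
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("published message %s", id)
}
```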

This documentation provides a comprehensive starting point for a software engineer to understand the Skia Perf project. It covers its purpose, architecture, core concepts, and the rationale behind key design and implementation choices, referencing existing documentation and source code structure where appropriate.

Module: /cockroachdb

The /cockroachdb module provides a set of shell scripts designed to facilitate interaction with a CockroachDB instance, specifically one named perf-cockroachdb, which is presumed to be running within a Kubernetes cluster. These scripts abstract away some of the complexities of kubectl commands, offering streamlined access for common database operations.

The primary motivation behind these scripts is to simplify development and administrative workflows. Instead of requiring users to remember and type lengthy kubectl commands with specific flags and resource names, these scripts provide convenient, single-command access points.

Key Components and Responsibilities:

  • admin.sh: This script focuses on providing access to the CockroachDB administrative web interface.

    • Why: The web UI is a crucial tool for monitoring database health, performance, and managing cluster settings. Direct access via kubectl port-forward can be cumbersome to set up repeatedly.
    • How: It executes kubectl port-forward to map local port 8080 to port 8080 of the perf-cockroachdb-0 pod, then immediately opens this local address in Google Chrome. This assumes Google Chrome is installed and available in the system's PATH.

      User runs admin.sh
        |
        V
      Script executes: kubectl port-forward perf-cockroachdb-0 8080
        |
        V
      Local port 8080 now forwards to CockroachDB pod's port 8080
        |
        V
      Script executes: google-chrome http://localhost:8080
        |
        V
      CockroachDB Admin UI opens in Chrome
  • connect.sh: This script is designed to provide a SQL shell connection to the CockroachDB instance.

    • Why: Developers and administrators frequently need to execute SQL queries directly against the database for debugging, data manipulation, or schema inspection. Setting up an interactive kubectl run command with the correct image and arguments can be error-prone.
    • How: It uses kubectl run to create a temporary, interactive pod named androidx-cockroachdb. This pod uses the cockroachdb/cockroach:v19.2.5 Docker image. The --rm flag ensures the pod is deleted after the session ends, and --restart=Never prevents it from being restarted. The crucial part is the command passed to the pod: sql --insecure --host=perf-cockroachdb-public. This starts the CockroachDB SQL client, connecting insecurely to the database service exposed at perf-cockroachdb-public.

      User runs connect.sh
        |
        V
      Script executes: kubectl run androidx-cockroachdb -it --image=... --rm --restart=Never -- sql --insecure --host=perf-cockroachdb-public
        |
        V
      Temporary pod 'androidx-cockroachdb' is created
        |
        V
      CockroachDB SQL client starts inside the pod, connecting to 'perf-cockroachdb-public'
        |
        V
      User has an interactive SQL shell
        |
        V
      User exits shell -> Pod 'androidx-cockroachdb' is deleted
  • skia-infra-public-port-forward.sh: This script sets up a port forward for direct database connections, typically for use with a local CockroachDB SQL client or other database tools.

    • Why: While connect.sh provides an in-cluster SQL shell, sometimes a direct connection from the local machine is preferred, for instance, to use graphical SQL clients or specific client libraries that are not available within the temporary pod created by connect.sh. The perf-cockroachdb instance is likely within a private network in the Kubernetes cluster (namespace perf), and this script makes it accessible locally.
    • How: It leverages a helper script, ../../kube/attach.sh skia-infra-public (outside this module's scope, but it presumably handles Kubernetes context or authentication for the skia-infra-public cluster), to execute kubectl port-forward for the perf-cockroachdb-0 pod within the perf namespace. It maps local port 25000 to the pod's CockroachDB port 26257. The script also prints instructions on how to connect using the cockroach sql command once the port forward is active. The set -e command ensures the script exits immediately if any command fails, and set -x enables command tracing for debugging.

      User runs skia-infra-public-port-forward.sh
        |
        V
      Script prints connection instructions
        |
        V
      Script executes: ../../kube/attach.sh skia-infra-public kubectl port-forward -n perf perf-cockroachdb-0 25000:26257
        |
        V
      Port forward is established: local:25000 -> perf-cockroachdb-0:26257 (in 'perf' namespace)
        |
        V
      User can now run 'cockroach sql --insecure --host=127.0.0.1:25000' in another terminal

These scripts collectively aim to make interacting with the perf-cockroachdb instance as straightforward as possible by encapsulating the necessary kubectl commands and providing context-specific instructions or actions. They rely on the Kubernetes cluster being correctly configured and accessible, and on kubectl and potentially google-chrome being available on the user's system.

Module: /configs

The /configs directory houses JSON configuration files for various instances of the Perf performance monitoring system. Each file defines the specific behavior and data sources for a particular Perf deployment. These configurations are crucial for tailoring Perf to different projects and environments, enabling developers and performance engineers to monitor and analyze performance data effectively.

The core idea is to provide a declarative way to set up a Perf instance. Instead of hardcoding settings, these JSON files act as blueprints. Each file serializes to and from a Go struct named config.InstanceConfig. This struct serves as the canonical schema for all instance configurations, and its Go documentation provides detailed explanations of each field. This approach ensures consistency and makes it easier to manage and evolve the configuration options.
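
As a rough illustration of that serialization relationship, the sketch below unmarshals a pared-down instance configuration into a hand-written stand-in struct. Only a few of the fields discussed later in this section are modeled, and the JSON tags are educated guesses; the authoritative schema is config.InstanceConfig in /go/config.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// A pared-down stand-in for config.InstanceConfig; field and tag names are
// illustrative guesses based on the descriptions in this document.
type instanceConfig struct {
	URL             string `json:"URL"`
	DataStoreConfig struct {
		DataStoreType    string `json:"datastore_type"`
		ConnectionString string `json:"connection_string"`
		TileSize         int32  `json:"tile_size"`
	} `json:"data_store_config"`
	IngestionConfig struct {
		SourceConfig struct {
			SourceType string   `json:"source_type"`
			Sources    []string `json:"sources"`
		} `json:"source_config"`
	} `json:"ingestion_config"`
}

const sample = `{
  "URL": "https://perf.example.com",
  "data_store_config": {
    "datastore_type": "cockroachdb",
    "connection_string": "postgresql://root@localhost:26257/demo?sslmode=disable",
    "tile_size": 256
  },
  "ingestion_config": {
    "source_config": {
      "source_type": "dir",
      "sources": ["./demo/data/"]
    }
  }
}`

func main() {
	var cfg instanceConfig
	if err := json.Unmarshal([]byte(sample), &cfg); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s ingests from %v into %s (tile_size=%d)\n",
		cfg.URL,
		cfg.IngestionConfig.SourceConfig.Sources,
		cfg.DataStoreConfig.DataStoreType,
		cfg.DataStoreConfig.TileSize)
}
```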

Key Components and Responsibilities:

The primary responsibility of this module is to define and store these instance configurations. Each JSON file represents a distinct Perf instance, often corresponding to a specific project or a particular version of a project (e.g., a public vs. internal build, or a stable vs. experimental branch).

  • Instance-Specific Configuration Files (e.g., android2.json, chrome-public.json):

    • Why: Each project or system being monitored by Perf has unique requirements. These include where its performance data is stored (e.g., GCS buckets), how it's ingested (e.g., Pub/Sub topics), which Git repository tracks its code changes, how users authenticate, and how notifications for regressions are handled.
    • How: These files use a JSON structure that maps directly to the config.InstanceConfig Go struct.
    • URL: The public-facing URL of the Perf instance.
    • data_store_config: Defines the backend database (e.g., CockroachDB, Spanner), connection strings, and parameters like tile_size which can impact query performance and data retrieval efficiency. The choice between CockroachDB and Spanner often depends on scalability needs and existing infrastructure.
    • ingestion_config: Specifies how performance data is brought into Perf. This includes the source_type (e.g., gcs for Google Cloud Storage, dir for local directories), the specific sources (e.g., GCS bucket paths or local file paths), and Pub/Sub topics for real-time ingestion. This section is vital for connecting Perf to the data producers.
    • git_repo_config: Links Perf to the source code repository. This allows Perf to correlate performance data with specific code changes (commits). It includes the repository url, the provider (e.g., gitiles, git), and sometimes a commit_number_regex to extract meaningful commit identifiers from commit messages.
    • notify_config: Configures how alerts and notifications are sent when regressions are detected. This can range from none to html_email, markdown_issuetracker, or anomalygroup. It often includes templates for notification subjects and bodies, leveraging placeholders like {{ .Alert.DisplayName }} to include dynamic information.
    • auth_config: Defines the authentication mechanism, commonly using a header like X-WEBAUTH-USER for integration with existing authentication systems.
    • query_config: Customizes how users can query and view data, including which parameters are available for filtering (include_params), default selections, and URL value defaults to tailor the user experience. It can also include caching configurations (e.g., using Redis) to improve query performance by specifying cache_config with level1_cache_key and level2_cache_key.
    • anomaly_config: Contains settings related to anomaly detection, such as settling_time which defines how long Perf waits before considering new data for anomaly detection, helping to avoid flagging transient issues.
    • Other fields like contact, ga_measurement_id (for Google Analytics), feedback_url, trace_sample_proportion (to control the volume of detailed trace data collected), and favorites (for pre-defined links on the Perf UI) further customize the instance.
    • Example Workflow (Data Ingestion and Alerting for android2.json):
    • Data Production: Android benchmarks generate performance data.
    • Data Upload: This data is uploaded to GCS buckets specified in ingestion_config.source_config.sources (e.g., gs://android-perf-2/android2).
    • Pub/Sub Notification: A message is sent to the Pub/Sub topic perf-ingestion-android2-production.
    • Perf Ingestion Service: The Perf ingestion service, subscribed to this topic, reads the new data file from GCS.
    • Data Processing & Storage: Perf processes the data, associates it with the corresponding commit from the git_repo_config (e.g., https://android.googlesource.com/platform/superproject), and stores it in the CockroachDB instance defined in data_store_config.
    • Anomaly Detection: Perf's anomaly detection algorithms analyze the new data points.
    • Regression Found: If a regression is detected based on the anomaly_config.
    • Notification Sent: A notification is generated according to notify_config. For android2.json, this means an issue is filed in an issue tracker ("notifications": "markdown_issuetracker") with a subject and body formatted using the provided templates, including details like affected tests and devices.
  • local.json:

    • Why: Provides a standardized configuration for local development and manual testing of Perf. It's designed to be self-contained and not rely on external production services.
    • How: It typically points the ingestion_config to a local directory (integration/data) that contains sample data. This data is often the same data used for unit tests, ensuring consistency between testing environments. The database connection will also point to a local instance.
  • demo.json and demo_spanner.json:

    • Why: These configurations are likely used for demonstration purposes or for setting up small-scale, illustrative Perf instances. They showcase Perf's capabilities with sample data.
    • How: Similar to local.json, demo.json uses a local directory for data ingestion ("./demo/data/") and a local CockroachDB instance. demo_spanner.json is analogous but configured to use Spanner as the backend, demonstrating flexibility in data store choices. They often include simpler git_repo_config pointing to public demo repositories (e.g., https://github.com/skia-dev/perf-demo-repo.git). The favorites section in demo.json shows how to add curated links to the Perf UI.
  • /spanner subdirectory:

    • Why: This subdirectory groups configurations for Perf instances that specifically use Google Cloud Spanner as their backend data store. Spanner is chosen for its scalability, strong consistency, and global distribution capabilities, making it suitable for large-scale Perf deployments.
    • How: Files within this directory (e.g., spanner/chrome-public.json, spanner/skia-public.json) will have their data_store_config.datastore_type set to "spanner". They often include Spanner-specific settings or optimizations. For example, enable_follower_reads might be set to true in data_store_config for Spanner instances to distribute read load. Many of these configurations also define redis_config within their query_config.cache_config to further enhance query performance for frequently accessed data.
    • The optimize_sqltracestore flag, often set to true in Spanner configurations, indicates that specific optimizations for the SQL-based trace store are enabled, likely tailored to Spanner's characteristics.
    • Configurations like chrome-internal.json and chrome-public.json demonstrate sophisticated setups, including:
    • commit_number_regex in git_repo_config to extract structured commit positions.
    • temporal_config for integrating with Temporal workflows for tasks like regression grouping and bisection.
    • enable_sheriff_config to integrate with sheriffing systems for managing alerts.
    • trace_format: "chrome" indicates that the performance data adheres to the Chrome trace event format.

The choice of fields and their values within each JSON file reflects a series of design decisions aimed at balancing flexibility, performance, and operational manageability for each specific Perf instance. For instance, the tile_size in data_store_config is adjusted based on expected data characteristics and query patterns. Similarly, trace_sample_proportion is set to manage storage costs and processing load while still capturing enough data for meaningful analysis. The notify_config templates are crafted to provide actionable information to developers when regressions occur.

Module: /csv2days

csv2days Module Documentation

Overview

The csv2days module is a command-line utility designed to process CSV files downloaded from the Perf performance monitoring system. Its primary purpose is to simplify time-series data by consolidating multiple data points from the same calendar day into a single representative value. This is particularly useful when analyzing performance trends over longer periods, where daily granularity is sufficient and finer-grained timestamps can introduce noise or unnecessary complexity.

The core problem this module solves is the overabundance of data points when Perf exports data at a high temporal resolution (e.g., multiple commits per day). For certain types of analysis, this level of detail is not required and can make it harder to discern broader trends. csv2days transforms such CSVs by keeping only the first encountered data column for each unique day and aggregating subsequent values from the same day into that single column using a “max” aggregation strategy.

Design and Implementation

The module operates as a streaming processor. It reads the input CSV file row by row, processes the header to determine which columns to modify or drop, and then transforms each subsequent data row accordingly before writing it to standard output.

Key Design Choices:

  1. Command-Line Interface: The tool is designed as a simple command-line application for ease of integration into scripting workflows. It takes an input file path via the --in flag and outputs the transformed CSV to stdout. This follows common Unix philosophies for tool interoperability.
  2. Streaming Processing: Instead of loading the entire CSV into memory, which could be problematic for very large files, csv2days processes the file line by line. This makes the tool memory-efficient.
  3. Date-Based Grouping: The core logic revolves around identifying columns that represent timestamps. It uses a regular expression (datetime) to match RFC3339 formatted dates in the header row. The date part (YYYY-MM-DD) of these timestamps is used for grouping.
  4. “First Seen” Column for a Day: For each unique calendar day encountered in the header, only the first column corresponding to that day is retained in the output header. Subsequent columns from the same day are marked for removal.
  5. “Max” Aggregation: When multiple columns from the same day are encountered in a data row, the values from these columns are aggregated. The csv2days tool currently implements a “max” aggregation strategy: for the set of values corresponding to a single day, the maximum numerical value is chosen. If non-numerical values are encountered, the first value in the sequence is typically used.
  6. Reverse Sorted Index Removal: When removing columns, the indices of columns to be skipped (skipCols) are sorted in reverse order. This is crucial because removing an element from a slice shifts the indices of subsequent elements. Processing removals from right-to-left (largest index to smallest) ensures that the indices remain valid throughout the removal process.

Workflow:

The main workflow within transformCSV can be visualized as follows:

Read Input CSV File (--in flag)
        |
        v
Parse Header Row
        |
        +----------------------------------------------------------------------+
        | Identify Timestamp Columns (using RFC3339 regex)                     |
        | For each timestamp:                                                  |
        |   Extract Date (YYYY-MM-DD)                                          |
        |   If new date:                                                       |
        |     Add Date to Output Header                                        |
        |     Record current column as start of a new "run" for this day       |
        |   Else (same date as previous timestamp):                            |
        |     Mark current column for skipping (`skipCols`)                    |
        |     Increment length of current day's "run" (`runLengths`)           |
        |                                                                      |
        | Non-timestamp columns are added to Output Header as-is               |
        +----------------------------------------------------------------------+
        |
        v
Write Transformed Header to Output
        |
        v
Sort `skipCols` in Reverse Order
        |
        v
For each Data Row in Input CSV:
        |
        +----------------------------------------------------------------------+
        | Apply "Max" Aggregation:                                             |
        |   For each "run" of columns belonging to the same day (from header): |
        |     Find the maximum numerical value in the corresponding cells      |
        |     Replace the first cell of the run with this max value            |
        +----------------------------------------------------------------------+
        |
        v
Remove Skipped Columns (based on `skipCols` from header processing)
        |
        v
Write Transformed Data Row to Output
        |
        v
Flush Output Buffer
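
To make the header-processing half of this workflow concrete, here is a small standalone sketch: it keeps the first column seen for each calendar day, records the indices of later same-day columns in skipCols, and tracks per-day run lengths for the aggregation step. The variable names follow the descriptions in this document, but this is not the actual transformCSV code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matches an RFC3339 timestamp such as 2023-01-15T10:00:00Z; the date is the
// first capture group.
var datetime = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}`)

func main() {
	header := []string{
		"test", "config",
		"2023-01-15T01:00:00Z", "2023-01-15T13:00:00Z",
		"2023-01-16T02:00:00Z",
	}

	outHeader := []string{}
	skipCols := []int{}         // Columns dropped because their day was already seen.
	runLengths := map[int]int{} // First column of each day -> number of columns for that day.
	lastDate := ""
	runStart := -1

	for i, cell := range header {
		m := datetime.FindStringSubmatch(cell)
		if m == nil {
			outHeader = append(outHeader, cell) // Non-timestamp columns pass through.
			continue
		}
		date := m[1]
		if date != lastDate {
			// First column for a new day: keep it and start a new run.
			outHeader = append(outHeader, date)
			lastDate = date
			runStart = i
			runLengths[i] = 1
		} else {
			// Same day as the previous timestamp: mark for removal.
			skipCols = append(skipCols, i)
			runLengths[runStart]++
		}
	}

	fmt.Println(outHeader)  // [test config 2023-01-15 2023-01-16]
	fmt.Println(skipCols)   // [3]
	fmt.Println(runLengths) // map[2:2 4:1]
}
```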

Key Components/Files

  • main.go: This is the heart of the module.
    • main() function: Handles command-line flag parsing (--in for the input CSV file). It orchestrates the reading of the input file and calls transformCSV to perform the core logic. Error handling and logging are also managed here.
    • transformCSV(input io.Reader, output io.Writer) error: This is the core function responsible for the CSV transformation.
    • It initializes csv.Reader for input and csv.Writer for output.
    • Header Processing: It reads the first line (header) of the CSV. It iterates through the header cells.
      • A regular expression (datetime = regexp.MustCompile(...)) is used to identify columns containing RFC3339 timestamps.
      • It maintains lastDate to detect when a new day starts in the header sequence.
      • skipCols (a slice of integers) stores the indices of columns that represent subsequent entries for an already seen day and should thus be removed from the data rows.
      • runLengths (a map of int to int) stores, for each column that starts a sequence of same-day entries, how many columns belong to that day. This is used later for aggregation. For example, if columns 5, 6, and 7 are all for “2023-01-15”, runLengths[5] would be 3.
      • The output header (outHeader) is constructed by keeping the date part (YYYY-MM-DD) for the first occurrence of each day and omitting subsequent columns for the same day. Non-date columns are passed through unchanged.
    • Data Row Processing: It then reads the rest of the CSV file row by row.
      • applyMaxToRuns(s []string, runLengths map[int]int) []string: For each “run” of columns identified in the header as belonging to the same day, this function takes the corresponding values from the current data row and replaces the value in the first column of that run with the maximum of those values. The max(s []string) string helper function is used here to find the maximum float value, falling back to the first string if parsing fails.
      • removeAllIndexesFromSlices(s []string, skipCols []int) []string: After aggregation, this function removes the data cells corresponding to the skipCols identified during header processing. It uses removeValueFromSliceAtIndex repeatedly. It's crucial that skipCols is sorted in reverse order for this to work correctly.
    • The transformed row is then written to the output CSV.
    • Helper Functions:
    • removeValueFromSliceAtIndex(s []string, index int) []string: A utility to remove an element at a specific index from a string slice.
    • max(s []string) string: Iterates through a slice of strings, attempts to parse them as floats, and returns the string representation of the maximum float found. If no floats are found or parsing errors occur, it defaults to returning the first string in the input slice. This function underpins the aggregation logic.
  • main_test.go: Contains unit tests for the transformCSV function.
    • TestTransformCSV_HappyPath: Provides a simple input CSV string and the expected output string. It then calls transformCSV with these and asserts that the actual output matches the expected output. This serves as a concrete example of the module's behavior.
  • BUILD.bazel: Defines how the csv2days Go binary and its associated library and tests are built using Bazel. It specifies source files, dependencies (like skerr, sklog, util), and visibility.

The design decision to use strconv.ParseFloat and handle potential errors by continuing or defaulting implies that the tool is somewhat lenient with non-numeric data in columns expected to be numeric. The “max” operation will effectively ignore non-convertible strings unless all strings in a run are non-convertible, in which case the first string is chosen.
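
A hedged sketch of the two row-level helpers described above: the “max” selection with its fall-back-to-first-string behavior, and the reverse-ordered index removal. It mirrors the documented behavior rather than reproducing main.go.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// maxOf returns the string whose float value is largest; if nothing parses,
// it falls back to the first string in the slice.
func maxOf(s []string) string {
	best := s[0]
	bestVal, found := 0.0, false
	for _, v := range s {
		f, err := strconv.ParseFloat(v, 64)
		if err != nil {
			continue // Non-numeric cells are ignored for the comparison.
		}
		if !found || f > bestVal {
			bestVal, best, found = f, v, true
		}
	}
	return best
}

// removeIndexes drops the cells at the given indexes, processing them from
// largest to smallest so earlier removals don't invalidate later indexes.
func removeIndexes(s []string, skipCols []int) []string {
	sort.Sort(sort.Reverse(sort.IntSlice(skipCols)))
	for _, i := range skipCols {
		s = append(s[:i], s[i+1:]...)
	}
	return s
}

func main() {
	row := []string{"encode", "8888", "1.5", "2.25", "1.75"}
	// Columns 2-4 belong to the same day: keep the max in column 2...
	row[2] = maxOf(row[2:5])
	// ...then drop the now-redundant columns 3 and 4.
	row = removeIndexes(row, []int{3, 4})
	fmt.Println(row) // [encode 8888 2.25]
}
```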

Module: /demo

The demo module provides the necessary data and tools to showcase the capabilities of the Perf performance monitoring system. Its primary purpose is to offer a tangible and reproducible example of how Perf ingests and processes performance data. This allows users and developers to understand Perf's functionality without needing to set up a complex real-world data pipeline.

The core of this module revolves around a set of pre-generated data files and a Go program to create them.

Key Components:

  • /demo/data/ (Directory): This directory houses the actual demo data files in JSON format. Each file represents performance measurements associated with a specific commit hash.

    • Why: These static files serve as the input for a ‘dir’ type ingester in a demo Perf instance. They are structured according to the format.Format specification (defined in perf/go/ingest/format), which Perf understands. This allows for a simple and direct way to feed data into Perf for demonstration purposes.
    • How: Each JSON file (e.g., demo_data_commit_1.json) contains a git_hash, key (identifying the test environment like architecture and configuration), and results. The results section includes measurements for various tests (like “encode” and “decode”) across different units (like “ms” and “kb”). Some files also include links which can point to external resources relevant to the data point or the overall commit. The data in these files is designed to show some variation over commits to demonstrate Perf's ability to track changes and detect regressions/improvements. For instance, the decode measurement and encodeMemory show a deliberate shift in values starting from demo_data_commit_6.json.
  • generate_data.go: This Go program is responsible for creating the JSON data files located in the /demo/data/ directory.

    • Why: While the static data files are sufficient for running the demo, this program provides the means to regenerate or modify the demo dataset. This is crucial if the demo requirements change, if new Perf features need to be showcased with different data patterns, or if the underlying format.Format evolves. It ensures the demo data remains relevant and can be adapted.
    • How:
    • It defines a list of Git commit hashes. These hashes are specifically chosen from the skia-dev/perf-demo-repo repository, establishing a direct link between the performance data and a version control history, a common scenario in real-world Perf usage.
    • It iterates through these hashes. For each hash (a hedged sketch of this loop follows this list):
    • It programmatically generates performance values (e.g., encode, decode, encodeMemory). The generation includes some randomness (rand.Float32()) to make the data appear more realistic.
    • A deliberate change in the data generation logic is introduced for commits at index 5 and onwards (e.g., multiplier = 1.2), which leads to a noticeable shift in decode and encodeMemory values in the corresponding JSON files. This is done to demonstrate how Perf can track and visualize such changes.
    • It populates a format.Format struct (from go.skia.org/infra/perf/go/ingest/format) with the generated data, including the Git hash, environment keys, and the measurement results.
    • The format.Format struct is then marshaled into JSON with indentation for readability.
    • Finally, the JSON data is written to a file named according to the commit sequence (e.g., demo_data_commit_1.json) within the data subdirectory. The program uses the runtime.Caller(0) function to determine its own location, ensuring that the data directory is created relative to the Go file itself, making the script more portable.
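
The following is a minimal sketch of the value-generation loop described above. The rand.Float32() jitter and the multiplier shift at commit index 5 come straight from that description; the output structure here is a simplified map rather than the real format.Format struct, and the commit hashes are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

func main() {
	// Commit hashes would come from skia-dev/perf-demo-repo; placeholders here.
	hashes := []string{"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}

	for i, hash := range hashes {
		// From commit index 5 onward the values shift, so Perf has a visible
		// step to detect and display.
		multiplier := float32(1.0)
		if i >= 5 {
			multiplier = 1.2
		}

		// Simplified stand-in for the format.Format struct that the real
		// generate_data.go populates.
		file := map[string]interface{}{
			"git_hash": hash,
			"key":      map[string]string{"arch": "x86", "config": "8888"},
			"results": []map[string]interface{}{
				{"key": map[string]string{"test": "encode", "units": "ms"},
					"measurement": 10 + rand.Float32()},
				{"key": map[string]string{"test": "decode", "units": "ms"},
					"measurement": (20 + rand.Float32()) * multiplier},
			},
		}
		b, _ := json.MarshalIndent(file, "", "  ")
		// The real program writes demo_data_commit_<N>.json next to the source file.
		fmt.Printf("demo_data_commit_%d.json:\n%s\n", i+1, b)
	}
}
```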

Workflow for Demo Data Usage:

generate_data.go --(generates)--> /demo/data/*.json files
                                         |
                                         V
Perf Ingester (type 'dir', configured to read from /demo/data/)
                                         |
                                         V
Perf System (stores, analyzes, and visualizes the data)

The demo data is specifically designed to be used in conjunction with the perf/configs/demo.json configuration file and the https://github.com/skia-dev/perf-demo-repo.git repository. This linkage provides a complete, albeit simplified, end-to-end scenario for demonstrating Perf.

Module: /go

This main module, located at /go, serves as the root for all Go language components of the Perf performance monitoring system. It encompasses a wide array of functionalities, from data ingestion and storage to analysis, alerting, and user interface backend logic. The design promotes modularity, with specific responsibilities delegated to sub-modules.

The system is designed to handle large volumes of performance data, track it against code revisions, detect regressions automatically, and provide tools for developers and performance engineers to investigate and manage performance.

Key Design Philosophies and Architectural Choices:

  1. Modularity: The system is broken down into numerous sub-modules (e.g., /go/alerts, /go/ingest, /go/regression, /go/frontend), each with a well-defined responsibility. This promotes separation of concerns, making the system easier to develop, test, and maintain.
  2. Interface-Based Design: Many modules define interfaces for their core components (e.g., tracestore.Store, alerts.Store, regression.Store). This allows for different implementations to be swapped in (e.g., SQL-based stores vs. in-memory mocks for testing) and promotes loose coupling. A minimal sketch of this pattern appears after this list.
  3. Configuration-Driven Behavior: The /go/config module defines a comprehensive InstanceConfig structure, which is loaded from a JSON file. This configuration dictates many aspects of an instance's behavior, including database connections, data sources, alert settings, and UI features. This allows for flexible deployment and customization of Perf instances.
  4. Asynchronous Processing and Workflows: For long-running tasks like data ingestion, regression detection, and bisection, the system leverages asynchronous processing.
    • Go routines are widely used for concurrent operations.
    • The /go/progress module provides a mechanism for tracking and reporting the status of such tasks to the UI.
    • The /go/workflows module utilizes Temporal to orchestrate complex, multi-step processes like triggering bisections and processing their results. Temporal provides resilience and fault tolerance for these critical operations.
  5. Data Storage and Retrieval:
    • SQL Database: A relational database (primarily targeting CockroachDB, with Spanner compatibility) is the main persistence layer for most structured data, including alert configurations (/go/alerts), regression details (/go/regression), commit information (/go/git), user favorites (/go/favorites), subscriptions (/go/subscription), and more. The /go/sql module manages the database schema.
    • Trace Data (/go/tracestore): Performance trace data is stored in a tiled fashion, with inverted indexes to allow for efficient querying. This specialized storage approach is optimized for time-series performance metrics.
    • File Storage (GCS): Raw ingested data files and potentially other large artifacts are often stored in Google Cloud Storage. The /go/file and /go/filestore modules provide abstractions for interacting with these files.
  6. Caching: Various caching strategies are employed to improve performance:
    • In-memory LRU caches for frequently accessed data (e.g., in /go/git, /go/progress).
    • A dedicated /go/tracecache for trace IDs.
    • The /go/psrefresh module manages caching of ParamSets (used for UI query builders), potentially using Redis (/go/redis).
    • /go/graphsshortcut offers an in-memory cache for graph shortcuts, especially for development.
  7. External Service Integration:
    • Git: The /go/git module interacts with Git repositories (via local CLI or Gitiles API) to fetch commit information.
    • Issue Trackers: Modules like /go/issuetracker and /go/culprit integrate with issue tracking systems (e.g., Buganizer) for automated bug filing.
    • Chrome Perf: The /go/chromeperf module allows communication with the Chrome Performance Dashboard for reporting regressions or fetching anomaly data.
    • Pinpoint: The /go/pinpoint module provides a client for the Pinpoint bisection service.
    • LUCI Config: The /go/sheriffconfig module integrates with LUCI Config for managing alert configurations.
  8. Command-Line Tools:
    • /go/perfserver: The main executable for running different Perf services (frontend, ingestion, clustering, maintenance).
    • /go/perf-tool: A CLI for various administrative and data inspection tasks.
    • /go/initdemo: A tool to initialize a database for demo or development.
    • /go/ts: A utility to generate TypeScript definitions from Go structs for frontend type safety.
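
To illustrate the interface-based design in point 2 above, here is a generic, hedged sketch of the pattern: a tiny store interface with a SQL-backed implementation and an in-memory stand-in that tests can swap in. The names are hypothetical and far simpler than the real tracestore.Store or alerts.Store interfaces.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// Store is a deliberately tiny example interface; real Perf interfaces such
// as alerts.Store or regression.Store carry many more methods.
type Store interface {
	Get(ctx context.Context, id int64) (string, error)
}

// sqlStore would back the interface with CockroachDB/Spanner in production.
type sqlStore struct {
	db *sql.DB
}

func (s *sqlStore) Get(ctx context.Context, id int64) (string, error) {
	var name string
	err := s.db.QueryRowContext(ctx, "SELECT name FROM things WHERE id = $1", id).Scan(&name)
	return name, err
}

// memStore is the kind of in-memory implementation tests swap in.
type memStore struct {
	data map[int64]string
}

func (m *memStore) Get(_ context.Context, id int64) (string, error) {
	v, ok := m.data[id]
	if !ok {
		return "", fmt.Errorf("no such id: %d", id)
	}
	return v, nil
}

func describe(ctx context.Context, s Store) {
	// Callers depend only on the interface, not on which backend is in play.
	name, err := s.Get(ctx, 1)
	fmt.Println(name, err)
}

func main() {
	describe(context.Background(), &memStore{data: map[int64]string{1: "demo"}})
}
```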

Core Workflows (Conceptual High-Level):

  1. Data Ingestion:

    External Data Source (e.g., GCS event)
        |
        V
    /go/file (Source Interface: DirSource, GCSSource) --> Raw File Data
        |
        V
    /go/ingest/process (Orchestrator)
        |
        +--> /go/ingest/parser (Parses file based on /go/ingest/format) --> Extracted Traces & Metadata
        |
        +--> /go/git (Resolves Git hash to CommitNumber)
        |
        V
    /go/tracestore (Writes traces, updates inverted index & ParamSets)
        |
        V
    /go/ingestevents (Publishes event: "File Ingested")
    
  2. Regression Detection (Event-Driven Example):

    /go/ingestevents (Receives "File Ingested" event)
        |
        V
    /go/regression/continuous (Controller)
        |
        +--> /go/alerts (Loads matching Alert configurations)
        |
        +--> /go/dfiter & /go/dataframe & /go/dfbuilder (Prepare DataFrames for analysis)
        |
        V
    /go/regression/detector (Core detection logic)
        |
        +--> /go/clustering2 (KMeans clustering)
        |
        +--> /go/stepfit (Individual trace step detection)
        |
        V
    Detected Regressions
        |
        +--> /go/regression (Store results using Store interface, e.g., sqlregression2store)
        |
        +--> /go/notify (Format & send notifications via Email, IssueTracker, Chromeperf)
        |
        +--> /go/workflows (MaybeTriggerBisectionWorkflow for potential bisection)
    
  3. User Interaction (Frontend Request for Graph):

    User in Browser (Requests graph)
        |
        V
    /go/frontend (HTTP Handlers, e.g., graphApi)
        |
        +--> /go/ui/frame (ProcessFrameRequest)
        |    |
        |    +--> /go/dataframe/dfbuilder (Builds DataFrame based on query)
        |    |      |
        |    |      +--> /go/tracestore (Fetch trace data)
        |    |      +--> /go/git (Fetch commit data)
        |    |
        |    +--> /go/calc (If formulas are used)
        |    |
        |    +--> /go/pivot (If pivot table requested)
        |    |
        |    +--> /go/anomalies (Fetch anomaly data to overlay)
        |
        V
    FrameResponse (JSON data for UI) --> User in Browser
    
  4. Automated Bisection via Temporal Workflow:

    /go/workflows.MaybeTriggerBisectionWorkflow (Triggered by significant regression)
        |
        +--> Waits for related anomalies to group
        |
        +--> /go/anomalygroup (Loads anomaly group details)
        |
        +--> If GroupAction == BISECT:
        |        |
        |        +--> /go/gerrit (Activity: Get commit hashes from positions)
        |        |
        |        +--> Executes Pinpoint.CulpritFinderWorkflow (Child Workflow)
        |        |        (Pinpoint performs bisection)
        |        V
        |    Pinpoint calls back to /go/workflows.ProcessCulpritWorkflow
        |        |
        |        +--> /go/culprit (Activity: Persist culprit & Notify user)
        |
        +--> If GroupAction == REPORT:
                 |
                 +--> /go/culprit (Activity: Notify user of anomaly group)

Sub-Module Summaries (Illustrative, not exhaustive):

  • /go/alertfilter: Constants for alert filtering modes (e.g., ALL, OWNER). Ensures consistent filter definitions.
  • /go/alerts: Manages Alert configurations, their storage (sqlalertstore), and efficient retrieval (ConfigProvider with caching). Defines how performance regressions are detected.
  • /go/anomalies: Retrieves anomaly data, often by proxying to Chrome Perf, with a caching layer to improve performance.
  • /go/anomalygroup: Groups related anomalies to consolidate actions like bug filing or bisection. Uses a gRPC service and SQL store.
  • /go/backend: A gRPC backend service for internal, non-UI-facing APIs, promoting stable interfaces.
  • /go/builders: Centralized factory for creating core components (data stores, Git client) based on instance configuration, preventing cyclic dependencies.
  • /go/bug: Generates URLs for reporting bugs to issue trackers using configurable URI templates.
  • /go/calc: Evaluates formulas on trace data (not covered in detail in the provided docs).
  • /go/chromeperf: Client for interacting with the Chrome Performance Dashboard API (reporting regressions, fetching anomalies).
  • /go/clustering2: Implements k-means clustering for grouping similar performance traces.
  • /go/config: Defines and validates the InstanceConfig structure (loaded from JSON) that governs a Perf instance.
  • /go/ctrace2: Adapts trace data (normalization, handling missing points) for use with k-means clustering.
  • /go/culprit: Manages identified culprits (commits causing regressions), their storage, and notification.
  • /go/dataframe: Provides the DataFrame structure for handling performance data in a tabular, commit-centric way, inspired by R's dataframes.
  • /go/dfbuilder: Constructs DataFrame objects from TraceStore, handling query logic and data aggregation.
  • /go/dfiter: Iterates over DataFrames, typically by slicing a larger fetched frame. Used in regression detection.
  • /go/dryrun: Allows testing alert configurations without creating actual alerts, simulating regression detection.
  • /go/favorites: Manages user-saved favorite configurations/views, stored in SQL.
  • /go/file: Defines File and Source interfaces for abstracting file access from different origins (local, GCS via Pub/Sub).
  • /go/filestore: Implements fs.FS for local and GCS file access, providing a unified way to read files.
  • /go/frontend: Backend for the Perf web UI, handling HTTP requests, rendering templates, and interacting with data stores.
  • /go/git: Abstraction for Git repository interaction, caching commit data in SQL, with providers for local CLI and Gitiles.
  • /go/graphsshortcut: Manages shortcuts for collections of graph configurations, using hashed IDs for de-duplication.
  • /go/ingest: Orchestrates the data ingestion pipeline: reading files, parsing formats, and writing to TraceStore.
  • /go/ingestevents: Defines and handles serialization/deserialization of ingestion completion events for PubSub.
  • /go/initdemo: CLI tool to initialize a database (CockroachDB or Spanner emulator) with the Perf schema.
  • /go/issuetracker: Provides an interface and implementation for interacting with Google Issue Tracker (Buganizer).
  • /go/kmeans: Generic k-means clustering algorithm implementation using interfaces for flexibility.
  • /go/maintenance: Runs background tasks like Git repo sync, regression migration, query cache refresh, and old data deletion.
  • /go/notify: Framework for formatting and sending notifications (email, issue tracker) about regressions.
  • /go/notifytypes: Defines constants for different notification mechanisms and data providers.
  • /go/perf-tool: CLI for Perf administration (config management, data inspection, database maintenance).
  • /go/perfclient: Client for pushing performance data (typically from trybots) to Perf's GCS ingestion endpoint.
  • /go/perfresults: Fetches, parses, and processes performance results from Telemetry benchmarks (Chromium).
  • /go/perfserver: Main executable for Perf, consolidating frontend, ingestion, clustering, and maintenance services.
  • /go/pinpoint: Client for the Pinpoint (Chromeperf) bisection service.
  • /go/pivot: Aggregates and summarizes trace data within a DataFrame based on specified grouping criteria (like pivot tables).
  • /go/progress: Tracks the progress of long-running backend tasks and exposes it to the UI via HTTP polling.
  • /go/psrefresh: Manages and caches paramtools.ParamSet instances (used for UI query builders) to improve performance.
  • /go/redis: Manages interaction with Redis for caching, primarily to support the query UI.
  • /go/regression: Core module for detecting, storing, and managing performance regressions.
  • /go/samplestats: Performs statistical analysis on sets of performance data to identify significant changes between “before” and “after” states.
  • /go/sheriffconfig: Manages Sheriff Configurations (alerting rules defined in Protobuf, stored in LUCI Config), importing them into Perf.
  • /go/shortcut: Manages shortcuts for lists of trace keys, using hashed IDs.
  • /go/sql: Central module for SQL database schema management (definition, generation, validation, migration).
  • /go/stepfit: Analyzes time-series data to detect significant changes (“steps”) using various statistical algorithms.
  • /go/subscription: Manages alerting subscriptions, defining how to react to anomalies (e.g., bug filing details).
  • /go/tracecache: Caches trace identifiers for specific tiles and queries to improve performance.
  • /go/tracefilter: Filters trace data based on hierarchical paths, identifying “leaf” traces.
  • /go/tracesetbuilder: Efficiently constructs TraceSet and ReadOnlyParamSet objects from multiple, potentially disparate chunks of trace data using a worker pool.
  • /go/tracestore: Defines interfaces and SQL implementations for storing and retrieving performance trace data, using a tiled storage approach.
  • /go/tracing: Initializes and configures distributed tracing capabilities using OpenCensus.
  • /go/trybot: Manages performance data from trybots (pre-submit tests), including ingestion, storage, and analysis.
  • /go/ts: Utility to generate TypeScript definition files from Go structs for frontend type safety.
  • /go/types: Defines core data types used throughout Perf (e.g., CommitNumber, TileNumber, Trace).
  • /go/ui: Handles frontend requests and prepares data for display, bridging UI interactions with backend data sources.
  • /go/urlprovider: Generates URLs for various pages within the Perf application consistently.
  • /go/userissue: Manages associations between specific data points (trace key + commit position) and Buganizer issues.
  • /go/workflows: Defines and implements Temporal workflows for automating tasks like bisection triggering and culprit processing.

This comprehensive suite of modules works together to provide the Skia Perf performance monitoring system.

Module: /go/alertfilter

This module, go/alertfilter, provides constants that define different filtering modes for alerts. These constants are used throughout the Perf application to control which alerts are displayed or processed.

The primary motivation behind this module is to centralize the definition of alert filtering options. By having these constants in a dedicated module, we avoid scattering magic strings like “ALL” or “OWNER” throughout the codebase. This improves maintainability, reduces the risk of typos, and makes it easier to understand and modify the filtering logic. If new filtering modes are needed in the future, they can be added here, providing a single source of truth.

Key Components/Files:

  • alertfilter.go: This is the sole file in this module. It defines the string constants used for alert filtering.
    • ALL: This constant represents a filter that includes all alerts, irrespective of their owner or other properties. It is used when a user or a system process needs to view or operate on the entire set of active alerts.
    • OWNER: This constant represents a filter that includes only alerts assigned to a specific owner. This is crucial for user-specific views where individuals only want to see alerts relevant to their responsibilities.

Workflow/Usage Example:

Imagine a user interface for viewing alerts. The user might have a dropdown to select how they want to filter the alerts.

User Interface:
  [Alert List]
  Filter: [Dropdown: "ALL", "OWNER"]

Backend Logic:
  func GetAlerts(filterMode string, userID string) []Alert {
    if filterMode == alertfilter.ALL {
      // Fetch all alerts from the database.
      return database.GetAllAlerts()
    } else if filterMode == alertfilter.OWNER {
      // Fetch alerts owned by the current user.
      return database.GetAlertsByOwner(userID)
    }
    // ... other filter modes or error handling.
    return nil
  }

In this scenario, the backend uses the constants from the alertfilter module to determine the correct query to execute against the database. This ensures consistency and clarity in how filtering is applied.

Module: /go/alerts

The /go/alerts module is responsible for managing alert configurations within the Perf application. These configurations define the conditions under which users or systems should be notified about performance regressions. The module handles the definition, storage, retrieval, and caching of these alert configurations.

A core design principle is the separation of concerns between defining an alert's structure (config.go), providing access to these configurations (configprovider.go), and persisting them (store.go and its SQL implementation in sqlalertstore). This modularity allows for flexibility in how alerts are stored (e.g., potentially different database backends) and accessed.

Key Components and Responsibilities:

  • config.go: This file defines the Alert struct, which is the central data structure representing a single alert configuration. (A simplified sketch of the struct's shape appears after this component list.)

    • Why: It encapsulates all the parameters necessary to define an alert, such as the query to select relevant performance traces, the notification destination (email or issue tracker), thresholds for triggering, clustering algorithms, and the desired action (e.g., report, bisect).
    • How: The Alert struct includes fields for:
    • IDAsString: A string representation of the alert's unique identifier. This is used for JSON serialization to avoid potential issues with large integer handling in JavaScript. The BadAlertID and BadAlertIDAsAsString constants represent an invalid/uninitialized ID.
    • Query: A URL-encoded string that defines the criteria for selecting traces from the performance data.
    • GroupBy: A comma-separated list of parameter keys. If specified, the Query is expanded into multiple sub-queries, one for each unique combination of values for the GroupBy keys found in the data. This allows for more granular alerting. The GroupCombinations and QueriesFromParamset methods handle this expansion.
    • Alert: The email address for notifications.
    • IssueTrackerComponent: The ID of the issue tracker component to file bugs against. A custom SerializesToString type is used for this field to handle JSON serialization of the int64 component ID as a string, with 0 serializing to "".
    • DirectionAsString: Specifies whether to alert on upward (UP), downward (DOWN), or both (BOTH) changes in performance. This replaces the deprecated StepUpOnly boolean.
    • StateAsString: Indicates if the alert is ACTIVE or DELETED. This is managed internally and affects whether an alert is processed.
    • Action: Defines what action to take when an anomaly is detected (e.g., types.AlertActionReport, types.AlertActionBisect).
    • Other fields like Interesting, Algo, Step, Radius, K, Sparse, MinimumNum, Category control the specifics of regression detection and reporting.
    • The file also defines enums like Direction and ConfigState and helper functions for ID conversion and validation (Validate). The Validate function ensures consistency, for example, that GroupBy keys do not also appear in the main Query.
  • store.go: This file defines the Store interface, which abstracts the persistence mechanism for Alert configurations.

    • Why: Decoupling the alert logic from the specific storage implementation (e.g., SQL, Datastore) makes the system more adaptable and testable.
    • How: The Store interface specifies methods for:
    • Save: Saving a new or updating an existing alert. It takes a SaveRequest which includes the Alert configuration and an optional SubKey (linking the alert to a subscription).
    • ReplaceAll: Atomically replacing all existing alerts with a new set. This is useful for bulk updates, often tied to configuration subscriptions. It requires a pgx.Tx to ensure transactional integrity.
    • Delete: Marking an alert as deleted.
    • List: Retrieving alerts, with an option to include deleted ones. Alerts are typically sorted by DisplayName.
    • ListForSubscription: Retrieving all active alerts associated with a specific subscription name.
  • configprovider.go: This file implements a ConfigProvider that serves Alert configurations, incorporating a caching layer.

    • Why: To provide efficient and responsive access to alert configurations, especially in a high-traffic system. Repeatedly fetching from the underlying Store for every request would be inefficient.
    • How:
    • configProviderImpl implements the ConfigProvider interface.
    • It maintains two internal caches (cache_active for active alerts and cache_all for all alerts including deleted ones) using the configCache struct.
    • Upon initialization (NewConfigProvider), it performs an initial refresh and starts a background goroutine that periodically calls Refresh to update the caches from the Store.
    • GetAllAlertConfigs and GetAlertConfig serve data from these caches.
    • A sync.RWMutex is used to protect concurrent access to the caches.
    • The Refresh method explicitly fetches data from the alertStore and updates both caches.
    • The refresh interval is configurable.
  • Submodule sqlalertstore: This submodule provides a SQL-based implementation of the alerts.Store interface.

    • sqlalertstore.go:
    • Why: To persist alert configurations in a relational database (specifically CockroachDB, with Spanner compatibility).
    • How: The SQLAlertStore struct holds a database connection pool (pool.Pool) and a map of SQL statements.
      • Alerts are stored as JSON strings in the Alerts table (schema defined in sqlalertstore/schema/schema.go). This simplifies schema evolution of the Alert struct itself, as changes to the struct don't always require immediate SQL schema migrations, though it makes querying based on specific alert fields harder directly in SQL.
      • Save: For new alerts (ID is BadAlertIDAsAsString), it performs an INSERT and retrieves the generated ID. For existing alerts, it performs an UPSERT (or an INSERT ... ON CONFLICT DO UPDATE for Spanner).
      • Delete: Marks an alert as deleted by setting its config_state to 1 (representing alerts.DELETED) and updates last_modified. It doesn't physically remove the row.
      • ReplaceAll: Within a transaction, it first marks all existing active alerts as deleted, then inserts the new set of alerts.
      • List and ListForSubscription: Query the Alerts table, deserialize the JSON alert column into alerts.Alert structs, and sort them by DisplayName.
    • spanner.go: Contains Spanner-specific SQL statements. This is necessary because CockroachDB and Spanner have slightly different SQL syntax for certain operations like UPSERTs and RETURNING clauses. The correct set of statements is chosen in sqlalertstore.New based on the dbType.
    • sqlalertstore/schema/schema.go: Defines the Go struct AlertSchema representing the Alerts table in the SQL database. Key fields include id, alert (TEXT, storing the JSON serialized alerts.Alert), config_state (INT), last_modified (INT, Unix timestamp), sub_name, and sub_revision.
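
To make the shape of an alert configuration concrete, here is a minimal sketch of the Alert struct based on the fields described above. It is illustrative only: field types are assumptions, JSON tags are omitted, and the real struct in config.go contains additional fields (Interesting, Algo, Step, Radius, K, Sparse, MinimumNum, Category).

  package alerts

  // SerializesToString is assumed here to be an int64 that marshals to and
  // from a JSON string; see the discussion of IssueTrackerComponent above.
  type SerializesToString int64

  // Alert is a simplified sketch of an alert configuration.
  type Alert struct {
    IDAsString            string             // Sentinel value (BadAlertIDAsAsString) marks a new, unsaved alert.
    DisplayName           string
    Query                 string             // URL-encoded trace selection query.
    GroupBy               string             // Comma-separated parameter keys to expand on.
    Alert                 string             // Email address for notifications.
    IssueTrackerComponent SerializesToString // Issue tracker component ID, serialized as a string.
    DirectionAsString     string             // "UP", "DOWN", or "BOTH".
    StateAsString         string             // "ACTIVE" or "DELETED".
    Action                string             // e.g., report or bisect.
  }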

Key Workflows:

  1. Creating/Updating an Alert:

    • User/System constructs an alerts.Alert struct.
    • alerts.Store.Save() is called.
    • If SQL-backed:
      • sqlalertstore.Save() serializes the Alert to JSON.
      • If IDAsString is BadAlertIDAsAsString, an INSERT statement is executed, and the new ID is populated back into the Alert struct.
      • Otherwise, an UPSERT or INSERT ... ON CONFLICT DO UPDATE statement is executed.
    • The ConfigProvider's cache will eventually be updated during its next refresh cycle.
    [Client/Service] -- Alert Data --> [alerts.Store.Save()]
                                            |
                                            v
                              [sqlalertstore.Save()] -- Serializes Alert to JSON --> [Database]
                                            | (If new, DB returns ID)
                                            <---------------------------------------
                                            | (Updates Alert struct with ID)
                                            v
    [ConfigProvider.Refresh() periodically] --> [alerts.Store.List()]
                                                        |
                                                        v
                                          [sqlalertstore.List()] --> [Database]
                                                        | (Reads & deserializes)
                                                        v
                                              [ConfigProvider Cache Update]
    
  2. Retrieving All Active Alerts:

    • A service requests alert configurations via alerts.ConfigProvider.GetAllAlertConfigs(ctx, false).
    • configProviderImpl.GetAllAlertConfigs() checks its cache_active.
    • If the cache is up-to-date (within refresh interval), it returns the cached []*Alert.
    • If the cache needs refresh (or it's the first call), the background refresher (or an explicit Refresh call) would have populated it by:
      • Calling alerts.Store.List(ctx, false).
      • Which in turn calls sqlalertstore.List(ctx, false).
      • sqlalertstore queries the database for alerts where config_state = 0 (ACTIVE), deserializes them, and returns the list.
    [Service] -- Request All Active Alerts --> [ConfigProvider.GetAllAlertConfigs(includeDeleted=false)]
                                                    | (Checks cache_active)
                                                    |
                                                    +-- [Cache Hit] ----> Returns cached []*Alert
                                                    |
                                                    +-- [Cache Miss/Stale (via periodic Refresh)]
                                                        |
                                                        v
                                         [alerts.Store.List(includeDeleted=false)]
                                                        |
                                                        v
                                     [sqlalertstore.List(includeDeleted=false)] -- SQL Query (WHERE config_state=0) --> [Database]
                                                        | (Reads & deserializes)
                                                        v
                                              [Updates & Returns from Cache]
    
  3. Expanding GroupBy Queries:

    • When an alert with a GroupBy clause is processed (e.g., by the regression detection system), Alert.QueriesFromParamset(paramset) is called.
    • Alert.GroupCombinations(paramset) is invoked to find all unique combinations of values for the keys specified in GroupBy from the provided paramtools.ReadOnlyParamSet.
    • For each combination, a new query string is generated by taking the original Alert.Query and appending the key-value pairs from the combination.
    • This results in a list of specific queries to be executed against the trace data.
    [Alert Processing System] -- Has Alert with GroupBy="config,arch", Query="metric=latency" & ParamSet --> [Alert.QueriesFromParamset()]
                                                                                                                |
                                                                                                                v
                                                                                                  [Alert.GroupCombinations()]
                                                                                                                | (e.g., finds {config:A, arch:X}, {config:B, arch:X})
                                                                                                                v
                                                                                                  [Generates specific queries:]
                                                                                                     - "metric=latency&config=A&arch=X"
                                                                                                     - "metric=latency&config=B&arch=X"
                                                                                                                |
                                                                                                                <-- Returns []string (list of queries)
    

The use of SerializesToString for IssueTrackerComponent highlights a common challenge when interfacing Go backend systems with JavaScript frontends: JavaScript's limitations with handling large integer IDs. Serializing them as strings is a robust workaround.
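
The general pattern, sketched below with a hypothetical StringlyInt64 type (not the actual SerializesToString code), is an int64-backed type with custom JSON marshaling so values reach JavaScript as strings and 0 maps to the empty string.

  package main

  import (
    "encoding/json"
    "fmt"
    "strconv"
  )

  // StringlyInt64 is a hypothetical stand-in for the SerializesToString idea:
  // an int64 that serializes as a JSON string so JavaScript clients never see
  // a number larger than Number.MAX_SAFE_INTEGER.
  type StringlyInt64 int64

  func (s StringlyInt64) MarshalJSON() ([]byte, error) {
    if s == 0 {
      return []byte(`""`), nil // 0 serializes to the empty string.
    }
    return []byte(strconv.Quote(strconv.FormatInt(int64(s), 10))), nil
  }

  func (s *StringlyInt64) UnmarshalJSON(b []byte) error {
    var str string
    if err := json.Unmarshal(b, &str); err != nil {
      return err
    }
    if str == "" {
      *s = 0
      return nil
    }
    n, err := strconv.ParseInt(str, 10, 64)
    if err != nil {
      return err
    }
    *s = StringlyInt64(n)
    return nil
  }

  func main() {
    b, _ := json.Marshal(StringlyInt64(1333504))
    fmt.Println(string(b)) // "1333504"
  }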

The existence of a mock subdirectory with generated mocks for Store and ConfigProvider (using stretchr/testify/mock) is standard Go practice, facilitating unit testing of components that depend on these interfaces without needing a real database or complex setup.

Module: /go/anomalies

The /go/anomalies module is responsible for retrieving anomaly data. Anomalies represent significant deviations in performance metrics. This module acts as an intermediary between the application and the chromeperf service, which is the source of truth for anomaly data. It provides an abstraction layer, potentially including caching, to optimize anomaly retrieval.

Key Components and Responsibilities

1. anomalies.go:

  • Purpose: Defines the Store interface. This interface dictates the contract for any component that aims to provide anomaly data. It ensures that different implementations (e.g., a cached store or a direct passthrough store) can be used interchangeably.
  • Why: Separating the interface from the implementation promotes loose coupling and testability. It allows for different strategies for fetching anomalies without changing the consuming code.
  • Key Methods:
    • GetAnomalies: Retrieves anomalies for a list of trace names within a specific commit position range. This is useful for analyzing performance regressions or improvements tied to code changes.
    • GetAnomaliesInTimeRange: Fetches anomalies within a given time window. This is helpful for time-based analysis, independent of specific commit versions.
    • GetAnomaliesAroundRevision: Finds anomalies that occurred near a particular revision (commit). This helps pinpoint performance changes related to a specific code submission.

2. impl.go:

  • Purpose: Provides a basic, non-caching implementation of the Store interface. It directly forwards requests to the chromeperf.AnomalyApiClient.
  • Why: This serves as a foundational implementation. It's simple and directly reflects the capabilities of the underlying chromeperf service. It can be used when caching is not desired or not yet implemented.
  • How: Each method in the store struct (the implementation of Store) makes a corresponding call to the ChromePerf client. For example, GetAnomalies calls ChromePerf.GetAnomalies. Error handling is included to log failures from the chromeperf service. Trace names are sorted before being passed to chromeperf, which may be a requirement of, or an optimization for, the chromeperf API.

3. /go/anomalies/cache/cache.go:

  • Purpose: Implements a caching layer for the Store interface. This is designed to improve performance by reducing the number of direct calls to the chromeperf service, which can be network-intensive.
  • Why: Repeatedly fetching the same anomaly data can be inefficient. A cache stores frequently accessed or recent anomalies locally, leading to faster response times and reduced load on the chromeperf service.
  • How:
    • LRU Cache: Uses two Least Recently Used (LRU) caches: testsCache for anomalies queried by trace names and commit ranges, and revisionCache for anomalies queried around a specific revision. LRU ensures that the least accessed items are evicted when the cache reaches its cacheSize limit.
    • Cache Invalidation:
    • TTL (Time-To-Live): Cache entries have a cacheItemTTL. A periodic cleanupCache goroutine removes entries older than this TTL. This ensures that stale data doesn't persist indefinitely.
    • invalidationMap: This map tracks trace names for which anomalies have been modified (e.g., an alert was updated). If a trace name is in this map, any cached anomalies for that trace are considered invalid and will be re-fetched from chromeperf.
      • The invalidationMap itself is cleared periodically (invalidationCleanupPeriod) to prevent it from growing too large. This is a trade-off: it's simpler and has lower memory overhead but can lead to inaccuracies if a trace is invalidated and then the map is cleared before the next fetch for that trace.
    • Metrics: Tracks the numEntriesInCache to monitor cache utilization.
  • Key Methods (store struct in cache.go):
    • GetAnomalies:
    • Attempts to retrieve anomalies from testsCache.
    • Checks the invalidationMap. If a trace is marked invalid, it's treated as a cache miss.
    • For any cache misses or invalidated traces, it fetches the data from as.ChromePerf.GetAnomalies.
    • Populates the testsCache with newly fetched data.

      Client Request (traceNames, startCommit, endCommit)
          |
          v
      [Cache Store] -- GetAnomalies()
          |
          | For each traceName:
          |   1. Check testsCache (key: trace:start:end)
          |   2. Check invalidationMap
          |        |
          |        +-- Cache Hit? --------------------> Add to Result
          |        |
          |        +-- No (Cache Miss or Invalidated) -> traceNamesMissingFromCache
          |                                                  |
          v                                                  |
      [ChromePerf Client] -- GetAnomalies() <----------------+
          |
          v
      [Cache Store] -- Add new data to testsCache
          |
          v
      Return Combined Result
    • GetAnomaliesInTimeRange: This method currently bypasses the cache and directly calls as.ChromePerf.GetAnomaliesTimeBased. The decision to not cache time-based queries might be due to the potentially large and less frequently reused nature of such requests, or it might be a feature planned for later.
    • GetAnomaliesAroundRevision: Similar to GetAnomalies, it first checks revisionCache. If it's a miss, it fetches from as.ChromePerf.GetAnomaliesAroundRevision and updates the cache.
    • InvalidateTestsCacheForTraceName: Adds a traceName to the invalidationMap. This is likely called when an external event (e.g., user updating an anomaly in Chrome Perf) indicates that the cached data for this trace is no longer accurate.

4. /go/anomalies/mock/Store.go:

  • Purpose: Provides a mock implementation of the Store interface, generated using the testify/mock library.
  • Why: Essential for unit testing. It allows other components that depend on the anomalies.Store to be tested in isolation, without needing a real chromeperf instance or a fully functional cache. Developers can define expected calls and return values for the mock store.
  • How: It's an auto-generated file. The mock.Mock struct from stretchr/testify is embedded, providing methods like On(), Return(), and AssertExpectations() to control and verify the mock's behavior during tests.
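
The snippet below is a rough, generic illustration of that testing pattern, using a hypothetical one-method interface rather than the real Store; generated mocks follow the same embed-mock.Mock structure.

  package anomalies_test

  import (
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/mock"
  )

  // Fetcher is a hypothetical one-method interface standing in for a slice of
  // the real Store interface.
  type Fetcher interface {
    GetAnomaliesAroundRevision(revision int) ([]string, error)
  }

  // MockFetcher embeds mock.Mock, as testify/mockery-generated mocks do.
  type MockFetcher struct {
    mock.Mock
  }

  func (m *MockFetcher) GetAnomaliesAroundRevision(revision int) ([]string, error) {
    args := m.Called(revision)
    return args.Get(0).([]string), args.Error(1)
  }

  func TestUsesMock(t *testing.T) {
    m := &MockFetcher{}
    // On()/Return() define the canned behavior for the expected call.
    m.On("GetAnomaliesAroundRevision", 12345).Return([]string{"anomaly-1"}, nil)

    got, err := m.GetAnomaliesAroundRevision(12345)
    assert.NoError(t, err)
    assert.Equal(t, []string{"anomaly-1"}, got)

    // AssertExpectations verifies every expected call actually happened.
    m.AssertExpectations(t)
  }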

Design Decisions and Rationale

  • Interface-based Design (anomalies.Store): This is a common and robust pattern in Go. It allows for flexibility in how anomalies are fetched and managed. For example, a new caching strategy or a different backend data source could be implemented without affecting code that consumes anomalies, as long as the new implementation adheres to the Store interface.
  • Caching Strategy (cache.go):
    • LRU: A good general-purpose caching algorithm when memory is limited and recent/frequently accessed items are more likely to be requested again.
    • TTL for Cache Items: Prevents indefinitely storing stale data.
    • invalidationMap: A pragmatic approach to handling external data modifications. While not perfectly accurate (invalidates all anomalies for a trace even if only one changed, and susceptible to the invalidationCleanupPeriod timing), it's simpler and less memory-intensive than more granular invalidation schemes. This suggests a balance was struck between accuracy, complexity, and resource usage.
    • Separate Caches (testsCache, revisionCache): Likely done because the query patterns and cache keys for these two types of requests are different. testsCache uses a composite key (traceName:startCommit:endCommit), while revisionCache uses the revision number as the key (see the sketch after this list).
  • Error Handling: The implementations generally log errors from chromeperf but often return an empty AnomalyMap or nil slice to the caller in case of an error from the underlying service. This design choice means that callers might receive no data instead of an error, simplifying the caller's error handling logic but potentially obscuring issues if not monitored through logs.
  • Sorting Trace Names: Before calling chromeperf.GetAnomalies or chromeperf.GetAnomaliesTimeBased, the list of traceNames is sorted. This could be a requirement of the chromeperf API for deterministic behavior, or an optimization to improve chromeperf's internal processing or caching.
  • Tracing (go.opencensus.io/trace): Spans are added to some methods (GetAnomaliesInTimeRange, GetAnomaliesAroundRevision). This is crucial for observability, allowing developers to track the performance and flow of requests through the system, especially in a distributed environment.
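
A tiny sketch of the composite key shape and the invalidation check described above; the exact key format and names are assumptions, and the real code lives in /go/anomalies/cache.

  package main

  import "fmt"

  // testsCacheKey builds the composite key used for commit-range queries,
  // assumed here to be "traceName:startCommit:endCommit".
  func testsCacheKey(traceName string, startCommit, endCommit int) string {
    return fmt.Sprintf("%s:%d:%d", traceName, startCommit, endCommit)
  }

  // cachedLookup sketches the read path: an entry only counts as a hit if the
  // trace has not been marked stale in the invalidation map.
  func cachedLookup(cache map[string][]string, invalidated map[string]bool,
    traceName string, startCommit, endCommit int) ([]string, bool) {
    if invalidated[traceName] {
      return nil, false // Treated as a miss; caller re-fetches from chromeperf.
    }
    anomalies, ok := cache[testsCacheKey(traceName, startCommit, endCommit)]
    return anomalies, ok
  }

  func main() {
    cache := map[string][]string{
      testsCacheKey(",arch=x86,test=draw_a_circle,", 100, 200): {"anomaly-42"},
    }
    invalidated := map[string]bool{}
    fmt.Println(cachedLookup(cache, invalidated, ",arch=x86,test=draw_a_circle,", 100, 200)) // [anomaly-42] true
  }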

Workflows

Typical Anomaly Retrieval (with Cache):

  1. A service needs anomalies (e.g., for displaying on a dashboard).
  2. It calls one of the GetAnomalies* methods on an anomalies.Store instance (which is likely the cached store from cache.go).
  3. Cache Check:
    • The cached store first checks its internal LRU cache(s) (testsCache or revisionCache) for the requested data.
    • For GetAnomalies, it also consults the invalidationMap to see if any relevant traces have been marked as stale.
  4. Cache Hit: If valid data is found in the cache, it's returned directly.

    Caller -> anomalies.Store.GetAnomalies(traces, range)
        |
        v
    Cache.GetAnomalies()
        |
        +--> Check testsCache (e.g., trace1:100:200) -> Found & Valid
        |
        +--> Check testsCache (e.g., trace2:100:200) -> Not Found or Invalid
        |
        v
    Return cached data for trace1

  5. Cache Miss / Stale Data: If data is not in the cache or is marked stale:
    • The cached store makes a network request to the chromeperf.AnomalyApiClient.
    • The response from chromeperf is received.
    • This new data is added to the LRU cache for future requests.
    • The data is returned to the caller.

    Caller -> anomalies.Store.GetAnomalies(traces, range)
        |
        v
    Cache.GetAnomalies()
        |
        +--> Check testsCache (e.g., trace1:100:200) -> Found & Valid
        |                                                   | (Data for trace1)
        |                                                   |
        +--> Check testsCache (e.g., trace2:100:200) -> Not Found or Invalid
        |        |                                          |
        |        v                                          |
        |    [ ChromePerf API ] -- GetAnomalies(trace2, range)
        |        |
        |        v
        |    Cache.Add(trace2_data)
        |        |
        v        v
    Combine trace1_data & trace2_data
        |
        v
    Return to Caller

Cache Invalidation Workflow:

  1. An external event occurs (e.g., a user triages an anomaly in the Chrome Perf UI, which modifies its state).
  2. A mechanism (not detailed within this module, but implied) detects this change.
  3. This mechanism calls cache.store.InvalidateTestsCacheForTraceName(ctx, "affected_trace_name").
  4. The affected_trace_name is added to the invalidationMap in the cache.store.
  5. Next GetAnomalies call for affected_trace_name:
    • Even if testsCache contains an entry for this trace and range, the presence of affected_trace_name in invalidationMap will cause a cache miss.
    • Data will be re-fetched from chromeperf.
    • The invalidationMap entry for affected_trace_name typically remains until the invalidationMap is periodically cleared.

This module effectively decouples the rest of the Perf application from the direct complexities of interacting with chromeperf for anomaly data, offering performance benefits through caching and a consistent interface for data retrieval.

Module: /go/anomalygroup

The anomalygroup module is designed to group related anomalies (regressions in performance metrics) together. This grouping allows for consolidated actions like filing a single bug report for multiple related regressions or triggering a single bisection job to find the common culprit for a set of anomalies. This approach aims to reduce noise and improve the efficiency of triaging performance regressions.

The core idea is to identify anomalies that share common characteristics, such as the subscription (alert configuration), benchmark, and commit range. When a new anomaly is detected, the system attempts to find an existing group that matches these criteria. If a suitable group is found, the new anomaly is added to it. Otherwise, a new group is created.

The module defines a gRPC service for managing anomaly groups, a storage interface for persisting group data, and utilities for processing regressions and interacting with the grouping logic.

Key Components and Responsibilities

store.go: Anomaly Group Storage Interface

The store.go file defines the Store interface, which outlines the contract for persisting and retrieving anomaly group data. This abstraction allows for different storage backends (e.g., SQL databases) to be used.

Key Responsibilities:

  • Creating new anomaly groups: When a new anomaly doesn't fit into an existing group, a new group record needs to be created. This involves storing metadata about the group, such as the subscription details, benchmark, initial commit range, and the action to be taken (e.g., REPORT, BISECT).
  • Loading anomaly groups: Retrieving group information by its unique ID is essential for processing and taking actions on the group.
  • Finding existing groups: This is a crucial part of the grouping logic. When a new anomaly is detected, the store is queried to find existing groups that match criteria like subscription, revision, domain (master), benchmark, commit range, and action type.
  • Updating anomaly groups: Groups are dynamic. As new anomalies are added, or as actions are taken (e.g., bisection started, bug filed), the group record needs to be updated. This includes:
    • Adding new anomaly IDs to the group.
    • Adding culprit commit IDs once a bisection identifies them.
    • Storing the ID of a bisection job associated with the group.
    • Storing the ID of a reported issue (bug) associated with the group.

The Store interface ensures that the core logic for anomaly grouping is decoupled from the specific implementation of data persistence.
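
Restated as code, a rough sketch of that contract might look like the following; the method names come from the descriptions in this module, while parameter and return types are simplified assumptions.

  package anomalygroup

  import "context"

  // AnomalyGroup is a simplified stand-in for the stored group record.
  type AnomalyGroup struct {
    ID          string
    AnomalyIDs  []string
    CulpritIDs  []string
    GroupAction string // e.g., "REPORT" or "BISECT".
  }

  // Store sketches the persistence contract described above. The real
  // interface in store.go has richer signatures.
  type Store interface {
    Create(ctx context.Context, subscription, revision, domain, benchmark string,
      startCommit, endCommit int64, action string) (string, error)
    LoadById(ctx context.Context, groupID string) (*AnomalyGroup, error)
    FindExistingGroup(ctx context.Context, subscription, revision, domain, benchmark string,
      startCommit, endCommit int64, action string) ([]*AnomalyGroup, error)
    UpdateBisectID(ctx context.Context, groupID, bisectionID string) error
    UpdateReportedIssueID(ctx context.Context, groupID, issueID string) error
    AddAnomalyID(ctx context.Context, groupID, anomalyID string) error
    AddCulpritIDs(ctx context.Context, groupID string, culpritIDs []string) error
  }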

sqlanomalygroupstore/sqlanomalygroupstore.go: SQL-backed Anomaly Group Store

This file provides a concrete implementation of the Store interface using a SQL database (specifically designed with CockroachDB and Spanner in mind).

Implementation Details:

  • Schema: The SQL schema for anomaly groups is defined in sqlanomalygroupstore/schema/schema.go. It includes fields for the group ID, creation time, list of anomaly IDs, metadata (stored as JSONB), common commit range, action type, and associated IDs for bisections, issues, and culprits.
  • Database Operations:
    • Create: Inserts a new row into the AnomalyGroups table. It takes parameters like subscription details, benchmark, commit range, and action, and stores them. The group metadata (subscription name, revision, domain, benchmark) is marshaled into a JSON string before insertion.
    • LoadById: Selects an anomaly group from the database based on its ID. It retrieves core attributes of the group.
    • UpdateBisectID, UpdateReportedIssueID, AddAnomalyID, AddCulpritIDs: These methods execute SQL UPDATE statements to modify specific fields of an existing anomaly group record. They handle array appends for lists like anomaly_ids and culprit_ids, with specific syntax considerations for different SQL databases (e.g., Spanner's COALESCE for array concatenation).
    • FindExistingGroup: Constructs a SQL SELECT query with WHERE clauses to match the provided criteria (subscription, revision, domain, benchmark, commit range overlap, and action). This allows finding groups that a new anomaly might belong to.

Design Choices:

  • UUIDs for IDs: Using UUIDs for group IDs, anomaly IDs, and culprit IDs ensures global uniqueness.
  • JSONB for Metadata: Storing group_meta_data as JSONB provides flexibility in the metadata stored without requiring schema changes for minor additions.
  • Array Columns: Storing anomaly_ids and culprit_ids as array types in the database is a natural way to represent lists of associated entities.
  • Database Type Abstraction: While targeting SQL, there are minor conditional logic snippets (e.g., for array appending in Spanner vs. CockroachDB) to handle database-specific syntax, indicated by dbType checks.

service/service.go: gRPC Service Implementation

This file implements the AnomalyGroupServiceServer interface defined by the protobuf definitions in proto/v1/anomalygroup_service.proto. It acts as the entry point for external systems to interact with the anomaly grouping functionality.

Responsibilities:

  • Exposing Store Operations via gRPC: The service methods largely delegate to the corresponding methods of the anomalygroup.Store interface. For example, CreateNewAnomalyGroup calls anomalygroupStore.Create.
  • Handling gRPC Requests and Responses: It translates incoming gRPC requests into calls to the store and formats the store's output into gRPC responses.
  • FindTopAnomalies Logic: This method involves more than a simple store passthrough.
    1. It loads the specified anomaly group.
    2. It retrieves all regressions (anomalies) associated with that group using the regression.Store.
    3. It sorts these regressions based on the percentage change in their median values (from median_before to median_after); a sketch of this sorting step follows this list.
    4. It formats the top N regressions (or all if N is not specified or is too large) into the ag.Anomaly protobuf message format, extracting relevant paramset values (bot, benchmark, story, measurement, stat).
  • FindIssuesFromCulprits Logic:
    1. Loads the specified anomaly group.
    2. Retrieves the culprit IDs associated with the group.
    3. Uses the culprit.Store to get the details of these culprits.
    4. For each culprit, it checks its GroupIssueMap to find any issue IDs that are specifically associated with the given anomaly group ID. This allows correlation between a group (potentially containing multiple anomalies that led to a bisection) and the issues filed for the culprits found by that bisection.
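
The sorting step referenced above can be sketched as follows; the Regression struct and field names are simplified stand-ins, and only the ordering idea is meant to carry over.

  package main

  import (
    "fmt"
    "math"
    "sort"
  )

  // Regression is a simplified stand-in carrying the before/after medians.
  type Regression struct {
    TestPath     string
    MedianBefore float64
    MedianAfter  float64
  }

  // percentChange returns the relative change from before to after.
  func percentChange(r Regression) float64 {
    if r.MedianBefore == 0 {
      return math.Inf(1)
    }
    return math.Abs(r.MedianAfter-r.MedianBefore) / math.Abs(r.MedianBefore) * 100
  }

  func main() {
    regs := []Regression{
      {"benchA/story1", 10, 11},  // +10%
      {"benchA/story2", 10, 15},  // +50%
      {"benchB/story3", 100, 95}, // -5%
    }
    // Sort descending by magnitude of percentage change, then keep the top N.
    sort.Slice(regs, func(i, j int) bool {
      return percentChange(regs[i]) > percentChange(regs[j])
    })
    topN := 2
    fmt.Println(regs[:topN]) // [{benchA/story2 10 15} {benchA/story1 10 11}]
  }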

Design Choices:

  • Dependency Injection: The service takes instances of anomalygroup.Store, culprit.Store, and regression.Store as dependencies, promoting testability and decoupling.
  • Metric Collection: It increments a counter (newGroupCounter) whenever a new group is created, allowing for monitoring of the system's behavior.

proto/v1/anomalygroup_service.proto: Protocol Buffer Definitions

This file defines the gRPC service AnomalyGroupService and the message types used for requests and responses. This is the contract for how clients interact with the anomaly grouping system.

Key Messages:

  • AnomalyGroup: Represents a group of anomalies, including its ID, the action to take, lists of associated anomaly and culprit IDs, reported issue ID, and metadata like subscription and benchmark names.
  • Anomaly: Represents a single regression, including its start and end commit positions, a paramset (key-value pairs describing the test), improvement direction, and median values before and after the regression.
  • GroupActionType: An enum defining the possible actions for a group (NOACTION, REPORT, BISECT).
  • Request/Response Messages: Specific messages for each RPC method (e.g., CreateNewAnomalyGroupRequest, FindExistingGroupsResponse).

Purpose:

  • Defines a clear, language-agnostic API for the service.
  • Ensures type safety and structured data exchange.

notifier/anomalygroupnotifier.go: Anomaly Group Notifier

This component implements the notify.Notifier interface. It's invoked when a new regression is detected by the alerting system. Its primary role is to integrate the regression detection with the anomaly grouping logic.

Workflow when RegressionFound is called:

  1. Receive details of a newly detected regression (commit information, alert configuration, cluster summary, trace data, regression ID).
  2. Extract the paramset from the trace data.
  3. Validate the paramset to ensure it contains required keys (e.g., master, bot, benchmark, test, subtest_1). This is important because the grouping and subsequent actions (like bisection) rely on these parameters.
  4. Determine the testPath from the paramset. This path is used in finding or creating anomaly groups.
  5. Call grouper.ProcessRegressionInGroup (which eventually calls utils.ProcessRegression) to handle the grouping logic for this new regression.
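
A heavily simplified sketch of that flow is shown below. The real RegressionFound signature (which receives commit, alert, cluster summary, and trace frame arguments) and the way summary traces are detected differ; everything here is illustrative.

  package notifier

  import (
    "context"
    "fmt"
    "strings"
  )

  // AnomalyGrouper is the delegation target described above (the real
  // interface lives in the utils package).
  type AnomalyGrouper interface {
    ProcessRegressionInGroup(ctx context.Context, testPath, regressionID string) (string, error)
  }

  // requiredKeys reflects the paramset validation step described above.
  var requiredKeys = []string{"master", "bot", "benchmark", "test", "subtest_1"}

  // regressionFound sketches the notifier's flow: validate the paramset, skip
  // summary traces, derive a test path, and delegate to the grouper.
  func regressionFound(ctx context.Context, paramset map[string]string, regressionID string, g AnomalyGrouper) error {
    for _, key := range requiredKeys {
      if _, ok := paramset[key]; !ok {
        return fmt.Errorf("paramset is missing required key %q", key)
      }
    }
    // Summary-level traces are ignored for grouping (how they are detected
    // here is an assumption for the sketch).
    if paramset["summary"] == "true" {
      return nil
    }
    testPath := strings.Join([]string{
      paramset["master"], paramset["bot"], paramset["benchmark"],
      paramset["test"], paramset["subtest_1"],
    }, "/")
    _, err := g.ProcessRegressionInGroup(ctx, testPath, regressionID)
    return err
  }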

Design Choices:

  • Interface Implementation: Adheres to the notify.Notifier interface, allowing it to be plugged into the existing notification pipeline of the performance monitoring system.
  • Delegation to AnomalyGrouper: It delegates the core grouping logic to an AnomalyGrouper instance (typically utils.AnomalyGrouperImpl). This keeps the notifier focused on the integration aspect.
  • Handling of Summary Traces: It explicitly ignores regressions found on summary-level traces (traces representing an aggregation of multiple specific tests), as anomaly grouping is typically more meaningful for specific test cases.

utils/anomalygrouputils.go: Anomaly Grouping Utilities

This file contains the core logic for processing a new regression and integrating it into an anomaly group.

ProcessRegression Function - Key Steps:

  1. Synchronization: Uses a sync.Mutex (groupingMutex). This is a critical point: it aims to prevent race conditions when multiple regressions are processed concurrently, especially around creating new groups. However, the comment notes that with multiple containers, this mutex might not be sufficient and needs review.
  2. Client Initialization: Creates an AnomalyGroupServiceClient to communicate with the gRPC service.
  3. Find Existing Group: Calls the FindExistingGroups gRPC method to see if the new anomaly fits into any current groups based on subscription, revision, action type, commit range overlap, and test path.
  4. Group Creation or Update:
    • If no existing group is found:
      • Calls CreateNewAnomalyGroup to create a new group.
      • Calls UpdateAnomalyGroup to add the current anomalyID to this newly created group.
      • Triggers a Temporal Workflow: Initiates a MaybeTriggerBisection workflow. This workflow is responsible for deciding whether to start a bisection or file a bug based on the group's action type and other conditions.

      Regression Detected --> FindExistingGroups
          |
          +-- No Group Found --> CreateNewAnomalyGroup
                                     --> UpdateAnomalyGroup (add anomaly)
                                     --> Start Temporal Workflow (MaybeTriggerBisection)

    • If existing group(s) are found, for each matching group:
      • Calls UpdateAnomalyGroup to add the current anomalyID to that group.
      • Calls FindIssuesToUpdate to determine if any existing bug reports (either the group's own ReportedIssueId or issues linked via culprits) should be updated with information about this new anomaly.
      • If issues are found, it uses the issuetracker to add a comment to each relevant issue.

      Regression Detected --> FindExistingGroups
          |
          +-- Group(s) Found --> For each group:
                 |
                 +-- UpdateAnomalyGroup (add anomaly)
                 +-- FindIssuesToUpdate --> If issues exist --> Add Comment to Issue(s)
  5. Return Group ID(s): Returns a comma-separated string of group IDs the anomaly was associated with.

FindIssuesToUpdate Function:

This helper determines which existing issue tracker IDs should be updated with information about a new anomaly being added to a group.

  • If the group_action is REPORT and reported_issue_id is set on the group, that issue ID is returned.
  • If the group_action is BISECT, it calls the FindIssuesFromCulprits gRPC method. This method looks up culprits associated with the group and then checks if those culprits have specific issues filed for them in the context of this particular group. This is important because a single culprit (commit) might be associated with multiple anomaly groups, and each might have its own context or bug report.
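
A condensed sketch of that branching, with the gRPC plumbing replaced by a function value and types simplified (all names here are illustrative stand-ins for the real helper):

  package anomalygrouputils

  import "context"

  // Group is a simplified view of an anomaly group for this sketch.
  type Group struct {
    GroupAction     string // "REPORT" or "BISECT"
    ReportedIssueID string
  }

  // findIssuesFromCulprits stands in for the FindIssuesFromCulprits gRPC call,
  // which maps the group's culprits to issues filed in this group's context.
  type findIssuesFromCulprits func(ctx context.Context, groupID string) ([]string, error)

  // findIssuesToUpdate mirrors the branching described above.
  func findIssuesToUpdate(ctx context.Context, groupID string, g Group, fromCulprits findIssuesFromCulprits) ([]string, error) {
    switch g.GroupAction {
    case "REPORT":
      if g.ReportedIssueID != "" {
        return []string{g.ReportedIssueID}, nil
      }
      return nil, nil
    case "BISECT":
      return fromCulprits(ctx, groupID)
    default:
      return nil, nil
    }
  }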

Design Choices:

  • Centralized Grouping Logic: This package encapsulates the decision-making process of whether to create a new group or add to an existing one.
  • Temporal Workflow Integration: Offloads the decision and execution of bisection or bug filing to a Temporal workflow. This makes the process asynchronous and more resilient.
  • Issue Tracker Interaction: Directly interacts with the issue tracker to update existing bugs, keeping them relevant as new, related anomalies are found.

Mocking Strategy

The module extensively uses mocks for testing:

  • mocks/Store.go: A mock implementation of the anomalygroup.Store interface, generated by testify/mock. Used in service/service_test.go.
  • proto/v1/mocks/AnomalyGroupServiceServer.go: A mock for the gRPC server interface AnomalyGroupServiceServer, generated by testify/mock (with manual adjustments noted in the file). Used by clients or other services that might call this gRPC service.
  • utils/mocks/AnomalyGrouper.go: A mock for the AnomalyGrouper interface, used in notifier/anomalygroupnotifier_test.go.

This approach allows for unit testing components in isolation by providing controlled behavior for their dependencies.

Overall Workflow Example (Simplified)

  1. Anomaly Detection: Perf system detects a new regression (anomaly).
  2. Notification: AnomalyGroupNotifier.RegressionFound is called.
  3. Preprocessing: The notifier extracts paramset, validates it, and derives testPath.
  4. Grouping Logic (utils.ProcessRegression):
    • The system queries AnomalyGroupService.FindExistingGroups using the anomaly's properties (subscription, commit range, test path, action type).
    • Scenario A: No existing group:
      • AnomalyGroupService.CreateNewAnomalyGroup is called.
      • The new anomaly ID is added to this group via AnomalyGroupService.UpdateAnomalyGroup.
      • A Temporal workflow (MaybeTriggerBisection) is started for this new group.
    • Scenario B: Existing group(s) found:
      • The new anomaly ID is added to each matching group via AnomalyGroupService.UpdateAnomalyGroup.
      • utils.FindIssuesToUpdate is called for each group.
      • If the group's action is REPORT and it has a ReportedIssueId, that issue is updated.
      • If the group's action is BISECT, AnomalyGroupService.FindIssuesFromCulprits is called. If it returns issue IDs associated with this group's culprits, those issues are updated.
  5. Temporal Workflow (MaybeTriggerBisection - not detailed here but implied):
    • Based on the group's GroupActionType:
      • If BISECT: It might check conditions (e.g., number of anomalies in the group) and then trigger a bisection job (e.g., Pinpoint) using AnomalyGroupService.FindTopAnomalies to pick the most significant anomaly. The bisection ID is then saved to the group.
      • If REPORT: It might check conditions and then file a bug using AnomalyGroupService.FindTopAnomalies to gather details. The issue ID is saved to the group.

This system aims to automate and streamline the handling of performance regressions by intelligently grouping them and initiating appropriate follow-up actions.

Module: /go/backend

The /go/backend module implements a gRPC-based backend service for Perf. This service is designed to host API endpoints that are not directly user-facing, promoting a separation of concerns and enabling better scalability and maintainability.

Core Purpose and Design Philosophy:

The primary motivation for this backend service is to create a stable, internal API layer. This decouples user-facing components (like the frontend) from the direct implementation details of various backend tasks. For instance, if Perf needs to trigger a Pinpoint job, the frontend doesn't interact with Pinpoint or a workflow engine like Temporal directly. Instead, it makes a gRPC call to an endpoint on this backend service. The backend service then handles the interaction with the underlying system (e.g., Temporal).

This design offers several advantages:

  • Interface Stability: If the underlying implementation for a task changes (e.g., replacing Temporal with another workflow orchestrator), the gRPC contract exposed by the backend service can remain the same. This minimizes changes required in calling services.
  • Load Offloading: Computationally intensive operations that might otherwise burden the frontend can be delegated to this backend service. Examples include dry-running regression detection.
  • Centralized Internal Logic: It provides a dedicated place for internal, non-UI-facing business logic.

Key Components and Responsibilities:

  • backend.go: This is the heart of the backend service.

    • Backend struct: Encapsulates the state and configuration of the backend application, including gRPC server settings, ports, and loaded configuration.
    • BackendService interface: Defines a contract for any service that wishes to be hosted by this backend. Each such service must provide its gRPC service descriptor, registration logic, and an authorization policy. This interface-based approach allows for modular addition of new functionalities. (A simplified sketch of this contract appears after this component list.)
    • The GetAuthorizationPolicy() method returns a shared.AuthorizationPolicy which specifies whether unauthenticated access is allowed and which user roles are authorized to call the service or specific methods within it.
    • RegisterGrpc() is responsible for registering the specific gRPC service implementation with the main gRPC server.
    • GetServiceDescriptor() provides metadata about the gRPC service.
    • initialize() function: This is a crucial setup function. It:
    • Initializes common application components (like Prometheus metrics).
    • Loads and validates the application configuration (from a JSON file, e.g., demo.json).
    • Instantiates various data stores (for anomaly groups, culprits, subscriptions, regressions) by using builder functions that typically read connection details from the loaded configuration. This allows for flexibility in choosing data store implementations (e.g., Spanner, CockroachDB).
    • Sets up a culprit notifier, which is responsible for sending notifications about identified culprits.
    • Initializes a Temporal client if the NotifyConfig.Notifications is set to AnomalyGrouper, as this indicates that anomaly grouping workflows managed by Temporal are in use.
    • Dynamically configures and registers all BackendService implementations. This involves setting up authorization rules based on the policy defined by each service and then registering their gRPC handlers.
    • Starts listening for gRPC connections on the configured port.
    • configureServices() and registerServices(): These helper functions iterate over the list of BackendService implementations to set up authorization and register them with the main gRPC server.
    • configureAuthorizationForService(): This function applies the authorization policies defined by each individual service to the gRPC server's authorization policy. It uses grpcsp.ServerPolicy to define which roles can access the service or specific methods.
    • New() constructor: Creates and initializes a new Backend instance. It takes various store implementations and a notifier as arguments, allowing for dependency injection, particularly useful for testing. If these are nil, they are typically created within initialize() based on the configuration.
    • ServeGRPC() and Serve(): These methods start the gRPC server and block until it's shut down.
    • Cleanup(): Handles graceful shutdown of the gRPC server.
  • pinpoint.go: This file defines a wrapper for the actual Pinpoint service implementation (which resides in pinpoint/go/service).

    • pinpointService struct: Implements the BackendService interface.
    • NewPinpointService(): Creates a new instance, taking a Temporal provider and a rate limiter as arguments. This indicates that Pinpoint operations might be rate-limited and potentially involve Temporal workflows.
    • It defines an authorization policy requiring users to have at least roles.Editor to access Pinpoint functionalities. This is a good example of how specific services define their own access control rules.
  • shared/authorization.go:

    • AuthorizationPolicy struct: A simple struct used by BackendService implementations to declare their authorization requirements. This includes whether unauthenticated access is permitted, a list of roles authorized for the entire service, and a map for method-specific role authorizations. This promotes a consistent way for services to define their security posture.
  • client/backendclientutil.go: This utility file provides helper functions for creating gRPC clients to connect to the backend service itself (or specific services hosted by it).

    • getGrpcConnection(): Abstracts the logic for establishing a gRPC connection. It handles both insecure (typically for local development/testing) and secure connections. For secure connections, it uses TLS (with InsecureSkipVerify: true as it's intended for internal GKE cluster communication) and OAuth2 for authentication, obtaining tokens for the service account running the client process.
    • NewPinpointClient(), NewAnomalyGroupServiceClient(), NewCulpritServiceClient(): These are factory functions that simplify the creation of typed gRPC clients for the specific services hosted on the backend. They first check if the backend service is configured/enabled before attempting to create a connection. This pattern makes it easy for other internal services to consume the APIs provided by this backend.
  • backendserver/main.go: This is the entry point for the backend server executable.

    • It uses the urfave/cli library to define a command-line interface.
    • The run command initializes and starts the Backend service using the backend.New() constructor and then calls b.Serve().
    • It primarily parses command-line flags (defined in config.BackendFlags) and passes them to the backend package. It doesn't instantiate stores or notifiers directly, relying on the backend.New (and subsequently initialize) to create them based on the loaded configuration if nil is passed.
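
Putting the pieces together, a sketch of the BackendService contract might look like this; the real interface and the grpcsp/shared types differ, so treat the signatures as assumptions.

  package backend

  import (
    "google.golang.org/grpc"
  )

  // AuthorizationPolicy mirrors the struct described in shared/authorization.go:
  // whether unauthenticated calls are allowed, roles for the whole service, and
  // per-method role overrides (the role type is simplified to string here).
  type AuthorizationPolicy struct {
    AllowUnauthenticated  bool
    AuthorizedRoles       []string
    MethodAuthorizedRoles map[string][]string
  }

  // BackendService is a sketch of the contract each hosted service implements.
  type BackendService interface {
    // GetAuthorizationPolicy returns the access rules for this service.
    GetAuthorizationPolicy() AuthorizationPolicy
    // RegisterGrpc registers the service implementation on the shared server.
    RegisterGrpc(server *grpc.Server)
    // GetServiceDescriptor returns gRPC metadata about the service.
    GetServiceDescriptor() grpc.ServiceDesc
  }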

Workflow Example: Handling a gRPC Request

  1. A client (e.g., the Perf frontend or another internal service) uses a generated gRPC client stub (potentially created with helpers from client/backendclientutil.go) to make a call to a specific method on a service hosted by the backend (e.g., Pinpoint.ScheduleJob).
  2. The gRPC request arrives at the Backend server's listener (b.lisGRPC).
  3. The grpc.Server routes the request to the appropriate service implementation (e.g., pinpointService).
  4. Authentication/Authorization (via grpcsp.ServerPolicy): Before the service method is executed, the UnaryInterceptor configured in backend.go (which uses b.serverAuthPolicy) intercepts the call. (A sketch of such an interceptor follows this list.)

     Incoming gRPC Request --> UnaryInterceptor (grpcsp)
       |
       V
     Check Auth Policy for Service/Method (defined by pinpointService.GetAuthorizationPolicy())
       |
       V
     Allow/Deny ----> Yes: Proceed to service method
                      No:  Return error
  5. If authorized, the corresponding method on the pinpointService (which delegates to the actual pinpoint_service.PinpointServer implementation) is invoked.
  6. The service method performs its logic (e.g., interacting with Temporal to schedule a Pinpoint job, querying data stores).
  7. A response is sent back to the client.
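
The interceptor step above can be pictured with a minimal sketch. This is not the actual backend.go code: the grpcsp.ServerPolicy check is simplified to a generic role-check callback, and only the gRPC APIs shown (grpc.UnaryServerInterceptor, status, codes) are real.

    package backendexample // hypothetical package for this sketch

    import (
    	"context"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/codes"
    	"google.golang.org/grpc/status"
    )

    // authInterceptor returns a unary interceptor that enforces an authorization
    // policy before the service method runs. The allowed callback stands in for
    // the grpcsp.ServerPolicy check described above.
    func authInterceptor(allowed func(ctx context.Context, fullMethod string) error) grpc.UnaryServerInterceptor {
    	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    		if err := allowed(ctx, info.FullMethod); err != nil {
    			// Deny: the caller lacks the roles required for this service/method.
    			return nil, status.Error(codes.PermissionDenied, err.Error())
    		}
    		// Allow: proceed to the actual service method.
    		return handler(ctx, req)
    	}
    }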

Configuration and Initialization:

The system relies heavily on a configuration file (specified by flags.ConfigFilename, often demo.json for local development as seen in backend_test.go and testdata/demo.json). This file dictates:

  • Data store connection strings and types (data_store_config).
  • Notification settings (notify_config).
  • The backend service's own host URL (backend_host_url), which it might use if it needs to call itself or if other components need to discover it.
  • Temporal configuration (temporal_config - though not explicitly in demo.json, it's checked in backend.go).

The initialize function in backend.go is responsible for parsing this configuration and setting up all necessary dependencies like database connections, the Temporal client, and the culprit notifier. The use of builder functions (e.g., builders.NewAnomalyGroupStoreFromConfig) allows the system to be flexible with regard to the actual implementations of these components, as long as they conform to the required interfaces.

This backend module serves as a crucial intermediary, enhancing the robustness and maintainability of the Perf system by providing a well-defined internal API layer.

Module: /go/bug

The go/bug module is designed to facilitate the creation of URLs for reporting bugs or regressions identified within the Skia performance monitoring system. Its primary purpose is to dynamically generate these URLs based on a predefined template and specific details about the identified issue. This approach allows for flexible integration with various bug tracking systems, as the URL structure can be configured externally.

Core Functionality and Design:

The module centers around the concept of URI templates. Instead of hardcoding URL formats for specific bug trackers, it uses a template string that contains placeholders for relevant information. This makes the system adaptable to changes in bug tracker URL schemes or the adoption of new trackers without requiring code modifications.

The key function, Expand, takes a URI template and populates it with details about the regression. These details include:

  1. clusterLink: A URL pointing to the specific performance data cluster that exhibits the regression. This provides direct context for anyone investigating the bug.
  2. c provider.Commit: Information about the specific commit suspected of causing the regression. This includes the commit's URL, allowing for easy navigation to the code change. The use of the provider.Commit type from perf/go/git/provider indicates an integration with a system that can furnish commit details.
  3. message: A user-provided message describing the regression. This allows the reporter to add specific observations or context.

The Expand function utilizes the gopkg.in/olivere/elastic.v5/uritemplates library to perform the actual substitution of placeholders in the template string with the provided values. This library handles URL encoding of the substituted values, ensuring the generated URL is valid.

Key Components/Files:

  • bug.go: This file contains the core logic for expanding URI templates.

    • Expand(uriTemplate string, clusterLink string, c provider.Commit, message string) string: This is the primary function responsible for generating the bug reporting URL. It takes the template and the contextual information as input and returns the fully formed URL. If the template expansion fails (e.g., due to a malformed template), it logs an error using go.skia.org/infra/go/sklog and returns an empty string or a partially formed URL depending on the nature of the error.
    • ExampleExpand(uriTemplate string) string: This function serves as a utility or example for demonstrating how to use the Expand function. It calls Expand with pre-defined example data for the cluster link, commit, and message. This can be useful for testing the template expansion logic or for providing a quick way to see how a given template would be populated.
  • bug_test.go: This file contains unit tests for the functionality in bug.go.

    • TestExpand(t *testing.T): This test function verifies that the Expand function correctly substitutes the provided values into the URI template and produces the expected URL. It uses the github.com/stretchr/testify/assert library for assertions, ensuring that the generated URL matches the anticipated output, including proper URL encoding.

Workflow:

A typical workflow involving this module would be:

  1. Configuration: An external system (e.g., the Perf frontend) is configured with a URI template for the desired bug tracking system. This template will contain placeholders like {cluster_url}, {commit_url}, and {message}. Example Template: https://bugtracker.example.com/new?summary=Regression%20Found&description=Regression%20details:%0ACluster:%20{cluster_url}%0ACommit:%20{commit_url}%0AMessage:%20{message}

  2. Regression Identification: A user or an automated system identifies a performance regression.

  3. Information Gathering: The system gathers the necessary information:

    • The URL to the Perf cluster graph showing the regression.
    • Details of the commit suspected to have introduced the regression.
    • An optional message from the user.
  4. URL Generation: The Expand function in go/bug is called with the configured URI template and the gathered information.

    template := "https://bugtracker.example.com/new?summary=Regression%20Found&description=Cluster:%20{cluster_url}%0ACommit:%20{commit_url}%0AMessage:%20{message}"
    clusterURL := "https://perf.skia.org/t/?some_params"
    commitData := provider.Commit{URL: "https://skia.googlesource.com/skia/+show/abcdef123"}
    userMessage := "Significant drop in frame rate on TestXYZ."
    
    bugReportURL := bug.Expand(template, clusterURL, commitData, userMessage)
    
  5. Redirection/Display: The generated bugReportURL is then presented to the user, who can click it to navigate to the bug tracker with the pre-filled information.

This design decouples the bug reporting logic from the specifics of any single bug tracking system, promoting flexibility and maintainability. The use of a standard URI template expansion library ensures robustness in URL generation.

Module: /go/builders

The builders module is responsible for constructing various core components of the Perf system based on instance configuration. This centralized approach to object creation prevents cyclical dependencies that could arise if configuration objects were directly responsible for building the components they configure. The module acts as a factory, taking an InstanceConfig and returning fully initialized and operational objects like data stores, file sources, and caches.

The primary design goal is to decouple the configuration of Perf components from their instantiation. This allows for cleaner dependencies and makes it easier to manage the lifecycle of different parts of the system. For example, a TraceStore needs a database connection, but the InstanceConfig that defines the database connection string shouldn't also be responsible for creating the TraceStore itself. The builders module bridges this gap.

Key components and their instantiation logic:

  • builders.go: This is the central file containing all the builder functions.
    • Database Pool (NewDBPoolFromConfig): This function is crucial as many other components rely on a database connection. It establishes a connection pool to the configured database (e.g., CockroachDB, Spanner).
    • Why: A connection pool is used to manage database connections efficiently, reusing existing connections to reduce the overhead of establishing new ones for each request.
    • How: It parses the connection string from the InstanceConfig, configures pool parameters like maximum and minimum connections, and sets up a logging adapter (pgxLogAdaptor) to integrate database logs with the application's logging system.
    • Singleton: A key design choice here is the singletonPool. This ensures that only one database connection pool is created per application instance, preventing resource exhaustion and ensuring consistent database interaction. A mutex (singletonPoolMutex) protects the creation of this singleton.
    • Schema Check: Optionally, it can verify that the connected database schema matches the expected schema defined for the application. This is important for ensuring data integrity and compatibility.
    • Timeout Wrapper: The raw database pool is wrapped with timeout.New. This enforces that all database operations are performed within a context that has a timeout, preventing indefinite blocking.

      InstanceConfig --> NewDBPoolFromConfig --> pgxpool.ParseConfig
                                                   |
                                                   +-> pgxpool.ConnectConfig --> rawPool
                                                         |
                                                         +-> timeout.New(rawPool) --> singletonPool (if schema check passes)
    • PerfGit (NewPerfGitFromConfig): Constructs a perfgit.Git object, which provides an interface to Git repository data.
    • Why: Perf needs to associate performance data with specific code revisions.
    • How: It first obtains a database pool using getDBPool (which in turn uses NewDBPoolFromConfig) and then instantiates perfgit.New with this pool and the instance configuration.
    • TraceStore (NewTraceStoreFromConfig): Creates a tracestore.TraceStore for managing performance trace data.
    • Why: This is the core component for storing and retrieving time-series performance metrics.
    • How: It gets a database pool and a TraceParamStore (for managing trace parameter sets) and then instantiates the appropriate sqltracestore.
    • MetadataStore (NewMetadataStoreFromConfig): Creates a tracestore.MetadataStore for managing metadata associated with traces.
    • How: Similar to TraceStore, it obtains a database pool and then creates an sqltracestore.NewSQLMetadataStore.
    • AlertStore, RegressionStore, ShortcutStore, GraphsShortcutStore, AnomalyGroupStore, CulpritStore, SubscriptionStore, FavoriteStore, UserIssueStore: These functions follow a similar pattern: they obtain a database pool via getDBPool and then instantiate their respective SQL-backed store implementations (e.g., sqlalertstore, sqlregression2store).
    • Why: These stores manage various aspects of Perf's functionality, such as alerting configurations, regression tracking, saved shortcuts, etc. Centralizing their creation based on the common database configuration simplifies the system.
    • RegressionStore Variation: NewRegressionStoreFromConfig has a conditional logic based on instanceConfig.UseRegression2 to instantiate either sqlregression2store or sqlregressionstore. This allows for migrating to a new regression store implementation controlled by configuration.
    • GraphsShortcutStore Caching: NewGraphsShortcutStoreFromConfig can return a cached version (graphsshortcutstore.NewCacheGraphsShortcutStore) if localToProd is true, indicating a local development or testing environment where a simpler in-memory cache might be preferred over a database-backed store.
    • Source (NewSourceFromConfig): Creates a file.Source which defines where Perf ingests data from (e.g., Google Cloud Storage, local directories).
    • Why: Perf needs to be flexible in terms of where it reads input data files.
    • How: It uses a switch statement based on instanceConfig.IngestionConfig.SourceConfig.SourceType to instantiate either a gcssource or a dirsource.
    • IngestedFS (NewIngestedFSFromConfig): Creates a fs.FS (file system interface) that provides access to already ingested files.
    • Why: To provide a consistent way to access files regardless of their underlying storage (GCS or local).
    • How: Similar to NewSourceFromConfig, it switches on the source type to return a GCS or local file system implementation.
    • Cache (GetCacheFromConfig): Returns a cache.Cache instance (either Redis-backed or local in-memory).
    • Why: Caching is used to improve the performance of frequently accessed data or computationally intensive queries.
    • How: It checks instanceConfig.QueryConfig.CacheConfig.Type to determine whether to create a redisCache (connecting to a Google Cloud Redis instance) or a localCache.

The getDBPool helper function is used internally by many builder functions. It acts as a dispatcher based on instanceConfig.DataStoreConfig.DataStoreType, calling NewDBPoolFromConfig with appropriate schema checking flags. This abstracts the direct call to NewDBPoolFromConfig and centralizes the logic for selecting the database type.
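
The singleton pattern behind NewDBPoolFromConfig can be sketched as follows. This is illustrative rather than the real builders code: it calls pgxpool directly (pgxpool.ParseConfig and pgxpool.ConnectConfig match the flow shown above) and omits the schema check, logging adapter, and timeout wrapper.

    package buildersexample // hypothetical package for this sketch

    import (
    	"context"
    	"sync"

    	"github.com/jackc/pgx/v4/pgxpool"
    )

    var (
    	singletonPool      *pgxpool.Pool
    	singletonPoolMutex sync.Mutex
    )

    // newDBPool returns a single process-wide connection pool, creating it on
    // first use; subsequent callers receive the same pool.
    func newDBPool(ctx context.Context, connectionString string) (*pgxpool.Pool, error) {
    	singletonPoolMutex.Lock()
    	defer singletonPoolMutex.Unlock()
    	if singletonPool != nil {
    		return singletonPool, nil
    	}
    	cfg, err := pgxpool.ParseConfig(connectionString)
    	if err != nil {
    		return nil, err
    	}
    	cfg.MaxConns = 10 // Illustrative; the real pool sizing comes from InstanceConfig.
    	pool, err := pgxpool.ConnectConfig(ctx, cfg)
    	if err != nil {
    		return nil, err
    	}
    	singletonPool = pool
    	return singletonPool, nil
    }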

The test file (builders_test.go) ensures that these builder functions correctly instantiate objects and handle different configurations, including invalid ones. A notable aspect of the tests is the management of the singletonPool. Since NewDBPoolFromConfig creates a singleton, tests that require fresh database instances must explicitly clear this singleton (singletonPool = nil) before calling the builder to avoid reusing a connection from a previous test. This is handled in newDBConfigForTest.

Module: /go/chromeperf

The chromeperf module facilitates interaction with the Chrome Perf backend, which is the system of record for performance data for Chromium. This module allows Perf to send and receive data from Chrome Perf.

Key Responsibilities

The primary responsibility of this module is to abstract the communication details with the Chrome Perf API. It provides a typed Go interface to various Chrome Perf endpoints, handling request formatting, authentication, and response parsing.

This interaction is crucial for:

  • Reporting Regressions: When Perf detects a performance regression, it needs to inform Chrome Perf to create an alert and potentially file a bug.
  • Fetching Anomaly Data: Perf needs to retrieve information about existing anomalies and alert groups from Chrome Perf to display them in its UI or use them in its analysis. This includes details about the commit range, affected tests, and associated bug IDs.
  • Maintaining Test Path Consistency: Chrome Perf and Perf may have slightly different representations of test paths (e.g., due to character restrictions). This module, in conjunction with the sqlreversekeymapstore submodule, helps manage these differences.

Key Components

chromeperfClient.go

This file defines the generic ChromePerfClient interface and its implementation, chromePerfClientImpl. This is the core component responsible for making HTTP GET and POST requests to the Chrome Perf API.

Why: Abstracting the HTTP client allows for easier testing (by mocking the client) and centralizes the logic for handling authentication (using OAuth2 Google default token source) and constructing target URLs.

How:

  • It uses google.DefaultTokenSource for authentication.
  • generateTargetUrl constructs the correct API endpoint URL, differentiating between the Skia-Bridge proxy (https://skia-bridge-dot-chromeperf.appspot.com) and direct calls to the legacy Chrome Perf endpoint (https://chromeperf.appspot.com). The Skia-Bridge is generally preferred.
  • SendGetRequest and SendPostRequest handle the actual HTTP communication, JSON marshalling/unmarshalling, and basic error handling, including checking for accepted HTTP status codes.

Example workflow for a POST request:

Caller -> chromePerfClient.SendPostRequest(ctx, "anomalies", "add", requestBody, &responseObj, []int{200})
  |
  |  (Serializes requestBody to JSON)
  v
  |--------------------------------------------------------------------------------------------------------|
  | generateTargetUrl("https://skia-bridge-dot-chromeperf.appspot.com/anomalies/add")                      |
  |--------------------------------------------------------------------------------------------------------|
  |
  v
httpClient.Post(targetUrl, "application/json", jsonBody)
  |
  v
  (HTTP Request to Chrome Perf API)
  |
  v
  (Receives HTTP Response)
  |
  v
  (Checks if response status code is in acceptedStatusCodes)
  |
  v
  (Deserializes response body into responseObj)
  |
  v
Caller (receives populated responseObj or error)

anomalyApi.go

This file builds upon chromeperfClient.go to provide a specialized client for interacting with the /anomalies endpoint in Chrome Perf. It defines the AnomalyApiClient interface and its implementation anomalyApiClientImpl.

Why: This client encapsulates the logic specific to anomaly-related operations, such as formatting requests for reporting regressions or fetching anomaly details, and parsing the specific JSON structures returned by these endpoints. It also handles the translation between Perf's trace identifiers and Chrome Perf's test_path format.

How:

  • ReportRegression: Constructs a ReportRegressionRequest and sends it to the anomalies/add endpoint. This is how Perf informs Chrome Perf about a new regression.
  • GetAnomalyFromUrlSafeKey: Fetches details for a specific anomaly using its key from the anomalies/get endpoint.
  • GetAnomalies: Retrieves anomalies for a list of tests within a specific commit range (min_revision, max_revision) by calling the anomalies/find endpoint.
    • It performs a crucial translation step: traceNameToTestPath converts Perf's comma-separated key-value trace names (e.g., ,benchmark=Blazor,bot=MacM1,...) into Chrome Perf's slash-separated test_path (e.g., ChromiumPerf/MacM1/Blazor/...).
    • It also handles potential discrepancies in commit numbers if Chrome Perf returns commit hashes. It uses perfGit.CommitNumberFromGitHash to resolve these.
  • GetAnomaliesTimeBased: Similar to GetAnomalies, but fetches anomalies based on a time range (start_time, end_time) by calling the anomalies/find_time endpoint.
  • GetAnomaliesAroundRevision: Fetches anomalies that occurred around a specific revision number.
  • traceNameToTestPath: This function is key for interoperability. It parses a Perf trace name (which is a string of key-value pairs) and constructs the corresponding test_path string that Chrome Perf expects. It also handles an experimental feature (EnableSkiaBridgeAggregation) which can modify how test paths are generated, particularly for aggregated statistics (e.g., ensuring testName_avg is used if the stat is value).
    • The logic for statToSuffixMap and hasSuffixInTestValue addresses historical inconsistencies where test names in Perf might or might not include statistical suffixes (like _avg, _max). The goal is to derive the correct Chrome Perf test_path.
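
To illustrate the trace-name to test_path translation, here is a simplified sketch. It ignores statistic suffixes, the EnableSkiaBridgeAggregation path, and the reverse-key map, and the ordering of parameter keys is an assumption based on the examples above, not the authoritative logic in anomalyApi.go.

    package chromeperfexample // hypothetical package for this sketch

    import (
    	"fmt"
    	"strings"
    )

    // traceNameToTestPathSketch converts a Perf trace name such as
    // ",benchmark=Blazor,bot=MacM1,master=ChromiumPerf,test=foo," into a
    // slash-separated Chrome Perf test_path like "ChromiumPerf/MacM1/Blazor/foo".
    func traceNameToTestPathSketch(traceName string) (string, error) {
    	// Parse ",key=value,key=value," into a map.
    	params := map[string]string{}
    	for _, pair := range strings.Split(strings.Trim(traceName, ","), ",") {
    		kv := strings.SplitN(pair, "=", 2)
    		if len(kv) != 2 {
    			return "", fmt.Errorf("invalid key=value pair %q", pair)
    		}
    		params[kv[0]] = kv[1]
    	}
    	// Assumed ordering of test_path components.
    	parts := []string{}
    	for _, key := range []string{"master", "bot", "benchmark", "test", "subtest_1", "subtest_2"} {
    		if v, ok := params[key]; ok {
    			parts = append(parts, v)
    		}
    	}
    	return strings.Join(parts, "/"), nil
    }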

Workflow for fetching anomalies:

Perf UI/Backend -> anomalyApiClient.GetAnomalies(ctx, ["trace_A,key=val", "trace_B,key=val"], 100, 200)
  |
  v
  (For each traceName)
  traceNameToTestPath("trace_A,key=val") -> "chromeperf/test/path/A"
  |
  v
  chromeperfClient.SendPostRequest(ctx, "anomalies", "find", {Tests: ["path/A", "path/B"], MinRevision: "100", MaxRevision: "200"}, &anomaliesResponse, ...)
  |
  v
  (Parses anomaliesResponse, potentially resolving commit hashes to commit numbers)
  |
  v
Perf UI/Backend (receives AnomalyMap)

alertGroupApi.go

This file provides a client for interacting with Chrome Perf's /alert_group API, specifically to get details about alert groups. An alert group in Chrome Perf typically corresponds to a set of related anomalies (regressions).

Why: When Perf displays information about an alert (which might have originated from Chrome Perf), it needs to fetch details about the associated alert group, such as the specific anomalies included, the commit range, and other metadata.

How:

  • GetAlertGroupDetails: Takes an alert group key and calls the alert_group/details endpoint on Chrome Perf.
  • The AlertGroupDetails struct holds the response, including a map of Anomalies (where the value is the Chrome Perf test_path) and start/end commit numbers/hashes.
  • GetQueryParams and GetQueryParamsPerTrace: These methods are utilities to transform the AlertGroupDetails into query parameters that can be used to construct URLs for Perf's own explorer page. This allows users to easily navigate from a Chrome Perf alert to viewing the corresponding data in Perf.
    • GetQueryParams aggregates all test path components (masters, bots, benchmarks, etc.) from all anomalies in the group into a single set of parameters.
    • GetQueryParamsPerTrace generates a separate set of query parameters for each individual anomaly in the alert group.
    • They parse the slash-separated test_path from Chrome Perf back into individual components.

Workflow for getting alert group details:

Perf Backend (e.g., when processing an incoming alert from Chrome Perf)
  |
  v
  alertGroupApiClient.GetAlertGroupDetails(ctx, "chrome_perf_group_key")
  |
  v
  chromeperfClient.SendGetRequest(ctx, "alert_group", "details", {key: "chrome_perf_group_key"}, &alertGroupResponse)
  |
  v
  (alertGroupResponse is populated)
  |
  v
  alertGroupResponse.GetQueryParams(ctx) -> Perf Explorer URL query params

store.go and the sqlreversekeymapstore submodule

store.go defines the ReverseKeyMapStore interface. The sqlreversekeymapstore directory and its schema subdirectory provide an SQL-based implementation of this interface.

Why: Test paths in Chrome Perf can contain characters that are considered “invalid” or are handled differently by Perf's parameter parsing (e.g., Perf's trace keys are comma-separated key-value pairs, and the values themselves should ideally not interfere with this). When data is ingested into Perf from Chrome Perf, or when Perf constructs test paths to query Chrome Perf, these “invalid” characters in Chrome Perf test path components (like subtest names) might be replaced (e.g., with underscores).

This creates a problem: if Perf has test/foo_bar and Chrome Perf has test/foo?bar, Perf needs a way to know that foo_bar corresponds to foo?bar when querying Chrome Perf. The ReverseKeyMapStore is designed to store these mappings.

How:

  • sqlreversekeymapstore/schema/schema.go defines the SQL table schema ReverseKeyMapSchema with columns:
    • ModifiedValue: The value as it appears in Perf (e.g., foo_bar).
    • ParamKey: The parameter key this value belongs to (e.g., subtest_1).
    • OriginalValue: The original value as it was in Chrome Perf (e.g., foo?bar).
    • The primary key is a combination of ModifiedValue and ParamKey.
  • sqlreversekeymapstore/sqlreversekeymapstore.go implements the ReverseKeyMapStore interface using a SQL database (configurable for CockroachDB or Spanner via different SQL statements).
    • Create: Inserts a new mapping. If a mapping for the ModifiedValue and ParamKey already exists (conflict), it does nothing. This is important because the mapping should be stable.
    • Get: Retrieves the OriginalValue given a ModifiedValue and ParamKey.

This store is likely used during the process of converting between Perf trace parameters and Chrome Perf test paths, especially when generating requests to Chrome Perf. If a parameter value in Perf might have been modified from its Chrome Perf original, this store can be queried to get the original value needed for the Chrome Perf API call. The exact point of integration for creating these mappings (i.e., when Create calls are made) is not explicitly detailed within this module but would typically happen when Perf first encounters/ingests a test path from Chrome Perf that requires modification.

For example, if anomalyApi.go needs to construct a test_path to query Chrome Perf based on parameters from Perf:

  1. Perf has params: test=my_test, subtest_1=value_with_question_mark
  2. When constructing the test_path segment for subtest_1:
     - Call reverseKeyMapStore.Get(ctx, "value_with_question_mark", "subtest_1").
     - If it returns an original value like "value?with?question?mark", use that for the Chrome Perf API call.
     - Otherwise, use "value_with_question_mark".
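
A hedged sketch of that lookup-with-fallback step follows, using a stand-in interface whose method shape is assumed from the description of Get above; the authoritative signature lives in store.go.

    package reversekeymapexample // hypothetical package for this sketch

    import "context"

    // reverseKeyMap is a stand-in for the ReverseKeyMapStore interface; the exact
    // signature in store.go may differ.
    type reverseKeyMap interface {
    	Get(ctx context.Context, modifiedValue, paramKey string) (string, error)
    }

    // originalOrModified returns the original Chrome Perf value for a possibly
    // modified Perf value, falling back to the Perf value when no mapping exists.
    func originalOrModified(ctx context.Context, store reverseKeyMap, modified, paramKey string) string {
    	original, err := store.Get(ctx, modified, paramKey)
    	if err != nil || original == "" {
    		return modified
    	}
    	return original
    }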

The store.go file simply defines the interface, allowing for different backend implementations of this mapping store if needed, though sqlreversekeymapstore is the provided concrete implementation.

Module: /go/clustering2

Overview

The clustering2 module is responsible for grouping similar performance traces together using k-means clustering. This helps in identifying patterns and regressions in performance data by analyzing the collective behavior of traces rather than individual ones. The core idea is to represent each trace as a point in a multi-dimensional space and then find k clusters of these points.

Design and Implementation

Why K-Means?

K-means is a well-understood and relatively efficient clustering algorithm suitable for the scale of performance data encountered. It partitions data into k distinct, non-overlapping clusters. Each data point belongs to the cluster with the nearest mean (cluster centroid). This approach allows for the summarization of large numbers of traces into a smaller set of representative “shapes” or behaviors.

Key Components and Files

clustering.go

This file contains the primary logic for performing k-means clustering on performance traces.

  • ClusterSummary: This struct represents a single cluster found by the k-means algorithm.

    • Centroid: The average shape of all traces in this cluster. This is the core representation of the cluster's behavior.
    • Keys: A list of identifiers for the traces belonging to this cluster. These are sorted by their distance to the Centroid, allowing users to quickly see the most representative traces. This is not serialized to JSON to keep the payload manageable, as it can be very large.
    • Shortcut: An identifier for a pre-computed set of Keys, used for efficient retrieval and display in UIs.
    • ParamSummaries: A breakdown of the parameter key-value pairs present in the cluster and their prevalence (see valuepercent.go). This helps in understanding what distinguishes this cluster (e.g., “all traces in this cluster are for arch=x86”).
    • StepFit: Contains information about how well the Centroid fits a step function. This is crucial for identifying regressions or improvements that manifest as sudden shifts in performance.
    • StepPoint: The specific data point (commit/timestamp) where the step (if any) in the Centroid is detected.
    • Num: The total number of traces in this cluster.
    • Timestamp: Records when the cluster analysis was performed.
    • NotificationID: Stores the ID of any alert or notification sent regarding a significant step change detected in this cluster.
  • ClusterSummaries: A container for all the ClusterSummary objects produced by a single clustering run, along with metadata like the K value used and the StdDevThreshold.

  • CalculateClusterSummaries function: This is the main entry point for the clustering process.

    • Trace Conversion: It takes a dataframe.DataFrame (which holds traces and their metadata) and converts each trace into a kmeans.Clusterable object. The ctrace2.NewFullTrace function is used here; it fills in missing data points and normalizes each trace so that traces of different scales are comparable (see the ctrace2 module). The stddevThreshold parameter acts as a floor on the standard deviation during normalization, so nearly flat traces do not produce extreme values.
    • Initial Centroid Selection (chooseK): K-means requires an initial set of k centroids. This function randomly selects k traces from the input data to serve as the initial centroids. Random selection is a common and simple initialization strategy.
    • K-Means Iteration:
    • The kmeans.Do function performs one iteration of the k-means algorithm:
      1. Assign each observation (trace) to the nearest centroid.
      2. Recalculate the centroids based on the mean of the observations assigned to them. The ctrace2.CalculateCentroid function computes this component-wise mean of a set of traces.
    • This process is repeated for a maximum of MAX_KMEANS_ITERATIONS or until the change in totalError (sum of squared distances from each point to its centroid) between iterations falls below KMEAN_EPSILON. This convergence criterion prevents unnecessary computations once the clusters stabilize.
    • A Progress callback can be provided to monitor the clustering process, reporting the totalError at each iteration.
    • Summary Generation (getClusterSummaries): After the k-means algorithm converges, this function takes the final centroids and the original observations to generate ClusterSummary objects for each cluster.
    • For each cluster, it identifies the member traces.
    • It calculates ParamSummaries (see valuepercent.go) to describe the common characteristics of traces in that cluster.
    • It performs step detection (stepfit.GetStepFitAtMid) on the cluster's centroid to identify significant performance shifts. The interesting parameter likely defines a threshold for what constitutes a noteworthy step change, and stepDetection specifies the algorithm or method used for step detection.
    • It sorts the traces within each cluster by their distance to the centroid, ensuring ClusterSummary.Keys lists the most representative traces first. A limited number of sample keys (config.MaxSampleTracesPerCluster) are stored.
    • Finally, the resulting ClusterSummary objects are sorted, likely by the magnitude or significance of the detected step (StepFit.Regression), to highlight the most impactful changes first.
  • Constants:

    • K: The default number of clusters to find. 50 is chosen as a balance between granularity and computational cost.
    • MAX_KMEANS_ITERATIONS: A safeguard against non-converging k-means runs.
    • KMEAN_EPSILON: A threshold to determine convergence, balancing precision with computation time.

valuepercent.go

This file defines how to summarize and present the parameter distributions within a cluster.

  • ValuePercent struct: Represents a specific parameter key-value pair (e.g., “config=8888”) and the percentage of traces in a cluster that have this pair. This provides a quantitative measure of how characteristic a parameter is for a given cluster.

  • SortValuePercentSlice function: This is crucial for making the ParamSummaries in ClusterSummary human-readable and informative. The goal is to:

    1. Group parameter values by their key (e.g., all “config=...” values together).
    2. Within each key group, sort by the percentage (highest first).
    3. Sort the key groups themselves by the highest percentage of their top value. If percentages are equal, an alphabetical sort of the value is used as a tie-breaker.

    This complex sorting logic ensures that the most dominant and distinguishing parameters for a cluster are presented prominently. For example:

    config=8888 90%
    config=565  10%
    arch=x86    80%
    arch=arm    20%
    

    Here, “config” is listed before “arch” because its top value (“config=8888”) has a higher percentage (90%) than the top value for “arch” (“arch=x86” at 80%).
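
A hedged sketch of that two-level sort follows; the real SortValuePercentSlice in valuepercent.go may differ in detail, and Percent is assumed here to be an integer percentage.

    package valuepercentexample // hypothetical package for this sketch

    import (
    	"sort"
    	"strings"
    )

    // ValuePercent mirrors the struct described above: a "key=value" string and
    // the percentage of traces in the cluster that have it.
    type ValuePercent struct {
    	Value   string
    	Percent int
    }

    // sortValuePercentSketch orders entries so that values are grouped by key,
    // sorted by percentage within each group, and groups are ordered by the
    // percentage of their top value.
    func sortValuePercentSketch(v []ValuePercent) {
    	// Group entries by their parameter key (the part before '=').
    	byKey := map[string][]ValuePercent{}
    	keys := []string{}
    	for _, vp := range v {
    		key := strings.SplitN(vp.Value, "=", 2)[0]
    		if _, ok := byKey[key]; !ok {
    			keys = append(keys, key)
    		}
    		byKey[key] = append(byKey[key], vp)
    	}
    	// Within each group: highest percentage first, alphabetical tie-break.
    	for _, key := range keys {
    		group := byKey[key]
    		sort.Slice(group, func(i, j int) bool {
    			if group[i].Percent != group[j].Percent {
    				return group[i].Percent > group[j].Percent
    			}
    			return group[i].Value < group[j].Value
    		})
    	}
    	// Order the key groups by the percentage of their top value.
    	sort.Slice(keys, func(i, j int) bool {
    		return byKey[keys[i]][0].Percent > byKey[keys[j]][0].Percent
    	})
    	// Write the grouped, sorted entries back into v.
    	out := v[:0]
    	for _, key := range keys {
    		out = append(out, byKey[key]...)
    	}
    }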

Workflow: Calculating Cluster Summaries

Input: DataFrame (traces, headers), K, StdDevThreshold, ProgressCallback, InterestingThreshold, StepDetectionMethod

1.  [clustering.go: CalculateClusterSummaries]
    a.  Initialize empty list of observations.
    b.  For each trace in DataFrame.TraceSet:
        i.  Create ClusterableTrace (ctrace2.NewFullTrace) using trace data and StdDevThreshold.
        ii. Add to observations list.
    c.  If no observations, return error.
    d.  [clustering.go: chooseK]
        i.  Randomly select K observations to be initial centroids.
    e.  Initialize lastTotalError = 0.0
    f.  Loop MAX_KMEANS_ITERATIONS times OR until convergence:
        i.   [kmeans.Do] -> new_centroids
             1.  Assign each observation to its closest centroid (from previous iteration or initial).
             2.  Recalculate centroids (ctrace2.CalculateCentroid) based on assigned observations.
        ii.  [kmeans.TotalError] -> currentTotalError
        iii. If ProgressCallback provided, call it with currentTotalError.
        iv.  If |currentTotalError - lastTotalError| < KMEAN_EPSILON, break loop.
        v.   lastTotalError = currentTotalError
    g.  [clustering.go: getClusterSummaries] -> clusterSummaries
        i.   [kmeans.GetClusters] -> allClusters (list of observations per centroid)
        ii.  For each cluster in allClusters and its corresponding centroid:
             1.  Create new ClusterSummary.
             2.  [clustering.go: getParamSummaries] (using cluster members) -> ParamSummaries
                 a.  [clustering.go: GetParamSummariesForKeys]
                     i.   Count occurrences of each param=value in cluster keys.
                     ii.  Convert counts to ValuePercent structs.
                     iii. [valuepercent.go: SortValuePercentSlice] -> sorted ParamSummaries.
             3.  [stepfit.GetStepFitAtMid] (on centroid values, StdDevThreshold, InterestingThreshold, StepDetectionMethod) -> StepFit, StepPoint.
             4.  Set ClusterSummary.Num = number of members in cluster.
             5.  Sort cluster members by distance to centroid.
             6.  Populate ClusterSummary.Keys with top N sorted member keys.
             7.  Populate ClusterSummary.Centroid with centroid values.
        iii. Sort all ClusterSummary objects (e.g., by StepFit.Regression).
    h.  Populate ClusterSummaries struct with results, K, and StdDevThreshold.
    i.  Return ClusterSummaries.

Output: ClusterSummaries object or error.

This process effectively transforms raw trace data into a structured summary that highlights significant patterns and changes, facilitating performance analysis and regression detection.

Module: /go/config

The /go/config module defines the configuration structure for Perf instances and provides utilities for loading, validating, and managing these configurations. It plays a crucial role in customizing the behavior of a Perf instance, from data ingestion and storage to alert notifications and UI presentation.

Core Responsibilities and Design:

The primary responsibility of this module is to define and manage the InstanceConfig struct. This struct is a comprehensive container for all settings that govern a Perf instance. The design emphasizes:

  1. Centralized Configuration: By consolidating all instance-specific settings into a single InstanceConfig struct (config.go), the module provides a single source of truth. This simplifies understanding the state of an instance and reduces the chances of configuration drift.
  2. Typed Configuration: Using Go structs with explicit types ensures that configuration values are of the expected format, catching many potential errors at compile-time or during validation. This is preferable to using untyped maps or generic configuration formats.
  3. JSON Serialization/Deserialization: Configuration files are expected to be in JSON format. The module uses standard Go encoding/json for this, making it easy to create, read, and modify configurations.
  4. Schema Validation: To ensure the integrity and correctness of configuration files, the module employs JSON Schema validation (/go/config/validate/validate.go, /go/config/validate/instanceConfigSchema.json).
    • A JSON schema (instanceConfigSchema.json) formally defines the structure and types of the InstanceConfig. This schema is automatically generated from the Go struct definition using the /go/config/generate/main.go program, ensuring the schema stays in sync with the code.
    • The validate.InstanceConfigFromFile function uses this schema to validate a configuration file before attempting to deserialize it. This allows for early detection of malformed or incomplete configurations.
  5. Command-Line Flag Integration: The module defines structs like BackendFlags, FrontendFlags, IngestFlags, and MaintenanceFlags (config.go). These structs group related command-line flags and provide methods (AsCliFlags) to convert them into cli.Flag slices, compatible with the github.com/urfave/cli/v2 library. This design keeps flag definitions organized and associated with the components they configure.
  6. Extensibility: The InstanceConfig is designed to be extensible. New configuration options can be added as new fields to the relevant sub-structs. The JSON schema generation and validation mechanisms will automatically adapt to these changes.

Key Components and Files:

  • config.go: This is the heart of the module.
    • It defines the main InstanceConfig struct, which aggregates various sub-configuration structs like AuthConfig, DataStoreConfig, IngestionConfig, GitRepoConfig, NotifyConfig, IssueTrackerConfig, AnomalyConfig, QueryConfig, TemporalConfig, and DataPointConfig. Each of these sub-structs groups settings related to a specific aspect of the Perf system (e.g., authentication, data storage, data ingestion).
    • It defines various enumerated types (e.g., DataStoreType, SourceType, GitAuthType, GitProvider, TraceFormat) to provide clear and constrained options for certain configuration values.
    • It includes DurationAsString, a custom type for handling time.Duration serialization and deserialization as strings in JSON, which is more human-readable than nanosecond integers. It also provides a custom JSON schema for this type. (A sketch of this idea follows this list.)
    • It defines structs for command-line flags used by different Perf services (backend, frontend, ingest, maintenance). This helps in organizing and parsing command-line arguments.
    • Global constants like MaxSampleTracesPerCluster, MinStdDev, GotoRange, and QueryMaxRunTime are defined here, providing default values or limits used across the application.
  • /go/config/validate/validate.go:
    • This file contains the logic for validating an InstanceConfig beyond what the JSON schema can enforce. This includes semantic checks, such as ensuring that required fields are present based on the values of other fields (e.g., API keys for issue tracker notifications).
    • The InstanceConfigFromFile function is the primary entry point for loading and validating a configuration file. It first performs schema validation and then calls the Validate function for further business logic checks.
    • It also validates the Go text templates used in NotifyConfig by attempting to format them with sample data. This helps catch template syntax errors early.
  • /go/config/validate/instanceConfigSchema.json:
    • This is an automatically generated JSON Schema file that defines the expected structure and data types for InstanceConfig JSON files. It is used by validate.go to perform initial validation of configuration files.
  • /go/config/generate/main.go:
    • This is a small utility program that generates the instanceConfigSchema.json file based on the InstanceConfig struct definition in config.go. This ensures that the schema is always up-to-date with the Go code. The //go:generate directive at the top of the file allows for easy regeneration of the schema.
  • config_test.go and /go/config/validate/validate_test.go:
    • These files contain unit tests for the configuration loading, serialization/deserialization (especially for custom types like DurationAsString), and validation logic. The tests for validate.go include checks against actual configuration files used in production (//perf:configs), ensuring that the validation logic is robust and correctly handles real-world scenarios.
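
As referenced above, the idea behind DurationAsString can be sketched as custom JSON (un)marshalling. This is only an illustration of the technique; the real type in config.go may differ in detail.

    package configexample // hypothetical package for this sketch

    import (
    	"encoding/json"
    	"time"
    )

    // DurationAsString serializes a time.Duration as a human-readable string
    // (e.g. "2h45m") instead of an integer nanosecond count.
    type DurationAsString time.Duration

    // MarshalJSON writes the duration in its string form, e.g. "1h30m0s".
    func (d DurationAsString) MarshalJSON() ([]byte, error) {
    	return json.Marshal(time.Duration(d).String())
    }

    // UnmarshalJSON parses the string form back into a duration.
    func (d *DurationAsString) UnmarshalJSON(b []byte) error {
    	var s string
    	if err := json.Unmarshal(b, &s); err != nil {
    		return err
    	}
    	parsed, err := time.ParseDuration(s)
    	if err != nil {
    		return err
    	}
    	*d = DurationAsString(parsed)
    	return nil
    }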

Workflows:

1. Loading and Validating a Configuration File:

User provides config file path (e.g., "configs/nano.json")
  |
  V
Application calls validate.InstanceConfigFromFile("configs/nano.json")
  |
  V
validate.go: Reads the JSON file content.
  |
  V
validate.go: Validates content against instanceConfigSchema.json (using jsonschema.Validate).
  |                                 \
  | (If schema violation)            \
  V                                   V
Error returned with schema violations.  Deserializes JSON into config.InstanceConfig struct.
                                      |
                                      V
                                    validate.go: Calls Validate(instanceConfig) for further business logic checks.
                                      |           (e.g., API key presence, template validity)
                                      |
                                      | (If validation error)
                                      V
                                    Error returned.
                                      |
                                      V (If all valid)
                                    Returns the populated config.InstanceConfig struct.
                                      |
                                      V
                                    Application sets config.Config = returnedInstanceConfig
                                      |
                                      V
                                    Perf instance uses config.Config for its operations.

2. Generating the JSON Schema:

This is typically done during development when the InstanceConfig struct changes.

Developer modifies config.InstanceConfig struct in config.go
  |
  V
Developer runs `go generate` in the /go/config/generate directory (or via bazel)
  |
  V
/go/config/generate/main.go: Calls jsonschema.GenerateSchema("../validate/instanceConfigSchema.json", &config.InstanceConfig{})
  |
  V
jsonschema library: Introspects the config.InstanceConfig struct and its fields.
  |
  V
jsonschema library: Generates a JSON Schema definition.
  |
  V
/go/config/generate/main.go: Writes the generated schema to /go/config/validate/instanceConfigSchema.json.

The design prioritizes robustness through schema and semantic validation, maintainability through structured Go types and centralized configuration, and ease of use through standard JSON format and command-line flag integration. The separation of schema generation (generate subdirectory) and validation (validate subdirectory) keeps concerns distinct.

Module: /go/ctrace2

ctrace2 Module Documentation

Overview

The ctrace2 module provides the functionality to adapt trace data (represented as a series of floating-point values) for use with k-means clustering algorithms. The primary goal is to transform raw trace data into a format that is suitable for distance calculations and centroid computations, which are fundamental operations in k-means. This involves normalization and handling of missing data points.

Why and How

In performance analysis, traces often represent measurements over time or across different configurations. Clustering these traces helps identify groups of similar performance characteristics. However, raw trace data might have issues that hinder effective clustering:

  1. Varying Scales: Different traces might have values in vastly different ranges, leading to biased distance calculations where traces with larger absolute values dominate.
  2. Missing Data: Traces can have missing data points, which need to be handled appropriately during normalization and distance computation.
  3. Zero Standard Deviation: Traces with constant values (zero standard deviation) can cause division by zero errors during normalization.

The ctrace2 module addresses these by:

  • Normalization: Each trace is normalized to have a standard deviation of 1.0. This ensures that the scale of the values does not disproportionately influence the clustering. The vec32.Norm function from the go/vec32 module is leveraged for this. Before normalization, any missing data points (vec32.MissingDataSentinel) are filled in using vec32.Fill, which likely interpolates or uses a similar strategy to replace them.
  • Minimum Standard Deviation: To prevent division by zero or issues with extremely small standard deviations, a minStdDev parameter is used during normalization. If the calculated standard deviation of a trace is below this minimum, the minStdDev value is used instead. This is a practical approach to handle traces with very little variation without excluding them from clustering.
  • ClusterableTrace Structure: This structure wraps the trace data (Key and Values) and implements the kmeans.Clusterable and kmeans.Centroid interfaces from the perf/go/kmeans module. This makes ClusterableTrace instances directly usable by the k-means algorithm.

Responsibilities and Key Components

  • ctrace.go: This is the core file of the module.
    • ClusterableTrace struct:
    • Purpose: Represents a single trace that is ready for clustering. It holds a Key (a string identifier for the trace) and Values (a slice of float32 representing the normalized data points).
    • Why: This struct is designed to be directly consumable by the k-means clustering algorithm by implementing necessary interfaces.
    • Distance(c kmeans.Clusterable) float64 method: Calculates the Euclidean distance between the current ClusterableTrace and another ClusterableTrace. This is crucial for the k-means algorithm to determine how similar two traces are. The calculation assumes that both traces have the same number of data points (a guarantee maintained by NewFullTrace). A Go sketch of this computation follows this list.

      For each point i in trace1 and trace2:
        diff_i = trace1.Values[i] - trace2.Values[i]
        squared_diff_i = diff_i * diff_i
      Sum = sum of all squared_diff_i
      Distance = Sqrt(Sum)
    • AsClusterable() kmeans.Clusterable method: Returns the ClusterableTrace itself, satisfying the kmeans.Centroid interface requirement.
    • Dup(newKey string) *ClusterableTrace method: Creates a deep copy of the ClusterableTrace with a new key. This is useful when you need to manipulate a trace without affecting the original.
    • NewFullTrace(key string, values []float32, minStdDev float32) *ClusterableTrace function:
    • Purpose: The primary factory function for creating ClusterableTrace instances from raw trace data.
    • How:
      1. It takes a key (string identifier), raw values ([]float32), and a minStdDev.
      2. Creates a copy of the input values to avoid modifying the original slice.
      3. Calls vec32.Fill() on the copied values. This step handles missing data points by filling them, likely through interpolation or a similar imputation technique provided by the go/vec32 module.
      4. Calls vec32.Norm() on the filled values, using minStdDev. This normalizes the trace data so that its standard deviation is effectively 1.0 (or adjusted if the original standard deviation was below minStdDev).
      5. Returns a new ClusterableTrace with the provided key and the processed (filled and normalized) values.

      Input: key, raw_values, minStdDev
      ------------------------------------
      copied_values     = copy(raw_values)
      filled_values     = vec32.Fill(copied_values)
      normalized_values = vec32.Norm(filled_values, minStdDev)
      Output: ClusterableTrace{Key: key, Values: normalized_values}
    • CalculateCentroid(members []kmeans.Clusterable) kmeans.Centroid function:
    • Purpose: Implements the kmeans.CalculateCentroid function type. Given a slice of ClusterableTrace instances (which are members of a cluster), it computes their centroid.
    • How:
      1. It initializes a new slice of float32 (mean) with the same length as the Values of the first member trace.
      2. It iterates through each member trace in the members slice.
      3. For each member, it iterates through its Values and adds each value to the corresponding element in the mean slice.
      4. After summing up all values component-wise, it divides each element in the mean slice by the total number of members to get the average value for each dimension.
      5. It returns a new ClusterableTrace representing the centroid. The key for this centroid trace is set to CENTROID_KEY (“special_centroid”).

      Input: members (list of ClusterableTraces)
      ------------------------------------------
      Initialize mean_values = [0.0, 0.0, ..., 0.0]  (same length as members[0].Values)
      For each member_trace in members:
        For each i from 0 to len(member_trace.Values) - 1:
          mean_values[i] = mean_values[i] + member_trace.Values[i]
      For each i from 0 to len(mean_values) - 1:
        mean_values[i] = mean_values[i] / len(members)
      Output: ClusterableTrace{Key: CENTROID_KEY, Values: mean_values}
    • CENTROID_KEY constant:
    • Purpose: Defines a standard key (“special_centroid”) to be used for traces that represent the centroid of a cluster.
    • Why: This provides a consistent way to identify centroid traces if they are, for example, added back into a collection of traces (e.g., in a DataFrame).
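
As referenced above, the Euclidean distance computation can be sketched in Go. The real method is defined on ClusterableTrace in ctrace.go; this standalone function only illustrates the arithmetic.

    package ctrace2example // hypothetical package for this sketch

    import "math"

    // euclideanDistance assumes both slices have the same length, which is the
    // same guarantee NewFullTrace maintains for ClusterableTrace.Values.
    func euclideanDistance(a, b []float32) float64 {
    	total := 0.0
    	for i := range a {
    		diff := float64(a[i] - b[i])
    		total += diff * diff
    	}
    	return math.Sqrt(total)
    }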

The interaction with the go/vec32 module is crucial for data preprocessing (filling missing values and normalization), while the perf/go/kmeans module provides the interfaces that ctrace2 implements to be compatible with k-means clustering algorithms.

Module: /go/culprit

The culprit module is responsible for identifying, storing, and notifying about commits that are likely causes of performance regressions. It integrates with anomaly detection and subscription systems to automate the process of pinpointing culprits and alerting relevant parties.

Key Responsibilities

  • Culprit Identification: While the actual bisection logic might reside elsewhere, this module is responsible for receiving information about potential culprit commits.
  • Culprit Persistence: Storing identified culprits in a database, linking them to the anomaly groups they are associated with.
  • Notification: Generating and sending notifications (e.g., creating issues in an issue tracker) when new culprits are found or when new anomaly groups are reported.
  • Data Formatting: Formatting notification messages (subjects and bodies) based on configurable templates.

Key Components and Files

store.go & sqlculpritstore/sqlculpritstore.go

  • Purpose: These files define the interface and implementation for storing and retrieving culprit data. The primary goal is to persist information about commits identified as culprits, associating them with specific anomaly groups and any filed issues.
  • How it Works:
    • store.go defines the Store interface, which outlines the contract for culprit data operations like Get, Upsert, and AddIssueId (sketched after this list).
    • sqlculpritstore/sqlculpritstore.go provides a SQL-based implementation of this interface. It uses a SQL database (configured via pool.Pool) to store culprit information.
    • The Upsert method is crucial. It either inserts a new culprit record or updates an existing one if a commit has already been identified as a culprit for a different anomaly group. This prevents duplicate culprit entries for the same commit. It also links the culprit to the anomaly_group_id.
    • The AddIssueId method updates a culprit record to include the ID of an issue (e.g., a bug tracker ticket) that was created for it, and also maintains a map between the anomaly group and the issue ID. This is important for tracking and referencing.
    • The database schema (defined in sqlculpritstore/schema/schema.go) includes fields for commit details (host, project, ref, revision), associated anomaly group IDs, and associated issue IDs. An index on (revision, host, project, ref) helps in efficiently querying for existing culprits.
  • Design Choices:
    • Using an interface (Store) decouples the rest of the module from the specific database implementation, allowing for easier testing and potential future changes in the storage backend.
    • The Upsert logic is designed to handle cases where the same commit might be identified as a culprit for multiple regressions (different anomaly groups). Instead of creating duplicate entries, it appends the new anomaly_group_id to the existing record.
    • Storing group_issue_map as JSONB allows flexible storage of the mapping between anomaly groups and the specific issue filed for that group in the context of this culprit.
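
As noted above, the Store contract can be sketched roughly as follows. The parameter and return types here are assumptions made for illustration; store.go defines the authoritative interface, and the real implementation uses the module's protobuf types.

    package culpritexample // hypothetical package for this sketch

    import "context"

    // Commit and Culprit stand in for the protobuf types used by the real store.
    type Commit struct{ Host, Project, Ref, Revision string }

    type Culprit struct {
    	ID              string
    	Commit          Commit
    	AnomalyGroupIDs []string
    	IssueIDs        []string
    }

    // Store sketches the interface described above.
    type Store interface {
    	// Get returns culprits by their IDs.
    	Get(ctx context.Context, ids []string) ([]Culprit, error)
    	// Upsert inserts new culprits or appends the anomaly group ID to existing
    	// records for the same commit, returning the culprit IDs.
    	Upsert(ctx context.Context, anomalyGroupID string, commits []Commit) ([]string, error)
    	// AddIssueId records the issue filed for a culprit in the context of an
    	// anomaly group.
    	AddIssueId(ctx context.Context, culpritID, issueID, anomalyGroupID string) error
    }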

formatter/formatter.go

  • Purpose: This component is responsible for constructing the content (subject and body) of notifications. It allows for customizable message formats.
  • How it Works:
    • Defines the Formatter interface with methods GetCulpritSubjectAndBody (for new culprit notifications) and GetReportSubjectAndBody (for new anomaly group reports).
    • MarkdownFormatter is the concrete implementation. It uses Go's text/template package to render notification messages (a sketch follows this list).
    • Templates for subjects and bodies can be provided via InstanceConfig. If not provided, default templates are used.
    • TemplateContext and ReportTemplateContext provide the data that can be used within the templates (e.g., commit details, subscription information, anomaly group details).
    • Helper functions like buildCommitURL, buildAnomalyGroupUrl, and buildAnomalyDetails are available within the templates to construct URLs and format anomaly details.
  • Design Choices:
    • The use of interfaces and templates promotes flexibility. Users can define their own notification formats without modifying the core notification logic.
    • Default templates ensure that the system can function even without explicit template configuration.
    • Separating formatting from the transport mechanism (how notifications are sent) adheres to the single responsibility principle.
  • formatter/noop.go: Provides a NoopFormatter that generates empty subjects and bodies, useful for disabling notifications or for testing scenarios where actual formatting is not needed.
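
As mentioned above, here is a minimal sketch of template-based formatting. The template text and context fields are invented for illustration and do not match the real TemplateContext in formatter.go; only the text/template usage itself is standard library API.

    package formatterexample // hypothetical package for this sketch

    import (
    	"bytes"
    	"text/template"
    )

    // templateContext carries illustrative fields; the real TemplateContext
    // exposes commit, subscription, and anomaly details.
    type templateContext struct {
    	Revision         string
    	SubscriptionName string
    }

    const subjectTemplate = `Culprit {{ .Revision }} found for {{ .SubscriptionName }}`

    // renderSubject parses and executes the subject template with the given context.
    func renderSubject(ctx templateContext) (string, error) {
    	t, err := template.New("subject").Parse(subjectTemplate)
    	if err != nil {
    		return "", err
    	}
    	var b bytes.Buffer
    	if err := t.Execute(&b, ctx); err != nil {
    		return "", err
    	}
    	return b.String(), nil
    }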

transport/transport.go

  • Purpose: This component handles the actual sending of notifications to external systems, primarily issue trackers.
  • How it Works:
    • Defines the Transport interface with the SendNewNotification method.
    • IssueTrackerTransport is the concrete implementation for interacting with an issue tracker (e.g., Google Issue Tracker/Buganizer).
    • It uses the go.skia.org/infra/go/issuetracker/v1 client library.
    • Authentication is handled using an API key retrieved via the secret package.
    • When SendNewNotification is called, it constructs an issuetracker.Issue object based on the provided subject, body, and subscription details (like component ID, priority, CCs, hotlists).
    • It then calls the issue tracker API to create a new issue.
    • Metrics (SendNewNotificationSuccess, SendNewNotificationFail) are recorded to monitor the success rate of sending notifications.
  • Design Choices:
    • The Transport interface allows for different notification mechanisms to be plugged in (e.g., email, Slack) in the future.
    • Configuration for the issue tracker (API key, secret project/name) is externalized, promoting better security and manageability.
    • Error handling and metrics provide visibility into the notification delivery process.
  • transport/noop.go: Provides a NoopTransport that doesn't actually send any notifications, useful for disabling notifications or for testing.

notify/notify.go

  • Purpose: This component orchestrates the notification process by combining a Formatter and a Transport.
  • How it Works:
    • Defines the CulpritNotifier interface with methods NotifyCulpritFound and NotifyAnomaliesFound.
    • DefaultCulpritNotifier implements this interface. It takes a formatter.Formatter and a transport.Transport as dependencies.
    • The GetDefaultNotifier factory function determines which Formatter and Transport to use based on the InstanceConfig.IssueTrackerConfig.NotificationType. If NoneNotify, it uses NoopFormatter and NoopTransport. If IssueNotify, it sets up MarkdownFormatter and IssueTrackerTransport.
    • NotifyCulpritFound:
    • Calls the formatter's GetCulpritSubjectAndBody to get the message content.
    • Calls the transport's SendNewNotification to send the message.
    • Returns the ID of the created issue (or an empty string if no notification was sent).
    • NotifyAnomaliesFound:
    • Calls the formatter's GetReportSubjectAndBody.
    • Calls the transport's SendNewNotification.
    • Returns the ID of the created issue.
  • Design Choices:
    • Decouples the high-level notification logic from the specifics of message formatting and sending.
    • Configuration-driven selection of formatter and transport makes the notification behavior adaptable.
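
The configuration-driven wiring can be summarized with a small sketch along these lines; the type names and the notification-type strings are simplified stand-ins, not the real config constants.

    package notifysketch

    import "fmt"

    // Simplified stand-ins for the formatter and transport pairs described above.
    type formatter interface{ kind() string }
    type transport interface{ kind() string }

    type noopFormatter struct{}
    type markdownFormatter struct{}
    type noopTransport struct{}
    type issueTrackerTransport struct{}

    func (noopFormatter) kind() string         { return "noop formatter" }
    func (markdownFormatter) kind() string     { return "markdown formatter" }
    func (noopTransport) kind() string         { return "noop transport" }
    func (issueTrackerTransport) kind() string { return "issue tracker transport" }

    type culpritNotifier struct {
        formatter formatter
        transport transport
    }

    // getDefaultNotifier mirrors the selection logic: "none" wires in the no-op pair,
    // "issuetracker" wires in the Markdown formatter and issue tracker transport.
    func getDefaultNotifier(notificationType string) (*culpritNotifier, error) {
        switch notificationType {
        case "none":
            return &culpritNotifier{formatter: noopFormatter{}, transport: noopTransport{}}, nil
        case "issuetracker":
            return &culpritNotifier{formatter: markdownFormatter{}, transport: issueTrackerTransport{}}, nil
        default:
            return nil, fmt.Errorf("unknown notification type %q", notificationType)
        }
    }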

service/service.go

  • Purpose: Implements the gRPC service defined in culprit.proto. This is the main entry point for external systems (like a bisection service or an anomaly detection pipeline) to interact with the culprit module.
  • How it Works:
    • Implements the pb.CulpritServiceServer interface.
    • It depends on anomalygroup.Store, culprit.Store, subscription.Store, and notify.CulpritNotifier.
    • PersistCulprit RPC:
    • Calls culpritStore.Upsert to save the identified culprit commits and associate them with the anomaly_group_id.
    • Calls anomalygroupStore.AddCulpritIDs to link the newly created/updated culprit IDs back to the anomaly group.

      [Client (e.g., Bisection Service)]
          |
          v
      [PersistCulpritRequest {Commits, AnomalyGroupID}]
          |
          v
      [culpritService.PersistCulprit]
          |
          `-> [culpritStore.Upsert(AnomalyGroupID, Commits)] -> Returns CulpritIDs
          |                                                          |
          |                                                          v
          `<--------- [anomalygroupStore.AddCulpritIDs(AnomalyGroupID, CulpritIDs)]
          |
          v
      [PersistCulpritResponse {CulpritIDs}]
          |
          v
      [Client]
    • GetCulprit RPC:
    • Calls culpritStore.Get to retrieve culprit details by their IDs.
    • NotifyUserOfCulprit RPC:
    • Retrieves culprit details using culpritStore.Get.
    • Loads the corresponding AnomalyGroup using anomalygroupStore.LoadById.
    • Loads the Subscription associated with the anomaly group using subscriptionStore.GetSubscription.
    • Calls notifier.NotifyCulpritFound for each culprit to send a notification (e.g., file a bug).
    • Calls culpritStore.AddIssueId to store the generated issue ID with the culprit and the specific anomaly group.

      [Client (e.g., Bisection Service after PersistCulprit)]
          |
          v
      [NotifyUserOfCulpritRequest {CulpritIDs, AnomalyGroupID}]
          |
          v
      [culpritService.NotifyUserOfCulprit]
          |-> [culpritStore.Get(CulpritIDs)] -> Culprits
          |-> [anomalygroupStore.LoadById(AnomalyGroupID)] -> AnomalyGroup
          |       |
          |       `-> [subscriptionStore.GetSubscription(AnomalyGroup.SubName, AnomalyGroup.SubRev)] -> Subscription
          |
          |   (For each Culprit in Culprits)
          |       |
          |       `-> [notifier.NotifyCulpritFound(Culprit, Subscription)] -> Returns IssueID
          |               |
          |               v
          |       `-> [culpritStore.AddIssueId(Culprit.ID, IssueID, AnomalyGroupID)]
          |
          v
      [NotifyUserOfCulpritResponse {IssueIDs}]
          |
          v
      [Client]
    • NotifyUserOfAnomaly RPC:
    • Loads the AnomalyGroup and its associated Subscription.
    • Calls notifier.NotifyAnomaliesFound to send a notification about the group of anomalies (e.g., file a summary bug).

      [Client (e.g., Anomaly Detection Service)]
          |
          v
      [NotifyUserOfAnomalyRequest {AnomalyGroupID, Anomalies[]}]
          |
          v
      [culpritService.NotifyUserOfAnomaly]
          |-> [anomalygroupStore.LoadById(AnomalyGroupID)] -> AnomalyGroup
          |       |
          |       `-> [subscriptionStore.GetSubscription(AnomalyGroup.SubName, AnomalyGroup.SubRev)] -> Subscription
          |
          `-> [notifier.NotifyAnomaliesFound(AnomalyGroup, Subscription, Anomalies[])] -> Returns IssueID
          |
          v
      [NotifyUserOfAnomalyResponse {IssueID}]
          |
          v
      [Client]
    • PrepareSubscription is a helper function used to potentially override or mock subscription details for testing or during transitional phases before full sheriff configuration is active. This is a temporary measure.
  • Design Choices:
    • Clear separation of concerns: the service layer orchestrates actions by calling appropriate stores and notifiers.
    • gRPC provides a well-defined, language-agnostic interface for the service.
    • The authorization policy (GetAuthorizationPolicy) is currently set to allow unauthenticated access, which might need to be revisited for production environments.
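
To illustrate the orchestration pattern, here is a minimal sketch of the PersistCulprit flow using simplified, locally defined store interfaces. The method signatures are assumptions; the real interfaces in the culprit and anomalygroup packages differ.

    package servicesketch

    import (
        "context"
        "fmt"
    )

    // Simplified stand-ins for the stores the service depends on.
    type culpritStore interface {
        Upsert(ctx context.Context, anomalyGroupID string, commits []string) (culpritIDs []string, err error)
    }

    type anomalyGroupStore interface {
        AddCulpritIDs(ctx context.Context, anomalyGroupID string, culpritIDs []string) error
    }

    type culpritService struct {
        culprits      culpritStore
        anomalyGroups anomalyGroupStore
    }

    // persistCulprit sketches the PersistCulprit RPC: store the culprits, then link
    // their IDs back to the anomaly group, and return the IDs to the caller.
    func (s *culpritService) persistCulprit(ctx context.Context, anomalyGroupID string, commits []string) ([]string, error) {
        ids, err := s.culprits.Upsert(ctx, anomalyGroupID, commits)
        if err != nil {
            return nil, fmt.Errorf("upserting culprits: %w", err)
        }
        if err := s.anomalyGroups.AddCulpritIDs(ctx, anomalyGroupID, ids); err != nil {
            return nil, fmt.Errorf("linking culprits to anomaly group: %w", err)
        }
        return ids, nil
    }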

proto/v1/culprit_service.proto

  • Purpose: Defines the gRPC service contract for culprit-related operations.
  • Key Messages and RPCs:
    • Commit: Represents a source code commit.
    • Culprit: Represents an identified culprit commit, including its ID, the commit details, associated anomaly group IDs, and issue IDs. It also includes group_issue_map to track which issue was filed for which anomaly group in the context of this culprit.
    • Anomaly: Represents a detected performance anomaly (duplicated from anomalygroup service for potential independent evolution).
    • PersistCulpritRequest/Response: For storing new culprits.
    • GetCulpritRequest/Response: For retrieving existing culprits.
    • NotifyUserOfAnomalyRequest/Response: For triggering notifications about a new set of anomalies (anomaly group).
    • NotifyUserOfCulpritRequest/Response: For triggering notifications about newly identified culprits.
  • Design Choices:
    • Proto definitions provide a clear, typed contract for communication.
    • The Anomaly message is duplicated from the anomalygroup service. This choice was made to allow the culprit service and anomalygroup service to evolve their respective Anomaly definitions independently if needed in the future, avoiding tight coupling.
    • The group_issue_map in the Culprit message is important for scenarios where a single culprit might be associated with multiple anomaly groups, and each of those (culprit, group) pairs might result in a distinct bug being filed.

Mocks (mocks/ subdirectories)

  • These directories contain generated mock implementations for the interfaces defined within the culprit module (e.g., Store, Formatter, Transport, CulpritNotifier, CulpritServiceServer).
  • Purpose: Facilitate unit testing by allowing dependencies to be easily mocked. This is standard practice for writing testable Go code. They are generated using tools like mockery.

Overall Workflow Example: Finding and Notifying a Culprit

  1. Anomaly Detection: An external system detects a performance regression and groups related anomalies into an AnomalyGroup.
  2. Bisection (External): A bisection process is triggered (potentially externally) to find the commit(s) responsible for the anomalies in the AnomalyGroup.
  3. Persist Culprit: The bisection service calls the CulpritService.PersistCulprit RPC with the identified Commit(s) and the AnomalyGroupID.
    • culpritService uses culpritStore.Upsert to save these commits as Culprit records, linking them to the AnomalyGroupID.
    • It then calls anomalygroupStore.AddCulpritIDs to update the AnomalyGroup record with the IDs of these new culprits.
  4. Notify User of Culprit: The bisection service (or another orchestrator) then calls CulpritService.NotifyUserOfCulprit RPC with the CulpritID(s) and the AnomalyGroupID.
    • culpritService retrieves the full Culprit details and the associated Subscription.
    • DefaultCulpritNotifier is invoked:
      • MarkdownFormatter generates the subject and body for the notification.
      • IssueTrackerTransport sends this formatted message to the issue tracker, creating a new bug.
    • The ID of the created bug is returned.
    • culpritService calls culpritStore.AddIssueId to associate this bug ID with the specific Culprit and AnomalyGroupID.

This flow ensures that culprits are stored, linked to their regressions, and users are notified through the configured channels. The modular design allows for flexibility in how each step (storage, formatting, transport) is implemented.

Module: /go/dataframe

The dataframe module provides the DataFrame data structure and related functionality for handling and manipulating performance trace data. It is a core component for querying, analyzing, and visualizing performance metrics within the Skia Perf system.

Key Design Principles:

  • Tabular Data Representation: Inspired by R's DataFrame, this module represents performance data as a table. Rows correspond to individual traces (identified by a structured key), and columns represent distinct commit points or patch levels. This structure facilitates efficient querying and analysis of time-series data across different configurations.
  • TraceSet and ParamSet: A DataFrame encapsulates a types.TraceSet, which is a map of trace keys to their corresponding performance values. It also maintains a paramtools.ReadOnlyParamSet, which describes the unique parameter key-value pairs present in the TraceSet. This allows for efficient filtering and aggregation based on trace characteristics.
  • Commit-Centric Columns: The columns of a DataFrame are defined by ColumnHeader structs, each containing a commit offset and a timestamp. This ties the performance data directly to specific points in the codebase's history.
  • Data Retrieval Abstraction: The DataFrameBuilder interface decouples the DataFrame creation logic from the underlying data source. This allows for different implementations to fetch data (e.g., from a database) while providing a consistent API for consumers.
  • Efficiency for Common Operations: The module provides functions for common data manipulation tasks like merging DataFrames (Join), filtering traces (FilterOut), slicing data (Slice), and compressing data by removing empty columns (Compress). These operations are designed with performance considerations in mind.

Key Components and Files:

  • dataframe.go: This is the central file defining the DataFrame struct and its associated methods.

    • DataFrame struct:
    • TraceSet: Stores the actual performance data, mapping trace keys (strings representing parameter combinations like “,arch=x86,config=8888,”) to types.Trace (slices of float32 values).
    • Header: A slice of *ColumnHeader pointers, defining the columns of the DataFrame. Each ColumnHeader links a column to a specific commit (Offset) and its Timestamp.
    • ParamSet: A paramtools.ReadOnlyParamSet that contains all unique key-value pairs from the keys in TraceSet. This is crucial for understanding the dimensions of the data and for building UI controls for filtering. It's rebuilt by BuildParamSet().
    • Skip: An integer indicating if any commits were skipped during data retrieval to keep the DataFrame size manageable (related to MAX_SAMPLE_SIZE).
    • DataFrameBuilder interface: Defines the contract for objects that can construct DataFrame instances. This allows for different data sources or retrieval strategies. Key methods include:
    • NewFromQueryAndRange: Creates a DataFrame based on a query and a time range.
    • NewFromKeysAndRange: Creates a DataFrame for specific trace keys over a time range.
    • NewNFromQuery / NewNFromKeys: Creates a DataFrame with the N most recent data points for matching traces or specified keys.
    • NumMatches / PreflightQuery: Used to estimate the size of the data that a query will return, often for UI feedback or to refine queries.
    • ColumnHeader struct: Represents a single column in the DataFrame, typically corresponding to a commit. It contains:
    • Offset: A types.CommitNumber identifying the commit.
    • Timestamp: The timestamp of the commit in seconds since the Unix epoch.
    • Key Functions:
    • NewEmpty(): Creates an empty DataFrame.
    • NewHeaderOnly(): Creates a DataFrame with populated headers (commits within a time range) but no trace data. This can be useful for setting up the structure before fetching actual data.
    • FromTimeRange(): Retrieves commit information (headers and commit numbers) for a given time range from a perfgit.Git instance. This is a foundational step in populating the Header of a DataFrame.
    • MergeColumnHeaders(): A utility function that takes two slices of ColumnHeader and merges them into a single sorted slice, returning mapping indices to reconstruct traces. This is essential for the Join operation.
    • Join(): Combines two DataFrames into a new DataFrame. It merges their headers and trace data. If traces exist in one DataFrame but not the other for a given key, missing data points (vec32.MissingDataSentinel) are inserted. The ParamSet of the resulting DataFrame is the union of the input ParamSets.

      DataFrame A (Header: [C1, C3], TraceX: [v1, v3])
          |
          V
      DataFrame B (Header: [C2, C3], TraceX: [v2', v3'])
          |
          V
      Joined DataFrame (Header: [C1, C2, C3], TraceX: [v1, v2', v3/v3'])
      (TraceY from A or B padded with missing data)
    • BuildParamSet(): Recalculates the ParamSet for a DataFrame based on the current keys in its TraceSet. This is called after operations like FilterOut that might change the set of traces.
    • FilterOut(): Removes traces from the TraceSet based on a provided TraceFilter function. It then calls BuildParamSet() to update the ParamSet.
    • Slice(): Returns a new DataFrame that is a view into a sub-section of the original DataFrame's columns. The underlying trace data is sliced, not copied, for efficiency.
    • Compress(): Creates a new DataFrame by removing any columns (and corresponding data points in traces) that contain only missing data sentinels across all traces. This helps in reducing data size and focusing on relevant data points.
  • dataframe_test.go: Contains unit tests for the functionality in dataframe.go. These tests cover various scenarios, including empty DataFrames, different merging and joining cases, filtering, slicing, and compression. The tests often use gittest for creating mock Git repositories to test time range queries.

  • /go/dataframe/mocks/DataFrameBuilder.go: This file contains a mock implementation of the DataFrameBuilder interface, generated using the testify/mock library. This mock is used in tests of other packages that depend on DataFrameBuilder, allowing them to simulate DataFrame creation without needing a real data source or Git repository.
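
To ground the structures described above, the sketch below shows, with simplified local types, how structured trace keys relate to a TraceSet and how a ParamSet can be rebuilt from them, which is roughly what BuildParamSet() does. The real types live in go/dataframe, go/types, and go/paramtools and carry more functionality than this stand-in.

    package dataframesketch

    import (
        "sort"
        "strings"
    )

    // Simplified mirrors of the structures described above.
    type columnHeader struct {
        Offset    int64 // commit number
        Timestamp int64 // seconds since the Unix epoch
    }

    type frame struct {
        Header   []*columnHeader
        TraceSet map[string][]float32
        ParamSet map[string][]string
    }

    // buildParamSet recomputes ParamSet from the structured trace keys.
    func (f *frame) buildParamSet() {
        values := map[string]map[string]bool{}
        for key := range f.TraceSet {
            // Keys look like ",arch=x86,config=8888,test=draw_a_circle,".
            for _, pair := range strings.Split(strings.Trim(key, ","), ",") {
                kv := strings.SplitN(pair, "=", 2)
                if len(kv) != 2 {
                    continue
                }
                if values[kv[0]] == nil {
                    values[kv[0]] = map[string]bool{}
                }
                values[kv[0]][kv[1]] = true
            }
        }
        f.ParamSet = map[string][]string{}
        for k, vs := range values {
            for v := range vs {
                f.ParamSet[k] = append(f.ParamSet[k], v)
            }
            sort.Strings(f.ParamSet[k])
        }
    }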

Workflows:

  1. Fetching Data for Display/Analysis:

    • A client (e.g., a web UI) specifies a query and a time range.
    • An implementation of DataFrameBuilder (e.g., one that queries a CockroachDB instance) uses NewFromQueryAndRange.
    • Internally, this likely involves:
      1. Resolving the time range to a list of commits using FromTimeRange (which calls perfgit.Git.CommitSliceFromTimeRange). This populates the Header.
      2. Querying the data source for traces matching the query and falling within the identified commit range.
      3. Populating the TraceSet.
      4. Building the ParamSet using BuildParamSet().
    • The resulting DataFrame is returned.
    Client Request (Query, TimeRange)
        |
        V
    DataFrameBuilder.NewFromQueryAndRange(ctx, begin, end, query, ...)
        |
        +-> FromTimeRange(ctx, git, begin, end, ...)  // Get commit headers
        |       |
        |       V
        |   perfgit.Git.CommitSliceFromTimeRange()
        |       |
        |       V
        |   [ColumnHeader{Offset, Timestamp}, ...]
        |
        +-> DataSource.QueryTraces(query, commit_numbers) // Fetch trace data
        |       |
        |       V
        |   types.TraceSet
        |
        +-> DataFrame.BuildParamSet() // Populate ParamSet
        |
        V
    DataFrame{Header, TraceSet, ParamSet}
    
  2. Joining DataFrames (e.g., from different sources or queries):

    • Two DataFrame instances, dfA and dfB, are available.
    • Join(dfA, dfB) is called.
    • MergeColumnHeaders(dfA.Header, dfB.Header) creates a unified header and maps to align traces.
    • A new TraceSet is built. For each key:
      • If a key is in dfA but not dfB, its trace is copied, padded with missing values for columns unique to dfB.
      • If a key is in dfB but not dfA, its trace is copied, padded with missing values for columns unique to dfA.
      • If a key is in both, values are merged based on the unified header.
    • The ParamSets of dfA and dfB are combined.
    • A new, joined DataFrame is returned.
  3. Filtering Data:

    • A DataFrame df exists.
    • A TraceFilter function myFilter is defined (e.g., to remove traces with all zero values).
    • df.FilterOut(myFilter) is called.
    • The method iterates through df.TraceSet. If myFilter returns true for a trace, that trace is deleted from the TraceSet.
    • df.BuildParamSet() is called to reflect the potentially reduced set of parameters.
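
As a rough illustration of this filtering workflow, assuming a TraceFilter is simply a predicate over a trace's values (return true to remove), the pattern looks like the sketch below; the names here are local stand-ins rather than the real dataframe types.

    package filtersketch

    // traceFilter mirrors the TraceFilter contract described above: return true to
    // have the trace removed from the TraceSet.
    type traceFilter func(trace []float32) bool

    // filterOut deletes every trace the filter flags; on the real DataFrame this is
    // followed by BuildParamSet() to refresh the ParamSet.
    func filterOut(traceSet map[string][]float32, f traceFilter) {
        for key, trace := range traceSet {
            if f(trace) {
                delete(traceSet, key)
            }
        }
    }

    // Example filter: remove traces whose values are all zero.
    func allZero(trace []float32) bool {
        for _, v := range trace {
            if v != 0 {
                return false
            }
        }
        return true
    }

Calling filterOut(df.TraceSet, allZero) and then rebuilding the ParamSet mirrors the two steps in the workflow above.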

Constants:

  • DEFAULT_NUM_COMMITS: Default number of commits to fetch when using methods like NewNFromQuery. Set to 50.
  • MAX_SAMPLE_SIZE: A limit on the number of commits (columns) a DataFrame might contain, especially when downsampling. Set to 5000. (Note: The downsample parameter in FromTimeRange is currently ignored, meaning this might not be strictly enforced by that specific function directly but could be a target for other parts of the system or future enhancements.)

Module: /go/dfbuilder

The dfbuilder module is responsible for constructing DataFrame objects. DataFrames are fundamental data structures in Perf, representing a collection of performance traces (time series data) along with their associated parameters and commit information. This module acts as an intermediary between the raw trace data stored in a TraceStore and the higher-level analysis and visualization components that consume DataFrames.

The core design revolves around efficiently fetching and organizing trace data based on various querying criteria. This involves interacting with a perfgit.Git instance to resolve commit ranges and timestamps, and a tracestore.TraceStore to retrieve the actual trace data.

Key Responsibilities and Components:

  • dfbuilder.go: This is the central file implementing the DataFrameBuilder interface.
    • builder struct: This struct holds the necessary dependencies like perfgit.Git, tracestore.TraceStore, tracecache.TraceCache, and configuration parameters (e.g., tileSize, numPreflightTiles, QueryCommitChunkSize). It also maintains metrics for various DataFrame construction operations.
    • Construction (NewDataFrameBuilderFromTraceStore): Initializes a builder instance. An important configuration here is filterParentTraces. If enabled, the builder will attempt to remove redundant parent traces when child traces (more specific traces) exist. For example, if traces for test=foo,subtest=bar and test=foo both exist, the latter might be filtered out if filterParentTraces is true.
    • Fetching by Time Range and Query (NewFromQueryAndRange):
    • Why: This is a common use case where users want to see traces matching a specific query (e.g., config=8888) within a given time period.
    • How:
      1. It first uses dataframe.FromTimeRange (which internally queries perfgit.Git) to get a list of ColumnHeader (commit information) and CommitNumbers within the specified time range. It also handles downsampling if requested.
      2. It then determines the relevant tiles to query from the TraceStore based on the commit numbers (sliceOfTileNumbersFromCommits).
      3. The core data fetching happens in the new method. This method queries the TraceStore for matching traces per tile concurrently, using errgroup.Group for parallelism (a minimal sketch of this fan-out appears after this list). This is a key optimization to speed up data retrieval, especially over large time ranges spanning multiple tiles.
      4. A tracesetbuilder.TraceSetBuilder is used to efficiently aggregate the traces fetched from different tiles into a single types.TraceSet and paramtools.ParamSet.
      5. Finally, it constructs and returns a compressed DataFrame.

      NewFromQueryAndRange
        | -> dataframe.FromTimeRange (get commits in time range from Git)
        | -> sliceOfTileNumbersFromCommits (determine tiles to query)
        | -> new (concurrently query TraceStore for each tile)
        | -> TraceStore.QueryTraces (for each tile)
        | -> tracesetbuilder.Add (aggregate results)
        | -> tracesetbuilder.Build
        | -> DataFrame.Compress
    • Fetching by Keys and Time Range (NewFromKeysAndRange):
    • Why: Used when the specific trace keys are already known, and data for these keys is needed within a time range.
    • How: Similar to NewFromQueryAndRange in terms of getting commit information for the time range. However, instead of querying by a query.Query object, it directly calls TraceStore.ReadTraces for each relevant tile, providing the list of trace keys. Results are then aggregated. This is generally faster if the exact trace keys are known as it avoids the overhead of query parsing and matching within the TraceStore.
    • Fetching N Most Recent Data Points (NewNFromQuery, NewNFromKeys):
    • Why: Often, users are interested in the N most recent data points for a query or a set of keys, typically for displaying recent trends or for alert evaluation.
    • How: These methods work by iterating backward in time, tile by tile (or by QueryCommitChunkSize if configured), until N data points are collected for the matching traces.
      1. It starts from a given end time (or the latest commit if end is zero).
      2. It determines an initial beginIndex and endIndex for commit numbers. The QueryCommitChunkSize can influence this beginIndex to fetch a larger chunk of commits at once, potentially improving parallelism in the new method.
      3. In a loop:
         - It fetches commit headers and indices for the current beginIndex-endIndex range.
         - It calls the new method (for NewNFromQuery) or a similar tile-based fetching logic (for NewNFromKeys) to get a DataFrame for this smaller range.
         - It counts non-missing data points in the fetched DataFrame. If no data is found for maxEmptyTiles consecutive attempts, it stops to prevent searching indefinitely through sparse data.
         - It appends the data from the fetched DataFrame to the result DataFrame, working backward from the Nth slot.
         - It then adjusts beginIndex and endIndex to move to the previous chunk of commits/tiles.
      4. If filterParentTraces is enabled, it calls filterParentTraces to remove redundant parent traces from the final TraceSet.
      5. The resulting DataFrame might have traces of length less than N if not enough data points were found. It trims the traces if necessary.

      NewNFromQuery (or NewNFromKeys)
        | -> findIndexForTime (get commit number for 'end' time)
        | -> Loop (until N points are found or maxEmptyTiles reached):
        |      | -> fromIndexRange (get commits for current chunk)
        |      | -> new (or similar logic for keys) (fetch data for this chunk)
        |      | -> Aggregate data into result DataFrame
        |      | -> Update beginIndex/endIndex to previous chunk
        | -> [Optional] filterParentTraces
        | -> Trim traces if fewer than N points found
    • Preflighting Queries (PreflightQuery):
    • Why: Before executing a potentially expensive query to fetch a full DataFrame, it's useful to get an estimate of how many traces will match and what the resulting ParamSet will look like. This allows UIs to present filter options dynamically.
    • How:
      1. It fetches the latest tile number from the TraceStore.
      2. It queries the numPreflightTiles most recent tiles (concurrently) for trace IDs matching the query q. This uses getTraceIds, which first attempts to fetch from tracecache and falls back to TraceStore.QueryTracesIDOnly.
      3. The trace IDs (which are paramtools.Params) found are used to build up a ParamSet.
      4. The count of matching traces from the tile with the most matches is taken as the estimated count.
      5. Crucially, for parameter keys present in the input query q, it replaces the values in the computed ParamSet with all values for those keys from the referenceParamSet. This ensures that the UI can still offer all possible filter options for parameters the user has already started filtering on.
      6. The resulting ParamSet is normalized.

      PreflightQuery
        | -> TraceStore.GetLatestTile
        | -> Loop (for numPreflightTiles, concurrently):
        |      | -> getTraceIds(TileN, query)  // Checks tracecache first, then TraceStore.QueryTracesIDOnly
        |      |      | -> [If cache miss] TraceStore.QueryTracesIDOnly
        |      |      | -> [If cache miss & tracecache enabled] tracecache.CacheTraceIds
        |      | -> Aggregate Params into a new ParamSet -> Update max count
        | -> Update ParamSet with values from referenceParamSet for keys in the original query
        | -> Normalize ParamSet
    • Counting Matches (NumMatches):
    • Why: A simpler version of PreflightQuery that only returns the estimated number of matching traces.
    • How: It queries the two most recent tiles using TraceStore.QueryTracesIDOnly and returns the higher of the two counts.
    • Parent Trace Filtering (filterParentTraces function):
    • Why: To reduce data redundancy and present a cleaner set of traces to the user, especially in UIs where deeply nested subtests can create many similar-looking parent traces.
    • How: It uses tracefilter.NewTraceFilter(). For each trace key in the input TraceSet:
      1. The key is parsed into paramtools.Params.
      2. A “path” is constructed from the parameter values based on a predefined order of keys (e.g., “master”, “bot”, “benchmark”, “test”, “subtest_1”, ...).
      3. This path and the original trace key are added to the traceFilter.
      4. After processing all keys, traceFilter.GetLeafNodeTraceKeys() returns only the keys corresponding to the most specific (leaf) traces in the hierarchical structure implied by the paths.
      5. A new TraceSet is built containing only these leaf node traces.
    • Caching (getTraceIds, cacheTraceIdsIfNeeded):
    • Why: QueryTracesIDOnly can still be somewhat expensive if performed frequently on the same tiles and queries (e.g., during PreflightQuery). Caching the results (the list of matching trace IDs/params) can significantly speed this up.
    • How: The getTraceIds function first attempts to retrieve trace IDs from the tracecache.TraceCache. If there's a cache miss or the cache is not configured, it queries the TraceStore. If a database query was performed and the cache is configured, cacheTraceIdsIfNeeded is called to store the results in the cache for future requests. The cache key is typically a combination of the tile number and the query string.
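
The per-tile fan-out referenced above can be sketched as follows, using golang.org/x/sync/errgroup and a simplified stand-in for the per-tile TraceStore query. This is a sketch only: the real code aligns values against the merged commit header via tracesetbuilder rather than the naive append shown here.

    package dfbuildersketch

    import (
        "context"
        "sync"

        "golang.org/x/sync/errgroup"
    )

    // tileQuerier is a simplified stand-in for the per-tile query; the real code
    // calls TraceStore.QueryTraces for each tile.
    type tileQuerier func(ctx context.Context, tile int32) (map[string][]float32, error)

    // queryTilesConcurrently fans out one query per tile and merges the results,
    // loosely mirroring the errgroup-based fan-out in dfbuilder.
    func queryTilesConcurrently(ctx context.Context, tiles []int32, query tileQuerier) (map[string][]float32, error) {
        var mu sync.Mutex
        merged := map[string][]float32{}
        g, ctx := errgroup.WithContext(ctx)
        for _, tile := range tiles {
            tile := tile // capture the loop variable for the goroutine
            g.Go(func() error {
                traces, err := query(ctx, tile)
                if err != nil {
                    return err
                }
                mu.Lock()
                defer mu.Unlock()
                for key, vals := range traces {
                    merged[key] = append(merged[key], vals...)
                }
                return nil
            })
        }
        if err := g.Wait(); err != nil {
            return nil, err
        }
        return merged, nil
    }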

Design Choices and Trade-offs:

  • Tile-Based Processing: The TraceStore organizes data into tiles. Most dfbuilder operations that involve fetching data across a range of commits are designed to process these tiles concurrently. This improves performance by parallelizing I/O and computation.
  • tracesetbuilder: This utility is used to efficiently merge trace data coming from different tiles (which might have different sets of commits) into a coherent TraceSet and ParamSet.
  • QueryCommitChunkSize: This parameter in NewNFromQuery allows fetching data in larger chunks than a single tile. This can increase parallelism in the underlying new method call, but fetching too large a chunk might lead to excessive memory usage or longer latency for the first chunk.
  • maxEmptyTiles / newNMaxSearch: When searching backward for N data points, these constants prevent indefinite searching if the data is very sparse or the query matches very few traces.
  • singleTileQueryTimeout: This guards against queries on individual tiles taking too long, which could happen with “bad” tiles containing excessive data or due to backend issues. This is particularly important for operations like NewNFromQuery or PreflightQuery which might issue many such single-tile queries.
  • Caching for PreflightQuery: PreflightQuery is often called by UIs to populate filter options. Caching the results of QueryTracesIDOnly (which provides the raw data for ParamSet construction in preflight) via tracecache helps make these UI interactions faster.
  • Parent Trace Filtering: This is an opinionated feature that aims to improve usability by default. The specific heuristic for identifying “parent” vs. “child” traces is based on a predefined order of parameter keys.

The dfbuilder_test.go file provides comprehensive unit tests for these functionalities, covering various scenarios including empty queries, queries matching data in single or multiple tiles, N-point queries, and preflight operations with and without caching. It uses gittest for creating a mock Git history and sqltest (for Spanner) or mock implementations for the TraceStore and TraceCache.

Module: /go/dfiter

dfiter Module Documentation

Overview

The dfiter module is responsible for efficiently creating and providing dataframe.DataFrame objects, which are fundamental data structures used in regression detection within the Perf application. It acts as an iterator, allowing consuming code to process DataFrames one by one. This is particularly useful for performance reasons, as constructing and holding all possible DataFrames in memory simultaneously could be resource-intensive.

The core purpose of dfiter is to abstract away the complexities of fetching and structuring data from the underlying trace store and Git history. It ensures that DataFrames are generated with the correct dimensions and data points based on user-defined queries, commit ranges, and alert configurations.

Design and Implementation Choices

The dfiter module employs a “slicing” strategy for generating DataFrames. This means it typically fetches a larger, encompassing DataFrame from the dataframe.DataFrameBuilder and then yields smaller, overlapping sub-DataFrames.

Why this approach?

  • Efficiency: Fetching a larger chunk of data once from the database (via DataFrameBuilder) is often more efficient than making numerous small queries. The slicing operation itself is a relatively cheap in-memory operation.
  • Context for Regression Detection: Regression detection algorithms often need to look at data points before and after a specific commit (the “radius” of an alert). The slicing approach naturally provides this sliding window of context.

Key Components and Responsibilities:

  • DataFrameIterator Interface:
    • Why: Defines a standard contract for iterating over DataFrames. This promotes loose coupling, allowing different implementations of DataFrame generation if needed in the future, and simplifies how other parts of the system consume DataFrames.
    • How: It provides two methods:
    • Next() bool: Advances the iterator to the next DataFrame. Returns true if a next DataFrame is available, false otherwise.
    • Value(ctx context.Context) (*dataframe.DataFrame, error): Returns the current DataFrame.
  • dataframeSlicer struct:
    • Why: This is the concrete implementation of DataFrameIterator. It embodies the slicing strategy described above.
    • How: It holds a reference to a larger, source dataframe.DataFrame (df), the desired size of the sliced DataFrames (determined by alert.Radius), and the current offset for slicing. The Next() method checks if another slice of the specified size can be made, and Value() performs the actual slicing using df.Slice().
  • NewDataFrameIterator Function:
    • Why: This is the factory function for creating DataFrameIterator instances. It encapsulates the logic for determining how the initial, larger DataFrame should be fetched based on the input parameters.
    • How:
    • Parameter Parsing: Parses the input queryAsString into a query.Query object.
    • Mode Determination (Implicit): The function behaves differently based on domain.Offset:
      - domain.Offset == 0 (Continuous/Sliding Window Mode):
        - This mode is typically used for ongoing regression detection across a range of recent commits.
        - It fetches a DataFrame of domain.N commits ending at domain.End.
        - Settling Time: If anomalyConfig.SettlingTime is configured, it adjusts domain.End to exclude very recent data points that might not have “settled” (e.g., due to data ingestion delays or pending backfills). This prevents alerts on potentially incomplete or volatile fresh data.
        - The dataframeSlicer will then produce overlapping DataFrames of size 2*alert.Radius + 1.
      - domain.Offset != 0 (Specific Commit/Exact DataFrame Mode):
        - This mode is used when analyzing a specific commit or a small, fixed window around it (e.g., when a user clicks on a specific point in a chart to see its details or re-runs detection for a particular regression).
        - It aims to return a single DataFrame.
        - The size of this DataFrame is 2*alert.Radius + 1.
        - To determine the End time for fetching data, it calculates the commit alert.Radius positions after domain.Offset. This ensures the commit at domain.Offset is centered within the radius. For example, if domain.Offset is commit 21 and alert.Radius is 3, it will fetch data up to commit 24 (21 + 3). The resulting DataFrame will then contain commits [18, 19, 20, 21, 22, 23, 24]. This is a specific requirement to ensure consistency with how different step detection algorithms expect their input DataFrames.
    • Data Fetching: Uses the injected dataframe.DataFrameBuilder (dfBuilder) to construct the initial DataFrame (dfBuilder.NewNFromQuery). This involves querying the trace store and potentially Git history.
    • Data Sufficiency Check: Verifies if the fetched DataFrame contains enough data points (at least 2*alert.Radius + 1 commits). If not, it returns ErrInsufficientData. This is crucial because regression detection algorithms require a minimum amount of data to operate correctly.
    • Metrics: Records the number of floating-point values queried from the database using metrics2.GetCounter("perf_regression_detection_floats"). This helps in monitoring the data processing load.
    • Iterator Instantiation: Creates and returns a dataframeSlicer instance initialized with the fetched DataFrame and the calculated slice size.
  • ErrInsufficientData:
    • Why: A specific error type to indicate that while the queries were successful, the available data didn't meet the minimum requirements (e.g., not enough commits within the requested range or matching the query). This allows calling code to handle this scenario gracefully, perhaps by informing the user or adjusting parameters.
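
A minimal sketch of the slicing iterator, using a simplified frame type in place of dataframe.DataFrame, is shown below; the real Slice method also handles headers, ParamSets, and error returns.

    package dfitersketch

    // frame is a simplified stand-in for dataframe.DataFrame, keeping only what is
    // needed to show the slicing iterator.
    type frame struct {
        Header   []int64              // one entry per commit column
        TraceSet map[string][]float32 // values aligned with Header
    }

    // slice returns a view over columns [offset, offset+size), analogous to
    // DataFrame.Slice, which shares rather than copies the underlying data.
    func (f *frame) slice(offset, size int) *frame {
        out := &frame{
            Header:   f.Header[offset : offset+size],
            TraceSet: map[string][]float32{},
        }
        for key, values := range f.TraceSet {
            out.TraceSet[key] = values[offset : offset+size]
        }
        return out
    }

    // frameSlicer mirrors dataframeSlicer: it yields overlapping sub-frames of a
    // fixed size (2*radius + 1 in the real code) from one larger source frame.
    type frameSlicer struct {
        df     *frame
        size   int
        offset int
    }

    func (s *frameSlicer) Next() bool {
        return s.offset+s.size <= len(s.df.Header)
    }

    func (s *frameSlicer) Value() *frame {
        sub := s.df.slice(s.offset, s.size)
        s.offset++
        return sub
    }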

Key Workflows

1. Continuous Regression Detection (Sliding Window):

This typically happens when domain.Offset is 0.

[Caller]                                     [NewDataFrameIterator]                                      [DataFrameBuilder]
   | -- Request with query, domain (N, End), alert (Radius) --> |                                                            |
   |                                                            | -- Parse query                                             |
   |                                                            | -- (If anomalyConfig.SettlingTime > 0) Adjust domain.End --> |
   |                                                            | -- dfBuilder.NewNFromQuery(ctx, domain.End, q, domain.N) --> |
   |                                                            |                                                            | -- Query TraceStore
   |                                                            |                                                            | -- Build large DataFrame
   |                                                            |                                                            | <----- DataFrame (df)
   |                                                            | -- Check if len(df.Header) >= 2*Radius+1                   |
   |                                                            | -- (If insufficient) Return ErrInsufficientData ----------- |
   |                                                            | -- Create dataframeSlicer(df, size=2*Radius+1, offset=0)   |
   | <----------------- DataFrameIterator (slicer) ------------- |                                                            |

[Caller]                                     [dataframeSlicer]
   |                                                            |
   | -- it.Next() ---------------------------------------------> |
   |                                                            | -- return offset+size <= len(df.Header)
   | <------------------------------ true ---------------------- |
   | -- it.Value() --------------------------------------------> |
   |                                                            | -- subDf = df.Slice(offset, size)
   |                                                            | -- offset++
   | <-------------------------- subDf, nil -------------------- |
   | -- (Process subDf)                                         |
   | ... (loop Next()/Value() until Next() returns false) ...   |

2. Specific Commit Analysis (Exact DataFrame):

This typically happens when domain.Offset is non-zero.

[Caller]                                           [NewDataFrameIterator]                                    [Git]      [DataFrameBuilder]
   | -- Request with query, domain (Offset), alert (Radius) --> |                                                          |                   |
   |                                                          | -- Parse query                                           |                   |
   |                                                          | -- targetCommitNum = domain.Offset + alert.Radius        |                   |
   |                                                          | -- perfGit.CommitFromCommitNumber(targetCommitNum) ------> |                   |
   |                                                          |                                                          | -- Lookup commit  |
   |                                                          | <----------------------------- commitDetails, nil --------- |                   |
   |                                                          | -- dfBuilder.NewNFromQuery(ctx, commitDetails.Timestamp, |                   |
   |                                                          |                            q, n=2*Radius+1) ------------> |                   |
   |                                                          |                                                          |                   | -- Query TraceStore
   |                                                          |                                                          |                   | -- Build DataFrame (size 2*R+1)
   |                                                          | <-------------------------------------------------------- DataFrame (df) ----- |
   |                                                          | -- Check if len(df.Header) >= 2*Radius+1                 |                   |
   |                                                          | -- (If insufficient) Return ErrInsufficientData --------- |                   |
   |                                                          | -- Create dataframeSlicer(df, size=2*Radius+1, offset=0) |                   |
   | <----------------------- DataFrameIterator (slicer) ------ |                                                          |                   |

[Caller]                                     [dataframeSlicer]
   |                                                            |
   | -- it.Next() ---------------------------------------------> |
   |                                                            | -- return offset+size <= len(df.Header) (true for the first call)
   | <------------------------------ true ---------------------- |
   | -- it.Value() --------------------------------------------> |
   |                                                            | -- subDf = df.Slice(offset, size) (returns the whole df)
   |                                                            | -- offset++
   | <-------------------------- subDf, nil -------------------- |
   | -- (Process subDf)                                         |
   | -- it.Next() ---------------------------------------------> |
   |                                                            | -- return offset+size <= len(df.Header) (false for subsequent calls)
   | <------------------------------ false --------------------- |

This design allows for flexible and efficient generation of DataFrames tailored to the specific needs of regression detection, whether it's scanning a wide range of recent commits or focusing on a particular point in time. The use of an iterator pattern also helps manage memory consumption by processing DataFrames sequentially.

Module: /go/dryrun

Dryrun Module Documentation

Overview

The dryrun module provides the capability to test an alert configuration and preview the regressions it would identify without actually creating an alert or sending notifications. This is a crucial tool for developers and performance engineers to fine-tune alert parameters and ensure they accurately capture relevant performance changes.

The core idea is to simulate the regression detection process for a given alert configuration over a historical range of data. This allows users to iterate on alert definitions, observe the potential impact of those definitions, and avoid alert fatigue caused by poorly configured alerts.

Responsibilities and Key Components

The primary responsibility of the dryrun module is to handle HTTP requests for initiating and reporting the progress of these alert simulations.

Key Files and Components:

  • dryrun.go: This is the heart of the dryrun module. It defines the Requests struct, which manages the state and dependencies required for processing dry run requests. It also contains the HTTP handler (StartHandler) that orchestrates the dry run process.

    • Requests struct:

    • Why: Encapsulates all necessary dependencies (like perfgit.Git for Git interactions, shortcut.Store for shortcut lookups, dataframe.DataFrameBuilder for data retrieval, progress.Tracker for reporting progress, and regression.ParamsetProvider for accessing parameter sets) into a single unit. This promotes modularity and makes it easier to manage and test the dry run functionality.

    • How: It is instantiated via the New function, which takes these dependencies as arguments. This allows for dependency injection, making the component more testable and flexible.

    • StartHandler function:

    • Why: This is the entry point for initiating a dry run. It handles the incoming HTTP request, validates the alert configuration, and kicks off the asynchronous regression detection process.

    • How:

      1. It decodes the alert configuration from the HTTP request body.
      2. It performs initial validation on the alert query and other parameters. If validation fails, an error is immediately reported to the client.
      3. It uses a progress.Tracker to allow clients to monitor the status of the long-running dry run operation.
      4. Crucially, it launches the actual regression detection in a separate goroutine. This is essential because regression detection can be a time-consuming process, and blocking the HTTP handler would lead to timeouts and poor user experience.
      5. It defines a detectorResponseProcessor callback function. This function is invoked by the underlying regression.ProcessRegressions function whenever potential regressions are found.
        • Why (callback): This design decouples the core regression detection logic from the specifics of how dry run results are formatted and reported. It allows the regression module to focus on detection, while the dryrun module handles the presentation and progress updates for the dry run scenario.
        • How (callback): The callback processes the raw ClusterResponse objects from the regression detection, converts them into user-friendly RegressionAtCommit structures (which include commit details and the detected regression), and updates the Progress object with these results. This enables real-time feedback to the user as regressions are identified.
      6. The regression.ProcessRegressions function is then called in the goroutine, passing the alert request, the callback, and other necessary dependencies. This function iterates through the relevant data, applies the alert's clustering and detection logic, and invokes the callback for each identified cluster.
      7. The handler immediately responds to the client with the initial Progress object, allowing the client to start polling for updates.
    • RegressionAtCommit struct:

    • Why: Provides a structured way to represent a regression found at a specific commit. This includes both the commit information (CID) and the details of the regression itself (Regression).

    • How: It's a simple struct used for marshalling the results into JSON for the client.
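
The asynchronous pattern described above (validate, launch detection in a goroutine, stream findings into a progress object via a callback) can be sketched as follows; the progress tracker and the detection function are simplified stand-ins for progress.Progress and regression.ProcessRegressions.

    package dryrunsketch

    import (
        "context"
        "sync"
    )

    // progressTracker is a simplified stand-in for the progress object that clients poll.
    type progressTracker struct {
        mu       sync.Mutex
        findings []string
        done     bool
    }

    func (p *progressTracker) add(finding string) {
        p.mu.Lock()
        defer p.mu.Unlock()
        p.findings = append(p.findings, finding)
    }

    func (p *progressTracker) finish() {
        p.mu.Lock()
        defer p.mu.Unlock()
        p.done = true
    }

    // detectFn stands in for the detection routine: it reports each finding through
    // a callback instead of returning everything at once.
    type detectFn func(ctx context.Context, onFinding func(finding string)) error

    // startDryRun launches detection in a goroutine and returns immediately with the
    // progress object, mirroring how StartHandler avoids blocking the HTTP request.
    func startDryRun(ctx context.Context, detect detectFn) *progressTracker {
        progress := &progressTracker{}
        go func() {
            defer progress.finish()
            _ = detect(ctx, progress.add)
        }()
        return progress
    }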

Workflows

Dry Run Initiation and Processing:

Client (UI/API) --HTTP POST /dryrun/start with AlertConfig--> Requests.StartHandler
                                                                    |
                                                                    V
                                                        [Validate AlertConfig]
                                                                    |
                                 +----------------------------------+----------------------------------+
                                 | (Validation Fails)                                                | (Validation Succeeds)
                                 V                                                                   V
                        [Update Progress with Error]                                     [Add to Progress Tracker]
                                 |                                                                   |
                                 V                                                                   V
                  Respond to Client with Error Progress             Launch Goroutine: regression.ProcessRegressions(...)
                                                                    |
                                                                    V
                                                  [Iterate through data, detect regressions]
                                                                    |
                                                                    V
                                             For each potential regression cluster:
                                             Invoke `detectorResponseProcessor` callback
                                                                    |
                                                                    V
                                              Callback: [Convert ClusterResponse to RegressionAtCommit]
                                                        [Update Progress with new RegressionAtCommit]
                                                                    |
                                                                    V
                                                      (Client polls for Progress updates)
                                                                    |
                                                                    V
                                                   When ProcessRegressions completes:
                                                   [Update Progress: Finished or Error]

The StartHandler effectively acts as a controller that receives the request, performs initial setup and validation, and then delegates the heavy lifting of regression detection to the regression.ProcessRegressions function, ensuring the HTTP request can return quickly while the background processing continues. The callback mechanism allows the dryrun module to react to findings from the regression module in a way that's specific to the dry run use case (i.e., accumulating and formatting results for client display).

Module: /go/favorites

Favorites Module

The favorites module provides functionality for users to save and manage “favorite” configurations or views within the Perf application. This allows users to quickly return to specific data explorations or commonly used settings.

The core design philosophy is to provide a persistent storage mechanism for user-specific preferences related to application state (represented as URLs). This is achieved through a Store interface, which abstracts the underlying data storage, and a concrete SQL-based implementation.

Key Components and Responsibilities

  • store.go: This file defines the central Store interface.

    • Why: The interface decouples the business logic of managing favorites from the specific database implementation. This promotes testability (using mocks) and allows for potential future changes to the storage backend without impacting the core application logic.
    • How: It specifies the fundamental CRUD (Create, Read, Update, Delete) operations for favorites, along with a List operation to retrieve all favorites for a specific user and a Liveness check.
    • Favorite: This struct represents a single favorite item, containing fields like ID, UserId, Name, Url, Description, and LastModified. The Url is a key piece of data, as it allows the application to reconstruct the state the user wants to save.
    • SaveRequest: This struct is used for creating and updating favorites, encapsulating the data needed for these operations, notably excluding the ID (which is generated or already known) and LastModified (which is handled by the store).
    • Liveness: The Liveness method is a bit of an outlier. It checks the health of the database connection. It was placed in this store somewhat arbitrarily: because the favorites store is not essential in the way other, more critical stores are, it is a relatively safe place to perform this check without impacting core performance data operations.
  • sqlfavoritestore/sqlfavoritestore.go: This file provides the SQL implementation of the Store interface.

    • Why: CockroachDB (or a similar SQL database) is used as the persistent storage for favorites. This choice provides a robust, scalable, and transactional way to manage this data.
    • How:
    • It defines SQL statements for each operation in the Store interface. These statements interact with a Favorites table.
    • The FavoriteStore struct holds a database connection pool (pool.Pool).
    • Methods like Get, Create, Update, Delete, and List execute their corresponding SQL statements against the database.
    • Timestamps (LastModified) are handled automatically during create and update operations to track when a favorite was last changed.
    • Error handling is done using skerr.Wrapf to provide context to any database errors.
  • sqlfavoritestore/schema/schema.go: This file defines the SQL schema for the Favorites table.

    • Why: It provides a structured, Go-based representation of the database table. This can be useful for schema management, migrations, and ORM-like interactions (though a full ORM isn't used here).
    • How: The FavoriteSchema struct uses struct tags (sql:"...") to define column names, types, constraints (like PRIMARY KEY, NOT NULL), and indexes. The byUserIdIndex is crucial for efficiently listing favorites for a specific user.
  • mocks/Store.go: This file contains a generated mock implementation of the Store interface.

    • Why: Mocks are essential for unit testing components that depend on the Store interface. They allow tests to simulate different store behaviors (e.g., successful operations, errors) without requiring an actual database connection.
    • How: This file is auto-generated by the mockery tool. It provides a Store struct that embeds mock.Mock from the testify library. Each method of the interface has a corresponding mock function that can be configured to return specific values or errors.
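
For reference, the shape of the store API described above is approximately as follows. This is a sketch with simplified types; the exact field types and method signatures in store.go may differ.

    package favoritessketch

    import "context"

    // Favorite mirrors the fields described above; the LastModified type is an assumption.
    type Favorite struct {
        ID           string
        UserId       string
        Name         string
        Url          string
        Description  string
        LastModified int64 // Unix timestamp
    }

    // SaveRequest carries the user-supplied fields for Create/Update; ID and
    // LastModified are managed by the store itself.
    type SaveRequest struct {
        UserId      string
        Name        string
        Url         string
        Description string
    }

    // Store approximates the interface in store.go.
    type Store interface {
        Get(ctx context.Context, id string) (*Favorite, error)
        Create(ctx context.Context, req *SaveRequest) error
        Update(ctx context.Context, req *SaveRequest, id string) error
        Delete(ctx context.Context, id string) error
        List(ctx context.Context, userID string) ([]*Favorite, error)
        Liveness(ctx context.Context) error
    }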

Key Workflows

1. Creating a New Favorite:

User Action (e.g., clicks "Save as Favorite" in UI)
    |
    V
Application Handler
    |
    V
[favorites.Store.Create] is called with user ID, name, URL, description
    |
    V
[sqlfavoritestore.FavoriteStore.Create]
    |
    V
Generates current timestamp for LastModified
    |
    V
Executes INSERT SQL statement:
  INSERT INTO Favorites (user_id, name, url, description, last_modified) VALUES (...)
    |
    V
Database stores the new favorite record
    |
    V
Returns success/error to Application Handler

2. Listing User's Favorites:

User navigates to "My Favorites" page
    |
    V
Application Handler
    |
    V
[favorites.Store.List] is called with the current user's ID
    |
    V
[sqlfavoritestore.FavoriteStore.List]
    |
    V
Executes SELECT SQL statement:
  SELECT id, name, url, description FROM Favorites WHERE user_id=$1
    |
    V
Database returns rows matching the user ID
    |
    V
[sqlfavoritestore.FavoriteStore.List] scans rows into []*favorites.Favorite
    |
    V
Returns list of favorites to Application Handler
    |
    V
UI displays the list

3. Retrieving a Specific Favorite (e.g., when a user clicks on a favorite to load it):

User clicks on a specific favorite in their list
    |
    V
Application Handler (obtains favorite ID)
    |
    V
[favorites.Store.Get] is called with the favorite ID
    |
    V
[sqlfavoritestore.FavoriteStore.Get]
    |
    V
Executes SELECT SQL statement:
  SELECT id, user_id, name, url, description, last_modified FROM Favorites WHERE id=$1
    |
    V
Database returns the single matching favorite row
    |
    V
[sqlfavoritestore.FavoriteStore.Get] scans row into a *favorites.Favorite struct
    |
    V
Returns the favorite object to Application Handler
    |
    V
Application uses the `Url` from the favorite object to restore the application state

Module: /go/file

The file module and its submodules are responsible for providing a unified interface for accessing files from different sources, such as local directories or Google Cloud Storage (GCS). This abstraction allows the Perf ingestion system to treat files consistently regardless of their origin.

Core Concepts

The central idea is to define a file.Source interface that abstracts the origin of files. Implementations of this interface are then responsible for monitoring their respective sources (e.g., a GCS bucket via Pub/Sub notifications, or a local directory) and emitting file.File objects through a channel when new files become available.

The file.File struct encapsulates the essential information about a file: its name, an io.ReadCloser for its contents, its creation timestamp, and optionally, the associated pubsub.Message if the file originated from a GCS Pub/Sub notification. This optional field is crucial for acknowledging the message after successful processing, or nack'ing it if an error occurs, ensuring reliable message handling in a distributed system.

file.go

This file defines the core File struct and the Source interface.

  • File struct: Represents a single file.

    • Name: The identifier for the file (e.g., gs://bucket/object or a local path).
    • Contents: An io.ReadCloser to read the file's content. This design allows for streaming file data, which is memory-efficient, especially for large files. The consumer is responsible for closing this reader.
    • Created: The timestamp when the file was created or last modified (depending on the source).
    • PubSubMsg: A pointer to a pubsub.Message. This is populated if the file notification came from a Pub/Sub message (e.g., GCS object change notifications). It's used to Ack or Nack the message, indicating successful processing or a desire to retry/dead-letter.
  • Source interface: Defines the contract for file sources.

    • Start(ctx context.Context) (<-chan File, error): This method initiates the process of watching for new files. It returns a read-only channel (<-chan File) through which File objects are sent as they are discovered. The method is designed to be called only once per Source instance. This design ensures that the resource setup and monitoring logic (like starting a Pub/Sub subscription listener or initiating a directory walk) is done once.
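
The core types described here look roughly like the sketch below, with the fields and the single-call Start contract described above; it is a simplified approximation rather than the actual definitions.

    package filesketch

    import (
        "context"
        "io"
        "time"

        "cloud.google.com/go/pubsub"
    )

    // File approximates the file.File struct described above.
    type File struct {
        Name      string
        Contents  io.ReadCloser   // the consumer is responsible for closing this
        Created   time.Time
        PubSubMsg *pubsub.Message // nil unless the file arrived via a Pub/Sub notification
    }

    // Source approximates the file.Source contract: Start is called once and returns
    // a channel that emits files as they are discovered.
    type Source interface {
        Start(ctx context.Context) (<-chan File, error)
    }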

Implementations of file.Source

dirsource

The dirsource submodule provides an implementation of file.Source that reads files from a local filesystem directory.

  • Purpose: Primarily intended for testing and demonstration purposes. It allows developers to simulate file ingestion locally without needing to set up GCS or Pub/Sub.
  • Mechanism:
    • New(dir string): Constructs a DirSource for a given directory path. It resolves the path to an absolute path.
    • Start(_ context.Context): When called, it initiates a filepath.Walk over the specified directory.
    • For each regular file encountered, it opens the file and creates a file.File object.
    • The ModTime of the file is used as the Created timestamp, which is a known simplification for its intended use cases.
    • The file.File objects are sent to an unbuffered channel.
    • The channel is closed after the directory walk is complete.
  • Limitations:
    • It performs a one-time walk of the directory. It does not watch for new files or changes to existing files after the initial walk.
    • It uses the file's modification time as the creation time.
  • Workflow:

    New(directory) -> DirSource instance
        |
        V
    DirSource.Start() --> Goroutine starts
        |
        V
    filepath.Walk(directory)
        |
        +------------------------------+
        |                              |
        V                              V
    For each file:                 For each directory:
      os.Open(path)                  (skip)
      Create file.File{Name, Contents, ModTime}
      Send file.File to channel
        |
        V
    Caller receives file.File from channel
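
A typical consumer of any file.Source implementation (DirSource included) drains the returned channel, reads each file, and closes its Contents. The sketch below shows that loop with a pared-down local file type; Ack/Nack handling needed for Pub/Sub-backed sources is intentionally omitted.

    package consumersketch

    import (
        "context"
        "io"
        "log"
    )

    // fileLike is a pared-down local stand-in for file.File.
    type fileLike struct {
        Name     string
        Contents io.ReadCloser
    }

    // consume drains a Source-style channel until it is closed or the context is
    // cancelled, reading and then closing each file's contents.
    func consume(ctx context.Context, files <-chan fileLike) {
        for {
            select {
            case <-ctx.Done():
                return
            case f, ok := <-files:
                if !ok {
                    return
                }
                data, err := io.ReadAll(f.Contents)
                if err != nil {
                    log.Printf("reading %s: %v", f.Name, err)
                } else {
                    log.Printf("ingested %s (%d bytes)", f.Name, len(data))
                }
                _ = f.Contents.Close()
            }
        }
    }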

gcssource

The gcssource submodule implements file.Source for files stored in Google Cloud Storage, using Pub/Sub notifications for new file events.

  • Purpose: This is the production-grade implementation for ingesting files from GCS. It's designed to be robust and scalable.
  • Mechanism:
    • New(ctx context.Context, instanceConfig *config.InstanceConfig):
    • Initializes GCS and Pub/Sub clients using default application credentials.
    • Constructs a Pub/Sub subscription name. It can either use a pre-configured Subscription name from instanceConfig or generate one based on the Topic (often adding a suffix like -prod or using a round-robin scheme for load distribution if multiple ingester instances are running).
    • Creates a sub.Subscription object to manage receiving messages from the configured Pub/Sub topic/subscription. A key configuration here is ReceiveSettings.MaxExtension = -1. This disables automatic ack deadline extension by the Pub/Sub client library. The rationale is that the gcssource itself will explicitly Ack or Nack messages. If automatic extension were enabled and the processing of a file took longer than the extension period, the message might be redelivered while still being processed, leading to duplicate processing or other issues. By disabling it, the ingester has full control over the message lifecycle.
    • Initializes a filter.Filter based on AcceptIfNameMatches and RejectIfNameMatches regular expressions provided in the instanceConfig. This allows for fine-grained control over which files are processed based on their GCS object names.
    • Determines if dead-lettering is enabled based on the instance configuration.
    • Start(ctx context.Context):
    • Creates the output channel for file.File objects.
    • Launches a goroutine that continuously calls s.subscription.Receive(ctx, s.receiveSingleEventWrapper).
      • The Receive method blocks until a message is available or the context is cancelled.
      • receiveSingleEventWrapper is called for each Pub/Sub message.
  • File Event Processing (receiveSingleEventWrapper and receiveSingleEvent):
    1. Deserialize Event: The Pub/Sub message Data is expected to be a JSON payload describing a GCS object event (specifically, {"bucket": "...", "name": "..."}).
    2. Filename Construction: A gs:// URI is constructed from the bucket and name.
    3. Filename Filtering: The filter.Filter (configured with regexes) is applied. If the filename is rejected, the message is acked (as there's no point retrying), and processing stops for this event.
    4. Source Prefix Check: The filename is checked against the Sources list in instanceConfig.IngestionConfig.SourceConfig.Sources. These are typically gs:// prefixes. If the filename doesn't match any of these prefixes, it's considered an unexpected file, the message is acked, and processing stops. This ensures that the ingester only processes files from explicitly configured GCS locations.
    5. Fetch GCS Object Attributes: obj.Attrs(ctx) is called to get metadata like the creation time. If this fails (e.g., object deleted between notification and processing, or transient network error), the message is nacked (if dead-lettering is not enabled) or handled by the dead-letter policy, as retrying might succeed.
    6. Stream GCS Object Contents: obj.NewReader(ctx) is called to get an io.ReadCloser for the file's content. If this fails, the message is nacked (or dead-lettered).
    7. Send file.File: A file.File struct is created with the GCS path, the reader, the attrs.Created time, and the original pubsub.Message. This file.File is sent to the fileChannel.
    8. Message Acknowledgement:
      • The receiveSingleEvent function returns true if the initial stages of processing (up to sending to the channel) were successful and the message should be acked from Pub/Sub's perspective (meaning it was valid, filtered appropriately, and the object was accessible). It returns false for transient errors where a retry might help (e.g., failing to get object attributes or reader).
      • The receiveSingleEventWrapper then uses this boolean:
      • If dead-lettering is enabled (s.deadLetterEnabled):
        • If receiveSingleEvent returned false (transient error or should retry), the message is Nack()-ed. This typically sends it to a dead-letter topic if configured, or allows Pub/Sub to redeliver it after a backoff.
        • If receiveSingleEvent returned true, the message is not explicitly Ack()-ed here. The acknowledgement is deferred to the consumer of the file.File (i.e., the ingester). This is a critical design choice: the message is only truly “done” when the file content has been fully processed by the downstream system.
      • If dead-lettering is not enabled:
        • If receiveSingleEvent returned true, the message is Ack()-ed.
        • If receiveSingleEvent returned false, the message is Nack()-ed.
  • Key Design Choices:
    • Decoupling from Pub/Sub Ack/Nack: The gcssource itself doesn't always immediately Ack messages upon successful GCS interaction. Instead, it passes the *pubsub.Message along in the file.File struct. This allows the ultimate consumer of the file's content (e.g., the Perf ingestion pipeline) to Ack the message only after it has successfully processed and stored the data. This provides end-to-end processing guarantees. If processing fails downstream, the message can be Nack-ed, leading to a retry or dead-lettering.
    • Filtering: Multiple layers of filtering (regex-based filter.Filter and prefix-based SourceConfig.Sources) ensure that only desired files are processed.
    • Error Handling: Distinguishes between errors that warrant an Ack (e.g., file explicitly filtered out) and those that warrant a Nack (e.g., transient GCS errors), especially when dead-letter queues are in use.
    • Scalability: Uses a configurable number of parallel receivers (maxParallelReceives) for Pub/Sub messages, although currently set to 1. This can be tuned for performance.
  • Workflow (Simplified):

    New(config) -> GCSSource instance (GCS/PubSub clients, filter initialized)
        |
        V
    GCSSource.Start() --> Goroutine starts PubSub subscription.Receive loop
        |
        V
    PubSub message arrives
        |
        V
    receiveSingleEventWrapper(msg)
        |
        V
    receiveSingleEvent(msg)
        |
        +-> Deserialize msg data (JSON: bucket, name) -> Error? Ack, return.
        |
        +-> Filter filename (regex) -> Rejected? Ack, return.
        |
        +-> Check if filename matches config.Sources prefixes -> No match? Ack, return.
        |
        +-> GCS: storageClient.Object(bucket, name).Attrs() -> Error? Nack (retryable), return.
        |
        +-> GCS: object.NewReader() -> Error? Nack (retryable), return.
        |
        V
    Create file.File{Name, Contents, Created, PubSubMsg: msg}
    Send file.File to fileChannel
        |
        V
    Caller receives file.File from channel
    (Caller later Acks/Nacks msg via file.File.PubSubMsg)

This modular approach to file sourcing makes the Perf ingestion system flexible and easier to test and maintain. New file sources can be added by simply implementing the file.Source interface.
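
To illustrate the deferred-ack design from the consumer side, a sketch of a loop that drains a file.Source is shown below. The processFile callback is hypothetical (the real downstream logic lives in /go/ingest/process), and the import path is assumed:

    import (
        "context"

        "go.skia.org/infra/perf/go/file"
    )

    // consume drains a file.Source, processing each file and then
    // acknowledging or nack'ing the originating Pub/Sub message, if any.
    func consume(ctx context.Context, src file.Source, processFile func(file.File) error) error {
        ch, err := src.Start(ctx)
        if err != nil {
            return err
        }
        for f := range ch {
            err := processFile(f) // Hypothetical downstream processing.
            _ = f.Contents.Close()
            if f.PubSubMsg == nil {
                continue // e.g. dirsource: nothing to acknowledge.
            }
            if err != nil {
                f.PubSubMsg.Nack() // Allow redelivery or dead-lettering.
            } else {
                f.PubSubMsg.Ack() // Only ack once processing has fully succeeded.
            }
        }
        return nil
    }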

Module: /go/filestore

The filestore module provides an abstraction layer for interacting with different file storage systems. It defines a common interface, leveraging Go's io/fs.FS, allowing the application to read files regardless of whether they are stored locally or in a cloud storage service like Google Cloud Storage (GCS). This design promotes flexibility and testability by decoupling file access logic from the specific storage implementation.

The primary goal is to enable Perf, the performance monitoring system, to seamlessly access data files from various sources. Perf often deals with large datasets and trace files, which might be stored in GCS for scalability and durability or locally during development and testing. By using this module, Perf components can be written to consume data using the standard fs.FS interface without needing to know the underlying storage details.

Key components:

  • local: This submodule provides an implementation of fs.FS for the local file system.

    • Why: It's essential for local development, testing, and scenarios where data is directly available on the machine running Perf.
    • How: The local.New(rootDir string) function initializes a filesystem struct. This struct stores the absolute path to a rootDir and uses os.DirFS(rootPath) to create an fs.FS instance scoped to that directory. When Open(name string) is called, it calculates the path relative to rootDir and then uses the underlying os.DirFS to open the file. This ensures that file access is contained within the specified root directory.
    • The local.go file contains the filesystem struct and its methods. The core logic resides in the New function for initialization and the Open method for file access. filepath.Abs and filepath.Rel are used to correctly handle and relativize paths.
  • gcs: This submodule implements fs.FS for Google Cloud Storage.

    • Why: GCS is a common choice for storing large amounts of data in a scalable and accessible manner. Perf relies on GCS for storing trace files and other performance artifacts.
    • How: The gcs.New(ctx context.Context) function initializes a filesystem struct. It authenticates with GCS using google.DefaultTokenSource to obtain an OAuth2 token source and then creates a *storage.Client. The Open(name string) method expects a GCS URI (e.g., gs://bucket-name/path/to/file). It parses this URI into a bucket name and object path using parseNameIntoBucketAndPath. Then, it uses the storage.Client to get a *storage.Reader for the specified object. This reader is wrapped in a custom file struct which implements fs.File.
    • The gcs.go file defines the filesystem struct, which holds the *storage.Client, and the file struct, which wraps *storage.Reader. The New function handles GCS client initialization and authentication. The Open method is responsible for parsing GCS URIs and obtaining a reader for the object. Notably, the Stat() method for gcs.file is intentionally not implemented (returns ErrNotImplemented) because Perf's current usage patterns do not require it, simplifying the implementation. The parseNameIntoBucketAndPath helper function is crucial for translating the GCS URI format into the bucket and object path components required by the GCS client library.

Workflow: Opening a File (Conceptual)

The client code (e.g., a component within Perf) would typically decide which filestore implementation to use based on configuration or the nature of the file path.

  1. Initialization:

    • For local files: fsImpl, err := local.New("/path/to/data/root")
    • For GCS files: fsImpl, err := gcs.New(context.Background())
  2. File Access:

    • The client calls file, err := fsImpl.Open("relative/path/to/file.json") (for local) or file, err := fsImpl.Open("gs://my-bucket/data/some_trace.json") (for GCS).

  3. Behind the Scenes:

    • Local:

      local.Open("relative/path/to/file.json")
          |
          V
      Calculates absolute path based on rootDir
          |
          V
      Calls os.DirFS(rootDir).Open("relative/path/to/file.json")
          |
          V
      Returns fs.File (os.File)

    • GCS:

      gcs.Open("gs://my-bucket/data/some_trace.json")
          |
          V
      parseNameIntoBucketAndPath("gs://my-bucket/data/some_trace.json") --> "my-bucket", "data/some_trace.json"
          |
          V
      gcsClient.Bucket("my-bucket").Object("data/some_trace.json").NewReader()
          |
          V
      Wraps storage.Reader in gcs.file
          |
          V
      Returns fs.File (gcs.file)

  4. Reading Data:

    • The client can then use the returned fs.File (e.g., file.Read(buffer)) in a standard way, irrespective of whether it's an os.File or a gcs.file wrapping a storage.Reader.

This abstraction allows Perf to be agnostic to the underlying storage mechanism when reading files, simplifying its data processing pipelines.
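
For example, a consumer that only needs to read bytes can be written purely against fs.FS, with the choice of backend made at initialization time. The readAll helper below is illustrative, not part of the module:

    import (
        "io"
        "io/fs"
    )

    // readAll opens a file by name on whichever backend fsImpl wraps
    // (a local directory or a gs:// URI for GCS) and returns its contents.
    func readAll(fsImpl fs.FS, name string) ([]byte, error) {
        f, err := fsImpl.Open(name)
        if err != nil {
            return nil, err
        }
        defer f.Close()
        return io.ReadAll(f)
    }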

Module: /go/frontend

The frontend module serves as the backbone for the Perf web UI. It's responsible for handling HTTP requests, rendering HTML templates, and interacting with various backend services and data stores to provide a comprehensive performance analysis platform.

The design philosophy emphasizes a separation of concerns. The core frontend.go file initializes and wires together various components, while the api subdirectory houses specific handlers for different categories of user interactions (e.g., alerts, graphs, regressions). This modular approach simplifies development, testing, and maintenance.

Key Components and Responsibilities:

  • frontend.go:

    • Initialization (New, initialize): This is the entry point. It sets up logging, metrics, reads configuration (config.Config), initializes database connections (TraceStore, AlertStore, RegressionStore, etc.), and establishes connections to external services like Git and potentially Chrome Perf.
    • Template Handling (loadTemplates, templateHandler): It loads HTML templates from the dist directory (produced by the build system). These templates are Go templates, allowing for dynamic data injection. Snippets for Google Analytics (googleanalytics.html) and cookie consent (cookieconsent.html) are embedded and can be included in the rendered pages.
    • Page Context (getPageContext): This crucial function generates a JavaScript object (window.perf) that is embedded in every HTML page. This object contains configuration values and settings that the client-side JavaScript needs to function correctly, such as API URLs, display preferences, and feature flags. This avoids hardcoding such values in the JavaScript and allows for easier configuration.
    • Routing (GetHandler, getFrontendApis): It defines the HTTP routes and associates them with their respective handler functions. This is where chi router is configured. It also instantiates and registers all the API handlers from the api sub-module.
    • Authentication and Authorization (loginProvider, RoleEnforcedHandler): It integrates with an authentication system (e.g., proxylogin) to determine user identity and roles. RoleEnforcedHandler is a middleware to protect certain endpoints based on user roles.
    • Long-Running Task Management (progressTracker): For operations that might take a significant amount of time (e.g., generating complex data frames for graphs, running regression detection), it uses a progress.Tracker. This allows the frontend to initiate a task, return an ID to the client, and let the client poll for status and results, preventing HTTP timeouts for long operations.
    • Workflow Example (Frame Request):
      1. Client POSTs to /_/frame/start with query details.
      2. frameStartHandler creates a progress object, adds it to progressTracker.
      3. A goroutine is launched to process the frame request using frame.ProcessFrameRequest.
      4. frameStartHandler immediately returns the progress object's ID.
      5. Client polls /_/status/{id}.
      6. Client fetches results from /_/frame/results/{id} (managed by progressTracker) once finished.
    • Redirections (gotoHandler, old URL handlers): Handles redirects for old URLs to new ones and provides a /g/ endpoint to navigate to specific views based on a Git hash.
    • Liveness Probe (liveness): Provides a /liveness endpoint that checks the health of critical dependencies (like the database connection) for Kubernetes.
  • api (subdirectory): This directory contains the specific HTTP handlers for various features of Perf. Each API is typically encapsulated in its own file (e.g., alertsApi.go, graphApi.go) and implements the FrontendApi interface, primarily its RegisterHandlers method. This design promotes modularity.

    • alertsApi.go: Manages CRUD operations for alert configurations (alerts.Alert). It interacts with alerts.ConfigProvider (for fetching configurations, potentially cached) and alerts.Store (for persistence). It also handles trying out bug filing and notification sending for alerts. Includes endpoints to list subscriptions and manage dry-run requests for alert configurations.
    • anomaliesApi.go: Provides endpoints for fetching anomaly data. It has two modes of operation:
    • Legacy (Chromeperf-backed): Proxies requests to an external Chromeperf instance for sheriff lists, anomaly lists, and group reports. This was likely an initial integration or for instances that rely on Chromeperf's anomaly detection. The test name cleaning logic (cleanTestName) addresses potential incompatibilities in test naming conventions or characters between systems.
    • Skia-internal: Fetches sheriff (subscription) lists and associated alerts directly from the instance's own database (subscription.Store, alerts.Store). This allows Perf instances to manage their own anomaly data.
    • favoritesApi.go: Manages user-specific and instance-wide favorite links. User favorites are stored in favorites.Store, while instance-wide favorites can be defined in the main configuration file (config.Config.Favorites). It provides endpoints to list, create, delete, and update favorites.
    • graphApi.go: Handles requests related to plotting graphs.
    • Frame Requests (frameStartHandler): As described above, this initiates the potentially long process of fetching trace data and constructing a dataframe.DataFrame. It uses dfbuilder.DataFrameBuilder for this.
    • Commit Information (cidHandler, cidRangeHandler, shiftHandler): Provides details about specific commits or ranges of commits by interacting with perfgit.Git.
    • Trace Details (detailsHandler, linksHandler): Fetches raw data or metadata for a specific trace point at a particular commit. This involves reading from tracestore.TraceStore and potentially the ingestedFS (filesystem where raw ingested data is stored) to get information like associated benchmark links from the original JSON files.
    • pinpointApi.go: Facilitates interaction with the Pinpoint bisection service. It allows users to create bisection jobs (to identify the commit that caused a performance regression) or try jobs (to test a patch). It can proxy requests to a legacy Pinpoint service or a newer backend service.
    • queryApi.go: Supports the query construction UI.
    • Parameter Set (initpageHandler, getParamSet): Provides the initial set of queryable parameters (keys and their possible values) to populate the UI. This uses psrefresh.ParamSetRefresher which periodically updates this canonical paramset based on recent data, ensuring the UI reflects available data.
    • Query Preflighting/Counting (countHandler, nextParamListHandler): As the user builds a query in the UI, these handlers can estimate the number of matching traces or provide the next relevant parameter values based on the current partial query. This gives users immediate feedback. The nextParamListHandler is tailored for UIs where parameter selection is ordered (e.g., Chromeperf's UI).
    • regressionsApi.go: Deals with detected regressions.
    • Listing/Counting Regressions (regressionRangeHandler, regressionCountHandler, alertsHandler, regressionsHandler): Fetches regression data from regression.Store based on time ranges, alert configurations, or subscriptions. It can filter by user ownership or category.
    • Triage (triageHandler): Allows users (editors) to mark regressions as triaged (e.g., “positive”, “negative”, “ignored”) and associate them with bug reports. If a regression is marked as negative, it can generate a bug report URL using a configurable template.
    • Manual Clustering (clusterStartHandler): Allows users to initiate the regression detection process for a specific query or set of parameters. This is also a long-running operation managed by progressTracker.
    • Anomaly/Group Redirection (anomalyHandler, alertGroupQueryHandler): Provides redirect URLs to the appropriate graph view for a given anomaly ID or alert group ID from Chromeperf. This involves generating graph shortcuts.
    • sheriffConfigApi.go: Handles interactions related to LUCI Config for sheriff configurations.
    • Metadata (getMetadataHandler): Provides metadata to LUCI Config, indicating which configuration files (e.g., skia-sheriff-configs.cfg) Perf owns and the URL for validating changes to these files. This is part of an automated config management system.
    • Validation (validateConfigHandler): Receives configuration content from LUCI Config and validates it (e.g., using sheriffconfig.ValidateContent). Returns success or a structured error message.
    • shortcutsApi.go: Manages the creation and retrieval of shortcuts.
    • Key Shortcuts (keysHandler): Allows storing a set of trace keys (queries) and getting a short ID for them. This is used, for example, by the “Share” button on the explore page.
    • Graph Shortcuts (getGraphsShortcutHandler, createGraphsShortcutHandler): Manages shortcuts for more complex graph configurations, which can include multiple queries and formulas. These are used for sharing multi-graph views.
    • triageApi.go: Provides endpoints for triaging anomalies, specifically those originating from or managed by Chromeperf. This includes filing new bugs, associating anomalies with existing bugs, and performing actions like ignoring or nudging anomalies. It interacts with chromeperf.ChromePerfClient and potentially an issuetracker.IssueTracker implementation.
    • userIssueApi.go: Manages user-reported issues (Buganizer annotations) associated with specific data points (a trace at a commit). This allows users to link external bug reports directly to performance data points in the UI. It uses userissue.Store for persistence.

The overall goal of the frontend module is to provide a responsive and informative user interface by efficiently querying and presenting performance data, while also enabling users to configure alerts, triage regressions, and collaborate on performance analysis. The interaction with various stores and services is abstracted to keep the request handling logic focused.
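
To make the FrontendApi pattern described above concrete, the sketch below shows how an API handler might register its endpoints on the chi router. The type name, handler names, URL paths, and the exact RegisterHandlers signature are illustrative assumptions, not the actual code:

    import (
        "net/http"

        "github.com/go-chi/chi/v5"
    )

    // alertsAPI is a hypothetical example of an api sub-module.
    type alertsAPI struct {
        // Stores and config providers would be injected here.
    }

    // RegisterHandlers attaches this API's endpoints to the shared router.
    func (a *alertsAPI) RegisterHandlers(router *chi.Mux) {
        router.Get("/_/alerts/list", a.listHandler)
        router.Post("/_/alerts/update", a.updateHandler)
    }

    func (a *alertsAPI) listHandler(w http.ResponseWriter, r *http.Request)   { /* ... */ }
    func (a *alertsAPI) updateHandler(w http.ResponseWriter, r *http.Request) { /* ... */ }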

Module: /go/git

The go/git module provides an abstraction layer for interacting with Git repositories. It is designed to efficiently retrieve and cache commit information, which is essential for performance analysis in Skia Perf. The primary goal is to offer a consistent interface for accessing commit data, regardless of whether the underlying data source is a local Git checkout or a remote Gitiles API.

Design Decisions and Implementation Choices:

  • Database Caching: To avoid repeated and potentially slow Git operations, commit information is cached in an SQL database. This allows for quick lookups of commit details, commit numbers, and commit ranges. The schema for this database is defined in /go/git/schema/schema.go.
  • Provider Abstraction: The module utilizes a provider.Provider interface (defined in /go/git/provider/provider.go). This allows for different implementations of how Git data is fetched. Currently, two providers are implemented:
    • git_checkout: Interacts with a local Git repository by shelling out to git commands. This is suitable for environments where a local checkout is available and preferred.
    • gitiles: Uses the Gitiles API to fetch commit data. This is useful when direct repository access is not feasible or when leveraging Google's infrastructure for Git operations. The choice of provider is determined by the instance configuration, as seen in /go/git/providers/builder.go.
  • Commit Numbering:
    • Sequential: By default, the system assigns sequential integer CommitNumbers to commits as they are ingested. This provides a simple, ordered way to refer to commits.
    • Repo-Supplied: The system can also be configured to extract commit numbers directly from commit messages using a regular expression (specified in instanceConfig.GitRepoConfig.CommitNumberRegex). This is useful for repositories like Chromium that embed a commit position in their messages. The repoSuppliedCommitNumber flag in impl.go controls this behavior.
  • LRU Cache: In addition to the database cache, an in-memory LRU (Least Recently Used) cache (cache in impl.go) is used for frequently accessed commit details (CommitFromCommitNumber). This further speeds up lookups for commonly requested commits. The size of this cache is defined by commitCacheSize.
  • Background Polling: The StartBackgroundPolling method in impl.go initiates a goroutine that periodically calls the Update method. This ensures that the local database cache stays synchronized with the remote repository.
  • SQL Statements: All SQL queries are predefined as constants in impl.go. This helps in organizing and managing the queries. Separate statements are defined for different SQL dialects if needed (e.g., insert vs insertSpanner).
  • Error Handling: The BadCommit constant provides a sentinel value for functions returning provider.Commit to indicate an error or an invalid commit.

Key Responsibilities and Components:

  • interface.go (Git Interface):
    • Defines the Git interface, which is the public contract for this module. It specifies all the operations that can be performed to retrieve commit information.
    • This interface decouples the consumers of Git data from the specific implementation details (e.g., whether data comes from a local repo or Gitiles).
  • impl.go (Git Implementation):
    • Contains the Impl struct, which is the primary implementation of the Git interface.
    • Data Synchronization (Update method): This is a crucial method responsible for fetching new commits from the configured provider.Provider and storing them in the SQL database. It determines the last known commit and fetches all subsequent commits.
    • If repoSuppliedCommitNumber is true, it parses the commit number from the commit body using commitNumberRegex.
    • It handles potential race conditions where multiple services might try to update simultaneously by checking if a commit already exists before insertion.
    • Commit Retrieval Methods: Implements various methods for fetching commit data, such as:
    • CommitNumberFromGitHash: Retrieves the sequential CommitNumber for a given Git hash.
    • CommitFromCommitNumber: Retrieves the full provider.Commit details for a given CommitNumber. Uses the LRU cache.
    • CommitNumberFromTime: Finds the CommitNumber closest to (but not after) a given timestamp.
    • CommitSliceFromTimeRange, CommitSliceFromCommitNumberRange: Fetches slices of commits based on time or commit number ranges.
    • GitHashFromCommitNumber: Retrieves the Git hash for a given CommitNumber.
    • PreviousGitHashFromCommitNumber, PreviousCommitNumberFromCommitNumber: Finds the Git hash or commit number of the commit immediately preceding a given commit number.
    • CommitNumbersWhenFileChangesInCommitNumberRange: Identifies commit numbers within a range where a specific file was modified. This involves converting commit numbers to hashes and then querying the provider.Provider.
    • URL Generation (urlFromParts): Constructs a URL to view a specific commit, respecting configurations like DebouceCommitURL or custom CommitURL formats.
    • Metrics: Collects various metrics (e.g., updateCalled, commitNumberFromGitHashCalled) to monitor the usage and performance of different operations.
  • provider/provider.go (Provider Interface and Commit Struct):
    • Defines the provider.Provider interface, which abstracts the source of Git commit data. Implementations of this interface (like git_checkout and gitiles) handle the actual fetching of data.
    • Defines the provider.Commit struct, which is the standard representation of a commit used throughout the go/git module and its providers. It includes fields like GitHash, Timestamp, Author, Subject, and Body. The Body is particularly important when repoSuppliedCommitNumber is true, as it's parsed to extract the commit number.
  • providers/builder.go (Provider Factory):
    • Contains the New function, which acts as a factory for creating provider.Provider instances based on the instanceConfig.GitRepoConfig.Provider setting. This allows the system to dynamically choose between git_checkout or gitiles (or potentially other future providers).
  • providers/git_checkout/git_checkout.go (CLI Git Provider):
    • Implements provider.Provider by executing git command-line operations.
    • Handles cloning the repository if it doesn't exist.
    • Manages Git authentication (e.g., via Gerrit) if configured.
    • CommitsFromMostRecentGitHashToHead: Uses git rev-list to get commit information.
    • GitHashesInRangeForFile: Uses git log to find changes to a specific file.
    • parseGitRevLogStream: A helper function to parse the output of git rev-list --pretty.
  • providers/gitiles/gitiles.go (Gitiles Provider):
    • Implements provider.Provider by interacting with a Gitiles API endpoint.
    • CommitsFromMostRecentGitHashToHead: Uses gr.LogFnBatch to fetch commits in batches. It handles logic for main branches versus other branches and respects the startCommit.
    • GitHashesInRangeForFile: Uses gr.Log with appropriate path filtering.
    • Update is a no-op for Gitiles as the API always provides the latest data.
  • schema/schema.go (Database Schema):
    • Defines the Commit struct with SQL annotations, representing the structure of the Commits table in the database. This table stores the cached commit information.
  • gittest/gittest.go (Test Utilities):
    • Provides helper functions (NewForTest) for setting up test environments. This includes creating a temporary Git repository, populating it with commits, and initializing a test database. This is crucial for writing reliable unit and integration tests for the go/git module and its components.
  • mocks/Git.go (Mock Implementation):
    • Provides a mock implementation of the Git interface, generated by mockery. This is used in tests of other modules that depend on go/git, allowing them to isolate their tests from actual Git operations or database interactions.

Key Workflows:

  1. Initial Population / Update:

    Application -> Impl.Update()
        |
        '-> Provider.Update()  (e.g., git pull for git_checkout)
        |
        '-> Impl.getMostRecentCommit() (from local DB)
        |
        '-> Provider.CommitsFromMostRecentGitHashToHead(mostRecentDBHash, ...)
            |
            '-> (For each new commit from Provider)
                |
                '-> [If repoSuppliedCommitNumber] Impl.getCommitNumberFromCommit(commit.Body)
                |
                '-> Impl.CommitNumberFromGitHash(commit.GitHash) (Check if already exists)
                |
                '-> DB.Exec(INSERT INTO Commits ...)
    
  2. Fetching Commit Details by CommitNumber:

    Application -> Impl.CommitFromCommitNumber(commitNum)
        |
        '-> Check LRU Cache (cache.Get(commitNum))
        |   |
        |   '-> [If found] Return cached provider.Commit
        |
        '-> [If not in LRU] DB.QueryRow(SELECT ... FROM Commits WHERE commit_number=$1)
            |
            '-> Construct provider.Commit
            |
            '-> Add to LRU Cache (cache.Add(commitNum, commit))
            |
            '-> Return provider.Commit
    
  3. Finding Commits Where a File Changed:

    Application -> Impl.CommitNumbersWhenFileChangesInCommitNumberRange(beginNum, endNum, file)
        |
        '-> Impl.PreviousGitHashFromCommitNumber(beginNum) -> beginHash
        |   (or Impl.GitHashFromCommitNumber if beginNum is 0 and start commit is used)
        |
        '-> Impl.GitHashFromCommitNumber(endNum) -> endHash
        |
        '-> Provider.GitHashesInRangeForFile(beginHash, endHash, file) -> changedGitHashes[]
        |
        '-> (For each changedGitHash)
        |   |
        |   '-> Impl.CommitNumberFromGitHash(changedGitHash) -> commitNum
        |   |
        |   '-> Add commitNum to result list
        |
        '-> Return result list

This structure allows Perf to efficiently query and manage Git commit information, supporting its core functionality of tracking performance data across different versions of the codebase.
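
A simplified sketch of the cache-then-database lookup pattern behind workflow 2 is shown below. The struct, column names, and the use of hashicorp's golang-lru are illustrative; the real Impl differs in detail:

    import (
        "context"

        lru "github.com/hashicorp/golang-lru"
        "github.com/jackc/pgx/v4/pgxpool"
    )

    type gitCache struct {
        db  *pgxpool.Pool
        lru *lru.Cache
    }

    // commit is a trimmed stand-in for provider.Commit.
    type commit struct {
        GitHash string
        Author  string
        Subject string
    }

    // commitFromCommitNumber checks the in-memory cache first, falls back to
    // the Commits table, and then populates the cache with the result.
    func (g *gitCache) commitFromCommitNumber(ctx context.Context, num int64) (commit, error) {
        if v, ok := g.lru.Get(num); ok {
            return v.(commit), nil
        }
        var c commit
        err := g.db.QueryRow(ctx,
            `SELECT git_hash, author, subject FROM Commits WHERE commit_number=$1`, num,
        ).Scan(&c.GitHash, &c.Author, &c.Subject)
        if err != nil {
            return commit{}, err
        }
        g.lru.Add(num, c)
        return c, nil
    }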

Module: /go/graphsshortcut

The graphsshortcut module provides a mechanism for storing and retrieving shortcuts for graph configurations in Perf. Users often define complex sets of graphs for analysis. Instead of redefining these configurations each time or relying on cumbersome URL sharing, this module allows users to save a collection of graph configurations and access them via a unique, shorter identifier. This significantly improves usability and sharing of common graph views.

The core idea is to represent a set of graphs, each with its own configuration (queries, formulas, keys), as a GraphsShortcut object. This object can then be persisted and retrieved using a Store interface. A key design decision is the generation of a unique ID for each GraphsShortcut. This ID is a hash (MD5) of the content of the shortcut, ensuring that identical graph configurations will always have the same ID. This also provides a form of de-duplication. To ensure consistent ID generation, the queries and formulas within each graph configuration are sorted alphabetically before hashing. However, the order of the GraphConfig objects within a GraphsShortcut does affect the generated ID.
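
The ID computation can be sketched roughly as follows. The struct field names and the exact serialization fed into the hash are illustrative; only the sort-then-hash idea is taken from the module's description:

    import (
        "crypto/md5"
        "fmt"
        "sort"
    )

    type GraphConfig struct {
        Queries  []string
        Formulas []string
        Keys     string
    }

    type GraphsShortcut struct {
        Graphs []GraphConfig
    }

    // GetID returns an MD5 hash of the shortcut's content. Queries and formulas
    // are sorted first so their order doesn't change the ID; the order of the
    // GraphConfig entries themselves still matters.
    func (g GraphsShortcut) GetID() string {
        h := md5.New()
        for _, cfg := range g.Graphs {
            queries := append([]string{}, cfg.Queries...)
            formulas := append([]string{}, cfg.Formulas...)
            sort.Strings(queries)
            sort.Strings(formulas)
            fmt.Fprintf(h, "%v%v%s", queries, formulas, cfg.Keys)
        }
        return fmt.Sprintf("%x", h.Sum(nil))
    }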

User defines graph configurations --> [GraphsShortcut object] -- InsertShortcut --> [Store] --> Generates ID (MD5 hash) --> Persists (ID, Shortcut)
                                                                     ^
                                                                     |
User provides ID -------------------> [Store] -- GetShortcut --------+------> [GraphsShortcut object] --> Display Graphs

Key Components:

  • graphsshortcut.go: This file defines the central data structures and the Store interface.

    • GraphConfig: Represents the configuration for a single graph. It contains:
    • Queries: A slice of strings, where each string represents a query used to fetch data for the graph.
    • Formulas: A slice of strings, representing any formulas applied to the data.
    • Keys: A string, likely representing a pre-selected set of traces or keys to focus on.
    • GraphsShortcut: This is the primary object that is stored and retrieved. It's essentially a list of GraphConfig objects.
    • GetID(): A method on GraphsShortcut that calculates a unique MD5 hash based on its content. This method is crucial for identifying and de-duplicating shortcuts. It sorts queries and formulas within each GraphConfig before hashing to ensure that the order of these internal elements doesn't change the ID.
    • Store: An interface defining the contract for persisting and retrieving GraphsShortcut objects. It has two methods:
    • InsertShortcut: Takes a GraphsShortcut and stores it, returning its generated ID.
    • GetShortcut: Takes an ID and returns the corresponding GraphsShortcut.
  • graphsshortcutstore/: This subdirectory contains implementations of the graphsshortcut.Store interface.

    • graphsshortcutstore.go (GraphsShortcutStore): This provides an SQL-backed implementation of the Store.
    • Why SQL?: SQL databases offer robust, persistent storage suitable for production environments where data integrity and concurrent access are important.
    • How it works:
      • It uses a connection pool (sql.Pool) to manage database connections.
      • InsertShortcut: Marshals the GraphsShortcut object into JSON and stores it as a string in the GraphsShortcuts table along with its pre-computed ID. It uses ON CONFLICT (id) DO NOTHING to avoid errors if the same shortcut (and thus same ID) is inserted multiple times.
      • GetShortcut: Retrieves the JSON string from the database based on the ID and unmarshals it back into a GraphsShortcut object.
    • cachegraphsshortcutstore.go (cacheGraphsShortcutStore): This provides an in-memory cache-backed implementation of the Store.
    • Why a cache implementation?: This is primarily useful for local development or testing scenarios, especially when connecting to a production database. It allows developers to use features that rely on graph shortcuts (like multigraph) without needing write access (or breakglass permissions) to the production database. The shortcuts are stored locally and ephemerally.
    • How it works:
      • It utilizes a generic cache.Cache client.
      • InsertShortcut: Marshals the GraphsShortcut to JSON and stores it in the cache using the shortcut's ID as the cache key.
      • GetShortcut: Retrieves the JSON string from the cache by ID and unmarshals it.
    • schema/schema.go: Defines the SQL table schema for GraphsShortcuts. The table primarily stores the id (TEXT, PRIMARY KEY) and the graphs (TEXT, storing the JSON representation of the GraphsShortcut).
  • graphsshortcuttest/graphsshortcuttest.go: This file provides a suite of common tests that can be run against any implementation of the graphsshortcut.Store interface.

    • Why shared tests?: This promotes consistency and ensures that all store implementations adhere to the same contract. It makes it easier to add new store implementations and verify their correctness.
    • Key Tests:
    • InsertGet: Verifies that a shortcut can be inserted and then retrieved, and that the retrieved shortcut is identical to the original (accounting for sorted queries/formulas).
    • GetNonExistent: Ensures that attempting to retrieve a shortcut with an unknown ID results in an error.
  • mocks/Store.go: This file contains a mock implementation of the graphsshortcut.Store interface, generated by the testify/mock library.

    • Why mocks?: Mocks are essential for unit testing components that depend on the Store interface without needing a real database or cache. They allow for controlled testing of different scenarios, such as simulating errors from the store.

In summary, the graphsshortcut module provides a flexible way to save and share complex graph views by defining a clear data structure (GraphsShortcut), a standardized way to identify them (GetID), and an interface (Store) for various persistence mechanisms, with current implementations for SQL databases and in-memory caches.

Module: /go/ingest

The /go/ingest module is responsible for the entire process of taking performance data files, parsing them, and storing the data into a trace store. This involves identifying the format of the input file, extracting relevant measurements and metadata, associating them with specific commits, and then writing this information to the configured data storage backend.

A key design principle is to support multiple ingestion file formats and to be resilient to errors in individual files. The system attempts to parse files in a specific order, falling back to legacy formats if the primary parsing fails. This allows for graceful upgrades of the ingestion format over time without breaking existing data producers.

The ingestion process also handles trybot data, extracting issue and patchset information, which is crucial for pre-submit performance analysis.

Key Components and Files

/go/ingest/filter/filter.go

This component provides a mechanism to selectively process or ignore input files based on their names using regular expressions.

Why: In many scenarios, not all files in a data source are relevant for performance analysis. For example, temporary files, logs, or files matching specific patterns might need to be excluded. This filter allows for fine-grained control over which files are ingested.

How:

  • It uses two regular expressions: accept and reject.
  • An accept regex, if provided, means only filenames matching this regex will be considered for processing. If empty, all files are initially accepted.
  • A reject regex, if provided, means any filename matching this regex will be ignored, even if it matched the accept regex. If empty, no files are rejected based on this rule.
  • The Reject(name string) bool method implements this logic: a file is rejected if it doesn't match the accept regex (if one is provided) OR if it does match the reject regex (if one is provided).

Workflow:

File Name -> Filter.Reject()
              |
              +-- accept_regex_exists? -- Yes -> name_matches_accept? -- No -> REJECT
              |                             |
              |                             +-------------------------- Yes --+
              +----------------------------- No -----------------------------+
                                                                             |
                                                                             V
                                                               reject_regex_exists? -- Yes -> name_matches_reject? -- Yes -> REJECT
                                                                             |                             |
                                                                             |                             +-- No --+
                                                                             +----------------------------- No -----+
                                                                                                                    |
                                                                                                                    V
                                                                                                                  ACCEPT
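
A compact sketch of this accept/reject logic is shown below (the real filter.Filter is built from the instance config's AcceptIfNameMatches and RejectIfNameMatches values; the nil-means-unset convention here is illustrative):

    import "regexp"

    // Filter rejects file names based on optional accept and reject regexes.
    type Filter struct {
        accept *regexp.Regexp // nil means "accept everything".
        reject *regexp.Regexp // nil means "reject nothing".
    }

    // Reject returns true if the file with the given name should be ignored.
    func (f Filter) Reject(name string) bool {
        if f.accept != nil && !f.accept.MatchString(name) {
            return true // Fails the accept regex.
        }
        if f.reject != nil && f.reject.MatchString(name) {
            return true // Matches the reject regex.
        }
        return false
    }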

/go/ingest/format/format.go and /go/ingest/format/legacyformat.go

These files define the structure of the data files that the ingestion system can understand. format.go defines the current standard format (Version 1), while legacyformat.go defines an older format primarily used by nanobench.

Why: A well-defined input format is essential for reliable data ingestion. Versioning allows the format to evolve while maintaining backward compatibility or clear error handling for older, unsupported versions. The current format (Format struct) is designed to be flexible, allowing for common metadata (like git hash, issue/patchset), global key-value pairs applicable to all results, and a list of individual results. Each result can have its own set of keys and either a single measurement or a map of “sub-measurements” (e.g., min, max, median for a single test). This structure allows for rich and varied performance data to be represented. The legacy format (BenchData) exists to support older systems that still produce data in that schema.

How:

  • format.go (Version 1):
    • Format struct: The top-level structure. Contains Version, GitHash, optional trybot info (Issue, Patchset), a global Key map, a slice of Result structs, and global Links.
    • Result struct: Represents one or more measurements. It has its own Key map (which gets merged with the global Key), and critically, either a single Measurement (float32) or a Measurements map.
    • SingleMeasurement struct: Used within Measurements map. It allows associating a value (e.g., “min”, “median”) with a Measurement (float32) and optional Links. This is how multiple metrics for a single conceptual test run are represented.
    • Parse(r io.Reader): Decodes JSON data from a reader into a Format struct. It specifically checks fileFormat.Version == FileFormatVersion.
    • Validate(r io.Reader): Uses a JSON schema (formatSchema.json) to validate the structure of the input data. This ensures that incoming files adhere to the expected contract, preventing malformed data from causing issues downstream.
    • GetLinksForMeasurement(traceID string): Retrieves links associated with a specific measurement, combining global links with measurement-specific ones.
  • legacyformat.go:
    • BenchData struct: Defines the older nanobench format. It has fields like Hash, Issue, PatchSet, Key, Options, and Results. The Results are nested maps leading to BenchResult.
    • BenchResult: A map representing individual test results, typically map[string]interface{} where values are float64s, except for an “options” key.
    • ParseLegacyFormat(r io.Reader): Decodes JSON data into a BenchData struct.

The system will first attempt to parse an input file using format.Parse. If that fails (e.g., due to a version mismatch or JSON parsing error), it may then attempt to parse it using format.ParseLegacyFormat as a fallback.
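
The fallback can be sketched as follows. The function is illustrative; the import path and exact return types of Parse and ParseLegacyFormat follow the descriptions above and may differ slightly from the actual package:

    import (
        "bytes"
        "io"

        "go.skia.org/infra/perf/go/ingest/format"
    )

    // parseAny demonstrates the try-Version-1-then-legacy fallback.
    func parseAny(r io.Reader) error {
        b, err := io.ReadAll(r)
        if err != nil {
            return err
        }
        // First attempt: current Version 1 format.
        if f, err := format.Parse(bytes.NewReader(b)); err == nil {
            _ = f // ... extract params/values from the Format value ...
            return nil
        }
        // Fallback: legacy nanobench format.
        legacy, err := format.ParseLegacyFormat(bytes.NewReader(b))
        if err != nil {
            return err
        }
        _ = legacy // ... extract params/values from the BenchData value ...
        return nil
    }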

/go/ingest/format/formatSchema.json

This file contains the JSON schema definition for the Format struct defined in format.go.

Why: A JSON schema provides a formal, machine-readable definition of the expected data structure. This is used for validation, ensuring that ingested files conform to the specified format. This helps catch errors early and provides clear feedback on what is wrong with a non-conforming file.

How: It's a standard JSON Schema file. The format.Validate function uses this schema to check the structure and types of the fields in an incoming JSON file. The schema is embedded into the Go binary.

/go/ingest/format/generate/main.go

This is a utility program used to automatically generate formatSchema.json from the Go Format struct definition.

Why: Manually keeping a JSON schema synchronized with Go struct definitions is error-prone. This generator ensures that the schema always accurately reflects the Go types.

How: It uses the go.skia.org/infra/go/jsonschema library, which can reflect on Go structs and produce a corresponding JSON schema. The //go:generate directive in the file allows this program to be run easily (e.g., via go generate).

/go/ingest/parser/parser.go

This is the core component responsible for taking an input file (as file.File), attempting to parse it using the defined formats, and extracting the performance data into a standardized intermediate representation.

Why: This component decouples the specifics of file formats from the process of writing data to the trace store. It handles the logic of trying different parsers, extracting common information like Git hashes and trybot details, and transforming the data into lists of parameter maps (paramtools.Params) and corresponding measurement values (float32). It also enforces rules like branch name filtering and parameter key/value validation.

How:

  • New(...): Initializes a Parser with instance-specific configurations, such as recognized branch names and a regex for invalid characters in parameter keys/values.
  • Parse(ctx context.Context, file file.File): This is the main entry point for processing a regular data file.
    1. It first attempts to parse the file using extractFromVersion1File (which uses format.Parse).
    2. If that fails, it falls back to extractFromLegacyFile (which uses format.ParseLegacyFormat).
    3. It checks if the branch name (if present in the file's common keys) is in the allowed list. If not, it returns ErrFileShouldBeSkipped.
    4. It ensures that the extracted parameter keys and values are valid, potentially modifying them using query.ForceValidWithRegex based on the invalidParamCharRegex from the instance configuration. This is crucial because trace IDs (which are derived from these parameters) often have restrictions on allowed characters.
    5. Returns params (a slice of paramtools.Params), values (a slice of float32), the gitHash, any global links from the file, and an error.
  • ParseTryBot(file file.File): A specialized function to extract only the Issue and Patchset information from a file, trying both V1 and legacy formats. This is likely used for systems that only need to identify the tryjob associated with a file without processing all the measurement data.
  • ParseCommitNumberFromGitHash(gitHash string): Extracts an integer commit number from a specially formatted git hash string (e.g., “CP:12345” -> 12345). This supports systems that use such commit identifiers.
  • Helper functions like getParamsAndValuesFromLegacyFormat and getParamsAndValuesFromVersion1Format do the actual work of traversing the parsed file structures (BenchData or Format) and flattening them into the params and values slices.
    • For the V1 format, it iterates through f.Results. If a Result has a single Measurement, it combines f.Key and result.Key to form the paramtools.Params.
    • If a Result has Measurements (a map of string to []SingleMeasurement), it iterates through this map. For each entry, it takes the map's key and the Value from SingleMeasurement to add more key-value pairs to the paramtools.Params.
  • GetSamplesFromLegacyFormat(b *format.BenchData): Extracts raw sample data (if present) from the legacy format. This seems to be for specific use cases where individual sample values, rather than just aggregated metrics, are needed.

Key Workflow (Simplified Parse):

Input: file.File
Output: ([]paramtools.Params, []float32, gitHash, links, error)

1. Read file contents.
2. Attempt Parse as Version 1 Format:
   `f, err := format.Parse(contents)`
   If success:
     `params, values := getParamsAndValuesFromVersion1Format(f, p.invalidParamCharRegex)`
     `gitHash = f.GitHash`
     `links = f.Links`
     `commonKeys = f.Key`
   Else (error):
     Reset reader.
     Attempt Parse as Legacy Format:
       `benchData, err := format.ParseLegacyFormat(contents)`
       If success:
         `params, values := getParamsAndValuesFromLegacyFormat(benchData)`
         `gitHash = benchData.Hash`
         `links = nil` (legacy format doesn't have global links in the same way)
         `commonKeys = benchData.Key`
       Else (error):
         Return error.

3. `branch, ok := p.checkBranchName(commonKeys)`
   If !ok:
     Return `ErrFileShouldBeSkipped`.

4. If len(params) == 0:
   Return `ErrFileShouldBeSkipped`.

5. Return `params, values, gitHash, links, nil`.

/go/ingest/process/process.go

This component orchestrates the entire ingestion pipeline. It takes files from a source (e.g., a directory, GCS bucket), uses the parser to extract data, interacts with git to resolve commit information, and then writes the processed data to a tracestore.TraceStore and tracestore.MetadataStore. It also handles sending Pub/Sub events for ingested files.

Why: This provides the high-level control flow for ingestion. It manages concurrency (multiple worker goroutines), error handling at a macro level (retries for writing to the store), and integration with external systems like Git and Pub/Sub.

How:

  • Start(...):
    1. Initializes tracing, Pub/Sub client (if a topic is configured), the file.Source (to get files), the tracestore.TraceStore and tracestore.MetadataStore (to write data), and perfgit.Git (to map git hashes to commit numbers).
    2. Starts a number of worker goroutines specified by numParallelIngesters.
    3. Each worker listens on a channel provided by the file.Source.
  • worker(...):
    1. Creates a parser.Parser instance.
    2. Enters a loop, receiving file.File objects from the channel.
    3. For each file, it calls workerInfo.processSingleFile.
  • workerInfo.processSingleFile(f file.File): This is the heart of the per-file processing.
    1. Increments metrics for files received.
    2. Calls p.Parse(ctx, f) to get params, values, gitHash, and fileLinks.
    3. Handles errors from Parse:
      • If parser.ErrFileShouldBeSkipped, acks the Pub/Sub message (if any) and skips.
      • For other parsing errors, increments metrics and nacks the Pub/Sub message (if dead-lettering is enabled, allowing for retries or manual inspection).
    4. If gitHash is empty, logs an error and nacks.
    5. If the Git repo supplies commit numbers directly (e.g. “CP:12345”), it calls p.ParseCommitNumberFromGitHash.
    6. Calls g.GetCommitNumber(ctx, gitHash, commitNumberFromFile) to resolve the gitHash (or verify the supplied commit number) against the Git repository. It includes logic to update the local Git repository clone if the hash isn't initially found. If the commit cannot be resolved, it logs an error, acks the Pub/Sub message (as retrying won't help for an unknown commit), and skips.
    7. Builds a paramtools.ParamSet from all the extracted params.
    8. Writes the data to the tracestore.TraceStore using store.WriteTraces or store.WriteTraces2 (depending on instanceConfig.IngestionConfig.TraceValuesTableInlineParams). This involves retries in case of transient store errors.
      • WriteTraces2 suggests an optimized path where some parameter data might be stored directly with trace values, potentially for performance reasons.
    9. If writing fails after retries, increments metrics and nacks.
    10. If writing succeeds, acks the Pub/Sub message and increments success metrics.
    11. Calls sendPubSubEvent to publish information about the ingested file (trace IDs, paramset, filename) to a configured Pub/Sub topic. This allows other services to react to new data ingestion.
    12. If fileLinks were present in the input, it calls metadataStore.InsertMetadata to store these links.
  • sendPubSubEvent(...): If a FileIngestionTopicName is configured, this function constructs an ingestevents.IngestEvent containing the trace IDs, the overall ParamSet for the file, and the filename. It then publishes this event to the specified Pub/Sub topic.

Overall Ingestion Workflow:

File Source (e.g., GCS bucket watcher)
     |
     v
[ file.File channel ] -> Worker Goroutine(s)
                             |
                             v
                       processSingleFile(file)
                             |
  +--------------------------+--------------------------+
  |                          |                          |
  v                          v                          v
Parser.Parse(file) --> Git.GetCommitNumber(hash) --> TraceStore.WriteTraces(...)
  |      ^                   |                          |   ^
  |      | (if parsing fails)|                          |   | (retries)
  |      +-------------------| (update repo if needed)  |   |
  |                          |                          |   |
  +-----> ParamSet Creation  +--------------------------+   |
  |                                                        |
  v                                                        |
sendPubSubEvent (if success) ------------------------------+
  |
  v
MetadataStore.InsertMetadata (if links exist)

This architecture allows for robust and scalable ingestion of performance data from various sources and formats, with clear separation of concerns between parsing, data transformation, Git interaction, and storage. The use of Pub/Sub facilitates downstream processing and real-time reactions to newly ingested data.

Module: /go/ingestevents

The ingestevents module is designed to facilitate the communication of ingestion completion events via PubSub. This is a critical part of the event-driven alerting system within Perf, where the completion of data ingestion for a file triggers subsequent processes like regression detection in a clusterer.

The core of this module revolves around the IngestEvent struct. This struct encapsulates the necessary information to be transmitted when a file has been successfully ingested. It includes:

  • TraceIDs: A slice of strings representing all the unencoded trace identifiers found within the ingested file. These IDs are fundamental for identifying the specific data points that have been processed.
  • ParamSet: An unencoded, read-only representation of the paramtools.ParamSet that summarizes the TraceIDs. This provides a consolidated view of the parameters associated with the ingested traces.
  • Filename: The name of the file that was ingested. This helps in tracking the source of the ingested data.

To handle the transmission of IngestEvent data over PubSub, the module provides two key functions:

  • CreatePubSubBody: This function takes an IngestEvent struct as input and prepares it for PubSub transmission. The “how” here involves a two-step process:

    1. The IngestEvent is first encoded into a JSON format. This provides a structured and widely compatible representation of the data.
    2. The resulting JSON data is then compressed using gzip. The “why” for this step is to ensure that the message size stays within the PubSub message size limits (currently 10MB). This is particularly important when dealing with files that contain a large number of traces, as the raw JSON representation could exceed the limit. The function returns the gzipped JSON data as a byte slice.
    IngestEvent (struct) ---> JSON Encoding ---> Gzip Compression ---> []byte (for PubSub)
    
  • DecodePubSubBody: This function performs the reverse operation. It takes a byte slice (presumably received from a PubSub message) and decodes it back into an IngestEvent struct. The process is:

    1. The input byte slice is first decompressed using gzip.
    2. The decompressed data, which is expected to be in JSON format, is then decoded into an IngestEvent struct. Error handling is incorporated at each step to manage potential issues during decompression or JSON decoding.
    []byte (from PubSub) ---> Gzip Decompression ---> JSON Decoding ---> IngestEvent (struct)
    

The primary responsibility of this module is therefore to provide a standardized and efficient way to serialize and deserialize ingestion event information for PubSub communication. The design choice of using JSON for structure and gzip for compression balances readability, interoperability, and an efficient use of PubSub resources.
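
The encode/decode round trip can be sketched as follows. The function names are illustrative; the actual CreatePubSubBody and DecodePubSubBody may differ in small ways, but the gzip-over-JSON approach is the same:

    import (
        "bytes"
        "compress/gzip"
        "encoding/json"
    )

    // encodeEvent JSON-encodes an event and gzips the result so it fits
    // comfortably within the Pub/Sub message size limit.
    func encodeEvent(ev *IngestEvent) ([]byte, error) {
        var buf bytes.Buffer
        zw := gzip.NewWriter(&buf)
        if err := json.NewEncoder(zw).Encode(ev); err != nil {
            return nil, err
        }
        if err := zw.Close(); err != nil {
            return nil, err
        }
        return buf.Bytes(), nil
    }

    // decodeEvent reverses encodeEvent: gunzip, then JSON-decode.
    func decodeEvent(b []byte) (*IngestEvent, error) {
        zr, err := gzip.NewReader(bytes.NewReader(b))
        if err != nil {
            return nil, err
        }
        defer zr.Close()
        var ev IngestEvent
        if err := json.NewDecoder(zr).Decode(&ev); err != nil {
            return nil, err
        }
        return &ev, nil
    }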

The file ingestevents.go contains the definition of the IngestEvent struct and the implementation of the CreatePubSubBody and DecodePubSubBody functions. The corresponding test file, ingestevents_test.go, ensures that the encoding and decoding processes work correctly, verifying that an IngestEvent can be successfully round-tripped through the serialization and deserialization process.

Module: /go/initdemo

The initdemo module provides a command-line application designed to initialize a database instance, specifically targeting CockroachDB or a Spanner emulator, for demonstration or development purposes.

Its primary purpose is to automate the creation of a named database and the application of the latest database schema. This ensures a consistent and ready-to-use database environment, removing the manual steps often required for setting up a database for applications like Skia Perf.

The core functionality revolves around connecting to a specified database URL, attempting to create the database (gracefully handling cases where it already exists), and then executing the appropriate schema definition. The choice of schema (standard SQL or Spanner-specific) is determined by a command-line flag.

Key Components and Responsibilities:

  • main.go: This is the entry point and sole Go source file for the application.
    • Flag Parsing: It defines and parses command-line flags to configure the database connection and behavior.
    • --databasename: Specifies the name of the database to be created (defaults to “demo”). This allows users to customize the database name for different environments or purposes.
    • --database_url: Provides the connection string for the CockroachDB instance (defaults to a local instance postgresql://root@127.0.0.1:26257/?sslmode=disable). This allows connection to different database servers or configurations.
    • --spanner: A boolean flag that, when set, instructs the application to use the Spanner-specific schema. This is crucial for ensuring compatibility when targeting a Spanner emulator, which may have different SQL syntax or feature support compared to CockroachDB.
    • Database Connection: It establishes a connection to the database using the pgxpool library, which is a PostgreSQL driver and connection pool for Go. This library was chosen for its robustness and performance in handling PostgreSQL-compatible databases like CockroachDB.
    • Database Creation: It attempts to execute a CREATE DATABASE SQL statement. The implementation includes error handling to log an informational message if the database already exists, rather than failing, making the script idempotent in terms of database creation.
    • Database Selection (CockroachDB specific): If not targeting Spanner, it executes SET DATABASE to switch the current session's context to the newly created (or existing) database. This is a CockroachDB-specific command.
    • Schema Selection: Based on the --spanner flag, it selects the appropriate schema definition.
    • If --spanner is false, it uses sql.Schema from the //perf/go/sql module, which contains the standard SQL schema for Perf.
    • If --spanner is true, it uses spanner.Schema from the //perf/go/sql/spanner module, which contains the schema adapted for Spanner. This separation allows maintaining distinct schema versions tailored to the nuances of each database system.
    • Schema Application: It executes the selected schema DDL statements against the connected database. This step creates all the necessary tables, indexes, and other database objects required by the Perf application.
    • Connection Closure: Finally, it closes the database connection pool to release resources.

Workflow:

The typical workflow of the initdemo application can be visualized as:

  1. Parse Flags: Application Start -> Read --databasename, --database_url, --spanner

  2. Connect to Database: Use --database_url -> pgxpool.Connect() -> Connection Pool (conn)

  3. Create Database: conn + --databasename -> Execute "CREATE DATABASE <name>"
     • On success, continue.
     • On error (e.g., the database already exists), log "Database <name> already exists." and continue.

  4. Set Active Database (CockroachDB only): If --spanner is false, execute "SET DATABASE <name>"; any error is fatal (sklog.Fatal). If --spanner is true, skip this step.

  5. Select Schema: dbSchema = spanner.Schema if --spanner is true, otherwise dbSchema = sql.Schema.

  6. Apply Schema: conn + dbSchema -> Execute schema DDL; any error is fatal (sklog.Fatal).

  7. Close Connection: conn.Close() -> Application End

This process ensures that a target database is either created or confirmed to exist, and then the correct schema is applied, making it ready for use. The choice of using pgxpool for database interaction and providing separate schema definitions for standard SQL and Spanner demonstrates a design focused on supporting multiple database backends for the Perf system. The error handling, particularly for the database creation step, aims for robust and user-friendly operation.
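A condensed sketch of this flow is shown below. It assumes pgxpool v4-style APIs; the flag names match the ones described above, while the schema strings are placeholders standing in for sql.Schema and spanner.Schema from //perf/go/sql.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/jackc/pgx/v4/pgxpool"
)

func main() {
	databaseName := flag.String("databasename", "demo", "Name of the database to create.")
	databaseURL := flag.String("database_url", "postgresql://root@127.0.0.1:26257/?sslmode=disable", "Connection string.")
	spannerFlag := flag.Bool("spanner", false, "Use the Spanner-specific schema.")
	flag.Parse()

	ctx := context.Background()
	conn, err := pgxpool.Connect(ctx, *databaseURL)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Create the database; treat "already exists" as informational, not fatal.
	if _, err := conn.Exec(ctx, fmt.Sprintf("CREATE DATABASE %s", *databaseName)); err != nil {
		log.Printf("Database %s already exists (or create failed): %s", *databaseName, err)
	}

	// CockroachDB only: switch the session to the target database.
	if !*spannerFlag {
		if _, err := conn.Exec(ctx, fmt.Sprintf("SET DATABASE = %s", *databaseName)); err != nil {
			log.Fatal(err)
		}
	}

	// Placeholder DDL; the real tool applies sql.Schema or spanner.Schema.
	schema := "CREATE TABLE IF NOT EXISTS Commits (commit_number INT PRIMARY KEY)"
	if *spannerFlag {
		schema = "CREATE TABLE IF NOT EXISTS Commits (commit_number INT8 PRIMARY KEY)"
	}
	if _, err := conn.Exec(ctx, schema); err != nil {
		log.Fatal(err)
	}
}
```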

Module: /go/issuetracker

Perf Issue Tracker Module

This module provides an interface and implementation for interacting with the Google Issue Tracker API, specifically tailored for Perf's needs. The primary goal is to abstract the complexities of the Issue Tracker API and provide a simpler, more focused way to retrieve issue details and add comments to existing issues. This allows other parts of the Perf system to integrate with issue tracking without needing to directly handle API authentication, request formatting, or response parsing.

Core Functionality and Design

The module is designed around the IssueTracker interface, which defines the core operations:

  1. Listing Issues (ListIssues): This function allows retrieving details for a set of specified issue IDs.

    - **Why**: Perf often needs to fetch information about bugs that have been
      filed (e.g., to display their status or link to them from alerts).
      Providing a bulk retrieval mechanism based on IDs is efficient.
    - **How**: The implementation takes a `ListIssuesRequest` containing a
      slice of integer issue IDs. It constructs a query string by joining
      these IDs with " | " (the OR operator in the Issue Tracker query
      language) and wrapping the result in "id:(...)". This formatted query
      is then sent to the Issue Tracker API (a small sketch of the query
      construction follows this list).
    - **Example Workflow**:

      Perf System --- ListIssuesRequest (IDs: [123, 456]) ---> issuetracker Module
                             |
                             v
                 Construct Query: “id:(123 | 456)”
                             |
                             v
      issueTrackerImpl --- GET Request ---> Issue Tracker API
                             |
                             v
      Perf System <--- []*issuetracker.Issue --- Response Parsing <--- API Response

  2. Creating Comments (CreateComment): This function allows adding a new comment to an existing issue.

    - **Why**: Perf might need to automatically update bugs with new
      information, such as when a regression is fixed or when more data about
      an alert becomes available.
    - **How**: It takes a `CreateCommentRequest` containing the `IssueId` and
      the `Comment` string. The implementation constructs an
      `issuetracker.IssueComment` object and uses the Issue Tracker client
      library to post this comment to the specified issue.
    - **Example Workflow**:

      Perf System --- CreateCommentRequest (ID: 789, Comment: “...”) ---> issuetracker Module
                             |
                             v
      issueTrackerImpl --- POST Request ---> Issue Tracker API
                             |
                             v
      Perf System <--- CreateCommentResponse <--- Response Parsing <--- API Response
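The query construction described for ListIssues above can be sketched as an ordinary string-building helper. This is an illustrative stand-in, not the module's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// issueQuery joins issue IDs with the Issue Tracker OR operator and wraps
// them in an "id:(...)" clause, e.g. "id:(123 | 456)".
func issueQuery(ids []int64) string {
	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		parts = append(parts, strconv.FormatInt(id, 10))
	}
	return fmt.Sprintf("id:(%s)", strings.Join(parts, " | "))
}

func main() {
	fmt.Println(issueQuery([]int64{123, 456})) // id:(123 | 456)
}
```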

Key Components

  • issuetracker.go:

    • IssueTracker interface: Defines the contract for interacting with the issue tracker. This allows for decoupling the client code from the specific implementation and facilitates testing using mocks.
    • issueTrackerImpl struct: The concrete implementation of the IssueTracker interface. It holds an instance of the issuetracker.Service client, which is the generated Go client for the Google Issue Tracker API.
    • NewIssueTracker function: This is the factory function for creating an issueTrackerImpl instance.
    • Authentication: It handles the authentication by fetching an API key from Google Secret Manager. The secret project and name are configurable via config.IssueTrackerConfig. It then uses google.DefaultClient with the “https://www.googleapis.com/auth/buganizer” scope to obtain an authenticated HTTP client. This client and the API key are then used to initialize the issuetracker.Service.
    • Configuration: The BasePath of the issuetracker.Service is explicitly set to “https://issuetracker.googleapis.com” to ensure it points to the correct API endpoint.
    • Request/Response Structs (ListIssuesRequest, CreateCommentRequest, CreateCommentResponse): These simple structs define the data structures for requests and responses, making the interface clear and easy to use. They are designed to be minimal and specific to the needs of the Perf system.
  • mocks/IssueTracker.go:

    • This file contains a mock implementation of the IssueTracker interface, generated using the testify/mock library.
    • Why: Mocks are crucial for unit testing components that depend on the issuetracker module. They allow tests to simulate various responses (success, failure, specific data) from the issue tracker without making actual API calls. This makes tests faster, more reliable, and independent of external services.
    • How: The IssueTracker mock struct embeds mock.Mock and provides mock implementations for ListIssues and CreateComment. The NewIssueTracker function in this file is a constructor for the mock, which also sets up test cleanup to assert that all expected mock calls were made.

Design Decisions and Trade-offs

  • Interface-based design: Using an interface (IssueTracker) promotes loose coupling and testability. Consumers depend on the abstraction rather than the concrete implementation.
  • Simplified API: The module exposes only the functionality currently needed by Perf (listing issues by ID and creating comments). It doesn't attempt to be a full-fledged Issue Tracker client, which simplifies its own implementation and usage. If more advanced features are needed in the future, the interface can be extended.
  • Secret Management for API Key: Storing the API key in Google Secret Manager is a security best practice, preventing it from being hardcoded or checked into version control.
  • Error Handling: The module uses skerr.Wrapf to wrap errors, providing context and making debugging easier. It also includes input validation for CreateCommentRequest to prevent invalid API calls.
  • Logging: Debug logs (sklog.Debugf) are included to trace requests and responses, which can be helpful during development and troubleshooting.

The module relies on the external go.skia.org/infra/go/issuetracker/v1 library, which is the auto-generated client for the Google Issue Tracker API. This design choice leverages existing, well-tested client libraries instead of reimplementing API interaction from scratch.

Module: /go/kmeans

K-Means Clustering Module

This module provides a generic implementation of the k-means clustering algorithm. The primary goal is to offer a flexible way to group a set of data points (observations) into a predefined number of clusters (k) based on their similarity. The “similarity” is determined by a distance metric, and the “center” of each cluster is represented by a centroid.

Design and Implementation Choices

The module is designed with generality in mind. Instead of being tied to a specific data type or distance metric, it uses interfaces (Clusterable, Centroid) and a function type (CalculateCentroid). This approach allows users to define their own data structures and distance calculations, making the k-means algorithm applicable to a wide variety of problems.

Interfaces for Flexibility:

  • Clusterable: This is a marker interface. Any data type that needs to be clustered must satisfy this interface. In practice, this means you can use interface{} and then perform type assertions within your custom distance and centroid calculation functions. This design choice prioritizes ease of use for simple cases, where the same type might represent both an observation and a centroid.
  • Centroid: This interface defines the contract for centroids.
    • AsClusterable() Clusterable: This method is crucial for situations where a centroid itself can be treated as a data point (e.g., when calculating distances or when a centroid is part of the initial observation set). It allows the algorithm to seamlessly integrate centroids into lists of clusterable items. If a centroid cannot be meaningfully converted to a Clusterable, it returns nil.
    • Distance(c Clusterable) float64: This method is the core of the similarity measure. It calculates the distance between the centroid and a given Clusterable data point. The user provides the specific implementation for this, enabling the use of various distance metrics (Euclidean, Manhattan, etc.).
  • CalculateCentroid func([]Clusterable) Centroid: This function type defines how a new centroid is computed from a set of Clusterable items belonging to a cluster. This allows users to implement different strategies for centroid calculation, such as taking the mean, median, or other representative points.
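For concreteness, here is a sketch of what satisfying these contracts might look like for 2D points. The interface definitions are restated locally for a self-contained example; in real code they come from this module, and the tests in kmeans_test.go remain the authoritative reference.

```go
package main

import (
	"fmt"
	"math"
)

// These local definitions mirror the interfaces described above.
type Clusterable interface{}

type Centroid interface {
	AsClusterable() Clusterable
	Distance(c Clusterable) float64
}

type CalculateCentroid func([]Clusterable) Centroid

// Point is an illustrative 2D observation that can also act as a centroid.
type Point struct{ X, Y float64 }

func (p Point) AsClusterable() Clusterable { return p }

// Distance is the Euclidean distance to another Point.
func (p Point) Distance(c Clusterable) float64 {
	o := c.(Point)
	return math.Hypot(p.X-o.X, p.Y-o.Y)
}

// meanCentroid computes a new centroid as the mean of the cluster members.
func meanCentroid(members []Clusterable) Centroid {
	var sumX, sumY float64
	for _, m := range members {
		p := m.(Point)
		sumX += p.X
		sumY += p.Y
	}
	n := float64(len(members))
	return Point{X: sumX / n, Y: sumY / n}
}

func main() {
	obs := []Clusterable{Point{0, 0}, Point{1, 0}, Point{10, 10}, Point{11, 9}}
	var f CalculateCentroid = meanCentroid
	c := f(obs[:2]) // centroid of the first two observations
	fmt.Printf("centroid=%v, distance to (10,10)=%.2f\n", c, c.Distance(Point{X: 10, Y: 10}))
	// With the real module, one iteration would then be:
	//   centroids = kmeans.Do(obs, centroids, meanCentroid)
}
```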

Lloyd's Algorithm Implementation:

The core clustering logic is implemented in the Do function, which performs a single iteration of Lloyd's algorithm. This is a common and relatively straightforward iterative approach to k-means.

The KMeans function orchestrates multiple iterations of Do. A key design consideration here is the convergence criterion. Currently, it runs for a fixed number of iterations (iters). A more sophisticated approach would be to iterate until the total error (or the change in centroid positions) falls below a certain threshold, indicating that the clusters have stabilized. This was likely deferred for simplicity in the initial implementation, but it's an important aspect for practical applications to avoid unnecessary computations or premature termination.

Why modify centroids in-place in Do?

The Do function modifies the centroids slice passed to it. The documentation explicitly advises calling it as centroids = Do(observations, centroids, f). This design choice might have been made for efficiency, avoiding the allocation of a new centroids slice in every iteration if the number of centroids remains the same. However, it also means the caller needs to be aware of this side effect. The function does return the potentially new slice of centroids, which is important because centroids can be “lost” if a cluster becomes empty.

Key Responsibilities and Components

  • kmeans.go: This is the sole source file and contains all the logic for the k-means algorithm.

    • Clusterable (interface): Defines the contract for data points that can be clustered. Its main purpose is to allow generic collections of items.
    • Centroid (interface): Defines the contract for cluster centers, including how to calculate their distance to data points and how to treat them as data points themselves.
    • CalculateCentroid (function type): A user-provided function that defines the logic for computing a new centroid from a group of data points. This separation of concerns is key to the module's flexibility.
    • closestCentroid(observation Clusterable, centroids []Centroid) (int, float64): A helper function that finds the index of the centroid closest to a given observation and the distance to it. This is a fundamental step in assigning observations to clusters.
    • Do(observations []Clusterable, centroids []Centroid, f CalculateCentroid) []Centroid:
    • Responsibility: Performs a single iteration of the k-means algorithm (Lloyd's algorithm).
    • How it works:
      1. Assigns each observation to its nearest centroid, forming temporary clusters.
         Observations --> [Find Closest Centroid for each] --> Temporary Cluster Assignments
      2. For each temporary cluster, it recalculates a new centroid using the user-provided f function.
         Temporary Cluster Assignments --> [Group by Cluster] --> Sets of Clusterable items
                                                |
                                                V
                                           [Apply 'f'] --> New Centroids
      3. If a cluster becomes empty (no observations are closest to its centroid), that centroid is effectively removed in this iteration, as f will not be called for an empty set of Clusterable items, and newCentroids will not include it.
    • Design Rationale: Encapsulates one core step of the iterative process, making the overall KMeans function clearer. The in-place modification (and return value) addresses the potential for the number of centroids to change.
    • GetClusters(observations []Clusterable, centroids []Centroid) ([][]Clusterable, float64):
    • Responsibility: Organizes the observations into their final clusters based on the provided (presumably converged) centroids and calculates the total sum of squared errors (or whatever distance metric is used).
    • How it works:
      1. Initializes a list of clusters, where each cluster initially contains only its centroid (if AsClusterable() is not nil).
      2. Iterates through all observations, assigning each to its closest centroid and adding it to the corresponding cluster list.
      3. Accumulates the distance from each observation to its assigned centroid to compute the totalError.
    • Design Rationale: Provides a way to retrieve the actual cluster memberships after the algorithm has run. The inclusion of the centroid as the first element in each returned cluster is a convention for easy identification.
    • KMeans(observations []Clusterable, centroids []Centroid, k, iters int, f CalculateCentroid) ([]Centroid, [][]Clusterable):
    • Responsibility: The main entry point for running the k-means algorithm for a specified number of iterations.
    • How it works:
      Initial Centroids --(iter 1)--> Do() --(updates)--> Centroids'
                        --(iter 2)--> Do() --(updates)--> Centroids''
                        ...
                        --(iter 'iters')--> Do() --(updates)--> Final Centroids
                                                                      |
                                                                      V
                                                        GetClusters() --> Final Clusters
    • Design Rationale: Provides a simple interface to run the entire process. The fixed number of iterations (iters) is a straightforward stopping condition, though, as mentioned, convergence-based stopping would be more robust. The k parameter seems redundant given that the initial number of centroids is determined by len(centroids). If k was intended to specify the desired number of clusters and the initial centroids were just starting points, the implementation would need to handle cases where len(centroids) != k. However, the current Do function naturally adjusts the number of centroids if some clusters become empty.
    • TotalError(observations []Clusterable, centroids []Centroid) float64:
    • Responsibility: Calculates the sum of distances from each observation to its closest centroid. This is often used as a measure of the “goodness” of the clustering.
    • How it works: It simply calls GetClusters and returns the totalError computed by it.
    • Design Rationale: Provides a convenient way to evaluate the clustering quality without needing to manually iterate and sum distances.

Key Workflows

1. Single K-Means Iteration (Do function):

Input: Observations (O), Current Centroids (C_curr), CalculateCentroid function (f)

1. For each Observation o in O:
   Find c_closest in C_curr such that Distance(o, c_closest) is minimized.
   Assign o to the cluster associated with c_closest.
   ---> Result: A mapping of each Observation to a Centroid index.

2. Initialize NewCentroids (C_new) as an empty list.

3. For each unique Centroid index j (from 0 to k-1):
   a. Collect all Observations (O_j) assigned to cluster j.
   b. If O_j is not empty:
      Calculate new_centroid_j = f(O_j).
      Add new_centroid_j to C_new.
   ---> Potentially, some original centroids might not have any observations assigned,
        so C_new might have fewer centroids than C_curr.

Output: New Centroids (C_new)

2. Full K-Means Clustering (KMeans function):

Input: Observations (O), Initial Centroids (C_init), Number of Iterations (iters), CalculateCentroid function (f)

1. Set CurrentCentroids = C_init.

2. Loop 'iters' times:
   CurrentCentroids = Do(O, CurrentCentroids, f)  // Perform one iteration
   ---> CurrentCentroids are updated.

3. FinalCentroids = CurrentCentroids.

4. Clusters, TotalError = GetClusters(O, FinalCentroids)
   ---> Assigns each observation to its final cluster based on FinalCentroids.
        The first element of each sub-array in Clusters is the centroid itself.

Output: FinalCentroids, Clusters

The unit tests in kmeans_test.go provide excellent examples of how to implement the Clusterable, Centroid, and CalculateCentroid requirements for a simple 2D point scenario. They demonstrate the expected behavior of the Do and KMeans functions, including edge cases like empty inputs or losing centroids when clusters become empty.

Module: /go/maintenance

Maintenance Module Documentation

High-Level Overview

The maintenance module in Perf is responsible for executing a set of long-running background processes that are essential for the health and operational integrity of a Perf instance. These tasks ensure that data is kept up-to-date, system configurations are current, and storage is managed efficiently. The module is designed to be started once and run continuously, performing its duties at predefined intervals.

Design Rationale and Implementation Choices

The core design principle behind the maintenance module is to centralize various periodic tasks that would otherwise be scattered or require manual intervention. By consolidating these operations, the system becomes more robust and easier to manage.

Key design choices include:

  • Asynchronous Operations: Most maintenance tasks are designed to run in separate goroutines, triggered by timers. This allows the main application thread (if any) to remain responsive and prevents one maintenance task from blocking others.
  • Configurability via Flags and Instance Configuration: The behavior of the maintenance tasks (e.g., whether to perform regression migration, refresh query cache, or delete old data) is controlled by command-line flags (config.MaintenanceFlags) and the instance-specific configuration (config.InstanceConfig). This provides flexibility for different Perf deployments and operational needs.
  • Dependency Injection: Components like database connections (builders.NewDBPoolFromConfig), Git interfaces (builders.NewPerfGitFromConfig), and caching mechanisms (builders.GetCacheFromConfig) are created and passed into the respective maintenance tasks. This promotes modularity and testability.
  • Error Handling and Logging: Each maintenance task incorporates error handling and logging (sklog) to provide visibility into its operations and to aid in diagnosing issues. While errors in one task might be logged, the overall Start function aims to keep other independent tasks running.
  • Idempotency (Implicit): While not explicitly stated for all tasks, many maintenance operations are inherently idempotent or designed to be safe to run repeatedly (e.g., schema migration, data deletion based on age).
  • Phased Introduction of Features: Features like regression migration or Sheriff config integration are gated by flags (flags.MigrateRegressions, instanceConfig.EnableSheriffConfig). This allows for gradual rollouts and testing in production environments.

Responsibilities and Key Components

The maintenance module orchestrates several distinct background processes.

1. Core Initialization and Schema Management (maintenance.go)

  • Why: Before any maintenance tasks can run, essential services like tracing need to be initialized. Crucially, the database schema must be validated and migrated to the expected version. This ensures that all subsequent database operations are performed against a compatible and up-to-date schema.
  • How:
    • tracing.Init: Sets up the distributed tracing system.
    • builders.NewDBPoolFromConfig: Establishes a connection pool to the database.
    • expectedschema.ValidateAndMigrateNewSchema: Checks the current database schema version against the expected version defined in the codebase. If they don't match, it applies the necessary migrations to bring the schema up to date. This is a critical step to prevent data corruption or application errors due to schema mismatches.

2. Git Repository Synchronization (maintenance.go)

  • Why: Perf relies on an up-to-date view of the monitored Git repository to associate performance data with specific commits. This process ensures that new commits are continuously ingested into the Perf system.
  • How:
    • builders.NewPerfGitFromConfig: Creates an instance of perfgit.Git, which provides an interface to the Git repository.
    • g.StartBackgroundPolling(ctx, gitRepoUpdatePeriod): This method launches a goroutine within the perfgit component. This goroutine periodically fetches the latest changes from the remote Git repository (origin) and updates the local representation, typically also updating a Commits table in the database with new commit information. The gitRepoUpdatePeriod constant (e.g., 1 minute) defines how frequently this update occurs.

3. Regression Schema Migration (maintenance.go)

  • Why: Over time, the way regression data is stored might need to be changed for performance, new features, or data integrity reasons. This component handles the migration of existing regression data from an older schema/table to a newer one. This is often a long-running process for instances with a large history of regressions.
  • How:
    • Controlled by the flags.MigrateRegressions flag.
    • migration.New: Creates a Migrator instance, likely configured with database connections for both the old and new regression storage mechanisms.
    • migrator.RunPeriodicMigration(regressionMigratePeriod, regressionMigrationBatchSize): Starts a goroutine that, at intervals defined by regressionMigratePeriod, processes a regressionMigrationBatchSize number of regressions, moving them from the old storage to the new. This batching approach prevents overwhelming the database and allows the migration to proceed incrementally.

4. Sheriff Configuration Import (maintenance.go)

  • Why: Perf allows defining alert configurations (Sheriff configs) that specify how and when alerts should be triggered for performance regressions. These configurations can be managed externally (e.g., via LUCI Config). This component ensures that Perf stays synchronized with the latest configurations.
  • How:
    • Conditional on instanceConfig.EnableSheriffConfig and a non-empty instanceConfig.InstanceName.
    • It initializes AlertStore and SubscriptionStore for managing alert and subscription data within Perf.
    • luciconfig.NewApiClient: Creates a client to communicate with the LUCI Config service.
    • sheriffconfig.New: Initializes the SheriffConfig service, which encapsulates the logic for fetching, parsing, and applying Sheriff configurations.
    • sheriffConfig.StartImportRoutine(configImportPeriod): Launches a goroutine that periodically (every configImportPeriod) polls the LUCI Config service for the specified instance. If new or updated configurations are found, they are processed and stored/updated in Perf's database (e.g., in the Alerts and Subscriptions tables).

5. Query Cache Refresh (maintenance.go)

  • Why: To speed up common queries (e.g., retrieving the set of available trace parameters, known as ParamSets), Perf can cache this information. This component is responsible for periodically rebuilding and refreshing these caches.

  • How:

    • Controlled by the flags.RefreshQueryCache flag.
    • builders.NewTraceStoreFromConfig: Gets an interface to the trace data.
    • dfbuilder.NewDataFrameBuilderFromTraceStore: Creates a utility for building data frames from traces, which is likely used to derive the ParamSet.
    • psrefresh.NewDefaultParamSetRefresher: Initializes a component specifically designed to refresh ParamSets. It uses the DataFrameBuilder to scan trace data and determine the current set of unique parameter key-value pairs.
    • psRefresher.Start(time.Hour): Starts a goroutine to refresh the primary ParamSet (perhaps stored directly in the database or an in-memory representation) hourly.
    • builders.GetCacheFromConfig: If a distributed cache like Redis is configured, this obtains a client for it.
    • psrefresh.NewCachedParamSetRefresher: Wraps the primary psRefresher with a caching layer.
    • cacheParamSetRefresher.StartRefreshRoutine(redisCacheRefreshPeriod): Starts another goroutine that takes the ParamSet generated by psRefresher and populates the external cache (e.g., Redis) at redisCacheRefreshPeriod intervals (e.g., every 4 hours). This provides a faster lookup path for frequently accessed ParamSet data.

    Workflow:

    Trace Data --> DataFrameBuilder --> ParamSetRefresher (generates primary ParamSet)
                                          |
                                          v
                                     CachedParamSetRefresher --> External Cache (e.g., Redis)
    

6. Old Data Deletion (deletion/deleter.go, maintenance.go)

  • Why: Over time, Perf accumulates a large amount of data, including regression information and associated shortcuts (which are often links or identifiers for specific data views). To manage storage costs and maintain system performance, very old data that is unlikely to be accessed needs to be periodically deleted.

  • How:

    • Controlled by the flags.DeleteShortcutsAndRegressions flag.
    • deletion.New(db, ...): Initializes a Deleter object. This object encapsulates the logic for identifying and removing outdated regressions and shortcuts. It takes a database connection pool (db) and the datastore type. Internally, it creates instances of sqlregressionstore and sqlshortcutstore to interact with the respective database tables.
    • deleter.RunPeriodicDeletion(deletionPeriod, deletionBatchSize): This method in maintenance.go calls the RunPeriodicDeletion method on the Deleter instance.
    • Inside deleter.go, RunPeriodicDeletion starts a goroutine.
    • This goroutine ticks at intervals specified by deletionPeriod (e.g., every 15 minutes).
    • On each tick, it calls d.DeleteOneBatch(deletionBatchSize).
    • Deleter.DeleteOneBatch(shortcutBatchSize):
    • Calls d.getBatch(ctx, shortcutBatchSize) to identify a batch of regressions and shortcuts eligible for deletion.
      • Deleter.getBatch(...):
        • Finds the oldest commit number present in the Regressions table.
        • Iteratively queries the Regressions table for ranges of commits, starting from the oldest.
        • For each regression found, it checks the timestamp of its Low and High StepPoints.
        • If a StepPoint's timestamp is older than the defined ttl (Time-To-Live, currently -18 months), the associated shortcut and the commit number of the regression are marked for deletion.
        • It continues collecting these until the number of shortcuts to be deleted reaches approximately shortcutBatchSize.
        • Returns the list of commit numbers whose regressions will be deleted and the list of shortcut IDs to be deleted.
    • Calls d.deleteBatch(ctx, commitNumbers, shortcuts) to perform the actual deletion.
      • Deleter.deleteBatch(...):
        • Starts a database transaction.
        • Iterates through the commitNumbers and calls d.regressionStore.DeleteByCommit() for each, removing the regression data associated with that commit.
        • Iterates through the shortcuts and calls d.shortcutStore.DeleteShortcut() for each, removing the shortcut entry.
        • If all deletions are successful, it commits the transaction. If any error occurs, it rolls back the transaction to ensure data consistency.

    Deletion Workflow:

    Timer (every deletionPeriod) --> DeleteOneBatch
                                       |
                                       v
                                    getBatch (identifies old data based on TTL)
                                       |
                                       | Returns (commit_numbers_to_delete, shortcut_ids_to_delete)
                                       v
                                    deleteBatch (deletes in a transaction)
                                       |
                                       +--> RegressionStore.DeleteByCommit
                                       +--> ShortcutStore.DeleteShortcut
    

    The ttl variable in deleter.go is set to -18 months, meaning regressions and their associated shortcuts older than 1.5 years are targeted for deletion. This value was determined based on stakeholder requirements for data retention.
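The transactional deleteBatch step described above follows a standard begin/commit/rollback pattern. The sketch below illustrates it with generic database/sql calls and placeholder table and column names; the real code goes through the regression and shortcut stores rather than issuing SQL directly.

```go
package deletion

import (
	"context"
	"database/sql"
)

// deleteBatch removes a batch of regressions (keyed by commit number) and
// shortcuts (keyed by ID) inside one transaction, rolling back on any failure.
func deleteBatch(ctx context.Context, db *sql.DB, commitNumbers []int64, shortcutIDs []string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	for _, c := range commitNumbers {
		// Placeholder SQL; the real code delegates to regressionStore.DeleteByCommit.
		if _, err := tx.ExecContext(ctx, "DELETE FROM Regressions WHERE commit_number = $1", c); err != nil {
			return err
		}
	}
	for _, id := range shortcutIDs {
		// Placeholder SQL; the real code delegates to shortcutStore.DeleteShortcut.
		if _, err := tx.ExecContext(ctx, "DELETE FROM Shortcuts WHERE id = $1", id); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```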

The select {} at the end of the Start function in maintenance.go is a common Go idiom to make the main goroutine (the one that called Start) block indefinitely. Since all the actual work is done in background goroutines launched by Start, this prevents the Start function from returning and thus keeps the maintenance processes alive.
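The goroutine-per-task pattern plus the blocking select {} can be sketched like this. It is an illustration of the idiom, not the actual Start function; the task names and periods are illustrative.

```go
package main

import (
	"context"
	"log"
	"time"
)

// startPeriodic launches a background goroutine that runs task at the given
// interval until ctx is cancelled. Errors are logged, not fatal, so one
// failing task does not stop the others.
func startPeriodic(ctx context.Context, period time.Duration, name string, task func(context.Context) error) {
	go func() {
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				if err := task(ctx); err != nil {
					log.Printf("%s failed: %s", name, err)
				}
			case <-ctx.Done():
				return
			}
		}
	}()
}

func main() {
	ctx := context.Background()
	startPeriodic(ctx, time.Minute, "gitRepoUpdate", func(ctx context.Context) error { return nil })
	startPeriodic(ctx, 15*time.Minute, "deleteOneBatch", func(ctx context.Context) error { return nil })

	// Block forever so the background goroutines keep running, mirroring the
	// `select {}` at the end of maintenance.Start.
	select {}
}
```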

Module: /go/notify

The notify module in Perf is responsible for handling notifications related to performance regressions. It provides a flexible framework for formatting and sending notifications through various channels like email, issue trackers, or custom endpoints like Chromeperf.

Core Concepts and Design:

The notification system is built around a few key abstractions:

  1. Notifier Interface (notify.go): This is the central interface for sending notifications. It defines methods for:

    • RegressionFound: Called when a new regression is detected.
    • RegressionMissing: Called when a previously detected regression is no longer found (e.g., due to new data or fixes).
    • ExampleSend: Used for sending test/dummy notifications to verify configuration.
    • UpdateNotification: For updating an existing notification (e.g., adding a comment to an issue).
  2. Formatter Interface (notify.go): This interface is responsible for constructing the content (body and subject) of a notification. Implementations exist for:

    • HTMLFormatter (html.go): Generates HTML-formatted notifications, suitable for email.
    • MarkdownFormatter (markdown.go): Generates Markdown-formatted notifications, suitable for issue trackers or other systems that support Markdown. The formatters use Go's text/template package, allowing for customizable notification messages. Templates can access a TemplateContext (or AndroidBugTemplateContext for Android-specific notifications) which provides data about the regression, commit, alert, etc. (a small template sketch follows this list).
  3. Transport Interface (notify.go): This interface defines how a formatted notification is actually sent. Implementations include:

    • EmailTransport (email.go): Sends notifications via email using the emailclient module.
    • IssueTrackerTransport (issuetracker.go): Interacts with an issue tracking system (configured for Google's Issue Tracker/Buganizer) to create or update issues. It uses the go/issuetracker/v1 client and requires an API key for authentication.
    • NoopTransport (noop.go): A “do nothing” implementation, useful for disabling notifications or for testing.
  4. NotificationDataProvider Interface (notification_provider.go): This interface is responsible for gathering the necessary data to populate the notification templates.

    • The defaultNotificationDataProvider uses a Formatter to generate the notification body and subject based on RegressionMetadata.
    • androidNotificationProvider (android_notification_provider.go) is a specialized provider for Android-specific bug reporting. It uses its own AndroidBugTemplateContext which includes Android-specific details like Build ID diff URLs. It leverages the MarkdownFormatter for content generation but with Android-specific templates.
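To make the template-driven formatting from point 2 concrete, here is a minimal sketch of executing a Markdown body template against a context struct. The field names, template text, and URLs are hypothetical; they are not the module's actual TemplateContext or default templates.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// templateContext is a hypothetical stand-in for the module's TemplateContext.
type templateContext struct {
	Alert        string
	Commit       string
	CommitURL    string
	DashboardURL string
}

// markdownBody is an illustrative body template, not the module's default.
const markdownBody = `A regression was detected for alert **{{ .Alert }}** at commit {{ .Commit }}.

* Commit: {{ .CommitURL }}
* View on dashboard: {{ .DashboardURL }}
`

func main() {
	tmpl := template.Must(template.New("body").Parse(markdownBody))
	var buf bytes.Buffer
	err := tmpl.Execute(&buf, templateContext{
		Alert:        "render_time step detected",
		Commit:       "abc123",
		CommitURL:    "https://example.com/+/abc123",
		DashboardURL: "https://example.com/e/?query=test%3Drender_time",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(buf.String()) // This body would then be handed to a Transport.
}
```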

Workflow for Sending a Notification (Simplified):

  1. A regression is detected (e.g., by the alerter module).
  2. The Notifier's RegressionFound method is called with details about the regression (commit, alert configuration, cluster summary, etc.).
  3. The Notifier (typically defaultNotifier) uses its NotificationDataProvider to get the raw notification data (body and subject).
    • The NotificationDataProvider populates a context object (e.g., TemplateContext or AndroidBugTemplateContext).
    • It then uses a Formatter (e.g., MarkdownFormatter) to execute the appropriate template with this context, producing the final body and subject.
  4. The Notifier then calls its Transport's SendNewRegression method, passing the formatted body and subject.
  5. The Transport implementation handles the actual sending (e.g., makes an API call to the issue tracker or sends an email).

Regression Detected --> Notifier.RegressionFound(...)
                           |
                           v
                  NotificationDataProvider.GetNotificationDataRegressionFound(...)
                           |
                           | (Populates Context, e.g., TemplateContext)
                           v
                  Formatter.FormatNewRegressionWithContext(...)
                           |  (Uses Go templates)
                           v
                  Formatted Body & Subject
                           |
                           v
                  Transport.SendNewRegression(body, subject)
                           |
                           +------------------> EmailTransport --> Email Server
                           |
                           +------------------> IssueTrackerTransport --> Issue Tracker API
                           |
                           +------------------> NoopTransport --> (Does nothing)

Key Files and Responsibilities:

  • notify.go:

    • Defines the core interfaces: Notifier, Formatter, Transport.
    • Provides the defaultNotifier implementation, which orchestrates the notification process by composing a NotificationDataProvider, Formatter, and Transport.
    • Contains the New() factory function that constructs the appropriate Notifier based on the NotifyConfig. This is the main entry point for creating a notifier.
    • Defines TemplateContext used by generic formatters.
    • Includes logic in getRegressionMetadata to fetch additional information like source file links from TraceStore if the alert is for an individual trace.
  • notification_provider.go:

    • Defines the NotificationDataProvider interface.
    • Provides defaultNotificationDataProvider which uses a generic Formatter.
    • The purpose is to abstract the data gathering logic for notifications, allowing for different data providers (like the Android-specific one) without changing the core Notifier or Transport mechanisms.
  • android_notification_provider.go:

    • Implements NotificationDataProvider specifically for Android bug creation.
    • Uses AndroidBugTemplateContext to provide Android-specific data to templates, such as GetBuildIdUrlDiff for generating links to compare Android build CLs.
    • Relies on MarkdownFormatter but configures it with Android-specific notification templates defined in the NotifyConfig. This allows Android teams to customize their bug reports.
  • markdown.go & html.go:

    • Implement the Formatter interface for Markdown and HTML respectively.
    • Define default templates for new regressions and when regressions go missing.
    • MarkdownFormatter can be configured with custom templates via NotifyConfig. It also provides a buildIDFromSubject template function, specifically designed for Android's commit message format, to extract build IDs.
    • viewOnDashboard is a utility function to construct a URL to the Perf explore page for the given regression.
  • email.go & issuetracker.go & noop.go:

    • Implement the Transport interface.
    • email.go: Uses emailclient to send emails. Splits comma/space-separated recipient lists.
    • issuetracker.go: Interacts with the Google Issue Tracker API. It requires API key secrets (configured via NotifyConfig) and uses OAuth2 for authentication. It can create new issues and update existing ones (e.g., to mark them obsolete).
    • noop.go: A null implementation for disabling notifications.
  • chromeperfnotifier.go:

    • Implements the Notifier interface directly, without using the Formatter or Transport abstractions in the same way as defaultNotifier. This is because it communicates directly with the Chrome Performance Dashboard's Anomaly API.
    • It translates Perf's regression data into the format expected by the Chromeperf API (ReportRegression).
    • Includes logic (isParamSetValid, getTestPath) to ensure the data conforms to Chromeperf's requirements (e.g., specific param keys like master, bot, benchmark, test).
    • Determines if a regression is an improvement based on the improvement_direction parameter and the step direction.
  • commitrange.go:

    • Provides URLFromCommitRange, a utility function to generate a URL for a commit or a range of commits. If a commitRangeURLTemplate is provided (e.g., via configuration), it will be used to create a URL showing the diff between two commits. Otherwise, it defaults to the individual commit's URL. This is used by formatters to create links in notifications.
  • common/notificationData.go:

    • Defines NotificationData (simple struct for body and subject) and RegressionMetadata (a comprehensive struct holding all relevant information about a regression needed for notification generation). This promotes a common data structure for passing regression details.

Configuration and Customization (NotifyConfig):

The behavior of the notify module is heavily influenced by config.NotifyConfig. This configuration allows users to:

  • Choose the notification type (Notifications field): None, HTMLEmail, MarkdownIssueTracker, ChromeperfAlerting, AnomalyGrouper.
  • Specify the NotificationDataProvider: DefaultNotificationProvider or AndroidNotificationProvider.
  • Customize the subject and body of notifications using Go templates (Subject, Body, MissingSubject, MissingBody). This is particularly relevant for MarkdownFormatter and androidNotificationProvider.
  • Provide settings for IssueTrackerTransport (API key secret locations).

This design allows for flexibility in how notifications are generated and delivered, catering to different needs and integrations. For instance, the Android team can have highly customized bug reports, while other users might prefer standard email notifications. The ChromeperfNotifier demonstrates a direct integration with another system, bypassing some of the general-purpose formatting/transport layers when a specific API is targeted.

Module: /go/notifytypes

Perf Notifytypes Module

Overview

The notifytypes module in Perf defines the various types of notification mechanisms that can be triggered in response to performance regressions or other significant events. It also defines types for data providers that supply the necessary information for these notifications. This module serves as a central point for enumerating and categorizing notification strategies, enabling flexible and extensible notification handling within the Perf system.

Why: Design Decisions

The primary goal of this module is to provide a structured and type-safe way to manage notification types.

  • Extensibility: By defining notification types as constants of a custom Type string, new notification methods can be easily added in the future without requiring significant code changes in consuming modules. This promotes loose coupling and allows the notification system to evolve independently.
  • Clarity and Readability: Using named constants (e.g., HTMLEmail, MarkdownIssueTracker) instead of raw strings makes the code more self-documenting and reduces the likelihood of errors due to typos.
  • Centralized Definition: Having all notification types defined in one place simplifies maintenance and provides a clear overview of the available notification options.
  • Separation of Concerns: The NotificationDataProviderType allows for different sources or formats of data to be used for generating notifications, separating the concern of what data is needed from how the notification is delivered. This is crucial, for example, when different platforms (like Android) might require specific data formatting or additional information.

How: Implementation Choices

  • Type (string alias): The Type is defined as an alias for string. This allows for string-based storage and transmission of notification types (e.g., in configuration files or database entries) while still providing a degree of type safety within Go code.
  • Constants for Notification Types: Specific notification mechanisms are defined as constants of type Type. This ensures that only valid, predefined notification types can be used.
    • HTMLEmail: Indicates notifications sent as HTML-formatted emails. This is suitable for rich content and direct user communication.
    • MarkdownIssueTracker: Represents notifications formatted in Markdown, intended for integration with issue tracking systems. This facilitates automated ticket creation or updates.
    • ChromeperfAlerting: Specifies that regression data should be sent to the Chromeperf alerting system. This allows for integration with a specialized alerting infrastructure.
    • AnomalyGrouper: Designates that regressions should be processed by an anomaly grouping logic, which then determines the appropriate action. This enables more sophisticated handling of multiple related anomalies.
    • None: A special type indicating that no notification should be sent. This is useful for disabling notifications in certain contexts or for configurations where alerting is not desired.
  • AllNotifierTypes Slice: This public variable provides a convenient way for other parts of the system to iterate over or validate against all known notification types.
  • NotificationDataProviderType (string alias): Similar to Type, this defines the kind of data provider to use for notifications.
    • DefaultNotificationProvider: Represents the standard or default data provider.
    • AndroidNotificationProvider: Indicates a specialized data provider tailored for Android-specific notification requirements. This might involve fetching different metrics, formatting data in a particular way, or including Android-specific metadata.
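The shape of these definitions can be sketched as follows. The constant string values shown are illustrative only and are not necessarily the module's actual values.

```go
package notifytypes

// Type names the notification mechanism to use.
type Type string

const (
	HTMLEmail            Type = "html_email"            // illustrative value
	MarkdownIssueTracker Type = "markdown_issuetracker" // illustrative value
	ChromeperfAlerting   Type = "chromeperf"            // illustrative value
	AnomalyGrouper       Type = "anomaly_grouper"       // illustrative value
	None                 Type = "none"                  // illustrative value
)

// AllNotifierTypes lists every valid Type, e.g. for UI display or validation.
var AllNotifierTypes = []Type{HTMLEmail, MarkdownIssueTracker, ChromeperfAlerting, AnomalyGrouper, None}

// NotificationDataProviderType names the data provider used to populate notifications.
type NotificationDataProviderType string

const (
	DefaultNotificationProvider NotificationDataProviderType = "default" // illustrative value
	AndroidNotificationProvider NotificationDataProviderType = "android" // illustrative value
)
```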

Responsibilities and Key Components

  • notifytypes.go: This is the sole file in the module and contains all the definitions.
    • Defines Notification Types: Its primary responsibility is to enumerate the supported notification mechanisms (HTMLEmail, MarkdownIssueTracker, ChromeperfAlerting, AnomalyGrouper, None). This acts as a contract for other modules that implement or consume notification functionalities.
    • Defines Data Provider Types: It also defines the types of data providers (DefaultNotificationProvider, AndroidNotificationProvider) that can be used to source information for notifications. This allows the notification system to adapt to different data sources or formats.
    • Provides an Exhaustive List: The AllNotifierTypes variable makes it easy for other components to get a list of all valid notification types, for example, for display in a UI or for validation purposes.

Key Workflows/Processes

While this module itself doesn't implement workflows, it underpins them. A typical conceptual workflow where these types would be used is:

  1. Regression Detected: The Perf system identifies a performance regression.
  2. Configuration Checked: The system checks the configuration associated with the metric/test that regressed. This configuration would specify a notifytypes.Type.
  3. Notifier Selected: Based on the notifytypes.Type from the configuration, the appropriate notifier implementation is selected.
  4. Data Provider Selected (if applicable): If the configuration also specifies a notifytypes.NotificationDataProviderType, the corresponding data provider is chosen.
  5. Notification Sent: The selected notifier uses the data (potentially from the selected data provider) to construct and send the notification.

Regression Event --> Configuration Lookup (specifies notifytypes.Type, e.g., HTMLEmail) --> Notification System --> Data Provider (e.g., AndroidNotificationProvider) --> Notification Delivered (e.g., Email Sent)

For example, if a regression is detected for an Android benchmark and the configuration specifies HTMLEmail as the Type and AndroidNotificationProvider as the NotificationDataProviderType:

Regression Event -> Config: {Type: HTMLEmail, DataProvider: AndroidNotificationProvider} -> Select EmailNotifier -> Select AndroidDataProvider -> AndroidDataProvider fetches data -> EmailNotifier formats and sends HTML email

Module: /go/perf-tool

The perf-tool module provides a command-line interface (CLI) for interacting with various aspects of the Perf performance monitoring system. It allows developers and administrators to manage configurations, inspect data, perform database maintenance tasks, and validate ingestion files.

The primary motivation behind perf-tool is to offer a centralized and scriptable way to perform common Perf operations that would otherwise require manual intervention or direct database interaction. This simplifies workflows and enables automation of routine tasks.

The core functionality is organized into subcommands, each addressing a specific area of Perf:

  • config: Manages Perf instance configurations.
    • create-pubsub-topics-and-subscriptions: Sets up the necessary Google Cloud Pub/Sub topics and subscriptions required for data ingestion. This is crucial for ensuring that Perf instances can receive and process performance data.
    • validate: Checks the syntax and validity of a Perf instance configuration file. This helps prevent deployment of misconfigured instances.
  • tiles: Interacts with the tiled data storage used by Perf's tracestore. Tiles are segments of time-series data.
    • last: Displays the index of the most recent tile, providing insight into the current state of data ingestion.
    • list: Shows a list of recent tiles and the number of traces they contain, useful for understanding data volume and distribution.
  • traces: Allows querying and exporting trace data.
    • list: Retrieves and displays the IDs of traces that match a given query within a specific tile. This is useful for ad-hoc data exploration.
    • export: Exports trace data matching a query and commit range to a JSON file. This enables external analysis or data migration.
  • ingest: Manages the data ingestion process.
    • force-reingest: Triggers the re-ingestion of data files from Google Cloud Storage (GCS) for a specified time range. This is useful for reprocessing data after configuration changes or to fix ingestion errors. The workflow is:
    • Parse start and stop time parameters.
    • Iterate through configured GCS source prefixes.
    • For each prefix, determine hourly GCS directories within the time range.
    • List files in each directory.
    • For each file, create a Pub/Sub message with the GCS object attributes (bucket and name).
    • Publish these messages to the configured ingestion topic. This simulates the GCS notification events that trigger ingestion.
    • validate: Validates the format and content of an ingestion file against the expected schema and parsing rules. This helps ensure data quality before ingestion.
  • database: Provides tools for backing up and restoring Perf database components. This is critical for disaster recovery and data migration.
    • backup:
    • alerts: Backs up alert configurations to a zip file.
    • shortcuts: Backs up saved shortcut configurations to a zip file.
    • regressions: Backs up regression data (detected performance changes) and associated shortcuts to a zip file. It backs up data up to a specified date (defaulting to four weeks ago). The process involves iterating backward through commits in batches, fetching regressions for each commit range, and storing them along with any shortcuts referenced in those regressions.
    • restore:
    • alerts: Restores alert configurations from a backup file.
    • shortcuts: Restores shortcut configurations from a backup file.
    • regressions: Restores regression data and their associated shortcuts from a backup file. It's important to note that restoring regressions also attempts to re-create the associated shortcuts.
  • trybot: Contains experimental functionality related to trybot (pre-submit testing) data.
    • reference: Generates a synthetic nanobench reference file. This file is constructed by loading a specified trybot results file, identifying all trace IDs within it, and then fetching historical sample data for these traces from the main Perf instance (specifically, from the last N ingested files). The aggregated historical samples are then formatted into a new nanobench JSON file. This allows for comparing trybot results against a baseline derived from recent production data using tools like nanostat.
  • markdown: Generates Markdown documentation for the perf-tool CLI itself.

The main.go file sets up the CLI application using the urfave/cli library. It defines flags, commands, and subcommands, and maps them to corresponding functions in the application package. It handles flag parsing, configuration loading (from a file, with optional connection string overrides), and initialization of logging.

The application/application.go file defines the Application interface and its concrete implementation app. This interface abstracts the core logic for each command, promoting testability and separation of concerns. The app struct implements methods that interact with various Perf components like tracestore, alertStore, shortcutStore, regressionStore, and GCS.

Key design choices include:

  • Interface-based application logic (Application interface): This allows for mocking the application logic during testing (as seen in main_test.go and application/mocks/Application.go), ensuring that the CLI command parsing and flag handling can be tested independently of the actual backend operations.
  • Configuration-driven: Most operations require an instance configuration file (--config_filename), which defines data store connections, GCS sources, etc. This makes the tool adaptable to different Perf deployments.
  • Use of helper builders: Functions from perf/go/builders are used to instantiate components like TraceStore, AlertStore, etc., based on the provided instance configuration. This centralizes component creation logic.
  • Zip format for backups: Database backups for alerts, shortcuts, and regressions are stored in zip files. Inside these zip files, data is typically serialized using encoding/gob. This provides a simple and portable backup solution (a sketch of this format follows this list).
  • Batching for large operations: When backing up regressions, data is fetched in batches of commits (regressionBatchSize) to manage memory and avoid overwhelming the database.
  • Pub/Sub for re-ingestion: The ingest force-reingest command leverages Pub/Sub by publishing messages that mimic GCS notifications, effectively triggering the standard ingestion pipeline.
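As a rough illustration of the zip-plus-gob backup format, the sketch below writes a few placeholder records into a single zip entry. The entry name, record type, and field values are placeholders, not the tool's actual layout.

```go
package main

import (
	"archive/zip"
	"encoding/gob"
	"log"
	"os"
)

// alert is a placeholder record type; perf-tool serializes its real Alert,
// Shortcut, and Regression structures in a similar gob-inside-zip fashion.
type alert struct {
	ID    int64
	Query string
}

func main() {
	f, err := os.Create("alerts_backup.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	zw := zip.NewWriter(f)
	w, err := zw.Create("alerts.gob") // entry name is illustrative
	if err != nil {
		log.Fatal(err)
	}

	enc := gob.NewEncoder(w)
	for _, a := range []alert{{ID: 1, Query: "config=8888"}, {ID: 2, Query: "arch=x86"}} {
		if err := enc.Encode(a); err != nil {
			log.Fatal(err)
		}
	}
	if err := zw.Close(); err != nil { // flush the zip central directory
		log.Fatal(err)
	}
}
```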

The application/mocks/Application.go file contains a mock implementation of the Application interface, generated by the mockery tool. This is used in main_test.go to test the command-line argument parsing and dispatch logic without actually performing the underlying operations.

Module: /go/perfclient

The perfclient module provides an interface for sending performance data to Skia Perf's ingestion system. The primary goal of this module is to abstract the complexities of interacting with Google Cloud Storage (GCS), which is the underlying mechanism Perf uses for data ingestion. By providing a dedicated client, it simplifies the process for other applications and services that need to report performance metrics.

The core design centers around a ClientInterface and its concrete implementation, Client. This approach allows for easy mocking and testing, promoting loose coupling between the perfclient and its consumers.

Key Components and Responsibilities:

  • perf_client.go:

    • ClientInterface: This interface defines the contract for pushing performance data. The key method is PushToPerf. The decision to use an interface here is crucial for testability and dependency injection. It allows consumers to use a real GCS-backed client in production and a mock client in tests.
    • Client: This struct is the concrete implementation of ClientInterface. It holds a gcs.GCSClient instance, which is responsible for the actual communication with Google Cloud Storage, and a basePath string that specifies the root directory within the GCS bucket where performance data will be stored. The constructor New takes these as arguments, allowing users to configure the GCS bucket and the top-level folder for their data.
    • PushToPerf method: This is the workhorse of the module.
      • It takes a time.Time object (now), a folderName, a filePrefix, and a format.BenchData struct (which represents the performance metrics).
      • The format.BenchData is first marshaled into a JSON string. This is the standard format Perf expects for ingestion.
      • The JSON data is then compressed using gzip. This is a performance optimization, as GCS can automatically decompress gzipped files with the correct ContentEncoding header, reducing storage costs and transfer times.
      • A deterministic GCS object path is constructed using the objectPath helper function. This path incorporates the basePath, the current timestamp (formatted as YYYY/MM/DD/HH/), the folderName, and a filename composed of the filePrefix, an MD5 hash of the JSON data, and a millisecond-precision timestamp. The inclusion of the MD5 hash helps in avoiding duplicate uploads of identical data and can be useful for debugging or data verification. The timestamp in the path and filename ensures that data from different runs or times are stored separately and can be easily queried.
      • Finally, the compressed data is uploaded to GCS using the storageClient.SetFileContents method. Crucially, it sets ContentEncoding: "gzip" and ContentType: "application/json" in the gcs.FileWriteOptions. This metadata informs GCS about the compression and data type, enabling features like automatic decompression.
    • objectPath function: This helper function is responsible for constructing the unique GCS path for each performance data file. The rationale for this specific path structure (basePath/YYYY/MM/DD/HH/folderName/filePrefix_hash_timestamp.json) is to organize data chronologically and by task, making it easier to browse, query, and manage within GCS. The hash ensures uniqueness and integrity.
  • mock_perf_client.go:

    • MockPerfClient: This provides a mock implementation of ClientInterface using the testify/mock library. This is essential for unit testing components that depend on perfclient without requiring actual GCS interaction. It allows developers to define expected calls to PushToPerf and verify that their code interacts with the client correctly. The NewMockPerfClient constructor returns a pointer to ensure that the methods provided by mock.Mock (like On and AssertExpectations) are accessible.

Workflow: Pushing Performance Data

The primary workflow involves a client application using perfclient to send performance data:

Client App                   perfclient.Client                  gcs.GCSClient
     |                            |                                 |
     | -- Call PushToPerf(now,    |                                 |
     |    folder, prefix, data) ->|                                 |
     |                            | -- Marshal data to JSON         |
     |                            | -- Compress JSON (gzip)         |
     |                            | -- Construct GCS objectPath     |
     |                            |    (includes time, folder,      |
     |                            |     prefix, data hash)          |
     |                            |                                 |
     |                            | -- Call SetFileContents(path,   |
     |                            |    options, compressed_data) -> |
     |                            |                                 | -- Upload to GCS
     |                            |                                 |    with gzip encoding
     |                            |                                 |    and JSON content type
     |                            | <-------------------------------| -- Return success/error
     | <--------------------------|                                 |
     | -- Receive success/error   |                                 |

The design emphasizes creating a clear separation of concerns: the perfclient handles the formatting, compression, and path generation logic specific to Perf's ingestion requirements, while the underlying gcs.GCSClient handles the raw GCS communication. This makes the perfclient a focused and reusable component for any system needing to integrate with Skia Perf.
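
A usage sketch of the client, based on the description above. The GCS client helper used to build the store, the constructor argument order, and whether PushToPerf takes the BenchData by value or pointer are assumptions; consult perf_client.go for the real signatures.

    // Hypothetical wiring; gcsclient.New and the argument order are assumptions.
    store := gcsclient.New(storageClient, "my-perf-bucket")
    pc := perfclient.New(store, "perf-ingest") // "perf-ingest" is the basePath inside the bucket

    benchData := format.BenchData{ /* filled in by the benchmark harness */ }
    if err := pc.PushToPerf(time.Now(), "android-bot", "nanobench", benchData); err != nil {
        sklog.Fatalf("upload failed: %s", err)
    }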

Module: /go/perfresults

Module Overview

The perfresults module is responsible for fetching, parsing, and processing performance results data generated by Telemetry-based benchmarks in the Chromium project. This data typically resides in perf_results.json files. The module provides functionalities to:

  1. Load Performance Data: Retrieve performance results from various sources, primarily Buildbucket builds. This involves interacting with Buildbucket to get build information, Swarming to identify relevant tasks and their outputs, and RBE-CAS (Content Addressable Storage) to download the actual perf_results.json files.
  2. Parse Performance Data: Interpret the structure of perf_results.json files. These files contain sets of histograms, where each histogram represents a specific benchmark measurement. The parser extracts these histograms and associated metadata.
  3. Process and Transform Data: Convert the parsed performance data into a format suitable for ingestion by other systems, such as the Perf ingestion pipeline. This includes aggregating histogram samples (e.g., calculating mean, max, min) and structuring the data according to a defined schema.

The primary goal is to provide a reliable and efficient way to access and utilize Chromium's performance data for analysis and monitoring.

Design Decisions and Implementation Choices

Data Loading Workflow

The process of loading performance results from a Buildbucket build involves several steps:

Buildbucket ID -> BuildInfo -> Swarming Task ID -> Child Swarming Task IDs -> CAS Outputs -> PerfResults

  1. Buildbucket Interaction (buildbucket.go):

    • Why: Buildbucket is the entry point for CI/CQ builds. It contains information about the build, including the associated Swarming task and crucial metadata like git revision and commit position.
    • How: The bbClient interacts with the Buildbucket PRPC API to fetch build details using a given buildID. It specifically requests fields like builder, status, infra.backend.task.id (for the Swarming task ID), output.properties (for git revision information), and input.properties (for perf_dashboard_machine_group).
    • The BuildInfo struct is populated with this information, providing a consolidated view of the build's context. The GetPosition() method on BuildInfo is crucial as it determines the commit identifier (either commit position or git hash) used for associating the performance data with a specific point in the codebase.
  2. Swarming Interaction (swarming.go):

    • Why: The main Buildbucket task often spawns multiple child Swarming tasks, each running a subset of benchmarks. We need to identify all these child tasks to gather all performance results.
    • How: The swarmingClient uses the Swarming PRPC API.
      • findChildTaskIds: Given a parent Swarming task ID (obtained from BuildInfo), this function lists all child tasks by querying for tasks with a matching parent_task_id tag. The query is scoped by the parent task's creation and completion timestamps to narrow down the search.
      • findTaskCASOutputs: For each child task ID, this function retrieves the task result, specifically looking for the CasOutputRoot. This reference points to the RBE-CAS location where the task's output files (including perf_results.json) are stored.
  3. RBE-CAS Interaction (rbecas.go):

    • Why: perf_results.json files are stored in RBE-CAS. RBE-CAS provides efficient and reliable storage for large build artifacts.
    • How: The RBEPerfLoader uses the RBE SDK to interact with CAS.
      • fetchPerfDigests: Given a CAS reference (pointing to the root directory of a task's output), this function:
        • Reads the root Directory proto.
        • Retrieves the entire directory tree using GetDirectoryTree.
        • Flattens the tree to get a map of file paths to their digests.
        • Filters for files named perf_results.json. The path structure is expected to be benchmark_name/perf_results.json, allowing association of results with a specific benchmark.
      • loadPerfResult: Given a digest for a perf_results.json file, this reads the blob from CAS and parses it using NewResults.
      • LoadPerfResults: This orchestrates the loading for multiple CAS references (from multiple child Swarming tasks). It iterates through each CAS reference, fetches the digests of perf_results.json files, loads each file, and then merges results from the same benchmark. Merging is important because a single benchmark might have its results split across multiple files or tasks.
  4. Orchestration (perf_loader.go):

    • Why: A central loader is needed to tie together the interactions with Buildbucket, Swarming, and RBE-CAS.
    • How: The loader.LoadPerfResults method coordinates the entire workflow:
      1. Initializes bbClient to get BuildInfo.
      2. Initializes swarmingClient to find child task IDs and then their CAS outputs.
      3. It performs a sanity check (checkCasInstances) to ensure all CAS outputs come from the same RBE instance, simplifying client initialization.
      4. Initializes RBEPerfLoader (via rbeProvider for testability) for the determined CAS instance.
      5. Calls RBEPerfLoader.LoadPerfResults with the list of CAS references to fetch and parse all perf_results.json files.
    • The use of rbeProvider is a good example of dependency injection, allowing tests to mock the RBE-CAS interaction.
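
A sketch of the entry point this workflow describes; the exact signature (whether a context is passed, the type of the build ID, and the return-value order) is an assumption based on this document and the CLI workflow further below.

    // buildID comes from the caller, e.g. parsed from a command-line flag.
    buildInfo, results, err := perfresults.NewLoader().LoadPerfResults(ctx, buildID)
    if err != nil {
        return err
    }
    // results maps benchmark name to the merged *PerfResults for that benchmark.
    for benchmark, pr := range results {
        fmt.Printf("%s: %d traces at position %s\n", benchmark, len(pr.Histograms), buildInfo.GetPosition())
    }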

Performance Data Parsing (perf_results_parser.go)

  • Why: perf_results.json files have a specific, somewhat complex structure. A dedicated parser is needed to extract meaningful data (histograms and their metadata).
  • How:
    • The PerfResults struct is the main container, holding a map of TraceKey to Histogram.
    • TraceKey uniquely identifies a trace, composed of ChartName (metric name), Unit, Story (user journey/test case), Architecture, and OSName. These fields are extracted from the histogram's own properties and its associated “diagnostics” which are references to other metadata objects within the JSON file.
    • Histogram stores the SampleValues (the actual measurements).
    • Streaming JSON Decoding: NewResults uses json.NewDecoder to process the input io.Reader in a streaming fashion.
    • Why Streaming?: perf_results.json files can be very large (10MB+). Reading the entire file into memory before parsing would be inefficient and could lead to high memory usage. Streaming allows processing the JSON array element by element.
    • Implementation:
      1. It first expects and consumes the opening [ of the JSON array.
      2. It then iterates while decoder.More() is true, decoding each element into a singleEntry struct.
      3. singleEntry is a union-like struct that can hold different types of objects found in the JSON (histograms, generic sets, date ranges, related name maps). This is determined by checking fields like Name (present for histograms) or Type.
      4. If an entry is a histogram (entry.Name != ""), it’s converted to TraceKey and Histogram via histogramRaw.asTraceKeyAndHistogram. This conversion involves looking up GUIDs from the histogram’s Diagnostics map in a locally maintained metadata map (md).
      5. Other entry types (GenericSet, DateRange, RelatedNameMap) are stored in the md map, keyed by their GUID, so they can be referenced by histograms later in the stream.
      6. Parsed histograms are merged into pr.Histograms. If a TraceKey already exists, sample values are appended.
      7. Finally, it consumes the closing ] of the JSON array.
    • Aggregation: The Histogram type provides methods for common aggregations (Min, Max, Mean, Stddev, Sum, Count). AggregationMapping provides a convenient way to access these aggregation functions by string keys, which is used by downstream consumers like the ingestion module.
    • Legacy UnmarshalJSON: An UnmarshalJSON method exists, which reads the entire byte slice into memory. This is less efficient and marked for deprecation in favor of NewResults.
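
The streaming approach can be illustrated with a self-contained example using only encoding/json. The entry fields and JSON keys below are simplified stand-ins for the real histogram schema, not the module's actual types.

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "log"
        "strings"
    )

    // entry is a simplified stand-in for the parser's singleEntry: only
    // histograms carry a Name; other metadata objects carry a Type.
    type entry struct {
        Name         string    `json:"name,omitempty"`
        Type         string    `json:"type,omitempty"`
        SampleValues []float64 `json:"sampleValues,omitempty"`
    }

    func decodeStream(r io.Reader) error {
        dec := json.NewDecoder(r)
        if _, err := dec.Token(); err != nil { // consume the opening '[' of the array
            return err
        }
        for dec.More() { // decode one element at a time; the file is never fully buffered
            var e entry
            if err := dec.Decode(&e); err != nil {
                return err
            }
            if e.Name != "" {
                fmt.Println("histogram:", e.Name, e.SampleValues)
            }
        }
        _, err := dec.Token() // consume the closing ']'
        return err
    }

    func main() {
        input := `[{"name":"load_time","sampleValues":[1.5,2.0]},{"type":"GenericSet"}]`
        if err := decodeStream(strings.NewReader(input)); err != nil {
            log.Fatal(err)
        }
    }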

Data Ingestion Preparation (ingest/)

This submodule focuses on transforming the parsed PerfResults into the format.Format structure required by the Perf ingestion system.

  • json.go (ConvertPerfResultsFormat):

    • Why: The raw PerfResults structure is not directly ingestible. It needs to be reshaped.
    • How:
      • Iterates through each (TraceKey, Histogram) pair in the input PerfResults.
      • For each pair, it creates a format.Result. The Key map within format.Result is populated from TraceKey fields (chart, unit, story, arch, os).
      • The Measurements map within format.Result is populated by calling toMeasurement on the Histogram.
      • toMeasurement iterates through perfresults.AggregationMapping, applying each aggregation function to the histogram's samples. Each resulting aggregation (e.g., “max”, “mean”) becomes a format.SingleMeasurement with the aggregation type as its Value and the computed metric as its Measurement.
      • The final format.Format object includes the version, commit hash (GitHash), and any provided headers and links.
  • gcs.go:

    • Why: Provides utilities for determining the correct Google Cloud Storage (GCS) path where the transformed JSON files should be stored. This is based on conventions used by the Perf ingestion system.
    • How:
      • convertPath: Constructs a GCS path like gs://<bucket>/ingest/<time_path>/<build_info_path>/<benchmark>.
      • convertTime: Formats a time.Time into YYYY/MM/DD/HH (UTC).
      • convertBuildInfo: Formats BuildInfo into <MachineGroup>/<BuilderName>. It defaults MachineGroup to “ChromiumPerf” and BuilderName to “BuilderNone” if they are empty.
      • isInternal: Determines if the results are internal or public based on the BuilderName. It checks against a list of known external bot configurations (pinpoint/go/bot_configs). If not found, it defaults to internal. This determines whether PublicBucket (chrome-perf-public) or InternalBucket (chrome-perf-non-public) is used.
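
These conventions amount to a small amount of string formatting. The helper below is an illustrative reimplementation of the layout described above, not the module's actual code; bucket selection via isInternal is omitted.

    package ingest

    import (
        "fmt"
        "time"
    )

    // ingestPath sketches the path layout described above; the real helpers are
    // convertTime, convertBuildInfo, and convertPath.
    func ingestPath(bucket string, now time.Time, machineGroup, builder, benchmark string) string {
        timePath := now.UTC().Format("2006/01/02/15") // YYYY/MM/DD/HH in UTC
        if machineGroup == "" {
            machineGroup = "ChromiumPerf"
        }
        if builder == "" {
            builder = "BuilderNone"
        }
        return fmt.Sprintf("gs://%s/ingest/%s/%s/%s/%s", bucket, timePath, machineGroup, builder, benchmark)
    }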

Key Components and Files

  • perf_loader.go: Orchestrates the loading of performance results from Buildbucket. NewLoader().LoadPerfResults() is the main entry point.
  • buildbucket.go: Handles interaction with the Buildbucket API to fetch build metadata. Defines BuildInfo.
  • swarming.go: Handles interaction with the Swarming API to find child tasks and their CAS outputs.
  • rbecas.go: Handles interaction with RBE-CAS to download and parse perf_results.json files. Defines RBEPerfLoader.
  • perf_results_parser.go: Parses the content of perf_results.json files. Defines PerfResults, TraceKey, Histogram, and the streaming NewResults parser.
  • ingest/json.go: Transforms parsed PerfResults into the format.Format structure for ingestion.
  • ingest/gcs.go: Provides utilities to determine GCS paths for storing transformed results.
  • cli/main.go: A command-line interface utility that uses the perfresults library to fetch results for a given Buildbucket ID and outputs them as JSON files in the ingestion format. This serves as a practical example and a tool for ad-hoc data retrieval.
  • testdata/: Contains JSON files used for replaying HTTP and gRPC interactions during tests (*.json, *.rpc), and sample perf_results.json files for parser testing. replay_test.go sets up the replay mechanism.

Workflows

Primary Workflow: Loading Perf Results from Buildbucket

User/System --Buildbucket ID--> perf_loader.LoadPerfResults()
  |
  +--> buildbucket.findBuildInfo() --PRPC call--> Buildbucket API
  |      (Returns BuildInfo: Swarming Task ID, Git Revision, Machine Group, etc.)
  |
  +--> swarming.findChildTaskIds() --PRPC call--> Swarming API (using Parent Task ID)
  |      (Returns list of Child Swarming Task IDs)
  |
  +--> swarming.findTaskCASOutputs() --PRPC calls--> Swarming API (for each Child Task ID)
  |      (Returns list of CASReference objects)
  |
  (Error if CAS instances differ for CASReferences)
  |
  +--> rbecas.RBEPerfLoader.LoadPerfResults() (with list of CASReferences)
       |
       +--> For each CASReference:
       |    |
       |    +--> rbecas.fetchPerfDigests() --RBE SDK calls--> RBE-CAS
       |    |      (Returns map of benchmark_name to digest of perf_results.json)
       |    |
       |    +--> For each (benchmark_name, digest):
       |         |
       |         +--> rbecas.loadPerfResult() --RBE SDK call (ReadBlob)--> RBE-CAS
       |         |      |
       |         |      +--> perf_results_parser.NewResults() (Parses JSON stream)
       |         |             (Returns PerfResults object for this file)
       |         |
       |         +--> (Merge with existing PerfResults for the same benchmark_name)
       |
       (Returns map[benchmark_name]*PerfResults and BuildInfo)

CLI Workflow: Fetching and Converting Perf Results

CLI User --Build ID, Output Dir--> cli/main.main()
  |
  +--> perfresults.NewLoader().LoadPerfResults(Build ID)
  |      (Executes the Primary Workflow described above)
  |      (Returns BuildInfo, map[benchmark]*PerfResults)
  |
  +--> For each (benchmark, perfResult) in results:
       |
       +--> ingest.ConvertPerfResultsFormat(perfResult, buildInfo.GetPosition(), headers, links)
       |      (Transforms PerfResults to ingest.Format)
       |
       +--> Marshal ingest.Format to JSON
       |
       +--> Write JSON to output file: <outputDir>/<benchmark>_<BuildID>.json
       |
       +--> Print output filename to stdout

Temporal Worker (Placeholder)

The workflows/worker/main.go file sets up a Temporal worker. Currently, it’s a basic skeleton that initializes a worker and connects to a Temporal server. It doesn’t register any specific activities or workflows from the perfresults module itself. Its presence suggests an intention to integrate perfresults functionalities into Temporal workflows in the future, possibly for automated ingestion or processing tasks. The worker itself is a generic Temporal worker setup.

Testing Strategy

The module employs a robust testing strategy:

  • Unit Tests: Each Go file generally has a corresponding _test.go file with unit tests for its specific logic. For example, perf_results_parser_test.go tests the JSON parsing, and buildbucket_test.go tests BuildInfo logic.
  • Replay Testing (replay_test.go, testdata/):
    • Why: Directly calling external services (Buildbucket, Swarming, RBE-CAS) in tests makes them slow, flaky, and dependent on external state. Replay testing records actual interactions once and then “replays” them during subsequent test runs.
    • How:
      • HTTP interactions (with Buildbucket and Swarming PRPC servers) are replayed using cloud.google.com/go/httpreplay. Recorded interactions are stored as .json files in testdata/.
      • gRPC interactions (with RBE-CAS) are replayed using cloud.google.com/go/rpcreplay. Recorded interactions are stored as gzipped .rpc files in testdata/.
      • A command-line flag (-record_path) controls whether tests run in replay mode (reading from testdata/) or record mode (writing new replay files to the specified path). This allows updating replay files when external APIs change or new test cases are needed.
      • setupReplay() and newRBEReplay() in replay_test.go are helper functions that configure the HTTP client and RBE client for either recording or replaying.
  • Test Data (testdata/perftest/): Contains various perf_results.json files (e.g., full.json, empty.json, merged.json) to test different scenarios for the perf_results_parser.go. This ensures the parser correctly handles different valid and edge-case inputs.
  • Example Usage as Test (cli/main.go): The CLI itself serves as an integration test for the core loading and conversion logic. Its tests (perf_loader_test.go for example) often use the replay mechanism to test the end-to-end flow from Build ID to parsed PerfResults.

This combination ensures both isolated unit correctness and reliable integration testing without external dependencies during typical test runs.

Module: /go/perfserver

The perfserver module serves as the central executable for the Perf performance monitoring system. It consolidates various essential components into a single command-line tool, simplifying deployment and management. The primary goal is to provide a unified entry point for running the web UI, data ingestion processes, regression detection, and maintenance tasks. This approach avoids the complexity of managing multiple separate services and their configurations.

The module leverages the urfave/cli library to define and manage sub-commands, each corresponding to a distinct functional area of Perf. This design allows for clear separation of concerns while maintaining a single binary. Configuration for each sub-command is handled through flags, with the config package providing structured types for these flags.

Key components and their responsibilities:

  • main.go: This is the entry point of the perfserver executable.

    • Why: It orchestrates the initialization and execution of the different Perf sub-systems.

    • How: It defines a cli.App with several sub-commands:

    • frontend: This sub-command launches the main web user interface for Perf.

      • Why: To provide users with a visual way to explore performance data, configure alerts, and view regressions.
      • How: It initializes and runs the frontend component (from //perf/go/frontend). Configuration is passed via config.FrontendFlags. The frontend component itself handles serving HTTP requests and rendering the UI.
    • maintenance: This sub-command starts background maintenance tasks.

      • Why: Certain operations, like data cleanup, schema migrations, or periodic recalculations, are necessary for the long-term health and efficiency of the Perf system. These tasks often need to be run as singletons to avoid conflicts.
      • How: It initializes and runs the maintenance component (from //perf/go/maintenance). It first validates the instance configuration (using //perf/go/config/validate) and then starts the maintenance routines. Prometheus metrics are exposed for monitoring.
    • ingest: This sub-command runs the data ingestion process.

      • Why: To continuously import performance data from various sources (e.g., build artifacts, test results) and populate the central data store (TraceStore).
      • How: It initializes and runs the ingestion process logic (from //perf/go/ingest/process). Similar to maintenance, it validates the instance configuration. It supports parallel ingestion for improved throughput. Prometheus metrics are also exposed.
      • Data Ingestion Workflow:

        Configured Sources --> [Ingest Process] --Parses/Validates--> [TraceStore]
                                     |                                     |
                            Handles incoming files                  Populates data

    • cluster: This sub-command runs the regression detection process.

      • Why: To automatically analyze incoming performance data against configured alerts and identify significant performance regressions.
      • How: Interestingly, this sub-command also utilizes the frontend.New and f.Serve() mechanism, similar to the frontend sub-command. This suggests that the regression detection logic might be tightly coupled with or exposed through the same underlying service framework as the main UI, potentially for sharing configuration or common infrastructure. It uses config.FrontendFlags but specifically for clustering-related settings (indicated by AsCliFlags(true)).
      • Regression Detection Workflow:

        [TraceStore] --New Data--> [Cluster Process] --Applies Alert Rules--> [Alerts/Notifications]
             ^                            |
             |     Identifies Regressions |
             +----------------------------+

    • markdown: A utility sub-command to generate Markdown documentation for perfserver itself.

      • Why: To provide up-to-date command-line help in a portable format.
      • How: It uses the ToMarkdown() method provided by the urfave/cli library.
    • Logging: The Before hook in the cli.App configures sklog to output logs to standard output, ensuring that operational messages from any sub-command are visible.

    • Configuration Loading: For sub-commands like ingest and maintenance, instance configuration is loaded from a specified file (ConfigFilename flag) and validated using //perf/go/config/validate. The database connection string can be overridden via a command-line flag.

    • Metrics: The ingest and maintenance sub-commands initialize Prometheus metrics, allowing for monitoring of their operational health and performance.

The design emphasizes modularity by delegating the core logic of each function (UI, ingestion, clustering, maintenance) to dedicated packages (//perf/go/frontend, //perf/go/ingest/process, //perf/go/maintenance). perfserver acts as the conductor, parsing command-line arguments, loading appropriate configurations, and invoking the correct sub-system. This structure makes the overall Perf system more maintainable and easier to understand, as each component has a well-defined responsibility.
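
The sub-command layout can be sketched as follows. This is a minimal illustration assuming github.com/urfave/cli/v2; the real perfserver wires each Action to the corresponding Perf package and defines many more flags.

    package main

    import (
        "fmt"
        "os"

        "github.com/urfave/cli/v2"
    )

    func main() {
        app := &cli.App{
            Name:  "perfserver",
            Usage: "Run the various Perf services from one binary.",
            Before: func(c *cli.Context) error {
                // perfserver configures sklog here; this sketch does nothing.
                return nil
            },
            Commands: []*cli.Command{
                {
                    Name:  "frontend",
                    Usage: "Run the web UI.",
                    Flags: []cli.Flag{
                        &cli.StringFlag{Name: "config_filename"}, // illustrative flag
                    },
                    Action: func(c *cli.Context) error {
                        fmt.Println("would start the frontend with", c.String("config_filename"))
                        return nil
                    },
                },
                {
                    Name:  "ingest",
                    Usage: "Run the data ingestion process.",
                    Action: func(c *cli.Context) error {
                        fmt.Println("would start ingestion")
                        return nil
                    },
                },
            },
        }
        if err := app.Run(os.Args); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }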

Module: /go/pinpoint

The /go/pinpoint module provides a Go client for interacting with the Pinpoint service, which is part of Chromeperf. Pinpoint is a performance testing and analysis tool used to identify performance regressions and improvements. This client enables other Go applications within the Skia infrastructure to programmatically trigger Pinpoint jobs.

Core Functionality:

The primary purpose of this module is to abstract the complexities of making HTTP requests to the Pinpoint API. It handles authentication, request formatting, and response parsing. This allows other services to easily initiate two main types of Pinpoint jobs:

  1. Bisect Jobs: These jobs are used to identify the specific commit that caused a performance regression or improvement between two given git revisions. The client constructs the appropriate URL and parameters for the pinpointURL endpoint.
  2. Try Jobs (A/B Testing): These jobs compare the performance of a base commit (or patch) against an experimental commit (or patch). This is particularly useful for evaluating the performance impact of a pending code change. The client uses the pinpointLegacyURL for these types of jobs.

Design Decisions and Implementation Choices:

  • Separate Endpoints for Bisect and Try Jobs: The Pinpoint service has distinct API endpoints for creating bisect jobs (pinpointURL) and legacy try jobs (pinpointLegacyURL). The client reflects this by having separate methods (CreateBisect and CreateTryJob) and corresponding request URL builder functions (buildBisectRequestURL and buildTryJobRequestURL). This design choice directly maps to the underlying Pinpoint API structure, making it clear which type of job is being created.
  • URL Parameter Encoding: For both job types, all job parameters are encoded in the URL query string (the creation endpoints are called with POST, as described below, but the parameters still travel in the URL). The buildBisectRequestURL and buildTryJobRequestURL functions are responsible for constructing these URLs by populating url.Values and then encoding them; a minimal sketch of this pattern follows the list. This is a direct consequence of how the Pinpoint API is designed.
  • Authentication: The client utilizes Google's default token source for authentication (google.DefaultTokenSource) with the auth.ScopeUserinfoEmail scope. This is a standard approach for service-to-service authentication within the Google Cloud ecosystem, ensuring secure communication with the Pinpoint API.
  • Metrics Collection: The client integrates with go/metrics2 to track the number of times bisect and try jobs are called and the number of times these calls fail. This is crucial for monitoring the reliability and usage of the Pinpoint integration.
  • Error Handling: The module uses go/skerr for wrapping errors. This provides more context to errors, making debugging easier. For example, if a Pinpoint request fails, the HTTP status code and response body are included in the error message.
  • Dependency on pinpoint/go/bot_configs: For try jobs, the target parameter is required by the Pinpoint API. This target is derived from the Configuration (bot) and Benchmark using the bot_configs.GetIsolateTarget function. This indicates a specific configuration setup for running the performance tests.
  • test_path Parameter for Bisect Jobs: The Pinpoint API requires a test_path parameter for bisect jobs. This parameter is constructed by joining several components like “ChromiumPerf”, configuration, benchmark, chart, and story. This specific formatting is a legacy requirement of the Chromeperf API.
  • Mandatory bug_id for Bisect Jobs: The Pinpoint API mandates the bug_id parameter for bisect jobs. If not provided by the caller, the client defaults it to "null". This reflects a specific constraint of the upstream service.
  • tags Parameter: Both job types include a tags parameter set to {"origin":"skia_perf"}. This helps in tracking and filtering jobs originating from the Skia infrastructure within the Pinpoint system.
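
The URL-encoding approach is plain net/url usage. The parameter names below are illustrative rather than the exact set Pinpoint expects, and the endpoint is a placeholder.

    package main

    import (
        "fmt"
        "net/url"
    )

    func main() {
        v := url.Values{}
        v.Set("start_git_hash", "abc123")
        v.Set("end_git_hash", "def456")
        v.Set("bug_id", "null") // defaulted when the caller does not supply one
        v.Set("tags", `{"origin":"skia_perf"}`)
        // The host and path are placeholders, not the real Pinpoint endpoint.
        fmt.Println("https://pinpoint.example.com/api/new?" + v.Encode())
    }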

Key Components/Files:

  • pinpoint.go: This is the sole Go file in the module and contains all the logic.
    • Client struct: Represents the Pinpoint client. It holds the authenticated http.Client and counters for metrics.
    • New() function: The constructor for the Client. It initializes the HTTP client with appropriate authentication.
    • CreateLegacyTryRequest and CreateBisectRequest structs: Define the structure of the data required to create try jobs and bisect jobs, respectively. These fields directly map to the parameters expected by the Pinpoint API.
    • CreatePinpointResponse struct: Defines the structure of the JSON response from Pinpoint, which includes the JobID and JobURL.
    • CreateTryJob() method:
      • Takes a CreateLegacyTryRequest and a context.Context.
      • Calls buildTryJobRequestURL to construct the request URL.
      • Makes an HTTP POST request (though parameters are in URL, Pinpoint endpoint expects POST for creation) to the pinpointLegacyURL.
      • Parses the JSON response into a CreatePinpointResponse.
      • Handles errors and increments metrics.
    • CreateBisect() method:
      • Similar to CreateTryJob(), but takes a CreateBisectRequest.
      • Calls buildBisectRequestURL.
      • Makes an HTTP POST request to the pinpointURL.
      • Parses the response and handles errors/metrics.
    • buildTryJobRequestURL() function:
      • Takes a CreateLegacyTryRequest.
      • Validates required fields like Benchmark and Configuration.
      • Retrieves the target using bot_configs.GetIsolateTarget.
      • Populates url.Values with all relevant parameters from the request, including hardcoded values like comparison_mode and tags.
      • Returns the fully formed URL string.
    • buildBisectRequestURL() function:
      • Takes a CreateBisectRequest.
      • Populates url.Values with parameters from the request.
      • Sets a default value for bug_id if not provided.
      • Constructs the test_path parameter based on available request fields.
      • Includes the tags parameter.
      • Returns the fully formed URL string.

Key Workflows:

  1. Creating a Bisect Job:

    Application Code                   go/pinpoint.Client                    Pinpoint API
    ----------------                   ------------------                    ------------
    1. CreateBisectRequest data ---->
    2. Calls client.CreateBisect() -->
                                       3. buildBisectRequestURL()
                                          (constructs URL with params)
                                       4. HTTP POST to pinpointURL -------->
                                                                             5. Processes request
                                                                             6. Returns JSON response
                                       <----------------------------------- 7. Receives HTTP response
                                       8. Parses JSON into
                                          CreatePinpointResponse
    <--------------------------------- 9. Returns CreatePinpointResponse
    
  2. Creating a Try Job (A/B Test):

    Application Code                   go/pinpoint.Client                         Pinpoint API (Legacy)
    ----------------                   ------------------                         ---------------------
    1. CreateLegacyTryRequest data ->
    2. Calls client.CreateTryJob() -->
                                       3. buildTryJobRequestURL()
                                          (gets 'target' from bot_configs,
                                           constructs URL with params)
                                       4. HTTP POST to pinpointLegacyURL ----->
                                                                                  5. Processes request
                                                                                  6. Returns JSON response
                                       <---------------------------------------- 7. Receives HTTP response
                                       8. Parses JSON into
                                          CreatePinpointResponse
    <--------------------------------- 9. Returns CreatePinpointResponse
    

Module: /go/pivot

Pivot Module Documentation

High-Level Overview

The pivot module provides functionality analogous to pivot tables in spreadsheets or GROUP BY operations in SQL. Its primary purpose is to aggregate and summarize trace data within a DataFrame based on specified grouping criteria and operations. This allows users to transform raw trace data into more insightful, summarized views, facilitating comparisons and analysis across different dimensions of the data. For example, one might want to compare the performance of ‘arm’ architecture machines against ‘intel’ architecture machines by summing or averaging their respective performance metrics.

Design and Implementation

The core of the pivot module revolves around the Request struct and the Pivot function.

Request Struct:

The Request struct encapsulates the parameters for a pivot operation. It defines:

  • GroupBy: A slice of strings representing the parameter keys to group the traces by. This is the fundamental dimension along which the data will be aggregated. For instance, if GroupBy is ["arch"], all traces with the same ‘arch’ value will be grouped together.
  • Operation: An Operation type (e.g., Sum, Avg, Geo) that specifies how the values within each group of traces should be combined. This operation is applied to each point in the traces within a group, resulting in a new, summarized trace for that group.
  • Summary: An optional slice of Operation types. If provided, these operations are applied to the resulting traces from the GroupBy step. Each Summary operation generates a single value (a column in the final output if viewed as a table) for each grouped trace. If Summary is empty, the output is a DataFrame where each row is a summarized trace (suitable for plotting).
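
A minimal usage sketch, assuming the exported names match the description above; the exact Pivot signature and the DataFrame type may differ.

    // Group all traces by "arch", sum them elementwise, then report the average
    // and maximum of each summed trace as two summary columns.
    req := pivot.Request{
        GroupBy:   []string{"arch"},
        Operation: pivot.Sum,
        Summary:   []pivot.Operation{pivot.Avg, pivot.Max},
    }
    if err := req.Valid(); err != nil {
        return err
    }
    pivoted, err := pivot.Pivot(ctx, req, df) // df is the input DataFrame
    if err != nil {
        return err
    }
    // pivoted now has one row per "arch" value and two columns (avg, max).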

Pivot Function Workflow:

The Pivot function executes the aggregation and summarization process. Here's a breakdown of its key steps and the reasoning behind them:

  1. Input Validation (req.Valid()):

    • Why: To ensure the request is well-formed before proceeding with potentially expensive computations. This prevents errors due to missing GroupBy keys or invalid Operation or Summary values.
    • How: It checks if GroupBy is non-empty and if the specified Operation and Summary operations are among the predefined valid operations (AllOperations).
  2. Initialization and Grouping Structure (groupedTraceSets):

    • Why: To efficiently organize traces into their respective groups. A map is used where keys are the group identifiers (e.g., “,arch=arm,”) and values are types.TraceSet containing traces belonging to that group.
    • How:
      • It pre-populates groupedTraceSets by determining all possible unique combinations of values for the GroupBy keys present in the input DataFrame's ParamSet. This is done using df.ParamSet.CartesianProduct(req.GroupBy). This pre-population ensures that even groups with no matching traces are considered, although they will be filtered out later if they remain empty.
      • It then iterates through each trace in the input DataFrame (df.TraceSet).
      • For each trace, it extracts the relevant parameter values specified in req.GroupBy to form a groupKey using groupKeyFromTraceKey. This function ensures that only traces containing all the GroupBy keys contribute to a group. If a trace is missing a GroupBy key, it's ignored.
      • The trace is then added to the types.TraceSet associated with its groupKey in groupedTraceSets.
    Input DataFrame (df.TraceSet)
            |
            v
    For each traceID, trace in df.TraceSet:
      Parse traceID into params
      groupKey = groupKeyFromTraceKey(params, req.GroupBy)
      If groupKey is valid:
        Add trace to groupedTraceSets[groupKey]
            |
            v
    Grouped Traces (groupedTraceSets)
    
  3. Applying the GroupBy Operation:

    • Why: To perform the primary aggregation based on the req.Operation.
    • How:
      • It iterates through the groupedTraceSets.
      • For each non-empty group, it applies the groupByOperation function corresponding to req.Operation (obtained from opMap) to the types.TraceSet of that group. The opMap is a crucial design choice, mapping Operation constants to their respective implementation functions (one for grouping traces, another for summarizing single traces). This provides a clean and extensible way to manage different aggregation functions.
      • The result of this operation is a single summarized trace for that group, which is stored in the ret.TraceSet of the new DataFrame.
      • Context cancellation (ctx.Err()) is checked periodically to allow for early termination if the operation is cancelled.
    Grouped Traces (groupedTraceSets)
            |
            v
    For each groupID, traces in groupedTraceSets:
      If len(traces) > 0:
        summarizedTrace = opMap[req.Operation].groupByOperation(traces)
        ret.TraceSet[groupID] = summarizedTrace
            |
            v
    DataFrame with GroupBy Applied (ret)
    
  4. Building ParamSet for the Result:

    • Why: The resulting DataFrame needs its own ParamSet reflecting the new structure where trace keys only contain the GroupBy parameters.
    • How: ret.BuildParamSet() is called.
  5. Applying Summary Operations (Optional):

    • Why: To further reduce the data into single summary values per group if req.Summary is specified. This is useful for generating tabular summaries rather than plots.
    • How:
      • If req.Summary is empty, the original DataFrame's Header is used for the new DataFrame, and the function returns. The result is a DataFrame of summarized traces.
      • If req.Summary is not empty:
      • It iterates through each summarized trace in ret.TraceSet.
      • For each trace, it creates a new types.Trace (called summaryValues) whose length is equal to the number of Summary operations.
      • For each Operation in req.Summary, it applies the corresponding summaryOperation function (from opMap) to the current grouped trace. The result is stored in summaryValues.
      • The original summarized trace in ret.TraceSet[groupKey] is replaced with summaryValues.
      • The Header of the ret DataFrame is rebuilt. Each column in the header now corresponds to one of the Summary operations, with offsets from 0 to len(req.Summary) - 1.
    DataFrame with GroupBy Applied (ret)
            |
            v
    If len(req.Summary) > 0:
      For each groupKey, trace in ret.TraceSet:
        summaryValues = new Trace of length len(req.Summary)
        For i, op in enumerate(req.Summary):
          summaryValues[i] = opMap[op].summaryOperation(trace)
        ret.TraceSet[groupKey] = summaryValues
      Adjust ret.Header to match Summary operations
            |
            v
    Final Pivoted DataFrame (ret)
    

Operations (Operation type and opMap):

The module defines a set of standard operations like Sum, Avg, Geo, Std, Count, Min, Max.

  • Why: To provide common aggregation methods.
  • How:
    • Each Operation is a string constant.
    • The opMap is a map where each Operation key maps to an operationFunctions struct. This struct holds two function pointers:
    • groupByOperation: Takes a types.TraceSet (a group of traces) and returns a single aggregated types.Trace. These functions are typically sourced from the go/calc module.
    • summaryOperation: Takes a single []float32 (a trace) and returns a single float32 summary value. These functions are typically sourced from go/vec32 or defined locally (like stdDev).
    • This design makes it easy to add new operations by defining the constant and adding corresponding entries to opMap with the appropriate implementation functions.
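
A self-contained sketch of this dispatch-map pattern follows. The real opMap operates on types.TraceSet and reuses functions from go/calc and go/vec32; the toy Sum entry below only illustrates the shape of the two function slots.

    package main

    import "fmt"

    type Operation string

    const Sum Operation = "sum"

    type operationFunctions struct {
        // groupByOperation collapses a group of traces into a single trace.
        groupByOperation func(traces [][]float32) []float32
        // summaryOperation reduces a single trace to one summary value.
        summaryOperation func(trace []float32) float32
    }

    var opMap = map[Operation]operationFunctions{
        Sum: {
            groupByOperation: func(traces [][]float32) []float32 {
                if len(traces) == 0 {
                    return nil
                }
                out := make([]float32, len(traces[0]))
                for _, t := range traces {
                    for i, v := range t {
                        out[i] += v
                    }
                }
                return out
            },
            summaryOperation: func(trace []float32) float32 {
                var s float32
                for _, v := range trace {
                    s += v
                }
                return s
            },
        },
    }

    func main() {
        group := [][]float32{{1, 2, 3}, {4, 5, 6}}
        combined := opMap[Sum].groupByOperation(group)
        fmt.Println(combined, opMap[Sum].summaryOperation(combined)) // [5 7 9] 21
    }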

Error Handling:

  • Why: To provide clear feedback on invalid inputs or issues during processing.
  • How: The Pivot function returns an error if req.Valid() fails or if an error occurs during grouping (e.g., a GroupBy key is not found in the ParamSet of the input DataFrame). Context cancellation is also handled, allowing long-running pivot operations to be interrupted. Errors are wrapped using skerr.Wrap to provide context.

Key Components and Files

  • pivot.go: This is the main file containing all the logic for the pivot functionality.

    • Request struct: Defines the parameters for a pivot operation. Its design allows for flexible grouping and summarization.
    • Operation type and constants: Define the set of available aggregation operations.
    • opMap variable: A critical data structure mapping Operation types to their respective implementation functions for both grouping and summarizing. This is the heart of how different operations are dispatched.
    • Pivot function: The primary public function that performs the pivot operation. Its step-by-step process of grouping, applying the main operation, and then optionally applying summary operations is central to its functionality.
    • groupKeyFromTraceKey function: A helper function responsible for constructing the group identifier for each trace based on the GroupBy keys. It handles cases where a trace might not have all the required keys.
    • Valid() method on Request: Ensures that the pivot request is well-formed before processing begins.
  • pivot_test.go: Contains unit tests for the pivot module.

    • Why: To ensure the correctness and robustness of the pivot logic under various scenarios, including valid inputs, invalid inputs, edge cases (like empty groups or traces not matching any group), and context cancellation.
    • How: It uses the testify assertion library and defines test cases that cover different aspects of the Request validation, groupKeyFromTraceKey logic, and the Pivot function itself with various combinations of Operation and Summary settings. The dataframeForTesting() helper function provides a consistent dataset for testing.

This module is designed to be a general-purpose tool for transforming and understanding large datasets of traces by allowing users to aggregate data along arbitrary dimensions and apply various statistical operations.

Module: /go/progress

The /go/progress module provides a mechanism for tracking the progress of long-running tasks on the backend and exposing this information to the UI. This is crucial for user experience in applications where operations like data queries or complex computations can take a significant amount of time. Without progress tracking, users might perceive the application as unresponsive or encounter timeouts.

Why: The Need for Asynchronous Task Monitoring

Many backend operations, such as those initiated by API endpoints like /frame/start or /dryrun/start, are asynchronous. The initial HTTP request might return quickly, but the actual work continues in the background. This module addresses the need to:

  1. Provide Feedback: Inform the user that a task is ongoing and how it's progressing.
  2. Avoid Timeouts: Prevent HTTP requests from timing out while waiting for a long task to complete. The UI can poll for updates instead of holding a connection open.
  3. Communicate Complex State: Allow tasks to report detailed, multi-stage progress information, not just a simple percentage. For example, a “dry run” might involve several distinct steps, each with its own status and relevant data.

How: Design and Implementation

The core idea is to represent the state of a long-running task as a Progress object. This object can be updated by the task as it executes. A Tracker then manages multiple Progress objects, making them accessible via HTTP polling.

Key Components:

  • progress.go: Defines the Progress interface and its concrete implementation progress.

    • Progress Interface: This is the central abstraction for a single long-running task.
    • Why an interface? It allows for potential future extensions or alternative implementations (e.g., different storage mechanisms for progress data if needed, though the current implementation is in-memory).
    • Key Methods:
      • Message(key, value string): Allows the task to report arbitrary key-value string pairs. This is flexible enough to accommodate diverse progress information (e.g., current step, commit being processed, number of items filtered). If a key already exists, its value is updated.
      • Results(interface{}): Stores intermediate or final results of the task. The interface{} type allows any JSON-serializable data to be stored. This is useful for showing partial results or accumulating data incrementally.
      • Error(string): Marks the task as failed and stores an error message.
      • Finished(): Marks the task as successfully completed.
      • FinishedWithResults(interface{}): Atomically sets the results and marks the task as finished. This is preferred over separate Results() and Finished() calls to avoid race conditions where the UI might poll between the two calls.
      • Status() Status: Returns the current status (Running, Finished, Error).
      • URL(string): Sets the URL that the client should poll for further updates. This is typically set by the Tracker.
      • JSON(w io.Writer) error: Serializes the current progress state (status, messages, results, next URL) into JSON and writes it to the provided writer.
    • progress struct (concrete implementation):
      • Uses a sync.Mutex to ensure thread-safe updates to its internal SerializedProgress state. This is critical because long-running tasks often execute in separate goroutines, and the Progress object might be accessed concurrently by the task updating its state and by the Tracker serving HTTP requests.
      • Maintains its state in a SerializedProgress struct, which is designed for easy JSON serialization.
      • State Transitions: A Progress object starts in the Running state. Once it transitions to Finished or Error, it becomes immutable. Any attempt to modify it (e.g., calling Message() or Results() again) will result in a panic. This design simplifies reasoning about the lifecycle of a task's progress.
    • SerializedProgress struct: Defines the JSON structure sent to the client. It includes the Status, an array of Message (key-value pairs), the Results (if any), and the URL for the next poll.
    • Status enum: Running, Finished, Error.
  • tracker.go: Defines the Tracker interface and its concrete implementation tracker.

    • Tracker Interface: Manages a collection of Progress objects.
      • Add(prog Progress): Registers a new Progress object with the tracker. The tracker assigns a unique ID to this progress and sets its polling URL.
      • Handler(w http.ResponseWriter, r *http.Request): An HTTP handler function that clients use to poll for progress updates. It extracts the progress ID from the request URL, retrieves the corresponding Progress object, and sends its JSON representation.
      • Start(ctx context.Context): Starts a background goroutine for periodic cleanup of completed tasks from the cache.
    • tracker struct (concrete implementation):
      • lru.Cache: Uses a Least Recently Used (LRU) cache (github.com/hashicorp/golang-lru) to store cacheEntry objects.
        • Why LRU? To prevent unbounded memory growth if many tasks are tracked. Older, completed tasks are eventually evicted.
      • basePath: A string prefix for the polling URLs (e.g., /_/status/). Each progress object gets a unique ID appended to this base path to form its polling URL.
      • cacheEntry struct: Wraps a Progress object and a Finished timestamp. The timestamp is used by the cleanup routine to determine when a completed task can be removed from the cache.
      • Cleanup Mechanism:
        • The Start method launches a goroutine that periodically calls singleStep.
        • singleStep iterates through the cache:
          • It updates the Finished timestamp in a cacheEntry when the corresponding Progress object transitions out of the Running state.
          • It removes entries from the cache if they have been in a Finished or Error state for longer than cacheDuration (currently 5 minutes). This prevents the cache from holding onto completed tasks indefinitely.
        • This ensures that resources are eventually freed up while still allowing clients a reasonable window to fetch the final results of a completed task.
      • UUIDs for IDs: Uses github.com/google/uuid to generate unique IDs for each tracked Progress. This makes the polling URLs distinct and hard to guess.
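
Based on the description above, the polled JSON can be pictured with a struct like the following; the exact field and JSON key names used in progress.go are assumptions here.

    // Message is one key/value pair reported via Progress.Message().
    type Message struct {
        Key   string `json:"key"`
        Value string `json:"value"`
    }

    // SerializedProgress sketches the payload returned by Tracker.Handler.
    type SerializedProgress struct {
        Status   string      `json:"status"`            // "Running", "Finished", or "Error"
        Messages []Message   `json:"messages"`          // arbitrary progress details
        Results  interface{} `json:"results,omitempty"` // intermediate or final results
        URL      string      `json:"url"`               // next polling URL, e.g. /_/status/<uuid>
    }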

Key Workflows

  1. Starting and Tracking a Long-Running Task:

    Backend HTTP Handler (e.g., /api/start_long_task)
        |
        | 1. Create a new Progress object:
        |    prog := progress.New()
        |
        | 2. Add it to the global Tracker instance:
        |    trackerInstance.Add(prog)  // Tracker sets prog.URL() internally
        |
        | 3. Respond to the initial HTTP request with the Progress JSON.
        |    // The client now has prog.URL() to poll.
        |    prog.JSON(w)
        |
        V
    Goroutine (executing the long-running task)
        |
        | 1. Periodically update progress:
        |    prog.Message("Step", "Processing item X")
        |    prog.Message("PercentComplete", "30%")
        |    prog.Results(partialData) // Optional: intermediate results
        |
        | 2. When finished:
        |    If error:
        |        prog.Error("Something went wrong")
        |    Else:
        |        prog.FinishedWithResults(finalData)
    
  2. Client Polling for Updates:

    Client (e.g., browser UI)
        |
        | 1. Receives initial response with prog.URL (e.g., /_/status/some-uuid)
        |
        | 2. Makes a GET request to prog.URL
        V
    Backend Tracker.Handler
        |
        | 1. Extracts "some-uuid" from the request path.
        |
        | 2. Looks up the Progress object in its cache using "some-uuid".
        |    If not found --> HTTP 404 Not Found
        |
        | 3. Calls prog.JSON(w) to send the current state.
        V
    Client
        |
        | 1. Receives JSON with current status, messages, results.
        |
        | 2. If Status is "Running", schedules another poll to prog.URL.
        |
        | 3. If Status is "Finished" or "Error", displays final results/error and stops polling.
    
  3. Tracker Cache Management (Background Process):

    Tracker.Start()
        |
        V
    Goroutine (periodic execution, e.g., every minute)
        |
        | Calls tracker.singleStep()
        |   |
        |   V
        |   Iterate through cache entries:
        |     - If Progress.Status() is not "Running" AND cacheEntry.Finished is zero:
        |         Set cacheEntry.Finished = now()
        |     - If cacheEntry.Finished is not zero AND now() > cacheEntry.Finished + cacheDuration:
        |         Remove entry from cache
        |     - Update metrics (numEntriesInCache)
        |
        V
    (Loop back to periodic execution)
    

This system provides a robust and flexible way to communicate the progress of backend tasks to the user interface, improving the overall user experience for operations that might otherwise seem opaque or unresponsive. The use of JSON for data interchange makes it easy for web frontends to consume the progress information.

Module: /go/psrefresh

The psrefresh module is designed to manage and provide access to paramtools.ParamSet instances, which are collections of key-value pairs representing the parameters of traces in a performance monitoring system. The primary goal is to efficiently retrieve and cache these parameter sets, especially for frequently accessed queries, to reduce database load and improve response times.

The module addresses the need for up-to-date parameter sets by periodically fetching data from a trace store (represented by the OPSProvider interface). It combines parameter sets from recent time intervals (tiles) to provide a comprehensive view of available parameters.

A key challenge is handling potentially large and complex parameter sets. To mitigate this, the module offers a caching layer (CachedParamSetRefresher). This caching mechanism is configurable and can pre-populate caches (e.g., local in-memory or Redis) with filtered parameter sets based on predefined query levels. This pre-population significantly speeds up queries that match these common filter patterns.

Key Components and Responsibilities:

  • psrefresh.go:

    • Defines the core interfaces OPSProvider and ParamSetRefresher.
    • OPSProvider: Abstractly represents a source of ordered parameter sets (e.g., a trace data store). It provides methods to get the latest “tile” (a time-based segment of data) and the parameter set for a specific tile. This abstraction allows psrefresh to be independent of the underlying data storage implementation.
    • ParamSetRefresher: Defines the contract for components that can provide the full parameter set and parameter sets filtered by a query. It also includes a Start method to initiate the refresh process.
    • Implements defaultParamSetRefresher, which is the standard implementation of ParamSetRefresher.
    • Why: This struct is responsible for the fundamental logic of periodically fetching parameter sets from the OPSProvider. It merges parameter sets from a configurable number of recent tiles to create a comprehensive view.
    • How: It uses a background goroutine (refresh) that periodically calls oneStep. The oneStep method fetches the latest tile, then iterates backward through the configured number of previous tiles, retrieving and merging their parameter sets using paramtools.ParamSet.AddParamSet. The resulting merged set is then normalized and stored.
    • A sync.Mutex is used to protect concurrent access to the ps (paramtools.ReadOnlyParamSet) field, ensuring thread safety when GetAll is called.
    • GetParamSetForQuery delegates the actual filtering and counting of traces to a dataframe.DataFrameBuilder, demonstrating a separation of concerns.
    • UpdateQueryValueWithDefaults is a helper to automatically add default parameter selections to queries if configured, simplifying common query patterns.
  • cachedpsrefresh.go:

    • Implements CachedParamSetRefresher, which wraps a defaultParamSetRefresher and adds a caching layer.
    • Why: To improve performance for common queries by avoiding repeated database lookups or expensive filtering operations. For frequently accessed subsets of data (e.g., specific benchmarks or configurations), retrieving pre-computed parameter sets from a cache is much faster.
    • How: It takes a cache.Cache instance (which could be local, Redis, etc.) and a defaultParamSetRefresher.
    • PopulateCache: This is a crucial method that proactively fills the cache. It uses the QueryCacheConfig (part of config.QueryConfig) to determine which levels of parameter sets to cache.
      • It starts by getting the full parameter set from the underlying psRefresher.
      • It then iterates through configured “Level 1” parameter keys and their specified values. For each combination, it performs a PreflightQuery (via the dfBuilder) to get the filtered parameter set and the count of matching traces.
      • Both the filtered parameter set (as a string) and the count are stored in the cache using distinct keys.
      • If “Level 2” keys and values are configured, it recursively calls populateChildLevel to cache parameter sets for combinations of Level 1 and Level 2 parameters.
      • The cache keys are generated by paramSetKey and countKey, ensuring a consistent naming scheme.
    • GetParamSetForQuery: When a query is made, getParamSetForQueryInternal first tries to retrieve the result from the cache.
      • It determines the appropriate cache key based on the query parameters and the configured cache levels (getParamSetKey). It only attempts to serve from the cache if the query matches the configured cache levels (1 or 2 parameters, potentially adjusted for default parameters).
      • If a cache hit occurs, it reconstructs the paramtools.ParamSet from the cached string and retrieves the count.
      • If there's a cache miss or an error, it falls back to the underlying psRefresher.GetParamSetForQuery.
    • StartRefreshRoutine: This method starts a goroutine that periodically calls PopulateCache to keep the cached data fresh.

Key Workflows:

  1. Initialization and Periodic Refresh (Default Refresher):

    NewDefaultParamSetRefresher(opsProvider, ...) -> pf
    pf.Start(refreshPeriod)
      -> pf.oneStep() // Initial fetch
           -> opsProvider.GetLatestTile() -> latestTile
           -> LOOP (numParamSets times):
                -> opsProvider.GetParamSet(tile) -> individualPS
                -> mergedPS.AddParamSet(individualPS)
                -> tile = tile.Prev()
           -> mergedPS.Normalize()
           -> pf.ps = mergedPS.Freeze()
      -> GO pf.refresh()
           -> LOOP (every refreshPeriod):
                -> pf.oneStep() // Subsequent fetches
    
  2. Cache Population (Cached Refresher):

    NewCachedParamSetRefresher(defaultRefresher, cacheImpl) -> cr
    cr.StartRefreshRoutine(cacheRefreshPeriod)
      -> cr.PopulateCache() // Initial population
           -> defaultRefresher.GetAll() -> fullPS
           -> // For each configured Level 1 key/value:
                -> qValues = {level1Key: [level1Value]}
                -> defaultRefresher.UpdateQueryValueWithDefaults(qValues) // If applicable
                -> query.New(qValues) -> lv1Query
                -> defaultRefresher.dfBuilder.PreflightQuery(ctx, lv1Query, fullPS) -> count, filteredPS
                -> psCacheKey = paramSetKey(qValues, [level1Key])
                -> cr.addToCache(ctx, psCacheKey, filteredPS.ToString(), count)
                -> // If Level 2 is configured:
                     -> cr.populateChildLevel(ctx, level1Key, level1Value, filteredPS, level2Key, level2Values)
                          -> // For each configured Level 2 value:
                               -> qValues = {level1Key: [level1Value], level2Key: [level2Value]}
                               -> ... (similar PreflightQuery and addToCache)
      -> GO LOOP (every cacheRefreshPeriod):
           -> cr.PopulateCache() // Subsequent cache refreshes
    
  3. Querying with Cache:

    cr.GetParamSetForQuery(ctx, queryObj, queryValues)
      -> cr.getParamSetForQueryInternal(ctx, queryObj, queryValues)
           -> cr.getParamSetKey(queryValues) -> cacheKey, err
           -> IF cacheKey is valid AND exists:
                -> cache.GetValue(ctx, cacheKey) -> cachedParamSetString
                -> cache.GetValue(ctx, countKey(cacheKey)) -> cachedCountString
                -> paramtools.FromString(cachedParamSetString) -> paramSet
                -> strconv.ParseInt(cachedCountString) -> count
                -> RETURN count, paramSet, nil
           -> ELSE (cache miss or invalid key for caching):
                -> defaultRefresher.GetParamSetForQuery(ctx, queryObj, queryValues) -> count, paramSet, err
                -> RETURN count, paramSet, err

The use of config.QueryConfig and config.Experiments allows for instance-specific tuning of caching behavior (which keys/values to pre-populate) and handling of default parameters. The separation between defaultParamSetRefresher and CachedParamSetRefresher promotes modularity, allowing the caching layer to be optional or replaced with different caching strategies if needed.
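
To make the merge loop concrete, here is a minimal, self-contained Go sketch of the oneStep flow described above. The ParamSet, TileNumber, and OPSProvider types are simplified stand-ins for the real paramtools and types packages, and fakeProvider exists only for illustration.

// A minimal sketch of the oneStep merge loop, assuming simplified stand-ins
// for paramtools.ParamSet, the tile type, and the OPSProvider interface.
package main

import (
	"fmt"
	"slices"
	"sort"
	"sync"
)

// ParamSet is a simplified stand-in mapping a key to its observed values.
type ParamSet map[string][]string

// AddParamSet appends other's values into p; Normalize dedupes later.
func (p ParamSet) AddParamSet(other ParamSet) {
	for key, values := range other {
		p[key] = append(p[key], values...)
	}
}

// Normalize sorts and deduplicates each key's values for stable output.
func (p ParamSet) Normalize() {
	for key, values := range p {
		sort.Strings(values)
		p[key] = slices.Compact(values)
	}
}

// TileNumber identifies a tile; Prev moves one tile back.
type TileNumber int

func (t TileNumber) Prev() TileNumber { return t - 1 }

// OPSProvider abstracts the source of per-tile parameter sets.
type OPSProvider interface {
	GetLatestTile() (TileNumber, error)
	GetParamSet(tile TileNumber) (ParamSet, error)
}

// refresher mirrors the shape of defaultParamSetRefresher.
type refresher struct {
	ops          OPSProvider
	numParamSets int

	mutex sync.Mutex // guards ps
	ps    ParamSet
}

// oneStep merges the ParamSets of the numParamSets most recent tiles.
func (r *refresher) oneStep() error {
	tile, err := r.ops.GetLatestTile()
	if err != nil {
		return err
	}
	merged := ParamSet{}
	for i := 0; i < r.numParamSets; i++ {
		ps, err := r.ops.GetParamSet(tile)
		if err != nil {
			return err
		}
		merged.AddParamSet(ps)
		tile = tile.Prev()
	}
	merged.Normalize()

	r.mutex.Lock()
	defer r.mutex.Unlock()
	r.ps = merged
	return nil
}

// fakeProvider is a hypothetical in-memory provider used only for this sketch.
type fakeProvider struct{}

func (fakeProvider) GetLatestTile() (TileNumber, error) { return 10, nil }
func (fakeProvider) GetParamSet(t TileNumber) (ParamSet, error) {
	return ParamSet{"arch": {"x86"}, "config": {fmt.Sprintf("gpu-%d", t)}}, nil
}

func main() {
	r := &refresher{ops: fakeProvider{}, numParamSets: 2}
	if err := r.oneStep(); err != nil {
		panic(err)
	}
	fmt.Println(r.ps) // map[arch:[x86] config:[gpu-10 gpu-9]]
}

The real implementation additionally freezes the merged set into a ReadOnlyParamSet and runs oneStep from the refresh goroutine on the configured period.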

Module: /go/redis

The redis module in Skia Perf is designed to manage interactions with Redis instances, primarily to support and optimize the query UI. It leverages Redis for caching frequently accessed data, thereby improving the responsiveness and performance of the Perf frontend.

The core idea is to periodically fetch information about available Redis instances within a Google Cloud Project and then interact with a specific, configured Redis instance to store or retrieve cached data. This cached data typically represents results of expensive computations or frequently requested data points, like recent trace data for specific queries.

Key Responsibilities and Components:

  • redis.go: This is the central file of the module.

    • RedisWrapper interface: Defines the contract for Redis-related operations. This abstraction allows for easier testing and potential future replacements of the underlying Redis client implementation. The key methods are:
    • StartRefreshRoutine: Initiates a background process (goroutine) that periodically discovers and interacts with the configured Redis instance.
    • ListRedisInstances: Retrieves a list of all Redis instances available within a specified GCP project and location.
    • RedisClient struct: This is the concrete implementation of the RedisWrapper interface.
    • It holds a gcp_redis.CloudRedisClient for interacting with the Google Cloud Redis API (e.g., listing instances).
    • It also has a reference to tracestore.TraceStore, which is likely used to fetch the data that needs to be cached in Redis.
    • The tilesToCache field suggests that the caching strategy might involve pre-calculating and storing “tiles” of data, which is a common pattern in Perf systems for displaying graphs over time.
    • NewRedisClient: The constructor for RedisClient.
    • StartRefreshRoutine:
      • Why: To ensure that Perf is always aware of the correct Redis instance to use and to periodically update the cache. Network configurations or instance details might change, and this routine helps adapt to such changes.
      • How: It takes a refreshPeriod and a config.InstanceConfig (which is actually redis_client.RedisConfig in the current implementation, indicating the target project, zone, and instance name). It then starts a goroutine that, at regular intervals defined by refreshPeriod:
        • Calls ListRedisInstances to get all Redis instances in the configured project/zone.
        • Iterates through the instances to find the one matching the config.Instance name.
        • If the target instance is found, it calls RefreshCachedQueries.

      [StartRefreshRoutine]
              |
              V
      (Goroutine - Ticks every 'refreshPeriod')
              |
              V
      [ListRedisInstances] -> (GCP API Call) -> [List of Redis Instances]
              |
              V
      (Find Target Instance by Name)
              |
              V
      (If Target Found)
      [RefreshCachedQueries]
    • ListRedisInstances:
      • Why: To discover available Redis instances within the specified GCP project and location. This is the first step before Perf can connect to and use a specific Redis instance.
      • How: It uses the gcpClient (an instance of cloud.google.com/go/redis/apiv1.CloudRedisClient) to make an API call to GCP to list instances under the given parent (e.g., “projects/my-project/locations/us-central1”). It iterates through the results and returns a slice of redispb.Instance objects.
    • RefreshCachedQueries:
      • Why: This is the heart of the caching mechanism. Its purpose is to update the data stored in the target Redis instance. The specific data to be cached would depend on the needs of the Perf query UI.
      • How:
      • It establishes a connection to the specified Redis instance (instance.Host and instance.Port) using github.com/redis/go-redis/v9.
      • It acquires a mutex (r.mutex.Lock()) to prevent concurrent modifications to the cache or shared resources, though the current implementation only has placeholder logic.
      • The current implementation contains placeholder logic:
        • It attempts to GET a key named “FullPS”.
        • It then SETs the key “FullPS” to the current time, with an expiration of 30 seconds.
      • Future Work (as hinted by TODO(wenbinzhang) and tilesToCache): This method is expected to be expanded to:
        • Identify which queries or data segments are good candidates for caching.
        • Fetch the necessary data, potentially using the traceStore.
        • Store this data in Redis, likely with appropriate keys and expiration times. The tilesToCache parameter suggests it might pre-cache a certain number of recent “tiles” of trace data.
  • mocks/RedisWrapper.go: This file contains a mock implementation of the RedisWrapper interface, generated by the mockery tool.

    • Why: To facilitate unit testing of components that depend on RedisWrapper. By using a mock, tests can simulate various Redis behaviors (e.g., successful connection, instance not found, errors) without needing an actual Redis instance or GCP connectivity.
    • How: It provides a RedisWrapper struct that embeds mock.Mock from the testify library. For each method in the RedisWrapper interface, there's a corresponding method in the mock that records calls and can be configured to return specific values or errors, allowing test authors to define expected interactions.

Design Decisions and Rationale:

  • Interface-based Design (RedisWrapper): Using an interface decouples the rest of the Perf system from the concrete Redis client implementation. This is good for:
    • Testability: As seen with the mocks package.
    • Flexibility: If Skia decides to switch to a different Redis client library or even a different caching technology in the future, the changes would be localized to the implementation of RedisWrapper without affecting its consumers.
  • Periodic Refresh Routine: Instead of connecting to Redis on-demand for every operation or assuming a static configuration, the StartRefreshRoutine provides a more robust approach.
    • It handles potential changes in the Redis instance's availability or address.
    • It centralizes the logic for keeping the cache up-to-date.
  • Separation of Concerns:
    • The module clearly separates GCP Redis instance management (listing instances via GCP API) from data interaction with a specific Redis instance (using a Redis client library like go-redis).
  • Use of Standard Libraries:
    • cloud.google.com/go/redis/apiv1 for GCP infrastructure management.
    • github.com/redis/go-redis/v9 for standard Redis data operations. This ensures reliance on well-maintained and feature-rich libraries.

Workflow: Cache Refresh Process

The primary workflow driven by this module is the periodic refresh of cached data:

System Starts
    |
    V
Initialize RedisClient (NewRedisClient)
    |
    V
Call StartRefreshRoutine
    |
    V
[Background Goroutine - Loop every 'refreshPeriod']
    |
    |--> 1. List GCP Redis Instances (ListRedisInstances)
    |       - Input: GCP project, location
    |       - Output: List of *redispb.Instance
    |
    |--> 2. Identify Target Redis Instance
    |       - Based on configuration (e.g., instance name)
    |
    |--> 3. If Target Instance Found: Refresh Cache (RefreshCachedQueries)
            |
            |--> a. Connect to Target Redis (using go-redis)
            |       - Host, Port from *redispb.Instance
            |
            |--> b. Determine data to cache (e.g., recent trace data for popular queries)
            |       - Likely involves `traceStore`
            |
            |--> c. Write data to Redis (SET commands)
            |       - Use appropriate keys and expiration times
            |
            |--> (Current placeholder: SET "FullPS" = current_time with 30s TTL)

This module provides the foundational components for integrating Redis as a caching layer in Skia Perf, aiming to improve UI performance by serving frequently requested data quickly from an in-memory store. The current implementation focuses on instance discovery and has placeholder logic for the actual caching, which is expected to be expanded based on Perf's specific caching needs.
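
To make the placeholder concrete, the following sketch shows roughly what the current GET/SET step looks like using github.com/redis/go-redis/v9. The address and the refreshCache helper are illustrative; the real RefreshCachedQueries derives the address from the discovered *redispb.Instance and holds a mutex around the update.

// A sketch of the placeholder cache-refresh step, assuming a reachable Redis
// instance; the address below is illustrative.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// refreshCache mirrors the current placeholder logic: read the "FullPS" key,
// then overwrite it with the current time and a 30 second TTL.
func refreshCache(ctx context.Context, addr string) error {
	client := redis.NewClient(&redis.Options{Addr: addr})
	defer client.Close()

	val, err := client.Get(ctx, "FullPS").Result()
	switch {
	case errors.Is(err, redis.Nil):
		fmt.Println("FullPS not cached yet")
	case err != nil:
		return err
	default:
		fmt.Println("cached FullPS:", val)
	}

	// Store the current time with a 30s expiration, as the placeholder does.
	return client.Set(ctx, "FullPS", time.Now().String(), 30*time.Second).Err()
}

func main() {
	// Assumes a Redis instance is reachable at localhost:6379.
	if err := refreshCache(context.Background(), "localhost:6379"); err != nil {
		fmt.Println("refresh failed:", err)
	}
}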

Module: /go/regression

The /go/regression module is responsible for detecting, storing, and managing performance regressions in Skia. It analyzes performance data over time, identifies significant changes (regressions or improvements), and provides mechanisms for triaging and tracking these changes.

Core Functionality & Design:

The primary goal is to automatically flag performance changes that might indicate a problem or an unexpected improvement. This involves:

  1. Data Analysis: Analyzing time-series performance data (traces) across different commits.
  2. Clustering: Grouping similar traces together to identify patterns and changes affecting multiple tests or configurations.
  3. Step Detection: Identifying abrupt changes (steps) in performance metrics that signify a potential regression or improvement.
  4. Alerting & Notification: Informing relevant parties when a potential regression is detected.
  5. Persistence: Storing detected regressions and their triage status.

Key Components & Files:

  • detector.go: This file contains the core logic for processing regression detection requests.

    • Why: It orchestrates the process of fetching data, applying clustering algorithms, and identifying regressions. It's designed to handle potentially large datasets and long-running analyses.
    • How: ProcessRegressions is the main entry point. It takes a RegressionDetectionRequest (which specifies the alert configuration and the time domain to analyze) and a DetectorResponseProcessor callback.
    • It can expand a single alert configuration with GroupBy parameters into multiple, more specific requests using allRequestsFromBaseRequest. This allows for targeted analysis of specific trace groups.
    • It iterates over data using dfiter.DataFrameIterator, which provides dataframes for analysis.
    • For each dataframe, it filters out traces with too much missing data (tooMuchMissingData) to ensure the reliability of the detection.
    • It applies a clustering algorithm (either K-Means via clustering2.CalculateClusterSummaries or individual StepFit via StepFit) to identify clusters of traces exhibiting similar behavior. The choice of K (number of clusters for K-Means) can be automatic or user-specified.
    • It generates RegressionDetectionResponse objects containing the cluster summaries and the relevant data frame. These responses are passed to the DetectorResponseProcessor.
    • Shortcuts for identified clusters are created using shortcutFromKeys for easier referencing.
    • Workflow:

      RegressionDetectionRequest -> Expand (if GroupBy) -> Multiple Requests
            |
            V
      For each Request:
        DataFrameIterator -> DataFrame -> Filter Traces -> Apply Clustering (KMeans or StepFit)
                                                                      |
                                                                      V
                               Shortcut Creation <- ClusterSummaries -> DetectorResponseProcessor
  • regression.go: Defines the primary data structures for representing regressions and their triage status.

    • Why: Provides a standardized way to model regressions, including details about the low (performance degradation) and high (performance improvement) steps, the associated data frame, and triage information.
    • How:
    • Regression: The central struct holding Low and High ClusterSummary objects (from clustering2), the FrameResponse (data context), and TriageStatus for both low and high. It also includes fields for the newer regression2 schema (like Id, CommitNumber, AlertId, MedianBefore, MedianAfter).
    • TriageStatus: Represents whether a regression is Untriaged, Positive (expected/acceptable), or Negative (a bug).
    • AllRegressionsForCommit: A container for all regressions found for a specific commit, keyed by the alert ID.
    • Merge: A method to combine information from two Regression objects, typically used when new data provides a more significant regression for an existing alert.
  • types.go: Defines the Store interface, which abstracts the persistence layer for regressions.

    • Why: Decouples the regression detection logic from the specific database implementation, allowing for different storage backends.
    • How: The Store interface specifies methods for:
    • Range: Retrieving regressions within a commit range.
    • SetHigh/SetLow: Storing newly detected high/low regressions.
    • TriageHigh/TriageLow: Updating the triage status of regressions.
    • Write: Bulk writing of regressions.
    • GetRegressionsBySubName, GetByIDs: Retrieving regressions based on subscription names or specific IDs (primarily for the regression2 schema).
    • GetOldestCommit, GetRegression: Utility methods for fetching specific data.
    • DeleteByCommit: Removing regressions associated with a commit.
  • stepfit.go: Implements an alternative regression detection strategy that analyzes each trace individually using step fitting.

    • Why: Useful when GroupBy is used in an alert, or when K-Means clustering is not the desired approach. It focuses on finding significant steps in individual time series.
    • How: The StepFit function iterates through each trace in the input DataFrame.
    • For each trace, it calls stepfit.GetStepFitAtMid to determine if there's a significant step (low or high) around the midpoint of the trace.
    • If an “interesting” step is found (based on stddevThreshold and interesting parameters), the trace is added to either the low or high ClusterSummary.
    • The low and high summaries collect all traces that show a downward or upward step, respectively.
    • Parametric summaries (ParamSummaries) are generated for the keys within these clusters.
  • fromsummary.go: Provides a utility function to convert a RegressionDetectionResponse into a Regression object.

    • Why: Bridges the output of the detection process with the structured Regression type used for storage and display.
    • How: RegressionFromClusterResponse takes a RegressionDetectionResponse, an alerts.Alert configuration, and a perfgit.Git instance.
    • It identifies the commit at the midpoint of the response's data frame.
    • It iterates through the ClusterSummary objects in the response.
    • If a cluster's step point matches the midpoint commit and meets the alert's criteria (minimum number of traces, direction), it populates the Low or High fields of the Regression object. It prioritizes the regression with the largest absolute magnitude if multiple are found.

Submodules:

  • continuous/ (continuous.go): Manages the continuous, background detection of regressions.

    • Why: Ensures that new performance data is promptly analyzed for regressions as it arrives, without requiring manual intervention.
    • How:
    • Continuous struct: Holds dependencies like perfgit.Git, regression.Store, alerts.ConfigProvider, notify.Notifier, etc.
    • Run(): The main entry point, which starts either event-driven or polling-based regression detection.
    • Event-Driven (RunEventDrivenClustering):
      • Listens to Pub/Sub messages from FileIngestionTopicName indicating new data ingestion (ingestevents.IngestEvent).
      • For each event, it identifies matching alert configurations using getTraceIdConfigsForIngestEvent (which calls matchingConfigsFromTraceIDs).
      • matchingConfigsFromTraceIDs refines alert queries if GroupBy is present to be more specific to the incoming trace.
      • It then calls ProcessAlertConfig (or ProcessAlertConfigForTraces if StepFitGrouping is used) for each matching config and the specific traces.
    • Polling (RunContinuousClustering):
      • Periodically (defined by pollingDelay), fetches all alert configurations using buildConfigAndParamsetChannel.
      • Shuffles the configs to distribute the load if multiple detectors are running.
      • Calls ProcessAlertConfig for each configuration.
    • ProcessAlertConfig():
      • Sets the current config being processed.
      • Optionally performs a “smoketest” for GroupBy alerts to ensure the query is valid and returns data.
      • Calls regression.ProcessRegressions to perform the actual detection.
      • The clusterResponseProcessor (which is reportRegressions) is called with the detection results.
    • reportRegressions():
      • For each detected regression (RegressionDetectionResponse), it determines the commit and previous commit details.
      • It checks if the regression meets the alert criteria (direction, minimum number).
      • It calls updateStoreAndNotification to persist the regression and send notifications.
    • updateStoreAndNotification():
      • Checks if the regression already exists in the regression.Store.
      • If new, it stores the regression (using store.SetLow or store.SetHigh) and sends a notification via notifier.RegressionFound. The notification ID is stored with the regression.
      • If existing, but the direction of the regression has changed (e.g., was low, now also high), it updates the store and the notification using notifier.UpdateNotification.
      • If existing and the direction is the same, it only updates the store.
    • Key Decision: The system supports both event-driven (preferred for responsiveness) and polling-based detection (as a fallback or for periodic full checks). The choice is controlled by the EventDrivenRegressionDetection flag.
    • Workflow (Event-Driven):

      Pub/Sub Message (New Data) -> Decode IngestEvent -> Get Matching Alert Configs
            |
            V
      For each (Config, Matched Traces):
        ProcessAlertConfig -> regression.ProcessRegressions
              |
              V
        reportRegressions -> updateStoreAndNotification
                                   |          |
                                   V          V
                                 Store     Notifier
  • migration/ (migrator.go): Handles the data migration from an older regressions table schema to the newer regressions2 schema.

    • Why: Facilitates schema evolution without data loss. The newer schema (Regression2Schema) aims to store regression data more granularly, typically one row per detected step (high or low), rather than combining high and low for the same commit/alert into a single JSON blob.
    • How:
    • RegressionMigrator: Contains instances of the legacy sqlregressionstore.SQLRegressionStore and the new sqlregression2store.SQLRegression2Store.
    • RunPeriodicMigration: Sets up a ticker to periodically run RunOneMigration.
    • RunOneMigration / migrateRegressions:
      • Fetches a batch of unmigrated regressions from the legacy store (legacyStore.GetRegressionsToMigrate).
      • For each legacy Regression object:
      • It begins a database transaction.
      • It populates fields specific to the regression2 schema (e.g., Id, PrevCommitNumber, MedianBefore, MedianAfter, IsImprovement, ClusterType) if they are not already present from the legacy data. This is crucial as the sqlregression2store.WriteRegression expects these.
      • The sqlregression2store.WriteRegression function might split a single legacy Regression object (if it has both High and Low components) into two separate entries in the Regressions2 table, one for HighClusterType and one for LowClusterType.
      • It then marks the corresponding row in the legacy Regressions table as migrated using legacyStore.MarkMigrated, storing the new regression ID.
      • Commits the transaction. If any step fails, the transaction is rolled back.
    • Key Decision: Migration is performed in batches and within transactions to ensure atomicity and prevent data duplication or loss during the migration process.
  • sqlregressionstore/: Implements the regression.Store interface using a generic SQL database. This is the older SQL storage mechanism.

    • Why: Provides a persistent storage solution for regressions identified by the detection system.
    • How:
    • SQLRegressionStore: The main struct, holding a database connection pool (pool.Pool) and prepared SQL statements. It supports different SQL dialects (e.g., CockroachDB via statements, Spanner via spannerStatements).
    • The schema (sqlregressionstore/schema/RegressionSchema.go) typically stores one row per (commit_number, alert_id) pair. The actual regression.Regression object (which might contain both high and low details, along with the frame) is serialized into a JSON string and stored in a regression TEXT column.
    • readModifyWrite: A core helper function that encapsulates the common pattern of reading a Regression from the DB, allowing a callback to modify it, and then writing it back. This is done within a transaction to prevent lost updates. If mustExist is true, it errors if the regression isn't found; otherwise, it creates a new one.
    • SetHigh/SetLow: Use readModifyWrite to update the High or Low part of the JSON-serialized Regression object. They also update the triage status to Untriaged if it was previously None.
    • TriageHigh/TriageLow: Use readModifyWrite to update the HighStatus or LowStatus within the JSON-serialized Regression.
    • GetRegressionsToMigrate: Fetches regressions that haven't been migrated to the regression2 schema.
    • MarkMigrated: Updates a row to indicate it has been migrated, storing the new regression_id from the regression2 table.
    • Limitation: Storing the entire Regression object as JSON can make querying for specific aspects of the regression (e.g., only high regressions, or regressions with a specific triage status) less efficient and more complex. This is one of the motivations for the sqlregression2store.
  • sqlregression2store/: Implements the regression.Store interface using a newer SQL schema (Regressions2).

    • Why: Addresses limitations of the older sqlregressionstore by storing regression data in a more normalized and queryable way.
    • How:
    • SQLRegression2Store: The main struct.
    • Schema (sqlregression2store/schema/Regression2Schema.go): Designed to store each regression step (high or low) as a separate row. Key columns include id (UUID, primary key), commit_number, prev_commit_number, alert_id, creation_time, median_before, median_after, is_improvement, cluster_type (e.g., “high”, “low”), cluster_summary (JSONB), frame (JSONB), triage_status, and triage_message.
    • writeSingleRegression: The core writing function. It takes a regression.Regression object and writes its relevant parts (either high or low, but not both in the same DB row) to the Regressions2 table.
    • convertRowToRegression: Converts a database row from Regressions2 back into a regression.Regression object. Depending on the cluster_type in the row, it populates either the High or Low part of the Regression object.
    • SetHigh/SetLow:
      • These methods now interact with updateBasedOnAlertAlgo.
      • updateBasedOnAlertAlgo: This function is crucial. It considers the Algo type of the alert (KMeansGrouping vs. StepFitGrouping).
      • For KMeansGrouping, it expects to potentially update an existing regression for the same (commit_number, alert_id) as new data might refine the cluster. It uses readModifyWriteCompat to achieve this.
      • For StepFitGrouping (individual trace analysis), it generally expects to create a new regression entry if one doesn't exist for the exact frame, avoiding updates to pre-existing ones unless it's truly a new detection.
      • The updateFunc passed to updateBasedOnAlertAlgo populates the necessary fields in the regression.Regression object (e.g., setting r.High or r.Low, and calling populateRegression2Fields).
    • populateRegression2Fields: This helper populates the fields specific to the Regressions2 schema (like PrevCommitNumber, MedianBefore, MedianAfter, IsImprovement) from the ClusterSummary and FrameResponse within the Regression object.
    • WriteRegression (used by migrator): If a legacy Regression object has both High and Low components, this function splits it and calls writeSingleRegression twice, creating two rows in Regressions2.
    • Range: When retrieving regressions, if multiple rows from Regressions2 correspond to the same (commit_number, alert_id) (e.g., one for high, one for low), it merges them back into a single regression.Regression object for compatibility with how the rest of the system might expect the data.
    • Key Improvement: Storing regression components (high/low) as separate rows with dedicated columns for medians, triage status, etc., allows for much more efficient and direct SQL querying compared to parsing JSON in the older store. A rough sketch of such a row appears below.
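
For orientation, a Regressions2 row might be modeled by a Go schema struct along these lines. The column names follow the list above, while the Go field names, types, and sql struct tags are illustrative assumptions; the authoritative definition lives in sqlregression2store/schema/Regression2Schema.go.

// A rough, assumed sketch of one Regressions2 row: one detected step (high or
// low) per row, with dedicated columns instead of a single JSON blob.
package schema

import "time"

type Regression2RowSketch struct {
	ID               string    `sql:"id UUID PRIMARY KEY"`          // assumed tag format
	CommitNumber     int64     `sql:"commit_number INT"`
	PrevCommitNumber int64     `sql:"prev_commit_number INT"`
	AlertID          int64     `sql:"alert_id INT"`
	CreationTime     time.Time `sql:"creation_time TIMESTAMPTZ"`
	MedianBefore     float64   `sql:"median_before REAL"`
	MedianAfter      float64   `sql:"median_after REAL"`
	IsImprovement    bool      `sql:"is_improvement BOOL"`
	ClusterType      string    `sql:"cluster_type STRING"` // "high" or "low"
	ClusterSummary   []byte    `sql:"cluster_summary JSONB"`
	Frame            []byte    `sql:"frame JSONB"`
	TriageStatus     string    `sql:"triage_status STRING"`
	TriageMessage    string    `sql:"triage_message STRING"`
}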

Overall Workflow Example (Simplified):

  1. Continuous Detection (continuous.go):
    • New data arrives (e.g., via Pub/Sub).
    • Continuous identifies relevant alerts.Alert configurations.
    • ProcessAlertConfig is called.
  2. Regression Processing (detector.go):
    • ProcessRegressions fetches data, builds DataFrames.
    • Clustering (KMeans or stepfit.go) is applied.
    • RegressionDetectionResponses are generated.
  3. Reporting & Storing (continuous.go calls back into regression store):
    • reportRegressions processes these responses.
    • updateStoreAndNotification interacts with a regression.Store implementation (e.g., sqlregression2store.go):
      • Checks if the regression is new or an update.
      • Calls SetLow or SetHigh on the store.
      • The store (sqlregression2store) writes the data to the Regressions2 table, potentially creating a new row or updating an existing one based on the alert's algorithm type.
      • A notification might be sent.

The system is designed to be modular, with interfaces like regression.Store and alerts.ConfigProvider allowing for flexibility in implementation details. The migration path from sqlregressionstore to sqlregression2store highlights the evolution towards a more structured and queryable data model for regressions.
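
Since several components program against regression.Store rather than a concrete database, it helps to see the rough shape of that interface. The following is a simplified sketch inferred from the method list in types.go above; the real signatures take richer types (frames, cluster summaries, callbacks) and include additional methods such as GetRegressionsBySubName, GetByIDs, GetOldestCommit, and GetRegression.

// A trimmed-down sketch of the regression.Store interface; parameter and
// return types here are simplified stand-ins.
package regressionsketch

import "context"

// CommitNumber and TriageStatus stand in for the real Perf types.
type CommitNumber int64
type TriageStatus string

// Regression stands in for regression.Regression (low/high cluster
// summaries, frame, triage state, ...).
type Regression struct{}

// Store abstracts persistence for detected regressions.
type Store interface {
	// Range returns regressions for commits in [begin, end], keyed by commit
	// and then by alert ID.
	Range(ctx context.Context, begin, end CommitNumber) (map[CommitNumber]map[string]*Regression, error)

	// SetHigh and SetLow record newly detected steps for a (commit, alert) pair.
	SetHigh(ctx context.Context, commit CommitNumber, alertID string, r *Regression) (isNew bool, err error)
	SetLow(ctx context.Context, commit CommitNumber, alertID string, r *Regression) (isNew bool, err error)

	// TriageHigh and TriageLow update the triage state of a stored regression.
	TriageHigh(ctx context.Context, commit CommitNumber, alertID string, status TriageStatus) error
	TriageLow(ctx context.Context, commit CommitNumber, alertID string, status TriageStatus) error

	// Write bulk-writes regressions keyed by commit.
	Write(ctx context.Context, regressions map[CommitNumber]map[string]*Regression) error

	// DeleteByCommit removes all regressions recorded against a commit.
	DeleteByCommit(ctx context.Context, commit CommitNumber) error
}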

Module: /go/samplestats

The samplestats module is designed to perform statistical analysis on sets of performance data, specifically to identify significant changes between two sample sets, often referred to as “before” and “after” states. This is crucial for detecting regressions or improvements in performance metrics over time or across different code versions.

The core functionality revolves around comparing these two sets of samples for each trace (a unique combination of parameters identifying a specific test or metric). It calculates various statistical metrics for each set and then employs statistical tests to determine if the observed differences are statistically significant.

Key Design Choices and Implementation Details:

  • Statistical Significance: The module uses p-values to determine significance. A user-configurable alpha level (defaulting to 0.05) acts as the threshold. If the calculated p-value for a trace is below this alpha, the change is considered significant.
  • Choice of Statistical Tests: The module offers two common statistical tests:
    • Mann-Whitney U Test (default): This is a non-parametric test used to compare two independent samples. It's often preferred when the data doesn't necessarily follow a normal distribution.
    • Two Sample Welch's t-test: This parametric test is used to compare the means of two independent samples, particularly when their variances might be unequal. The choice allows users to select the most appropriate test based on the characteristics of their data.
  • Outlier Removal: An optional Interquartile Range Rule (IQRR) can be applied to remove outliers from the sample data before calculating statistics. This helps in reducing the influence of extreme values that might skew the results. The decision to make this optional acknowledges that outlier removal isn't always desired or appropriate.
  • Delta Calculation: For changes deemed significant, the module calculates the percentage change in the mean between the “before” and “after” samples. If a change isn't significant, the delta is reported as NaN (Not a Number), clearly distinguishing it from actual zero-percentage changes.
  • Configurability: The Config struct provides a centralized way to control the analysis process. This includes setting the alpha level, choosing the statistical test, enabling outlier removal, and deciding whether to include all traces in the output or only those with significant changes. This configurability makes the module adaptable to various analysis needs.
  • Result Structure: The Result struct encapsulates the outcome of the analysis, including a list of Row structs (one per trace) and a count of skipped traces. Each Row contains the trace identifier, its parameters, the calculated metrics for both “before” and “after” samples, the percentage delta, the p-value, and any informational notes (e.g., errors during statistical test calculation). This structured output facilitates further processing or display of the results.
  • Sorting: The results can be sorted based on different criteria, with the default being by the calculated Delta. This allows users to quickly identify the most impactful changes. The Order type and functions like ByName, ByDelta, and Reverse provide a flexible sorting mechanism.

Responsibilities and Key Components:

  • analyze.go: This is the heart of the module.

    • Analyze function: This is the primary entry point. It takes the Config and two maps of samples (before and after, where keys are trace IDs and values are parser.Samples).
    • It iterates through all unique trace IDs present in either the “before” or “after” sets.
    • For each trace, it retrieves the corresponding samples, skipping the trace if data isn't present in both sets.
    • It calls calculateMetrics (from metrics.go) for both “before” and “after” samples.
    • Based on the Config.Test setting, it performs either the Mann-Whitney U test or the Two Sample Welch's t-test using functions from the github.com/aclements/go-moremath/stats library.
    • It compares the resulting p-value with the configured alpha level. If p < alpha, it calculates the percentage Delta between the means. Otherwise, Delta is NaN.
    • It constructs a Row struct with all the calculated information.
    • It optionally filters out rows where no significant change was detected if Config.All is false.
    • Finally, it sorts the resulting Rows based on Config.Order (or by Delta if no order is specified) using the Sort function from sort.go.
    • It returns a Result struct containing the list of Rows and the count of skipped traces.
    • Config struct: Defines the parameters that control the analysis, such as Alpha for p-value cutoff, Order for sorting, IQRR for outlier removal, All for including all results, and Test for selecting the statistical test.
    • Result struct: Encapsulates the output of the Analyze function, holding the Rows of analysis data and the Skipped count.
    • Row struct: Represents the analysis results for a single trace, including its name, parameters, “before” and “after” Metrics, the percentage Delta, the P value, and any Note.
  • metrics.go: This file is responsible for calculating basic statistical metrics from a given set of sample values.

    • calculateMetrics function: Takes a Config (primarily to check IQRR) and parser.Samples.
    • If Config.IQRR is true, it applies the Interquartile Range Rule to filter out outliers from samples.Values: values more than 1.5 * IQR below the first quartile or above the third quartile are discarded, and the rest are retained.
    • It then calculates the Mean, StdDev (standard deviation), and Percent (coefficient of variation: StdDev / Mean * 100) of the (potentially filtered) values.
    • It returns these calculated statistics in a Metrics struct, along with the (potentially filtered) Values.
    • Metrics struct: Holds the calculated Mean, StdDev, raw Values (after potential outlier removal), and Percent (coefficient of variation).
  • sort.go: This file provides utilities for sorting the results (Row slices).

    • Order type: A function type func(rows []Row, i, j int) bool defining a less-than comparison for sorting Rows.
    • ByName function: An Order implementation that sorts rows alphabetically by Row.Name.
    • ByDelta function: An Order implementation that sorts rows by Row.Delta. It specifically places NaN delta values (insignificant changes) at the beginning.
    • Reverse function: A higher-order function that takes an Order and returns a new Order that represents the reverse of the input order.
    • Sort function: A convenience function that sorts a slice of Rows in place using sort.SliceStable and a given Order.
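
The ordering machinery in sort.go is small enough to sketch directly. The snippet below uses a trimmed-down Row carrying only Name and Delta; the real Row also holds parameters, metrics, the p-value, and notes.

// A minimal sketch of the Order/Reverse/Sort mechanism, assuming a reduced
// Row type.
package samplestatssketch

import (
	"math"
	"sort"
)

type Row struct {
	Name  string
	Delta float64 // NaN when the change is not significant.
}

// Order is a less-than comparison over rows, matching the documented signature.
type Order func(rows []Row, i, j int) bool

// ByName sorts rows alphabetically by name.
func ByName(rows []Row, i, j int) bool { return rows[i].Name < rows[j].Name }

// ByDelta sorts by delta, placing NaN (insignificant) rows first.
func ByDelta(rows []Row, i, j int) bool {
	di, dj := rows[i].Delta, rows[j].Delta
	if math.IsNaN(di) && !math.IsNaN(dj) {
		return true
	}
	if !math.IsNaN(di) && math.IsNaN(dj) {
		return false
	}
	return di < dj
}

// Reverse returns an Order that inverts the given Order.
func Reverse(o Order) Order {
	return func(rows []Row, i, j int) bool { return o(rows, j, i) }
}

// Sort sorts rows in place, stably, using the given Order, e.g.
// Sort(rows, Reverse(ByDelta)).
func Sort(rows []Row, o Order) {
	sort.SliceStable(rows, func(i, j int) bool { return o(rows, i, j) })
}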

Illustrative Workflow (Simplified Analyze Process):

Input: before_samples, after_samples, config

For each trace_id in (before_samples keys + after_samples keys):
  If trace_id not in before_samples OR trace_id not in after_samples:
    Increment skipped_count
    Continue

  before_metrics = calculateMetrics(config, before_samples[trace_id])
  after_metrics  = calculateMetrics(config, after_samples[trace_id])

  If config.Test == UTest:
    p_value = MannWhitneyUTest(before_metrics.Values, after_metrics.Values)
  Else (config.Test == TTest):
    p_value = TwoSampleWelchTTest(before_metrics.Values, after_metrics.Values)

  alpha = config.Alpha (or defaultAlpha if config.Alpha is 0)

  If p_value < alpha:
    delta = ((after_metrics.Mean / before_metrics.Mean) - 1) * 100
  Else:
    delta = NaN
    If NOT config.All:
      Continue // Skip if not showing all results and change is not significant

  Add new Row{Name: trace_id, Delta: delta, P: p_value, ...} to results_list

Sort results_list using config.Order (or ByDelta by default)

Return Result{Rows: results_list, Skipped: skipped_count}
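
The two numeric steps above, outlier filtering and the significance-gated delta, can be sketched as follows. The quartile computation is deliberately naive, and pValue is a placeholder for the Mann-Whitney U or Welch t-test result that the real code obtains via github.com/aclements/go-moremath/stats.

// A sketch of the IQRR filter and the significance-gated percentage delta;
// pValue is a hypothetical stand-in for the statistical test.
package main

import (
	"fmt"
	"math"
	"sort"
)

// iqrrFilter keeps values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] using a naive
// quartile estimate.
func iqrrFilter(values []float64) []float64 {
	s := append([]float64(nil), values...)
	sort.Float64s(s)
	q1 := s[len(s)/4]
	q3 := s[(3*len(s))/4]
	iqr := q3 - q1
	lo, hi := q1-1.5*iqr, q3+1.5*iqr
	out := make([]float64, 0, len(values))
	for _, v := range values {
		if v >= lo && v <= hi {
			out = append(out, v)
		}
	}
	return out
}

func mean(values []float64) float64 {
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// delta returns the percentage change in the mean, or NaN when the change is
// not significant at the given alpha.
func delta(before, after []float64, pValue func(x, y []float64) float64, alpha float64) float64 {
	if pValue(before, after) >= alpha {
		return math.NaN()
	}
	return (mean(after)/mean(before) - 1) * 100
}

func main() {
	before := []float64{10, 10.2, 9.9, 10.1, 25} // 25 is an outlier.
	after := []float64{11, 11.1, 10.9, 11.2, 11}

	filtered := iqrrFilter(before)
	fmt.Println("filtered before:", filtered)

	// Pretend the statistical test reported p = 0.01.
	fake := func(x, y []float64) float64 { return 0.01 }
	fmt.Printf("delta: %.1f%%\n", delta(filtered, after, fake, 0.05))
}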

Module: /go/sheriffconfig

The sheriffconfig module is responsible for managing configurations for Skia Perf's anomaly detection and alerting system. These configurations, known as “Sheriff Configs,” are defined in Protocol Buffer format and are typically stored in LUCI Config. This module handles fetching these configurations, validating them, and transforming them into a format suitable for storage and use by other Perf components, specifically the alerts and subscription modules.

The core idea is to allow users to define rules for which performance metrics they care about and how anomalies in those metrics should be detected and handled. This provides a flexible and centralized way to manage alerting for a large number of performance tests.

Key Responsibilities and Components:

  • Protocol Buffer Definitions (/proto/v1):

    • This directory defines the structure of Sheriff Configurations using Protocol Buffers. This is the “source of truth” for what constitutes a valid configuration.
    • sheriff_config.proto: Defines the main messages like SheriffConfig, Subscription, AnomalyConfig, and Rules.
    • SheriffConfig: The top-level message, containing a list of Subscriptions. This represents the entire set of alerting configurations for a Perf instance.
    • Subscription: Represents a user's or team's interest in a specific set of metrics. It includes details for creating bug reports (e.g., contact email, bug component, labels, priority, severity) and a list of AnomalyConfigs that define how to detect anomalies for the metrics covered by this subscription.
    • AnomalyConfig: Specifies the parameters for anomaly detection for a particular subset of metrics. This includes:
      • Rules: Define which metrics this AnomalyConfig applies to, using match and exclude patterns. These patterns are query strings (e.g., “master=ChromiumPerf&benchmark=Speedometer2”).
      • Detection parameters: step (algorithm for step detection), radius (commits to consider), threshold (sensitivity), minimum_num (number of interesting traces to trigger an alert), sparse (handling of missing data), k (for K-Means clustering), group_by (for breaking down clustering), direction (up, down, or both), action (no action, triage, or bisect), and algo (clustering algorithm like StepFit or KMeans).
    • Rules: Contains lists of match and exclude strings. Match strings define positive criteria for selecting metrics, while exclude strings define negative criteria. The combination allows for precise targeting of metrics.
    • sheriff_config.pb.go: The Go code generated from sheriff_config.proto. This provides the Go structs and methods to work with these configurations programmatically.
    • generate.go: Contains go:generate directives used to regenerate sheriff_config.pb.go whenever sheriff_config.proto changes. This ensures the Go code stays in sync with the proto definition.
  • Validation (/validate):

    • validate.go: This is crucial for ensuring the integrity and correctness of Sheriff Configurations before they are processed or stored. It performs a series of checks:
    • Pattern Validation: Ensures that match and exclude strings in Rules are well-formed query strings. It checks for valid regex if a value starts with ~. It also enforces that exclude patterns only target a single key-value pair.
    • AnomalyConfig Validation: Ensures that each AnomalyConfig has at least one match pattern.
    • Subscription Validation: Verifies that essential fields like name, contact_email, bug_component, and instance are present. It also checks that each subscription has at least one AnomalyConfig.
    • SheriffConfig Validation: Ensures there's at least one Subscription and that all subscription names within a config are unique.
    • DeserializeProto: A helper function to convert a base64 encoded string (as typically retrieved from LUCI Config) into a SheriffConfig protobuf message.
  • Service (/service):

    • service.go: This component orchestrates the process of fetching Sheriff Configurations from LUCI Config, processing them, and storing them in the database.
    • New function: Initializes the sheriffconfigService, taking dependencies like a database connection pool (sql.Pool), subscription.Store, alerts.Store, and a luciconfig.ApiClient. If no luciconfig.ApiClient is provided, it creates one.
    • ImportSheriffConfig method: This is the main entry point for importing configurations.
    • It uses the luciconfig.ApiClient to fetch configurations from a specified LUCI Config path (e.g., “skia-sheriff-configs.cfg”).
    • For each fetched configuration file content:
      • It calls processConfig.
    • It then inserts all derived subscription_pb.Subscription objects into the subscriptionStore and all alerts.SaveRequest objects into the alertStore within a single database transaction. This ensures atomicity – either all changes are saved, or none are.
    • processConfig method:
    • Deserializes the raw configuration content (string) into a pb.SheriffConfig protobuf message using prototext.Unmarshal.
    • Validates the deserialized pb.SheriffConfig using validate.ValidateConfig.
    • Iterates through each pb.Subscription in the config:
      • It filters subscriptions based on the instance field, only processing those matching the service's configured instance (e.g., “chrome-internal”). This allows multiple Perf instances to share a config file but only import relevant subscriptions.
      • It calls makeSubscriptionEntity to convert the pb.Subscription into a subscription_pb.Subscription (the format used by the subscription module).
      • Revision Check: Crucially, it checks if a subscription with the same name and revision already exists in the subscriptionStore. If it does, it means this specific version of the subscription has already been imported, so it's skipped. This prevents redundant database writes and processing if the LUCI config file hasn't actually changed for that subscription.
      • If the subscription is new or has a new revision, it calls makeSaveRequests to generate alerts.SaveRequest objects for each alert defined within that subscription.
    • makeSubscriptionEntity function: Transforms a pb.Subscription (from Sheriff Config proto) into a subscription_pb.Subscription (for the subscription datastore), mapping fields and applying default priorities/severities if not specified.
    • makeSaveRequests function:
    • Iterates through each pb.AnomalyConfig within a pb.Subscription.
    • For each match rule within the pb.AnomalyConfig.Rules:
      • Calls buildQueryFromRules to construct the actual query string that will be used to select metrics for this alert.
      • Calls createAlert to create an alerts.Alert object, populating it with parameters from the pb.AnomalyConfig and the parent pb.Subscription.
      • Wraps the alerts.Alert in an alerts.SaveRequest along with the subscription name and revision.
    • createAlert function: Populates an alerts.Alert struct. This involves:
    • Mapping enum values from the Sheriff Config proto (e.g., AnomalyConfig_Step, AnomalyConfig_Direction, AnomalyConfig_Action, AnomalyConfig_Algo) to their corresponding internal types used by the alerts module (e.g., alerts.Direction, types.RegressionDetectionGrouping, types.StepDetection, types.AlertAction). This is done using maps like directionMap, clusterAlgoMap, etc.
    • Applying default values for parameters like radius, minimum_num, sparse, k, group_by if they are not explicitly set in the AnomalyConfig.
    • buildQueryFromRules function: Constructs a canonical query string from a match string and a list of exclude strings. It parses them as URL query parameters, combines them (with ! for excludes), sorts the parts alphabetically, and joins them with &. This ensures that equivalent rules always produce the same query string. (A sketch of this canonicalization appears after this list.)
    • getPriorityFromProto and getSeverityFromProto functions: Convert the enum values for priority and severity from the proto definition to the integer values expected by the subscription module, applying defaults if the proto value is “unspecified.”
    • StartImportRoutine and ImportSheriffConfigOnce: Provide functionality to periodically fetch and import configurations, making the system self-updating when LUCI configs change.
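
As referenced above, buildQueryFromRules canonicalizes one match rule plus its exclude rules into a single query string. A rough sketch of that canonicalization, with simplified escaping and error handling, might look like this:

// A sketch of the rule-to-query canonicalization: parse the match rule as URL
// query parameters, fold in excludes with a "!" prefix on the value, sort, and
// join with "&". Exact escaping in the real implementation may differ.
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

func buildQueryFromRules(match string, excludes []string) (string, error) {
	parts := []string{}

	matchValues, err := url.ParseQuery(match)
	if err != nil {
		return "", err
	}
	for key, values := range matchValues {
		for _, v := range values {
			parts = append(parts, fmt.Sprintf("%s=%s", key, v))
		}
	}

	for _, exclude := range excludes {
		excludeValues, err := url.ParseQuery(exclude)
		if err != nil {
			return "", err
		}
		// Excludes are validated elsewhere to contain a single key=value pair.
		for key, values := range excludeValues {
			for _, v := range values {
				parts = append(parts, fmt.Sprintf("%s=!%s", key, v))
			}
		}
	}

	// Sorting makes equivalent rules produce the same query string.
	sort.Strings(parts)
	return strings.Join(parts, "&"), nil
}

func main() {
	q, err := buildQueryFromRules(
		"master=ChromiumPerf&benchmark=Speedometer2",
		[]string{"bot=android-pixel2"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // benchmark=Speedometer2&bot=!android-pixel2&master=ChromiumPerf
}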

Workflow: Importing a Sheriff Configuration

LUCI Config Change (e.g., new revision of skia-sheriff-configs.cfg)
      |
      v
Sheriffconfig Service (triggered by timer or manual call)
      |
      |--- 1. luciconfigApiClient.GetProjectConfigs("skia-sheriff-configs.cfg") --> Fetches raw config content + revision
      |
      v
For each config file content:
      |
      |--- 2. processConfig(configContent, revision)
      |      |
      |      |--- 2a. prototext.Unmarshal(configContent) --> pb.SheriffConfig
      |      |
      |      |--- 2b. validate.ValidateConfig(pb.SheriffConfig) --> Error or OK
      |      |
      |      v
      |      For each pb.Subscription in pb.SheriffConfig:
      |            |
      |            |--- 2c. If subscription.Instance != service.Instance --> Skip
      |            |
      |            |--- 2d. subscriptionStore.GetSubscription(name, revision) --> ExistingSubscription?
      |            |
      |            |--- 2e. If ExistingSubscription == nil (new or updated):
      |            |      |
      |            |      |--- makeSubscriptionEntity(pb.Subscription, revision) --> subscription_pb.Subscription
      |            |      |
      |            |      |--- makeSaveRequests(pb.Subscription, revision)
      |            |      |     |
      |            |      |     v
      |            |      |     For each pb.AnomalyConfig in pb.Subscription:
      |            |      |           |
      |            |      |           v
      |            |      |           For each matchRule in pb.AnomalyConfig.Rules:
      |            |      |                 |
      |            |      |                 |--- buildQueryFromRules(matchRule, excludeRules) --> queryString
      |            |      |                 |
      |            |      |                 |--- createAlert(queryString, pb.AnomalyConfig, pb.Subscription, revision) --> alerts.Alert
      |            |      |                 |
      |            |      |                 ---> Collect alerts.SaveRequest
      |            |      |
      |            |      ---> Collect subscription_pb.Subscription
      |
      v
Database Transaction (BEGIN)
      |
      |--- 3. subscriptionStore.InsertSubscriptions(collected_subscriptions)
      |
      |--- 4. alertStore.ReplaceAll(collected_save_requests)
      |
Database Transaction (COMMIT or ROLLBACK)

This module acts as a critical bridge, translating human-readable (and machine-parsable via proto) alerting definitions into the concrete data structures used by Perf's backend alerting and subscription systems. The validation step is key to preventing malformed configurations from breaking the alerting pipeline. The revision checking mechanism ensures efficiency by only processing changes.

Module: /go/shortcut

The shortcut module provides functionality for creating, storing, and retrieving “shortcuts”. A shortcut is essentially a named list of trace keys. These trace keys typically represent specific performance metrics or configurations. The primary purpose of shortcuts is to provide a convenient way to refer to a collection of traces with a short, memorable identifier, rather than having to repeatedly specify the full list of keys. This is particularly useful for sharing links to specific views in the Perf UI or for programmatic access to predefined sets of performance data.

The core component is the Store interface, defined in shortcut.go. This interface abstracts the underlying storage mechanism, allowing different implementations to be used (e.g., in-memory for testing, SQL database for production). The key operations defined by the Store interface are:

  • Insert: Adds a new shortcut to the store. It takes an io.Reader containing the shortcut data (typically JSON) and returns a unique ID for the shortcut.
  • InsertShortcut: Similar to Insert, but takes a Shortcut struct directly.
  • Get: Retrieves a shortcut given its ID.
  • GetAll: Returns a channel that streams all stored shortcuts. This is useful for tasks like data migration.
  • DeleteShortcut: Removes a shortcut from the store.

A Shortcut itself is a simple struct containing a slice of strings, where each string is a trace key.

The generation of shortcut IDs is handled by the IDFromKeys function. This function takes a Shortcut struct, sorts its keys alphabetically (to ensure that the order of keys doesn't affect the ID), and then computes an MD5 hash of the concatenated keys. A prefix “X” is added to this hash for historical reasons, maintaining compatibility with older systems. This deterministic ID generation ensures that the same set of keys will always produce the same shortcut ID.
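
A minimal sketch of that ID scheme is shown below; whether the real IDFromKeys inserts a separator between keys before hashing is an implementation detail not reproduced here.

// A sketch of deterministic shortcut ID generation: sort the keys, MD5 the
// concatenation, and prefix the hex digest with "X".
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// idFromKeys returns a stable shortcut ID for a set of trace keys.
func idFromKeys(keys []string) string {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	sum := md5.Sum([]byte(strings.Join(sorted, "")))
	return "X" + fmt.Sprintf("%x", sum)
}

func main() {
	a := idFromKeys([]string{",config=8888,test=draw_a_circle,", ",config=565,test=draw_a_circle,"})
	b := idFromKeys([]string{",config=565,test=draw_a_circle,", ",config=8888,test=draw_a_circle,"})
	fmt.Println(a == b) // true: key order does not affect the ID.
}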

Workflow for creating and retrieving a shortcut:

  1. Creation:

     Client Code ---(JSON data or Shortcut struct)---> Store.Insert or Store.InsertShortcut
     Store ---(Generates ID using IDFromKeys, marshals to JSON if needed)---> Underlying Storage (e.g., SQL DB)
     Underlying Storage ---> Store ---(Returns Shortcut ID)---> Client Code

  2. Retrieval:

     Client Code ---(Shortcut ID)---> Store.Get
     Store ---(Queries by ID)---> Underlying Storage (e.g., SQL DB)
     Underlying Storage ---(Returns stored JSON or data)---> Store
     Store ---(Unmarshals to Shortcut struct, sorts keys)---> Client Code (receives Shortcut struct)

The sqlshortcutstore subdirectory provides a concrete implementation of the Store interface using an SQL database (specifically designed for CockroachDB, as indicated by test setup and migration references). The sqlshortcutstore.go file contains the logic for interacting with the database, including SQL statements for inserting, retrieving, and deleting shortcuts. Shortcut data is stored as JSON strings in the database. The schema for the Shortcuts table is implicitly defined by the SQL statements and further clarified in sqlshortcutstore/schema/schema.go, which defines a ShortcutSchema struct mirroring the table structure (though this struct is primarily for documentation or ORM-like purposes and not directly used in the raw SQL interaction in sqlshortcutstore.go).

Testing is a significant aspect of this module:

  • shortcut_test.go contains unit tests for the IDFromKeys function, ensuring its correctness and deterministic behavior.
  • shortcuttest provides a suite of common tests (InsertGet, GetNonExistent, GetAll, DeleteShortcut) that can be run against any implementation of the shortcut.Store interface. This promotes consistency and ensures that different store implementations behave as expected. The InsertGet test, for example, verifies that a stored shortcut can be retrieved and that the keys are sorted upon retrieval, even if they were not sorted initially.
  • sqlshortcutstore_test.go utilizes the tests from shortcuttest to validate the SQLShortcutStore implementation against a test database.
  • mocks/Store.go provides a mock implementation of the Store interface, generated by the mockery tool. This is useful for testing components that depend on shortcut.Store without needing a real storage backend.

Module: /go/sql

The go/sql module serves as the central hub for managing the SQL database schema used by the Perf application. It defines the structure of the database tables and provides utilities for schema generation, validation, and migration. This module ensures that the application's database schema is consistent, well-defined, and can evolve smoothly over time.

Key Responsibilities and Components:

  • Schema Definition (schema.go, spanner/schema_spanner.go):

    • Why: These files contain the SQL CREATE TABLE statements that define the structure of all tables used by Perf. Having the schema defined in code (generated from Go structs) provides a single source of truth and allows for easier version control and programmatic manipulation.
    • How:
    • schema.go: Defines the schema for CockroachDB.
    • spanner/schema_spanner.go: Defines the schema for Spanner. Spanner has slightly different SQL syntax and features (e.g., TTL INTERVAL), necessitating a separate schema definition.
    • The schema is not written manually but is generated by the tosql utility (see below). This ensures that the SQL schema accurately reflects the Go struct definitions in other modules (e.g., perf/go/alerts/sqlalertstore/schema).
    • Along with the CREATE TABLE statements, these files also export slices of strings representing the column names for each table. This can be useful for constructing SQL queries programmatically.
  • Table Struct Definition (tables.go):

    • Why: This file defines a Go struct Tables which aggregates all the individual table schema structs from various Perf sub-modules (like alerts, anomalygroup, git, etc.).
    • How: The Tables struct serves as the input to the tosql schema generator. By referencing schema structs from other modules, it ensures that the generated SQL schema is consistent with how data is represented and manipulated throughout the application. The //go:generate directives at the top of this file trigger the tosql utility to regenerate the schema files when necessary.
  • Schema Generation Utility (tosql/main.go):

    • Why: Manually writing and maintaining complex SQL schemas is error-prone. This utility automates the generation of the SQL schema files (schema.go and spanner/schema_spanner.go) from the Go struct definitions.
    • How: It takes the sql.Tables struct (defined in tables.go) as input and uses the go/sql/exporter module to translate the Go struct tags and field types into corresponding SQL CREATE TABLE statements. It supports different SQL dialects (CockroachDB and Spanner) and can handle specific features like Spanner's TTL (Time To Live) for tables. The schemaTarget flag controls which database dialect is generated.
  • Expected Schema and Migration (expectedschema/):

    • Why: As the application evolves, the database schema needs to change. This submodule manages schema migrations, ensuring that the live database can be updated to new versions without downtime or data loss. It also validates that the current database schema matches an expected version.

    • How:

    • embed.go: This file uses go:embed to embed JSON representations of the current (schema.json, schema_spanner.json) and previous (schema_prev.json, schema_prev_spanner.json) expected database schemas. These JSON files are generated by the exportschema utility. Load() and LoadPrev() functions provide access to these deserialized schema descriptions.

    • migrate.go: This is the core of the schema migration logic.

      • It defines SQL statements (FromLiveToNext, FromNextToLive, and their Spanner equivalents) that describe how to upgrade the database from the “previous” schema version to the “next” (current) schema version, and how to roll back that change. Crucially, schema changes must be backward and forward compatible because during a deployment, old and new versions of the application might run concurrently.
      • ValidateAndMigrateNewSchema is the key function. It:
      • Loads the “next” (target) and “previous” expected schemas from the embedded JSON files.
      • Gets the actual schema description from the live database.
      • Compares the actual schema with the previous and next expected schemas.
        • If actual == next, no migration is needed.
        • If actual == prev and actual != next, it executes the FromLiveToNext SQL statements to upgrade the database schema.
        • If actual matches neither prev nor next, it indicates an unexpected schema state and returns an error, preventing application startup. This is a critical safety check.
      • The migration process is designed to be run by a maintenance task during deployment: Old instances (frontend, ingesters) -> Maintenance task (runs ValidateAndMigrateNewSchema) -> New instances (frontend, ingesters)
      Deployment Starts
           |
           V
      Maintenance Task Runs
           |
           +------------------------------------+
           | Calls ValidateAndMigrateNewSchema  |
           +------------------------------------+
                |
                V
      Is schema == previous_expected_schema? --Yes--> Apply `FromLiveToNext` SQL
                | No                                     |
                V                                      V
      Is schema == current_expected_schema? ---Yes---> Migration Successful / No Action
                | No
                V
      Error: Schema mismatch! Halt.
                |
                V
      New Application Instances Start (if migration was successful)
      
    • Test files (migrate_test.go, migrate_spanner_test.go): These files contain unit tests to verify the schema migration logic for both CockroachDB and Spanner. They test scenarios where no migration is needed, migration is required, and the schema is in an invalid state.

  • Schema Export Utility (exportschema/main.go):

    • Why: The expectedschema submodule needs JSON representations of the “current” and “previous” database schemas to perform validation and migration. This utility generates these JSON files.
    • How: It takes the sql.Tables struct (for CockroachDB) or spanner.Schema (for Spanner) and uses the go/sql/schema/exportschema module to serialize the schema description into a JSON format. The output of this utility is typically checked into version control as schema.json, schema_prev.json, etc., within the expectedschema directory. The typical workflow for a schema change involves:
    1. Make schema changes in relevant Go structs (e.g., add a new field to alerts.AlertSchema).
    2. Run go generate ./... within perf/go/sql/ to regenerate schema.go and spanner/schema_spanner.go.
    3. Copy the old expectedschema/schema.json to expectedschema/schema_prev.json (and similarly for Spanner).
    4. Run the exportschema binary (e.g., bazel run //perf/go/sql/exportschema -- --out perf/go/sql/expectedschema/schema.json) to generate the new expectedschema/schema.json.
    5. Update the FromLiveToNext and FromNextToLive SQL statements in expectedschema/migrate.go.
    6. Update test constants in sql_test.go (LiveSchema, DropTables) if necessary.
  • Testing Utilities (sqltest/sqltest.go):

    • Why: Provides standardized ways to set up temporary CockroachDB or Spanner emulator instances for testing components that interact with the database.
    • How:
    • NewCockroachDBForTests: Sets up a connection to a local CockroachDB instance (managed by cockroachdb_instance.Require), creates a new temporary database for the test, applies the current sql.Schema, and registers a cleanup function to drop the database after the test.
    • NewSpannerDBForTests: Similarly, sets up a connection to a local Spanner emulator (via PGAdapter, required by pgadapter.Require), applies the current spanner.Schema, and prepares it for tests.
    • These functions abstract away the complexities of emulator management and initial schema setup, making tests cleaner and more reliable.
  • Schema Tests (sql_test.go):

    • Why: Verifies that the schema migration scripts correctly transform a database from a “live-like” previous state to the current expected state.
    • How:
    • Defines constants like DropTables (to clean up) and LiveSchema / LiveSchemaSpanner. LiveSchema represents the schema before the latest change defined in expectedschema/migrate.go's FromLiveToNext.
    • The tests typically:
      1. Create a test database.
      2. Apply DropTables to ensure a clean slate.
      3. Apply LiveSchema to simulate the state of the database before the pending migration.
      4. Execute expectedschema.FromLiveToNext (or its Spanner equivalent).
      5. Fetch the schema description from the migrated database.
      6. Compare this migrated schema with the schema obtained by applying sql.Schema (or spanner.Schema) directly to a fresh database (which represents the target state). They should be identical.
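
Condensed into Go, the decision logic of ValidateAndMigrateNewSchema (described under expectedschema/ above) looks roughly like the sketch below. This is a simplified illustration rather than the module's actual code; the schemaDescription type and the applyFromLiveToNext callback are stand-ins.

  // Simplified sketch of the migration decision described above. The
  // schemaDescription type and the applyFromLiveToNext callback are
  // illustrative stand-ins, not the real expectedschema API.
  package main

  import (
      "errors"
      "fmt"
      "reflect"
  )

  type schemaDescription map[string][]string // table name -> column definitions

  func validateAndMigrate(actual, prev, next schemaDescription, applyFromLiveToNext func() error) error {
      switch {
      case reflect.DeepEqual(actual, next):
          // Already at the target schema: nothing to do.
          return nil
      case reflect.DeepEqual(actual, prev):
          // One version behind: run the FromLiveToNext statements.
          return applyFromLiveToNext()
      default:
          // Unknown schema state: refuse to start.
          return errors.New("live schema matches neither the previous nor the current expected schema")
      }
  }

  func main() {
      prev := schemaDescription{"Alerts": {"id", "config"}}
      next := schemaDescription{"Alerts": {"id", "config", "sub_name"}}
      err := validateAndMigrate(prev, prev, next, func() error {
          fmt.Println("applying FromLiveToNext")
          return nil
      })
      fmt.Println("migration result:", err)
  }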

This comprehensive approach to schema management ensures that Perf's database can be reliably deployed, maintained, and evolved. The separation of concerns (schema definition, generation, validation, migration, and testing) makes the system robust and easier to understand.

Module: /go/stepfit

The stepfit module is designed to analyze time-series data, specifically performance traces, to detect significant changes or “steps.” It employs various statistical algorithms to determine if a step up (performance improvement), a step down (performance regression), or no significant change has occurred in the data. This module is crucial for automated performance monitoring, allowing for the identification of impactful changes in system behavior.

The core idea is to fit a step function to the input trace data. A step function is a simple function that is constant except for a single jump (the “step”) at a particular point (the “turning point”). The module calculates the best fit for such a function and then evaluates the characteristics of this fit to determine the nature and significance of the step.

Key Components and Logic:

The primary entity in this module is the StepFit struct. It encapsulates the results of the step detection analysis:

  • LeastSquares: This field stores the Least Squares Error (LSE) of the fitted step function. A lower LSE generally indicates a better fit of the step function to the data. It's important to note that not all step detection algorithms calculate or use LSE; in such cases, this field is set to InvalidLeastSquaresError.
  • TurningPoint: This integer indicates the index in the input trace where the step function changes its value. It essentially marks the location of the detected step.
  • StepSize: This float represents the magnitude of the change in the step function. A negative StepSize implies a step up in the trace values (conventionally a performance regression, e.g., increased latency). Conversely, a positive StepSize indicates a step down (conventionally a performance improvement, e.g., decreased latency).
  • Regression: This value is a metric used to quantify the significance or “interestingness” of the detected step. Its calculation varies depending on the chosen stepDetection algorithm.
    • For the OriginalStep algorithm, it's calculated as StepSize / LSE (or StepSize / stddevThreshold if LSE is too small). A larger absolute value of Regression implies a more significant step.
    • For other algorithms like AbsoluteStep, PercentStep, and CohenStep, Regression is directly related to the StepSize (or a normalized version of it).
    • For MannWhitneyU, Regression represents the p-value of the test.
  • Status: This is an enumerated type (StepFitStatus) indicating the overall assessment of the step:
    • LOW: A step down was detected, often interpreted as a performance improvement.
    • HIGH: A step up was detected, often interpreted as a performance regression.
    • UNINTERESTING: No significant step was found.

The main function responsible for performing the analysis is GetStepFitAtMid. It takes the following inputs:

  • trace: A slice of float32 representing the time-series data to be analyzed.
  • stddevThreshold: A threshold for standard deviation. This is used in the OriginalStep algorithm for normalizing the trace and as a floor for standard deviation in other algorithms like CohenStep to prevent division by zero or near-zero values.
  • interesting: A threshold value used to determine if a calculated Regression value is significant enough to be classified as HIGH or LOW. The exact interpretation of this threshold depends on the stepDetection algorithm.
  • stepDetection: An enumerated type (types.StepDetection) specifying which algorithm to use for step detection.
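
Assuming the function signature simply mirrors the inputs listed above and that it returns a pointer to the StepFit struct described earlier (the import paths below are likewise assumptions), a minimal call might look like:

  // Hedged usage sketch; the signature and import paths are inferred from the
  // description above, not verified against the source.
  package main

  import (
      "fmt"

      "go.skia.org/infra/perf/go/stepfit"
      "go.skia.org/infra/perf/go/types"
  )

  func main() {
      // Values step up at the midpoint, which by convention is a regression.
      trace := []float32{1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0}
      fit := stepfit.GetStepFitAtMid(trace, 0.1 /* stddevThreshold */, 0.5 /* interesting */, types.AbsoluteStep)
      fmt.Println(fit.Status, fit.StepSize, fit.TurningPoint)
  }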

Workflow of GetStepFitAtMid:

  1. Initialization and Preprocessing:

    • A new StepFit struct is initialized with Status set to UNINTERESTING.
    • If the trace length is less than minTraceSize (currently 3), the function returns the initialized StepFit as there isn't enough data to analyze.
    • Trace Normalization/Adjustment:
      • If stepDetection is types.OriginalStep, the input trace is duplicated and normalized (mean centered and scaled by its standard deviation, unless the standard deviation is below stddevThreshold).
      • For all other stepDetection types, if the trace has an odd length, the last element is dropped to make the trace length even. This is because these algorithms typically compare the first half of the trace with the second half.
  2. Step Detection Algorithm Execution: The function then proceeds based on the selected stepDetection algorithm. The core logic involves splitting the (potentially modified) trace roughly in half at the TurningPoint (which is len(trace) / 2) and comparing statistics of the two halves.

    - **`types.OriginalStep`:**
    
      - Calculates the mean of the first half (`y0`) and the second half
        (`y1`) of the (normalized) trace.
      - Computes the Sum of Squared Errors (SSE) for fitting `y0` to the
        first half and `y1` to the second half. The `LeastSquares` error
        (`lse`) is derived from this SSE.
      - `StepSize` is `y0 - y1`.
      - `Regression` is calculated as `StepSize / lse` (or
        `StepSize / stddevThreshold` if `lse` is too small). Note: The
        original implementation has a slight deviation from the standard
        definition of standard error in this calculation.

    - **`types.AbsoluteStep`:**
    
      - `StepSize` is `y0 - y1`.
      - `Regression` is simply the `StepSize`.
      - The step is considered interesting if the absolute value of
        `StepSize` meets the `interesting` threshold.
    
    - **`types.Const`:**
    
      - This algorithm behaves differently. It focuses on the absolute value
        of the trace point at the `TurningPoint` (`trace[i]`).
      - `StepSize` is `abs(trace[i]) - interesting`.
      - `Regression` is `-1 * abs(trace[i])`. This is done so that larger
        deviations (regressions) result in more negative `Regression`
        values, which are then flagged as `HIGH`.
    
    - **`types.PercentStep`:**
    
      - `StepSize` is `(y0 - y1) / y0`, representing the percentage change
        relative to the mean of the first half.
      - Handles potential `Inf` or `NaN` results from the division (e.g., if
        `y0` is zero).
      - `Regression` is the `StepSize`.
    
    - **`types.CohenStep`:**
    
      - Calculates Cohen's d, a measure of effect size.
      - `StepSize` is `(y0 - y1) / s_pooled`, where `s_pooled` is the pooled
        standard deviation of the two halves (or `stddevThreshold` if
        `s_pooled` is too small or NaN).
      - `Regression` is the `StepSize`.
    
    - **`types.MannWhitneyU`:**
    
      - Performs a Mann-Whitney U test (a non-parametric test) to determine
        if the two halves of the trace come from different distributions.
      - `StepSize` is `y0 - y1`.
      - `Regression` is the p-value of the test.
      - `LeastSquares` is set to the U-statistic from the test.
    
  3. Status Determination:

    • For types.MannWhitneyU:
      • If Regression (p-value) is less than or equal to the interesting threshold (e.g., 0.05), a significant difference is detected.
      • The Status (HIGH or LOW) is then determined by the sign of StepSize. If StepSize is negative (step up), Status is HIGH. Otherwise, it's LOW.
      • The Regression value is then negated if the status is HIGH to align with the convention that more negative values are “worse.”
    • For all other algorithms:
      • If Regression is greater than or equal to interesting, Status is LOW.
      • If Regression is less than or equal to -interesting, Status is HIGH.
      • Otherwise, Status remains UNINTERESTING.
  4. Return Result: The populated StepFit struct, containing LeastSquares, TurningPoint, StepSize, Regression, and Status, is returned.
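
To make the mid-split comparison concrete, here is a small self-contained illustration of the AbsoluteStep rule described above. It is a sketch of the technique, not the module's code:

  // Standalone illustration of the split-at-midpoint comparison: compare the
  // means of the two halves and classify the result against the
  // `interesting` threshold (AbsoluteStep rule).
  package main

  import "fmt"

  func mean(xs []float32) float32 {
      var sum float32
      for _, x := range xs {
          sum += x
      }
      return sum / float32(len(xs))
  }

  func absoluteStep(trace []float32, interesting float32) (status string, stepSize float32, turningPoint int) {
      if len(trace)%2 == 1 {
          trace = trace[:len(trace)-1] // drop the last point to get two equal halves
      }
      turningPoint = len(trace) / 2
      y0 := mean(trace[:turningPoint])
      y1 := mean(trace[turningPoint:])
      stepSize = y0 - y1 // negative => values stepped up (a regression by convention)
      switch {
      case stepSize >= interesting:
          status = "LOW" // step down, i.e. improvement
      case stepSize <= -interesting:
          status = "HIGH" // step up, i.e. regression
      default:
          status = "UNINTERESTING"
      }
      return status, stepSize, turningPoint
  }

  func main() {
      fmt.Println(absoluteStep([]float32{1, 1, 1, 1, 2, 2, 2, 2}, 0.5))
  }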

Design Rationale:

  • Multiple Algorithms: The inclusion of various step detection algorithms (OriginalStep, AbsoluteStep, PercentStep, CohenStep, MannWhitneyU) provides flexibility. Different datasets and performance characteristics may be better suited to different statistical approaches. For instance, MannWhitneyU is non-parametric and makes fewer assumptions about the data distribution, which can be beneficial for noisy or non-Gaussian data. AbsoluteStep and PercentStep offer simpler, more direct ways to define a regression based on absolute or relative changes.
  • Centralized Logic: The GetStepFitAtMid function consolidates the logic for all supported algorithms, making it easier to manage and extend.
  • Clear StepFit Structure: The StepFit struct provides a well-defined way to communicate the results of the analysis, separating the raw metrics (like StepSize, LeastSquares) from the final interpretation (Status).
  • interesting Threshold: The interesting parameter allows users to customize the sensitivity of the step detection. This is crucial because what constitutes a “significant” change can vary greatly depending on the context of the performance metric being monitored.
  • stddevThreshold: This parameter helps in handling cases with very low variance, preventing numerical instability (like division by zero) and ensuring that normalization in OriginalStep behaves reasonably.
  • Focus on the Middle: The GetStepFitAtMid name implies that the step detection is focused around the middle of the trace. This is a common approach for detecting a single, prominent step. More complex scenarios with multiple steps would require different techniques.

Why specific implementation choices?

  • Normalization in OriginalStep: Normalizing the trace in the OriginalStep algorithm (as described in the linked blog post) aims to make the detection less sensitive to the absolute scale of the data and more focused on the relative change.
  • Symmetric Traces for Non-OriginalStep: For algorithms other than OriginalStep, ensuring an even trace length by potentially dropping the last point simplifies the division of the trace into two equal halves for comparison.
  • Handling of Inf and NaN in PercentStep: Explicitly checking for and handling Inf and NaN values that can arise from division by zero (when y0 is zero) makes the PercentStep calculation more robust.
  • Regression as p-value for MannWhitneyU: Using the p-value as the Regression metric for MannWhitneyU directly reflects the statistical significance of the observed difference between the two halves of the trace. The interesting threshold then acts as the significance level (alpha).
  • InvalidLeastSquaresError: This constant provides a clear way to indicate when LSE is not applicable or not calculated by a particular algorithm, avoiding confusion with a calculated LSE of 0 or a negative value.

In essence, the stepfit module provides a toolkit for identifying abrupt changes in performance data, offering different lenses (algorithms) through which to view and quantify these changes. The design prioritizes flexibility in algorithm choice and user-configurable sensitivity to cater to diverse performance analysis needs.

Module: /go/subscription

The subscription module manages alerting configurations, known as subscriptions, for anomalies detected in performance data. It provides the means to define, store, and retrieve these configurations.

The core concept is that a “subscription” dictates how the system should react when an anomaly is found. This includes details like which bug tracker component to file an issue under, what labels to apply, who to CC on the bug, and the priority/severity of the issue. This allows for automated and consistent handling of performance regressions.

Subscriptions are versioned using an infra_internal Git hash (revision). This allows for tracking changes to subscription configurations over time and ensures that the correct configuration is used based on the state of the infrastructure code.

Key Components and Files:

  • store.go: Defines the Store interface. This interface is the central abstraction for interacting with subscription data. It dictates the operations that any concrete subscription storage implementation must provide. This design choice allows for flexibility in the underlying storage mechanism (e.g., SQL database, in-memory store for testing).

    • Why an interface? Decouples the business logic from the specific storage implementation. This promotes testability (using mocks) and allows for easier migration to different database technologies in the future if needed.
    • Key methods:
    • GetSubscription: Retrieves a specific version of a subscription.
    • GetActiveSubscription: Retrieves the currently active version of a subscription by its name. This is likely the most common retrieval method for active alerting.
    • InsertSubscriptions: Allows for batch insertion of new subscriptions. This is typically done within a database transaction to ensure atomicity – either all subscriptions are inserted, or none are. This is crucial when updating configurations, as it prevents a partially updated state. The implementation in sqlsubscriptionstore deactivates all existing subscriptions before inserting the new ones as active, effectively replacing the entire active set.
    • GetAllSubscriptions: Retrieves all historical versions of all subscriptions.
    • GetAllActiveSubscriptions: Retrieves all currently active subscriptions. This is useful for systems that need to know all current alerting rules.
  • proto/v1/subscription.proto: Defines the structure of a Subscription using Protocol Buffers. This is the canonical data model for subscriptions.

    • Why Protocol Buffers? Provides a language-neutral, platform-neutral, extensible mechanism for serializing structured data. This is beneficial for potential interoperability with other services or for persisting data in a well-defined format. It also generates efficient serialization and deserialization code.
    • Key fields: name, revision, bug_labels, hotlists, bug_component, bug_priority, bug_severity, bug_cc_emails, contact_email. Each field directly maps to a configuration aspect for bug filing and contact information.
  • sqlsubscriptionstore/sqlsubscriptionstore.go: Provides a concrete implementation of the Store interface using an SQL database (specifically designed for CockroachDB, as indicated by the use of pgx).

    • Why SQL? Relational databases offer robust data integrity, transaction support (ACID properties), and powerful querying capabilities, which are well-suited for managing structured configuration data like subscriptions.
    • How it works: It defines SQL statements for each operation in the Store interface. When inserting subscriptions, it first deactivates all existing subscriptions and then inserts the new ones as active. This ensures that only the latest set of configurations is considered active.
    • The is_active boolean column in the database schema (sqlsubscriptionstore/schema/schema.go) is key to this “active version” concept.
  • sqlsubscriptionstore/schema/schema.go: Defines the SQL table schema for storing subscriptions.

    • Key design choice: The primary key is a composite of name and revision. This allows multiple versions of the same named subscription to exist, identified by their revision. The is_active field differentiates the current version from historical ones.
  • mocks/Store.go: Contains a mock implementation of the Store interface, generated by the mockery tool.

    • Why mocks? Essential for unit testing components that depend on the Store interface without requiring an actual database connection. This makes tests faster, more reliable, and isolates the unit under test.
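
Taken together, the methods above suggest an interface of roughly the following shape. The parameter and return types are inferred from the descriptions (and from the pgx usage mentioned above); the authoritative definition lives in store.go:

  // Approximate shape of the Store interface; parameter and return types are
  // assumptions based on the descriptions above and may differ from store.go.
  package subscription

  import (
      "context"

      "github.com/jackc/pgx/v4"

      pb "go.skia.org/infra/perf/go/subscription/proto/v1" // assumed import path
  )

  type Store interface {
      // GetSubscription retrieves a specific version of a subscription.
      GetSubscription(ctx context.Context, name string, revision string) (*pb.Subscription, error)
      // GetActiveSubscription retrieves the currently active version by name.
      GetActiveSubscription(ctx context.Context, name string) (*pb.Subscription, error)
      // InsertSubscriptions replaces the active set inside a transaction.
      InsertSubscriptions(ctx context.Context, subscriptions []*pb.Subscription, tx pgx.Tx) error
      // GetAllSubscriptions returns every stored version of every subscription.
      GetAllSubscriptions(ctx context.Context) ([]*pb.Subscription, error)
      // GetAllActiveSubscriptions returns only the currently active subscriptions.
      GetAllActiveSubscriptions(ctx context.Context) ([]*pb.Subscription, error)
  }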

Key Workflows:

  1. Updating Subscriptions: This typically happens when configurations in infra_internal are changed.

    External Process (e.g., config syncer)
        |
        v
    Reads new subscription definitions (likely from files)
        |
        v
    Parses definitions into []*pb.Subscription
        |
        v
    Calls store.InsertSubscriptions(ctx, newSubscriptions, tx)
        |
        |--> [SQL Transaction Start]
        |       |
        |       v
        |    sqlsubscriptionstore: Deactivate all existing subscriptions (UPDATE Subscriptions SET is_active=false WHERE is_active=true)
        |       |
        |       v
        |    sqlsubscriptionstore: Insert each new subscription with is_active=true (INSERT INTO Subscriptions ...)
        |       |
        |       v
        |--> [SQL Transaction Commit/Rollback]
    

    This ensures that the update is atomic. If any part fails, the transaction is rolled back, leaving the previous set of active subscriptions intact.

  2. Anomaly Detection Triggering Alerting:

    Anomaly Detector
        |
        v
    Identifies an anomaly and the relevant subscription name (e.g., based on metric patterns)
        |
        v
    Calls store.GetActiveSubscription(ctx, subscriptionName)
        |
        v
    sqlsubscriptionstore: Retrieves the active subscription (SELECT ... FROM Subscriptions WHERE name=$1 AND is_active=true)
        |
        v
    Anomaly Detector uses the pb.Subscription details (bug component, labels, etc.) to file a bug.

This module provides a robust and versioned way to manage alerting rules, ensuring that performance regressions are handled consistently and routed appropriately. The separation of interface and implementation, along with the use of Protocol Buffers, contributes to a maintainable and extensible system.

Module: /go/tracecache

TraceCache Module Documentation

The tracecache module provides a mechanism for caching trace identifiers (trace IDs) associated with specific tiles and queries. This caching layer significantly improves performance by reducing the need to repeatedly compute or fetch trace IDs, which can be a computationally expensive operation.

Core Functionality & Design Rationale:

The primary purpose of tracecache is to store and retrieve lists of trace IDs. Trace IDs are represented as paramtools.Params, which are essentially key-value pairs that uniquely identify a specific trace within the performance monitoring system.

The caching strategy is built around the concept of a “tile” and a “query.”

  • Tile: In the context of Skia Perf, a tile represents a chunk of commit history. Caching trace IDs per tile allows for efficient retrieval of relevant traces when analyzing a specific range of commits.
  • Query: A query, represented by query.Query, defines the specific parameters used to filter traces. Different queries will yield different sets of trace IDs.

By combining the tile number and a string representation of the query, a unique cache key is generated. This ensures that cached data is specific to the exact combination of commit range and filter criteria.

The module relies on an external caching implementation provided via the go/cache.Cache interface. This design choice promotes flexibility, allowing different caching backends (e.g., in-memory, Redis, Memcached) to be used without modifying the tracecache logic itself. This separation of concerns is crucial for adapting to various deployment environments and performance requirements.

Key Components:

  • traceCache.go: This is the sole file in the module and contains the implementation of the TraceCache struct and its associated methods.
    • TraceCache struct:
    • Holds an instance of cache.Cache. This is the underlying cache client used for storing and retrieving data.
    • New(cache cache.Cache) *TraceCache:
    • The constructor for TraceCache. It takes a cache.Cache instance as an argument, which will be used for all caching operations. This dependency injection allows the caller to provide any cache implementation that conforms to the cache.Cache interface.
    • CacheTraceIds(ctx context.Context, tileNumber types.TileNumber, q *query.Query, traceIds []paramtools.Params) error:
    • This method is responsible for storing a list of trace IDs into the cache.
    • It first generates a unique cacheKey using the tileNumber and the query.Query.
    • The traceIds (a slice of paramtools.Params) are then serialized into a JSON string using the toJSON helper function. This serialization is necessary because most cache backends store data as strings or byte arrays. JSON is chosen for its human-readability and widespread support.
    • Finally, it uses the cacheClient.SetValue method to store the JSON string under the generated cacheKey.
    • GetTraceIds(ctx context.Context, tileNumber types.TileNumber, q *query.Query) ([]paramtools.Params, error):
    • This method retrieves a list of trace IDs from the cache.
    • It generates the cacheKey in the same way as CacheTraceIds.
    • It then attempts to fetch the value associated with this key using cacheClient.GetValue.
    • If the value is not found in the cache (i.e., cacheJson is empty), it returns nil for both the trace IDs and the error, indicating a cache miss.
    • If a value is found, it deserializes the JSON string back into a slice of paramtools.Params using json.Unmarshal.
    • traceIdCacheKey(tileNumber types.TileNumber, q query.Query) string:
    • A private helper function that constructs the cache key. It combines the tileNumber (an integer) and a string representation of the query.Query (obtained via q.KeyValueString()) separated by an underscore. This format ensures uniqueness and provides some human-readable context within the cache keys.
    • toJSON(obj interface{}) (string, error):
    • A private generic helper function to marshal any given object into a JSON string. This is used specifically for serializing the []paramtools.Params before caching.

Workflow for Caching Trace IDs:

  1. Input: tileNumber, query.Query, []paramtools.Params (trace IDs to cache)
  2. CacheTraceIds is called.
  3. traceIdCacheKey(tileNumber, query) generates a unique key. tileNumber + "_" + query.KeyValueString() ---> cacheKey
  4. toJSON(traceIds) serializes the list of trace IDs into a JSON string. []paramtools.Params --json.Marshal--> jsonString
  5. t.cacheClient.SetValue(ctx, cacheKey, jsonString) stores the JSON string in the underlying cache.

Workflow for Retrieving Trace IDs:

  1. Input: tileNumber, query.Query
  2. GetTraceIds is called.
  3. traceIdCacheKey(tileNumber, query) generates the cache key (same logic as above). tileNumber + "_" + query.KeyValueString() ---> cacheKey
  4. t.cacheClient.GetValue(ctx, cacheKey) attempts to retrieve the value from the cache. cacheClient --GetValue(cacheKey)--> jsonString (or empty if not found)
  5. If jsonString is empty (cache miss): Return nil, nil.
  6. If jsonString is not empty (cache hit): json.Unmarshal([]byte(jsonString), &traceIds) deserializes the JSON string back into []paramtools.Params. jsonString --json.Unmarshal--> []paramtools.Params
  7. Return the deserialized []paramtools.Params and nil error.
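
A typical read-through (get-or-compute) pattern around these two methods might look like the sketch below. The wrapper function and the computeTraceIds callback are hypothetical, and the import paths are assumptions:

  // Hedged sketch of using TraceCache as a read-through cache. computeTraceIds
  // stands in for whatever expensive lookup normally produces the trace IDs.
  package example

  import (
      "context"

      "go.skia.org/infra/go/paramtools"
      "go.skia.org/infra/go/query"
      "go.skia.org/infra/perf/go/tracecache"
      "go.skia.org/infra/perf/go/types"
  )

  func tracesForTile(
      ctx context.Context,
      tc *tracecache.TraceCache,
      tile types.TileNumber,
      q *query.Query,
      computeTraceIds func(context.Context) ([]paramtools.Params, error),
  ) ([]paramtools.Params, error) {
      // Cache hit: GetTraceIds returns the deserialized trace IDs.
      if ids, err := tc.GetTraceIds(ctx, tile, q); err == nil && ids != nil {
          return ids, nil
      }
      // Cache miss (or cache error): compute the trace IDs and store them for next time.
      ids, err := computeTraceIds(ctx)
      if err != nil {
          return nil, err
      }
      if err := tc.CacheTraceIds(ctx, tile, q, ids); err != nil {
          return nil, err
      }
      return ids, nil
  }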

Module: /go/tracefilter

Tracefilter Module Documentation

The tracefilter module provides a mechanism to organize and filter trace data based on their hierarchical paths. The core idea is to represent traces within a tree structure, where each node in the tree corresponds to a segment of the trace's path. This allows for efficient filtering of traces, specifically to identify “leaf” traces – those that do not have any further sub-paths.

This approach is particularly useful in scenarios where traces have a parent-child relationship implied by their path structure. For instance, in performance analysis, a trace like /root/p1/p2/p3/t1 might represent a specific test (t1) under a series of nested configurations (p1, p2, p3). If there's another trace /root/p1/p2, it could be considered a “parent” or an aggregate trace. The tracefilter helps in identifying only the most specific, or “leaf,” traces, effectively filtering out these higher-level parent traces.

Key Components and Responsibilities

The primary component is the TraceFilter struct.

TraceFilter struct:

  • Purpose: Represents a node within the trace path tree.
  • Fields:
    • traceKey: A string identifier associated with the trace path ending at this node. For the root of the tree, this is initialized to “HEAD”.
    • value: The string value of the current path segment this node represents.
    • children: A map where keys are the next path segments and values are pointers to child TraceFilter nodes. This map forms the branches of the tree.
  • Why this structure?
    • A tree is a natural way to represent hierarchical path data.
    • Using a map for children allows for efficient lookup and addition of child nodes based on the next path segment.
    • Storing the traceKey at each node allows associating an identifier with a complete path as it's being built.

NewTraceFilter() function:

  • Purpose: Acts as the constructor for the TraceFilter tree.
  • How it works: It initializes a root TraceFilter node. The traceKey is set to “HEAD” as a sentinel value for the root, and its children map is initialized as empty, ready to have paths added to it.
  • Why this design? Provides a clear and simple entry point for creating a new filter structure.

AddPath(path []string, traceKey string) method:

  • Purpose: Adds a new trace, defined by its path (a slice of strings representing path segments) and its unique traceKey, to the filter tree.

  • How it works:

    1. It traverses the tree, creating new nodes as needed for each segment in the input path.
    2. If a segment in the path already exists as a child of the current node, it moves to that existing child.
    3. If a segment does not exist, a new TraceFilter node is created for that segment, its value is set to the segment string, its traceKey is set to the input traceKey, and it's added to the children map of the current node.
    4. This process repeats recursively for the remaining segments in the path.
  • Why this design?

    • This incremental build process efficiently constructs the tree by reusing existing nodes for common path prefixes.
    • The recursive nature elegantly handles paths of arbitrary length.
    • Associating the traceKey with each newly created node ensures that even intermediate nodes (which might later become leaves if no further sub-paths are added) have an associated key.
    Example: Adding path ["root", "p1", "p2"] with key "keyA"
    
    Initial Tree:
    (HEAD)
    
    After AddPath(["root", "p1", "p2"], "keyA"):
    
    (HEAD)
      |
      +-- ("root", key="keyA")
           |
           +-- ("p1", key="keyA")
                |
                +-- ("p2", key="keyA")  <- Leaf node initially
    

    If we then add ["root", "p1", "p2", "t1"] with key "keyB":

    (HEAD)
      |
      +-- ("root", key="keyB")  // traceKey updated if path is prefix
           |
           +-- ("p1", key="keyB")
                |
                +-- ("p2", key="keyB")
                     |
                     +-- ("t1", key="keyB") <- New leaf node
    

    Note: The traceKey of an existing node is updated by AddPath if the new path being added shares that node as a prefix. This ensures that the traceKey stored at a node corresponds to the longest path ending at that node if it's also a prefix of other paths. However, the primary use of GetLeafNodeTraceKeys relies on the traceKey of nodes that become leaves.

GetLeafNodeTraceKeys() method:

  • Purpose: Retrieves the traceKeys of all traces that are considered “leaf” nodes in the tree. A leaf node is a node that has no children.

  • How it works:

    1. It performs a depth-first traversal of the tree.
    2. If the current node has no children (i.e., len(tf.children) == 0), its traceKey is considered a leaf key and is added to the result list.
    3. If the current node has children, the method recursively calls itself on each child node and aggregates the results.
  • Why this design?

    • This is the core filtering logic. By only returning keys from nodes without children, it effectively filters out traces that serve as prefixes (parents) to other, more specific traces.
    • Recursion is a natural fit for traversing tree structures.
    Workflow for GetLeafNodeTraceKeys:
    
    Start at (CurrentNode)
        |
        V
    Is CurrentNode a leaf (no children)?
        |
        +-- YES --> Add CurrentNode.traceKey to results
        |
        +-- NO  --> For each ChildNode in CurrentNode.children:
                        |
                        V
                    Recursively call GetLeafNodeTraceKeys on ChildNode
                        |
                        V
                    Append results from ChildNode to overall results
        |
        V
    Return aggregated results
    

Example Scenario and How it Works

Consider the following traces and their paths:

  1. traceA: path ["config", "test_group", "test1"], key "keyA"

  2. traceB: path ["config", "test_group"], key "keyB"

  3. traceC: path ["config", "test_group", "test2"], key "keyC"

  4. traceD: path ["config", "other_group", "test3"], key "keyD"

  Tree Construction (AddPath calls):

    • tf.AddPath(["config", "test_group", "test1"], "keyA")
    • tf.AddPath(["config", "test_group"], "keyB")
      • When this is added, the node for "test_group" initially created by keyA will have its traceKey updated to "keyB".
    • tf.AddPath(["config", "test_group", "test2"], "keyC")
    • tf.AddPath(["config", "other_group", "test3"], "keyD")

    The tree would look something like this (simplified, showing relevant traceKeys for leaf potential):

    (HEAD)
      |
      +-- ("config")
           |
           +-- ("test_group", traceKey likely updated by "keyB" during AddPath)
           |    |
           |    +-- ("test1", traceKey="keyA")  <-- Leaf
           |    |
           |    +-- ("test2", traceKey="keyC")  <-- Leaf
           |
           +-- ("other_group")
                |
                +-- ("test3", traceKey="keyD")  <-- Leaf
    
  Filtering (GetLeafNodeTraceKeys() call):

    • When GetLeafNodeTraceKeys() is called on the root:
      • It traverses to "config".
      • It traverses to "test_group". This node has children ("test1" and "test2"), so its key ("keyB") is not added.
      • It traverses to "test1". This is a leaf. "keyA" is added.
      • It traverses to "test2". This is a leaf. "keyC" is added.
      • It traverses to "other_group".
      • It traverses to "test3". This is a leaf. "keyD" is added.

    The result would be ["keyA", "keyC", "keyD"]. Notice that "keyB" is excluded because the path ["config", "test_group"] has sub-paths (.../test1 and .../test2), making it a non-leaf node in the context of trace specificity.
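
Expressed as code, the scenario above corresponds to roughly the following sketch; the import path and the exact return type of GetLeafNodeTraceKeys are assumptions:

  // Hedged sketch of the example scenario; assumes NewTraceFilter returns a
  // *TraceFilter and GetLeafNodeTraceKeys returns a slice of trace keys.
  package main

  import (
      "fmt"

      "go.skia.org/infra/perf/go/tracefilter" // assumed import path
  )

  func main() {
      tf := tracefilter.NewTraceFilter()
      tf.AddPath([]string{"config", "test_group", "test1"}, "keyA")
      tf.AddPath([]string{"config", "test_group"}, "keyB")
      tf.AddPath([]string{"config", "test_group", "test2"}, "keyC")
      tf.AddPath([]string{"config", "other_group", "test3"}, "keyD")

      // "keyB" is filtered out because its path is a prefix of other paths.
      fmt.Println(tf.GetLeafNodeTraceKeys()) // expected: keyA, keyC, keyD (order may vary)
  }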

This module provides a clean and efficient way to identify the most granular traces in a dataset where hierarchy is defined by path structure.

Module: /go/tracesetbuilder

The tracesetbuilder module is designed to efficiently construct a types.TraceSet and its corresponding paramtools.ReadOnlyParamSet from multiple, potentially disparate, sets of trace data. This is particularly useful when dealing with performance data that might arrive in chunks (e.g., from different “Tiles” of data) and needs to be aggregated into a coherent view across a series of commits.

The core challenge this module addresses is the concurrent and distributed nature of processing trace data. If multiple traces with the same identifier (key) were processed by different workers simultaneously without coordination, it could lead to race conditions and incorrect data. Similarly, simply locking the entire TraceSet for each update would create a bottleneck.

The tracesetbuilder solves this by employing a worker pool (mergeWorkers). The key design decision here is to distribute the work based on the trace key. Each trace key is hashed (using crc32.ChecksumIEEE), and this hash determines which mergeWorker is responsible for that specific trace. This ensures that all data points for a single trace are always processed by the same worker, thereby avoiding the need for explicit locking at the individual trace level within the worker. Each mergeWorker maintains its own types.TraceSet and paramtools.ParamSet.

Key Components and Workflow:

  1. TraceSetBuilder:

    • Responsibilities:
      • Manages a pool of mergeWorker instances.
      • Provides the Add method to ingest new trace data.
      • Provides the Build method to consolidate results from all workers and return the final TraceSet and ReadOnlyParamSet.
      • Provides the Close method to shut down the worker pool.
    • New(size int): Initializes the TraceSetBuilder. The size parameter is crucial as it defines the expected length of each trace in the final, consolidated TraceSet. This allows the builder to pre-allocate trace slices of the correct length, filling in missing data points as necessary. It creates numWorkers instances of mergeWorker.
    • Add(commitNumberToOutputIndex map[types.CommitNumber]int32, commits []provider.Commit, traces types.TraceSet): This is the entry point for feeding data into the builder.
      • traces: A types.TraceSet representing a chunk of data (e.g., from a single tile).
      • commits: A slice of provider.Commit objects corresponding to the data points in the traces.
      • commitNumberToOutputIndex: A map that dictates where each data point from the input traces (identified by its types.CommitNumber) should be placed in the final output trace. This mapping is essential for correctly aligning data points that might come from different sources or represent different commit ranges.
      • For each trace in the input traces:
        • It parses the trace key into paramtools.Params.
        • It creates a request struct containing the key, params, the trace data itself, the commitNumberToOutputIndex map, and the commits slice.
        • It calculates a worker index from the CRC32 hash of the trace key modulo numWorkers (see the sketch after this list).
        • It sends this request to the ch channel of the selected mergeWorker.
        • A sync.WaitGroup is incremented for each trace added, ensuring Build waits for all processing to complete.
    • Build(ctx context.Context):
      • Waits for all Add operations to be processed by the workers (using t.wg.Wait()).
      • Iterates through all mergeWorkers.
      • Merges the traceSet and paramSet from each mergeWorker into a single, final types.TraceSet and paramtools.ParamSet.
      • Normalizes and freezes the final paramSet to create a paramtools.ReadOnlyParamSet.
      • Returns the consolidated TraceSet and ReadOnlyParamSet.
    • Close(): Iterates through the mergeWorkers and closes their respective input channels (ch). This signals the worker goroutines to terminate once they have processed all pending requests.
  2. mergeWorker:

    • Responsibilities:
      • Processes request objects sent to its channel.
      • Maintains its own local types.TraceSet and paramtools.ParamSet.
      • Updates its local TraceSet with new data points, placing them correctly according to request.commitNumberToOutputIndex.
      • Adds the parameters from each processed trace to its local ParamSet.
    • newMergeWorker(wg *sync.WaitGroup, size int): Creates a mergeWorker and starts its goroutine.
      • It initializes an empty types.TraceSet and paramtools.ParamSet.
      • The goroutine continuously reads request objects from its ch channel.
      • For each request:
      • It retrieves or creates a trace in its m.traceSet for the given req.key. If creating, it uses types.NewTrace(size) to ensure the trace has the correct final length.
      • It iterates through the req.commits and uses req.commitNumberToOutputIndex to determine the correct destination index in its local trace for each data point in req.trace.
      • It updates the trace value at that destination index.
      • It adds req.params to its m.paramSet.
      • It decrements the shared sync.WaitGroup (m.wg.Done()) to signal completion of this piece of work.
    • Process(req *request): Sends a request to the worker's channel.
    • Close(): Closes the worker's input channel.
  3. request struct:

    • A simple data structure used to pass all necessary information for processing a single trace segment through the pipeline to a mergeWorker. It encapsulates the trace key, its parsed parameters, the actual trace data segment, the mapping of commit numbers to output indices, and the corresponding commit metadata.
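
The worker-selection step is small enough to show directly. The snippet below is a standalone illustration of the hashing scheme, not the module's code, and numWorkers is an assumed value:

  // Illustration of the dispatch rule: the trace key is hashed with CRC32 and
  // taken modulo the number of workers, so a given trace key always maps to
  // the same mergeWorker.
  package main

  import (
      "fmt"
      "hash/crc32"
  )

  const numWorkers = 64 // assumed pool size; the real constant lives in the module

  func workerIndex(traceKey string) int {
      return int(crc32.ChecksumIEEE([]byte(traceKey)) % numWorkers)
  }

  func main() {
      key := ",arch=x86,config=8888,test=draw_a_circle,"
      fmt.Printf("trace %q -> worker %d\n", key, workerIndex(key))
  }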

Workflow Diagram:

                       TraceSetBuilder.New(outputTraceLength)
                                   |
                                   V
  +-----------------------------------------------------------------------+
  | TraceSetBuilder (manages WaitGroup and pool of mergeWorkers)          |
  +-----------------------------------------------------------------------+
      |                                          ^
      | Add(commitMap1, commits1, traces1)       | Build() waits for WaitGroup
      | Add(commitMap2, commits2, traces2)       |
      V                                          |
  +-----------------------------------------------------------------------+
  | For each trace in input:                                              |
  |  1. Parse key -> params                                               |
  |  2. Create 'request' struct                                           |
  |  3. Hash key -> workerIndex                                           |
  |  4. Send 'request' to mergeWorkers[workerIndex].ch                    |
  |  5. Increment WaitGroup                                               |
  +-----------------------------------------------------------------------+
      |         |         | ... (numWorkers times)
      V         V         V
  +--------+ +--------+ +--------+
  | mergeW_0 | | mergeW_1 | | mergeW_N |  (Each runs in its own goroutine)
  | .ch    | | .ch    | | .ch    |
  | .traceSet| | .traceSet| | .traceSet|
  | .paramSet| | .paramSet| | .paramSet|
  +--------+ +--------+ +--------+
      ^         ^         ^
      | Process request:  |
      |  - Get/Create local trace for req.key (length: outputTraceLength) |
      |  - For each point in req.trace:                                   |
      |    - Use req.commitNumberToOutputIndex[commitNum] to find dstIdx  |
      |    - localTrace[dstIdx] = req.trace[srcIdx]                       |
      |  - Add req.params to local paramSet                               |
      |  - Decrement WaitGroup                                            |
      |         |         |
      --------------------- (When TraceSetBuilder.Build() is called)
              |
              V
  +-----------------------------------------------------------------------+
  | TraceSetBuilder.Build():                                              |
  |  1. Wait for all 'Add' operations (WaitGroup.Wait())                  |
  |  2. Create finalTraceSet, finalParamSet                               |
  |  3. For each mergeWorker:                                             |
  |     - Merge worker.traceSet into finalTraceSet                        |
  |     - Merge worker.paramSet into finalParamSet                        |
  |  4. Normalize and Freeze finalParamSet                                |
  |  5. Return finalTraceSet, finalParamSet (ReadOnly)                    |
  +-----------------------------------------------------------------------+
      |
      V
  +-----------------------------------------------------------------------+
  | TraceSetBuilder.Close():                                              |
  |  - Close channels of all mergeWorkers (signals them to terminate)     |
  +-----------------------------------------------------------------------+

numWorkers and channelBufferSize are constants that can be tuned for performance based on the expected workload and system resources. The CRC32 hash provides a reasonably good distribution of keys across workers, minimizing the chance of one worker becoming a bottleneck. The sync.WaitGroup is essential for ensuring that the Build method doesn't prematurely try to aggregate results before all input data has been processed by the workers.

The design allows for efficient, concurrent processing of large volumes of trace data by partitioning the work based on trace identity and then merging the results, making it suitable for building comprehensive views of performance metrics over time.

Module: /go/tracestore

The tracestore module defines interfaces and implementations for storing and retrieving performance trace data. It's a core component of the Perf system, enabling the analysis of performance metrics over time and across different configurations.

Design Philosophy

The primary goal of tracestore is to provide an efficient and scalable way to manage large volumes of trace data. This involves:

  • Tiled Storage: Data is organized into “tiles,” which are fixed-size blocks of commits. This approach simplifies data management and allows for efficient querying of data within specific time ranges. Each tile has its own inverted index and ParamSet, making searches within a tile fast.
  • Inverted Indexing: To quickly find traces matching specific criteria (e.g., “arch=x86” and “config=8888”), tracestore uses an inverted index. This index maps key-value pairs to the trace IDs that contain them within each tile.
  • Caching: Various caching mechanisms are employed to improve performance, including:
    • In-memory LRU caches for frequently accessed data like ParamSets and recently written Postings/ParamSet entries.
    • An optional external cache (like Memcached via go/cache/memcached) for broader caching strategies.
    • A tracecache for caching the results of QueryTracesIDOnly to speed up repeated queries.
  • Interface-Based Design: The module defines interfaces (TraceStore, MetadataStore, TraceParamStore) to allow for different backend implementations. This promotes flexibility and testability. The primary implementation provided is sqltracestore, which uses an SQL database.
  • Concurrency: Operations like writing traces and querying are designed to be concurrent, leveraging Go routines and parallel processing to handle large datasets efficiently. For instance, writing large batches of traces or postings is often chunked and processed in parallel.
  • Separation of Concerns:
    • TraceStore handles the core logic of reading and writing trace values and their associated parameters.
    • MetadataStore manages metadata associated with source files (e.g., links to dashboards or logs).
    • TraceParamStore specifically handles the mapping between trace IDs (MD5 hashes of trace names) and their full parameter sets. This separation helps in optimizing storage and retrieval for these distinct types of data.

Key Components and Responsibilities

The tracestore module is primarily defined by a set of interfaces and their SQL-based implementations.

tracestore.go

This file defines the main TraceStore interface. It outlines the contract for any system that wants to store and retrieve performance traces. Key responsibilities include:

  • Writing Traces (WriteTraces, WriteTraces2): Ingesting new performance data points. Each data point is associated with a specific commit, a set of parameters (defining the trace, e.g., config=8888,arch=x86), a value, the source file it came from, and a timestamp.
    • The WriteTraces method is designed to handle potentially large batches of data efficiently. Implementations often involve chunking data and performing parallel writes to the underlying storage.
    • WriteTraces2 is a newer variant, potentially for different storage schemas or optimizations (e.g., denormalizing common params directly into the trace values table as seen in TraceValues2Schema).
  • Reading Traces (ReadTraces, ReadTracesForCommitRange): Retrieving trace data for specific keys (trace names) within a given tile or commit range.
  • Querying Traces (QueryTraces, QueryTracesIDOnly):
    • QueryTraces allows searching for traces based on a query.Query object (which specifies parameter key-value pairs). It returns the actual trace values and associated commit information.
    • QueryTracesIDOnly is an optimization that returns only the paramtools.Params (effectively the identifying parameters) of traces matching a query. This is useful when only the list of matching traces is needed, not their values.
  • Tile Management (GetLatestTile, TileNumber, TileSize, CommitNumberOfTileStart): Provides methods for interacting with the tiled storage system.
  • ParamSet Management (GetParamSet): Retrieving the paramtools.ReadOnlyParamSet for a specific tile. A ParamSet represents all unique key-value pairs present in the traces within that tile, which is crucial for UI elements like query builders.
  • Source Information (GetSource, GetLastNSources, GetTraceIDsBySource): Retrieving information about the origin of trace data, such as the ingested file name.

metadatastore.go

This file defines the MetadataStore interface. Its responsibility is to manage metadata associated with source files.

  • InsertMetadata: Stores links or other metadata for a given source file name.
  • GetMetadata: Retrieves the stored metadata for a source file. This can be used, for example, to link from a data point back to the original log file or a specific dashboard view related to the data ingestion.

traceparamstore.go

This file defines the TraceParamStore interface. This store is dedicated to managing the relationship between a trace's unique identifier (typically an MD5 hash of its full parameter string) and the actual paramtools.Params object.

  • WriteTraceParams: Stores the mapping from trace IDs to their parameter sets. This is done to avoid repeatedly parsing or storing the full parameter string for every data point of a trace.
  • ReadParams: Retrieves the paramtools.Params for a given set of trace IDs.

Submodule: sqltracestore

This submodule provides the SQL-based implementation of the TraceStore, MetadataStore, and TraceParamStore interfaces.

  • sqltracestore.go: Implements the TraceStore interface.

    • Schema: It relies on a specific SQL schema (defined conceptually in the package documentation and concretely in sqltracestore/schema/schema.go) involving tables like TraceValues (for actual metric values), Postings (the inverted index), ParamSets (per-tile parameter information), and SourceFiles.
    • Writing Data: When WriteTraces is called, it performs several actions:
    • Updates the SourceFiles table with the new source filename if it's not already present.
    • Updates the ParamSets table for the current tile with any new key-value pairs from the incoming traces. This uses a cache to avoid redundant writes.
    • For each incoming trace:
      • Calculates its MD5 hash (trace ID).
      • Inserts the value into the TraceValues table (or TraceValues2 for WriteTraces2).
      • If the trace ID and its key-value pairs are not already in the Postings table for the current tile (checked via cache), it inserts them.
      • Stores the mapping of the trace ID to its paramtools.Params in the TraceParams table via the TraceParamStore.
    • All these writes are typically batched and parallelized for efficiency.
    • Querying Data (QueryTracesIDOnly):
    • Retrieves the ParamSet for the target tile.
    • Generates a query plan based on the input query.Query and the tile's ParamSet.
    • Optimization (restrictByCounting): It attempts to optimize the query by first running COUNT(*) queries for each part of the query plan. The part of the plan that matches the fewest traces (below a threshold) is then used to fetch its corresponding trace IDs. These IDs are then used to construct a restrictClause (e.g., AND trace_id IN (...)) that is appended to the queries for the other parts of the plan. This significantly speeds up queries where one filter is much more selective than others.
    • For each part of the query plan (each key and its OR'd values), it executes an SQL query against the Postings table (using the restrictClause if applicable) to get a stream of matching traceIDForSQL.
    • The streams of traceIDForSQL from each part of the plan are then intersected (using newIntersect) to find the trace IDs that satisfy all AND conditions of the query.
    • These resulting trace IDs are then passed to the TraceParamStore to fetch their full paramtools.Params.
    • Reading Data (QueryTraces, ReadTraces): Once the trace IDs (and thus their full names) are known (either from QueryTracesIDOnly or directly provided), it queries the TraceValues table to fetch the actual floating-point values for those traces within the specified commit range or tile. It also fetches commit information from the Commits table.
    • Follower Reads: Supports enableFollowerReads configuration, which adds AS OF SYSTEM TIME '-5s' to certain read queries, allowing them to potentially hit read replicas and reduce load on the primary, at the cost of slightly stale data.
    • Dialect Specificity: It has distinct SQL templates and statement strings for CockroachDB (default) and Spanner (spanner.go) to account for syntax differences or performance characteristics (e.g., UPSERT vs. ON CONFLICT).
  • sqlmetadatastore.go: Implements the MetadataStore interface. It uses a Metadata SQL table that links a source_file_id (from SourceFiles) to a JSONB column storing the metadata map.

  • sqltraceparamstore.go: Implements the TraceParamStore interface. It uses a TraceParams SQL table that stores trace_id (bytes) and their corresponding params (JSONB). Writes are chunked and can be parallelized.

  • intersect.go: Provides helper functions (newIntersect, newIntersect2) to compute the intersection of multiple sorted channels of traceIDForSQL. This is crucial for implementing the AND logic in QueryTracesIDOnly. It builds a binary tree of newIntersect2 operations for efficiency, avoiding slower reflection-based approaches.

  • schema/schema.go: Defines Go structs that mirror the SQL table schemas. This is used for documentation and potentially could be used with ORM-like tools if needed, though the current implementation uses direct SQL templating.

    • TraceValuesSchema: Stores individual data points (value, commit, source file) keyed by trace ID.
    • TraceValues2Schema: An alternative/extended schema for trace values, potentially denormalizing common parameters like benchmark, bot, test, etc., for direct querying.
    • SourceFilesSchema: Maps source file names to integer IDs.
    • ParamSetsSchema: Stores the unique key-value pairs present in each tile.
    • PostingsSchema: The inverted index, mapping (tile, key-value) to trace IDs.
    • MetadataSchema: Stores JSON metadata for source files.
    • TraceParamsSchema: Maps trace IDs (MD5 hashes) to their full paramtools.Params (stored as JSON).
  • spanner.go: Contains SQL templates and specific configurations (like parallel pool sizes for writes) tailored for Google Cloud Spanner.

Submodule: mocks

  • TraceStore.go: Provides a mock implementation of the TraceStore interface, generated by the mockery tool. This is essential for unit testing components that depend on TraceStore without needing a full database setup.

Key Workflows

Writing Traces

Caller (e.g., ingester) -> TraceStore.WriteTraces(ctx, commitNumber, params[], values[], paramset, sourceFile, timestamp)
  |
  `-> SQLTraceStore.WriteTraces
      |
      | 1. Tile Calculation: tileNumber = TileNumber(commitNumber)
      |
      | 2. Source File ID:
      |    `-> updateSourceFile(ctx, sourceFile) -> sourceFileID
      |        (Queries SourceFiles table, inserts if not exists)
      |
      | 3. ParamSet Update (for the tile):
      |    For each key, value in paramset:
      |      If not in cache(tileNumber, key, value):
      |        Add to batch for ParamSets table insertion
      |    Execute batch insert into ParamSets, update cache
      |
      | 4. For each trace (params[i], values[i]):
      |    | a. Trace ID Calculation: traceID_md5_hex = md5(query.MakeKey(params[i]))
      |    |
      |    | b. Store Trace Params:
      |    |    `-> TraceParamStore.WriteTraceParams(ctx, {traceID_md5_hex: params[i]})
      |    |        (Inserts into TraceParams table if not exists)
      |    |
      |    | c. Add to TraceValues Batch: (traceID_md5_hex, commitNumber, values[i], sourceFileID)
      |    |
      |    | d. Postings Update (for the tile):
      |    |    If not in cache(tileNumber, traceID_md5_hex): // Marks this whole trace as processed for postings
      |    |      For each key, value in params[i]:
      |    |        Add to batch for Postings table: (tileNumber, "key=value", traceID_md5_hex)
      |
      | 5. Execute batch insert into TraceValues (or TraceValues2)
      |
      | 6. Execute batch insert into Postings, update postings cache
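
Step 4a above (trace ID calculation) boils down to hashing the structured trace key. The sketch below illustrates the idea; buildKey is a stand-in for query.MakeKey and is not the real implementation:

  // Standalone illustration of deriving a trace ID as the MD5 of the
  // structured trace key. buildKey is a hypothetical helper that assumes the
  // ",key=value,...," key format.
  package main

  import (
      "crypto/md5"
      "fmt"
      "sort"
  )

  func buildKey(params map[string]string) string {
      keys := make([]string, 0, len(params))
      for k := range params {
          keys = append(keys, k)
      }
      sort.Strings(keys) // keys are sorted so the same params always produce the same key
      key := ","
      for _, k := range keys {
          key += k + "=" + params[k] + ","
      }
      return key
  }

  func main() {
      params := map[string]string{"arch": "x86", "config": "8888", "test": "draw_a_circle"}
      key := buildKey(params)
      traceID := fmt.Sprintf("%x", md5.Sum([]byte(key))) // hex-encoded MD5 of the key
      fmt.Println(key, "->", traceID)
  }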

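Step 4a above derives a fixed-size trace ID by hashing the structured trace key. A minimal sketch of that calculation follows; makeKey here is a simplified stand-in for query.MakeKey.

package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
    "sort"
    "strings"
)

// makeKey builds a structured trace key of the form ",k1=v1,k2=v2," with the
// keys sorted, approximating what query.MakeKey does for a params map.
func makeKey(params map[string]string) string {
    keys := make([]string, 0, len(params))
    for k := range params {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    var b strings.Builder
    b.WriteString(",")
    for _, k := range keys {
        b.WriteString(k + "=" + params[k] + ",")
    }
    return b.String()
}

func main() {
    params := map[string]string{"arch": "x86", "config": "8888", "test": "draw_a_circle"}
    key := makeKey(params)
    sum := md5.Sum([]byte(key))
    traceID := hex.EncodeToString(sum[:]) // Hex-encoded MD5 used as the trace_id.
    fmt.Println(key, traceID)
}
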
Querying for Trace IDs (QueryTracesIDOnly)

Caller -> TraceStore.QueryTracesIDOnly(ctx, tileNumber, query)
  |
  `-> SQLTraceStore.QueryTracesIDOnly
      |
      | 1. Get ParamSet for tile:
      |    `-> GetParamSet(ctx, tileNumber) -> tileParamSet
      |        (Checks OPS cache, falls back to querying ParamSets table)
      |
      | 2. Generate Query Plan: plan = query.QueryPlan(tileParamSet)
      |    (If plan is empty or invalid for tile, return empty channel)
      |
      | 3. Optimization (restrictByCounting):
      |    | For each part of 'plan' (key, or_values[]):
      |    |   `-> DB: COUNT(*) FROM Postings WHERE tile_number=... AND key_value IN (...) LIMIT threshold
      |    | Find the plan part (minKey, minValues) with the smallest count (if count < threshold).
      |    | If any count is 0, plan is skippable.
      |    | If minKey found:
      |    |   `-> DB: SELECT trace_id FROM Postings WHERE tile_number=... AND key_value IN (minValues)
      |    |   `-> restrictClause = "AND trace_id IN (result_ids...)"
      |
      | 4. Execute Query for each plan part (concurrently):
      |    For each key, values[] in 'plan' (excluding minKey if restrictClause is used):
      |      `-> DB: SELECT trace_id FROM Postings
      |               WHERE tile_number=tileNumber AND key_value IN ("key=value1", "key=value2"...)
      |               [restrictClause]
      |               ORDER BY trace_id
      |         -> channel_for_key_N (stream of traceIDForSQL)
      |
      | 5. Intersect Results:
      |    `-> newIntersect(ctx, [channel_for_key_1, channel_for_key_2,...]) -> finalTraceIDsChannel (stream of unique traceIDForSQL)
      |
      | 6. Fetch Full Params (concurrently, in chunks):
      |    For each batch of unique traceIDForSQL from finalTraceIDsChannel:
      |      `-> TraceParamStore.ReadParams(ctx, batch_of_ids) -> map[traceID]Params
      |      For each Params in map:
      |        Send Params to output channel
      |
      `-> Returns output channel of paramtools.Params

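The consumer side of this flow is a simple channel drain. The sketch below uses a hypothetical fake store and simplified types (the real method takes a parsed *query.Query and streams paramtools.Params), but it shows the streaming consumption pattern.

package main

import (
    "context"
    "fmt"
)

// Params mirrors paramtools.Params: the key=value pairs identifying one trace.
type Params map[string]string

// fakeStore stands in for SQLTraceStore for illustration only.
type fakeStore struct{}

// QueryTracesIDOnly streams the Params of every trace in the tile that matches
// the query, mirroring the shape described above.
func (fakeStore) QueryTracesIDOnly(ctx context.Context, tile int32, q map[string][]string) <-chan Params {
    out := make(chan Params, 2)
    out <- Params{"arch": "x86", "config": "8888", "test": "draw_a_circle"}
    out <- Params{"arch": "x86", "config": "8888", "test": "draw_a_rect"}
    close(out)
    return out
}

func main() {
    ctx := context.Background()
    q := map[string][]string{"arch": {"x86"}, "config": {"8888"}}
    // Drain the streaming result; streaming keeps memory bounded even when a
    // query matches a very large number of traces.
    for params := range (fakeStore{}).QueryTracesIDOnly(ctx, 0, q) {
        fmt.Println(params)
    }
}
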
This structured approach, combining interfaces with a robust SQL implementation, allows tracestore to serve as a reliable and performant foundation for Perf's data storage needs.

Module: /go/tracing

Tracing Module Documentation

High-Level Overview

The /go/tracing module is responsible for initializing and configuring tracing capabilities within the Perf application. It leverages the OpenCensus library to provide distributed tracing, allowing developers to understand the flow of requests across different services and components. This is crucial for debugging performance issues, identifying bottlenecks, and gaining insights into the application's behavior in a distributed environment.

Design Decisions and Implementation Choices

The core design principle behind this module is to centralize tracing initialization. This ensures consistency in how tracing is set up across different parts of the application.

  • Conditional Initialization: The Init function provides different initialization paths based on whether the application is running in a local development environment or a deployed environment.

    • Local Environment: In a local setup, loggingtracer.Initialize() is called. This likely configures a simpler, console-based tracer. The rationale is that in local development, detailed, distributed tracing might be overkill, and logging traces to the console is often sufficient for debugging.
    • Deployed Environment: For deployed instances, the tracing.Initialize function from the shared go.skia.org/infra/go/tracing library is used. This enables more sophisticated tracing, likely integrating with a backend tracing system like Jaeger or Stackdriver Trace.
  • Configuration-Driven Sampling: The TraceSampleProportion field of the config.InstanceConfig (cfg) determines the sampling rate for traces. This allows administrators to control the volume of trace data generated, balancing the need for detailed information with the cost and overhead of storing and processing traces. A value of 0.0 would likely disable tracing, while 1.0 would trace every request.

  • Automatic Project ID Detection: The autoDetectProjectID constant being an empty string suggests that the underlying tracing.Initialize function is capable of automatically determining the Google Cloud Project ID when running in a GCP environment. This simplifies configuration as the project ID doesn't need to be explicitly passed.

  • Metadata Enrichment: The map[string]interface{} passed to tracing.Initialize includes:

    • podName: This value is retrieved from the MY_POD_NAME environment variable. This is a common practice in Kubernetes environments to identify the specific pod generating the trace, which is invaluable for pinpointing issues.
    • instance: This is derived from cfg.InstanceName. This helps differentiate traces originating from different Perf instances (e.g., “perf-prod”, “perf-staging”).

Responsibilities and Key Components/Files

  • tracing.go: This is the sole file in this module and contains the Init function.

    • Init(local bool, cfg *config.InstanceConfig) error function:
    • Responsibility: To initialize the tracing system for the application. It acts as the single entry point for tracing setup.
    • How it works:
      1. It takes a local boolean flag and an InstanceConfig pointer as input.
      2. If local is true, it calls loggingtracer.Initialize(). This indicates a preference for a simpler, possibly console-based, tracing mechanism for local development.
      3. If local is false, it initializes tracing for a deployed environment:
        • It retrieves the TraceSampleProportion from cfg.
        • It retrieves the InstanceName from cfg to be used as an attribute.
        • It reads the pod name from the MY_POD_NAME environment variable.
        • It calls tracing.Initialize from the shared go.skia.org/infra/go/tracing library, passing the sampling proportion, autoDetectProjectID (an empty string, relying on automatic detection), and a map of attributes (podName from the environment and instance from the config).
    • Why this approach:
      • Centralizes tracing setup, making it easier to manage and modify.
      • Provides a clear distinction between local and deployed tracing configurations, catering to different needs.
      • Leverages shared tracing libraries (go.skia.org/infra/go/tracing) for common functionality, promoting code reuse.
  • Dependencies:

    • //go/tracing (likely go.skia.org/infra/go/tracing): This is the core shared tracing library providing the Initialize function for robust, distributed tracing. It handles the actual setup of exporters (e.g., to Stackdriver, Jaeger) and samplers.
    • //go/tracing/loggingtracer: This dependency provides a simpler tracer implementation, probably for logging traces to standard output, suitable for local development environments where a full-fledged tracing backend might not be available or necessary.
    • //perf/go/config: This module provides the InstanceConfig struct, which contains application-specific configuration, including the TraceSampleProportion and InstanceName used by the tracing initialization. This decouples tracing configuration from the tracing logic itself.

Key Workflows/Processes

Tracing Initialization Workflow:

Application Startup
       |
       V
Call perf/go/tracing.Init(isLocal, instanceConfig)
       |
       +---- isLocal is true? ----> Call loggingtracer.Initialize() --> Tracing active (console/simple)
       |                                                                    |
       |                                                                    V
       |                                                               Application proceeds
       |
       +---- isLocal is false? ---> Read TraceSampleProportion from instanceConfig
                                    Read InstanceName from instanceConfig
                                    Read MY_POD_NAME environment variable
                                           |
                                           V
                                    Call shared go.skia.org/infra/go/tracing.Initialize(...)
                                    with sampling rate and attributes (podName, instance)
                                           |
                                           V
                                    Tracing active (distributed, e.g., Stackdriver)
                                           |
                                           V
                                    Application proceeds

This workflow illustrates how the Init function adapts the tracing setup based on the execution context (local vs. deployed) and external configuration. The goal is to provide appropriate tracing capabilities with minimal boilerplate in the rest of the application.
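
A condensed sketch of this branching logic is shown below. The placeholder functions stand in for loggingtracer.Initialize and the shared tracing.Initialize, whose exact signatures are not reproduced here; the config struct is likewise simplified.

package main

import (
    "fmt"
    "os"
)

// instanceConfig approximates the fields of config.InstanceConfig used by Init.
type instanceConfig struct {
    TraceSampleProportion float64
    InstanceName          string
}

// Empty string: let the shared library auto-detect the GCP project ID.
const autoDetectProjectID = ""

// initLoggingTracer stands in for loggingtracer.Initialize.
func initLoggingTracer() {
    fmt.Println("console tracer initialized")
}

// initDistributedTracing stands in for go.skia.org/infra/go/tracing.Initialize.
func initDistributedTracing(sample float64, projectID string, attrs map[string]interface{}) error {
    fmt.Println("distributed tracing initialized:", sample, projectID, attrs)
    return nil
}

// initTracing mirrors the decision made by perf/go/tracing.Init.
func initTracing(local bool, cfg instanceConfig) error {
    if local {
        initLoggingTracer()
        return nil
    }
    return initDistributedTracing(cfg.TraceSampleProportion, autoDetectProjectID, map[string]interface{}{
        "podName":  os.Getenv("MY_POD_NAME"),
        "instance": cfg.InstanceName,
    })
}

func main() {
    if err := initTracing(false, instanceConfig{TraceSampleProportion: 0.2, InstanceName: "perf-staging"}); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}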

Module: /go/trybot

The /go/trybot module is responsible for managing performance data generated by trybots. Trybots are automated systems that run tests on code changes (patches or changelists) before they are merged into the main codebase. This module handles the ingestion, storage, and retrieval of these trybot results, allowing developers and performance engineers to analyze the performance impact of proposed code changes.

The core idea is to provide a way to compare the performance characteristics of a pending change against the baseline performance of the current codebase. This helps in identifying potential performance regressions or improvements early in the development cycle.

Key Components and Responsibilities

/go/trybot/trybot.go

This file defines the central data structure TryFile.

  • TryFile: This struct represents a single file containing trybot results.
    • CL: The identifier of the changelist (e.g., a Gerrit change ID). This is crucial for associating results with a specific code change.
    • PatchNumber: The specific patchset within the changelist. Code review systems often allow multiple iterations (patchsets) for a single changelist.
    • Filename: The name of the file where the trybot results are stored, often including a scheme like gs:// indicating its location (e.g., in Google Cloud Storage).
    • Timestamp: When the result file was created. This is important for tracking and ordering results.

/go/trybot/ingester

This submodule is responsible for taking raw result files and transforming them into the TryFile format that the rest of the system understands.

  • /go/trybot/ingester/ingester.go: Defines the Ingester interface.

    • Ingester interface: Specifies a contract for components that can process incoming files (represented by file.File) and produce a stream of trybot.TryFile objects. The Start method initiates this processing, typically in a background goroutine. This design allows for different sources or formats of trybot results to be plugged into the system.
  • /go/trybot/ingester/gerrit/gerrit.go: Provides a concrete implementation of the Ingester interface, specifically for handling trybot results originating from Gerrit code reviews.

    • Gerrit struct: Implements ingester.Ingester. It uses a parser.Parser (from /perf/go/ingest/parser) to understand the content of the result files.
    • New function: Constructor for the Gerrit ingester.
    • Start method:
    • It receives a channel of file.File objects.
    • For each file, it attempts to parse it using parser.ParseTryBot. This method extracts the changelist ID (issue) and patchset number.
    • If parsing is successful, it converts the patchset string to an integer.
    • A trybot.TryFile is created with the extracted CL, patch number, filename, and creation timestamp.
    • This TryFile is then sent to an output channel.
    • It includes metrics (parseCounter, parseFailCounter) to track the success and failure rates of parsing.
    • The use of channels for input (files) and output (ret) facilitates asynchronous processing, meaning the ingester can process files as they become available without blocking other operations.
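
The channel-in/channel-out shape of the Gerrit ingester can be sketched as follows. Types are simplified stand-ins for file.File and trybot.TryFile, and the parse callback stands in for parser.ParseTryBot; the real implementation also records parse metrics.

package main

import (
    "fmt"
    "time"
)

// tryFile mirrors trybot.TryFile with just the fields discussed above.
type tryFile struct {
    CL          string
    PatchNumber int
    Filename    string
    Timestamp   time.Time
}

// inFile is a stand-in for file.File: a name plus its creation time.
type inFile struct {
    Name    string
    Created time.Time
}

// start consumes incoming files and emits parsed TryFiles, skipping files that
// fail to parse, in the same spirit as Gerrit.Start.
func start(files <-chan inFile, parse func(inFile) (cl string, patch int, err error)) <-chan tryFile {
    out := make(chan tryFile)
    go func() {
        defer close(out)
        for f := range files {
            cl, patch, err := parse(f)
            if err != nil {
                continue // The real code increments parseFailCounter here.
            }
            out <- tryFile{CL: cl, PatchNumber: patch, Filename: f.Name, Timestamp: f.Created}
        }
    }()
    return out
}

func main() {
    files := make(chan inFile, 1)
    files <- inFile{Name: "gs://bucket/results.json", Created: time.Now()}
    close(files)
    for tf := range start(files, func(inFile) (string, int, error) { return "123456", 2, nil }) {
        fmt.Printf("%+v\n", tf)
    }
}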

/go/trybot/store

This submodule is responsible for persisting and retrieving TryFile information and the associated performance measurements.

  • /go/trybot/store/store.go: Defines the TryBotStore interface.
    • TryBotStore interface: This interface outlines the contract for storing and retrieving trybot data. This abstraction allows different database backends (e.g., CockroachDB, in-memory stores for testing) to be used.
    • Write(ctx context.Context, tryFile trybot.TryFile) error: Persists a TryFile and its associated data.
    • List(ctx context.Context, since time.Time) ([]ListResult, error): Retrieves a list of unique changelist/patchset combinations that have been processed since a given time. ListResult contains the CL (as a string) and Patch number.
    • Get(ctx context.Context, cl types.CL, patch int) ([]GetResult, error): Fetches all performance results for a specific changelist and patch number. GetResult contains the TraceName (a unique identifier for a specific metric and parameter combination) and its measured Value.
  • /go/trybot/store/mocks/TryBotStore.go: Provides a mock implementation of TryBotStore, generated by the mockery tool. This is essential for unit testing components that depend on TryBotStore without needing a real database.

/go/trybot/results

This submodule focuses on loading and preparing trybot results for analysis and presentation, often by comparing them to baseline data.

  • /go/trybot/results/results.go: Defines the structures for requesting and representing analyzed trybot results.

    • Kind type (TryBot, Commit): Distinguishes whether the analysis request is for trybot data (pre-submit) or for data from an already landed commit (post-submit). This allows the system to handle both scenarios.
    • TryBotRequest struct: Represents a request from a client (e.g., a UI) to get analyzed performance data. It includes the Kind, CL and PatchNumber (for TryBot kind), CommitNumber and Query (for Commit kind). The Query is used to filter the traces to be analyzed when looking at landed commits.
    • TryBotResult struct: Contains the analysis results for a single trace.
    • Params: The key-value parameters that uniquely identify the trace.
    • Median, Lower, Upper, StdDevRatio: Statistical measures derived from the trace data. StdDevRatio is a key metric indicating how much a new value deviates from the historical distribution, helping to flag regressions or improvements.
    • Values: A slice of recent historical values for the trace, with the last value being either the trybot result or the value at the specified commit.
    • TryBotResponse struct: The overall response to a TryBotRequest.
    • Header: Column headers for the data, typically representing commit information.
    • Results: A slice of TryBotResult for each analyzed trace.
    • ParamSet: A collection of all unique parameter key-value pairs present in the results, useful for filtering in a UI.
    • Loader interface: Defines a contract for components that can take a TryBotRequest and produce a TryBotResponse. This involves fetching relevant data, performing statistical analysis, and formatting it.
  • /go/trybot/results/dfloader/dfloader.go: Implements the results.Loader interface using a dataframe.DataFrameBuilder. DataFrames are a common way to represent tabular data for analysis.

    • Loader struct: Holds references to a dataframe.DataFrameBuilder (for constructing DataFrames from trace data), a store.TryBotStore (for fetching trybot-specific measurements), and perfgit.Git (for resolving commit information).
    • TraceHistorySize constant: Defines how many historical data points to load for each trace for comparison.
    • New function: Constructor for the Loader.
    • Load method: This is the core logic for generating the TryBotResponse.
    • Workflow:
      1. Determine Timestamp: If the request is for a Commit, it fetches the commit details (including its timestamp) using perfgit.Git. Otherwise, it uses the current time.
      2. Parse Query: If the request kind is Commit, the provided Query string is parsed. An empty query for a Commit request is an error.
      3. Fetch Baseline Data (DataFrame):
        • If Kind is Commit: It uses dfb.NewNFromQuery to load a DataFrame containing the last TraceHistorySize+1 data points for traces matching the query, up to the commit's timestamp. The “+1” is to hold the value at the commit itself or to be a placeholder.
        • If Kind is TryBot:
          a. It first calls store.Get to retrieve the specific trybot measurements for the given CL and PatchNumber.
          b. It then extracts the trace names from these trybot results.
          c. It calls dfb.NewNFromKeys to load a DataFrame with TraceHistorySize+1 historical data points for these specific trace names.
          d. Crucially, it then replaces the last value in each trace within the DataFrame with the corresponding value obtained from the store.Get call. This effectively injects the trybot's measurement into the historical context for comparison.
          e. If a trybot result exists for a trace that has no historical data in the DataFrame, that trace is removed from the analysis, and rebuildParamSet is flagged.
      4. Prepare Response Header: The DataFrame's header (commit information) is used for the response. If it's a TryBot request, the last header entry (representing the trybot data point) has its Offset set to types.BadCommitNumber to indicate it's not a landed commit.
      5. Calculate Statistics: For each trace in the DataFrame:
        • The trace name (key) is parsed into paramtools.Params.
        • vec32.StdDevRatio is called with the trace values (which now includes the trybot value at the end if applicable). This function calculates the median, lower/upper bounds, and the standard deviation ratio.
        • A results.TryBotResult is created.
        • If StdDevRatio calculation fails (e.g., insufficient data), the trace is skipped, and rebuildParamSet is flagged.
      6. Sort Results: The TryBotResult slice is sorted by StdDevRatio in descending order. This prioritizes potential regressions (high positive ratio) and significant improvements (high negative ratio).
      7. Normalize ParamSet: If rebuildParamSet is true (due to missing traces or parsing errors), the ParamSet for the response is regenerated from the final set of TryBotResults.
      8. The results.TryBotResponse is assembled and returned.
    • This process allows a direct comparison of a tryjob's performance numbers against the recent history of the same metrics on the main branch.
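
The injection step (TryBot path, item d above) amounts to overwriting the last slot of each historical vector. A minimal sketch, using plain maps and slices instead of a dataframe.DataFrame:

package main

import "fmt"

// injectTryBotValues overwrites the final slot of each historical trace with
// the tryjob's measurement, so the subsequent statistics compare the new value
// against the prior TraceHistorySize points. Traces with no history are
// dropped, which is when rebuildParamSet gets flagged.
func injectTryBotValues(history map[string][]float32, trybot map[string]float32) (map[string][]float32, bool) {
    rebuildParamSet := false
    for traceName, value := range trybot {
        trace, ok := history[traceName]
        if !ok || len(trace) == 0 {
            delete(history, traceName)
            rebuildParamSet = true
            continue
        }
        trace[len(trace)-1] = value // Replace the placeholder with the trybot result.
    }
    return history, rebuildParamSet
}

func main() {
    history := map[string][]float32{
        ",test=draw_a_circle,": {10.1, 10.0, 10.2, 0 /* placeholder */},
    }
    trybot := map[string]float32{",test=draw_a_circle,": 12.4}
    out, _ := injectTryBotValues(history, trybot)
    fmt.Println(out)
}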

/go/trybot/samplesloader

This submodule deals with loading raw sample data from trybot result files. Sometimes, instead of just a single aggregated value, trybots might output multiple raw measurements (samples) for a metric.

  • /go/trybot/samplesloader/samplesloader.go: Defines the SamplesLoader interface.

    • SamplesLoader interface: Specifies a method Load(ctx context.Context, filename string) (parser.SamplesSet, error) that takes a filename (URL to the result file) and returns a parser.SamplesSet. A SamplesSet is a map where keys are trace identifiers and values are parser.Samples (which include parameters and a slice of raw float64 sample values).
  • /go/trybot/samplesloader/gcssamplesloader/gcssamplesloader.go: Implements SamplesLoader for files stored in Google Cloud Storage (GCS).

    • loader struct: Holds a gcs.GCSClient for interacting with GCS and a parser.Parser.
    • New function: Constructor for the GCS samples loader.
    • Load method:
    • Parses the input filename (which is a GCS URL like gs://bucket/path/file.json) to extract the bucket and path.
    • Uses the storageClient to read the content of the file from GCS.
    • Parses the file content using format.ParseLegacyFormat (assuming a specific JSON structure for these sample files).
    • Converts the parsed data into a parser.SamplesSet using parser.GetSamplesFromLegacyFormat.
    • This component is essential when detailed analysis of raw samples is needed, rather than just aggregated metrics.

Overall Workflow (Ingestion and Analysis)

A simplified workflow could look like this:

  1. File Arrival: A new trybot result file appears (e.g., uploaded to GCS).

    New File (e.g., in GCS)
    
  2. Ingestion: An ingester.Ingester (like ingester.gerrit.Gerrit) detects and processes this file.

    File --> [Gerrit Ingester] --parses--> trybot.TryFile{CL, PatchNum, Filename, Timestamp}
    
  3. Storage: The TryFile metadata and potentially the parsed values are written to the store.TryBotStore.

    trybot.TryFile --> [TryBotStore.Write] --> Database
    

    (The actual performance values might be stored alongside the TryFile metadata or linked via the Filename if they are in a separate detailed file).

  4. Analysis Request: A user or an automated system requests analysis for a particular CL/Patch via a UI or API, sending a results.TryBotRequest.

    UI/API --sends--> results.TryBotRequest{Kind=TryBot, CL="123", PatchNumber=1}
    
  5. Data Loading and Comparison: The results.dfloader.Loader handles this request.

    results.TryBotRequest
        |
        v
    [dfloader.Loader.Load]
        |
        +--(A)--> [TryBotStore.Get(CL, PatchNum)] --> Trybot specific values (Value_T) for traces T1, T2...
        |
        +--(B)--> [DataFrameBuilder.NewNFromKeys(traceNames=[T1,T2...])] --> Historical data for T1, T2...
        |             (e.g., [V1_hist1, V1_hist2, ..., V1_histN, _placeholder_])
        |
        +--(C)--> Combine: Replace _placeholder_ with Value_T
        |             (e.g., for T1: [V1_hist1, V1_hist2, ..., V1_histN, V1_T])
        |
        +--(D)--> Calculate StdDevRatio, Median, etc. for each trace
        |
        +--(E)--> Sort results
        |
        v
    results.TryBotResponse (sent back to UI/API)

This module is crucial for proactive performance monitoring, enabling teams to catch performance regressions before they land in the main codebase, by systematically ingesting, storing, and analyzing the performance data generated during the pre-submit testing phase. The use of interfaces for storage (TryBotStore), ingestion (Ingester), and results loading (results.Loader) makes the system flexible and extensible.

Module: /go/ts

The go/ts module serves as a utility to generate TypeScript definition files from Go structs. This is crucial for maintaining type safety and consistency between the Go backend and the TypeScript frontend, particularly when dealing with JSON data structures that are exchanged between them. The core problem this module solves is bridging the gap between Go's static typing and TypeScript's type system for data interchange, ensuring that changes in Go struct definitions are automatically reflected in the frontend's TypeScript types.

The primary component is the main.go file. Its responsibility is to:

  1. Parse command-line arguments: It accepts an output path (-o) where the generated TypeScript file will be written.
  2. Instantiate a go2ts.Generator: This is the core engine from the go/go2ts library responsible for the Go-to-TypeScript conversion.
  3. Configure the generator:
    • GenerateNominalTypes = true: This setting likely ensures that the generated TypeScript types are nominal (i.e., types are distinct based on their name, not just their structure), which can provide stronger type checking.
    • AddIgnoreNil: This is used for specific Go types like paramtools.Params, paramtools.ParamSet, paramtools.ReadOnlyParamSet, and types.TraceSet. The name suggests that the possibility of these values being nil in Go is ignored during generation, so the corresponding TypeScript types are emitted as non-nullable rather than as optional or T | null fields.
  4. Register Go structs and unions for conversion:
    • The code extensively uses generator.AddMultiple to register a wide array of Go structs from various perf submodules (e.g., alerts, chromeperf, clustering2, frontend/api, regression). These are the structs that are serialized to JSON and consumed by the frontend. By registering them, the generator knows which Go types to convert into corresponding TypeScript interfaces or types.
    • The addMultipleUnions helper function and generator.AddUnionToNamespace are used to register Go union types (often represented as a collection of constants or an interface implemented by several types). This ensures that TypeScript enums or union types are generated, reflecting the possible values or types a Go field can hold. The typeName argument in unionAndName and the namespace argument in AddUnionToNamespace control how these unions are named and organized in the generated TypeScript.
    • generator.AddToNamespace is used to group related types under a specific namespace in the generated TypeScript, improving organization (e.g., pivot.Request{} is added to the pivot namespace).
  5. Render the TypeScript output: Finally, generator.Render(w) writes the generated TypeScript definitions to the specified output file.

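To make the Go-to-TypeScript conversion concrete, here is a toy, reflection-based sketch of the kind of transformation go2ts performs. This is not the go2ts API, and Alert here is a hypothetical stand-in for the registered Perf structs; the real generator additionally handles nested structs, maps, unions, nominal types, and namespaces.

package main

import (
    "fmt"
    "reflect"
    "strings"
)

// tsType maps a handful of Go kinds to TypeScript types.
func tsType(t reflect.Type) string {
    switch t.Kind() {
    case reflect.String:
        return "string"
    case reflect.Bool:
        return "boolean"
    case reflect.Int, reflect.Int32, reflect.Int64, reflect.Float32, reflect.Float64:
        return "number"
    case reflect.Slice:
        return tsType(t.Elem()) + "[]"
    default:
        return "unknown"
    }
}

// toInterface renders a single Go struct as a TypeScript interface, honoring
// the json tag if present — a toy version of what the generator produces.
func toInterface(v interface{}) string {
    t := reflect.TypeOf(v)
    var b strings.Builder
    fmt.Fprintf(&b, "export interface %s {\n", t.Name())
    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i)
        name := f.Name
        if tag := strings.Split(f.Tag.Get("json"), ",")[0]; tag != "" {
            name = tag
        }
        fmt.Fprintf(&b, "  %s: %s;\n", name, tsType(f.Type))
    }
    b.WriteString("}\n")
    return b.String()
}

// Alert is a hypothetical, simplified stand-in for one of the registered structs.
type Alert struct {
    ID    int      `json:"id"`
    Query string   `json:"query"`
    Owner string   `json:"owner"`
    Steps []string `json:"steps"`
}

func main() {
    fmt.Print(toInterface(Alert{}))
}
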
The design decision to use a dedicated program for this generation task, rather than manual synchronization or other methods, highlights the importance of automation and reducing the likelihood of human error in keeping backend and frontend types aligned. The reliance on the go/go2ts library centralizes the core conversion logic, making this module a consumer and orchestrator of that library for the specific needs of the Skia Perf application.

A key workflow is triggered by the //go:generate directive at the top of main.go: //go:generate bazelisk run --config=mayberemote //:go -- run . -o ../../modules/json/index.ts

This command, when go generate is run (typically as part of a build process), executes the compiled go/ts program.

Workflow:

  1. Developer modifies a Go struct in a perf submodule that is serialized to JSON for the UI.
  2. Developer (or an automated build step) runs go generate within the go/ts module's directory (or a higher-level directory that includes it).
  3. The go:generate directive executes the main function in go/ts/main.go.
  4. main.go -> Uses go2ts.Generator -> Registers relevant Go structs and unions.
  5. go2ts.Generator -> Analyzes registered Go types -> Generates corresponding TypeScript definitions.
  6. main.go -> Writes the TypeScript definitions to ../../modules/json/index.ts.
  7. The frontend can now import and use these up-to-date TypeScript types, ensuring type safety when interacting with JSON data from the backend.

The choice of specific structs and unions registered in main.go reflects the data contracts between the Perf backend and its frontend UI. Any Go struct that is part of an API response or request payload handled by the frontend needs to be included here.

Module: /go/types

Go Types Module

This module defines core data types used throughout the Perf application. These types provide a standardized way to represent fundamental concepts related to commits, performance data (traces), and alert configurations. The design prioritizes clarity, type safety, and consistency across different parts of the system.

Key Concepts and Components:

Commit and Tile Numbering:

  • CommitNumber (types.go): Represents a unique, sequential identifier for a commit within a repository.

    • Why: To provide a simple, linear way to reference commits. It assumes a straightforward, non-branching history for easier indexing and retrieval of performance data associated with specific code changes. The first commit in a repository is assigned CommitNumber(0).
    • How: Implemented as an int32. It includes an Add method for safe offsetting and a BadCommitNumber constant (-1) to represent invalid or non-existent commit numbers.
    • CommitNumberSlice (types.go): A utility type to enable sorting of CommitNumber slices, which is useful for various data processing and display tasks.
  • TileNumber (types.go): Represents an index for a “tile” in the TraceStore. Performance data (traces) are often stored in chunks or tiles for efficient storage and retrieval.

    • Why: Tiling allows for optimized access to performance data, especially for large datasets. Instead of loading entire traces, only relevant tiles need to be accessed.
    • How: Implemented as an int32. Functions like TileNumberFromCommitNumber and TileCommitRangeForTileNumber manage the mapping between commit numbers and tile numbers based on a configurable tileSize. The Prev() method allows navigation to the preceding tile, and BadTileNumber (-1) indicates an invalid tile.

    Workflow: Commit to Tile Mapping

    CommitNumber ----(tileSize)----> TileNumberFromCommitNumber() ----> TileNumber
                                                                      |
                                                                      V
                                          TileCommitRangeForTileNumber() ----> (StartCommit, EndCommit)
    

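The commit-to-tile mapping is simple integer arithmetic. A minimal sketch, assuming plain integer division (the real helpers live in types.go and also handle BadTileNumber and configurable tile sizes):

package main

import "fmt"

// tileNumberFromCommitNumber mirrors the described mapping: integer division
// of the commit number by the configured tile size.
func tileNumberFromCommitNumber(commitNumber, tileSize int32) int32 {
    return commitNumber / tileSize
}

// tileCommitRange returns the first and last commit numbers stored in a tile.
func tileCommitRange(tileNumber, tileSize int32) (int32, int32) {
    start := tileNumber * tileSize
    return start, start + tileSize - 1
}

func main() {
    const tileSize = 256
    tile := tileNumberFromCommitNumber(1000, tileSize) // 1000 / 256 = 3
    start, end := tileCommitRange(tile, tileSize)      // 768 .. 1023
    fmt.Println(tile, start, end)
}
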
Performance Data Representation:

  • Trace (types.go): Represents a sequence of performance measurements, typically corresponding to a specific metric over a series of commits.

    • Why: To provide a simple and efficient way to store and manipulate time-series performance data.
    • How: Implemented as a []float32. The NewTrace function initializes a trace of a given length with a special vec32.MISSING_DATA_SENTINEL value, which is crucial for distinguishing between actual zero values and missing data points. This leverages the go.skia.org/infra/go/vec32 package for optimized float32 vector operations.
  • TraceSet (types.go): A collection of Traces, keyed by a string identifier (trace ID).

    • Why: To group related traces, often corresponding to different metrics measured for the same test or configuration.
    • How: Implemented as a map[string]Trace.

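A sketch of the sentinel-filled initialization described above; the sentinel value used here is illustrative, while the real code uses vec32.MISSING_DATA_SENTINEL:

package main

import "fmt"

// missingDataSentinel is a stand-in for vec32.MISSING_DATA_SENTINEL; the real
// constant is a specific float32 value chosen to never collide with real data.
const missingDataSentinel = float32(1e32)

// newTrace mirrors types.NewTrace: every slot starts as "missing" so that a
// true zero measurement can be distinguished from no measurement at all.
func newTrace(n int) []float32 {
    t := make([]float32, n)
    for i := range t {
        t[i] = missingDataSentinel
    }
    return t
}

func main() {
    fmt.Println(newTrace(4))
}
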
Regression Detection and Alerting:

  • RegressionDetectionGrouping (types.go): An enumeration defining how traces are grouped for regression detection.

    • Why: Different grouping strategies can be more effective for different types of performance data. This allows flexibility in the regression detection process.
    • How: Defined as a string type with constants like KMeansGrouping (cluster traces by shape) and StepFitGrouping (analyze each trace individually for steps). ToClusterAlgo provides a safe way to convert strings to this type.
  • StepDetection (types.go): An enumeration defining the algorithms used to detect significant steps (changes) in individual traces or cluster centroids.

    • Why: Various statistical methods can be employed to identify meaningful performance regressions or improvements. This allows selection of the most appropriate method for the data characteristics.
    • How: Defined as a string type with constants representing different detection methods, such as OriginalStep, AbsoluteStep, PercentStep, CohenStep, and MannWhitneyU. ToStepDetection ensures type-safe conversion from strings.
  • AlertAction (types.go): An enumeration defining the actions to be taken when an anomaly (potential regression) is detected by an alert configuration.

    • Why: To allow configurable responses to detected anomalies, ranging from no action to filing issues or triggering bisection jobs.
    • How: Defined as a string type with constants like NoAction, FileIssue, and Bisection.
  • Domain (types.go): Specifies the range of commits over which an operation (like regression detection) should be performed.

    • Why: To precisely define the scope of analysis.
    • How: A struct containing either N (number of commits) and End (timestamp for the end of the range) or an Offset (a specific commit number).
  • ProgressCallback (types.go): A function type used to provide feedback on the progress of long-running operations.

    • Why: To enable user interfaces or logging systems to display the status of tasks like regression detection.
    • How: Defined as func(message string).
  • CL (types.go): Represents a Change List identifier (e.g., a GitHub Pull Request number).

    • Why: To associate performance data or alerts with specific code changes under review.
    • How: Defined as a string.
  • AnomalyDetectionNotifyType (types.go): Defines the notification mechanism for anomalies.

    • Why: Allows flexibility in how users are informed about detected performance issues.
    • How: String type with constants IssueNotify (send to issue tracker) and NoneNotify (no notification).

Miscellaneous:

  • ProjectId (types.go): Represents a project identifier.

    • Why: Useful in multi-project environments to scope data or configurations.
    • How: Defined as a string with a predefined list AllProjectIds.
  • AllMeasurementStats (types.go): A list of valid statistical suffixes that can be part of performance measurement keys (e.g., “avg”, “max”).

    • Why: To ensure consistency and provide a reference for valid stat types when parsing or generating metric keys.
    • How: A []string slice.

The unit tests in types_test.go focus on validating the logic of CommitNumber arithmetic and the mapping between CommitNumber and TileNumber, ensuring the core indexing mechanisms are correct.

Module: /go/ui

The /go/ui module is responsible for handling frontend requests and preparing data for display in the Perf UI. Its primary purpose is to bridge the gap between user interactions on the frontend (e.g., selecting time ranges, defining queries, or applying formulas) and the backend data sources and processing logic.

This module is designed to be the central point for fetching and transforming performance data into a format that can be readily consumed by the UI. It orchestrates interactions with various other modules, such as those responsible for accessing Git history (/go/git), building dataframes (/go/dataframe), handling data shortcuts (/go/shortcut), and calculating derived metrics (/go/calc).

The key rationale behind this module's existence is to encapsulate the complexity of data retrieval and preparation, providing a clean and consistent API for the frontend. This separation of concerns allows the frontend to focus on presentation and user interaction, while the backend handles the intricacies of data access and manipulation.

The main workflow involves receiving a FrameRequest from the frontend, processing it to fetch and transform data, and then returning a FrameResponse containing the prepared data and display instructions.

Key Components and Files:

  • /go/ui/frame/frame.go: This is the core file of the module.
    • Responsibilities:
    • Defines the structure of frontend requests (FrameRequest) and backend responses (FrameResponse). FrameRequest captures user inputs like time ranges, queries, formulas, and pivot table configurations. FrameResponse packages the resulting data, along with display hints and any relevant messages.
    • Manages the processing of FrameRequest objects. This involves dispatching tasks to other modules based on the request parameters. For example, it uses the dataframe.DataFrameBuilder to fetch data based on queries or trace keys, the calc module to evaluate formulas, and the pivot module to restructure data for pivot tables.
    • Handles different types of requests, such as those based on a specific time range (REQUEST_TIME_RANGE) or a fixed number of recent commits (REQUEST_COMPACT).
    • Orchestrates the retrieval of anomalies from an anomalies.Store and associates them with the relevant traces in the response. This can be done based on time ranges or commit revision numbers.
    • Includes logic to determine the appropriate display mode for the frontend (e.g., plot, pivot table, or just a query input).
    • Implements safeguards like truncating the number of traces in the response if it exceeds a predefined limit, to prevent overwhelming the frontend or the network.
    • Provides functionality to identify “SKP changes” (significant file changes in the Git repository, historically related to Skia Picture files) within the requested commit range, which can be highlighted in the UI.
    • Design Choices & Implementation Details:
    • The ProcessFrameRequest function is the main entry point for handling a request. It creates a frameRequestProcess struct to manage the state of the request processing.
    • The processing is broken down into distinct steps: handling queries, formulas, and keys (shortcuts). Each step typically involves fetching data and then joining it into a single DataFrame.
    • Error handling is centralized in reportError to ensure consistent logging and error propagation.
    • Progress tracking is integrated via the progress.Progress interface, allowing the frontend to display updates during long-running requests.
    • The decision to support both REQUEST_TIME_RANGE and REQUEST_COMPACT request types caters to different user needs: exploring specific historical periods versus viewing the latest trends.
    • The inclusion of anomaly data directly in the FrameResponse aims to provide users with immediate context about significant performance changes alongside the raw data. The system supports fetching anomalies based on either time or revision ranges, offering flexibility depending on how anomalies are tracked and stored.
    • The ResponseFromDataFrame function acts as a final assembly step, taking a processed DataFrame and enriching it with SKP change information, display mode, and handling potential truncation.

A typical request processing flow might look like this:

Frontend Request (FrameRequest)
        |
        V
ProcessFrameRequest() in frame.go
        |
        +------------------------------+-----------------------------+--------------------------+
        |                              |                             |                          |
        V                              V                             V                          V
  (If Queries exist)             (If Formulas exist)         (If Keys exist)           (If Pivot requested)
  doSearch()                     doCalc()                    doKeys()                  pivot.Pivot()
    |                              |                             |                          |
    V                              V                             V                          V
  dfBuilder.NewFromQuery...()    calc.Eval() with               dfBuilder.NewFromKeys...()   Restructure DataFrame
                                 rowsFromQuery/Shortcut()
        |                              |                             |                          |
        +------------------------------+-----------------------------+--------------------------+
        |
        V
  DataFrame construction and merging
        |
        V
  (If anomaly search enabled)
  addTimeBasedAnomaliesToResponse() OR addRevisionBasedAnomaliesToResponse()
    |
    V
  anomalyStore.GetAnomalies...()
        |
        V
ResponseFromDataFrame()
        |
        V
  getSkps() (Find significant file changes)
        |
        V
  Truncate response if too large
        |
        V
  Set DisplayMode
        |
        V
Backend Response (FrameResponse)
        |
        V
Frontend UI

Module: /go/urlprovider

URL Provider Module

The urlprovider module is designed to generate URLs for various pages within the Perf application. This centralized approach ensures consistency in URL generation across different parts of the application and simplifies the process of linking to specific views with pre-filled parameters. The key motivation is to abstract away the complexities of URL query parameter construction and to provide a simple interface for generating links to common Perf views like “Explore”, “MultiGraph”, and “GroupReport”.

The core component of this module is the URLProvider struct. An instance of URLProvider is initialized with a perfgit.Git object. This dependency is crucial because some URL generation, particularly for time-range-based views, requires fetching commit information (specifically timestamps) from the Git repository to define the “begin” and “end” parameters of the URL.

Key Responsibilities and Components:

  • urlprovider.go: This file contains the primary logic for the URL provider.
    • URLProvider struct: Holds a reference to a perfgit.Git instance. This allows it to interact with the Git repository to fetch commit details needed for constructing time-based query parameters.
    • New(perfgit perfgit.Git) *URLProvider: This constructor function creates and returns a new instance of URLProvider. It takes a perfgit.Git object as an argument, which is stored within the struct. This design choice makes the URLProvider stateful with respect to its Git interaction capabilities.
    • Explore(...) string: This method generates a URL for the “Explore” page (/e/).
    • Why: The “Explore” page is used for in-depth analysis of performance data based on various parameters and a specific commit range.
    • How:
      1. It calls getQueryParams to construct the common query parameters like begin, end, and disable_filter_parent_traces. The begin and end timestamps are derived from the provided startCommitNumber and endCommitNumber by querying the perfGit instance. The end timestamp is intentionally shifted forward by one day to ensure that anomalies at the very end of the selected range are visible on the graph.
      2. It then serializes the parameters map (which contains key-value pairs for filtering traces) into a URL-encoded query string using GetQueryStringFromParameters. This encoded string is assigned to the queries parameter of the final URL.
      3. Additional queryParams (passed as url.Values) can be merged into the URL.
      4. The final URL is constructed by appending the encoded query parameters to the base path /e/?.
    • MultiGraph(...) string: This method generates a URL for the “MultiGraph” page (/m/).
    • Why: The “MultiGraph” page allows users to view multiple graphs simultaneously, often identified by a shortcut ID.
    • How:
      1. Similar to Explore, it uses getQueryParams to build the common time-range and filtering parameters.
      2. It specifically adds the shortcut parameter with the provided shortcutId.
      3. Additional queryParams can also be merged.
      4. The final URL is constructed by appending the encoded query parameters to the base path /m/?.
    • GroupReport(param string, value string) string: This static function generates a URL for the “Group Report” page (/u/).
    • Why: The “Group Report” page displays information related to groups of anomalies, specific anomalies, bugs, or revisions. Unlike Explore and MultiGraph, it does not inherently depend on a time range derived from commits, nor does it require complex parameter encoding.
    • How:
      1. It validates the input param against a predefined list of allowed parameters (anomalyGroupID, anomalyIDs, bugID, rev, sid). This is a security and correctness measure to prevent arbitrary parameters from being injected.
      2. If the param is valid, it constructs a simple URL with the provided param and value.
      3. It returns an empty string if the param is invalid.
      4. This function is static (not a method on URLProvider) because it doesn't need access to the perfGit instance or any other state within URLProvider. This simplifies its usage for cases where only a group report URL is needed without initializing a full URLProvider.
    • getQueryParams(...) url.Values: This private helper method is responsible for creating the base set of query parameters common to Explore and MultiGraph.
    • How:
      1. It calls fillCommonParams to set the begin and end parameters based on commit numbers.
      2. It conditionally adds disable_filter_parent_traces=true if requested.
      3. It merges any additional queryParams provided by the caller.
    • fillCommonParams(...): This private helper populates the begin and end timestamp parameters in the provided url.Values.
    • How: It uses the perfGit instance to look up the Commit objects corresponding to the startCommitNumber and endCommitNumber. The timestamps from these commits are then used. As mentioned earlier, the end timestamp is adjusted by adding one day. This separation of concerns keeps the main Explore and MultiGraph methods cleaner.
    • GetQueryStringFromParameters(parameters map[string][]string) string: This helper method converts a map of string slices (representing query parameters where a single key can have multiple values) into a URL-encoded query string.

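The GroupReport validation and URL assembly can be sketched with the standard library alone. This is an illustrative sketch, not the actual implementation; the allowed parameter names are copied from the list above:

package main

import (
    "fmt"
    "net/url"
)

// groupReportURL mirrors the described behavior: only a fixed set of parameter
// names is accepted, and an empty string is returned for anything else.
func groupReportURL(param, value string) string {
    allowed := map[string]bool{
        "anomalyGroupID": true,
        "anomalyIDs":     true,
        "bugID":          true,
        "rev":            true,
        "sid":            true,
    }
    if !allowed[param] {
        return ""
    }
    v := url.Values{}
    v.Set(param, value)
    return "/u/?" + v.Encode()
}

func main() {
    fmt.Println(groupReportURL("bugID", "423006"))  // /u/?bugID=423006
    fmt.Println(groupReportURL("evil", "<script>")) // "" (rejected)
}
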
Key Workflows:

  1. Generating an “Explore” Page URL:

    Caller provides: context, startCommitNum, endCommitNum, filterParams, disableFilterParent, otherQueryParams
      |
      v
    URLProvider.Explore()
      |
      +-------------------------------------+
      |                                     |
      v                                     v
    getQueryParams()                     GetQueryStringFromParameters(filterParams)
      |                                     |
      +--> fillCommonParams()               +--> Encode filterParams
      |      |                                        |
      |      +--> perfGit.CommitFromCommitNumber() -> Get start timestamp
      |      |                                        |
      |      +--> perfGit.CommitFromCommitNumber() -> Get end timestamp, add 1 day
      |      |                                        |
      |      +----------------------------------------+
      |      |
      |      v
      |    Combine begin, end, disableFilterParent, otherQueryParams into url.Values
      |                                     |
      +-------------------------------------+
      |
      v
    Combine base URL ("/e/?"), common query params, and encoded filterParams string
      |
      v
    Return final URL string
    
  2. Generating a “MultiGraph” Page URL:

    Caller provides: context, startCommitNum, endCommitNum, shortcutId, disableFilterParent, otherQueryParams
      |
      v
    URLProvider.MultiGraph()
      |
      v
    getQueryParams()
      |
      +--> fillCommonParams()
      |      |
      |      +--> perfGit.CommitFromCommitNumber() -> Get start timestamp
      |      |
      |      +--> perfGit.CommitFromCommitNumber() -> Get end timestamp, add 1 day
      |      |
      |      +----------------------------------------+
      |      |
      |      v
      |    Combine begin, end, disableFilterParent, otherQueryParams into url.Values
      |
      v
    Add "shortcut=shortcutId" to url.Values
      |
      v
    Combine base URL ("/m/?") and all query params
      |
      v
    Return final URL string
    
  3. Generating a “Group Report” Page URL:

    Caller provides: paramName, paramValue
      |
      v
    urlprovider.GroupReport()
      |
      v
    Validate paramName against allowed list
      |
      +-- (Valid) --> Construct URL: "/u/?" + paramName + "=" + paramValue
      |                 |
      |                 v
      |               Return URL string
      |
      +-- (Invalid) --> Return "" (empty string)
    

The design emphasizes reusability of common parameter generation logic (getQueryParams, fillCommonParams) and clear separation of concerns for generating URLs for different Perf pages. The dependency on perfgit.Git is explicitly managed through the URLProvider struct, making it clear when Git interaction is necessary.

Module: /go/userissue

The userissue module is responsible for managing the association between specific data points in Perf (identified by a trace key and a commit position) and Buganizer issues. This allows users to flag specific performance regressions or anomalies and link them directly to a tracking issue.

The core of this module is the Store interface, which defines the contract for persisting and retrieving these user-issue associations. The primary implementation of this interface is sqluserissuestore, which leverages a SQL database (specifically CockroachDB in this context) to store the data.

Key Responsibilities and Components:

  • store.go: This file defines the central UserIssue struct and the Store interface.

    • UserIssue struct: Represents a single association. It contains:
    • UserId: The email of the user who made the association.
    • TraceKey: A string uniquely identifying a performance metric's trace (e.g., “,arch=x86,config=Release,test=MyTest,”).
    • CommitPosition: An integer representing a specific point in the commit history where the data point exists.
    • IssueId: The numerical ID of the Buganizer issue.
    • Store interface: This interface dictates the operations that any backing store for user issues must support:
    • Save(ctx context.Context, req *UserIssue) error: Persists a new UserIssue association. The implementation must handle potential conflicts, such as trying to save a duplicate entry (same trace key and commit position).
    • Delete(ctx context.Context, traceKey string, commitPosition int64) error: Removes an existing user-issue association based on its unique trace key and commit position. It should handle cases where the specified association doesn't exist.
    • GetUserIssuesForTraceKeys(ctx context.Context, traceKeys []string, startCommitPosition int64, endCommitPosition int64) ([]UserIssue, error): Retrieves all UserIssue associations for a given set of trace keys within a specified range of commit positions. This is crucial for displaying these associations on performance graphs or reports.
  • sqluserissuestore/sqluserissuestore.go: This is the SQL-backed implementation of the Store interface.

    • Design Rationale: Using a SQL database provides robust data integrity, transactional guarantees, and the ability to perform complex queries if needed in the future. CockroachDB is chosen for its scalability and compatibility with PostgreSQL syntax.
    • Implementation Details:
    • It uses a go.skia.org/infra/go/sql/pool for managing database connections.
    • SQL statements are defined as constants and, in the case of listUserIssues, use Go's text/template package to dynamically construct the IN clause for multiple traceKeys. This is a common pattern for handling a variable number of inputs that cannot be expressed with a fixed set of placeholders; the values themselves must still be quoted or parameterized carefully to avoid SQL injection.
    • Save: Inserts a new row into the UserIssues table. It includes a last_modified timestamp.
    • Delete: First, it attempts to retrieve the issue to ensure it exists before attempting deletion. This provides a more informative error message if the record is not found.
    • GetUserIssuesForTraceKeys: Constructs a SQL query using a template to select issues matching the provided trace keys and commit position range. It then iterates over the query results and populates a slice of UserIssue structs.
  • sqluserissuestore/schema/schema.go: This file defines the Go struct UserIssueSchema which directly maps to the SQL table schema for UserIssues.

    • Purpose: This provides a typed representation of the database table, making it easier to reason about the data structure and to potentially use with ORM-like tools or schema migration utilities.
    • Key Fields:
    • user_id TEXT NOT NULL
    • trace_key TEXT NOT NULL
    • commit_position INT NOT NULL
    • issue_id INT NOT NULL
    • last_modified TIMESTAMPTZ DEFAULT now()
    • PRIMARY KEY(trace_key, commit_position): The combination of trace_key and commit_position uniquely identifies a user issue, preventing multiple issues from being associated with the exact same data point.
  • mocks/Store.go: This contains a mock implementation of the Store interface, generated using the testify/mock library.

    • Purpose: This is essential for unit testing components that depend on the userissue.Store without requiring a live database connection. It allows developers to define expected calls and return values for the store's methods.

Workflow Example: Saving a User Issue

  1. User Action: A user on the Perf frontend identifies a data point (e.g., on a graph) and associates it with a Buganizer issue ID.
  2. API Request: The frontend sends a request to a backend API endpoint.
  3. Backend Handler: The API handler receives the request, which includes the user's ID, the trace key, the commit position, and the issue ID.
  4. Store Interaction: The handler creates a userissue.UserIssue struct and calls the Save method on an instance of userissue.Store (likely sqluserissuestore.UserIssueStore).

    User Request (UI)
      |
      v
    API Endpoint
      |
      v
    Backend Handler
      |
      | Creates userissue.UserIssue{UserId:"...", TraceKey:"...", CommitPosition:123, IssueId:45678}
      v
    userissue.Store.Save(ctx, &issue)
      |
      v
    sqluserissuestore.UserIssueStore.Save()
      |
      | Constructs SQL: INSERT INTO UserIssues (...) VALUES ($1, $2, $3, $4, $5)
      v
    SQL Database (UserIssues Table) <-- Row inserted

Workflow Example: Retrieving User Issues for a Chart

  1. User Action: A user views a performance chart displaying multiple traces over a range of commits.
  2. Frontend Request: The frontend needs to know if any data points on the visible traces and commit range have associated issues. It requests this information from a backend API.
  3. Backend Handler: The API handler receives the list of trace keys visible on the chart and the start/end commit positions.
  4. Store Interaction: The handler calls GetUserIssuesForTraceKeys on the userissue.Store.

    Chart Display Request (UI)
      |
      | Provides: traceKeys=["trace1", "trace2"], startCommit=100, endCommit=200
      v
    API Endpoint
      |
      v
    Backend Handler
      |
      v
    userissue.Store.GetUserIssuesForTraceKeys(ctx, traceKeys, startCommit, endCommit)
      |
      v
    sqluserissuestore.UserIssueStore.GetUserIssuesForTraceKeys()
      |
      | Constructs SQL: SELECT ... FROM UserIssues WHERE trace_key IN ('trace1', 'trace2') AND commit_position>=100 AND commit_position<=200
      v
    SQL Database (UserIssues Table)
      |
      | Returns rows matching the query
      v
    Backend Handler
      |
      | Formats response
      v
    API Endpoint
      |
      v
    UI (displays issue markers on chart)

The design emphasizes a clear separation of concerns with the Store interface, allowing for different storage backends if necessary (though SQL is the current and likely long-term choice). The SQL implementation is straightforward, using parameterized queries for security and templates for dynamic query construction where appropriate.
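
The template-driven IN clause mentioned above can be sketched with text/template as follows. This is a simplified illustration (quoting done inline for brevity); the real statement is a constant in sqluserissuestore.go and runs against the UserIssues table:

package main

import (
    "fmt"
    "os"
    "strings"
    "text/template"
)

// listTmpl expands to one quoted trace key per element, producing the IN (...)
// clause for a variable number of keys plus the commit position range filter.
var listTmpl = template.Must(template.New("list").Funcs(template.FuncMap{
    "quote": func(s string) string { return "'" + strings.ReplaceAll(s, "'", "''") + "'" },
}).Parse(
    `SELECT user_id, trace_key, commit_position, issue_id
FROM UserIssues
WHERE trace_key IN ({{range $i, $k := .Keys}}{{if $i}}, {{end}}{{quote $k}}{{end}})
  AND commit_position >= {{.Start}} AND commit_position <= {{.End}}`))

func main() {
    data := struct {
        Keys       []string
        Start, End int64
    }{
        Keys:  []string{",arch=x86,test=draw_a_circle,", ",arch=arm,test=draw_a_circle,"},
        Start: 100,
        End:   200,
    }
    if err := listTmpl.Execute(os.Stdout, data); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}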

Module: /go/workflows

Overview

This module defines and implements Temporal workflows for automating tasks related to performance anomaly detection and analysis in Skia Perf. It orchestrates interactions between various services like the AnomalyGroup service, Culprit service, and Gerrit service to achieve end-to-end automation. The primary goal is to streamline the process of identifying performance regressions, finding their root causes (culprits), and notifying relevant parties.

The workflows are designed to be resilient and fault-tolerant, leveraging Temporal's capabilities for retries and state management. This ensures that even if individual steps or external services encounter transient issues, the overall process can continue and eventually complete.

Responsibilities and Key Components

The module is structured into a public API (workflows.go) and an internal implementation package (internal/).

workflows.go:

  • Purpose: Defines the public interface for the workflows, including their names and the data structures for their parameters and results.
  • Why: This separation allows other modules (clients) to trigger these workflows without needing to know the internal implementation details or depend on the specific libraries used within the workflows. It acts as a contract.
  • Key Contents:
    • Workflow Name Constants (ProcessCulprit, MaybeTriggerBisection): These string constants are the canonical names used to invoke the respective workflows via the Temporal client. Using constants helps avoid typos and ensures consistency.
    • Parameter and Result Structs (ProcessCulpritParam, ProcessCulpritResult, MaybeTriggerBisectionParam, MaybeTriggerBisectionResult): These structs define the data that needs to be passed into a workflow and the data that a workflow is expected to return upon completion. They ensure type safety and clarity in communication.

internal/ package: This package contains the actual implementation of the workflows and their associated activities. Activities are the building blocks of Temporal workflows, representing individual units of work that can be executed, retried, and timed out independently.

  • options.go:

    • Purpose: Centralizes the configuration for Temporal activities and child workflows.
    • Why: Provides a consistent way to define timeouts and retry policies. This makes it easier to manage and adjust these settings globally or for specific categories of operations. For example, short-lived activities interacting with external services have different reliability characteristics than long-running child workflows.
    • Key Components:
    • regularActivityOptions: Defines default options (e.g., 1-minute timeout, 10 retry attempts) for standard activities that are expected to complete quickly, like API calls to other services.
    • childWorkflowOptions: Defines options for child workflows (e.g., 12-hour execution timeout, 4 retry attempts). This longer timeout accommodates potentially resource-intensive tasks like bisections which involve compilation and testing.
  • maybe_trigger_bisection.go:

    • Purpose: Implements the MaybeTriggerBisectionWorkflow, which is the core logic for deciding whether to automatically find the cause of a performance regression (bisection) or to simply report the anomaly.
    • Why: This workflow automates a critical decision point in the performance analysis pipeline. It aims to reduce manual intervention by automatically initiating bisections for significant regressions while still allowing for manual reporting of less critical issues.
    • Key Workflow Steps:
    • Wait: Pauses for a defined duration (_WAIT_TIME_FOR_ANOMALIES, e.g., 30 minutes). This allows time for related anomalies to be detected and grouped together, potentially providing a more comprehensive picture before taking action.
    • Load Anomaly Group: Retrieves details of the specific anomaly group using an activity that calls the AnomalyGroup service.
    • Decision (Bisect or Report): Based on the GroupAction field of the anomaly group (the full flow is diagrammed under Key Workflows/Processes below):
      - If BISECT:
        a. Load Top Anomaly: Fetches the most significant anomaly within the group.
        b. Resolve Commit Hashes: Converts the start and end commit positions of the anomaly into Git commit hashes using an activity that interacts with a Gerrit/Crrev service.
        c. Launch Bisection (Child Workflow): Triggers a separate CulpritFinderWorkflow (defined in the pinpoint/go/workflows module) as a child workflow. This child workflow is responsible for performing the actual bisection.
           - A unique ID is generated for the Pinpoint job.
           - The child workflow is configured with ParentClosePolicy: ABANDON, meaning it will continue running even if this parent workflow terminates. This is crucial because bisections can be long-running.
           - Callback parameters are passed to the child workflow so it knows how to report its findings back (e.g., which Anomaly Group ID it's associated with, which Culprit service to use).
        d. Update Anomaly Group: Records the ID of the launched bisection job back into the AnomalyGroup.
      - If REPORT:
        a. Load Top Anomalies: Fetches a list of the top N anomalies in the group.
        b. Notify User: Calls an activity that uses the Culprit service to file a bug or send a notification about these anomalies.
    • Helper Functions:
    • parseStatisticNameFromChart, benchmarkStoriesNeedUpdate, updateStoryDescriptorName: These functions handle specific data transformations needed to correctly format parameters for the Pinpoint bisection request, often due to legacy conventions or differences in how metrics are named.
  • process_culprit.go:

    • Purpose: Implements the ProcessCulpritWorkflow, which handles the results of a completed bisection (i.e., when one or more culprits are identified).
    • Why: This workflow bridges the gap between a successful bisection and making that information actionable. It ensures that found culprits are stored and that users are notified appropriately.
    • Key Workflow Steps:
    • Convert Commits: Transforms the commit data from the Pinpoint format to the format expected by the Culprit service. This involves parsing repository URLs.
    • Persist Culprit: Calls an activity to store the identified culprit(s) in a persistent datastore via the Culprit service.
    • Notify User of Culprit: Calls an activity to notify users (e.g., by filing or updating a bug) about the identified culprit(s) via the Culprit service.
    • Helper Function:
    • ParsePinpointCommit: Handles the parsing of repository URLs from the Pinpoint commit format (e.g., https://{host}/{project}.git) into separate host and project components required by the Culprit service.
  • anomalygroup_service_activity.go:

    • Purpose: Defines activities that interact with the AnomalyGroup gRPC service.
    • Why: Encapsulates the client-side logic for communicating with the AnomalyGroup service. This makes the workflows themselves cleaner and focuses them on orchestration rather than low-level RPC details.
    • Key Activities:
    • LoadAnomalyGroupByID: Fetches an anomaly group by its ID.
    • FindTopAnomalies: Retrieves the most significant anomalies within a group.
    • UpdateAnomalyGroup: Updates an existing anomaly group (e.g., to add a bisection ID).
  • culprit_service_activity.go:

    • Purpose: Defines activities that interact with the Culprit gRPC service.
    • Why: Similar to anomalygroup_service_activity.go, this encapsulates communication with the Culprit service.
    • Key Activities:
    • PeristCulprit: Stores culprit information.
    • NotifyUserOfCulprit: Notifies users about a found culprit (e.g., by creating a bug).
    • NotifyUserOfAnomaly: Notifies users about a set of anomalies (used when the group action is REPORT).
  • gerrit_service_activity.go:

    • Purpose: Defines activities for interacting with Gerrit or a Gerrit-like service (specifically Crrev in this case) to resolve commit positions to commit hashes.
    • Why: Bisection workflows often start with commit positions (which are easier for humans or detection systems to reason about initially) but need actual Git hashes to perform the bisection. This activity provides that translation.
    • Key Activity:
    • GetCommitRevision: Takes a commit position (as an integer) and returns its corresponding Git hash.

worker/main.go:

  • Purpose: This is the entry point for the Temporal worker process that hosts and executes the workflows and activities defined in this module.
  • Why: Temporal workers are the processes that actually run the workflow and activity code. This main function sets up the worker, connects it to the Temporal server, and registers the workflows and activities it's capable of handling.
  • Key Operations:
    1. Initialization: Sets up logging and Prometheus metrics.
    2. Temporal Client Creation: Establishes a connection to the Temporal frontend service.
    3. Worker Creation: Creates a new Temporal worker associated with a specific task queue (e.g., localhost.dev or a production queue name). Workflows and activities are dispatched to workers listening on the correct task queue.
    4. Workflow Registration: Registers ProcessCulpritWorkflow and MaybeTriggerBisectionWorkflow with the worker, associating them with their public names (e.g., workflows.ProcessCulprit).
    5. Activity Registration: Registers instances of the activity structs (e.g., CulpritServiceActivity, AnomalyGroupServiceActivity, GerritServiceActivity) with the worker.
    6. Worker Start: Starts the worker, which begins polling the specified task queue for tasks to execute.
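
The steps above roughly correspond to the following condensed sketch; flag parsing, metrics setup, the connection details, and the exact import paths are simplified or assumed here.

package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"

	"go.skia.org/infra/perf/go/workflows"          // public names (assumed path)
	"go.skia.org/infra/perf/go/workflows/internal" // implementations (assumed path)
)

func main() {
	// 2. Connect to the Temporal frontend service.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal.example.com:7233", // illustrative address
		Namespace: "perf",                      // illustrative namespace
	})
	if err != nil {
		log.Fatalf("unable to create Temporal client: %v", err)
	}
	defer c.Close()

	// 3. Create a worker bound to a specific task queue.
	w := worker.New(c, "localhost.dev", worker.Options{})

	// 4. Register workflows under their public names so clients can start
	// them using the constants defined in workflows.go.
	w.RegisterWorkflowWithOptions(internal.MaybeTriggerBisectionWorkflow,
		workflow.RegisterOptions{Name: workflows.MaybeTriggerBisection})
	w.RegisterWorkflowWithOptions(internal.ProcessCulpritWorkflow,
		workflow.RegisterOptions{Name: workflows.ProcessCulprit})

	// 5. Register activity structs; every exported method becomes an activity.
	w.RegisterActivity(&internal.AnomalyGroupServiceActivity{})
	w.RegisterActivity(&internal.CulpritServiceActivity{})
	w.RegisterActivity(&internal.GerritServiceActivity{})

	// 6. Poll the task queue until interrupted.
	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalf("worker exited: %v", err)
	}
}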

Key Workflows/Processes

1. Anomaly Group Processing and Potential Bisection (MaybeTriggerBisectionWorkflow)

External Trigger (e.g., new AnomalyGroup created)
       |
       v
Start MaybeTriggerBisectionWorkflow(AG_ID)
       |
       +----------------------------------+
       | Wait (e.g., 30 mins)             |
       +----------------------------------+
       |
       v
LoadAnomalyGroupByID(AG_ID) ----> AnomalyGroup Service
       |
       +-----------+
       | GroupAction?|
       +-----------+
          /       \
         /         \
    BISECT         REPORT
      |              |
      v              v
FindTopAnomalies(AG_ID, Limit=1)  FindTopAnomalies(AG_ID, Limit=10)
      |              |
      v              v
GetCommitRevision(StartCommit) --> Gerrit   Anomalies --> Convert to CulpritService format
      |              |
      v              v
GetCommitRevision(EndCommit)   --> Gerrit   NotifyUserOfAnomaly(AG_ID, Anomalies) --> Culprit Service
      |
      v
Execute Pinpoint.CulpritFinderWorkflow (Child)
      |   (Async, ParentClosePolicy=ABANDON)
      |   Params: {StartHash, EndHash, Config, Benchmark, Story, ...
      |            CallbackParams: {AG_ID, CulpritServiceURL, GroupingTaskQueue}}
      |
      v
UpdateAnomalyGroup(AG_ID, BisectionID) --> AnomalyGroup Service
      |
      v
End Workflow
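
A hedged Go sketch of this flow, showing how the Temporal primitives and the options from options.go fit together, follows. Activities are referenced by their registered method-name strings, several intermediate steps (FindTopAnomalies, GetCommitRevision) are elided, and the parameter/result shapes and the child workflow name are illustrative placeholders.

package internal

import (
	"time"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// anomalyGroup is a placeholder for the AnomalyGroup service response.
type anomalyGroup struct {
	GroupAction string // "BISECT" or "REPORT"
}

func MaybeTriggerBisectionWorkflow(ctx workflow.Context, anomalyGroupID string) error {
	// Regular activities: short timeout, several retries (see options.go).
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
		RetryPolicy:         &temporal.RetryPolicy{MaximumAttempts: 10},
	})

	// 1. Wait so related anomalies have time to be detected and grouped.
	if err := workflow.Sleep(ctx, 30*time.Minute); err != nil {
		return err
	}

	// 2. Load the anomaly group via the AnomalyGroup service activity.
	var group anomalyGroup
	if err := workflow.ExecuteActivity(ctx, "LoadAnomalyGroupByID", anomalyGroupID).Get(ctx, &group); err != nil {
		return err
	}

	if group.GroupAction == "BISECT" {
		// (The real workflow first loads the top anomaly and resolves commit
		// positions to hashes via GetCommitRevision; elided here.)
		// 3. Launch the Pinpoint culprit finder as an abandoned child
		// workflow so a long bisection outlives this parent.
		childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
			WorkflowExecutionTimeout: 12 * time.Hour,
			RetryPolicy:              &temporal.RetryPolicy{MaximumAttempts: 4},
			ParentClosePolicy:        enumspb.PARENT_CLOSE_POLICY_ABANDON,
		})
		child := workflow.ExecuteChildWorkflow(childCtx, "pinpoint.culprit_finder", anomalyGroupID)
		// Block only until the child has started, not until it completes.
		if err := child.GetChildWorkflowExecution().Get(ctx, nil); err != nil {
			return err
		}
		// 4. Record the launched bisection back on the group (the real call
		// also passes the generated bisection ID).
		return workflow.ExecuteActivity(ctx, "UpdateAnomalyGroup", anomalyGroupID).Get(ctx, nil)
	}

	// REPORT: ask the Culprit service to notify users about the anomalies
	// (the real workflow first fetches the top N anomalies).
	return workflow.ExecuteActivity(ctx, "NotifyUserOfAnomaly", anomalyGroupID).Get(ctx, nil)
}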

2. Processing Bisection Results (ProcessCulpritWorkflow)

This workflow is typically triggered as a callback by the Pinpoint CulpritFinderWorkflow when it successfully identifies a culprit.

Pinpoint.CulpritFinderWorkflow completes
       | (Calls back to Temporal, invoking ProcessCulpritWorkflow)
       v
Start ProcessCulpritWorkflow(Commits, AG_ID, CulpritServiceURL)
       |
       +----------------------------------+
       | Convert Pinpoint Commits to       |
       | Culprit Service Format           |
       | (Parse Repository URLs)          |
       +----------------------------------+
       |
       v
PersistCulprit(Commits, AG_ID) --------> Culprit Service
       | (Returns CulpritIDs)
       v
NotifyUserOfCulprit(CulpritIDs, AG_ID) -> Culprit Service
       | (Returns IssueIDs, e.g., bug numbers)
       v
End Workflow
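
The repository-URL handling performed during the conversion step (the ParsePinpointCommit helper) can be approximated as follows; the function name, signature, and error text are assumptions.

package internal

import (
	"fmt"
	"net/url"
	"strings"
)

// parseRepositoryURL splits a Pinpoint repository URL of the form
// https://{host}/{project}.git into the host and project fields the Culprit
// service expects, e.g.
// "https://chromium.googlesource.com/chromium/src.git" ->
// ("chromium.googlesource.com", "chromium/src").
func parseRepositoryURL(repo string) (host, project string, err error) {
	u, err := url.Parse(repo)
	if err != nil {
		return "", "", fmt.Errorf("invalid repository url %q: %w", repo, err)
	}
	if u.Host == "" || u.Path == "" {
		return "", "", fmt.Errorf("repository url %q is missing a host or project", repo)
	}
	// "/chromium/src.git" -> "chromium/src"
	project = strings.TrimSuffix(strings.TrimPrefix(u.Path, "/"), ".git")
	return u.Host, project, nil
}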

Module: /integration

The /integration module provides a dataset and tools for conducting integration tests on the Perf performance monitoring system. Its primary purpose is to offer a controlled and reproducible environment for verifying the ingestion and processing capabilities of Perf.

The core of this module is the data subdirectory. This directory houses a collection of JSON files, each representing performance data associated with specific commits from the perf-demo-repo (https://github.com/skia-dev/perf-demo-repo.git). These files are structured according to the format.Format schema defined in go.skia.org/infra/perf/go/ingest/format. This standardized format is crucial as it allows Perf's ‘dir’ type ingester to directly consume these files. The dataset is intentionally designed to include a mix of valid data points and specific error conditions:

  • Nine “good” files: These represent typical, valid performance data that Perf should successfully ingest and process. Each file corresponds to a known commit in the perf-demo-repo.
  • One file with a “bad” commit: This file (demo_data_commit_10.json) contains a git_hash that does not correspond to an actual commit in the perf-demo-repo. This allows testing how Perf handles data associated with unknown or invalid commit identifiers.
  • One malformed JSON file: malformed.json is intentionally not a valid JSON file. This is used to test Perf's error handling capabilities when encountering incorrectly formatted input data.

The generation of these data files is handled by generate_data.go. This Go program is responsible for creating the JSON files in the data directory. It uses a predefined list of commit hashes from the perf-demo-repo and generates random but plausible performance metrics for each. The inclusion of this generator script is important because it allows developers to easily modify, expand, or regenerate the test dataset if the testing requirements change or if new scenarios need to be covered. The script uses math/rand for generating some variability in the measurement values, ensuring the data isn't entirely static while still being predictable.
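
A hedged sketch of the kind of generator generate_data.go implements is shown below. The struct only approximates the format.Format schema (the real generator uses the types from go.skia.org/infra/perf/go/ingest/format directly), and the commit hash and parameter keys are made up.

package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
	"os"
)

type result struct {
	Key         map[string]string `json:"key"`
	Measurement float32           `json:"measurement"`
}

type ingestFile struct {
	Version int               `json:"version"`
	GitHash string            `json:"git_hash"`
	Key     map[string]string `json:"key"`
	Results []result          `json:"results"`
}

func main() {
	// Commit hashes would come from the perf-demo-repo; this one is made up.
	hashes := []string{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
	for i, hash := range hashes {
		f := ingestFile{
			Version: 1,
			GitHash: hash,
			Key:     map[string]string{"arch": "x86", "config": "8888"},
			Results: []result{
				// Randomized but plausible measurement, as described above.
				{Key: map[string]string{"test": "encode", "units": "ms"}, Measurement: 10 + rand.Float32()},
			},
		}
		b, err := json.MarshalIndent(f, "", "  ")
		if err != nil {
			panic(err)
		}
		if err := os.WriteFile(fmt.Sprintf("data/demo_data_commit_%d.json", i+1), b, 0o644); err != nil {
			panic(err)
		}
	}
}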

The key workflow for utilizing this module in an integration test scenario would look something like this:

  1. Setup Perf: Configure a local instance of Perf.
  2. Configure Ingester: Point Perf's ‘dir’ type ingester to the /integration/data directory. Perf Instance --> Ingester (type: 'dir') --> /integration/data/*.json
  3. Run Ingestion: Trigger the ingestion process in Perf.
  4. Verify:
    • Confirm that the data from the nine “good” files is correctly ingested and displayed in Perf.
    • Check that Perf appropriately handles the file with the “bad” commit (e.g., logs an error, flags the data).
    • Verify that Perf correctly identifies and reports the error with the malformed.json file.

The BUILD.bazel file defines how the components of this module are built.

  • The data filegroup makes the JSON test files available to other parts of the system, specifically for use in performance testing (//perf:__subpackages__).
  • The integration_lib go_library encapsulates the logic from generate_data.go.
  • The integration go_binary provides an executable to run generate_data.go, allowing for easy regeneration of the test data.

In essence, the /integration module provides a self-contained, version-controlled set of test data and a mechanism to regenerate it. This is crucial for ensuring the stability and correctness of Perf's data ingestion pipeline by providing a consistent baseline for integration testing. The choice to include both valid and intentionally erroneous data points allows for comprehensive testing of Perf's data handling capabilities, including its robustness in the face of invalid input.

Module: /jupyter

The /jupyter module provides tools and examples for interacting with Skia's performance data, specifically data from perf.skia.org. The primary goal is to enable users to programmatically query, analyze, and visualize performance metrics using the power of Python libraries like Pandas, NumPy, and Matplotlib within a Jupyter Notebook environment.

The core functionality revolves around fetching and processing performance data. This is achieved by providing Python functions that abstract the complexities of interacting with the perf.skia.org API. This allows users to focus on the data analysis itself rather than the underlying data retrieval mechanisms.

Key Components/Files:

  • /jupyter/Perf+Query.ipynb: This is a Jupyter Notebook that serves as both an example and a utility library.

    • Why: It demonstrates how to use the provided Python functions to query performance data. It also contains the definitions of these key functions, making it a self-contained environment for performance analysis. The notebook format is chosen for its interactive nature, allowing users to execute code snippets, see results immediately, and experiment with different queries and visualizations.

    • How:

    • perf_calc(formula): This function is designed to evaluate a specific formula against the performance data. It takes a string formula (e.g., 'count(filter(""))') as input. The formula is sent to the perf.skia.org backend for processing. This function is useful when you need to perform calculations or aggregations on the data directly on the server side before retrieving it.

    • perf_query(query): This function allows for more direct querying of performance data based on key-value pairs. It takes a query string (e.g., 'source_type=skp&sub_result=min_ms') that specifies the parameters for data retrieval. This is suitable when you want to fetch raw or filtered trace data.

    • perf_impl(body): This is an internal helper function used by both perf_calc and perf_query. It handles the actual HTTP communication with perf.skia.org. It first determines the time range for the query (typically the last 50 commits by default) by fetching initial page data. Then, it sends the query or formula to the /_/frame/start endpoint, polls the /_/frame/status endpoint until the request is successful, and finally retrieves the results from /_/frame/results. The results are then processed into a Pandas DataFrame, which is a powerful data structure for analysis in Python. A special value 1e32 from the backend (often representing missing or invalid data) is converted to np.nan (Not a Number) for better handling in Pandas.

    • paramset(): This utility function fetches the available parameter set from perf.skia.org. This is useful for discovering the possible values for different dimensions like ‘model’, ‘test’, ‘cpu_or_gpu’, etc., which can then be used to construct more targeted queries.

    • Examples: The notebook is rich with examples showcasing how to use perf_calc and perf_query, plot the resulting DataFrames using Pandas' built-in plotting capabilities or Matplotlib directly, normalize data, calculate means, and perform more complex analyses like finding the noisiest hardware models or comparing CPU vs. GPU performance for specific tests. These examples serve as practical starting points for users.

    • Workflow (Simplified perf_impl):

    • Client (Jupyter Notebook) -- GET /_/initpage/ --> perf.skia.org (Get time bounds)

    • perf.skia.org -- Initial Data (JSON) --> Client

    • Client -- POST /_/frame/start (with query/formula & time bounds) --> perf.skia.org

    • perf.skia.org -- Request ID (JSON) --> Client

    • Client -- GET /_/frame/status/{ID} --> perf.skia.org (Loop until ‘Success’)

    • perf.skia.org -- Status (JSON) --> Client

    • Client -- GET /_/frame/results/{ID} --> perf.skia.org

    • perf.skia.org -- Performance Data (JSON) --> Client

    • Client (Python): Parse JSON -> Create Pandas DataFrame -> Return DataFrame to user.

  • /jupyter/README.md: This file provides instructions on setting up the necessary Python environment to run Jupyter Notebooks and the required libraries (Pandas, SciPy, Matplotlib).

    • Why: Python environment management can be tricky, especially with system-wide installations. Using a virtual environment (virtualenv) is recommended to isolate project dependencies and avoid conflicts.
    • How: It guides the user through installing pip, python-dev, and python-virtualenv using apt-get (assuming a Debian-based Linux system). It then shows how to create a virtual environment, activate it, upgrade pip, and install jupyter, notebook, scipy, pandas, and matplotlib within that isolated environment. Finally, it explains how to run the Jupyter Notebook server and deactivate the environment when done. This ensures a reproducible and clean setup for users wanting to utilize the Perf+Query.ipynb notebook.

The design emphasizes ease of use for data analysts and developers who need to interact with Skia's performance data. By leveraging Jupyter Notebooks, it provides an interactive and visual way to explore performance trends and issues. The abstraction of API calls into simple Python functions (perf_calc, perf_query) significantly lowers the barrier to entry for accessing this rich dataset.
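
For readers who want to script the same endpoints outside the notebook, the request/poll/fetch protocol used by perf_impl can be sketched in Go as below. The JSON field names ("id", "state") and the request body shape are assumptions; consult the notebook for the exact payloads.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

const base = "https://perf.skia.org"

func main() {
	// 1. Kick off the query; this body shape is only illustrative.
	body := []byte(`{"queries": ["source_type=skp&sub_result=min_ms"]}`)
	resp, err := http.Post(base+"/_/frame/start", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	var start map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&start); err != nil {
		panic(err)
	}
	resp.Body.Close()
	id, _ := start["id"].(string) // assumed field name for the request ID

	// 2. Poll the status endpoint until the request reports success.
	for {
		s, err := http.Get(fmt.Sprintf("%s/_/frame/status/%s", base, id))
		if err != nil {
			panic(err)
		}
		var status map[string]any
		if err := json.NewDecoder(s.Body).Decode(&status); err != nil {
			panic(err)
		}
		s.Body.Close()
		if status["state"] == "Success" { // assumed field name and value
			break
		}
		time.Sleep(time.Second)
	}

	// 3. Fetch the finished results (trace data as JSON).
	r, err := http.Get(fmt.Sprintf("%s/_/frame/results/%s", base, id))
	if err != nil {
		panic(err)
	}
	defer r.Body.Close()
	var results map[string]any
	if err := json.NewDecoder(r.Body).Decode(&results); err != nil {
		panic(err)
	}
	fmt.Printf("received %d top-level keys\n", len(results))
}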

Module: /lint

The /lint module is responsible for ensuring code quality and consistency within the project by integrating and configuring JSHint, a popular JavaScript linting tool.

The primary goal of this module is to provide a standardized way to identify and report potential errors, stylistic issues, and anti-patterns in the JavaScript codebase. This helps maintain code readability, reduces the likelihood of bugs, and promotes adherence to established coding conventions.

The core component of this module is the reporter.js file. This file defines a custom reporter function that JSHint will use to format and output the linting results.

The decision to implement a custom reporter stems from the need to present linting errors in a clear, concise, and actionable format. Instead of relying on JSHint's default output, which might be too verbose or not ideally suited for the project's workflow, reporter.js provides a tailored presentation.

The reporter function within reporter.js takes an array of error objects (res) as input, where each object represents a single linting issue found by JSHint. It then iterates through these error objects and constructs a formatted string for each error. The format chosen is filename:line:character message, which directly points developers to the exact location of the issue in the source code.

For example: src/myFile.js:10:5 Missing semicolon

This specific format is chosen for its commonality in development tools and its ease of integration with various editors and IDEs, allowing developers to quickly navigate to the reported errors.

After processing all errors, if any were found, the reporter function aggregates the formatted error strings and prints them to the standard output (process.stdout.write). Additionally, it appends a summary line indicating the total number of errors found, ensuring that developers have a quick overview of the linting status. The pluralization of “error” vs. “errors” is also handled for grammatical correctness.

The workflow can be visualized as:

JSHint analysis --[error objects]--> reporter.js --[formatted errors & summary]--> stdout

By controlling the output format, this module ensures that linting feedback is consistently presented and easily digestible, contributing to a more efficient development process. The design prioritizes providing actionable information to developers, enabling them to address code quality issues promptly.

Module: /migrations

This module is responsible for managing SQL database schema migrations for Perf. Perf utilizes SQL backends to store various data, including trace data, shortcuts, and alerts. As the application evolves, the database schema may need to change. This module provides the mechanism to apply these changes and to upgrade existing databases to the schema expected by the current Perf version.

The core of this system relies on the github.com/golang-migrate/migrate/v4 library. This library provides a robust framework for versioning database schemas and applying migrations in a controlled manner.

The key design principle is to have a versioned set of SQL scripts for each supported SQL dialect. This allows Perf to:

  1. Initialize a new database with the correct schema.
  2. Upgrade an existing database from an older schema version to the current one.
  3. Rollback schema changes if necessary, by providing “down” migrations.

Each SQL dialect (e.g., CockroachDB) has its own subdirectory within the /migrations module. The naming convention for these directories is critical: they must match the values defined in sql.Dialect.

Inside each dialect-specific directory, migration files are organized by version.

  • File names are prefixed with a 0-padded version number (e.g., 0001_, 0002_).
  • For each version, there are two files:
    • An .up. file (e.g., 0001_create_initial_tables.up.sql): Contains SQL statements to apply the schema changes for that version.
    • A .down. file (e.g., 0001_create_initial_tables.down.sql): Contains SQL statements to revert the schema changes introduced by the corresponding .up. file.

This paired approach ensures that migrations can be applied and rolled back smoothly.

Key Files and Responsibilities:

  • README.md: Provides a high-level overview of the migration system, explaining its purpose and the use of the golang-migrate/migrate library. It also details the directory structure and file naming conventions for migration scripts.
  • cockroachdb/: This directory contains the migration scripts specifically for the CockroachDB dialect.
    • cockroachdb/0001_create_initial_tables.up.sql: This is the first migration script for CockroachDB. It defines the initial schema for Perf, creating tables such as TraceValues, SourceFiles, ParamSets, Postings, Shortcuts, Alerts, Regressions, and Commits. The table definitions include primary keys, indexes, and column types tailored for efficient data storage and retrieval specific to Perf's needs (e.g., storing trace data, associating traces with source files, managing alert configurations, and tracking commit history). The schema is designed to support the various functionalities of Perf, such as querying traces by parameters, retrieving trace values over commit ranges, and linking regressions to specific alerts and commits.
    • cockroachdb/0001_create_initial_tables.down.sql: This file is intended to contain SQL statements to drop the tables created by its corresponding .up. script. However, as a safety precaution against accidental data loss, it is currently empty. The design acknowledges the potential danger of automated table drops in a production environment.
  • cdb.sql: This is a utility SQL script designed for developers to interact with and test queries against a CockroachDB instance populated with Perf data. It includes sample INSERT statements to populate tables with test data and various SELECT queries demonstrating common data retrieval patterns used by Perf. This file is not part of the automated migration process but serves as a helpful tool for development and debugging. It showcases how to query for traces based on parameters, retrieve trace values, find the most recent tile, and get source file information. It also includes examples of more complex queries involving INTERSECT and JOIN operations, reflecting the kinds of queries Perf might execute.
  • test.sql: Similar to cdb.sql, this script is for testing and experimentation, but it's tailored for a SQLite database. It creates a schema similar to the CockroachDB one (though potentially simplified or with slight variations due to dialect differences) and populates it with test data. It contains a series of CREATE TABLE, INSERT, and SELECT statements that developers can use to quickly set up a local test environment and verify SQL logic.
  • batch-delete.sh and batch-delete.sql: These files provide a mechanism for performing batch deletions of specific parameter data from the ParamSets table in a CockroachDB instance.
    • batch-delete.sql: Contains the DELETE SQL statement. It is designed to be edited directly to specify the deletion criteria (e.g., tile_number, param_key, param_value ranges) and the LIMIT for the number of rows deleted in each batch. This batching approach is crucial for deleting large amounts of data without overwhelming the database or causing long-running transactions.
    • batch-delete.sh: A shell script that repeatedly executes batch-delete.sql using the cockroach sql command-line tool. It runs in a loop with a short sleep interval, allowing for controlled, iterative deletion. This script assumes that a port-forward to the CockroachDB instance is already established. This utility is likely used for data cleanup or maintenance tasks that require removing specific, potentially large, datasets.

Migration Workflow (Conceptual):

When Perf starts or when a migration command is explicitly run:

  1. Determine Current Schema Version: The golang-migrate/migrate library connects to the database and checks the current schema version (often stored in a dedicated migrations table managed by the library itself).

  2. Identify Target Schema Version: This is typically the highest version number found among the migration files for the configured SQL dialect.

  3. Apply Pending Migrations:

    - If the current schema version is lower than the target version, the
      library iteratively executes the `.up.sql` files in ascending order of
      their version numbers, starting from the version immediately following
      the current one, up to the target version.
    - Each successful `.up.` migration updates the schema version in the
      database.
    
    Example: Current Version = 0, Target Version = 2

    DB State (v0) --> Run 0001_*.up.sql --> DB State (v1) --> Run 0002_*.up.sql --> DB State (v2)

  4. Rollback Migrations (if needed):

    - If a user needs to revert to an older schema version, the library can
      execute the `.down.sql` files in descending order.
    
    Example: Current Version = 2, Target Rollback Version = 0

    DB State (v2) --> Run 0002_*.down.sql --> DB State (v1) --> Run 0001_*.down.sql --> DB State (v0)
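
A minimal Go sketch of driving golang-migrate through this up/down cycle against the cockroachdb directory might look like this; Perf wires this up inside its own binaries, and the connection string here is illustrative.

package main

import (
	"log"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/cockroachdb" // registers the cockroachdb:// database scheme
	_ "github.com/golang-migrate/migrate/v4/source/file"          // registers the file:// migration source
)

func main() {
	// Point the source at the dialect-specific directory of 000N_*.up/.down.sql files.
	m, err := migrate.New(
		"file://migrations/cockroachdb",
		"cockroachdb://root@localhost:26257/perf?sslmode=disable", // illustrative connection string
	)
	if err != nil {
		log.Fatal(err)
	}

	// Apply every pending .up.sql migration; ErrNoChange means the schema is
	// already at the latest version.
	if err := m.Up(); err != nil && err != migrate.ErrNoChange {
		log.Fatal(err)
	}

	version, dirty, err := m.Version()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("schema at version %d (dirty=%v)", version, dirty)
}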

The BUILD.bazel file defines a filegroup named cockroachdb which bundles all files under the cockroachdb/ subdirectory. This is likely used by other parts of the Perf build system, perhaps to package these migration scripts or make them accessible to the Perf application when it needs to perform migrations.

Module: /modules

Modules Documentation

Overview

The modules directory contains a collection of frontend TypeScript modules that constitute the building blocks of the Perf web application's user interface. These modules primarily define custom HTML elements (web components) and utility functions for various UI functionalities, data processing, and interaction with backend services. The architecture emphasizes modularity, reusability, and a component-based approach, largely leveraging the Lit library for creating custom elements and elements-sk for common UI widgets.

The design philosophy encourages separation of concerns:

  • UI Components: Dedicated custom elements encapsulate specific UI features like plotting, alert configuration, data tables, dialogs, and input controls.
  • Data Handling: Modules like dataframe and progress manage data fetching, processing, and state.
  • Utilities: Modules like paramtools, pivotutil, cid, and trybot provide common functionalities for data manipulation, key parsing, and specific calculations.
  • Styling and Theming: A centralized themes module ensures a consistent visual appearance, building upon infra-sk's theming capabilities.
  • JSON Contracts: The json module defines TypeScript interfaces that mirror backend Go structures, ensuring type safety in client-server communication.

This modular structure aims to create a maintainable and scalable frontend codebase. Each module typically includes its core logic, associated styles, demo pages for isolated development and testing, and unit/integration tests.

Key Responsibilities and Components

A significant portion of the modules is dedicated to creating custom HTML elements that serve as interactive UI components. These elements often encapsulate complex behavior and interactions, simplifying their use in higher-level page components.

Data Visualization and Interaction:

  • plot-simple-sk: A custom-built canvas-based plotting element for rendering interactive line graphs, optimized for performance with features like dual canvases, Path2D objects, and k-d trees for point proximity.
  • plot-google-chart-sk: An alternative plotting element that wraps the Google Charts library, offering a rich set of features and interactivity like panning, zooming, and trace visibility toggling.
  • plot-summary-sk: Displays a summary plot (often using Google Charts) and allows users to select a range, which is useful for overview and drill-down scenarios.
  • chart-tooltip-sk: Provides a detailed, interactive tooltip for data points on charts, showing commit information, anomaly details, and actions like bisection or requesting traces.
  • graph-title-sk: Displays a structured title for graphs, showing key-value parameter pairs associated with the plotted data.
  • word-cloud-sk: Visualizes key-value pairs and their frequencies as a textual list with proportional bars.

Alert and Regression Management:

  • alert-config-sk: A UI for creating and editing alert configurations, including query definition, detection algorithms, and notification settings.
  • alerts-page-sk: A page for viewing, creating, and managing all alert configurations.
  • cluster-summary2-sk: Displays a detailed summary of a performance cluster, including a plot, statistics, and triage controls.
  • anomalies-table-sk: Renders a sortable and interactive table of detected performance anomalies, allowing for grouping and bulk actions like triage and graphing.
  • anomaly-sk: Displays detailed information about a single performance anomaly.
  • triage-status-sk: A simple button-like element indicating the current triage status of a cluster and allowing users to initiate the triage process.
  • triage-menu-sk: Provides a menu for bulk triage actions on selected anomalies, including assigning bugs or marking them as ignored.
  • new-bug-dialog-sk: A dialog for filing new bugs related to anomalies, pre-filling details.
  • existing-bug-dialog-sk: A dialog for associating anomalies with existing bug reports.
  • user-issue-sk: Manages the association of user-reported Buganizer issues with specific data points.
  • bisect-dialog-sk: A dialog for initiating a Pinpoint bisection process to find the commit causing a regression.
  • pinpoint-try-job-dialog-sk: A (legacy) dialog for initiating Pinpoint A/B try jobs to request additional traces.
  • triage-page-sk: A page dedicated to viewing and triaging regressions based on time range and filters.
  • regressions-page-sk: A page for viewing regressions associated with specific “subscriptions” (e.g., sheriff configs).
  • subscription-table-sk: Displays details of a subscription and its associated alerts.
  • revision-info-sk: Displays information about anomalies detected around a specific revision.

Data Input and Selection:

  • query-sk: A comprehensive UI for constructing complex queries by selecting parameters and their values.
  • paramset-sk: Displays a set of parameters and their values, often used to summarize a query or data selection.
  • query-chooser-sk: Combines paramset-sk (for summary) and query-sk (in a dialog) for a compact query selection experience.
  • query-count-sk: Shows the number of items matching a given query, fetching this count from a backend endpoint.
  • commit-detail-picker-sk: Allows users to select a specific commit from a range, typically presented in a dialog with date range filtering.
  • commit-detail-panel-sk: Displays a list of commit details, making them selectable.
  • commit-detail-sk: Displays information about a single commit with action buttons.
  • calendar-input-sk: A date input field combined with a calendar picker dialog.
  • calendar-sk: A standalone interactive calendar widget.
  • day-range-sk: Allows selection of a “begin” and “end” date.
  • domain-picker-sk: Allows selection of a data domain either by date range or by a number of recent commits.
  • test-picker-sk: A guided, multi-step picker for selecting tests or traces by sequentially choosing parameter values.
  • picker-field-sk: A text input field with a filterable dropdown menu of predefined options, built using Vaadin ComboBox.
  • algo-select-sk: A dropdown for selecting a clustering algorithm.
  • split-chart-menu-sk: A menu for selecting an attribute by which to split a chart.
  • pivot-query-sk: A UI for configuring pivot table requests (group by, operations, summaries).
  • triage2-sk: A set of three buttons for selecting a triage status (positive, negative, untriaged).
  • tricon2-sk: An icon that visually represents one of the three triage states.

Data Display and Structure:

  • pivot-table-sk: Displays pivoted DataFrame data in a sortable table.
  • json-source-sk: A dialog for viewing the raw JSON source data for a specific trace point.
  • ingest-file-links-sk: Displays relevant links (e.g., to Swarming, Perfetto) associated with an ingested data point.
  • point-links-sk: Displays links from ingestion files and generates commit range links between data points.
  • commit-range-sk: Dynamically generates a URL to a commit range viewer based on begin and end commits.

Scaffolding and Application Structure:

  • perf-scaffold-sk: Provides the consistent layout, header, and navigation sidebar for all Perf application pages.
  • explore-simple-sk: The core element for exploring and visualizing performance data, including querying, plotting, and anomaly interaction.
  • explore-sk: Wraps explore-simple-sk, adding features like user authentication, default configurations, and optional integration with test-picker-sk.
  • explore-multi-sk: Allows displaying and managing multiple explore-simple-sk graphs simultaneously, with shared controls and shortcut management.
  • favorites-dialog-sk: A dialog for adding or editing bookmarked “favorites” (named URLs).
  • favorites-sk: Displays and manages a user's list of favorites.

Backend Interaction and Data Processing Utilities:

  • cid/cid.ts: Provides lookupCids to fetch detailed commit information based on commit numbers.
  • common/plot-builder.ts & common/plot-util.ts: Utilities for transforming DataFrame and TraceSet data into formats suitable for plotting libraries (especially Google Charts) and for creating consistent chart options.
  • common/test-util.ts: Sets up mocked API responses (fetch-mock) for various backend endpoints, facilitating isolated testing and demo page development.
  • const/const.ts: Defines shared constants, notably MISSING_DATA_SENTINEL for representing missing data points, ensuring consistency with the backend.
  • csv/index.ts: Converts DataFrame objects into CSV format for data export.
  • dataframe/index.ts & dataframe/dataframe_context.ts: Core logic for managing and manipulating DataFrame objects. DataFrameRepository (a LitElement context provider) handles fetching, caching, merging, and providing DataFrame and DataTable objects to consuming components.
  • dataframe/traceset.ts: Utilities for extracting and formatting information from trace keys within DataFrames/DataTables, such as generating chart titles and legends.
  • errorMessage/index.ts: A wrapper around elements-sk's errorMessage to display persistent error messages by default.
  • json/index.ts: Contains TypeScript interfaces and types that define the structure of JSON data exchanged with the backend, crucial for type safety and often auto-generated from Go structs.
  • paramtools/index.ts: Client-side utilities for creating, parsing, and manipulating ParamSet objects and structured trace keys (e.g., makeKey, fromKey, queryFromKey).
  • pivotutil/index.ts: Utilities for validating pivot table requests (pivot.Request) and providing descriptions for pivot operations.
  • progress/progress.ts: Implements startRequest for initiating and polling the status of long-running server-side tasks, providing progress updates to the UI.
  • trace-details-formatter/traceformatter.ts: Provides TraceFormatter implementations (default and Chrome-specific) for converting trace parameter sets to display strings and vice-versa for querying.
  • trybot/calcs.ts: Calculates and aggregates stddevRatio values from Perf trybot results, grouping them by parameter to identify performance impacts.
  • trybot-page-sk: A page for analyzing performance regressions based on commit or trybot run, using trybot/calcs for analysis.
  • window/index.ts: Utilities related to the browser window object, including parsing build tag information from window.perf.image_tag.

Core Architectural Patterns:

  • Custom Elements (Web Components): The UI is primarily built using custom elements, promoting encapsulation, reusability, and interoperability. Most elements extend ElementSk from infra-sk.
  • Lit Library: Widely used for defining custom elements, providing efficient templating (lit-html) and reactive updates.
  • State Management:
    • Local component state is managed within the elements themselves.
    • stateReflector (from infra-sk) is frequently used to synchronize component state with URL query parameters, enabling bookmarking and shareable views (e.g., alerts-page-sk, explore-simple-sk, triage-page-sk).
    • Lit contexts (@lit/context) are used for providing shared data down the component tree without prop drilling, notably in dataframe/dataframe_context.ts for DataFrame objects.
  • Event-Driven Communication: Components often communicate using custom DOM events. Child components emit events, and parent components listen and react to them (e.g., query-sk emits query-change, triage-status-sk emits start-triage).
  • Asynchronous Operations: fetch API is used for backend communication. Promises and async/await are standard for handling these asynchronous operations. Spinners (spinner-sk) provide user feedback during loading.
  • Modularity and Dependencies: Modules are designed to be relatively self-contained, with clear dependencies declared in BUILD.bazel files. This allows for better organization and easier maintenance.
  • Testing: Each module typically has associated demo pages (*-demo.html, *-demo.ts) for isolated development and visual testing, Karma unit tests (*_test.ts), and Puppeteer end-to-end/screenshot tests (*_puppeteer_test.ts). fetch-mock is extensively used in demos and tests to simulate backend responses.

This comprehensive set of modules forms a rich ecosystem for building and maintaining the Perf application's frontend, with a strong emphasis on modern web development practices and reusability.

Module: /modules/alert

Alert Module Documentation

Overview

The alert module is responsible for validating the configuration of alerts within the Perf system. Its primary function is to ensure that alert definitions adhere to a set of predefined rules, guaranteeing their proper functioning and preventing errors. This module plays a crucial role in maintaining the reliability of the alerting system by catching invalid configurations before they are deployed.

Design Decisions and Implementation Choices

The core design principle behind this module is simplicity and focused responsibility. Instead of incorporating complex validation logic directly into other parts of the system (like the UI or backend services that handle alert creation/modification), this module provides a dedicated, reusable validation function. This promotes modularity and makes the validation logic easier to maintain and update.

The choice of using a simple function (validate) that returns a string (empty for valid, error message for invalid) is intentional. This approach is straightforward to understand and integrate into various parts of the application. It avoids throwing exceptions for validation failures, which can sometimes complicate control flow, and instead provides clear, human-readable feedback.

The current validation is intentionally minimal, focusing on the essential requirement of a non-empty query. This is a pragmatic approach, starting with the most critical validation and allowing for the addition of more complex rules as the system evolves. The dependency on //perf/modules/json:index_ts_lib indicates that the structure of an Alert is defined externally, and this module consumes that definition.

Key Components and Responsibilities

  • index.ts: This is the central file of the module.
    • Responsibility: It houses the primary validation logic for Alert configurations.
    • validate(alert: Alert): string function:
    • Purpose: This function is the public API of the module. It takes an Alert object (as defined in the ../json module) as input.
    • How it works: It performs a series of checks on the properties of the alert object. Currently, it verifies that the query property of the Alert is present and not an empty string.
    • Output: If all checks pass, it returns an empty string, signifying that the Alert configuration is valid. If any check fails, it returns a string containing a descriptive error message indicating why the Alert is considered invalid. This message is intended to be user-friendly and help in correcting the configuration.

Key Workflows

Alert Validation Workflow:

External System (e.g., UI, API)  -- Passes Alert object --> [alert/index.ts: validate()]
                                                                    |
                                                                    V
                                                       [ Is alert.query non-empty? ]
                                                                    |
                                         +--------------------------+--------------------------+
                                         | (Yes)                                            | (No)
                                         V                                                  V
                                [ Returns "" (empty string) ]           [ Returns "An alert must have a non-empty query." ]
                                         |                                                  |
                                         V                                                  V
External System <-- Receives validation result -- [ Interprets result (valid/invalid) ]

This workflow illustrates how an external system would interact with the validate function. The external system provides an Alert object, and the validate function returns a string. The external system then uses this string to determine if the alert configuration is valid and can proceed accordingly (e.g., save the alert, display an error to the user).

Module: /modules/alert-config-sk

The alert-config-sk module provides a custom HTML element, <alert-config-sk>, designed for creating and editing alert configurations within the Perf application. This element serves as a user interface for defining the conditions under which an alert should be triggered, how regressions are detected, and where notifications should be sent.

Core Functionality and Design:

The primary goal of alert-config-sk is to offer a comprehensive yet user-friendly way to manage alert settings. It encapsulates all the necessary input fields and logic for defining an Alert object, which is a central data structure in Perf for representing alert configurations.

Key design considerations include:

  • Modularity and Reusability: By packaging the alert configuration UI as a custom element, it can be easily integrated into various parts of the Perf application where alert management is needed.
  • Dynamic UI based on Context: The UI adapts based on global settings (e.g., window.perf.notifications, window.perf.display_group_by, window.perf.need_alert_action). This allows the same component to present different options depending on the specific Perf instance's configuration or the user's context. For example, the notification options (email vs. issue tracker) and the visibility of "Group By" settings can change.
  • Data Binding and Reactivity: The element uses Lit library for templating and reactivity. Changes in the input fields directly update the internal _config object, and changes to the element's properties (like config, paramset) trigger re-renders.
  • Integration with other Perf modules: It leverages other custom elements like query-chooser-sk for selecting traces, algo-select-sk for choosing clustering algorithms, and various elements-sk components (e.g., select-sk, multi-select-sk, checkbox-sk) for standard UI inputs. This promotes consistency and reduces redundant code.
  • User Feedback and Validation: The component provides immediate feedback, such as displaying different threshold units based on the selected step detection algorithm and validating input for fields like the Issue Tracker Component ID. It also includes “Test” buttons to verify alert notification and bug template configurations.

Key Components and Files:

  • alert-config-sk.ts: This is the heart of the module, defining the AlertConfigSk class which extends ElementSk.
    • Properties:
    • config: An Alert object representing the current alert configuration being edited. This is the primary data model for the component.
    • paramset: A ParamSet object providing the available parameters and their values for constructing queries (used by query-chooser-sk).
    • key_order: An array of strings dictating the preferred order of keys in the query-chooser-sk.
    • Templating (template static method): Uses lit-html to define the structure and content of the element. It dynamically renders sections based on the current configuration and global settings (e.g., window.perf.notifications).
    • Event Handling: Listens to events from child components (e.g., query-change from query-chooser-sk, selection-changed from select-sk) to update the _config object.
    • Logic for Dynamic UI:
    • The thresholdDescriptors object maps step detection algorithms to their corresponding units and descriptive labels, ensuring the “Threshold” input field is always relevant.
    • Conditional rendering (e.g., using ? operator in lit-html or if statements in helper functions like _groupBy) is used to show/hide UI elements based on window.perf flags.
    • API Interaction:
    • testBugTemplate(): Sends a POST request to /_/alert/bug/try to test the configured bug URI template.
    • testAlert(): Sends a POST request to /_/alert/notify/try to test the alert notification setup.
    • Helper Functions:
    • toDirection(), toConfigState(): Convert string values from UI selections to the appropriate enum types for the Alert object.
    • indexFromStep(): Determines the correct selection index for the “Step Detection” dropdown based on the current _config.step value.
  • alert-config-sk.scss: Contains the SASS styles for the element, ensuring a consistent look and feel within the Perf application. It imports styles from themes_sass_lib and buttons_sass_lib for theming and button styling.
  • alert-config-sk-demo.html and alert-config-sk-demo.ts: Provide a demonstration page for the alert-config-sk element.
    • The HTML sets up a basic page structure with an instance of alert-config-sk and buttons to manipulate global window.perf settings, allowing developers to test different UI states of the component.
    • The TypeScript file initializes the demo, sets up mock paramset and config data, and provides event listeners for the control buttons to refresh the alert-config-sk component and display its current state. This is crucial for development and testing.
  • alert-config-sk_puppeteer_test.ts: Contains Puppeteer tests for the component. These tests verify that the component renders correctly in different states (e.g., with/without group_by, different notification options) by interacting with the demo page and taking screenshots.
  • index.ts: A simple entry point that imports and thereby registers the alert-config-sk custom element, making it available for use in HTML.

Workflow Example: Editing an Alert

  1. Initialization:

    • An instance of alert-config-sk is added to the DOM.
    • The paramset property is set, providing the available trace parameters.
    • The config property is set with the Alert object to be edited (or a default new configuration).
    • Global window.perf settings influence which UI sections are initially visible.
  2. User Interaction:

    • The user modifies various fields: Display Name, Category, Query (via query-chooser-sk), Grouping (via algo-select-sk), Step Detection, Threshold, etc.
    • As the user changes a field (e.g., selects a new “Step Detection” algorithm from the select-sk):
      • An event is dispatched by the child component (e.g., selection-changed).
      • alert-config-sk listens for this event.
      • The event handler in alert-config-sk.ts updates the corresponding property in its internal _config object (e.g., this._config.step = newStepValue).
      • The component re-renders (managed by Lit) to reflect the change. For instance, if the “Step Detection” changes, the “Threshold” label and units dynamically update.
    User interacts with <select-sk id="step">
        |
        V
    <select-sk> emits 'selection-changed' event
        |
        V
    AlertConfigSk.stepSelectionChanged(event) is called
        |
        V
    this._config.step is updated
        |
        V
    this._render() is (indirectly) called by Lit
        |
        V
    UI updates, e.g., label for "Threshold" input changes
    
  3. Testing Configuration (Optional):

    • User clicks “Test” for bug template:
      • AlertConfigSk.testBugTemplate() is called.
      • A POST request is made to /_/alert/bug/try.
      • The response (a URL to the bug) is opened in a new tab, or an error is shown.
    • User clicks “Test” for alert notification:
      • AlertConfigSk.testAlert() is called.
      • A POST request is made to /_/alert/notify/try.
      • A success/error message is displayed.
  4. Saving Changes:

    • The parent component or application logic that hosts alert-config-sk is responsible for retrieving the updated config object from the alert-config-sk element (e.g., element.config) and persisting it (e.g., by sending it to a backend API). alert-config-sk itself does not handle the saving of the configuration to a persistent store.

This element aims to simplify the complex task of configuring alerts by providing a structured and reactive interface, abstracting away the direct manipulation of the underlying Alert JSON object for the end-user.

Module: /modules/alerts-page-sk

alerts-page-sk Module Documentation

High-Level Overview

The alerts-page-sk module provides a user interface for managing and configuring alerts within the Perf application. Users can view, create, edit, and delete alert configurations. The page displays existing alerts in a table and provides a dialog for detailed configuration of individual alerts. It interacts with a backend API to fetch and persist alert data.

Design Decisions and Implementation Choices

Why a dedicated page for alerts? Centralizing alert management provides a clear and focused interface for users responsible for monitoring performance metrics. This separation of concerns simplifies the overall application structure and user experience.

How are alerts displayed and managed? Alerts are displayed in a tabular format, offering a quick overview of key information like name, query, owner, and status. Icons are used for common actions like editing and deleting, enhancing usability. A modal dialog, utilizing the <dialog> HTML element and the alert-config-sk component, is employed for focused editing of individual alert configurations. This approach avoids cluttering the main page and provides a dedicated space for detailed settings.

Why use Lit for templating? Lit is used for its efficient rendering and component-based architecture. This allows for a declarative way to define the UI and manage its state, making the code more maintainable and easier to understand. The use of html tagged template literals provides a clean and JavaScript-native way to write templates.

How is user authorization handled? The page checks if the logged-in user has an ‘editor' role. This is determined by fetching the user's status from /_/login/status. Editing and creation functionalities are disabled if the user lacks the necessary permissions, preventing unauthorized modifications. The logged-in user's email is also pre-filled as the owner for new alerts.

Why is fetch-mock used in the demo? fetch-mock is utilized in the demo (alerts-page-sk-demo.ts) to simulate backend API responses. This allows for isolated testing and development of the frontend component without requiring a running backend. It enables developers to define expected responses for various API endpoints, facilitating a predictable environment for UI development and testing.

How are API interactions handled? The component uses the fetch API to communicate with the backend. Helper functions like jsonOrThrow and okOrThrow are used to simplify response handling and error management. Specific endpoints are used for listing (/_/alert/list/...), creating (/_/alert/new), updating (/_/alert/update), and deleting (/_/alert/delete/...) alerts.

Why distinguish between “Alert” and “Component” in the UI? The UI adapts to display either an “Alert” field or an “Issue Tracker Component” field based on the window.perf.notifications global setting. This allows the application to integrate with different notification systems. If markdown_issuetracker is configured, it links directly to the relevant issue tracker component.

Responsibilities and Key Components/Files

  • alerts-page-sk.ts: This is the core TypeScript file defining the AlertsPageSk custom element.

    • Responsibilities:
    • Fetching and displaying a list of alert configurations.
    • Providing functionality to create new alerts.
    • Enabling editing of existing alerts through a modal dialog.
    • Allowing deletion of alerts.
    • Handling user authorization for edit/create operations.
    • Managing the state of the “show deleted alerts” checkbox.
    • Interacting with the backend API for all alert-related operations.
    • Rendering the UI using Lit templates.
    • Key Methods:
    • connectedCallback(): Initializes the component by fetching initial data (paramset and alert list).
    • list(): Fetches and re-renders the list of alerts.
    • add(): Initiates the creation of a new alert by fetching a default configuration from the server and opening the edit dialog.
    • edit(): Opens the edit dialog for an existing alert.
    • accept(): Handles the submission of changes from the edit dialog, sending an update request to the server.
    • delete(): Sends a request to the server to delete an alert.
    • openOnLoad(): Checks the URL for an alert ID on page load and, if present, opens the edit dialog for that specific alert. This allows for direct linking to an alert's configuration.
    • Key Properties:
    • alerts: An array holding the currently displayed alert configurations.
    • _cfg: The Alert object currently being edited in the dialog.
    • isEditor: A boolean indicating if the current user has editing privileges.
    • dialog: A reference to the HTML <dialog> element used for editing.
    • alertconfig: A reference to the alert-config-sk element within the dialog.
  • alerts-page-sk.scss: Contains the SASS/CSS styles for the alerts-page-sk element.

    • Responsibilities: Defines the visual appearance of the alerts table, buttons, dialog, and other UI elements within the page. It ensures a consistent look and feel, including theming (dark mode).
  • alerts-page-sk-demo.ts: Provides a demonstration and development environment for the alerts-page-sk component.

    • Responsibilities:
    • Sets up fetch-mock to simulate backend API responses for /login/status, /_/count/, /_/alert/update, /_/alert/list/..., /_/initpage/, and /_/alert/new. This allows the component to be developed and tested in isolation.
    • Initializes global window.perf properties that might affect the component's behavior (e.g., key_order, display_group_by, notifications).
    • Dynamically inserts alerts-page-sk elements into the demo HTML page.
  • alerts-page-sk-demo.html: The HTML structure for the demo page.

    • Responsibilities: Provides the basic HTML layout where the alerts-page-sk component is rendered for demonstration purposes. Includes an <error-toast-sk> for displaying error messages.
  • alerts-page-sk_puppeteer_test.ts: Contains Puppeteer tests for the alerts-page-sk component.

    • Responsibilities: Performs automated UI testing, ensuring the component renders correctly and basic interactions function as expected. It takes screenshots for visual regression testing.
  • index.ts: A simple entry point that imports and thereby registers the alerts-page-sk custom element.

Key Workflows

1. Viewing Alerts:

User navigates to the alerts page
  |
  V
alerts-page-sk.connectedCallback()
  |
  +----------------------+
  |                      |
  V                      V
fetch('/_/initpage/')  fetch('/_/alert/list/false')  // Fetch paramset and initial alert list
  |                      |
  V                      V
Update `paramset`      Update `alerts` array
  |                      |
  +----------------------+
  |
  V
_render() // Lit renders the table with alerts

2. Creating a New Alert:

User clicks "New" button (if isEditor === true)
  |
  V
alerts-page-sk.add()
  |
  V
fetch('/_/alert/new') // Get a template for a new alert
  |
  V
Update `cfg` with the new alert template (owner set to current user)
  |
  V
dialog.showModal() // Show the alert-config-sk dialog
  |
  V
User fills in alert details in alert-config-sk
  |
  V
User clicks "Accept"
  |
  V
alerts-page-sk.accept()
  |
  V
cfg = alertconfig.config // Get updated config from alert-config-sk
  |
  V
fetch('/_/alert/update', { method: 'POST', body: JSON.stringify(cfg) }) // Send new alert to backend
  |
  V
alerts-page-sk.list() // Refresh the alert list

3. Editing an Existing Alert:

User clicks "Edit" icon next to an alert (if isEditor === true)
  |
  V
alerts-page-sk.edit() with the selected alert's data
  |
  V
Set `origCfg` (deep copy of current `cfg`)
Set `cfg` to the selected alert's data
  |
  V
dialog.showModal() // Show the alert-config-sk dialog pre-filled with alert data
  |
  V
User modifies alert details in alert-config-sk
  |
  V
User clicks "Accept"
  |
  V
alerts-page-sk.accept()
  |
  V
cfg = alertconfig.config // Get updated config
  |
  V
IF JSON.stringify(cfg) !== JSON.stringify(origCfg) THEN
  fetch('/_/alert/update', { method: 'POST', body: JSON.stringify(cfg) }) // Send updated alert
  |
  V
  alerts-page-sk.list() // Refresh list
ENDIF

4. Deleting an Alert:

User clicks "Delete" icon next to an alert (if isEditor === true)
  |
  V
alerts-page-sk.delete() with the selected alert's ID
  |
  V
fetch('/_/alert/delete/{alert_id}', { method: 'POST' }) // Send delete request
  |
  V
alerts-page-sk.list() // Refresh the alert list

5. Toggling “Show Deleted Configs”:

User clicks "Show deleted configs" checkbox
  |
  V
alerts-page-sk.showChanged()
  |
  V
Update `showDeleted` property based on checkbox state
  |
  V
alerts-page-sk.list() // Fetches alerts based on the new `showDeleted` state

Module: /modules/algo-select-sk

Algo Select SK Module

The algo-select-sk module provides a custom HTML element that allows users to select a clustering algorithm. This component is crucial for applications where different clustering approaches might yield better results depending on the data or the analytical goal.

High-Level Overview

The core purpose of this module is to present a user-friendly way to switch between available clustering algorithms, specifically “k-means” and “stepfit”. It encapsulates the selection logic and emits an event when the chosen algorithm changes, allowing other parts of the application to react accordingly.

Design and Implementation

The “why” behind this module is the need for a standardized and reusable UI component for algorithm selection. Instead of each part of an application implementing its own dropdown or radio buttons for algorithm choice, algo-select-sk provides a consistent look and feel.

The “how” involves leveraging the select-sk custom element from the elements-sk library to provide the actual dropdown functionality. algo-select-sk builds upon this by:

  1. Defining specific algorithm options: It hardcodes “k-means” and “stepfit” as the available choices, along with descriptive tooltips.
  2. Managing state: It uses an algo attribute (and corresponding property) to store and reflect the currently selected algorithm.
  3. Emitting a custom event: When the selection changes, it dispatches an algo-change event with the new algorithm in the detail object. This decoupling allows other components to listen for changes without direct dependencies on algo-select-sk.

The choice to use select-sk as a base provides a consistent styling and behavior aligned with other elements in the Skia infrastructure.

Responsibilities and Key Components

  • algo-select-sk.ts: This is the heart of the module.
    • AlgoSelectSk class: This ElementSk subclass defines the custom element's behavior.
    • template: Uses lit-html to render the underlying select-sk element with predefined div elements representing the algorithm options (“K-Means” and “Individual” which maps to “stepfit”). The selected attribute on these divs is dynamically updated based on the current algo property.
    • connectedCallback and attributeChangedCallback: Ensure the element renders correctly when added to the DOM or when its algo attribute is changed programmatically.
    • _selectionChanged method: This is the event handler for the selection-changed event from the inner select-sk element. When triggered, it updates the algo property of algo-select-sk and then dispatches the algo-change custom event. This is the primary mechanism for communicating the selected algorithm to the outside world:

      User interacts with <select-sk>
        |
        V
      <select-sk> emits 'selection-changed' event
        |
        V
      AlgoSelectSk._selectionChanged() is called
        |
        V
      Updates internal 'algo' property
        |
        V
      Dispatches 'algo-change' event with { algo: "new_value" }
    • algo getter/setter: Provides a programmatic way to get and set the selected algorithm. The setter ensures that only valid algorithm values ('kmeans' or 'stepfit') are set, defaulting to 'kmeans' for invalid inputs. This adds a layer of robustness.
    • toClusterAlgo function: A utility function to validate and normalize the input string to one of the allowed ClusterAlgo types. This prevents invalid algorithm names from being propagated.
    • AlgoSelectAlgoChangeEventDetail interface: Defines the structure of the detail object for the algo-change event, ensuring type safety for event consumers.
  • algo-select-sk.scss: Provides minimal styling, primarily ensuring that the cursor is a pointer when hovering over the element, indicating interactivity. It imports shared color and theme styles.
  • index.ts: A simple entry point that imports algo-select-sk.ts, ensuring the custom element is defined and available for use when the module is imported.
  • algo-select-sk-demo.html and algo-select-sk-demo.ts: These files provide a demonstration page for the algo-select-sk element.
    • The HTML sets up a few instances of algo-select-sk, including one with a pre-selected algorithm and one in dark mode, to showcase its appearance.
    • The TypeScript for the demo listens to the algo-change event from one of the instances and displays the event detail in a <pre> tag. This serves as a live example of how to consume the event.
  • algo-select-sk_puppeteer_test.ts: Contains Puppeteer tests to verify the component renders correctly and basic functionality. It checks for the presence of the elements on the demo page and takes a screenshot for visual regression testing.

The component is designed to be self-contained and easy to integrate. By simply including the element in HTML and listening for the algo-change event, developers can incorporate algorithm selection functionality into their applications.
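
A typical consumer needs only the markup and an event listener. The sketch below assumes the element has been registered via the module's index.ts and that the event detail follows the AlgoSelectAlgoChangeEventDetail interface described above; the import path is illustrative.

  import { AlgoSelectAlgoChangeEventDetail } from './algo-select-sk';

  // Listen for algorithm changes and react to the newly selected value.
  const selector = document.querySelector('algo-select-sk')!;
  selector.addEventListener('algo-change', (e: Event) => {
    const detail = (e as CustomEvent<AlgoSelectAlgoChangeEventDetail>).detail;
    console.log('Selected clustering algorithm:', detail.algo);
  });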

Module: /modules/anomalies-table-sk

Anomalies Table (anomalies-table-sk)

The anomalies-table-sk module provides a custom HTML element for displaying a sortable and interactive table of performance anomalies. Its primary purpose is to present anomaly data in a clear, actionable format, allowing users to quickly identify, group, triage, and investigate performance regressions or improvements.

Key Responsibilities:

  • Displaying Anomalies: Renders a list of Anomaly objects in a tabular format. Each row represents an anomaly and displays key information such as bug ID, revision range, test path, and metrics like delta percentage and absolute delta.
  • Grouping Anomalies: Automatically groups anomalies that share overlapping revision ranges. This helps users identify related issues or multiple manifestations of the same underlying problem. Groups can be expanded or collapsed for better readability.
  • User Interaction:
    • Sorting: Allows users to sort the table by various columns (e.g., Bug ID, Revisions, Test, Delta %).
    • Selection: Users can select individual anomalies or entire groups of anomalies using checkboxes.
    • Bulk Actions: Provides “Triage” and “Graph” buttons that operate on the currently selected anomalies.
  • Triage Integration: Integrates with triage-menu-sk to allow users to assign bug IDs, mark anomalies as invalid or ignored, or reset their triage state.
  • Navigation and Investigation:
    • Provides links to individual anomaly reports (e.g., /u/?anomalyIDs=...).
    • Generates links to view graphs of selected anomalies in the multi-graph explorer (/m/...).
    • Links bug IDs to the configured bug tracking system (e.g., /u/?bugID=...).
    • Allows unassociating a bug ID from an anomaly.

Design and Implementation Choices:

  • LitElement for Web Component: The component is built using LitElement, a lightweight library for creating Web Components. This promotes encapsulation, reusability, and interoperability with other web technologies.
  • Client-Side Grouping: Anomaly grouping based on revision range intersection is performed client-side. This simplifies the backend and provides immediate feedback to the user as they interact with the table. The groupAnomalies method iterates through the anomaly list, merging anomalies into existing groups if their revision ranges intersect, or creating new groups otherwise.
  • Client-Side Sorting: Sorting is handled by the sort-sk element, which observes changes to data attributes on the table rows. This avoids server roundtrips for simple sorting operations.
  • Selective Rendering: The table is re-rendered (using this._render()) only when necessary, such as when data changes, groups are expanded/collapsed, or selections are updated. This improves performance.
  • AnomalyGroup Class: A simple AnomalyGroup class is used to manage collections of related anomalies and their expanded state. This provides a clear structure for handling grouped data.
  • Popup for Triage: The triage menu is presented in a popup to save screen real estate and provide a focused interface for triage actions. The popup's visibility is controlled by the showPopup boolean property.
  • Event-Driven Communication: The component emits a custom event anomalies_checked when the selection state of an anomaly changes. This allows parent components or other parts of the application to react to user selections.
  • API Integration for Graphing and Reporting:
    • When graphing multiple anomalies, it first calls the /_/anomalies/group_report backend API. This API is designed to provide a consolidated view or a shared identifier (sid) for a group of anomalies, which is then used to construct the graph URL. This is preferred over constructing potentially very long URLs with many individual anomaly IDs.
    • For single anomaly graphing, it fetches additional time range information via the same group_report API to provide context (one week before and after the anomaly) in the graph.
  • Trace Formatting: Uses ChromeTraceFormatter to correctly format trace queries for linking to the graph explorer.
  • Styling: SCSS is used for styling, importing shared styles from themes_sass_lib, buttons_sass_lib, and select_sass_lib for a consistent look and feel. Specific styles handle the appearance of regression vs. improvement, expanded rows, and the triage popup.

Key Files:

  • anomalies-table-sk.ts: This is the core file containing the LitElement class definition for AnomaliesTableSk. It implements all the logic for rendering the table, handling user interactions, grouping anomalies, and interacting with backend services for triage and graphing.
    • populateTable(anomalyList: Anomaly[]): The primary method to load data into the table. It triggers grouping and rendering.
    • generateTable(), generateGroups(), generateRows(): Template methods responsible for constructing the HTML structure of the table using lit-html.
    • groupAnomalies(): Implements the logic for grouping anomalies based on overlapping revision ranges.
    • openReport(): Handles the logic for generating a URL to graph the selected anomalies, potentially calling the /_/anomalies/group_report API.
    • togglePopup(): Manages the visibility of the triage menu popup.
    • anomalyChecked(): Handles checkbox state changes and updates the checkedAnomaliesSet.
    • openMultiGraphUrl(): Constructs the URL for viewing an anomaly's trend in the multi-graph explorer, fetching time range context via an API call.
  • anomalies-table-sk.scss: Contains the SCSS styles specific to the anomalies table, defining its layout, appearance, and the styling for different states (e.g., improvement, regression, expanded rows).
  • index.ts: A simple entry point that imports and registers the anomalies-table-sk custom element.
  • anomalies-table-sk-demo.ts and anomalies-table-sk-demo.html: Provide a demonstration page for the component, showcasing its usage with sample data and interactive buttons to populate the table and retrieve checked anomalies. The demo also sets up a global window.perf object with configuration typically provided by the Perf application environment.

Workflows:

1. Displaying and Grouping Anomalies:

[User Action: Page Load with Anomaly Data]
  |
  v
AnomaliesTableSk.populateTable(anomalyList)
  |
  v
AnomaliesTableSk.groupAnomalies()
  |-> For each Anomaly in anomalyList:
  |     |-> Try to merge with existing AnomalyGroup (if revision ranges intersect)
  |     |-> Else, create new AnomalyGroup
  |
  v
AnomaliesTableSk._render()
  |
  v
[DOM Update: Table is rendered with grouped anomalies, groups initially collapsed]
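
The grouping step in this workflow boils down to an interval-intersection check. The sketch below is illustrative only: it uses the start_revision/end_revision fields mentioned elsewhere in this documentation and a simplified stand-in for the real AnomalyGroup class.

  // Illustrative sketch of revision-range grouping, not the actual implementation.
  interface AnomalyLike {
    start_revision: number;
    end_revision: number;
  }

  class SimpleAnomalyGroup<T extends AnomalyLike> {
    anomalies: T[] = [];
    expanded = false;
  }

  function rangesIntersect(a: AnomalyLike, b: AnomalyLike): boolean {
    return a.start_revision <= b.end_revision && b.start_revision <= a.end_revision;
  }

  function groupByRevisionRange<T extends AnomalyLike>(anomalies: T[]): SimpleAnomalyGroup<T>[] {
    const groups: SimpleAnomalyGroup<T>[] = [];
    for (const anomaly of anomalies) {
      // Merge into the first group that already contains an overlapping anomaly.
      const existing = groups.find((g) => g.anomalies.some((a) => rangesIntersect(a, anomaly)));
      if (existing) {
        existing.anomalies.push(anomaly);
      } else {
        const group = new SimpleAnomalyGroup<T>();
        group.anomalies.push(anomaly);
        groups.push(group);
      }
    }
    return groups;
  }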

2. Selecting and Triaging Anomalies:

[User Action: Clicks checkbox for an anomaly or group]
  |
  v
AnomaliesTableSk.anomalyChecked() or AnomalySk.toggleChildrenCheckboxes()
  |-> Updates `checkedAnomaliesSet`
  |-> Updates header checkbox state if needed
  |-> Emits 'anomalies_checked' event
  |-> Enables/Disables "Triage" and "Graph" buttons based on selection
  |
  v
[User Action: Clicks "Triage" button (if enabled)]
  |
  v
AnomaliesTableSk.togglePopup()
  |-> Shows TriageMenuSk popup
  |-> TriageMenuSk.setAnomalies(checkedAnomalies)
  |
  v
[User interacts with TriageMenuSk (e.g., assigns bug, marks invalid)]
  |
  v
TriageMenuSk makes API request (e.g., to /_/triage)
  |
  v
[Application reloads data or updates table based on triage result]

3. Graphing Selected Anomalies:

[User Action: Selects one or more anomalies]
  |
  v
[User Action: Clicks "Graph" button (if enabled)]
  |
  v
AnomaliesTableSk.openReport()
  |
  |--> If single anomaly selected:
  |     |-> window.open(`/u/?anomalyIDs={id}`, '_blank')
  |
  |--> If multiple anomalies selected:
        |-> Call fetchGroupReportApi(idString)
        |    |-> POST to /_/anomalies/group_report with anomaly IDs
        |    |-> Receives response with `sid` (shared ID)
        |
        |-> window.open(`/u/?sid={sid}`, '_blank')

4. Expanding/Collapsing an Anomaly Group:

[User Action: Clicks expand/collapse button on a group row]
  |
  v
AnomaliesTableSk.expandGroup(anomalyGroup)
  |-> Toggles `anomalyGroup.expanded` boolean
  |
  v
AnomaliesTableSk._render()
  |
  v
[DOM Update: Rows within the group are shown or hidden]

Module: /modules/anomaly-sk

The anomaly-sk module provides a custom HTML element <anomaly-sk> and related functionalities for displaying details about performance anomalies. It's designed to present information about a specific anomaly, including its severity, the affected revision range, and a link to the associated bug report. A key utility function, getAnomalyDataMap, is also provided to process raw anomaly data into a format suitable for plotting.

Key Responsibilities and Components:

  • anomaly-sk.ts: This is the core file defining the <anomaly-sk> custom element.

    • Why: To encapsulate the logic and presentation of individual anomaly details in a reusable web component. This promotes modularity and makes it easy to integrate anomaly information into various parts of the Perf application.
    • How: It extends ElementSk and uses the lit-html library for templating. It accepts an Anomaly object as a property and dynamically renders a table displaying information like the score before and after the anomaly, percentage change, revision range, improvement status, and bug ID.
    • It fetches commit details (hashes) using the lookupCids function from the cid module to construct a clickable link to the commit range.
    • It formats numbers and percentages for better readability.
    • It handles different bug ID states (e.g., 0 for no bug, -1 for invalid alert, -2 for ignored alert) by displaying appropriate text or a link to the bug tracking system. The bug_host_url property allows customization of the bug tracker URL.
    • The formatRevisionRange method asynchronously fetches commit hashes for the start and end revisions of the anomaly to create a link to the commit range view. If window.perf.commit_range_url is not defined, it simply displays the revision numbers.
  • getAnomalyDataMap (function in anomaly-sk.ts):

    • Why: To transform raw trace data and anomaly information into a structured format that can be easily consumed by plotting components like plot-simple-sk. This function bridges the gap between the raw data representation and the visual representation of anomalies on a graph.
    • How: It takes a TraceSet (a collection of traces), ColumnHeader[] (representing commit points on the x-axis), an AnomalyMap (mapping trace IDs and commit IDs to Anomaly objects), and a list of highlight_anomalies IDs.
    • It iterates through each trace in the TraceSet. If a trace has anomalies listed in the AnomalyMap, it then iterates through those anomalies.
    • For each anomaly, it finds the corresponding x-coordinate by matching the anomaly's commit ID (cid) with the offset in the ColumnHeader. A crucial detail is that if an exact commit ID match isn't found in the header (e.g., due to a data upload failure for that specific commit), it will associate the anomaly with the next available commit point. This ensures that anomalies are still visualized even if their precise commit data point is missing, rather than being omitted entirely.
    • The y-coordinate is taken directly from the trace data at that x-coordinate.
    • It determines if an anomaly should be highlighted based on the highlight_anomalies input.
    • The output is an object where keys are trace IDs and values are arrays of AnomalyData objects, each containing the x, y coordinates, the Anomaly object itself, and a highlight flag.
    Input:
      TraceSet: { "traceA": [10, 12, 15*], ... } (*value at commit 101)
      Header: [ {offset: 99}, {offset: 100}, {offset: 101} ]
      AnomalyMap: { "traceA": { "101": AnomalyObjectA } }
      HighlightList: []
    
    getAnomalyDataMap
          |
          V
    Output:
      {
        "traceA": [
          { x: 2, y: 15, anomaly: AnomalyObjectA, highlight: false }
        ],
        ...
      }
    
  • anomaly-sk.scss: This file contains the SCSS styles for the <anomaly-sk> element.

    • Why: To provide a consistent visual appearance for the anomaly details table, aligning with the overall theme of the application (themes_sass_lib).
    • How: It defines basic table styling, such as text alignment and padding for th and td elements within the anomaly-sk component.
  • anomaly-sk-demo.html and anomaly-sk-demo.ts: These files set up a demonstration page for the <anomaly-sk> element.

    • Why: To provide a sandbox environment for developers to see the component in action with various inputs and to facilitate isolated testing and development.
    • How: anomaly-sk-demo.html includes instances of <anomaly-sk> with different IDs. anomaly-sk-demo.ts initializes these components with sample Anomaly data. It also mocks the /_/cid/ API endpoint using fetch-mock to simulate responses for commit detail lookups, which is crucial for the formatRevisionRange functionality to work in the demo. Global window.perf configurations are also set up, as the component relies on them (e.g., commit_range_url).
  • Test Files (anomaly-sk_test.ts, anomaly-sk_puppeteer_test.ts):

    • Why: To ensure the correctness and reliability of the component's logic and rendering.
    • anomaly-sk_test.ts: Contains unit tests for the getAnomalyDataMap function (verifying its mapping logic, especially the handling of missing commit points) and for static utility methods within AnomalySk like formatPercentage and the asynchronous formatRevisionRange. It uses fetch-mock to control API responses for CID lookups.
    • anomaly-sk_puppeteer_test.ts: Contains browser-based integration tests using Puppeteer. It verifies that the demo page renders correctly and takes screenshots for visual regression testing.

Workflow for Displaying an Anomaly:

  1. An Anomaly object is passed to the anomaly property of the <anomaly-sk> element. <anomaly-sk .anomaly=${someAnomalyObject}></anomaly-sk>
  2. The set anomaly() setter in AnomalySk is triggered.
  3. It calls this.formatRevisionRange() to asynchronously prepare the revision range display.
    • formatRevisionRange extracts start_revision and end_revision.
    • It calls lookupCids([start_rev_num, end_rev_num]) which makes a POST request to /_/cid/.
    • The response provides commit hashes.
    • If window.perf.commit_range_url is set, it constructs an <a> tag with the URL populated with the fetched hashes. Otherwise, it just formats the revision numbers as text.
    • The resulting TemplateResult is stored in this._revision.
  4. this._render() is called, which re-renders the component's template.
  5. The template (AnomalySk.template) displays the table:
    • Score, Prior Score, Percent Change (calculated using getPercentChange).
    • Revision Range (using the this.revision template generated in step 3).
    • Improvement status.
    • Bug ID (formatted using AnomalySk.formatBug, potentially linking to this.bugHostUrl).

This module effectively isolates the presentation and data transformation logic related to individual anomalies, making it a maintainable and reusable piece of the Perf frontend. The handling of potentially missing data points in getAnomalyDataMap shows a robust design choice for dealing with real-world data imperfections.
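
To make the missing-commit fallback concrete, here is a hedged sketch of the x-coordinate lookup described above; the field names follow the Input/Output example, and the real getAnomalyDataMap additionally handles highlighting and missing trace values.

  // Map an anomaly's commit id (cid) to an x index in the column header.
  interface ColumnHeaderLike {
    offset: number;
  }

  function findXForCommit(header: ColumnHeaderLike[], cid: number): number {
    for (let x = 0; x < header.length; x++) {
      if (header[x].offset === cid) {
        return x; // Exact match.
      }
      if (header[x].offset > cid) {
        return x; // Data for `cid` is missing; attach the anomaly to the next commit point.
      }
    }
    return -1; // The anomaly lies beyond the plotted commit range.
  }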

Module: /modules/bisect-dialog-sk

Bisect Dialog (bisect-dialog-sk)

The bisect-dialog-sk module provides a user interface element for initiating a bisection process within the Perf application. This is specifically designed to help pinpoint the commit that introduced a performance regression or improvement, primarily for Chrome.

Core Responsibility

The primary responsibility of this module is to present a dialog to the user, pre-filled with relevant information extracted from a chart tooltip (e.g., when a user identifies an anomaly in a performance graph). It allows the user to confirm or modify these parameters and then submit a request to the backend to start a bisection task.

Why a Dedicated Dialog?

Performance analysis often involves identifying the exact change that caused a shift in metrics. A manual bisection process can be tedious and error-prone. This dialog streamlines this by:

  1. Pre-filling Data: It leverages context from the chart (like the test path and commit range) to pre-populate the necessary fields, reducing manual data entry and potential mistakes.
  2. Structured Input: It provides a clear form for all required parameters for a bisection request, ensuring that the backend receives all necessary information.
  3. User Authentication Awareness: It integrates with the alogin-sk module to fetch the logged-in user's email, which is a required parameter for the bisect request.
  4. Feedback Mechanism: It provides visual feedback to the user during the submission process (e.g., a spinner) and communicates success or failure via toast messages.

How it Works

  1. Initialization and Pre-filling:

    • The dialog is typically instantiated and hidden until needed.
    • When a user triggers a bisection (e.g., from a chart tooltip), the setBisectInputParams method is called with details like the testPath, startCommit, endCommit, bugId, story, and anomalyId.
    • These parameters are used to populate the input fields within the dialog's form.
  2. User Interaction and Submission:

    • The open() method displays the modal dialog.
    • The user can review and, if necessary, modify the pre-filled values (e.g., adjust the commit range or add a bug ID). They can also provide an optional patch to be applied during the bisection.
    • Upon clicking the “Bisect” button, the postBisect method is invoked.
  3. Request Construction and API Call:

    • postBisect gathers the current values from the form fields.
    • It parses the testPath to extract components like the benchmark, chart, and statistic. The logic for deriving chart and statistic involves checking the last part of the test name against a predefined list of STATISTIC_VALUES (e.g., “avg”, “count”).
    • A CreateBisectRequest object is constructed with all the necessary parameters.
    • A fetch call is made to the /_/bisect/create endpoint with the JSON payload.
  4. Response Handling:

    • If the request is successful, a success message is typically displayed (often as a toast by the calling context, as this dialog focuses on the submission itself), and the dialog closes.
    • If the request fails, an error message is displayed using errorMessage, and the dialog remains open, allowing the user to correct any issues or retry.

Simplified Bisect Request Workflow:

User Clicks Bisect Trigger (e.g., on chart)
      |
      V
Calling Code prepares `BisectPreloadParams`
      |
      V
`bisect-dialog-sk.setBisectInputParams(params)`
      |
      V
`bisect-dialog-sk.open()`
      |
      V
Dialog is Displayed (pre-filled)
      |
      V
User reviews/modifies data & Clicks "Bisect"
      |
      V
`bisect-dialog-sk.postBisect()`
      |
      V
`testPath` is parsed (extract benchmark, chart, statistic)
      |
      V
`CreateBisectRequest` object is built
      |
      V
`fetch POST /_/bisect/create` with request data
      |
      V
Handle API Response:
  - Success -> Close dialog, Show success notification (external)
  - Error   -> Show error message, Keep dialog open
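
The testPath parsing step in this flow can be sketched as follows. This is illustrative only: the split on '_' and the exact contents of STATISTIC_VALUES are assumptions (the documentation above only confirms values such as "avg" and "count"), and the real dialog derives additional request fields.

  // Illustrative: split a test name into chart and statistic, e.g.
  // "timeToFirstPaint_avg" -> { chart: "timeToFirstPaint", statistic: "avg" }.
  const STATISTIC_VALUES = ['avg', 'count'];

  function splitChartAndStatistic(testName: string): { chart: string; statistic: string } {
    const parts = testName.split('_');
    const last = parts[parts.length - 1];
    if (parts.length > 1 && STATISTIC_VALUES.includes(last)) {
      return { chart: parts.slice(0, -1).join('_'), statistic: last };
    }
    return { chart: testName, statistic: '' };
  }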

Key Components and Files

  • bisect-dialog-sk.ts: This is the core TypeScript file defining the BisectDialogSk custom element.

    • BisectDialogSk class: Extends ElementSk and manages the dialog's state, rendering, and interaction logic.
    • BisectPreloadParams interface: Defines the structure of the initial data passed to the dialog.
    • template: A lit-html template defining the dialog's HTML structure, including input fields for test path, bug ID, start/end commits, story, and an optional patch. It also includes a close icon, a spinner for loading states, and submit/close buttons.
    • connectedCallback(): Initializes the element, sets up property upgrades, queries for DOM elements (dialog, form, spinner, button), and attaches an event listener to the form's submit event. It also fetches the logged-in user's status.
    • setBisectInputParams(): Populates the internal state and input fields with data provided externally.
    • open(): Shows the modal dialog and ensures the submit button is enabled.
    • closeBisectDialog(): Closes the dialog.
    • postBisect(): This is the heart of the submission logic. It:
      • Activates the spinner and disables the submit button.
      • Parses the testPath to extract various components required for the bisect request (like benchmark, chart, story, statistic). The logic for chart and statistic derivation is particularly important here.
      • Constructs the CreateBisectRequest payload.
      • Makes a POST request to the /_/bisect/create endpoint.
      • Handles the response, either closing the dialog on success or displaying an error message on failure.
    • STATISTIC_VALUES: A constant array used to determine if the last part of a test name is a statistic (e.g., avg, min, max).
  • bisect-dialog-sk.scss: Contains the SASS styles for the dialog, ensuring it aligns with the application's theme. It styles the dialog itself, input fields, and the footer elements.

  • index.ts: A simple entry point that imports and thus registers the bisect-dialog-sk custom element.

  • BUILD.bazel: Defines the build rules for this module, specifying its dependencies (SASS, TypeScript, other SK elements like alogin-sk, select-sk, spinner-sk, close-icon-sk) and sources. The dependencies highlight its reliance on common UI components and infrastructure modules for features like login status and error messaging.

Design Choices

  • Custom Element (ElementSk): Encapsulating the dialog as a custom element promotes reusability and modularity. It can be easily integrated into different parts of the Perf application where bisection capabilities are needed.
  • lit-html for Templating: Provides an efficient and declarative way to define the dialog's HTML structure and update it based on its state.
  • Pre-computation of Request Parameters: The dialog takes a “test path” and derives several other parameters (benchmark, chart, statistic) from it. This simplifies the input required from the user or the calling component, as they only need to provide the full test identifier.
  • Specific to Chrome: The comment “The bisect logic is only specific to Chrome” indicates that the backend service this dialog interacts with (/_/bisect/create) is tailored for Chrome's bisection infrastructure. The project: 'chromium' in the request payload confirms this.
  • Error Handling: The use of jsonOrThrow and errorMessage provides a standard way to handle API errors and inform the user.
  • Spinner for Feedback: The spinner-sk element gives visual feedback during the asynchronous fetch operation, improving user experience.

Module: /modules/calendar-input-sk

Calendar Input Element (calendar-input-sk)

The calendar-input-sk module provides a user-friendly way to select dates. It combines a standard text input field for manual date entry with a button that reveals a calendar-sk element within a dialog for visual date picking. This approach offers flexibility for users who prefer typing dates directly and those who prefer a visual calendar interface.

Responsibilities and Key Components

  • calendar-input-sk.ts: This is the core file defining the CalendarInputSk custom element.

    • Why: It orchestrates the interaction between the text input, the calendar button, and the pop-up calendar dialog. The goal is to provide a seamless date selection experience.
    • How:
    • It uses a standard HTML <input type="text"> element for direct date input. A pattern attribute ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) and a title are used to guide the user on the expected YYYY-MM-DD format. An error indicator (&cross;) is shown if the input doesn't match the pattern.
    • A <button> element, styled with a date-range-icon-sk, triggers the display of the calendar.
    • A standard HTML <dialog> element is used to present the calendar-sk element. This choice simplifies the implementation of modal behavior.
    • The openHandler method is responsible for showing the dialog. It uses a Promise to manage the asynchronous nature of user interaction with the dialog (either selecting a date or canceling). This makes the event handling logic cleaner and easier to follow.
    • The inputChangeHandler is triggered when the user types into the text field. It validates the input against the defined pattern. If valid, it parses the date string and updates the displayDate property.
    • The calendarChangeHandler is invoked when a date is selected from the calendar-sk component within the dialog. It resolves the aforementioned Promise with the selected date.
    • The dialogCancelHandler is called when the dialog is closed without a date selection (e.g., by pressing the “Cancel” button or the Escape key). It rejects the Promise.
    • An input custom event (of type CustomEvent<Date>) is dispatched whenever the selected date changes, whether through the text input or the calendar dialog. This allows parent components to react to date selections.
    • The displayDate property acts as the single source of truth for the currently selected date. Setting this property will update both the text input and the date displayed in the calendar-sk when it's opened.
    • It leverages the lit-html library for templating, providing a declarative way to define the element's structure and efficiently update the DOM.
    • The element extends ElementSk, inheriting common functionalities for Skia custom elements.
  • calendar-input-sk.scss: This file contains the styling for the calendar-input-sk element.

    • Why: To provide a consistent visual appearance that integrates well with the Skia design system (themes).
    • How: It uses SASS to define styles for the input field, the calendar button, the error indicator, and the dialog. It leverages CSS variables (e.g., --error, --on-surface, --surface-1dp) for theming, allowing the component's appearance to adapt to different contexts (like dark mode). The .invalid class is conditionally displayed based on the input field's validity state using the :invalid pseudo-class.
  • index.ts: This file simply imports and thereby registers the calendar-input-sk custom element.

    • Why: This is a common pattern for making custom elements available for use in an application. It acts as the entry point for the component.
  • calendar-input-sk-demo.html / calendar-input-sk-demo.ts: These files constitute a demonstration page for the calendar-input-sk element.

    • Why: To showcase the element's functionality, different states (including invalid input and dark mode), and provide a simple way for developers to interact with and understand the component. It also serves as a testbed during development.
    • How: The HTML file includes multiple instances of <calendar-input-sk> in various configurations. The TypeScript file initializes these instances, sets initial displayDate values, and demonstrates how to listen for the input event. It also shows an example of programmatically setting an invalid value in one of the input fields.

Key Workflows

1. Selecting a Date via Text Input:

User types "2023-10-26" into text input
    |
    V
inputChangeHandler in calendar-input-sk.ts
    |
    +-- (Input is valid: matches pattern "YYYY-MM-DD") --> Parse "2023-10-26" into a Date object
    |                                                       |
    |                                                       V
    |                                                   Update _displayDate property
    |                                                       |
    |                                                       V
    |                                                   Render component (updates input field's .value)
    |                                                       |
    |                                                       V
    |                                                   Dispatch "input" CustomEvent<Date>
    |
    +-- (Input is invalid: e.g., "2023-") --> Do nothing (CSS shows error indicator)

2. Selecting a Date via Calendar Dialog:

User clicks calendar button
    |
    V
openHandler in calendar-input-sk.ts
    |
    V
dialog.showModal() is called
    |
    V
<dialog> with <calendar-sk> is displayed
    |
    +-- User selects a date in <calendar-sk> --> <calendar-sk> dispatches "change" event
    |                                                |
    |                                                V
    |                                           calendarChangeHandler in calendar-input-sk.ts
    |                                                |
    |                                                V
    |                                           dialog.close()
    |                                                |
    |                                                V
    |                                           Promise resolves with the selected Date
    |
    +-- User clicks "Cancel" button or presses Esc --> dialog dispatches "cancel" event
                                                         |
                                                         V
                                                    dialogCancelHandler in calendar-input-sk.ts
                                                         |
                                                         V
                                                    dialog.close()
                                                         |
                                                         V
                                                    Promise rejects

If Promise resolves (date selected):

openHandler continues after await
    |
    V
Update _displayDate property with the resolved Date
    |
    V
Render component (updates input field's .value)
    |
    V
Dispatch "input" CustomEvent<Date>
    |
    V
Focus on the text input field

The design emphasizes a clear separation of concerns: the calendar-sk handles the visual calendar logic, while calendar-input-sk manages the integration of text input and the dialog presentation. The use of a Promise in openHandler simplifies the handling of the asynchronous dialog interaction, leading to more readable and maintainable code.
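
The Promise-based dialog interaction looks roughly like the sketch below. It assumes only what the text above describes: the inner calendar-sk dispatches a change CustomEvent<Date>, and the <dialog> fires cancel when dismissed without a selection.

  // Wrap the dialog interaction in a Promise that resolves with the picked date
  // or rejects when the dialog is cancelled.
  function pickDate(dialog: HTMLDialogElement, calendar: HTMLElement): Promise<Date> {
    return new Promise<Date>((resolve, reject) => {
      const cleanup = () => {
        calendar.removeEventListener('change', onChange);
        dialog.removeEventListener('cancel', onCancel);
      };
      const onChange = (e: Event) => {
        cleanup();
        dialog.close();
        resolve((e as CustomEvent<Date>).detail);
      };
      const onCancel = () => {
        cleanup();
        reject(new Error('date selection cancelled'));
      };
      calendar.addEventListener('change', onChange);
      dialog.addEventListener('cancel', onCancel);
      dialog.showModal();
    });
  }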

Module: /modules/calendar-sk

The calendar-sk module provides a custom HTML element <calendar-sk> that displays an interactive monthly calendar. This element was created to address limitations with the native HTML <input type="date"> element, specifically its lack of Safari support and the inability to style the pop-up calendar. Furthermore, it aims to be more themeable and accessible than other existing web component solutions like Elix.

The core philosophy behind calendar-sk is to provide a user-friendly, accessible, and customizable date selection experience. Accessibility is a key consideration, with design choices informed by WAI-ARIA practices for date pickers. This includes keyboard navigation and appropriate ARIA attributes.

Key Responsibilities and Components:

  • calendar-sk.ts: This is the heart of the module, defining the CalendarSk custom element which extends ElementSk.
    • Rendering the Calendar: It uses the lit-html library for templating, dynamically generating the HTML for the calendar grid. The calendar displays one month at a time.
    • The main template (CalendarSk.template) constructs the overall table structure, including navigation buttons for changing the year and month, and headers for the year and month.
    • CalendarSk.rowTemplate is responsible for rendering each week (row) of the calendar.
    • CalendarSk.buttonForDateTemplate creates the individual day buttons. It handles logic for disabling buttons for dates outside the current month and highlighting the selected date and today's date.
    • Date Management:
    • It internally manages a _displayDate (a JavaScript Date object) which represents the currently selected or focused date.
    • The CalendarDate class is a helper to simplify comparisons of year, month, and date, as JavaScript Date objects can be tricky with timezones and direct comparisons.
    • Helper functions like getNumberOfDaysInMonth and firstDayIndexOfMonth are used to correctly layout the days within the grid.
    • Navigation:
    • Provides UI buttons (using navigate-before-icon-sk and navigate-next-icon-sk) for incrementing/decrementing the month and year. Methods like incYear, decYear, incMonth, and decMonth handle the logic for updating _displayDate and re-rendering. A crucial detail in month/year navigation is handling cases where the current day (e.g., 31st) doesn't exist in the target month (e.g., February). In such scenarios, the date is adjusted to the last valid day of the target month.
    • Keyboard Navigation:
    • The keyboardHandler method implements navigation using arrow keys (day/week changes) and PageUp/PageDown keys (month changes). This handler is designed to be attached to a parent element (like a dialog or the document) to allow for controlled event handling, especially when multiple keyboard-interactive elements are on a page. When a key is handled, it prevents further event propagation and focuses the newly selected date button.
    • Internationalization (i18n):
    • Leverages Intl.DateTimeFormat to display month names and weekday headers according to the specified locale property or the browser's default locale. The buildWeekDayHeader method dynamically generates these headers.
    • Events:
    • Dispatches a change custom event ( CustomEvent<Date>) whenever a new date is selected by clicking on a day. The event detail contains the selected Date object.
    • Theming:
    • The component is themeable through CSS custom properties, as defined in calendar-sk.scss. It imports styles from //perf/modules/themes:themes_sass_lib and //elements-sk/modules/styles:buttons_sass_lib.
  • calendar-sk.scss: This file contains the SASS/CSS styles for the <calendar-sk> element. It defines the visual appearance of the calendar grid, buttons, headers, and how selected or “today” dates are highlighted. It relies on CSS variables (e.g., --background, --secondary, --surface-1dp) for theming, allowing the look and feel to be customized by the consuming application.
  • calendar-sk-demo.html and calendar-sk-demo.ts: These files set up a demonstration page for the calendar-sk element.
    • calendar-sk-demo.html includes instances of the calendar, some in dark mode and one configured for a different locale (zh-Hans-CN), to showcase its versatility.
    • calendar-sk-demo.ts initializes these calendar instances, sets their initial displayDate and locale, and attaches event listeners to log the change event. It also demonstrates how to hook up the keyboardHandler.
  • index.ts: A simple entry point that imports and thus registers the calendar-sk custom element, making it available for use in HTML.

Key Workflows:

  1. Initialization and Rendering: ElementSk constructor -> connectedCallback -> buildWeekDayHeader -> _render (calls CalendarSk.template)

    • When the <calendar-sk> element is added to the DOM, its connectedCallback is invoked.
    • This triggers the initial rendering process, including building the weekday headers based on the current locale.
    • The main template then renders the calendar grid for the month of the initial _displayDate.
  2. Date Selection (Click): User clicks on a date button -> dateClick method -> Updates _displayDate -> Dispatches change event with the new Date -> _render (to update UI, e.g., highlight new selection)

    User clicks a date button.

      [date button] --click--> dateClick(event)
        |
        +--> new Date(this._displayDate) (create copy)
        |
        +--> d.setDate(event.target.dataset.date) (update day)
        |
        +--> dispatchEvent(new CustomEvent<Date>('change', { detail: d }))
        |
        +--> this._displayDate = d
        |
        +--> this._render()

  3. Month/Year Navigation (Click): User clicks “Previous Month” button -> decMonth method -> Calculates new year, monthIndex, and date (adjusting for days in month) -> Updates _displayDate with the new Date -> _render (to display the new month/year)

    User clicks "Previous Month" button.

      [Previous Month button] --click--> decMonth()
        |
        +--> Calculate new year, month, date (adjusting for month boundaries and days in month)
        |
        +--> this._displayDate = new Date(newYear, newMonthIndex, newDate)
        |
        +--> this._render()

  4. Keyboard Navigation: User presses “ArrowRight” while calendar (or its container) has focus -> keyboardHandler(event) -> case 'ArrowRight': this.incDay(); -> incDay method updates _displayDate (e.g., from May 21 to May 22) -> this._render() -> e.stopPropagation(); e.preventDefault(); -> this.querySelector<HTMLButtonElement>('button[aria-selected="true"]')!.focus();

User presses ArrowRight key.

  keydown event (ArrowRight) ---> keyboardHandler(event)
    |
    + (matches case 'ArrowRight')
    |
    +--> this.incDay()
    |      |
    |      +--> this._displayDate = new Date(year, monthIndex, date + 1)
    |      |
    |      +--> this._render()
    |
    +--> event.stopPropagation()
    |
    +--> event.preventDefault()
    |
    +--> Focus the newly selected day button.

The use of zero-indexed months (monthIndex) internally, as is common with the JavaScript Date object, is a deliberate choice for consistency with the underlying API, though it requires careful handling to avoid off-by-one errors, especially when calculating things like the number of days in a month.
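
A small sketch of the day-clamping behavior described for incMonth/decMonth, using the same zero-indexed month convention; the helper below mirrors the role of getNumberOfDaysInMonth but is not the element's actual code.

  // Day 0 of the following month is the last day of `monthIndex`.
  function daysInMonth(year: number, monthIndex: number): number {
    return new Date(year, monthIndex + 1, 0).getDate();
  }

  // Move to the previous month, clamping e.g. Jan 31 to Feb 28 (or 29).
  function previousMonth(d: Date): Date {
    let year = d.getFullYear();
    let monthIndex = d.getMonth() - 1;
    if (monthIndex < 0) {
      monthIndex = 11;
      year -= 1;
    }
    const day = Math.min(d.getDate(), daysInMonth(year, monthIndex));
    return new Date(year, monthIndex, day);
  }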

Module: /modules/chart-tooltip-sk

chart-tooltip-sk Module Documentation

Overview

The chart-tooltip-sk module provides a custom HTML element, <chart-tooltip-sk>, designed to display detailed information about a specific data point on a chart. This tooltip is intended to be interactive, offering context-sensitive actions and information relevant to performance monitoring and analysis. It can be triggered by hovering over or clicking on a chart point.

The design philosophy behind this module is to centralize the presentation of complex data point information and related actions. Instead of scattering this logic across various chart implementations, chart-tooltip-sk encapsulates it, promoting reusability and maintainability. It aims to provide a rich user experience by surfacing relevant details like commit information, anomaly status, bug tracking, and actions like bisection or requesting further traces.

Key Responsibilities and Components

The primary responsibility of chart-tooltip-sk is to render a tooltip with relevant information and interactive elements based on the data point it's associated with.

Core Functionality & Design Choices:

  • Data Loading and Display:
    • The load() method is the main entry point for populating the tooltip with data. It accepts various parameters like the trace index, test name, y-value, date, commit position, anomaly details, and bug information. This comprehensive loading mechanism allows the parent charting component (e.g., explore-simple-sk) to provide all necessary context.
    • It displays fundamental information such as the test name, data value, units, and the date of the data point.
    • Why: Consolidating data loading into a single method simplifies the interface for parent components.
  • Commit Information:
    • The tooltip can display details about the commit associated with the data point, including the author, message, and a link to the commit in the version control system.
    • The fetch_details() method is responsible for asynchronously retrieving commit details using the /_/cid/ endpoint. This is done to avoid loading all commit details upfront for every point on a chart, which could be performance-intensive.
    • The _always_show_commit_info and _skip_commit_detail_display flags (sourced from window.perf) allow for configurable display of commit details, catering to different instance needs.
    • Why: On-demand fetching of commit details optimizes initial load times. Configuration flags provide flexibility for different deployment scenarios.
  • Anomaly Detection and Triage:
    • If a data point is identified as an anomaly, the tooltip will highlight this and display relevant anomaly metrics (e.g., median before/after, percentage change).
    • It integrates with anomaly-sk for consistent formatting of anomaly data.
    • It incorporates triage-menu-sk to allow users to triage new anomalies (e.g., create bugs, mark as not a bug).
    • If a bug is already associated with an anomaly, it displays the bug ID and provides an option to unassociate it.
    • Why: Centralizing anomaly display and triage actions within the tooltip provides a focused user workflow.
  • Bug Association:
    • Integrates with user-issue-sk to display and manage Buganizer issues linked to a data point (even if it's not a formal anomaly). Users can associate existing bugs or create new ones.
    • The bug_host_url (from window.perf) is used to construct links to the bug tracking system.
    • Why: Direct integration with bug tracking streamlines the process of linking performance data to actionable issues.
  • Interactive Actions:
    • Bisect: Provides a “Bisect” button (if _show_pinpoint_buttons is true, typically for Chromium instances) that opens bisect-dialog-sk. This allows users to initiate a bisection to find the exact commit that caused a regression.
    • Request Trace: Offers a “Request Trace” button (also gated by _show_pinpoint_buttons) that opens pinpoint-try-job-dialog-sk. This is used to request more detailed trace data for a specific commit.
    • Point Links: Integrates point-links-sk to show relevant links for a data point based on instance configuration (e.g., links to V8 or WebRTC specific commit ranges). This is configured via keys_for_commit_range and keys_for_useful_links in window.perf.
    • JSON Source: If enabled (show_json_file_display in window.perf), it provides a way to view the raw JSON data for the point via json-source-sk.
    • Why: Placing these actions directly in the tooltip makes them easily discoverable and accessible in the context of the selected data point.
  • Positioning and Visibility:
    • The moveTo() method handles the dynamic positioning of the tooltip relative to the mouse cursor or the selected chart point. It intelligently adjusts its position to stay within the viewport and avoid overlapping critical chart elements.
    • The tooltip can be “fixed” (typically on click) or transient (on hover). A fixed tooltip remains visible and offers more interactive elements.
    • Why: Smart positioning ensures the tooltip is always usable and doesn't obstruct the underlying chart. The fixed/transient behavior balances information density with unobtrusiveness.
  • Styling:
    • Uses SCSS for styling (chart-tooltip-sk.scss), including themes imported from //perf/modules/themes:themes_sass_lib.
    • Employs md-elevation for a Material Design-inspired shadow effect.
    • Why: SCSS allows for organized and maintainable styles. Material Design elements provide a consistent look and feel.

Key Files:

  • chart-tooltip-sk.ts: The core TypeScript file defining the ChartTooltipSk class, its properties, methods, and HTML template (using lit-html). This is where the primary logic for data display, interaction handling, and integration with sub-components resides.
  • chart-tooltip-sk.scss: The SASS file containing the styles for the tooltip element.
  • index.ts: A simple entry point that imports and registers the chart-tooltip-sk custom element.
  • chart-tooltip-sk-demo.html & chart-tooltip-sk-demo.ts: Files for demonstrating the tooltip's functionality. The demo sets up mock data and fetchMock to simulate API responses, allowing isolated testing and visualization of the component.
  • BUILD.bazel: Defines how the element and its demo page are built, including dependencies on other Skia Elements and Perf modules like anomaly-sk, commit-range-sk, triage-menu-sk, etc.

Workflow Example: Displaying Tooltip on Chart Point Click (Fixed Tooltip)

User clicks a point on a chart
        |
        V
Parent Chart Component (e.g., explore-simple-sk)
    1. Determines data for the clicked point (coordinates, commit, trace info).
    2. Optionally fetches commit details if not already available.
    3. Optionally checks its anomaly map for anomaly data.
    4. Calls `chartTooltipSk.load(...)` with all relevant data,
       setting `tooltipFixed = true` and providing a close button action.
    5. Calls `chartTooltipSk.moveTo({x, y})` to position the tooltip.
        |
        V
chart-tooltip-sk
    1. `load()` method populates internal properties (_test_name, _y_value, _commit_info, _anomaly, etc.).
    2. `_render()` is triggered (implicitly or explicitly).
    3. The lit-html template in `static template` is evaluated:
        - Basic info (test name, value, date) is displayed.
        - If `commit_info` is present, commit details (author, message, hash) are shown.
        - If `_anomaly` is present:
            - Anomaly metrics are displayed.
            - If `anomaly.bug_id === 0`, `triage-menu-sk` is shown.
            - If `anomaly.bug_id > 0`, bug ID is shown with an unassociate button.
            - Pinpoint job links are shown if available.
        - If `tooltip_fixed` is true:
            - "Bisect" and "Request Trace" buttons are shown (if configured).
            - `user-issue-sk` is shown (if not an anomaly).
            - `json-source-sk` button/link is shown (if configured).
            - The close icon is visible.
    4. Child components like `commit-range-sk`, `point-links-sk`, `user-issue-sk`, `triage-menu-sk`
       are updated with their respective data.
    5. `moveTo()` positions the rendered `div.container` on the screen.
        |
        V
User interacts with buttons (e.g., "Bisect", "Triage", "Close")
        |
        V
chart-tooltip-sk or its child components handle the interaction
    - e.g., clicking "Bisect" calls `openBisectDialog()`, which shows `bisect-dialog-sk`.
    - e.g., clicking "Close" executes the `_close_button_action` passed during `load()`.

This modular approach ensures that chart-tooltip-sk is a self-contained, feature-rich component for displaying detailed contextual information and actions related to data points in performance charts.
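
As a rough illustration of the positioning concern handled by moveTo(), the sketch below clamps a tooltip of known size to the viewport; the real method also accounts for chart geometry and the fixed/transient state, so treat this as the general idea rather than the actual implementation.

  // Keep a tooltip of the given size fully inside the viewport.
  function clampToViewport(
    x: number,
    y: number,
    tooltipWidth: number,
    tooltipHeight: number
  ): { left: number; top: number } {
    const maxLeft = window.innerWidth - tooltipWidth;
    const maxTop = window.innerHeight - tooltipHeight;
    return {
      left: Math.max(0, Math.min(x, maxLeft)),
      top: Math.max(0, Math.min(y, maxTop)),
    };
  }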

Module: /modules/cid

CID Module Documentation

This module, /modules/cid, provides functionality for interacting with Commit IDs (CIDs), which are also referred to as CommitNumbers. The primary purpose of this module is to facilitate the retrieval of detailed commit information based on a set of commit numbers and their corresponding sources.

Design and Implementation

The core functionality revolves around the lookupCids function. This function is designed to be a simple and efficient way to fetch commit details from a backend endpoint.

Why Asynchronous Operations?

The lookup of commit information involves a network request to a backend service (/_/cid/). Network requests are inherently asynchronous. Therefore, lookupCids returns a Promise. This allows the calling code to continue execution while the commit information is being fetched and to handle the response (or any potential errors) when it becomes available. This non-blocking approach is crucial for maintaining a responsive user interface or efficient server-side processing.

Why JSON for Data Exchange?

JSON (JavaScript Object Notation) is used as the data format for both the request and the response.

  • Request: The input cids (an array of CommitNumber objects) is serialized into a JSON string and sent in the body of the HTTP POST request. JSON is a lightweight and widely supported format, making it ideal for client-server communication.
  • Response: The backend endpoint is expected to return a JSON response conforming to the CIDHandlerResponse type. The jsonOrThrow utility (imported from ../../../infra-sk/modules/jsonOrThrow) is used to parse this JSON response. This utility simplifies error handling by automatically throwing an error if the response is not valid JSON or if the HTTP request itself fails.

Why POST Request?

A POST request is used instead of a GET request for sending the cids. While GET requests are often used for retrieving data, they are typically limited in the amount of data that can be sent in the URL (e.g., through query parameters). Since the number of cids to look up could be large, sending them in the request body via a POST request is a more robust and scalable approach. The Content-Type: application/json header informs the server that the request body contains JSON data.

Key Components and Files

  • cid.ts: This is the sole TypeScript file in the module and contains the implementation of the lookupCids function.
    • lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponse>:
    • Responsibility: Takes an array of CommitNumber objects and asynchronously fetches detailed commit information for each from the /_/cid/ backend endpoint.
    • How it works:
      1. It constructs an HTTP POST request to the /_/cid/ endpoint.
      2. The cids array is converted into a JSON string and included as the request body.
      3. Appropriate headers (Content-Type: application/json) are set.
      4. The fetch API is used to make the network request.
      5. The response from the server is then processed by jsonOrThrow. If the request is successful and the response is valid JSON, it resolves the promise with the parsed CIDHandlerResponse. Otherwise, it rejects the promise with an error.
    • Dependencies:
      • jsonOrThrow (from ../../../infra-sk/modules/jsonOrThrow): For robust JSON parsing and error handling.
      • CommitNumber, CIDHandlerResponse (from ../json): These are type definitions that define the structure of the input commit identifiers and the expected response from the backend.

Workflow: Looking up Commit IDs

The typical workflow for using this module is as follows:

Caller                     | /modules/cid/cid.ts (lookupCids) | Backend Server (/_/cid/)
---------------------------|----------------------------------|-------------------------
1. Has array of CommitNumber objects.
                           |                                  |
2. Calls `lookupCids(cids)`|                                  |
   `---------------------->`|                                  |
                           | 3. Serializes `cids` to JSON.    |
                           | 4. Creates POST request with JSON body.
                           |    `--------------------------->`| 5. Receives POST request.
                           |                                  | 6. Processes `cids`.
                           |                                  | 7. Generates `CIDHandlerResponse`.
                           |    `<---------------------------`| 8. Sends JSON response.
                           | 9. Receives response.            |
                           | 10. `jsonOrThrow` parses response.|
                           |    (Throws error on failure)     |
                           |                                  |
11. Receives Promise that  |                                  |
    resolves with          |                                  |
    `CIDHandlerResponse`   |                                  |
    (or rejects with error).
   `<----------------------`|                                  |
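
A minimal sketch of lookupCids, pieced together from the description above (the real implementation lives in cid.ts; the import paths are those listed under Dependencies):

    import { jsonOrThrow } from '../../../infra-sk/modules/jsonOrThrow';
    import { CommitNumber, CIDHandlerResponse } from '../json';

    // POST the commit numbers to /_/cid/ and resolve with the parsed
    // CIDHandlerResponse; jsonOrThrow rejects on HTTP or parse errors.
    export function lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponse> {
      return fetch('/_/cid/', {
        method: 'POST',
        body: JSON.stringify(cids),
        headers: { 'Content-Type': 'application/json' },
      }).then(jsonOrThrow);
    }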

Module: /modules/cluster-lastn-page-sk

The cluster-lastn-page-sk module provides a user interface for testing and configuring alert configurations by running them against a recent range of commits. This allows users to “dry run” an alert to see what regressions it would detect before saving it to run periodically.

Core Functionality:

The primary purpose of this module is to facilitate the iterative process of defining effective alert configurations. Instead of deploying an alert and waiting for it to trigger (potentially with undesirable results), users can simulate its behavior on historical data. This helps in fine-tuning parameters like the detection algorithm, radius, sparsity, and interestingness threshold.

Key Components and Files:

  • cluster-lastn-page-sk.ts: This is the heart of the module, defining the ClusterLastNPageSk custom element.

    • State Management: It manages the current alert configuration (this.state), the commit range (this.domain), and the results of the dry run (this.regressions). It utilizes stateReflector to potentially persist and restore parts of this state in the URL, allowing users to share specific configurations or test setups.
    • User Interaction: It handles user actions such as:
    • Editing the alert configuration via a dialog (alert-config-dialog which hosts an alert-config-sk element).
    • Modifying the commit range using a domain-picker-sk element.
    • Initiating the dry run (run() method).
    • Saving the configured alert (writeAlert() method).
    • Viewing details of detected regressions in a dialog (triage-cluster-dialog which hosts a cluster-summary2-sk element).
    • API Communication:
    • Fetches initial data (paramset for alert configuration, default new alert template) from /_/initpage/ and /_/alert/new respectively.
    • Sends the alert configuration and commit range to the /_/dryrun/start endpoint to initiate the clustering and regression detection process. It uses the startRequest utility from ../progress/progress to handle the asynchronous request and display progress.
    • Sends the finalized alert configuration to /_/alert/update to save or update it in the backend.
    • Rendering: It uses lit-html for templating and dynamically renders the UI based on the current state, including the controls, the progress of a running dry run, and a table of detected regressions. The table displays commit details (commit-detail-sk) and triage status (triage-status-sk) for each detected regression.
    • Error Handling: It displays error messages if the dry run or alert saving fails.
  • cluster-lastn-page-sk.html (Demo Page): A simple HTML file that includes the cluster-lastn-page-sk element and an error-toast-sk for displaying global error messages. This is primarily used for demonstration and testing purposes.

  • cluster-lastn-page-sk-demo.ts: Sets up mock HTTP responses using fetch-mock for the demo page. This allows the cluster-lastn-page-sk element to function in isolation without needing a live backend. It mocks endpoints like /_/initpage/, /_/alert/new, /_/count/, and /_/loginstatus/.

  • cluster-lastn-page-sk.scss: Provides the styling for the cluster-lastn-page-sk element and its dialogs, ensuring a consistent look and feel with the rest of the Perf application. It uses shared SASS libraries for buttons and themes.

Workflow for Testing an Alert Configuration:

  1. Load Page: User navigates to the page.

    • cluster-lastn-page-sk fetches initial paramset and a default new alert configuration.
    User -> cluster-lastn-page-sk
           cluster-lastn-page-sk -> GET /_/initpage/ (fetches paramset)
           cluster-lastn-page-sk -> GET /_/alert/new (fetches default alert)
    
  2. Configure Alert: User clicks the “Configure Alert” button.

    • A dialog (alert-config-dialog) opens, showing alert-config-sk.
    • User modifies alert parameters (algorithm, radius, query, etc.).
    • User clicks “Accept”.
    • The state in cluster-lastn-page-sk is updated with the new configuration.
    User --clicks--> "Configure Alert" button
           cluster-lastn-page-sk --shows--> alert-config-dialog
           User --interacts with--> alert-config-sk
           User --clicks--> "Accept"
           alert-config-sk --updates--> cluster-lastn-page-sk.state
    
  3. (Optional) Adjust Commit Range: User interacts with domain-picker-sk to define the number of recent commits or a specific date range for the dry run.

    • cluster-lastn-page-sk.domain is updated.
  4. Run Dry Run: User clicks the “Run” button.

    • cluster-lastn-page-sk constructs a RegressionDetectionRequest using the current alert state and domain.
    • It sends this request to /_/dryrun/start.
    • The UI shows a spinner and progress messages.
    • As results (regressions) become available, they are displayed in a table.
    User --clicks--> "Run" button
           cluster-lastn-page-sk --creates--> RegressionDetectionRequest
           cluster-lastn-page-sk --POSTs to--> /_/dryrun/start (with request body)
                                       (progress updates via startRequest callback)
           Backend --processes & clusters-->
           Backend --sends progress/results--> cluster-lastn-page-sk
           cluster-lastn-page-sk --updates--> UI (regressions table, status messages)
    
  5. Review Results: User examines the table of regressions.

    • Each row shows a commit and the regressions (low/high) found at that commit.
    • User can click on a regression to open a triage-cluster-dialog (showing cluster-summary2-sk) for more details.
    • From the summary dialog, user can open related traces in the explorer view.
  6. Iterate or Save:

    - If results are not satisfactory, user goes back to step 2 to adjust the
      alert configuration and re-runs.
    - If results are satisfactory, user clicks "Create Alert" (or "Update
      Alert" if modifying an existing one).
    - `cluster-lastn-page-sk` sends the current alert `state` to
      `/_/alert/update`.

    User --clicks--> "Create Alert" / "Update Alert" button
           cluster-lastn-page-sk --POSTs to--> /_/alert/update (with alert config)
           Backend --saves/updates alert-->
           Backend --responds with ID--> cluster-lastn-page-sk
           cluster-lastn-page-sk --updates--> UI (button text might change to "Update Alert")
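
As a rough illustration of step 4, a dry run could be kicked off with a plain fetch as sketched below. The real element uses the startRequest helper from ../progress/progress to poll for progress, and the request body fields shown here are simplified assumptions rather than the actual RegressionDetectionRequest shape:

    // Simplified sketch: POST the alert config and commit domain to start a
    // dry run. The body's field names are illustrative assumptions.
    async function startDryRun(alertConfig: unknown, domain: unknown): Promise<unknown> {
      const resp = await fetch('/_/dryrun/start', {
        method: 'POST',
        body: JSON.stringify({ alert: alertConfig, domain: domain }),
        headers: { 'Content-Type': 'application/json' },
      });
      if (!resp.ok) {
        throw new Error(`Dry run failed to start: ${resp.statusText}`);
      }
      return resp.json(); // progress payload, polled until the run completes
    }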

Design Decisions:

  • Client-Side Dry Run Initiation: The “dry run” is initiated from the client, sending the full alert configuration. This allows immediate feedback and iteration without needing to first save an incomplete or experimental alert to the backend.
  • Component-Based UI: The UI is built using custom elements (e.g., alert-config-sk, domain-picker-sk, cluster-summary2-sk). This promotes modularity, reusability, and separation of concerns.
  • Asynchronous Operations with Progress: Long-running operations like the dry run are handled asynchronously with visual feedback (spinners, status messages) provided by the ../progress/progress utility, enhancing user experience.
  • State Reflection: Using stateReflector allows parts of the page's state (like the alert configuration) to be encoded in the URL. This is useful for sharing specific test scenarios or bookmarking them.
  • Dialogs for Focused Interaction: Modal dialogs are used for alert configuration and viewing regression summaries, preventing users from interacting with the main page content while these tasks are in progress, thus guiding their focus.
  • Mocking for Demo/Testing: The demo page (cluster-lastn-page-sk-demo.ts) heavily relies on fetch-mock. This enables isolated development and testing of the UI component without a backend dependency, which is crucial for frontend unit/integration tests and local development.

Module: /modules/cluster-page-sk

The cluster-page-sk module provides the user interface for Perf's trace clustering functionality. This allows users to identify groups of traces that exhibit similar behavior, which is crucial for understanding performance regressions or improvements across different configurations and tests.

Core Functionality and Design:

The primary goal of this page is to allow users to define a set of traces and then apply a clustering algorithm to them. The “why” behind this is to simplify the analysis of large datasets by grouping related performance changes. Instead of manually inspecting hundreds or thousands of individual traces, users can focus on a smaller number of clusters, each representing a distinct performance pattern.

The “how” involves several key components:

  1. Defining the Scope of Analysis:

    • Commit Selection: Users first select a central commit around which the analysis will be performed. This is handled by commit-detail-picker-sk. The clustering will typically look at commits before and after this selected point. The state.offset property stores the selected commit's offset.
    • Query: Users define the set of traces to consider using a query string. This is managed by query-sk and paramset-sk. The state.query holds this query. The query-count-sk element provides feedback on how many traces match the current query.
    • Time Range/Commit Radius: Users can specify a “radius” (in terms of number of commits) around the selected commit to include in the analysis. This is stored in state.radius.
  2. Clustering Algorithm and Parameters:

    • Algorithm Selection: Users can choose the clustering algorithm (e.g., k-means). This is facilitated by algo-select-sk and stored in state.algo. The choice of algorithm impacts how clusters are formed and what “similarity” means.
    • Number of Clusters (K): For algorithms like k-means, the user can suggest the number of clusters to find. A value of 0 typically means the server will try to determine an optimal K. This is stored in state.k.
    • Interestingness Threshold: Users can define a threshold for what constitutes an “interesting” cluster, often based on the magnitude of regression or step size. This is state.interesting.
    • Sparse Data Handling: An option (state.sparse) allows users to indicate if the data is sparse, meaning not all traces have data points for all commits. This affects how the clustering algorithm processes missing data.
  3. Executing the Clustering and Displaying Results:

    • Initiating the Request: The “Run” button triggers the clustering process. The start() method constructs a RegressionDetectionRequest object containing all the user-defined parameters. This request is sent to the /_/cluster/start endpoint.
    • Background Processing and Progress: Clustering can be a long-running operation. The module uses the progress utility to manage the asynchronous request. It displays a spinner (spinner-sk) and status messages (ele.status, ele.runningStatus) to keep the user informed. The requestId property tracks the active request.
    • Displaying Clusters: Once the server responds, the RegressionDetectionResponse contains a list of FullSummary objects. Each FullSummary represents a discovered cluster. These are rendered using multiple cluster-summary2-sk elements. This component is responsible for visualizing the details of each cluster, including its member traces and regression information.
    • Sorting Results: Users can sort the resulting clusters by various metrics (size, regression score, etc.) using sort-sk.

State Management:

The cluster-page-sk component maintains its internal state in a State object. This includes user selections like the query, commit offset, algorithm, and various parameters. Crucially, this state is reflected in the URL using the stateReflector utility. This design decision ensures that:

  • The page is bookmarkable: Users can save and share URLs that directly lead to a specific clustering configuration and its results.
  • Browser history (back/forward buttons) works as expected.
  • The application state is serializable and easily reproducible.

The stateHasChanged() method is called whenever a piece of the state is modified, triggering the stateReflector to update the URL and potentially re-render the component.
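
A condensed sketch of this pattern, assuming the stateReflector(getState, setState) helper from //infra-sk/modules/stateReflector, which returns the function to call when state changes; the State fields and import paths below are trimmed, illustrative assumptions:

    import { stateReflector } from '../../../infra-sk/modules/stateReflector';
    import { HintableObject } from '../../../infra-sk/modules/hintable';

    class State {
      query: string = '';
      offset: number = -1;
      algo: string = 'kmeans';
      radius: number = 7;
    }

    class ClusterPageSketch {
      private state = new State();

      // stateReflector serializes state into the URL and restores it on page
      // load; the returned callback is the stateHasChanged() described above.
      private stateHasChanged = stateReflector(
        () => this.state as unknown as HintableObject,
        (restored) => {
          this.state = restored as unknown as State;
          // re-render the page from the restored state here
        }
      );

      queryChanged(newQuery: string): void {
        this.state.query = newQuery;
        this.stateHasChanged(); // pushes the new state into the URL
      }
    }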

Key Files and Their Roles:

  • cluster-page-sk.ts: This is the main TypeScript file defining the ClusterPageSk custom element. It orchestrates all the sub-components, manages the application state, handles user interactions (e.g., button clicks, input changes), makes API calls for clustering, and renders the results. It defines the overall layout and logic of the clustering page.
  • HTML template (no standalone cluster-page-sk.html file, since this is a Lit-based element): The template is defined within cluster-page-sk.ts using lit-html. It structures the page, embedding various custom elements for commit selection, query building, algorithm choice, and result display.
  • cluster-page-sk.scss: Provides the specific styling for the cluster-page-sk element and its layout, ensuring a consistent look and feel.
  • index.ts: A simple entry point that imports and registers the cluster-page-sk custom element, making it available for use in HTML.
  • cluster-page-sk-demo.ts & cluster-page-sk-demo.html: These files set up a demonstration page for the cluster-page-sk element. cluster-page-sk-demo.ts uses fetch-mock to simulate API responses, allowing the component to be developed and tested in isolation without needing a live backend. This is crucial for rapid development and ensuring the UI behaves correctly under various backend scenarios.
  • State class (within cluster-page-sk.ts): Defines the structure of the data that is persisted in the URL and drives the component's behavior. It encapsulates all user-configurable options for the clustering process.

Workflow Example: Performing a Cluster Analysis

User Interaction                         | Component/State Change        | Backend Interaction
-----------------------------------------|-------------------------------|---------------------
1. User navigates to the cluster page.   | `ClusterPageSk` initializes.  | Fetches initial paramset (`/_/initpage/`)
                                         | `stateReflector` initializes  |
                                         | from URL or defaults.         |
                                         |                               |
2. User selects a commit.                | `commit-detail-picker-sk`     | (Potentially fetches commit details if not cached)
                                         | emits `commit-selected`.      |
                                         | `state.offset` updates.       |
                                         | `stateHasChanged()` called.   |
                                         |                               |
3. User types a query (e.g., "config=gpu").| `query-sk` emits            | (Potentially `/_/count/` to update trace count)
                                         | `query-change`.               |
                                         | `state.query` updates.        |
                                         | `stateHasChanged()` called.   |
                                         |                               |
4. User selects an algorithm (e.g., kmeans).| `algo-select-sk` emits      |
                                         | `algo-change`.                |
                                         | `state.algo` updates.         |
                                         | `stateHasChanged()` called.   |
                                         |                               |
5. User adjusts advanced parameters      | Input elements update         |
   (K, radius, interestingness).         | corresponding `state` props.  |
                                         | `stateHasChanged()` called.   |
                                         |                               |
6. User clicks "Run".                    | `start()` method is called.   | POST to `/_/cluster/start` with `RegressionDetectionRequest`
                                         | `requestId` is set.           | (This is a long-running request)
                                         | Spinner becomes active.       |
                                         |                               |
7. Page periodically updates status.     | `progress` utility polls for  | GET requests to check progress.
                                         | updates.                      |
                                         | `ele.runningStatus` updates.  |
                                         |                               |
8. Clustering completes.                 | `progress` utility resolves.  | Final response from `/_/cluster/start` (or progress endpoint)
                                         | `summaries` array is populated| containing `RegressionDetectionResponse`.
                                         | with cluster data.            |
                                         | `requestId` is cleared.       |
                                         | Spinner stops.                |
                                         |                               |
9. Results are displayed.                | `ClusterPageSk` re-renders,   |
                                         | showing `cluster-summary2-sk` |
                                         | elements for each cluster.    |

This workflow highlights how user inputs are translated into state changes, which then drive API requests and ultimately update the UI to present the clustering results. The separation of concerns among various sub-components (for query, commit selection, etc.) makes the main cluster-page-sk element more manageable.

Module: /modules/cluster-summary2-sk

The cluster-summary2-sk module provides a custom HTML element for displaying detailed information about a cluster of performance test results. This includes visualizing the trace data, showing regression statistics, and allowing users to triage the cluster.

Core Functionality and Design:

The primary purpose of this element is to present a comprehensive summary of a performance cluster. It aims to provide all necessary information for a user to understand the nature of a performance change (regression or improvement) and take appropriate action (e.g., filing a bug, marking it as expected).

Key design considerations include:

  • Data Visualization: A plot-simple-sk element is used to display the centroid trace of the cluster over time. This visual representation helps users quickly grasp the trend and identify the point of change. An “x-bar” can be displayed on the plot to highlight the specific commit where a step change is detected.
  • Statistical Summary: The element displays key statistics about the cluster, such as its size, the regression factor, step size, and least squares error. The labels and formatting of these statistics dynamically adapt based on the StepDetection algorithm used (e.g., ‘absolute’, ‘percent’, ‘mannwhitneyu’). This ensures that the presented information is relevant and interpretable for the specific detection method.
  • Commit Details: Integration with commit-detail-panel-sk allows users to view details of the commit associated with the detected step point or any selected point on the trace plot. This is crucial for correlating performance changes with specific code modifications.
  • Triaging: If not disabled via the notriage attribute, the element includes a triage2-sk component. This allows authenticated users with “editor” privileges to set the triage status (e.g., “positive”, “negative”, “untriaged”) and add a message. This functionality is essential for tracking the investigation and resolution of performance issues.
  • Contextual Actions: Buttons are provided to:
    • “View on dashboard”: Opens the current cluster view in a broader explorer context, pre-filling relevant parameters like shortcut ID and time range.
    • “Word Cloud”: Toggles the visibility of a word-cloud-sk element, which displays a summary of the parameters that make up the traces in the cluster. This helps in understanding the common characteristics of the affected tests.
    • A permalink is generated to directly link to the triage page for the specific step point.
  • Interactive Exploration: The commit-range-sk component allows users to define a range around the detected step or a selected commit, facilitating further investigation within the Perf application.

Key Components and Their Roles:

  • cluster-summary2-sk.ts: This is the main TypeScript file defining the ClusterSummary2Sk custom element.
    • ClusterSummary2Sk class: Extends ElementSk and manages the element's state, rendering, and event handling.
    • Data Properties (full_summary, triage, alert): These properties receive the core data for the cluster. When full_summary is set, it triggers the rendering of the plot, statistics, and commit details. The alert property determines the labels and formatting for regression statistics. The triage property reflects the current triage state.
    • Template (template static method): Uses lit-html to define the element's structure, binding data to various sub-components and display areas.
    • Event Handling:
      • open-keys: Fired when the “View on dashboard” button is clicked, providing details for opening the explorer.
      • triaged: Fired when the triage status is updated, containing the new status and the relevant commit information.
      • trace_selected: Handles events from plot-simple-sk when a point on the graph is clicked, triggering a lookup for the corresponding commit details.
    • Helper Methods:
      • statusClass(): Determines the CSS class for the regression display based on the severity (e.g., “high”, “low”).
      • permaLink(): Generates a URL to the triage page focused on the step point.
      • lookupCids() (static): A static method (delegating to ../cid/cid.ts) used to fetch commit details based on commit numbers.
    • labelsForStepDetection: A crucial constant object that maps different StepDetection algorithm names (e.g., ‘percent’, ‘mannwhitneyu’, ‘absolute’) to specific labels and number formatting functions for the regression statistics. This ensures that the displayed information is meaningful and correctly interpreted for the algorithm used to detect the cluster.
  • cluster-summary2-sk.html (template, rendered by cluster-summary2-sk.ts): Defines the visual layout using HTML and embedded custom elements. It uses a CSS grid for positioning the main sections: regression summary, statistics, plot, triage status, commit details, actions, and word cloud.
  • cluster-summary2-sk.scss: Provides the styling for the element. It defines how different sections are displayed, including styles for regression severity (e.g., red for “high” regressions, green for “low”), button appearances, and responsive behavior (hiding the plot on smaller screens).
  • cluster-summary2-sk-demo.html and cluster-summary2-sk-demo.ts: These files set up a demonstration page for the cluster-summary2-sk element. The .ts file provides mock data for FullSummary, Alert, and TriageStatus to populate the demo instances of the element. It also demonstrates how to listen for the triaged and open-keys custom events.

Workflows:

  1. Initialization and Data Display:

    • The host application provides full_summary (containing cluster data and trace frame), alert (details of the alert that triggered this cluster), and optionally triage (current triage status) properties to the cluster-summary2-sk element.
    • set full_summary():
      • Updates internal summary and frame data.
      • Populates dataset attributes for sorting (e.g., data-clustersize).
      • Clears and redraws the plot-simple-sk with the centroid trace from summary.centroid and time labels from frame.dataframe.header.
      • If a step point is identified and the status is not “Uninteresting”, an x-bar is placed on the plot at the corresponding commit.
      • lookupCids is called to fetch and display details for the commit at the step point in commit-detail-panel-sk.
    • set alert():
      • Updates the labels used for displaying regression statistics based on alert.step and labelsForStepDetection.
    • set triage():
      • Updates the triageStatus and re-renders the triage controls.
    • The element renders based on the provided data, displaying statistics, plot, commit details, and triage controls.
    Host Application           cluster-summary2-sk
    ----------------           -------------------
    [Set full_summary data] --> Process data
                                 |
                                 +-> plot-simple-sk (Draws trace)
                                 |
                                 +-> commit-detail-panel-sk (Shows step commit)
                                 |
                                 +-> Display stats (regression, size, etc.)
    
    [Set alert data] ---------> Update regression labels/formatters
    
    [Set triage data] --------> Update triage2-sk state
    
  2. User Triage:

    • User interacts with triage2-sk (selects status) and the message input field.
    • User clicks the “Update” button.
    • update() method is called:
      • A ClusterSummary2SkTriagedEventDetail object is created containing the step_point (as columnHeader) and the current triageStatus.
      • A triaged custom event is dispatched with this detail.
    • The host application listens for the triaged event to persist the triage status.
    User                  cluster-summary2-sk             Host Application
    ----                  -------------------             ----------------
    Selects status ---->  [triage2-sk updates value]
    Types message  ---->  [Input updates value]
    Clicks "Update" --->  update()
                           |
                           +-> Creates TriagedEventDetail
                           |
                           +-> Dispatches "triaged" event --> Listens and handles event
                                                               (e.g., saves to backend)
    
  3. Viewing on Dashboard:

    • User clicks the “View on dashboard” button.
    • openShortcut() method is called:
      • A ClusterSummary2SkOpenKeysEventDetail object is created with the shortcut ID, begin and end timestamps from the frame, and the step_point as xbar.
      • An open-keys custom event is dispatched.
    • The host application listens for open-keys and navigates the user to the explorer view with the provided parameters.
    User                      cluster-summary2-sk             Host Application
    ----                      -------------------             ----------------
    Clicks "View on dash" --> openShortcut()
                               |
                               +-> Creates OpenKeysEventDetail
                               |
                               +-> Dispatches "open-keys" event --> Listens and handles event
                                                                    (e.g., navigates to explorer)
    

The cluster-summary2-sk element plays a vital role in the Perf frontend by providing a focused and interactive view for analyzing individual performance regressions or improvements identified through clustering. Its integration with plotting, commit details, and triaging makes it a key tool for performance analysis workflows.
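
A sketch of how a host application might wire up these events, following the workflows above. The detail field names shown in the comments are simplified assumptions; the real detail types are ClusterSummary2SkTriagedEventDetail and ClusterSummary2SkOpenKeysEventDetail:

    const summary = document.querySelector('cluster-summary2-sk')!;

    // Persist the triage decision when the user clicks "Update".
    summary.addEventListener('triaged', (e: Event) => {
      const detail = (e as CustomEvent).detail; // { columnHeader, triage } (assumed shape)
      console.log('Triaged at commit', detail.columnHeader, 'status:', detail.triage);
      // e.g. POST the detail to the backend here.
    });

    // Navigate to the explorer when "View on dashboard" is clicked.
    summary.addEventListener('open-keys', (e: Event) => {
      const detail = (e as CustomEvent).detail; // { shortcut, begin, end, xbar } (assumed shape)
      // The URL pattern below is illustrative only.
      window.location.href = `/e/?keys=${detail.shortcut}&begin=${detail.begin}&end=${detail.end}`;
    });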

Module: /modules/commit-detail-panel-sk

Commit Detail Panel SK

High-level Overview:

The commit-detail-panel-sk module provides a custom HTML element <commit-detail-panel-sk> designed to display a list of commit details. It offers functionality to make these commit entries selectable and emits an event when a commit is selected. This component is primarily used in user interfaces where users need to browse and interact with a sequence of commits.

Why and How:

The core purpose of this module is to present commit information in a structured and interactive way. Instead of simply displaying raw commit data, it leverages the commit-detail-sk element (an external dependency) to render each commit with relevant information like author, message, and a link to the commit.

The design decision to make commits selectable (via the selectable attribute) enhances user interaction. When a commit is clicked in “selectable” mode, it triggers a commit-selected custom event. This event carries detailed information about the selected commit, including its index in the list, a concise description, and the full commit object. This allows parent components or applications to react to user selections and perform actions based on the chosen commit (e.g., loading further details, navigating to a specific state).

The implementation uses the Lit library for templating and rendering. The commit data is provided via the details property, which expects an array of Commit objects (defined in perf/modules/json). The component dynamically generates table rows for each commit.

The visual appearance is controlled by commit-detail-panel-sk.scss. It defines styles for the panel, including highlighting the selected row and adjusting opacity based on the selectable state. The styling aims for a clean and readable presentation of commit information.

A hide property is also available to conditionally show or hide the entire commit list. This is useful for scenarios where the panel's visibility needs to be controlled dynamically by the parent application.

Key Components/Files:

  • commit-detail-panel-sk.ts: This is the heart of the module. It defines the CommitDetailPanelSk class, which extends ElementSk.
    • Responsibilities:
    • Manages the list of Commit objects (_details property).
    • Renders the list of commits as an HTML table using Lit templates (template and rows static methods).
    • Handles user clicks on table rows (_click method).
    • When a commit is selected (and the selectable attribute is present), it dispatches the commit-selected custom event with relevant commit data.
    • Manages the selectable, selected, and hide attributes and their corresponding properties, re-rendering the component when these change.
    • Integrates the commit-detail-sk element to display individual commit details within each row.
  • commit-detail-panel-sk.scss: This file contains the SASS styles for the component.
    • Responsibilities:
    • Defines the visual appearance of the commit panel, including link colors, table cell padding, and selected row highlighting.
    • Adjusts the opacity and cursor style based on whether the panel is selectable.
    • Leverages theme variables (e.g., --primary, --surface-1dp) from //perf/modules/themes:themes_sass_lib for consistent theming.
  • commit-detail-panel-sk-demo.ts and commit-detail-panel-sk-demo.html: These files provide a demonstration page for the component.
    • Responsibilities:
    • Illustrate how to use the <commit-detail-panel-sk> element in an HTML page.
    • Show examples of the component in both selectable and non-selectable states, and in light/dark themes.
    • Demonstrate how to provide commit data to the details property and how to listen for the commit-selected event.
  • index.ts: A simple entry point that imports and registers the commit-detail-panel-sk custom element, making it available for use.
  • BUILD.bazel: Defines how the module is built and its dependencies. For instance, it declares commit-detail-sk as a runtime dependency and Lit as a TypeScript dependency.
  • commit-detail-panel-sk_puppeteer_test.ts: Contains Puppeteer tests to verify the component's rendering and basic functionality.

Key Workflows:

  1. Initialization and Rendering:

    Parent Application --> Sets 'details' property of <commit-detail-panel-sk> with Commit[]
                                      |
                                      V
    commit-detail-panel-sk.ts --> _render() is called
                                      |
                                      V
                          Lit template generates <table>
                                      |
                                      V
                          For each Commit in 'details':
                               Generates <tr> containing <commit-detail-sk .cid=Commit>
    
  2. Commit Selection (when selectable is true):

    User --> Clicks on a <tr> in the <commit-detail-panel-sk>
                                      |
                                      V
    commit-detail-panel-sk.ts --> _click(event) handler is invoked
                                      |
                                      V
    Determines the clicked commit's index and data
                                      |
                                      V
    Sets 'selected' attribute/property to the index of the clicked commit
                                      |
                                      V
    Dispatches 'commit-selected' CustomEvent with
      { selected: index, description: string, commit: Commit }
                                      |
                                      V
    Parent Application --> Listens for 'commit-selected' event and processes the event.detail

The design favors declarative attribute-based configuration (e.g., selectable, selected) and event-driven communication for user interactions, which are common patterns in web component development.
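
For example, a parent application might use the element like this (a minimal sketch; the Commit objects would normally come from a backend fetch):

    import { Commit } from '../json';

    // In a real app these would come from a backend fetch or parent component.
    const commits: Commit[] = [];

    const panel = document.querySelector('commit-detail-panel-sk')! as HTMLElement & {
      details: Commit[];
    };
    panel.setAttribute('selectable', '');
    panel.details = commits;

    panel.addEventListener('commit-selected', (e: Event) => {
      const detail = (e as CustomEvent<{
        selected: number;
        description: string;
        commit: Commit;
      }>).detail;
      console.log(`Row ${detail.selected} selected:`, detail.description);
    });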

Module: /modules/commit-detail-picker-sk

The commit-detail-picker-sk module provides a user interface element for selecting a specific commit from a range of commits. It's designed to be a reusable component that simplifies the process of commit selection within applications that need to interact with commit histories.

Core Functionality and Design:

The primary purpose of commit-detail-picker-sk is to allow users to browse and select a commit. This is achieved by presenting a button that, when clicked, opens a dialog.

  • Button as Entry Point: The button displays a summary of the currently selected commit (author and message) or a default message like “Choose a commit.” This provides immediate context to the user. Clicking this button triggers the opening of the selection dialog.
    [Button: "Author - Commit Message"] --- (click) ---> [Dialog Opens]
  • Dialog for Selection: The dialog is the main interaction point for choosing a commit. It contains:
    • commit-detail-panel-sk: This submodule is responsible for displaying the list of commits fetched from the backend. Users can click on a commit in this panel to select it.
    • Date Range Selection: A day-range-sk component allows users to specify a time window for fetching commits. This is crucial for performance and usability, as it prevents loading an overwhelming number of commits at once. When the date range changes, the component automatically fetches the relevant commits.
      [day-range-sk] -- (date range change) --> [Fetch Commits for New Range]
                                                          |
                                                          V
                                        [commit-detail-panel-sk updates]
    • Spinner: A spinner-sk element provides visual feedback to the user while commits are being fetched, indicating that an operation is in progress.
    • Close Button: Allows the user to dismiss the dialog without making a selection or after a selection is made.

Data Flow and State Management:

  1. Initialization: When the component is first loaded, it initializes with a default date range (typically the last 24 hours). It then fetches the commits within this initial range.
  2. Fetching Commits: The component makes a POST request to the /_/cidRange/ endpoint. The request body includes the begin and end timestamps of the desired range and optionally the offset of a currently selected commit (to ensure it's included in the results if it falls outside the new range).
     User Action (e.g., change date range)
             |
             V
     [commit-detail-picker-sk]
             |
             V
     (Constructs RangeRequest: {begin, end, offset})
     POST /_/cidRange/
             |
             V
     (Receives Commit[] array)
     [commit-detail-picker-sk]
             |
             V
     (Updates internal 'details' array)
     [commit-detail-panel-sk] (Re-renders with new commit list)
  3. Commit Selection:
     - When a user selects a commit in the commit-detail-panel-sk, the panel emits a commit-selected event.
     - commit-detail-picker-sk listens for this event and updates its internal selected index.
     - The dialog is then closed, and the main button's text updates to reflect the new selection.
     - Crucially, commit-detail-picker-sk itself emits a commit-selected event. This allows parent components to react to the user's choice. The detail of this event is of type CommitDetailPanelSkCommitSelectedDetails, containing information about the selected commit.
     [commit-detail-panel-sk] -- (internal click on a commit) --> Emits 'commit-selected' (internal)
             |
             V
     [commit-detail-picker-sk] -- (handles internal event) --> Updates 'selected' index
                                                               Updates button text
                                                               Closes dialog
                                                               Emits 'commit-selected' (external)
  4. External Selection (selection property): The component exposes a selection property (of type CommitNumber). If this property is set externally, the component will attempt to fetch commits around that CommitNumber and pre-select it in the panel.
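
A simplified sketch of the fetch described in step 2 (the RangeRequest field names follow the description above; the exact types live in /modules/json and may differ):

    import { jsonOrThrow } from '../../../infra-sk/modules/jsonOrThrow';
    import { Commit, CommitNumber } from '../json';

    interface RangeRequestSketch {
      begin: number;        // start of the range, seconds since the epoch
      end: number;          // end of the range, seconds since the epoch
      offset: CommitNumber; // currently selected commit, so it stays in the results
    }

    // POST the range to /_/cidRange/ and resolve with the matching commits.
    function fetchCommitsInRange(req: RangeRequestSketch): Promise<Commit[]> {
      return fetch('/_/cidRange/', {
        method: 'POST',
        body: JSON.stringify(req),
        headers: { 'Content-Type': 'application/json' },
      }).then(jsonOrThrow);
    }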

Key Files and Responsibilities:

  • commit-detail-picker-sk.ts: This is the core TypeScript file defining the CommitDetailPickerSk custom element.
    • Why: It orchestrates the interactions between the button, dialog, commit-detail-panel-sk, and day-range-sk. It handles fetching commit data, managing the selection state, and emitting the final commit-selected event.
    • How: It uses the Lit library for templating and rendering. It defines methods for opening/closing the dialog (open(), close()), handling range changes (rangeChange()), updating the commit list (updateCommitSelections()), and processing selections from the panel (panelSelect()). The selection getter/setter allows for programmatic control of the selected commit.
  • commit-detail-picker-sk.scss: Contains the SASS/CSS styles for the component.
    • Why: To provide a consistent visual appearance and layout for the button and the dialog, ensuring it integrates well with the overall application theme (e.g., light and dark modes via CSS variables like --on-background, --background).
    • How: It styles the dialog element, the buttons within it, and ensures proper display and spacing of child components like day-range-sk.
  • commit-detail-picker-sk-demo.html & commit-detail-picker-sk-demo.ts: These files provide a demonstration page for the component.
    • Why: To showcase the component's functionality in isolation, making it easier to test and understand its usage. The demo also includes examples for light and dark themes.
    • How: The HTML sets up basic page structure and placeholders for the component. The TypeScript file initializes instances of commit-detail-picker-sk, mocks the backend API call (/_/cidRange/) using fetch-mock to provide sample commit data, and sets up an event listener to display the commit-selected event details.
  • Dependencies:
    • commit-detail-panel-sk: Used within the dialog to list and allow selection of individual commits. commit-detail-picker-sk passes the fetched details (array of Commit objects) to this panel.
    • day-range-sk: Used to allow the user to define the time window for which commits should be fetched. Its day-range-change event triggers a refetch in the picker.
    • spinner-sk: Provides visual feedback during data loading.
    • ElementSk: Base class from infra-sk providing common custom element functionality.
    • jsonOrThrow: Utility for parsing JSON responses and throwing an error if parsing fails or the response is not OK.
    • errorMessage: Utility for displaying error messages to the user.

The design focuses on encapsulation: the commit-detail-picker-sk component manages its internal state (current range, fetched commits, selected index) and exposes a clear interface for interaction (a button to open, a selection property, and a commit-selected event). This makes it easy to integrate into larger applications that require users to pick a commit from a potentially large history.

Module: /modules/commit-detail-sk

commit-detail-sk

The commit-detail-sk module provides a custom HTML element <commit-detail-sk> designed to display concise information about a single commit. This element is crucial for user interfaces where presenting commit details in a structured and interactive manner is necessary.

Why

In applications dealing with version control systems, there's often a need to display details of individual commits. This could be for reviewing changes, navigating commit history, or linking to related actions like exploring code changes, viewing clustered data, or triaging issues associated with a commit. The commit-detail-sk element encapsulates this functionality, offering a reusable and consistent way to present commit information.

How

The core of the module is the CommitDetailSk class, which extends ElementSk. This class defines the structure and behavior of the <commit-detail-sk> element.

Key Responsibilities and Components:

  • commit-detail-sk.ts: This is the heart of the module.

    • It defines the CommitDetailSk custom element.
    • The element takes a Commit object (defined in perf/modules/json) as input via the cid property. This object contains details like the commit hash, author, message, timestamp, and URL.
    • The template function, using lit-html, defines the HTML structure of the element. It displays:
    • A truncated commit hash.
    • The commit author.
    • The time elapsed since the commit (human-readable, via diffDate).
    • The commit message.
    • It also renders a set of Material Design outlined buttons: “Explore”, “Cluster”, “Triage”, and “Commit”. These buttons are intended to navigate the user to different views or actions related to the specific commit. The links for these buttons are dynamically generated based on the commit hash and the cid.url.
    • The openLink method handles the click events on these buttons, opening the respective links in a new browser window/tab.
    • upgradeProperty is used to ensure that the cid property is correctly initialized if it's set before the element is fully connected to the DOM.
  • commit-detail-sk.scss: This file contains the styling for the <commit-detail-sk> element.

    • It defines styles for the layout, typography, and appearance of the commit information and the action buttons.
    • It utilizes CSS variables for theming (e.g., --blue, --primary), allowing the component to adapt to different visual themes (light and dark mode, as demonstrated in the demo).
    • It includes styles from //perf/modules/themes:themes_sass_lib and //elements-sk/modules:colors_sass_lib to ensure consistency with the broader application's design system.
  • commit-detail-sk-demo.html and commit-detail-sk-demo.ts: These files provide a demonstration page for the <commit-detail-sk> element.

    • The HTML sets up basic page structure and includes instances of <commit-detail-sk> in both light and dark mode contexts.
    • The TypeScript file initializes these demo elements with sample Commit data. It also simulates a click on the element to potentially reveal more details or actions if such functionality were implemented (though in the current version, the “tip” div with buttons is always visible). The Date.now function is mocked to ensure consistent output for the diffDate calculation in the demo and tests.

Workflow Example: Displaying Commit Information and Actions

1. Application provides a `Commit` object.
   e.g., { hash: "abc123...", author: "user@example.com", ... }

2. The `Commit` object is assigned to the `cid` property of a `<commit-detail-sk>` element.
   <commit-detail-sk .cid=${commitData}></commit-detail-sk>

3. `CommitDetailSk` element renders:
   [abc123...] - [user@example.com] - [2 days ago] - [Commit message]
   +----------------------------------------------------------------+
   | [Explore] [Cluster] [Triage] [Commit (link to commit source)]  |  <- Action buttons
   +----------------------------------------------------------------+

4. User clicks an action button (e.g., "Explore").

5. `openLink` method is called with a generated URL (e.g., "/g/e/abc123...").

6. A new browser tab opens to the specified URL.

This design promotes reusability and separation of concerns. The element focuses solely on presenting commit information and providing relevant action links, making it easy to integrate into various parts of an application that need to display commit details. The use of lit-html for templating allows for efficient rendering and updates.
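
A minimal usage sketch mirroring the workflow above (the Commit literal is illustrative and cast loosely, since the real Commit type defined in perf/modules/json may have additional fields):

    import { html, render } from 'lit';
    import { Commit } from '../json';

    const commitData = {
      hash: 'abc1234567890def',
      author: 'user@example.com',
      message: 'Fix rendering bug',
      url: 'https://example.com/commit/abc1234567890def',
      ts: 1700000000,
    } as unknown as Commit; // loose cast for illustration only

    // Bind the Commit object to the element's `cid` property.
    render(
      html`<commit-detail-sk .cid=${commitData}></commit-detail-sk>`,
      document.body
    );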

Module: /modules/commit-range-sk

The commit-range-sk module provides a custom HTML element, <commit-range-sk>, designed to display a link representing a range of commits within a Git repository. This functionality is particularly useful in performance analysis tools where identifying the specific commits that introduced a performance regression or improvement is crucial.

Core Functionality and Design:

The primary purpose of commit-range-sk is to dynamically generate a URL that points to a commit range viewer (e.g., a Git web interface like Gerrit or GitHub). This URL is constructed based on a “begin” and an “end” commit.

  • Identifying the Commit Range:

    • The element takes a trace (an array of numerical data points, where each point corresponds to a commit), a commitIndex (the index within the trace array that represents the “end” commit of interest), and header information (which maps trace indices to commit metadata like offset or commit number).
    • The “end” commit is directly determined by the commitIndex and the header.
    • The “begin” commit is found by iterating backward from the commitIndex - 1 in the trace. It skips over any entries marked with MISSING_DATA_SENTINEL (indicating commits for which there's no data point) until it finds a valid previous commit.
    • This logic ensures that the range always spans from a commit with actual data to the target commit, even if there are intermediate commits with missing data.
  • Converting Commit Numbers to Hashes:

    • The commit range URL template, configured globally via window.perf.commit_range_url, typically requires Git commit hashes (SHAs) rather than internal commit numbers or offsets.
    • The commit-range-sk element uses a commitNumberToHashes function to perform this conversion.
    • The default implementation, defaultcommitNumberToHashes, makes an asynchronous call to the /_/cid/ backend endpoint by invoking lookupCids from the //perf/modules/cid:cid_ts_lib module. This service is expected to return the commit hashes corresponding to the provided commit numbers.
    • This design allows for testability by enabling the replacement of commitNumberToHashes with a mock function during testing (as seen in commit-range-sk_test.ts).
  • URL Construction and Display:

    • Once the “begin” and “end” commit numbers are identified and their corresponding hashes are retrieved, the element populates the window.perf.commit_range_url template. This template usually contains placeholders like {begin} and {end} which are replaced with the actual commit hashes.
    • The displayed text for the link is also dynamically generated. If the “begin” and “end” commits are not consecutive (i.e., there's at least one commit between them, or the “begin” commit had to skip missing data points), the text will show a range like “<begin_offset + 1> - <end_offset>”. Otherwise, it will just show the “<end_offset>”. The +1 for the begin offset in a range is to ensure the displayed range starts after the last known good commit.
    • The element supports two display modes controlled by the showLinks property:
    • If showLinks is false (default, or when the element is merely hovered over in some UIs), only the text representing the commit(s) is displayed.
    • If showLinks is true, a fully formed hyperlink (<a> tag) is rendered.

Key Components/Files:

  • commit-range-sk.ts: This is the core file defining the CommitRangeSk custom element.

    • It extends ElementSk, a base class for custom elements in the Skia infrastructure.
    • It manages the state of the component through properties like _trace, _commitIndex, _header, _url, _text, and _commitIds.
    • The recalcLink() method is central to its operation. It's triggered whenever relevant input properties (trace, commitIndex, header) change. This method orchestrates the process of finding commit IDs, converting them to hashes, and generating the URL and display text.
    • setCommitIds() implements the logic for determining the start and end commit numbers based on the input trace and header, handling missing data points.
    • It uses the lit/html library for templating, allowing for efficient rendering and updates to the DOM.
  • commit-range-sk-demo.ts and commit-range-sk-demo.html: These files provide a demonstration page for the commit-range-sk element.

    • commit-range-sk-demo.ts sets up a mock environment, including mocking the fetch call to /_/cid/ using fetch-mock. This is crucial for demonstrating the element's behavior without needing a live backend.
    • It also initializes the global window.perf object with necessary configuration, such as the commit_range_url template.
    • It then instantiates the <commit-range-sk> element and populates its properties to showcase its functionality.
  • commit-range-sk_test.ts: This file contains unit tests for the CommitRangeSk element.

    • It utilizes chai for assertions and setUpElementUnderTest for easy instantiation of the element in a test environment.
    • A key testing strategy involves overriding the commitNumberToHashes method on the element instance to provide controlled hash values and assert the correctness of the generated URL and text, especially in scenarios involving MISSING_DATA_SENTINEL.
  • BUILD.bazel: Defines how the module is built, its dependencies (e.g., //infra-sk/modules/ElementSk, //perf/modules/json, lit), and how the demo page and tests are structured.

Workflow Example: Generating a Commit Range Link

  1. Initialization:

    • The application using <commit-range-sk> sets the global window.perf.commit_range_url (e.g., "http://example.com/range/{begin}/{end}").
    • The <commit-range-sk> element is added to the DOM.
  2. Property Setting:

    • The application provides data to the element:
      • `element.trace = [10, MISSING_DATA_SENTINEL, 12, 15];`
      • `element.header = [{offset: C1}, {offset: C2}, {offset: C3}, {offset: C4}];` (where C1-C4 are commit numbers)
      • `element.commitIndex = 3;` (points to the data value 15 and commit C4)
      • `element.showLinks = true;`
  3. recalcLink() Triggered:

    • Changing any of the above properties automatically calls recalcLink().
  4. Determine Commit IDs (setCommitIds()):

    • End commit: header[commitIndex].offset => C4.
    • Previous commit search:
      • Start at commitIndex - 1 = 2. trace[2] is 12 (not missing). So, header[2].offset => C3.
    • _commitIds becomes [C3, C4].
  5. Check if Range (isRange()):

    • Is C3 + 1 === C4? Let's assume C3 and C4 are not consecutive (e.g., C3=100, C4=102). isRange() returns true.
    • Text becomes: "${C3 + 1} - ${C4}" (e.g., "101 - 102").
  6. Convert Commit IDs to Hashes (commitNumberToHashes):

    • `commitNumberToHashes([C3, C4])` is called.
    • Internally, this likely makes a POST request to `/_/cid/` with `[C3, C4]`.
    • Backend returns: `{ commitSlice: [{hash: "hash_for_C3"}, {hash: "hash_for_C4"}] }`.
    • The function resolves with `["hash_for_C3", "hash_for_C4"]`.
  7. Construct URL:

    • url = window.perf.commit_range_url (e.g., "http://example.com/range/{begin}/{end}")
    • url = url.replace('{begin}', "hash_for_C3")
    • url = url.replace('{end}', "hash_for_C4")
    • _url becomes "http://example.com/range/hash_for_C3/hash_for_C4".
  8. Render:

    • Since `showLinks` is true, the template becomes:
      `<a href="http://example.com/range/hash_for_C3/hash_for_C4" target="_blank">101 - 102</a>`
    • The element updates its content with this HTML.

This workflow demonstrates how commit-range-sk encapsulates the logic for finding relevant commits, converting their identifiers, and presenting a user-friendly link to explore changes between them, abstracting away the complexities of interacting with commit data and URL templates.
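
The core of steps 4-7 can be sketched as two small helpers (the names and the MISSING_DATA_SENTINEL import path are assumptions based on the description; the real logic lives in setCommitIds() and recalcLink()):

    import { MISSING_DATA_SENTINEL } from '../const/const';

    interface ColumnHeaderSketch {
      offset: number; // commit number for this column
    }

    // Find [begin, end] commit numbers, walking backwards over missing data.
    function findCommitRange(
      trace: number[],
      header: ColumnHeaderSketch[],
      commitIndex: number
    ): [number, number] {
      const end = header[commitIndex].offset;
      let i = commitIndex - 1;
      while (i >= 0 && trace[i] === MISSING_DATA_SENTINEL) {
        i--;
      }
      const begin = i >= 0 ? header[i].offset : end;
      return [begin, end];
    }

    // Fill the {begin}/{end} placeholders in window.perf.commit_range_url.
    function buildRangeUrl(template: string, beginHash: string, endHash: string): string {
      return template.replace('{begin}', beginHash).replace('{end}', endHash);
    }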

Module: /modules/common

Common Module

The common module houses utility functions and data structures that are shared across various parts of the Perf application, particularly those related to data visualization and testing. Its primary purpose is to promote code reuse and maintain consistency in how data is processed and displayed.

Responsibilities and Key Components

The module's responsibilities can be broken down into the following areas:

  1. Plot Data Construction and Formatting:

    • Why: Visualizing performance data often involves transforming raw data into formats suitable for charting libraries (like Google Charts). This process needs to be standardized to ensure plots are consistent and correctly represent the underlying information.

    • How:

      • plot-builder.ts: This file is central to preparing data for plotting.

      • convertFromDataframe: This function is crucial for adapting data organized in a DataFrame structure (where traces are rows) into a format suitable for Google Charts, which typically expects data in columns. It essentially transposes the TraceSet. The domain parameter allows specifying whether the x-axis should represent commit positions, dates, or both, providing flexibility in how time-series data is visualized.

        Input DataFrame (TraceSet):
        TraceA: [val1, val2, val3]
        TraceB: [valA, valB, valC]
        Header: [commit1, commit2, commit3]
        
        convertFromDataframe (domain='commit') ->
        
        Output for Google Chart:
        ["Commit Position", "TraceA", "TraceB"]
        [commit1_offset,  val1,     valA    ]
        [commit2_offset,  val2,     valB    ]
        [commit3_offset,  val3,     valC    ]
        
      • ConvertData: This function takes a ChartData object, which is a more abstract representation of plot data (lines with x, y coordinates and labels), and transforms it into the specific array-of-arrays format required by Google Charts. This abstraction allows other parts of the application to work with ChartData without needing to know the exact details of the charting library's input format.

        Input ChartData:
        xLabel: "Time"
        lines: {
          "Line1": [{x: t1, y: v1}, {x: t2, y: v2}],
          "Line2": [{x: t1, y: vA}, {x: t2, y: vB}]
        }
        
        ConvertData ->
        
        Output for Google Chart:
        ["Time", "Line1", "Line2"]
        [t1,     v1,      vA     ]
        [t2,     v2,      vB     ]
        
      • mainChartOptions and SummaryChartOptions: These functions provide pre-configured option objects for Google Line Charts. They encapsulate common styling and behavior (like colors, axis formatting, tooltip behavior, and null interpolation) to ensure a consistent look and feel for different types of charts (main detail charts vs. summary overview charts). This avoids repetitive configuration and makes it easier to maintain visual consistency. The options are also designed to adapt to the current theme (light/dark mode) by using CSS custom properties.

      • defaultColors: A predefined array of colors used for chart series, ensuring a consistent and visually distinct palette.

  2. Plotting Utilities:

    • Why: Beyond basic data transformation, there are common tasks related to preparing data specifically for plotting, such as associating anomalies with data points and handling missing values.

    • How:

      • plot-util.ts: This file contains helper functions that build upon plot-builder.ts.

      • CreateChartDataFromTraceSet: This function serves as a higher-level constructor for ChartData. It takes a raw TraceSet (a dictionary where keys are trace identifiers and values are arrays of numbers), corresponding x-axis labels (commit numbers or dates), the desired x-axis format, and anomaly information. It then iterates through the traces, constructs DataPoint objects (which include x, y, and any associated anomaly), and organizes them into the ChartData structure. A key aspect is its handling of MISSING_DATA_SENTINEL to exclude missing points from the chart data, relying on the charting library's interpolation. It also uses findMatchingAnomaly to link anomalies to their respective data points.

        Input TraceSet:
        "trace_foo": [10, 12, MISSING_DATA_SENTINEL, 15]
        xLabels: [c1, c2, c3, c4]
        Anomalies: { "trace_foo": [{x: c2, y: 12, anomaly: {...}}] }
        
        CreateChartDataFromTraceSet ->
        
        Output ChartData:
        lines: {
          "trace_foo": [
            {x: c1, y: 10, anomaly: null},
            {x: c2, y: 12, anomaly: {...}},
            // Point for c3 is skipped due to MISSING_DATA_SENTINEL
            {x: c4, y: 15, anomaly: null}
          ]
        }
        ...
        
      • findMatchingAnomaly: A utility to efficiently check if a given data point (identified by its trace key, x-coordinate, and y-coordinate) corresponds to a known anomaly. This is used by CreateChartDataFromTraceSet to enrich data points with anomaly details.

  3. Test Utilities:

    • Why: Writing effective unit and integration tests, as well as creating demo pages, often requires mock data and simulated API responses. Centralizing these test utilities avoids duplication and makes tests easier to write and maintain.
    • How:
      • test-util.ts: This file provides functions to set up a common testing and demo environment.
      • setUpExploreDemoEnv: This is a comprehensive function that uses fetch-mock to intercept various API calls that are typically made by Perf frontend components (e.g., explore page, alert details). It returns predefined, static responses for endpoints like /_/login/status, /_/initpage/..., /_/count/, /_/frame/start, /_/defaults/, /_/status/..., /_/cid/, /_/details/, /_/shortcut/get, /_/nextParamList/, and /_/shortcut/update.
      • The purpose of mocking these endpoints is to allow frontend components to be tested or demonstrated in isolation, without requiring a live backend. The mocked data is designed to be representative of real API responses, enabling realistic testing scenarios. For example, it provides sample paramSet data, DataFrame structures, commit information, and default configurations. This ensures that components relying on these API calls behave predictably in a test or demo environment. The function also checks for a proxy_endpoint cookie to avoid mocking if a real backend is being proxied for development or demo purposes.
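The behavior described for CreateChartDataFromTraceSet can be sketched as follows. toLines and the DataPoint/Lines shapes are simplified stand-ins for the module's real types, and the import path is illustrative.

  import { MISSING_DATA_SENTINEL } from '../const/const'; // illustrative import path

  // Approximated shapes; the real ones live in plot-builder.ts / plot-util.ts.
  interface DataPoint { x: number; y: number; anomaly: unknown | null; }
  type Lines = { [traceKey: string]: DataPoint[] };

  // Walk each trace, skip sentinel values, and attach any anomaly that matches
  // the (trace key, x, y) triple, mirroring CreateChartDataFromTraceSet.
  function toLines(
    traces: { [key: string]: number[] },
    xLabels: number[],
    anomalies: { [key: string]: { x: number; y: number; anomaly: unknown }[] }
  ): Lines {
    const lines: Lines = {};
    for (const [key, values] of Object.entries(traces)) {
      const points: DataPoint[] = [];
      values.forEach((y, i) => {
        if (y === MISSING_DATA_SENTINEL) return; // rely on chart interpolation for gaps
        const x = xLabels[i];
        const match = (anomalies[key] || []).find((a) => a.x === x && a.y === y);
        points.push({ x, y, anomaly: match ? match.anomaly : null });
      });
      lines[key] = points;
    }
    return lines;
  }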

Module: /modules/const

The /modules/const module serves as a centralized repository for constants utilized throughout the Perf UI. Its primary purpose is to ensure consistency and maintainability by providing a single source of truth for values that are shared across different parts of the user interface.

A key design decision behind this module is to manage values that might also be defined in the backend. This avoids potential discrepancies and ensures that frontend and backend systems operate with the same understanding of specific sentinel values or configurations.

The core responsibility of this module is to define and export these shared constants.

One of the key components is the const.ts file. This file contains the actual definitions of the constants. A notable constant defined here is MISSING_DATA_SENTINEL.

The MISSING_DATA_SENTINEL constant (value: 1e32) is critical for representing missing data points within traces. The backend uses this specific floating-point value to indicate that a sample is absent. The choice of 1e32 is deliberate. JSON, the data interchange format used, does not natively support NaN (Not a Number) or infinity values (+/- Inf). Therefore, a valid float32 that has a compact JSON representation and is unlikely to clash with actual data values was chosen. It is imperative that this frontend constant remains synchronized with the MissingDataSentinel constant defined in the backend Go package //go/vec32/vec. This synchronization ensures that both the UI and the backend correctly interpret missing data.

Any part of the Perf UI that needs to interpret or display trace data, especially when dealing with potentially incomplete datasets, will rely on this MISSING_DATA_SENTINEL. For instance, charting libraries or data table components might use this constant to visually differentiate missing points or to exclude them from calculations.

Workflow involving MISSING_DATA_SENTINEL:

Backend Data Generation --> Data contains MissingDataSentinel from //go/vec32/vec
    |
    V
Data Serialization (JSON) --> 1e32 is used for missing data
    |
    V
Frontend Data Fetching
    |
    V
Frontend UI Component (e.g., a chart)
    |
    V
UI uses MISSING_DATA_SENTINEL from /modules/const/const.ts to identify missing points
    |
    V
Appropriate rendering (e.g., gap in a line chart, specific placeholder in a table)
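Any UI code consuming trace data therefore compares sample values against the constant before using them. A minimal, hypothetical helper (the function name and import path are illustrative):

  import { MISSING_DATA_SENTINEL } from '../const/const'; // illustrative import path

  // Hypothetical helper: average only the samples that are actually present.
  function averageOfPresent(trace: number[]): number | null {
    const present = trace.filter((v) => v !== MISSING_DATA_SENTINEL);
    return present.length ? present.reduce((a, b) => a + b, 0) / present.length : null;
  }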

Module: /modules/csv

The /modules/csv module provides functionality to convert DataFrame objects, a core data structure representing performance or experimental data, into the Comma Separated Values (CSV) format. This conversion is essential for users who wish to export data for analysis in external tools, spreadsheets, or for archival purposes.

The primary challenge in converting a DataFrame to CSV lies in representing the potentially sparse and varied parameter sets associated with each trace (data series) in a flat, tabular format. The DataFrame stores traces indexed by a “trace ID,” which is a string encoding of key-value pairs representing the parameters that uniquely identify that trace.

The conversion process addresses this challenge through a multi-step approach:

  1. Parameter Key Consolidation:

    • The parseIdsIntoParams function takes an array of trace IDs and transforms each ID string back into its constituent key-value parameter pairs. This is achieved by leveraging the fromKey function from the //perf/modules/paramtools module.
    • The allParamKeysSorted function then iterates through all these parsed parameter sets to identify the complete, unique set of all parameter keys present across all traces. These keys are then sorted alphabetically. This sorted list of unique parameter keys will form the initial set of columns in the CSV, ensuring a consistent order and comprehensive representation of all parameters.

    Pseudocode for parameter key consolidation:

    traceIDs = ["key1=valueA,key2=valueB", "key1=valueC,key3=valueD"]
    parsedParams = {}
    for each id in traceIDs:
      parsedParams[id] = fromKey(id) // e.g., {"key1=valueA,key2=valueB": {key1:"valueA", key2:"valueB"}}
    
    allKeys = new Set()
    for each params in parsedParams.values():
      for each key in params.keys():
        allKeys.add(key)
    
    sortedColumnNames = sorted(Array.from(allKeys)) // e.g., ["key1", "key2", "key3"]
    
  2. Header Row Generation:

    • The dataFrameToCSV function begins by constructing the header row of the CSV.
    • This row starts with the sortedColumnNames derived in the previous step.
    • It then appends column headers derived from the DataFrame's header property. Each element in df.header typically represents a point in time (or a commit, build, etc.), and its timestamp field is converted into an ISO 8601 formatted date string.

    Pseudocode for header row generation:

    csvHeader = sortedColumnNames
    for each columnHeader in df.header:
      csvHeader.push(new Date(columnHeader.timestamp * 1000).toISOString())
    csvLines.push(csvHeader.join(','))
    
  3. Data Row Generation:

    • For each trace in the df.traceset (excluding “special_” traces, which are likely internal or metadata traces not intended for direct CSV export):
      • The corresponding parameter values for the sortedColumnNames are retrieved. If a trace does not have a value for a particular parameter key, an empty string is used, ensuring that each row has the same number of columns corresponding to the parameter keys.
      • The actual data points for the trace are then appended. The MISSING_DATA_SENTINEL (defined in //perf/modules/const) is a special value indicating missing data; this is converted to an empty string in the CSV to represent a null or missing value. Other numerical values are appended directly.
      • Each fully constructed row is then joined by commas.

    Pseudocode for data row generation:

    for each traceId, traceData in df.traceset:
      if traceId starts with "special_":
        continue
    
      traceParams = parsedParams[traceId]
      rowData = []
      for each columnName in sortedColumnNames:
        rowData.push(traceParams[columnName] or "") // Add parameter value or empty string
    
      for each value in traceData:
        if value is MISSING_DATA_SENTINEL:
          rowData.push("")
        else:
          rowData.push(value)
      csvLines.push(rowData.join(','))
    
  4. Final CSV String Assembly:

    • Finally, all the generated lines (header and data rows) are joined together with newline characters (\n) to produce the complete CSV string.

The design prioritizes creating a CSV that is both human-readable and easily parsable by other tools. By dynamically determining the parameter columns based on the input DataFrame and sorting them, it ensures that all relevant trace metadata is included in a consistent manner. The explicit handling of MISSING_DATA_SENTINEL ensures that missing data is represented clearly as empty fields.
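Condensing the pseudocode above, a dataFrameToCSV-style conversion looks roughly like the following sketch. toCSV, DataFrameLike, and parseKey are simplified stand-ins for the module's real exports and for the paramtools/json types.

  import { MISSING_DATA_SENTINEL } from '../const/const'; // illustrative import path

  // Simplified stand-ins for the real DataFrame / Params types from //perf/modules/json.
  type Params = { [key: string]: string };
  interface DataFrameLike {
    header: { timestamp: number }[];
    traceset: { [traceId: string]: number[] };
  }

  // `parseKey` stands in for paramtools' fromKey().
  function toCSV(df: DataFrameLike, parseKey: (id: string) => Params): string {
    const ids = Object.keys(df.traceset).filter((id) => !id.startsWith('special_'));
    const parsed: { [id: string]: Params } = {};
    ids.forEach((id) => (parsed[id] = parseKey(id)));

    // 1. Consolidate and sort all parameter keys seen across every trace.
    const keys = new Set<string>();
    Object.values(parsed).forEach((p) => Object.keys(p).forEach((k) => keys.add(k)));
    const columns = Array.from(keys).sort();

    // 2. Header row: parameter columns, then ISO dates derived from df.header.
    const lines: string[] = [
      [...columns, ...df.header.map((h) => new Date(h.timestamp * 1000).toISOString())].join(','),
    ];

    // 3. One row per trace: parameter values (or ''), then the data points,
    //    with MISSING_DATA_SENTINEL rendered as an empty field.
    for (const id of ids) {
      const row: (string | number)[] = columns.map((c) => parsed[id][c] ?? '');
      df.traceset[id].forEach((v) => row.push(v === MISSING_DATA_SENTINEL ? '' : v));
      lines.push(row.join(','));
    }

    // 4. Join header and data rows with newlines.
    return lines.join('\n');
  }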

The key files in this module are:

  • index.ts: This file contains the core logic for the CSV conversion. It houses the parseIdsIntoParams, allParamKeysSorted, and the main dataFrameToCSV functions. It leverages helper functions from //perf/modules/paramtools for parsing trace ID strings and relies on constants from //perf/modules/const for identifying missing data.
  • index_test.ts: This file provides unit tests for the dataFrameToCSV function. It defines a sample DataFrame with various scenarios, including different parameter sets per trace and missing data points, and asserts that the generated CSV matches the expected output. This is crucial for ensuring the correctness and robustness of the CSV generation logic.

The dependencies on //perf/modules/const (for MISSING_DATA_SENTINEL) and //perf/modules/json (for DataFrame, ColumnHeader, Params types) indicate that this module is tightly integrated with the broader data representation and handling mechanisms of the Perf system. The dependency on //perf/modules/paramtools (for fromKey) highlights its role in interpreting the structured information encoded within trace IDs.

Module: /modules/dataframe

The dataframe module is designed to manage and manipulate time-series data, specifically performance testing traces, within the Perf application. It provides a centralized way to fetch, store, and process trace data, enabling functionalities like visualizing performance trends, identifying anomalies, and managing user-reported issues.

The core idea is to have a reactive data repository that components can consume. This allows for efficient data loading and updates, especially when dealing with large datasets and dynamic time ranges. Instead of each component fetching and managing its own data, they can rely on a shared DataFrameRepository to handle these tasks. This promotes consistency and reduces redundant data fetching.

Key Components and Responsibilities

dataframe_context.ts

This file defines the DataFrameRepository class, which acts as the central data store and manager. It’s implemented as a LitElement (<dataframe-repository-sk>) that doesn’t render any UI itself but provides data and loading states through Lit contexts.

Why a LitElement with Contexts? Using a LitElement allows easy integration into the existing component-based architecture. Lit contexts (@lit/context) provide a clean and reactive way for child components to consume the DataFrame and related information without prop drilling or complex event bus implementations.

Core Functionalities:

  • Data Fetching:

    • resetTraces(range, paramset): Fetches an initial set of traces based on a time range and a ParamSet (a set of key-value pairs defining the traces to query). This is typically called when the user defines a new query.

      User defines query -> explore-simple-sk calls resetTraces()
          |
          V
      DataFrameRepository -> Fetches data from /_/frame/start
          |
          V
      Updates internal _header, _traceset, anomaly, userIssues
          |
          V
      Provides DataFrame, DataTable, AnomalyMap, UserIssueMap via context

    • extendRange(offsetInSeconds): Fetches additional data to extend the current time range, either forwards or backwards. This is used for infinite scrolling or when the user wants to see more data. To improve performance for large range extensions, it slices the requested range into smaller chunks (chunkSize) and fetches them concurrently.

      User scrolls/requests more data -> UI calls extendRange()
          |
          V
      DataFrameRepository -> Slices range into chunks if needed
          |
          V
      Fetches data for each chunk from /_/frame/start concurrently
          |
          V
      Merges new data with existing _header, _traceset, anomaly
          |
          V
      Provides updated DataFrame, DataTable, AnomalyMap via context
    • The fetching mechanism uses the /_/frame/start endpoint, sending a FrameRequest which includes the time range, query (derived from ParamSet), and timezone.
    • It handles responses, including potential errors or “Finished” status with no data (e.g., no commits in the requested range).
  • Data Caching and Merging:

    • Maintains an internal representation of the data: _header (array of ColumnHeader objects, representing commit points/timestamps) and _traceset (a TraceSet object mapping trace keys to their data arrays).
    • When new data is fetched (either initial load or extension), it’s merged with the existing cached data. The merging logic ensures that headers are correctly ordered and trace data is appropriately prepended or appended. If a trace being extended isn’t present in a new data chunk, it’s padded with MISSING_DATA_SENTINEL to maintain alignment with the header.
  • Anomaly Management:

    • Fetches anomaly data (AnomalyMap) along with the trace data.
    • updateAnomalies(anomalies, id): Allows merging new anomalies and removing specific anomalies (e.g., when an anomaly is nudged or re-triaged). This uses mergeAnomaly and removeAnomaly from index.ts.
  • User-Reported Issue Management:

    • getUserIssues(traceKeys, begin, end): Fetches user-reported issues (e.g., Buganizer bugs linked to specific data points) from the /_/user_issues/ endpoint for a given set of traces and commit range.
    • updateUserIssue(traceKey, commitPosition, bugId): Updates the local cache of user issues, typically after a new issue is filed or an existing one is modified.
    • Trace keys are normalized by removing special functions (e.g., norm()) before querying for user issues to ensure issues are found even if the displayed trace is a transformed version of the original.
  • Google DataTable Conversion:

    • Converts the internal DataFrame into a google.visualization.DataTable format using convertFromDataframe (from perf/modules/common:plot-builder_ts_lib). This DataTable is then provided via dataTableContext and is typically consumed by charting components like <plot-google-chart-sk>.
    • The Google Chart library is loaded asynchronously (DataFrameRepository.loadPromise).
  • State Management:

    • loading: A boolean provided via dataframeLoadingContext to indicate if a data request is in flight.
    • _requestComplete: A Promise that resolves when the current data fetching operation completes. This can be used to coordinate actions that depend on data being available.

Contexts Provided:

  • dataframeContext: Provides the current DataFrame object.
  • dataTableContext: Provides the google.visualization.DataTable derived from the DataFrame.
  • dataframeAnomalyContext: Provides the AnomalyMap for the current data.
  • dataframeUserIssueContext: Provides the UserIssueMap for the current data.
  • dataframeLoadingContext: Provides a boolean indicating if data is currently being loaded.
  • dataframeRepoContext: Provides the DataFrameRepository instance itself, allowing consumers to call its methods (e.g., extendRange).
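On the consuming side, a child component can pick up one of these contexts with the @consume decorator from @lit/context. A minimal sketch; the import path for the context object is illustrative, and real Perf components typically extend ElementSk rather than LitElement.

  import { LitElement, html } from 'lit';
  import { customElement } from 'lit/decorators.js';
  import { consume } from '@lit/context';
  // Illustrative import path; the context objects are exported by dataframe_context.ts.
  import { dataframeLoadingContext } from './dataframe_context';

  @customElement('loading-indicator-sk')
  class LoadingIndicatorSk extends LitElement {
    // subscribe: true keeps the field updated whenever the provider's value changes.
    @consume({ context: dataframeLoadingContext, subscribe: true })
    private loading = false;

    render() {
      return html`${this.loading ? 'Loading…' : ''}`;
    }
  }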

index.ts

This file contains utility functions for manipulating DataFrame structures, similar to its Go counterpart (//perf/go/dataframe/dataframe.go). These functions are crucial for merging, slicing, and analyzing the data.

Key Functions:

  • findSubDataframe(header, range, domain): Given a DataFrame header and a time/offset range, this function finds the start and end indices within the header that correspond to the given range. This is essential for slicing data.
  • generateSubDataframe(dataframe, range): Creates a new DataFrame containing only the data within the specified index range of the original DataFrame.
  • mergeAnomaly(anomaly1, ...anomalies): Merges multiple AnomalyMap objects into a single one. If anomalies exist for the same trace and commit, the later ones in the arguments list will overwrite earlier ones. It always returns a non-null AnomalyMap.
  • removeAnomaly(anomalies, id): Creates a new AnomalyMap excluding any anomalies with the specified id. This is used when an anomaly is moved or re-triaged on the backend, and the old entry needs to be cleared.
  • findAnomalyInRange(allAnomaly, range): Filters an AnomalyMap to include only anomalies whose commit positions fall within the given commit range.
  • mergeColumnHeaders(a, b): Merges two arrays of ColumnHeader objects, producing a new sorted array of unique headers. It also returns mapping objects (aMap, bMap) that indicate the new index of each header from the original arrays. This is fundamental for the join operation.
    • Why map objects? When merging traces from two DataFrames, the data points need to be placed at the correct positions in the newly merged header. The maps provide this correspondence.
  • join(a, b): Combines two DataFrame objects into a new one.
    1. It first merges their headers using mergeColumnHeaders.
    2. Then, it creates a new traceset. For each trace in the original DataFrames, it uses the aMap and bMap to place the trace data points into the correct slots in the new, longer trace arrays, filling gaps with MISSING_DATA_SENTINEL.
    3. It also merges the paramset from both DataFrames.
    4. Purpose: This is useful when combining data from different sources or different time periods that might not perfectly align.
  • buildParamSet(d): Reconstructs the paramset of a DataFrame based on the keys present in its traceset. This ensures the paramset accurately reflects the data.
  • timestampBounds(df): Returns the earliest and latest timestamps present in the DataFrame's header.
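The index-map idea behind mergeColumnHeaders and join can be sketched as follows. ColumnHeader is reduced to just an offset, the function names are illustrative, and the maps are represented as plain arrays rather than the module's actual return shape.

  interface Header { offset: number; }

  // Merge two sorted header arrays into a unique, sorted result and record
  // where each original index ends up in the merged array.
  function mergeHeaders(a: Header[], b: Header[]) {
    const offsets = Array.from(new Set([...a, ...b].map((h) => h.offset))).sort((x, y) => x - y);
    const merged = offsets.map((offset) => ({ offset }));
    const indexOf = new Map<number, number>();
    offsets.forEach((o, i) => indexOf.set(o, i));
    const aMap = a.map((h) => indexOf.get(h.offset)!);
    const bMap = b.map((h) => indexOf.get(h.offset)!);
    return { merged, aMap, bMap };
  }

  // The maps then place trace values into the longer merged trace, with gaps
  // filled by the missing-data sentinel (1e32).
  const MISSING = 1e32;
  function placeTrace(values: number[], map: number[], mergedLength: number): number[] {
    const out = new Array<number>(mergedLength).fill(MISSING);
    map.forEach((newIndex, oldIndex) => (out[newIndex] = values[oldIndex]));
    return out;
  }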

traceset.ts

This file provides utility functions for extracting and formatting information from the trace keys within a DataFrame or DataTable. Trace keys are strings that encode various parameters (e.g., ",benchmark=Speedometer,test=MotionMark,").

Key Functions:

  • getAttributes(df): Extracts all unique attribute keys (e.g., “benchmark”, “test”) present across all trace keys in a DataFrame.
  • getTitle(dt): Identifies the common key-value pairs across all trace labels in a DataTable. These common pairs form the “title” of the chart, representing what all displayed traces have in common.
    • Why DataTable input? This function is often used directly with the DataTable that feeds a chart, as column labels in the DataTable are typically the trace keys.
  • getLegend(dt): Identifies the key-value pairs that are not common across all trace labels in a DataTable. These differing parts form the “legend” for each trace, distinguishing them from one another.
    • It ensures that all legend objects have the same set of keys (sorted alphabetically), filling in missing values with "untitled_key" for consistency in display.
  • titleFormatter(title): Formats the output of getTitle (an object) into a human-readable string, typically by joining values with ‘/’.
  • legendFormatter(legend): Formats the output of getLegend (an array of objects) into an array of human-readable strings.
  • getLegendKeysTitle(label): Takes a legend object (for a single trace) and creates a string by joining its keys, often used as a title for the legend section.
  • isSingleTrace(dt): Checks if a DataTable contains data for only a single trace (i.e., has 3 columns: domain, commit position/date, and one trace).
  • findTraceByLabel(dt, legendTraceId): Finds the column label (trace key) in a DataTable that matches the given legendTraceId.
  • findTracesForParam(dt, paramKey, paramValue): Finds all trace labels in a DataTable that contain a specific key-value pair.
  • removeSpecialFunctions(key): A helper used internally to strip function wrappers (like norm(...)) from trace keys before processing, ensuring that the underlying parameters are correctly parsed.

Design Rationale for Title/Legend Generation: When multiple traces are plotted, the title should reflect what’s common among them (e.g., “benchmark=Speedometer”), and the legend should highlight what’s different (e.g., “test=Run1” vs. “test=Run2”). These functions automate this process by analyzing the trace keys.
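A rough sketch of this common-versus-differing split. The real getTitle/getLegend operate on DataTable column labels and handle details such as untitled_key padding; here plain parsed key/value objects and an illustrative function name are used.

  type Params = { [key: string]: string };

  // Split parsed trace params into the pairs shared by every trace (the "title")
  // and the pairs that differ (one "legend" entry per trace).
  function splitTitleAndLegend(traces: Params[]): { title: Params; legends: Params[] } {
    const title: Params = {};
    const [first, ...rest] = traces;
    for (const [k, v] of Object.entries(first ?? {})) {
      if (rest.every((t) => t[k] === v)) title[k] = v;
    }
    const legends = traces.map((t) =>
      Object.fromEntries(Object.entries(t).filter(([k]) => !(k in title)))
    );
    return { title, legends };
  }

  // Example: the common benchmark goes to the title, the differing tests to the legends.
  splitTitleAndLegend([
    { benchmark: 'Speedometer', test: 'Run1' },
    { benchmark: 'Speedometer', test: 'Run2' },
  ]);
  // -> title: {benchmark: 'Speedometer'}, legends: [{test: 'Run1'}, {test: 'Run2'}]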

Workflows

Initial Data Load and Display

1. User navigates to a page or submits a query.
   |
   V
2. <explore-simple-sk> (or similar component) determines initial time range and ParamSet.
   |
   V
3. Calls `dataframeRepository.resetTraces(initialRange, initialParamSet)`.
   |
   V
4. DataFrameRepository:
   a. Sets `loading = true`.
   b. Constructs `FrameRequest`.
   c. POSTs to `/_/frame/start`.
   d. Receives `FrameResponse` (containing DataFrame and AnomalyMap).
   e. Updates its internal `_header`, `_traceset`, `anomaly`.
   f. Calls `setDataFrame()`:
      i. Updates `this.dataframe` (triggers `dataframeContext`).
      ii. Converts DataFrame to `google.visualization.DataTable`.
      iii. Updates `this.data` (triggers `dataTableContext`).
   g. Updates `this.anomaly` (triggers `dataframeAnomalyContext`).
   h. Sets `loading = false`.
   |
   V
5. Charting components (consuming `dataTableContext`) re-render with the new data.
   |
   V
6. Other UI elements (consuming `dataframeContext`, `dataframeAnomalyContext`) update.

Extending Time Range (e.g., Scrolling)

1. User action triggers a request to load more data (e.g., scrolls near edge of chart).
   |
   V
2. UI component calls `dataframeRepository.extendRange(offsetInSeconds)`.
   |
   V
3. DataFrameRepository:
   a. Sets `loading = true`.
   b. Calculates the new time range (`deltaRange`).
   c. Slices the new range into chunks if `offsetInSeconds` is large (`sliceRange`).
   d. For each chunk:
      i. Constructs `FrameRequest`.
      ii. POSTs to `/_/frame/start`.
   e. `Promise.all` awaits all chunk responses.
   f. Filters out empty/error responses and sorts responses by timestamp.
   g. Merges `header` and `traceset` from sorted responses into existing `_header` and `_traceset`.
      - For traceset: pads with `MISSING_DATA_SENTINEL` if a trace is missing in a new chunk.
   h. Merges `anomalymap` from sorted responses into existing `anomaly`.
   i. Calls `setDataFrame()` (as in initial load).
   j. Sets `loading = false`.
   |
   V
4. Charting components and other UI elements update.

Displaying Chart Title and Legend

1. Charting component (e.g., <perf-explore-sk>) has access to the `DataTable` via `dataTableContext`.
   |
   V
2. It calls `getTitle(dataTable)` and `getLegend(dataTable)` from `traceset.ts`.
   |
   V
3. It then uses `titleFormatter` and `legendFormatter` to get displayable strings.
   |
   V
4. Renders these strings as the chart title and legend series.

Testing

  • dataframe_context_test.ts: Tests the DataFrameRepository class. It uses fetch-mock to simulate API responses from /_/frame/start and /_/user_issues/. Tests cover initialization, data loading (resetTraces), range extension (extendRange) with and without chunking, anomaly merging, and user issue fetching/updating.
  • index_test.ts: Tests the utility functions in index.ts, such as mergeColumnHeaders, join, findSubDataframe, mergeAnomaly, etc. It uses manually constructed DataFrame objects to verify the logic of these data manipulation functions.
  • traceset_test.ts: Tests the functions in traceset.ts for extracting titles and legends from trace keys. It generates DataFrame objects with various key combinations, converts them to DataTable (requiring Google Chart API to be loaded), and then asserts the output of getTitle, getLegend, etc.
  • test_utils.ts: Provides helper functions for tests, notably:
    • generateFullDataFrame: Creates mock DataFrame objects with specified structures, which is invaluable for setting up consistent test scenarios.
    • generateAnomalyMap: Creates mock AnomalyMap objects linked to a DataFrame.
    • mockFrameStart: A utility to easily mock the /_/frame/start endpoint with fetch-mock, returning parts of a provided full DataFrame based on the request's time range.
    • mockUserIssues: Mocks the /_/user_issues/ endpoint.

The testing strategy relies heavily on creating controlled mock data and API responses to ensure that the data processing and fetching logic behaves as expected under various conditions.
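The following is a minimal example in the spirit of that approach, using fetch-mock to intercept /_/frame/start; the response body is a heavily simplified stand-in, not the real FrameResponse shape.

  import fetchMock from 'fetch-mock';

  // Intercept the data-loading endpoint and return a canned, greatly simplified body.
  fetchMock.post('/_/frame/start', {
    status: 'Finished',
    dataframe: {
      header: [{ offset: 100, timestamp: 1700000000 }],
      traceset: { ',arch=x86,test=draw_a_circle,': [10] },
      paramset: { arch: ['x86'], test: ['draw_a_circle'] },
    },
    anomalymap: {},
  });

  // ... exercise the component under test, then remove the mock:
  fetchMock.restore();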

Module: /modules/day-range-sk

The day-range-sk module provides a custom HTML element for selecting a date range. It allows users to pick a “begin” and “end” date, which is a common requirement in applications that deal with time-series data or event logging.

The primary goal of this module is to offer a user-friendly way to define a time interval. It achieves this by composing two calendar-input-sk elements, one for the start date and one for the end date. This design choice leverages an existing, well-tested component for date selection, promoting code reuse and consistency.

Key Components and Responsibilities:

  • day-range-sk.ts: This is the core file defining the DayRangeSk custom element.

    • Why: It encapsulates the logic for managing the begin and end dates, handling user interactions, and emitting an event when the range changes.
    • How:
    • It extends ElementSk, a base class for custom elements, providing lifecycle callbacks and rendering capabilities.
    • It uses the lit-html library for templating, rendering two calendar-input-sk elements labeled “Begin” and “End”.
    • The begin and end dates are stored as attributes (and corresponding properties) representing Unix timestamps in seconds. This is a common and unambiguous way to represent points in time.
    • When either calendar-input-sk element fires an input event (signifying a date change), the DayRangeSk element updates its corresponding begin or end attribute and then dispatches a custom event named day-range-change.
    • The day-range-change event's detail object contains the begin and end timestamps, allowing parent components to easily consume the selected range.
    • Default values for begin and end are set if not provided: begin defaults to 24 hours before the current time, and end defaults to the current time. This provides a sensible initial state.
    • The connectedCallback and attributeChangedCallback are used to ensure the element renders correctly when added to the DOM or when its attributes are modified.
  • day-range-sk.scss: This file contains the styling for the day-range-sk element.

    • Why: To provide a consistent visual appearance and integrate with the application's theming.
    • How: It imports common theme variables (themes.scss) and defines specific styles for the labels and input fields within the day-range-sk component, ensuring they adapt to light and dark modes.
  • day-range-sk-demo.html and day-range-sk-demo.ts: These files provide a demonstration page for the day-range-sk element.

    • Why: To showcase the element's functionality, allow for interactive testing, and serve as an example of how to use it.
    • How:
    • The HTML file includes instances of day-range-sk with different initial begin and end attributes.
    • The TypeScript file listens for the day-range-change event from these instances and displays the event details in a <pre> tag, demonstrating how to retrieve the selected date range.
  • day-range-sk_puppeteer_test.ts: This file contains Puppeteer tests for the day-range-sk element.

    • Why: To ensure the element renders correctly and behaves as expected in a browser environment.
    • How: It uses the loadCachedTestBed utility to set up a testing environment, navigates to the demo page, and takes screenshots for visual regression testing. It also performs a basic smoke test to confirm the element is present on the page.

Key Workflows:

  1. Initialization:

     User HTML -> day-range-sk (attributes: begin, end)
     day-range-sk.connectedCallback()
     IF begin/end not set: set default begin (now - 24h) and end (now)
     _render()
     Create two <calendar-input-sk> elements with initial dates

  2. User Selects a New “Begin” Date:

     User interacts with "Begin" <calendar-input-sk>
     <calendar-input-sk> fires "input" event (with new Date)
     day-range-sk._beginChanged(event)
     Update this.begin (convert Date to timestamp)
     this._sendEvent()
     Dispatch "day-range-change" event with { begin: new_begin_timestamp, end: current_end_timestamp }

  3. User Selects a New “End” Date:

     User interacts with "End" <calendar-input-sk>
     <calendar-input-sk> fires "input" event (with new Date)
     day-range-sk._endChanged(event)
     Update this.end (convert Date to timestamp)
     this._sendEvent()
     Dispatch "day-range-change" event with { begin: current_begin_timestamp, end: new_end_timestamp }

  4. Parent Component Consumes Date Range:

     Parent component listens for "day-range-change" on <day-range-sk>
     On event: access event.detail.begin and event.detail.end
     Perform actions with the new date range

The conversion between Date objects (used by calendar-input-sk) and numeric timestamps (used by day-range-sk's attributes and events) is handled internally by the dateFromTimestamp utility function and by using Date.prototype.valueOf() / 1000. This design ensures that the day-range-sk element exposes a simple, numeric API for its date range while leveraging a more complex date object-based component for the UI.
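For a consuming page, wiring up the element amounts to listening for the event described above. A minimal sketch, assuming the element's module has already been imported so <day-range-sk> is registered:

  const picker = document.querySelector('day-range-sk')!;
  picker.addEventListener('day-range-change', (e: Event) => {
    // detail carries Unix timestamps in seconds, as described above.
    const { begin, end } = (e as CustomEvent<{ begin: number; end: number }>).detail;
    console.log(`Selected range: ${new Date(begin * 1000)} - ${new Date(end * 1000)}`);
  });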

Module: /modules/domain-picker-sk

The domain-picker-sk module provides a custom HTML element <domain-picker-sk> that allows users to select a data domain. This domain can be defined in two ways: either as a specific date range or as a number of data points (commits) preceding a chosen end date. This flexibility is crucial for applications that need to visualize or analyze time-series data where users might want to focus on a specific period or view the most recent N data points.

The core design choice is to offer these two distinct modes of domain selection, catering to different user needs. The “Date Range” mode is useful when users know the specific start and end dates they are interested in. The “Dense” mode is more suitable when users want to see a fixed amount of recent data, regardless of the specific start date.

The component's state is managed internally and can also be set externally via the state property. This state object, defined by the DomainPickerState interface, holds the begin and end timestamps (in Unix seconds), the num_commits (for “Dense” mode), and the request_type which indicates the current selection mode (0 for “Date Range” - RANGE, 1 for “Dense” - DENSE).

Key Files and Their Responsibilities:

  • domain-picker-sk.ts: This is the heart of the module. It defines the DomainPickerSk class, which extends ElementSk.

    • Why: It encapsulates all the logic for rendering the UI, handling user interactions, and managing the component's state.
    • How:
    • It uses the lit-html library for templating, allowing for efficient updates to the DOM when the state changes. The template static method defines the basic structure, and _showRadio and _requestType static methods conditionally render different parts of the UI based on the current request_type and the force_request_type attribute.
    • It manages the _state object. Initial default values are set in the constructor (e.g., end date is now, begin date is 24 hours ago, default num_commits is 50).
    • Event handlers like typeRange, typeDense, beginChange, endChange, and numChanged update the internal _state and then call render() to reflect these changes in the UI.
    • The force_request_type attribute ('range' or 'dense') allows the consuming application to lock the picker into a specific mode, hiding the radio buttons that would normally allow the user to switch. This is useful when the application context dictates a specific type of domain selection. The attributeChangedCallback and the getter/setter for force_request_type handle this.
    • It leverages other custom elements: radio-sk for mode selection and calendar-input-sk for date picking, promoting modularity and reuse.
  • domain-picker-sk.scss: This file contains the SASS styles for the component.

    • Why: It separates the presentation from the logic, making the component easier to style and maintain.
    • How: It defines styles for the layout of controls (e.g., using flexbox to align items), descriptive text, input fields, and the calendar input. It also imports shared styles from elements-sk/modules/styles for consistency (e.g., buttons, colors).
  • index.ts: A simple entry point that imports and registers the domain-picker-sk custom element.

    • Why: This is a common pattern for web components, making it easy for other parts of the application to import and use the component.
    • How: It executes import './domain-picker-sk'; which ensures the DomainPickerSk class is defined and registered with the browser's CustomElementRegistry via the define function call within domain-picker-sk.ts.
  • domain-picker-sk-demo.html and domain-picker-sk-demo.ts: These files provide a demonstration page for the component.

    • Why: They allow developers to see the component in action, test its different states and attributes, and serve as a basic example of how to use it.
    • How: domain-picker-sk-demo.html includes instances of <domain-picker-sk>, some with the force_request_type attribute set. domain-picker-sk-demo.ts initializes the state of these demo instances with sample data.
  • domain-picker-sk_puppeteer_test.ts: Contains Puppeteer tests for the component.

    • Why: To ensure the component renders correctly and behaves as expected in a browser environment.
    • How: It uses the puppeteer-tests/util library to load the demo page and take screenshots, verifying the visual appearance of the component in its default state.

Key Workflows/Processes:

  1. Initialization and Rendering:

    • <domain-picker-sk> element is added to the DOM.
    • connectedCallback is invoked.
    • Properties like state and force_request_type are upgraded (if set as attributes before the element was defined).
    • Default _state is established (e.g., end = now, begin = 24h ago, mode = RANGE).
    • render() is called:
      • It checks force_request_type. If set, it overrides _state.request_type.
      • The main template is rendered.
      • _showRadio decides whether to show mode selection radio buttons.
      • _requestType renders either the “Begin” date input (for RANGE mode) or the “Points” number input (for DENSE mode).
    [DOM Insertion] -> connectedCallback() -> _upgradeProperty('state')
                                           -> _upgradeProperty('force_request_type')
                                           -> render()
                                                |
                                                V
                                          [UI Displayed]
    
  2. User Changes Mode (if force_request_type is not set):

    • User clicks on “Date Range” or “Dense” radio button.
    • @change event triggers typeRange() or typeDense().
    • _state.request_type is updated.
    • render() is called.
    • The UI updates to show the relevant inputs (Begin date vs. Points).
    [User clicks radio] -> typeRange()/typeDense() -> _state.request_type updated
                                                 -> render()
                                                      |
                                                      V
                                                [UI Updates]
    
  3. User Changes Date/Number of Commits:

    • User interacts with <calendar-input-sk> (for Begin/End dates) or the <input type="number"> (for Points).
    • @input (for calendar) or @change (for number input) event triggers beginChange(), endChange(), or numChanged().
    • The corresponding part of _state (e.g., _state.begin, _state.end, _state.num_commits) is updated.
    • render() is called. For date changes, <calendar-input-sk> handles its own visual update of the displayed date; calling render() on the parent ensures the rest of the component re-renders if any other part of its template depends on these values (for plain date changes this parent re-render may be redundant).
    [User changes input] -> beginChange()/endChange()/numChanged()
                            |
                            V
                        _state updated
                            |
                            V
                          render()  // Potentially re-renders the component
                            |
                            V
                      [UI reflects new value]
    

The component emits no custom events itself but relies on the events from its child components (radio-sk, calendar-input-sk) to trigger internal state updates and re-renders. Consumers of domain-picker-sk would typically read the state property to get the user's selection.
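A consumer therefore reads the state property directly. A minimal sketch, with the DomainPickerState shape approximated from the description above:

  // Shape approximated from the DomainPickerState description above.
  interface DomainPickerState {
    begin: number;        // Unix seconds
    end: number;          // Unix seconds
    num_commits: number;  // used in "Dense" mode
    request_type: number; // 0 = RANGE ("Date Range"), 1 = DENSE
  }

  const picker = document.querySelector('domain-picker-sk') as HTMLElement & {
    state: DomainPickerState;
  };

  // Lock the picker into date-range mode, then read the selection back later.
  picker.setAttribute('force_request_type', 'range');
  const { begin, end, request_type } = picker.state;
  console.log(new Date(begin * 1000), new Date(end * 1000), request_type);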

Module: /modules/errorMessage

The errorMessage module provides a wrapper around the errorMessage function from the elements-sk library. Its primary purpose is to offer a more convenient way to display persistent error messages to the user.

Core Functionality and Design Rationale:

The key differentiation of this module lies in its default behavior for message display duration. While the elements-sk errorMessage function requires a duration to be specified for how long a message (often referred to as a “toast”) remains visible, this module defaults the duration to 0 seconds.

This design choice is intentional: a duration of 0 typically signifies that the error message will not automatically close. This is particularly useful in scenarios where an error is critical or requires user acknowledgment, and an auto-dismissing message might be missed. By defaulting to a persistent display, the module prioritizes ensuring the user is aware of the error.

Responsibilities and Key Components:

The module exposes a single function: errorMessage.

  • errorMessage(message: string | { message: string } | { resp: Response } | object, duration: number = 0): void:
    • This function is responsible for displaying an error message to the user.
    • It accepts the same flexible message parameter as the underlying elements-sk function. This means it can handle plain strings, objects with a message property, objects containing a Response object (from which an error message can often be extracted), or generic objects.
    • The crucial aspect is the duration parameter. If not explicitly provided by the caller, it defaults to 0. This default triggers the persistent display behavior mentioned above.
    • Internally, this function simply calls elementsErrorMessage from the elements-sk library, passing along the provided message and the (potentially defaulted) duration.
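In code, such a thin wrapper is little more than the following sketch (the import path is illustrative):

  // Illustrative import path for the elements-sk helper.
  import { errorMessage as elementsErrorMessage } from 'elements-sk/errorMessage';

  // Same flexible message types as elements-sk, but the toast stays up
  // (duration 0) unless the caller overrides it.
  export function errorMessage(
    message: string | { message: string } | { resp: Response } | object,
    duration: number = 0
  ): void {
    elementsErrorMessage(message, duration);
  }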

Workflow:

The typical workflow for using this module is straightforward:

  1. Import: The errorMessage function is imported from this module.
  2. Invocation: When an error condition occurs that needs to be communicated to the user persistently, the errorMessage function is called with the error details.
    • errorMessage("A critical error occurred.") -> Displays “A critical error occurred.” indefinitely.
    • errorMessage("Something went wrong.", 5000) -> Displays “Something went wrong.” for 5 seconds (overriding the default).

Essentially, this module acts as a thin convenience layer, promoting a specific error display pattern (persistent messages) by changing the default behavior of a more general utility. This reduces boilerplate for common use cases where persistent error notification is desired.

Module: /modules/existing-bug-dialog-sk

The existing-bug-dialog-sk module provides a user interface element for associating performance anomalies with existing bug reports in a bug tracking system (like Monorail). It's designed to be used within a larger performance monitoring application where users need to triage and manage alerts generated by performance regressions.

The core purpose of this module is to simplify the workflow of linking one or more detected anomalies to a pre-existing bug. Instead of manually navigating to the bug tracker and updating the bug, users can do this directly from the performance monitoring interface. This reduces context switching and streamlines the bug management process.

Key Components and Responsibilities:

  • existing-bug-dialog-sk.ts: This is the heart of the module, defining the custom HTML element existing-bug-dialog-sk.

    • Why: It encapsulates the entire UI and logic for the dialog. This includes displaying a form for entering a bug ID, a dropdown for selecting the bug tracking project (though currently hardcoded to ‘chromium’), and a list of already associated bugs for the selected anomalies.
    • How:
    • It uses Lit for templating and rendering the dialog's HTML structure.
    • It manages the dialog's visibility (open(), closeDialog()).
    • It handles form submission:
      • Takes the entered bug ID and the list of selected anomalies (_anomalies).
      • Makes an HTTP POST request to a backend endpoint (/_/triage/associate_alerts) to create the association.
      • Upon success, it opens the bug page in a new tab and dispatches a custom event anomaly-changed. This event signals other parts of the application (e.g., charts or lists displaying anomalies) that the anomaly data has been updated (specifically, the bug_id field) and they might need to re-render.
      • Handles potential errors by displaying an error message toast.
    • It fetches and displays a list of bugs already associated with the anomalies in the current group. This involves:
      • Making a POST request to /_/anomalies/group_report to get details of anomalies in the same group, including their associated bug_ids. This endpoint might return a sid (state ID) if the report generation is asynchronous, requiring a follow-up request.
      • Once the list of associated bug IDs is retrieved, it makes another POST request to /_/triage/list_issues to fetch the titles of these bugs. This provides more context to the user than just showing bug IDs.
    • The setAnomalies() method is crucial for initializing the dialog with the relevant anomaly data when it's about to be shown.
    • It relies on window.perf.bug_host_url to construct links to the bug tracker.
  • existing-bug-dialog-sk.scss: This file contains the SASS/CSS styles for the dialog.

    • Why: It ensures the dialog has a consistent look and feel with the rest of the application, using shared theme variables (--on-background, --background, etc.).
    • How: It defines styles for the dialog container, input fields, buttons, close icon, and the list of associated bugs. It also includes specific styling for the loading spinner and selected items.
  • index.ts: This is a simple entry point that imports and registers the existing-bug-dialog-sk custom element, making it available for use in HTML.

Workflow for Associating Anomalies with an Existing Bug:

  1. User Action: The user selects one or more anomalies in the main application interface and chooses an option to associate them with an existing bug.
  2. Dialog Initialization: The application calls setAnomalies() on an existing-bug-dialog-sk instance, passing the selected anomalies.
  3. Dialog Display: The application calls open() on the dialog instance.

     Application -- setAnomalies(anomalies) --> existing-bug-dialog-sk
     Application -- open() --> existing-bug-dialog-sk
     existing-bug-dialog-sk -- fetch_associated_bugs() --> Backend API (/_/anomalies/group_report)
     existing-bug-dialog-sk <-- (Associated Bug IDs) -- Backend API
     existing-bug-dialog-sk -- fetch_bug_titles() --> Backend API (/_/triage/list_issues)
     existing-bug-dialog-sk <-- (Bug Titles) -- Backend API
     existing-bug-dialog-sk renders the dialog with the bug ID form and the associated bugs list
  4. User Interaction:
    • The user sees the dialog.
    • If there are other anomalies in the same group already linked to bugs, these bugs (ID and title) are listed.
    • The user enters a Bug ID into the input field.
    • The user clicks the “Submit” button.
  5. Form Submission and Backend Communication:

     User submits the form in existing-bug-dialog-sk
     _spinner.active = true (UI update: show spinner)
     fetch('/_/triage/associate_alerts', POST, {bug_id, keys}) --> Backend API
     Backend API returns success or failure
     _spinner.active = false (UI update: hide spinner)
     IF success:
       closeDialog() (UI update: hide dialog)
       window.open(bug_url) (opens the bug in a new tab)
       dispatchEvent('anomaly-changed') --> Application (notifies other components)
     IF failure:
       errorMessage(msg) (UI update: show error toast)
  6. Outcome:
    • Success: The anomalies are linked to the specified bug in the backend. The dialog closes, the bug page opens in a new tab, and other parts of the UI (listening for anomaly-changed) update to reflect the new association.
    • Failure: An error message is shown, and the dialog remains open, allowing the user to try again or correct the input.

The design prioritizes a clear and focused user experience for a common task in performance alert triaging. By integrating directly with the backend API for bug association and fetching related bug information, it aims to be an efficient tool for developers and SREs. The use of custom events allows for loose coupling with other components in the larger application.
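From the hosting application's side, driving the dialog follows the workflow above. A minimal sketch; selectedAnomalies is a placeholder for whatever anomaly objects the host has selected, and the element type is loosened to any.

  declare const selectedAnomalies: unknown[]; // anomalies chosen by the user (host-app data)

  // The element type is loosened to `any` to avoid depending on its exported class name.
  const dialog = document.querySelector('existing-bug-dialog-sk') as any;

  // Refresh lists/charts once an association succeeds and bug_id fields change.
  dialog.addEventListener('anomaly-changed', () => {
    // re-render anomaly lists, charts, etc.
  });

  // When the user chooses "associate with an existing bug" for the selected anomalies:
  dialog.setAnomalies(selectedAnomalies);
  dialog.open();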

Module: /modules/explore-multi-sk

explore-multi-sk Module

Overview

The explore-multi-sk module provides a user interface for displaying and interacting with multiple performance data graphs simultaneously. This is particularly useful when users need to compare different metrics, configurations, or time ranges side-by-side. The core idea is to leverage the functionality of individual explore-simple-sk elements, which represent single graphs, and manage their states and interactions within a unified multi-graph view.

Key Design Decisions and Implementation Choices

State Management: A central State object within explore-multi-sk manages properties that are common across all displayed graphs. These include the time range (begin, end), display options (showZero, dots), and pagination settings (pageSize, pageOffset). This approach simplifies the overall state management and keeps the URL from becoming overly complex, as only a limited set of shared parameters need to be reflected in the URL.

Each individual graph (explore-simple-sk instance) maintains its own specific state related to the data it displays (formulas, queries, selected keys). explore-multi-sk stores an array of GraphConfig objects, where each object corresponds to an explore-simple-sk instance and holds its unique configuration.

The stateReflector utility is used to synchronize the shared State with the URL, allowing for bookmarking and sharing of multi-graph views.
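The split between shared and per-graph state can be summarized with two approximate interfaces; field names follow the properties mentioned above, and the exact types in explore-multi-sk.ts may differ.

  // Shared across all graphs and reflected into the URL by stateReflector.
  interface SharedState {
    begin: number;       // start of the time range
    end: number;         // end of the time range
    showZero: boolean;   // display option applied to every graph
    dots: boolean;       // display option applied to every graph
    pageSize: number;    // graphs shown per page
    pageOffset: number;  // index of the first graph on the current page
    shortcut: string;    // ID under which the graphConfigs are stored, if any
  }

  // Kept per graph, one entry per explore-simple-sk instance.
  interface GraphConfig {
    formulas: string[];  // formula expressions plotted on this graph
    queries: string[];   // trace queries plotted on this graph
    keys: string;        // explicitly selected trace keys (representation approximated)
  }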

Dynamic Graph Addition and Removal: Users can dynamically add new graphs to the view. When a new graph is added, an empty explore-simple-sk instance is created and the user can then configure its data source (query or formula).

If the useTestPicker option is enabled (often determined by backend defaults), instead of a simple “Add Graph” button, a test-picker-sk element is displayed. This component provides a more structured way to select tests and parameters, and upon selection, a new graph is automatically generated and populated.

Graphs can also be removed. Event listeners are in place to handle remove-explore custom events, which are typically dispatched by the individual explore-simple-sk elements when a user closes them in a “Multiview” context (where useTestPicker is active).

Pagination: To handle potentially large numbers of graphs, pagination is implemented using the pagination-sk element. This allows users to view a subset of the total graphs at a time, improving performance and usability. The pageSize and pageOffset are part of the shared state.

Graph Manipulation (Split and Merge):

  • Split Graph: If a single graph displaying multiple traces is present, the “Split Graph” functionality allows the user to create separate graphs for each of those traces. This is useful for focusing on individual trends that were previously combined.
  • Merge Graphs: Conversely, the “Merge Graphs” functionality takes all currently displayed graphs and combines their traces into a single graph. This can be helpful for seeing an aggregated view.

These operations primarily involve manipulating the graphConfigs array and then re-rendering the graphs.

Shortcuts: The module supports saving and loading multi-graph configurations using shortcuts. When the configuration of graphs changes (traces added/removed, graphs split/merged), updateShortcutMultiview is called. This function communicates with a backend service (/_/shortcut/get and a corresponding save endpoint invoked by updateShortcut from explore-simple-sk) to store or retrieve the graphConfigs associated with a unique shortcut ID. This ID is then reflected in the URL, allowing users to share specific multi-graph setups.

Synchronization of Interactions:

  • X-Axis Label: When the x-axis label (e.g., switching between commit number and date) is toggled on one graph, a custom event x-axis-toggled is dispatched. explore-multi-sk listens for this and updates the x-axis on all other visible graphs to maintain consistency.
  • Chart Selection (Plot Summary): The explore-simple-sk component provides its own plot-selection mechanisms. When the plotSummary feature is active, a selection on one graph may need to be mirrored on the others; explore-multi-sk exposes syncChartSelection to handle this cross-graph selection synchronization, although the synchronization details are not spelled out in explore-multi-sk.ts itself.

Defaults and Configuration: The component fetches default configurations from a /_/defaults/ endpoint. These defaults can influence various aspects, such as: - Whether to use test-picker-sk (useTestPicker). - Default parameters and their order for test-picker-sk (include_params, default_param_selections). This allows for instance-specific customization of the Perf UI.

Responsibilities and Key Components

  • explore-multi-sk.ts:

    • Responsibilities: This is the main TypeScript file defining the ExploreMultiSk custom element. It is responsible for:
    • Managing the overall state of the multi-graph view (shared properties like time range, pagination).
    • Handling the addition, removal, and configuration of individual explore-simple-sk graph elements.
    • Interacting with the stateReflector to update the URL based on the shared state.
    • Implementing the “Split Graph” and “Merge Graphs” functionalities.
    • Managing pagination for the displayed graphs.
    • Fetching and applying default configurations.
    • Coordinating interactions between graphs (e.g., synchronizing x-axis labels).
    • Interacting with the test-picker-sk if enabled.
    • Handling user authentication status for features like “Add to Favorites”.
    • Managing shortcuts for saving and loading multi-graph configurations.
    • Key Interactions:
    • Creates and manages instances of explore-simple-sk.
    • Uses pagination-sk for displaying graphs in pages.
    • Uses test-picker-sk for adding graphs when useTestPicker is true.
    • Uses favorites-dialog-sk to allow users to save graph configurations.
    • Communicates with backend services for shortcuts and default configurations.
  • explore-multi-sk.html (Inferred from the Lit html template in explore-multi-sk.ts):

    • Responsibilities: Defines the structure of the explore-multi-sk element. This includes:
    • A menu section with buttons for “Add Graph”, “Split Graph”, “Merge Graphs”, and “Add to Favorites”.
    • The test-picker-sk element (conditionally visible).
    • pagination-sk elements for navigating through graph pages.
    • A container (#graphContainer) where the individual explore-simple-sk elements are dynamically rendered.
    • Key Components:
    • <button> elements for user actions.
    • <test-picker-sk> for test selection.
    • <pagination-sk> for graph pagination.
    • <favorites-dialog-sk> for saving favorites.
    • A div (#graphContainer) to hold the explore-simple-sk instances.
  • explore-multi-sk.scss:

    • Responsibilities: Provides the styling for the explore-multi-sk element and its children. It ensures that the layout is appropriate for displaying multiple graphs and their controls.
    • Key Aspects:
    • Styles the #menu and #pagination areas.
    • Defines the height of embedded explore-simple-sk plots.
    • Handles the conditional visibility of elements like #test-picker and #add-graph-button.

Key Workflows

1. Initial Load and State Restoration:

User navigates to URL with explore-multi-sk
    |
    V
explore-multi-sk.connectedCallback()
    |
    V
Fetch defaults from /_/defaults/
    |
    V
stateReflector() is initialized
    |
    V
State is read from URL (or defaults if URL is empty)
    |
    V
IF state.shortcut is present:
    Fetch graphConfigs from /_/shortcut/get using the shortcut ID
    |
    V
ELSE (or after fetching):
    For each graphConfig (or if starting fresh, often one empty graph is implied or added):
        Create/configure explore-simple-sk instance
        Set its state based on graphConfig and shared state
    |
    V
Add graphs to the current page based on pagination settings
    |
    V
Render the component

2. Adding a Graph (without Test Picker):

User clicks "Add Graph" button
    |
    V
explore-multi-sk.addEmptyGraph() is called
    |
    V
A new ExploreSimpleSk instance is created
A new empty GraphConfig is added to this.graphConfigs
    |
    V
explore-multi-sk.updatePageForNewExplore()
    |
    V
IF current page is full:
    Increment pageOffset (triggering pageChanged)
ELSE:
    Add new graph to current page
    |
    V
The new explore-simple-sk element might open its query dialog for the user

3. Adding a Graph (with Test Picker):

TestPickerSk is visible (due to defaults or state)
    |
    V
User interacts with TestPickerSk, selects tests/parameters
    |
    V
User clicks "Plot" button in TestPickerSk
    |
    V
TestPickerSk dispatches 'plot-button-clicked' event
    |
    V
explore-multi-sk listens for 'plot-button-clicked'
    |
    V
explore-multi-sk.addEmptyGraph(unshift=true) is called (new graph at the top)
    |
    V
explore-multi-sk.addGraphsToCurrentPage() updates the view
    |
    V
TestPickerSk.createQueryFromFieldData() gets the query
    |
    V
The new ExploreSimpleSk instance has its query set

4. Splitting a Graph:

User has one graph with multiple traces and clicks "Split Graph"
    |
    V
explore-multi-sk.splitGraph()
    |
    V
this.getTracesets() retrieves traces from the first (and only) graph
    |
    V
this.clearGraphs() removes the existing graph configuration
    |
    V
FOR EACH trace in the retrieved traceset:
    this.addEmptyGraph()
    A new GraphConfig is created for this trace (e.g., config.queries = [queryFromKey(trace)])
    |
    V
this.updateShortcutMultiview() (new shortcut reflecting multiple graphs)
    |
    V
this.state.pageOffset is reset to 0
    |
    V
this.addGraphsToCurrentPage() renders the new set of individual graphs
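
A minimal sketch of the per-trace fan-out performed by the split, assuming a simplified GraphConfig shape and a simplified stand-in for queryFromKey() that turns a structured trace key into a query string:

// Sketch of the split step: each trace key from the combined graph becomes
// its own GraphConfig. The GraphConfig shape is simplified, and this
// queryFromKey() is an illustrative stand-in that does no URL encoding.
interface GraphConfig {
  queries: string[];
  formulas: string[];
  keys: string;
}

function queryFromKey(traceKey: string): string {
  // ",arch=x86,config=8888," -> "arch=x86&config=8888"
  return traceKey
    .split(',')
    .filter((part) => part !== '')
    .join('&');
}

function splitIntoConfigs(traceKeys: string[]): GraphConfig[] {
  return traceKeys.map((key) => ({
    queries: [queryFromKey(key)],
    formulas: [],
    keys: '',
  }));
}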

5. Saving/Updating a Shortcut:

Graph configuration changes (e.g., trace added/removed, graph split/merged, new graph added)
    |
    V
explore-multi-sk.updateShortcutMultiview() is called
    |
    V
Calls exploreSimpleSk.updateShortcut(this.graphConfigs)
    |
    V
(Inside updateShortcut)
IF graphConfigs is not empty:
    POST this.graphConfigs to backend (e.g., /_/shortcut/new or /_/shortcut/update)
    Backend returns a new or existing shortcut ID
    |
    V
explore-multi-sk.state.shortcut is updated with the new ID
    |
    V
this.stateHasChanged() is called, triggering stateReflector to update the URL
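
A hedged sketch of the shortcut update request; the exact endpoint path and the response field name (id) are assumptions for illustration:

// Sketch: persist the current multi-graph configuration and return the
// shortcut ID to record in the shared state. The endpoint path and response
// shape ({ id: string }) are assumptions.
async function updateShortcut(graphConfigs: unknown[]): Promise<string> {
  const resp = await fetch('/_/shortcut/update', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ graphs: graphConfigs }),
  });
  if (!resp.ok) {
    throw new Error(`Shortcut update failed: ${resp.status}`);
  }
  const { id } = (await resp.json()) as { id: string };
  return id; // The caller stores this in state.shortcut and calls stateHasChanged().
}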

Module: /modules/explore-simple-sk

The explore-simple-sk module provides a custom HTML element for exploring and visualizing performance data. It allows users to query, plot, and analyze traces, identify anomalies, and interact with commit details. This element is a core component of the Perf application's data exploration interface.

Core Functionality:

The element's primary responsibility is to provide a user interface for:

  1. Querying Data: Users can construct queries to select specific traces based on various parameters.
  2. Plotting Traces: Selected traces are rendered on a graph, allowing for visual inspection of performance trends.
  3. Analyzing Data: Users can interact with the plot to zoom, pan, and select individual data points for detailed inspection.
  4. Anomaly Detection: The element integrates with anomaly detection services to highlight and manage performance regressions or improvements.
  5. Commit Details: Information about the commits associated with data points can be displayed, linking performance changes to specific code modifications.

Key Design Decisions and Implementation Choices:

  • State Management: The element's state (e.g., current query, time range, plot settings) is managed internally and reflected in the URL. This allows users to share specific views of the data and enables bookmarking. The State class in explore-simple-sk.ts defines the structure of this state.
  • Data Fetching: Data is fetched asynchronously from the backend using the /frame/start endpoint. The requestFrame method handles initiating these requests and processing the responses. The FrameRequest and FrameResponse types define the communication contract with the server.
  • Plotting Library: The module supports two plotting libraries: plot-simple-sk (a custom canvas-based plotter) and plot-google-chart-sk (which wraps Google Charts). The choice of plotter can be configured.
  • Component-Based Architecture: The UI is built using a collection of smaller, specialized custom elements (e.g., query-sk for query input, paramset-sk for displaying parameters, commit-detail-panel-sk for commit information). This promotes modularity and reusability.
  • Event-Driven Communication: Components communicate with each other and with the main explore-simple-sk element through custom events. For example, when a query changes in query-sk, it emits a query-change event that explore-simple-sk listens to.
  • Caching and Optimization: To improve performance, the element employs strategies like incremental data loading when panning and caching commit details.

Key Files and Components:

  • explore-simple-sk.ts: This is the main TypeScript file that defines the ExploreSimpleSk custom element. It handles:
    • State management and URL reflection.
    • Data fetching and processing.
    • Rendering the UI template.
    • Event handling and coordination between child components.
    • Interaction logic for plotting, zooming, selecting points, etc.
  • explore-simple-sk.html (embedded in explore-simple-sk.ts): This Lit-html template defines the structure of the element's UI. It includes placeholders for various child components and dynamic content.
  • explore-simple-sk.scss: This SCSS file provides the styling for the element and its components.
  • Child Components (imported in explore-simple-sk.ts):
    • query-sk: For constructing and managing queries.
    • paramset-sk: For displaying and interacting with parameter sets.
    • plot-simple-sk / plot-google-chart-sk: For rendering the plots.
    • commit-detail-panel-sk: For displaying commit information.
    • anomaly-sk: For displaying and managing anomalies.
    • Many other components for specific UI elements like dialogs, buttons, and icons.

Workflow Example: Plotting a Query

  1. User Interaction: The user interacts with the query-sk element to define a query.
  2. Event Emission: query-sk emits a query-change event with the new query.
  3. State Update: explore-simple-sk listens for this event, updates its internal state (specifically the queries array in the State object), and triggers a re-render.
  4. Data Request: explore-simple-sk constructs a FrameRequest based on the updated state and calls requestFrame to fetch data from the server.
     User Input (query-sk) -> Event (query-change) -> State Update (ExploreSimpleSk) -> Data Request (requestFrame)
  5. Data Processing: Upon receiving the FrameResponse, explore-simple-sk processes the data, updates its internal _dataframe object, and prepares the data for plotting.
  6. Plot Rendering: explore-simple-sk passes the processed data to the plot-simple-sk or plot-google-chart-sk element, which then renders the traces on the graph.
     Server Response (FrameResponse) -> Data Processing (ExploreSimpleSk) -> Plot Update (plot-simple-sk/plot-google-chart-sk) -> Visual Output
  7. URL Update: The state change is reflected in the URL, allowing the user to bookmark or share the current view.

This workflow illustrates the reactive nature of the element, where user interactions trigger state changes, which in turn lead to data fetching and UI updates.
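
A minimal sketch of this flow, assuming the query-change event carries the query string in event.detail.q; the request fields shown are illustrative, with the real FrameRequest type defined in /modules/json:

// Sketch of wiring a query-change listener to a frame request. The event
// detail shape ({ q: string }) and the request fields shown are illustrative.
const querySk = document.querySelector('query-sk')!;
querySk.addEventListener('query-change', (e: Event) => {
  const q = (e as CustomEvent<{ q: string }>).detail.q;
  const now = Math.floor(Date.now() / 1000);
  const frameRequest = {
    begin: now - 24 * 60 * 60, // last 24 hours
    end: now,
    queries: [q],
  };
  // explore-simple-sk would hand a request like this to requestFrame(),
  // which fetches data from /frame/start and plots the FrameResponse.
  console.log('would request a frame with', frameRequest);
});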

Module: /modules/explore-sk

The explore-sk module serves as the primary user interface for exploring and analyzing performance data within the Perf application. It provides a comprehensive view for users to query, visualize, and interact with performance traces.

The core functionality of explore-sk is built upon the explore-simple-sk element. explore-sk acts as a wrapper, enhancing explore-simple-sk with additional features like user authentication integration, default configuration loading, and the optional test-picker-sk for more guided query construction.

Key Responsibilities and Components:

  • explore-sk.ts: This is the main TypeScript file defining the ExploreSk custom element.

    • Why: It orchestrates the interaction between various sub-components and manages the overall state of the exploration page.
    • How:
    • It initializes by fetching default configurations (e.g., query parameters, display settings) from a backend endpoint (/_/defaults/). This ensures that the exploration view is pre-configured with sensible starting points.
    • It integrates with alogin-sk to determine the logged-in user's status. This information is used to enable features like “favorites” if a user is logged in.
    • It utilizes stateReflector to persist and restore the state of the underlying explore-simple-sk element in the URL. This allows users to share specific views or bookmark their current exploration state.
    • It conditionally initializes and displays test-picker-sk. If the use_test_picker_query flag is set in the state (often via URL parameters or defaults), the test-picker-sk component is shown, providing a structured way to build queries based on available parameter keys and values.
    • It listens for events from test-picker-sk (e.g., plot-button-clicked, remove-all, populate-query) and translates these into actions on the explore-simple-sk element, such as adding new traces based on the selected test parameters or clearing the view.
    • It provides buttons like “View in multi-graph” and “Toggle Chart Style” which directly interact with methods exposed by explore-simple-sk.
  • explore-simple-sk (imported module): This is a fundamental building block that handles the core trace visualization, querying logic, and interaction with the graph.

    • Why: Encapsulates the complex logic of fetching trace data, rendering graphs, and handling user interactions like zooming, panning, and selecting traces.
    • How: explore-sk delegates most of the heavy lifting related to data exploration to this component. It passes down the initial state, default configurations, and user-specific settings.
  • test-picker-sk (imported module): A component that allows users to build queries by selecting from available test parameters and their values.

    • Why: Simplifies the query construction process, especially when dealing with a large number of possible parameters. It provides a more user-friendly alternative to manually typing complex query strings.
    • How: When active, it presents a UI for selecting dimensions and values. Upon user action (e.g., clicking a “plot” button), it emits an event with the constructed query, which explore-sk then uses to fetch and display the corresponding traces via explore-simple-sk. It can also be populated based on a highlighted trace, allowing users to quickly refine queries based on existing data.
  • favorites-dialog-sk (imported module): Enables users to save and manage their favorite query configurations.

    • Why: Provides a convenient way for users to quickly return to frequently used or important exploration views.
    • How: Integrated into explore-simple-sk and its functionality is enabled by explore-sk based on the user's login status.
  • State Management (stateReflector):

    • Why: To make the exploration state shareable and bookmarkable. Changes in the exploration view (queries, zoom levels, etc.) are reflected in the URL.
    • How: explore-sk uses stateReflector to listen for state changes in explore-simple-sk. When the state changes, stateReflector updates the URL. Conversely, when the page loads or the URL changes, stateReflector parses the URL and applies the state to explore-simple-sk.
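
A hedged sketch of the stateReflector wiring, assuming the helper takes a getter and a setter for the URL-reflected state and returns a callback to invoke whenever the state changes; the import paths and casts follow common Skia infra usage but are assumptions here:

// Sketch of URL <-> component state reflection. The import paths and the
// HintableObject casts are assumptions for illustration.
import { stateReflector } from '../../../infra-sk/modules/stateReflector';
import { HintableObject } from '../../../infra-sk/modules/hintable';

interface ExploreState {
  queries: string[];
  begin: number;
  end: number;
  use_test_picker_query: boolean;
}

let state: ExploreState = { queries: [], begin: 0, end: 0, use_test_picker_query: false };

// Returns a callback to invoke whenever `state` is mutated, so the URL updates.
const stateHasChanged = stateReflector(
  () => state as unknown as HintableObject, // serialize state into the URL
  (newState: HintableObject) => {
    state = newState as unknown as ExploreState; // apply state parsed from the URL
    // ...re-render and push the restored state down into explore-simple-sk here.
  }
);

// Example: after a query change, record it and reflect it in the URL.
state.queries.push('config=8888&test=draw_a_circle');
stateHasChanged();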

Workflow Example: Initial Page Load with Test Picker

  1. explore-sk element is connected to the DOM.
  2. connectedCallback is invoked:
    • Renders its initial template.
    • Fetches default configurations from /_/defaults/.
    • stateReflector is initialized. If the URL contains state for explore-simple-sk, it's applied.
    • The state might indicate use_test_picker_query = true.
  3. If use_test_picker_query is true:
    • initializeTestPicker() is called.
    • test-picker-sk element is made visible.
    • test-picker-sk is initialized with parameters from the defaults (e.g., include_params, default_param_selections) or from existing queries in the state.
  4. User interacts with test-picker-sk to select desired test parameters.
  5. User clicks the “Plot” button within test-picker-sk.
  6. test-picker-sk emits a plot-button-clicked event.
  7. explore-sk listens for this event:
    • It retrieves the query constructed by test-picker-sk.
    • It calls exploreSimpleSk.addFromQueryOrFormula() to add the new traces to the graph.
  8. explore-simple-sk fetches the data, renders the traces, and emits a state_changed event.
  9. stateReflector captures this state_changed event and updates the URL to reflect the new query.

This workflow illustrates how explore-sk acts as a central coordinator, integrating various specialized components to provide a cohesive data exploration experience. The design emphasizes modularity, with explore-simple-sk handling the core plotting and test-picker-sk offering an alternative query input mechanism, all managed and presented by explore-sk.

Module: /modules/favorites-dialog-sk

The favorites-dialog-sk module provides a custom HTML element that displays a modal dialog for users to add or edit “favorites.” Favorites, in this context, are likely user-defined shortcuts or bookmarks to specific views or states within the application, identified by a name, description, and a URL.

Core Functionality and Design:

The primary purpose of this module is to present a user-friendly interface for managing these favorites. It's designed as a modal dialog to ensure that the user's focus is on the task of adding or editing a favorite without distractions from the underlying page content.

Key Components:

  • favorites-dialog-sk.ts: This is the heart of the module, defining the FavoritesDialogSk custom element.

    • Why: It encapsulates the logic for displaying the dialog, handling user input, and interacting with a backend service to persist favorite data.
    • How:
    • It extends ElementSk, a base class for custom elements in the Skia infrastructure, providing a common foundation.
    • It uses the Lit library (lit/html.js) for templating, allowing for declarative and efficient rendering of the dialog's UI.
    • The open() method is the public API for triggering the dialog. It accepts optional parameters for pre-filling the form when editing an existing favorite. Crucially, it returns a Promise. This promise-based approach is a key design choice. It resolves when the favorite is successfully saved and rejects if the user cancels the dialog. This allows the calling code (likely a parent component managing the list of favorites) to react appropriately, for instance, by re-fetching the updated list of favorites only when a change has actually occurred.
    • Input fields for “Name,” “Description,” and “URL” capture the necessary information. The “Name” and “URL” fields are mandatory.
    • The confirm() method handles the submission logic. It performs basic validation (checking for empty name and URL) and then makes an HTTP POST request to either /_/favorites/new or /_/favorites/edit depending on whether a new favorite is being created or an existing one is being modified.
    • A spinner-sk element is used to provide visual feedback to the user during the asynchronous operation of saving the favorite.
    • Error handling is implemented using errorMessage to display issues to the user, such as network errors or validation failures from the backend.
    • The dismiss() method handles the cancellation of the dialog, rejecting the promise returned by open().
    • Input event handlers (filterName, filterDescription, filterUrl) update the component's internal state as the user types, and trigger re-renders via this._render().
  • favorites-dialog-sk.scss: This file contains the SASS styles for the dialog.

    • Why: It separates the presentation concerns from the JavaScript logic, making the component more maintainable.
    • How: It defines styles for the <dialog> element, input fields, labels, and buttons, ensuring a consistent look and feel within the application's theme (as indicated by @import '../themes/themes.scss';).
  • favorites-dialog-sk-demo.html / favorites-dialog-sk-demo.ts: These files provide a demonstration page for the favorites-dialog-sk element.

    • Why: This allows developers to see the component in isolation, test its functionality, and understand how to integrate it.
    • How: The HTML sets up a basic page with buttons to trigger the dialog in “new favorite” and “edit favorite” modes. The TypeScript file wires up event listeners on these buttons to call the open() method of the favorites-dialog-sk element with appropriate parameters.

Workflow: Adding/Editing a Favorite

A typical workflow involving this dialog would be:

  1. User Action: The user clicks a button (e.g., “Add Favorite” or an “Edit” icon next to an existing favorite) in the main application UI.

  2. Dialog Invocation: The event handler for this action calls the open() method of an instance of favorites-dialog-sk.

    • If adding a new favorite, open() might be called with minimal or no arguments, defaulting the URL to the current page.
    • If editing, open() is called with the favId, name, description, and url of the favorite to be edited.
    User clicks "Add New"  --> favoritesDialog.open('', '', '', 'current.page.url')
                                        |
                                        V
                                  Dialog Appears
                                        |
                                        V
    User fills form, clicks "Save" --> confirm() is called
                                        |
                                        V
                                 POST /_/favorites/new
                                        |
                                        V (Success)
                                  Dialog closes, open() Promise resolves
                                        |
                                        V
                             Calling component re-fetches favorites
    
    -------------------------------- OR ---------------------------------
    
    User clicks "Edit Favorite" --> favoritesDialog.open('id123', 'My Fav', 'Desc', 'fav.url.com')
                                        |
                                        V
                                  Dialog Appears (pre-filled)
                                        |
                                        V
    User modifies form, clicks "Save" --> confirm() is called
                                        |
                                        V
                                 POST /_/favorites/edit (with 'id123')
                                        |
                                        V (Success)
                                  Dialog closes, open() Promise resolves
                                        |
                                        V
                             Calling component re-fetches favorites
    
    -------------------------------- OR ---------------------------------
    
    User clicks "Cancel" or Close Icon --> dismiss() is called
                                        |
                                        V
                                  Dialog closes, open() Promise rejects
                                        |
                                        V
                             Calling component does nothing (no re-fetch)
    
  3. User Interaction: The user fills in or modifies the “Name,” “Description,” and “URL” fields in the dialog.

  4. Submission/Cancellation:

    • Save: The user clicks the “Save” button.
      • The confirm() method is invoked.
      • Input validation (name and URL not empty) is performed.
      • A fetch request is made to the backend API (/_/favorites/new or /_/favorites/edit).
      • A spinner is shown during the API call.
      • Upon successful completion, the dialog closes, and the Promise returned by open() resolves.
      • If the API call fails, an error message is displayed, and the dialog remains open (or the promise might reject depending on specific error handling in confirm).
    • Cancel: The user clicks the “Cancel” button or the close icon.
      • The dismiss() method is invoked.
      • The dialog closes.
      • The Promise returned by open() rejects.
  5. Post-Dialog Action: The component that initiated the dialog (e.g., a favorites-sk list component) uses the resolved/rejected state of the Promise to decide whether to refresh its list of favorites. This is a key aspect of the design – it avoids unnecessary re-fetches if the user simply cancels the dialog.
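
A minimal sketch of how a caller might consume the promise returned by open(), re-fetching only on resolution; the fetchFavorites() helper and the element's type cast are illustrative:

// Sketch: open the dialog to edit a favorite, and refresh the list only if
// the user actually saved. The fetchFavorites() helper is illustrative.
async function editFavorite(dialog: HTMLElement & {
  open: (favId?: string, name?: string, description?: string, url?: string) => Promise<void>;
}) {
  try {
    await dialog.open('id123', 'My Fav', 'Desc', 'https://fav.example.com');
    await fetchFavorites(); // saved: re-fetch the updated favorites list
  } catch {
    // dialog was dismissed: no re-fetch needed.
  }
}

declare function fetchFavorites(): Promise<void>;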

The design prioritizes a clear separation of concerns, using custom elements for UI encapsulation, SASS for styling, and a promise-based API for asynchronous operations and communication with parent components. This makes the favorites-dialog-sk a reusable and well-defined piece of UI for managing user favorites.

Module: /modules/favorites-sk

The favorites-sk module provides a user interface element for displaying and managing a user's “favorites”. Favorites are essentially bookmarked URLs, categorized into sections. This module allows users to view their favorited links, edit their details (name, description, URL), and delete them.

Core Functionality & Design:

The primary responsibility of favorites-sk is to fetch favorite data from a backend endpoint (/_/favorites/) and render it in a user-friendly way. It also handles interactions for modifying these favorites, such as editing and deleting.

  • Data Fetching and Rendering:

    • Upon connection to the DOM (connectedCallback), the element attempts to fetch the favorites configuration from the backend.
    • The fetched data, expected to be in a Favorites JSON format (defined in perf/modules/json), is stored in the favoritesConfig property.
    • The _render() method is called to update the display.
    • The rendering logic iterates through sections and then links within each section, generating an HTML table for display.
    • A key design choice is to distinguish “My Favorites” from other sections. “My Favorites” are displayed with “Edit” and “Delete” buttons, implying user ownership and modifiability. Other sections are presented as read-only.
  • Favorite Management:

    • Deletion:
    • When a user clicks the “Delete” button for a favorite in the “My Favorites” section, the deleteFavoriteConfirm method is invoked.
    • This method displays a standard browser confirmation dialog (window.confirm) to prevent accidental deletions.
    • If confirmed, deleteFavorite sends a POST request to /_/favorites/delete with the ID of the favorite to be removed.
    • After a successful deletion, the favorites list is re-fetched to reflect the change.
    • Editing:
    • Clicking the “Edit” button calls the editFavorite method.
    • This method interacts with a favorites-dialog-sk element (defined in perf/modules/favorites-dialog-sk).
    • The favorites-dialog-sk is responsible for presenting a modal dialog where the user can modify the favorite's name, description, and URL.
    • Upon successful editing (dialog submission), the favorites list is re-fetched.
  • Error Handling:

    • Network errors or non-OK responses during fetch operations (fetching favorites, deleting favorites) are caught.
    • An error message is displayed to the user via the errorMessage utility (from elements-sk/modules/errorMessage).

Key Components/Files:

  • favorites-sk.ts: This is the heart of the module. It defines the FavoritesSk custom element, extending ElementSk. It contains the logic for fetching, rendering, deleting, and initiating the editing of favorites.
    • constructor(): Initializes the element with its Lit-html template.
    • deleteFavorite(): Handles the asynchronous request to the backend for deleting a favorite.
    • deleteFavoriteConfirm(): Provides a confirmation step before actual deletion.
    • editFavorite(): Manages the interaction with the favorites-dialog-sk for editing.
    • template(): The static Lit-html template function that defines the overall structure of the element.
    • getSectionsTemplate(): A helper function that dynamically generates the HTML for displaying sections and their links based on favoritesConfig. It specifically adds edit/delete controls for the “My Favorites” section.
    • fetchFavorites(): Fetches the favorites data from the backend and triggers a re-render.
    • connectedCallback(): A lifecycle method that ensures favorites are fetched when the element is added to the page.
  • favorites-sk.scss: Provides the styling for the favorites-sk element, defining its layout, padding, colors for links, and table appearance.
  • index.ts: A simple entry point that imports and registers the favorites-sk custom element, making it available for use in HTML.
  • favorites-sk-demo.html & favorites-sk-demo.ts: These files provide a demonstration page for the favorites-sk element. The HTML includes an instance of <favorites-sk> and a <pre> tag to display events. The TypeScript file simply imports the element and sets up an event listener (though no custom events are explicitly dispatched by favorites-sk in the provided code).

Workflow: Deleting a Favorite

User Clicks "Delete" Button (for a link in "My Favorites")
    |
    V
favorites-sk.ts: deleteFavoriteConfirm(id, name)
    |
    V
window.confirm("Deleting favorite: [name]. Are you sure?")
    |
    +-- User clicks "Cancel" --> Workflow ends
    |
    V User clicks "OK"
favorites-sk.ts: deleteFavorite(id)
    |
    V
fetch('/_/favorites/delete', { method: 'POST', body: {id: favId} })
    |
    +-- Network Error/Non-OK Response --> errorMessage() is called, display error
    |
    V Successful Deletion
favorites-sk.ts: fetchFavorites()
    |
    V
fetch('/_/favorites/')
    |
    V
Parse JSON response, update this.favoritesConfig
    |
    V
this._render() // Re-renders the component with the updated list
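
A hedged sketch of the delete request; the request body field name (id) is an assumption for illustration:

// Sketch: delete a favorite and let the caller refresh the list. The request
// body field name (id) is an assumption.
async function deleteFavorite(favId: string): Promise<void> {
  const resp = await fetch('/_/favorites/delete', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ id: favId }),
  });
  if (!resp.ok) {
    // Surfaced to the user via errorMessage() in the element.
    throw new Error(`Delete failed: ${resp.status}`);
  }
  // On success the element re-fetches /_/favorites/ and re-renders.
}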

Workflow: Editing a Favorite

User Clicks "Edit" Button (for a link in "My Favorites")
    |
    V
favorites-sk.ts: editFavorite(id, name, desc, url)
    |
    V
Get reference to <favorites-dialog-sk id="fav-dialog">
    |
    V
favorites-dialog-sk.open(id, name, desc, url) // Opens the edit dialog
    |
    +-- User cancels dialog --> Promise rejects (potentially with undefined, handled)
    |
    V User submits changes in dialog
Promise resolves
    |
    V
favorites-sk.ts: fetchFavorites() // Re-fetches and re-renders the list
    |
    V
fetch('/_/favorites/')
    |
    V
Parse JSON response, update this.favoritesConfig
    |
    V
this._render()

The design relies on Lit for templating and rendering, which provides efficient updates to the DOM when the favoritesConfig data changes. The separation of concerns is evident: favorites-sk handles the list display and top-level actions, while favorites-dialog-sk manages the intricacies of the editing form.

Module: /modules/graph-title-sk

Graph Title (graph-title-sk)

The graph-title-sk module provides a custom HTML element designed to display titles for individual graphs in a structured and informative way. Its primary goal is to present key-value pairs of metadata associated with a graph in a visually clear and space-efficient manner.

Responsibilities and Key Components

The core of this module is the GraphTitleSk custom element (graph-title-sk.ts). Its main responsibilities are:

  1. Data Reception and Storage: It receives a Map<string, string> where keys represent parameter names (e.g., “bot”, “benchmark”) and values represent their corresponding values (e.g., “linux-perf”, “Speedometer2”). This map, along with the number of traces in the graph, is provided via the set() method.

  2. Dynamic Rendering: Based on the provided data, the element dynamically generates HTML to display the title. It iterates through the key-value pairs and renders them in a columnar layout. Each pair is displayed with the key (parameter name) in a smaller font above its corresponding value.

  3. Handling Empty or Generic Titles:

    • If a key or its corresponding value is an empty string, that particular entry is omitted from the displayed title. This ensures that the title remains concise and only shows relevant information.
    • If the input titleEntries map is empty but numTraces is greater than zero, it displays a generic title like “Multi-trace Graph (X traces)” to indicate a graph with multiple data series without specific shared parameters.
  4. Space Management and Truncation:

    • The title entries are arranged in a flexible, wrapping layout (display: flex; flex-wrap: wrap;) using CSS (graph-title-sk.scss). This allows the title to adapt to different screen widths.

    • To prevent overcrowding, especially when there are many parameters, the component implements a “show more” functionality. If the number of title entries exceeds a predefined limit (MAX_PARAMS, currently 8), it initially displays only the first MAX_PARAMS entries. A “Show Full Title” button (<md-text-button class="showMore">) is then provided, allowing the user to expand the view and see all title entries. Conversely, a “Show Short Title” mechanism is implied (a showShortTitles() method exists, though it is not currently wired to a button) to revert to the truncated view.
    • Individual values that are very long are visually truncated in the display, but the full value is available as a tooltip when the user hovers over the text. This is achieved by setting the title attribute of the div containing the value.

Design Decisions and Implementation Choices

  • Custom Element (ElementSk): The component is built as a custom element extending ElementSk. This aligns with the Skia infrastructure's approach to building reusable UI components and allows for easy integration into Skia applications.
  • Lit Library for Templating: The HTML structure is generated using the lit library's html template literal tag. This provides a declarative and efficient way to define the component's view and update it when data changes. The _render() method, inherited from ElementSk, is called to trigger re-rendering when the internal state (_titleEntries, numTraces, showShortTitle) changes.
  • CSS for Styling: Styling is handled through a dedicated SCSS file (graph-title-sk.scss). This separates presentation concerns from the component's logic. CSS variables (e.g., var(--primary)) are used for theming, allowing the component's appearance to be consistent with the overall application theme.
  • set() Method for Data Input: Instead of relying solely on HTML attributes for complex data like a map, a public set() method is provided. This is a common pattern for custom elements when dealing with non-string data or when updates need to trigger specific internal logic beyond simple attribute reflection.
  • Conditional Rendering for Title Brevity: The decision to truncate the number of displayed parameters by default (when exceeding MAX_PARAMS) and provide a “Show Full Title” option is a user experience choice. It prioritizes a clean initial view for complex graphs while still allowing users to access all details if needed.
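
A minimal usage sketch of the set() API described above; the element lookup cast and data values are illustrative:

// Sketch: provide title entries and a trace count to graph-title-sk.
const title = document.querySelector('graph-title-sk') as HTMLElement & {
  set: (entries: Map<string, string>, numTraces: number) => void;
};

title.set(
  new Map([
    ['benchmark', 'Speedometer2'],
    ['bot', 'linux-perf'],
    ['test', 'draw_a_circle'],
  ]),
  3 // number of traces shown in the graph
);

// An empty map with numTraces > 0 falls back to the generic
// "Multi-trace Graph (3 traces)" title instead.
title.set(new Map(), 3);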

Key Workflows

1. Initial Rendering with Data:

User/Application Code          GraphTitleSk Element
---------------------          --------------------
calls set(titleData, numTraces) -->
                                  stores titleData & numTraces
                                  calls _render()
                                  |
                                  V
                                  getTitleHtml() is invoked
                                  |
                                  V
                                  Iterates titleData:
                                    - Skips empty keys/values
                                    - If entries > MAX_PARAMS & showShortTitle is true:
                                      - Renders first MAX_PARAMS entries
                                      - Renders "Show Full Title" button
                                    - Else:
                                      - Renders all entries
                                  |
                                  V
                                  HTML template is updated with generated content
                                  Browser renders the title

2. Toggling Full/Short Title Display (when applicable):

User Interaction                  GraphTitleSk Element
----------------                  --------------------
Clicks "Show Full Title" button -->
                                  onClick handler (showFullTitle) executes
                                  |
                                  V
                                  this.showShortTitle = false
                                  calls _render()
                                  |
                                  V
                                  getTitleHtml() is invoked
                                  |
                                  V
                                  Now renders ALL title entries because showShortTitle is false
                                  |
                                  V
                                  HTML template is updated
                                  Browser re-renders the title to show all entries

A similar flow occurs if a mechanism to call showShortTitles() is implemented and triggered.

The demo page (graph-title-sk-demo.html and graph-title-sk-demo.ts) showcases various states of the graph-title-sk element, including:

  • A “good” example with several valid entries.
  • A “partial” example where some entries have empty keys or values.
  • A “generic” example where an empty map is provided, resulting in the “Multi-trace Graph” title.
  • An “empty” example (though the demo code doesn't explicitly create a state where numTraces is 0 and the map is also empty, which would result in no title being displayed).

Module: /modules/ingest-file-links-sk

Module: ingest-file-links-sk

Overview:

The ingest-file-links-sk module provides a custom HTML element, <ingest-file-links-sk>, designed to display a list of relevant links associated with a specific data point in the Perf performance monitoring system. These links are retrieved from the ingest.Format data structure, which can be generated by various ingestion processes. The primary purpose is to offer users quick access to related resources, such as Swarming task runs, Perfetto traces, or bot information, directly from the Perf UI.

Why:

Performance analysis often requires context beyond the raw data. Understanding the environment in which a test ran (e.g., specific bot configuration), or having direct access to detailed trace files, can be crucial for debugging performance regressions or understanding improvements. This module centralizes these relevant links in a consistent and easily accessible manner, improving the efficiency of performance investigations.

How:

The <ingest-file-links-sk> element fetches link data asynchronously. When its load() method is called with a CommitNumber (representing a specific point in time or version) and a traceID (identifying the specific data series), it makes a POST request to the /_/details/?results=false endpoint. This endpoint is expected to return a JSON object conforming to the ingest.Format structure.

The element then parses this JSON response. It specifically looks for the links field within the ingest.Format. If links exist and the version field in the ingest.Format is present (indicating a modern format), the element dynamically renders a list of these links.

Key design considerations and implementation details:

  • Asynchronous Loading: Link fetching is an asynchronous operation to avoid blocking the UI. A spinner-sk element is displayed while data is being loaded.
  • URL vs. Text: The module intelligently differentiates between actual URLs and plain text values within the links object. If a value is a valid URL, it's rendered as an <a> tag. Otherwise, it's displayed as “Key: Value”.
  • Markdown Link Handling: The element includes logic to parse and convert Markdown-style links (e.g., [Link Text](url)) into standard HTML anchor tags. This allows ingestion processes to provide links in a more human-readable format if desired.
  • Sorted Display: Links are displayed in alphabetical order by their keys for consistent presentation.
  • Error Handling: If the fetch request fails or the response is not in the expected format, an error message is displayed, and the spinner is hidden.
  • Legacy Format Compatibility: The element checks for the version field in the response. If it's missing, it assumes a legacy data format that doesn't support these links and gracefully avoids displaying anything.
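
A hedged sketch of the URL-versus-text and Markdown-link handling described above; the helper bodies and regular expression are illustrative, not the element's actual implementation:

// Sketch of link-value handling: detect plain URLs and unwrap
// Markdown-style links. Helper bodies and regex are illustrative.
function isUrl(value: string): boolean {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// Splits "[Link Text](https://example.com)" into its parts, if it matches.
function parseMarkdownLink(value: string): { text: string; href: string } | null {
  const m = value.match(/^\[([^\]]+)\]\(([^)]+)\)$/);
  return m ? { text: m[1], href: m[2] } : null;
}

// Rendering decision, mirroring the description above: URLs and Markdown
// links become <a> tags; anything else is shown as "Key: Value" text.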

Responsibilities and Key Components:

  • ingest-file-links-sk.ts: This is the core file defining the IngestFileLinksSk custom element.
    • It handles the fetching of link data from the backend API.
    • It manages the rendering of the link list based on the fetched data.
    • It includes the logic for differentiating between URLs and plain text, and for parsing Markdown links.
    • It manages the display of a loading spinner and error messages.
    • The load(cid: CommitNumber, traceid: string) method is the public API for triggering the data fetching and rendering process.
    • The displayLinks static method is responsible for generating the TemplateResult array for rendering the list items.
    • The isUrl and removeMarkdown helper functions provide utility for link processing.
  • ingest-file-links-sk.scss: This file contains the SASS styles for the custom element, defining its appearance, including list styling and spinner positioning.
  • ingest-file-links-sk-demo.html and ingest-file-links-sk-demo.ts: These files provide a demonstration page for the element. The demo page uses fetch-mock to simulate the backend API response, allowing developers to see the element in action and test its functionality in isolation.
  • ingest-file-links-sk_test.ts: This file contains unit tests for the IngestFileLinksSk element. It uses fetch-mock to simulate various API responses and asserts the element's behavior, such as correct link rendering, spinner state, and error handling.
  • ingest-file-links-sk_puppeteer_test.ts: This file contains Puppeteer-based end-to-end tests. These tests load the demo page in a headless browser and verify the element's visual rendering and basic functionality.

Key Workflow: Loading and Displaying Links

User Action/Page Load -> Calls ingest-file-links-sk.load(commit, traceID)
                             |
                             V
ingest-file-links-sk:  Show spinner-sk
                             |
                             V
                       Make POST request to /_/details/?results=false
                       (with commit and traceID in request body)
                             |
                             V
Backend API:          Processes request, retrieves links for the
                       given commit and trace
                             |
                             V
ingest-file-links-sk:  Receives JSON response (ingest.Format)
                             |
                             +----------------------+
                             |                      |
                             V                      V
                       Response OK?           Response Error?
                             |                      |
                             V                      V
                       Parse links            Display error message
                       Hide spinner             Hide spinner
                       Render link list
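
A minimal usage sketch of the load() API; the commit number and trace ID values are illustrative:

// Sketch: trigger link loading for a specific commit and trace.
const links = document.querySelector('ingest-file-links-sk') as HTMLElement & {
  load: (cid: number, traceid: string) => void;
};
links.load(64809, ',arch=x86,config=8888,test=draw_a_circle,units=ms,');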

Module: /modules/json

JSON Module Documentation

This module defines TypeScript interfaces and types that represent the structure of JSON data used throughout the Perf application. It essentially acts as a contract between the Go backend and the TypeScript frontend, ensuring data consistency and type safety.

Why:

The primary motivation for this module is to leverage TypeScript's strong typing capabilities. By defining these interfaces, we can catch potential data inconsistencies and errors at compile time rather than runtime. This is particularly crucial for a data-intensive application like Perf, where the frontend relies heavily on JSON responses from the backend.

Furthermore, these definitions are automatically generated from Go struct definitions. This ensures that the frontend and backend data models remain synchronized. Any changes to the Go structs will trigger an update to these TypeScript interfaces, reducing the likelihood of manual errors and inconsistencies.

How:

The index.ts file contains all the interface and type definitions. These are organized into a flat structure for simplicity, with some nested namespaces (e.g., pivot, progress, ingest) where logical grouping is beneficial.

A key design choice is the use of nominal typing for certain primitive types (e.g., CommitNumber, TimestampSeconds, Trace). This is achieved by creating type aliases that are branded with a unique string literal type. For example:

export type CommitNumber = number & {
  _commitNumberBrand: 'type alias for number';
};

export function CommitNumber(v: number): CommitNumber {
  return v as CommitNumber;
}

This prevents accidental assignment of a generic number to a CommitNumber variable, even though they are structurally identical at runtime. This adds an extra layer of type safety, ensuring that, for example, a timestamp is not inadvertently used where a commit number is expected. Helper functions (e.g., CommitNumber(v: number)) are provided for convenient type assertion.
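
A short illustration of the compile-time guarantee; the import path is an assumption, and TimestampSeconds is assumed to have a helper analogous to CommitNumber(v):

// Illustration: branded aliases reject plain numbers at compile time.
// Import path and the TimestampSeconds(v) helper are assumptions.
import { CommitNumber, TimestampSeconds } from '../json';

const ts: TimestampSeconds = TimestampSeconds(1678886400);
let commit: CommitNumber = CommitNumber(64809); // OK: explicit helper assertion.

// The following would fail to compile:
// commit = 64809;  // Type 'number' is not assignable to type 'CommitNumber'.
// commit = ts;     // Brands differ, even though both are numbers at runtime.

console.log(commit, ts);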

Key Components/Files/Submodules:

  • index.ts: This is the sole file in this module and contains all the TypeScript interface and type definitions. It serves as the single source of truth for JSON data structures used in the frontend.
    • Interfaces (e.g., Alert, DataFrame, FrameRequest, Regression): These define the shape of complex JSON objects. For instance, the Alert interface describes the structure of an alert configuration, including its query, owner, and various detection parameters. The DataFrame interface represents the core data structure for displaying traces, including the actual trace data (traceset), column headers (header), and associated parameter sets (paramset).
    • Type Aliases (e.g., ClusterAlgo, StepDetection, Status): These define specific allowed string values for certain properties, acting like enums. For example, ClusterAlgo can only be 'kmeans' or 'stepfit', ensuring that only valid clustering algorithms are specified.
    • Nominally Typed Aliases (e.g., CommitNumber, TimestampSeconds, Trace, ParamSet): As explained above, these provide stronger type checking for primitive types that have specific semantic meaning within the application. TraceSet, for example, is a map where keys are trace identifiers (strings) and values are Trace arrays (nominally typed number[]).
    • Namespaced Interfaces (e.g., pivot.Request, ingest.Format): Some interfaces are grouped under namespaces to organize related data structures. For example, pivot.Request defines the structure for requesting pivot table operations, including grouping criteria and aggregation operations. The ingest.Format interface defines the structure of data being ingested into Perf, including metadata like Git hash and the actual performance results.
    • Utility/Generic Types (e.g., ReadOnlyParamSet, AnomalyMap): These represent common data patterns. ReadOnlyParamSet is a map of parameter names to arrays of their possible string values, marked as read-only to reflect its typical usage. AnomalyMap is a nested map structure used to associate anomalies with specific commits and traces.

Workflow Example: Requesting and Displaying Trace Data

A common workflow involves the frontend requesting trace data from the backend and then displaying it.

  1. Frontend (Client) prepares a FrameRequest:

    Client Code --> Creates `FrameRequest` object:
                  {
                    begin: 1678886400, // Start timestamp
                    end: 1678972800,   // End timestamp
                    queries: ["config=gpu&name=my_test_trace"],
                    // ... other properties
                  }
    
  2. Frontend sends the FrameRequest to the Backend (Server).

  3. Backend processes the request and generates a FrameResponse:

    Server Logic --> Processes `FrameRequest`
                 --> Fetches data from database/cache
                 --> Constructs `FrameResponse` object:
                     {
                       dataframe: {
                         traceset: { ",config=gpu,name=my_test_trace,": [10.1, 10.5, 9.8, ...Trace] },
                         header: [ { offset: 12345, timestamp: 1678886400 }, ...ColumnHeader[] ],
                         paramset: { "config": ["gpu", "cpu"], "name": ["my_test_trace"] }
                       },
                       skps: [0, 5, 10], // Indices of significant points
                       // ... other properties like msg, display_mode, anomalymap
                     }
    
  4. Backend sends the FrameResponse (as JSON) back to the Frontend.

  5. Frontend receives the JSON and parses it, expecting it to conform to the FrameResponse interface:

    Client Code --> Receives JSON
                --> Parses JSON into a `FrameResponse` typed object
                --> Uses `frameResponse.dataframe.traceset` to render charts
                --> Uses `frameResponse.dataframe.header` to display commit information

This typed interaction ensures that if the backend, for example, renamed traceset to trace_data in its Go struct, the automatic generation would update the DataFrame interface. The TypeScript compiler would then flag an error in the frontend code trying to access frameResponse.dataframe.traceset, preventing a runtime error and guiding the developer to update the frontend code accordingly.
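
A hedged sketch of the typed client-side handling; the import path is an assumption and the response handling is simplified:

// Sketch: send a FrameRequest and treat the final reply as a FrameResponse.
// Import path and the simplified response handling are assumptions.
import { FrameRequest, FrameResponse } from '../json';

async function fetchFrame(req: FrameRequest): Promise<FrameResponse> {
  const resp = await fetch('/frame/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!resp.ok) {
    throw new Error(`frame request failed: ${resp.status}`);
  }
  const frameResponse = (await resp.json()) as FrameResponse;
  // Compile-time checks now apply: frameResponse.dataframe.traceset,
  // .header, and .paramset are typed per the generated interfaces.
  return frameResponse;
}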

Module: /modules/json-source-sk

The json-source-sk module provides a custom HTML element, <json-source-sk>, designed to display the raw JSON data associated with a specific data point in a trace. This is particularly useful in performance analysis and debugging scenarios where understanding the exact input data ingested by the system is crucial.

The core responsibility of this module is to fetch and present JSON data in a user-friendly dialog. It aims to simplify the process of inspecting the source data for a given commit and trace identifier.

The key component is the JSONSourceSk class, defined in json-source-sk.ts. This class extends ElementSk, a base class for custom elements in the Skia infrastructure.

How it Works:

  1. Initialization and Properties:

    • The element requires two primary properties to be set:
      • cid: The Commit ID (represented as CommitNumber), which identifies a specific version or point in time.
      • traceid: A string identifier for the specific trace being examined.
    • When these properties are set, the element renders itself. If traceid is not a valid key (checked by validKey from perf/modules/paramtools), the control buttons are hidden.
  2. User Interaction and Data Fetching:

    • The element displays two buttons: “View Json File” and “View Short Json File”.
    • Clicking either button triggers the _loadSource or _loadSourceSmall methods, respectively.
    • These methods internally call _loadSourceImpl. This implementation detail allows for sharing the core fetching logic while differentiating the request URL.
    • _loadSourceImpl constructs a CommitDetailsRequest object containing the cid and traceid.
    • It then makes a POST request to the /_/details/ endpoint.
      • If “View Short Json File” was clicked (isSmall is true), the URL includes ?results=false, indicating to the backend that a potentially truncated or summarized version of the JSON is requested.
      • A spinner-sk element is activated to provide visual feedback during the fetch operation.
    • The response from the server is parsed as JSON using jsonOrThrow. If the request is successful, the JSON data is formatted with indentation and stored in the _json private property.
    • The element is then re-rendered to display the fetched JSON.
    • If an error occurs during fetching or parsing, errorMessage (from perf/modules/errorMessage) is used to display an error notification to the user.
  3. Displaying the JSON:

    • The fetched JSON data is displayed within a <dialog> element (#json-dialog).
    • The jsonFile() method in the template is responsible for rendering the <pre> tag containing the formatted JSON string, but only if _json is not empty.
    • The dialog is shown using showModal(), providing a modal interface for viewing the JSON.
    • A close button (#closeIcon with a close-icon-sk) allows the user to dismiss the dialog. Closing the dialog also clears the _json property.
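
A hedged sketch of the request made by _loadSourceImpl; the request body field names and the formatting step are illustrative:

// Sketch of the source fetch: POST the commit id and trace id to /_/details/,
// optionally asking for the short form with ?results=false.
async function loadSource(cid: number, traceid: string, isSmall: boolean): Promise<string> {
  const url = isSmall ? '/_/details/?results=false' : '/_/details/';
  const resp = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ cid, traceid }),
  });
  if (!resp.ok) {
    throw new Error(`details request failed: ${resp.status}`);
  }
  const json = await resp.json();
  return JSON.stringify(json, null, '  '); // formatted for the <pre> in the dialog
}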

Design Rationale:

  • Dedicated Element: Creating a dedicated custom element encapsulates the functionality of fetching and displaying JSON, making it reusable across different parts of the application where such inspection is needed.
  • Asynchronous Fetching: The use of async/await and fetch allows for non-blocking data retrieval, ensuring the UI remains responsive while waiting for the server.
  • Error Handling: Incorporating error handling via jsonOrThrow and errorMessage provides a better user experience by informing users about issues during data retrieval.
  • Clear Visual Feedback: The spinner-sk element clearly indicates when data is being loaded.
  • Modal Dialog: Using a modal dialog (<dialog>) for displaying the JSON helps focus the user's attention on the data without cluttering the main interface.
  • Option for Short JSON: The “View Short Json File” option caters to scenarios where the full JSON might be excessively large, providing a way to quickly inspect a summary or a smaller subset of the data. This can improve performance and readability for very large JSON files.
  • Styling and Theming: The SCSS file (json-source-sk.scss) provides basic styling and leverages existing button styles (//elements-sk/modules/styles:buttons_sass_lib). It also includes considerations for dark mode by using CSS variables like --on-background and --background.

Workflow Example: Viewing JSON Source

User Sets Properties       Element Renders         User Clicks Button      Fetches Data             Displays JSON
--------------------       ---------------         ------------------      ------------             -------------
[json-source-sk          -> [Buttons visible]  ->  ["View Json File"] -> POST /_/details/  ->   <dialog>
  .cid = 123                                                                {cid, traceid}      <pre>{json}</pre>
  .traceid = ",foo=bar,"]                                                                  </dialog>
                                                                        (spinner active)
                                                                              |
                                                                              V
                                                                        Response Received
                                                                        (spinner inactive)

The demo page (json-source-sk-demo.html and json-source-sk-demo.ts) illustrates how to use the <json-source-sk> element. It sets up mock data using fetchMock to simulate the backend endpoint and programmatically clicks the button to demonstrate the JSON loading functionality.

The Puppeteer test (json-source-sk_puppeteer_test.ts) ensures the element renders correctly and performs basic visual regression testing.

Module: /modules/new-bug-dialog-sk

The new-bug-dialog-sk module provides a user interface element for filing new bugs related to performance anomalies. It aims to streamline the bug reporting process by pre-filling relevant information and integrating with the Buganizer issue tracker.

Core Functionality:

The primary responsibility of this module is to display a dialog that allows users to input details for a new bug. This dialog is populated with information derived from one or more selected Anomaly objects. The user can then review and modify this information before submitting the bug.

Key Design Decisions and Implementation Choices:

  • Pre-population of Bug Details: To reduce manual effort and ensure consistency, the dialog attempts to intelligently pre-fill fields like the bug title, labels, and components.
    • The bug title is generated based on the nature (regression/improvement), magnitude (percentage change), and affected revision range of the anomalies. This logic, found in getBugTitle(), mimics the behavior of the legacy Chromeperf UI to maintain familiarity for users.
    • Labels and components are aggregated from all selected anomalies. Unique labels are presented as checkboxes (defaulting to checked), and unique components are presented as radio buttons (with the first one selected by default). This is handled by getLabelCheckboxes() and getComponentRadios().
  • Dynamic UI Generation: The dialog's content, specifically the label checkboxes and component radio buttons, is dynamically generated based on the provided Anomaly data. This ensures that only relevant options are presented to the user. Lit-html's templating capabilities are used for this dynamic rendering.
  • User Contextualization: The dialog attempts to automatically CC the logged-in user on the new bug. This is achieved by fetching the user's login status via /alogin-sk.
  • Asynchronous Bug Filing: The actual bug filing process is asynchronous. When the user submits the form, a POST request is made to the /_/triage/file_bug endpoint.
    • A spinner (spinner-sk) is displayed during this operation to provide visual feedback.
    • Upon successful bug creation, the user is redirected to the newly created bug page in a new tab, and an anomaly-changed event is dispatched to notify other components (like explore-simple-sk or chart-tooltip-sk) that the anomalies have been updated with the new bug ID.
    • If an error occurs, an error message is displayed using error-toast-sk, and the dialog remains open, allowing the user to retry or correct information.
  • Standard HTML Dialog Element: The core dialog functionality leverages the native <dialog> HTML element, which provides built-in accessibility and modal behavior.
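
A hedged sketch of the submission step described in the workflow below; the request body field names and the response shape ({ bug_id }) are assumptions:

// Sketch: file a new bug and notify listeners. The request body field names
// and the response shape ({ bug_id: number }) are assumptions.
async function fileNewBug(
  host: HTMLElement,
  payload: {
    title: string;
    description: string;
    labels: string[];
    component: string;
    keys: string[]; // anomaly keys
    trace_names: string[];
  }
): Promise<number> {
  const resp = await fetch('/_/triage/file_bug', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!resp.ok) {
    throw new Error(`file_bug failed: ${resp.status}`);
  }
  const { bug_id } = (await resp.json()) as { bug_id: number };

  // Open the new bug and tell other components the anomalies changed.
  window.open(`https://issues.chromium.org/issues/${bug_id}`, '_blank');
  host.dispatchEvent(
    new CustomEvent('anomaly-changed', { detail: { bugId: bug_id }, bubbles: true })
  );
  return bug_id;
}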

Workflow: Filing a New Bug

  1. Initialization: An external component (e.g., a chart displaying anomalies) invokes the setAnomalies() method on new-bug-dialog-sk, passing the relevant Anomaly objects and associated trace names.
  2. Opening the Dialog: The external component calls the open() method.

    User Action (e.g., click "File Bug" button)
        |
        V
    External Component --[setAnomalies(anomalies, traceNames)]--> new-bug-dialog-sk
        |
        V
    External Component --[open()]--> new-bug-dialog-sk

  3. Dialog Population:

    • new-bug-dialog-sk fetches the current user's login status to pre-fill the CC field.
    • The _render() method is called, which uses the Lit-html template.
    • getBugTitle() generates a suggested title.
    • getLabelCheckboxes() and getComponentRadios() create the UI for selecting labels and components based on the input anomalies.
    • The dialog (<dialog id="new-bug-dialog">) is displayed modally.

    new-bug-dialog-sk.open()
        |
        V
    [Fetch Login Status] --> Updates `_user`
        |
        V
    _render()
        |--> getBugTitle() --> Populates Title Input
        |--> getLabelCheckboxes() --> Creates Label Checkboxes
        |--> getComponentRadios() --> Creates Component Radios
        |
        V
    Dialog is displayed to the user

  4. User Interaction: The user reviews and potentially modifies the pre-filled information (title, description, labels, component, assignee, CCs).
  5. Submission: The user clicks the “Submit” button. User clicks "Submit" | V Form Submit Event | V new-bug-dialog-sk.fileNewBug()
  6. Bug Filing Process: - The fileNewBug() method is invoked. - The spinner is activated, and form buttons are disabled. - Form data (title, description, selected labels, selected component, assignee, CCs, anomaly keys, trace names) is collected. - A POST request is sent to /_/triage/file_bug with the collected data. fileNewBug() | V [Activate Spinner, Disable Buttons] | V [Extract Form Data] | V fetch('/_/triage/file_bug', {POST, body: jsonData})
  7. Response Handling:
    • Success:
      • The server responds with a JSON object containing the bug_id.
      • The spinner is deactivated, and buttons are re-enabled.
      • The dialog is closed.
      • A new browser tab is opened to the URL of the created bug (e.g., https://issues.chromium.org/issues/BUG_ID).
      • The bug_id is updated in the local _anomalies array.
      • An anomaly-changed custom event is dispatched with the updated anomalies and bug ID.
    • Failure:
      • The server responds with an error.
      • The spinner is deactivated, and buttons are re-enabled.
      • An error message is displayed to the user via errorMessage(). The dialog remains open.

       fetch Response
         |
         +-- Success (HTTP 200, valid JSON with bug_id)
         |     |
         |     V
         |   [Deactivate Spinner, Enable Buttons]
         |     |
         |     V
         |   closeDialog()
         |     |
         |     V
         |   window.open(bugUrl, '_blank')
         |     |
         |     V
         |   Update local _anomalies with bug_id
         |     |
         |     V
         |   dispatchEvent('anomaly-changed', {anomalies, bugId})
         |
         +-- Failure (HTTP error or invalid JSON)
               |
               V
             [Deactivate Spinner, Enable Buttons]
               |
               V
             errorMessage(errorMsg) --> Displays error toast
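
To make the integration concrete, here is a hypothetical host-component snippet. The element name, setAnomalies(), open(), and the anomaly-changed event are taken from the description above; the import path and the placeholder data are assumptions.

// Illustrative only: the import path and sample data are assumptions.
import './modules/new-bug-dialog-sk';

const dialog = document.querySelector('new-bug-dialog-sk')!;

// Anomalies and trace names would normally come from the user's chart selection.
const anomalies: any[] = [/* Anomaly objects */];
const traceNames: string[] = [',arch=x86,config=8888,test=draw_a_circle,units=ms,'];

(dialog as any).setAnomalies(anomalies, traceNames);
(dialog as any).open();

// React once the bug has been filed and the anomalies carry the new bug id.
dialog.addEventListener('anomaly-changed', (e: Event) => {
  console.log((e as CustomEvent).detail);
});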

Key Files:

  • new-bug-dialog-sk.ts: This is the core file containing the NewBugDialogSk class definition, which extends ElementSk. It includes the Lit-html template for the dialog, the logic for populating form fields based on Anomaly data, handling form submission, interacting with the backend API to file the bug, and managing the dialog's visibility and state.
  • new-bug-dialog-sk.scss: This file defines the styles for the dialog, ensuring it integrates visually with the rest of the application and themes. It styles the dialog container, input fields, buttons, and the close icon.
  • new-bug-dialog-sk-demo.ts and new-bug-dialog-sk-demo.html: These files provide a demonstration page for the new-bug-dialog-sk element. The .ts file sets up mock data (Anomaly objects) and mock fetch responses to simulate the bug filing process, allowing for isolated testing and development of the dialog. The .html file includes the new-bug-dialog-sk element and a button to trigger its opening.
  • index.ts: This file simply imports new-bug-dialog-sk.ts to ensure the custom element is defined and available for use.

The module relies on several other elements and libraries:

  • alogin-sk: To determine the logged-in user for CC'ing.
  • close-icon-sk: For the dialog's close button.
  • spinner-sk: To indicate activity during bug filing.
  • error-toast-sk (via errorMessage utility): To display error messages.
  • lit: For templating and component rendering.
  • jsonOrThrow: A utility for parsing JSON responses and throwing errors on failure.

Module: /modules/paramtools

The paramtools module provides a TypeScript implementation of utility functions for manipulating parameter sets and structured keys. It mirrors the functionality found in the Go module /infra/go/paramtools, which is the primary source of truth for these operations. The decision to replicate this logic in TypeScript is to enable client-side applications to perform these common tasks without needing to make server requests for simple transformations or validations. This approach improves performance and reduces server load for UI-driven interactions.

The core responsibility of this module is to provide robust and consistent ways to:

  1. Create and parse structured keys: Structured keys are a fundamental concept for identifying specific data points (e.g., traces in performance data).
  2. Manipulate ParamSet objects: ParamSets are used to represent collections of possible parameter values, often used for filtering or querying data.

Key functionalities and their “why” and “how”:

  • makeKey(params: Params | { [key: string]: string }): string:

    • Why: To create a canonical string representation of a set of key-value parameters. This canonical form is essential for consistent identification and comparison of data points. The keys within the structured key are sorted alphabetically to ensure that the same set of parameters always produces the same key, regardless of the order in which they were provided.
    • How: It takes a Params object (a dictionary of string key-value pairs). It first checks if the params object is empty, throwing an error if it is, as a key must represent at least one parameter. Then, it sorts the keys of the params object alphabetically. Finally, it constructs the string by joining each key-value pair with = and then joining these pairs with ,, prefixing and suffixing the entire string with a comma.

        Input: { "b": "2", "a": "1", "c": "3" }
          |
          V
        Sort keys: [ "a", "b", "c" ]
          |
          V
        Format pairs: "a=1", "b=2", "c=3"
          |
          V
        Join and wrap: ",a=1,b=2,c=3,"
  • fromKey(structuredKey: string, attribute?: string): Params:

    • Why: To convert a structured key string back into a Params object, making it easier to work with the individual parameters programmatically. It also handles the removal of special functions that might be embedded in the key (e.g., norm(...) for normalization).
    • How: It first calls removeSpecialFunctions to strip any function wrappers from the key. Then, it splits the key string by the comma delimiter. Each resulting segment (if not empty) is then split by the equals sign to separate the key and value. These key-value pairs are collected into a new Params object. An optional attribute parameter allows excluding a specific key from the resulting Params object, which can be useful in scenarios where certain attributes are metadata and not part of the core parameters.
  • removeSpecialFunctions(key: string): string:

    • Why: Structured keys can sometimes include functional wrappers (e.g., norm(...), avg(...)) or special markers (e.g., special_zero). This function is designed to strip these away, returning the “raw” underlying key. This is important when you need to work with the base parameters without the context of the applied function or special condition.
    • How: It uses regular expressions to detect if the key matches a pattern like function_name(,param1=value1,...). If a match is found, it extracts the content within the parentheses. The extracted string (or the original key if no function was found) is then processed by extractNonKeyValuePairsInKey.
    • extractNonKeyValuePairsInKey(key: string): string: This helper function further refines the key string. It splits the string by commas and filters out any segments that do not represent a valid key=value pair. This helps to remove extraneous parts like special_zero that might be comma-separated but aren't true parameters. The valid pairs are then re-joined and wrapped with commas.
  • validKey(key: string): boolean:

    • Why: To provide a simple client-side check to determine if a string is a “valid” basic structured key, meaning it's not a key representing a calculation (like avg(...)) or other special trace types. This is a lightweight validation, as the server performs more comprehensive checks.
    • How: It checks if the key string starts and ends with a comma. This is a convention for simple, non-functional structured keys.
  • addParamsToParamSet(ps: ParamSet, p: Params): void:

    • Why: To add a new set of parameters (from a Params object) to an existing ParamSet. ParamSets store unique values for each parameter key. This function ensures that when new parameters are added, only new values are appended to the existing lists for each key, maintaining uniqueness.
    • How: It iterates through the key-value pairs of the input Params object (p). For each key, it retrieves the corresponding array of values from the ParamSet (ps). If the key doesn't exist in ps, a new array is created. If the value from p is not already present in the array, it's added.
  • paramsToParamSet(p: Params): ParamSet:

    • Why: To convert a single Params object (representing one specific combination of parameters) into a ParamSet. In a ParamSet, each key maps to an array of values, even if there's only one value.
    • How: It creates a new, empty ParamSet. Then, for each key-value pair in the input Params object, it creates a new entry in the ParamSet where the key maps to an array containing just that single value.
  • addParamSet(p: ParamSet, ps: ParamSet | ReadOnlyParamSet): void:

    • Why: To merge one ParamSet (or ReadOnlyParamSet) into another. This is useful for combining sets of available parameter options, for example, when aggregating data from multiple sources.
    • How: It iterates through the keys and their associated value arrays in the source ParamSet (ps). If a key from ps is not present in the target ParamSet (p), the entire key and its value array (cloned) are added to p. If the key already exists in p, it iterates through the values in the source array and adds any values that are not already present in the target array for that key.
  • toReadOnlyParamSet(ps: ParamSet): ReadOnlyParamSet:

    • Why: To provide a type assertion that casts a mutable ParamSet to an immutable ReadOnlyParamSet. This is useful for signaling that a ParamSet should not be modified further, typically when passing it to components or functions that expect read-only data.
    • How: It performs a type assertion. No actual data transformation occurs; it's a compile-time type hint.
  • queryFromKey(key: string): string:

    • Why: To convert a structured key into a URL query string format (e.g., a=1&b=2&c=3). This is specifically useful for frontend applications, like explore-simple-sk, where state or filters are often represented in the URL.
    • How: It first uses fromKey to parse the structured key into a Params object. Then, it leverages the URLSearchParams browser API to construct a query string from these parameters. This ensures proper URL encoding of keys and values.

        Input Key: ",a=1,b=2,c=3,"
          |
          V
        fromKey -> Params: { "a": "1", "b": "2", "c": "3" }
          |
          V
        URLSearchParams -> Query String: "a=1&b=2&c=3"

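As a rough illustration of the key helpers described above, the following sketch restates makeKey() and fromKey() in simplified form; the real implementations live in //perf/modules/paramtools and additionally handle special functions and validation.

// Simplified sketch of the behavior described above, not the production code.
type Params = { [key: string]: string };

function makeKey(params: Params): string {
  const keys = Object.keys(params);
  if (keys.length === 0) {
    throw new Error('Params must have at least one entry.');
  }
  keys.sort(); // canonical ordering regardless of input order
  return `,${keys.map((k) => `${k}=${params[k]}`).join(',')},`;
}

function fromKey(structuredKey: string): Params {
  const params: Params = {};
  structuredKey.split(',').forEach((pair) => {
    if (!pair) return; // skip the empty leading/trailing segments
    const [key, value] = pair.split('=');
    params[key] = value;
  });
  return params;
}

makeKey({ b: '2', a: '1', c: '3' }); // ',a=1,b=2,c=3,'
fromKey(',a=1,b=2,c=3,');            // { a: '1', b: '2', c: '3' }
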
The design choice to have these functions operate with less stringent validation than their server-side Go counterparts is deliberate. The server remains the ultimate authority on data validity. These client-side functions prioritize ease of use and performance for UI interactions, assuming that the data they operate on has either originated from or will eventually be validated by the server.

The index_test.ts file provides comprehensive unit tests for these functions, ensuring their correctness and robustness across various scenarios, including handling empty inputs, duplicate values, and special key formats. This focus on testing is crucial for maintaining the reliability of these foundational utility functions.

Module: /modules/perf-scaffold-sk

The perf-scaffold-sk module provides a consistent layout and navigation structure for all pages within the Perf application. It acts as a wrapper, ensuring that common elements like the title bar, navigation sidebar, and error notifications are present and behave uniformly across different sections of Perf.

Core Responsibilities:

  • Layout Management: Establishes the primary visual structure, dividing the page into a header, a collapsible sidebar for navigation, and a main content area.
  • Navigation: Provides a standardized set of navigation links in the sidebar, allowing users to easily access different Perf features (e.g., New Query, Favorites, Alerts).
  • Global Elements: Hosts globally relevant components like the login status (alogin-sk), theme chooser (theme-chooser-sk), and error/toast notifications (error-toast-sk).
  • Dynamic Content Injection: Handles the placement of page-specific content into the main content area and allows for specific content (like help text) to be injected into the sidebar.

Key Components and Design Decisions:

  • perf-scaffold-sk.ts: This is the heart of the module, defining the PerfScaffoldSk custom element.

    • Why: Encapsulating the scaffold logic within a custom element promotes reusability and modularity. It allows any Perf page to adopt the standard layout simply by including this element.

    • How: It uses Lit for templating and rendering the structure (<app-sk>, header, aside#sidebar, main, footer).

    • Content Redistribution: A crucial design choice is how it handles child elements. Since it doesn't use Shadow DOM for the main content area (to allow global styles to apply easily to the page content), it programmatically moves children of <perf-scaffold-sk> into the <main> section.

    • Process:

      1. When connectedCallback is invoked, existing children of <perf-scaffold-sk> are temporarily moved out.
      2. The scaffold's own template (header, sidebar, etc.) is rendered.
      3. The temporarily moved children are then appended to the newly rendered <main> element.
      4. A MutationObserver is set up to watch for any new children added to <perf-scaffold-sk> and similarly move them to <main> (a sketch of this redistribution pattern appears at the end of this module's section).
    • Sidebar Content: An exception is made for elements with the specific ID SIDEBAR_HELP_ID. These are moved into the #help div within the sidebar. This allows pages to provide context-specific help information directly within the scaffold.

      <perf-scaffold-sk>
          <!-- This will go into <main> -->
          <div>Page specific content</div>
      
          <!-- This will go into <aside>#help -->
          <div id="sidebar_help">Contextual help</div>
      </perf-scaffold-sk>
      
    • Configuration via window.perf: The scaffold reads various configuration options from the global window.perf object. This allows instances of Perf to customize links (help, feedback, chat), behavior (e.g., show_triage_link), and display information (e.g., instance URL, build tag). This makes the scaffold adaptable to different Perf deployments.

    • For example, the _helpUrl and _reportBugUrl are initialized with defaults but can be overridden by window.perf.help_url_override and window.perf.feedback_url respectively.

    • The visibility of the “Triage” link is controlled by window.perf.show_triage_link.

    • Build Information: It displays the current application build tag, fetching it via getBuildTag() from //perf/modules/window:window_ts_lib and linking it to the corresponding commit in the buildbot git repository.

    • Instance Title: It can display the name of the Perf instance, extracted from window.perf.instance_url.

  • perf-scaffold-sk.scss: Defines the styles for the scaffold.

    • Why: Separates styling concerns from the element's logic.
    • How: It uses SASS and imports common themes from //perf/modules/themes:themes_sass_lib. It defines the layout, including the sidebar width and the main content area's width (using calc(99vw - var(--sidebar-width)) to avoid horizontal scrollbars caused by 100vw including the scrollbar width). It also styles the navigation links and other elements within the scaffold.
  • perf-scaffold-sk-demo.html & perf-scaffold-sk-demo.ts: Provide a demonstration page for the scaffold.

    • Why: Allows developers to see the scaffold in action and test its appearance and behavior in isolation.
    • How: perf-scaffold-sk-demo.ts initializes a mock window.perf object with various settings and then injects an instance of <perf-scaffold-sk> with some placeholder content (including a div with id="sidebar_help") into the perf-scaffold-sk-demo.html page.

Workflow: Initializing and Rendering a Page with the Scaffold

  1. A Perf page (e.g., the “New Query” page) includes <perf-scaffold-sk> as its top-level layout element.

       <!-- new_query_page.html -->
       <body>
         <perf-scaffold-sk>
           <!-- Content specific to the New Query page -->
           <query-composer-sk></query-composer-sk>
           <div id="sidebar_help">
             <p>Tips for creating new queries...</p>
           </div>
         </perf-scaffold-sk>
       </body>
  2. The PerfScaffoldSk element's connectedCallback fires.
  3. perf-scaffold-sk.ts:
    • Temporarily moves <query-composer-sk> and <div id="sidebar_help">...</div> out of perf-scaffold-sk.
    • Renders its own internal template (header with title, login, theme chooser; sidebar with nav links; empty main area acting as a placeholder for the page content; footer with error toast).
    • Appends <query-composer-sk> to the newly rendered <main> element, and moves <div id="sidebar_help"> into the <div id="help"> element within the sidebar.
    • A MutationObserver starts listening for any further children added directly to <perf-scaffold-sk>.

The final rendered structure (simplified) would look something like:

perf-scaffold-sk
  └── app-sk
      ├── header
      │   ├── h1.name (Instance Title)
      │   ├── div.spacer
      │   ├── alogin-sk
      │   └── theme-chooser-sk
      ├── aside#sidebar
      │   ├── div#links
      │   │   ├── a (New Query)
      │   │   ├── a (Favorites)
      │   │   └── ... (other nav links)
      │   ├── div#help
      │   │   └── div#sidebar_help (Content from original page)
      │   │       └── <p>Tips for creating new queries...</p>
      │   └── div#chat
      ├── main
      │   └── query-composer-sk (Content from original page)
      └── footer
          └── error-toast-sk
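
A minimal sketch of the content-redistribution pattern described under “Content Redistribution” above. This is not the actual PerfScaffoldSk code; the function names and parameters here are illustrative.

// Children added to the scaffold element are moved into <main>, except an
// element with the well-known sidebar-help id, which goes into the help area.
const SIDEBAR_HELP_ID = 'sidebar_help';

function redistribute(scaffold: HTMLElement, templateRoot: HTMLElement,
                      main: HTMLElement, help: HTMLElement): void {
  Array.from(scaffold.children).forEach((child) => {
    if (child === templateRoot) return; // don't move the scaffold's own markup
    if (child.id === SIDEBAR_HELP_ID) {
      help.appendChild(child);
    } else {
      main.appendChild(child);
    }
  });
}

// Keep watching for late-arriving children and move them the same way.
function observeScaffold(scaffold: HTMLElement, templateRoot: HTMLElement,
                         main: HTMLElement, help: HTMLElement): void {
  const observer = new MutationObserver(() =>
    redistribute(scaffold, templateRoot, main, help));
  observer.observe(scaffold, { childList: true });
}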

Module: /modules/picker-field-sk

The picker-field-sk module provides a custom HTML element that serves as a stylized text input field with an associated dropdown menu for selecting from a predefined list of options. This component is designed to offer a user-friendly way to pick a single value from potentially many choices, enhancing the user experience in forms or selection-heavy interfaces.

Core Functionality and Design:

The primary goal of picker-field-sk is to present a familiar text input that, upon interaction (focus or click), reveals a filterable list of valid options. This addresses the need for a compact and efficient way to select an item, especially when the number of options is large.

The implementation leverages the Vaadin ComboBox component (@vaadin/combo-box) for its underlying dropdown and filtering capabilities. This choice was made to utilize a well-tested and feature-rich component, avoiding the need to reimplement complex dropdown logic, keyboard navigation, and accessibility features. picker-field-sk then wraps this Vaadin component, applying custom styling and providing a simplified API tailored to its specific use case.

Key Responsibilities and Components:

  • picker-field-sk.ts: This is the heart of the module, defining the PickerFieldSk custom element which extends ElementSk.

    • Properties:
    • label: A string that serves as both the visual label above the input field and the placeholder text within it when empty. This provides context to the user about the expected input.
    • options: An array of strings representing the valid choices the user can select from. The component dynamically adjusts the width of the dropdown overlay to accommodate the longest option, ensuring readability.
    • helperText: An optional string displayed below the input field, typically used for providing additional guidance or information to the user.
    • Events:
    • value-changed: This custom event is dispatched whenever the selected value in the combo box changes. This includes selecting an item from the dropdown, typing a value that matches an option (due to autoselect), or clearing the input. The new value is available in event.detail.value. This event is crucial for parent components to react to user selections.
    • Methods:
    • focus(): Programmatically sets focus to the input field.
    • openOverlay(): Programmatically opens the dropdown list of options. This is useful for guiding the user or for integrating with other UI elements.
    • disable(): Makes the input field read-only, preventing user interaction.
    • enable(): Removes the read-only state, allowing user interaction.
    • clear(): Clears the current value in the input field.
    • setValue(val: string): Programmatically sets the value of the input field.
    • getValue(): Retrieves the current value of the input field.
    • Rendering: Uses lit-html for templating. The template renders a <vaadin-combo-box> element and binds its properties and events to the PickerFieldSk element's state.
    • Overlay Width Calculation: The calculateOverlayWidth() private method dynamically adjusts the --vaadin-combo-box-overlay-width CSS custom property. It iterates through the options to find the longest string and sets the overlay width to be slightly larger than this string, ensuring all options are fully visible without truncation. This is a key usability enhancement (a sketch of the idea follows this list).

        User provides options --> PickerFieldSk.options setter
          |
          V
        calculateOverlayWidth()
          |
          V
        Find max option length
          |
          V
        Set --vaadin-combo-box-overlay-width CSS property
  • picker-field-sk.scss: Contains the SASS styles for the component.

    • It primarily targets the underlying vaadin-combo-box and its shadow parts (e.g., ::part(label), ::part(input-field), ::part(items)) to customize its appearance to match the application's theme (including dark mode support).
    • CSS custom properties like --vaadin-field-default-width, --vaadin-combo-box-overlay-width, and --lumo-text-field-size are used to control the dimensions and sizing of the Vaadin component.
    • Dark mode styles are applied by targeting .darkmode picker-field-sk, adjusting colors for labels, helper text, and input fields to ensure proper contrast and visual integration.
  • index.ts: A simple entry point that imports and thereby registers the picker-field-sk custom element, making it available for use in HTML.

  • picker-field-sk-demo.html & picker-field-sk-demo.ts: These files create a demonstration page for the picker-field-sk component.

    • picker-field-sk-demo.html includes instances of the picker-field-sk element and buttons to trigger its various functionalities (focus, fill, open overlay, disable/enable).
    • picker-field-sk-demo.ts contains JavaScript to initialize the demo elements with sample data (a large list of “speedometer” options to showcase performance with many items) and to wire up the buttons to the corresponding methods of the PickerFieldSk instances. This allows developers to visually inspect and interact with the component.
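
A small sketch of the overlay-width idea referenced in the calculateOverlayWidth() bullet above. The pixels-per-character factor and padding are arbitrary assumptions, not the component's real measurements.

// Size the dropdown overlay to fit the longest option string.
function calculateOverlayWidth(host: HTMLElement, options: string[]): void {
  const longest = options.reduce((max, o) => Math.max(max, o.length), 0);
  const approxPxPerChar = 8;                      // assumed average glyph width
  const widthPx = longest * approxPxPerChar + 32; // extra room for padding/scrollbar
  host.style.setProperty('--vaadin-combo-box-overlay-width', `${widthPx}px`);
}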

Workflow Example: User Selects an Option

  1. Initialization: A parent component instantiates <picker-field-sk> and sets its label and options properties.

       <picker-field-sk .label="Fruit" .options=${['Apple', 'Banana', 'Cherry']}></picker-field-sk>

  2. User Interaction: The user clicks on or focuses the picker-field-sk input.

       User clicks/focuses input --> vaadin-combo-box internally handles focus/click
         |
         V
       vaadin-combo-box displays dropdown with options
  3. Filtering (Optional): The user types into the input field. The vaadin-combo-box filters the displayed options based on the typed text.
  4. Selection: The user clicks an option from the dropdown or presses Enter when an option is highlighted.

       User selects "Banana" --> vaadin-combo-box updates its internal value
         |
         V
       vaadin-combo-box emits 'value-changed' event

  5. Event Propagation:
    • The vaadin-combo-box within picker-field-sk emits its native value-changed event.
    • The onValueChanged method in PickerFieldSk catches this event.
    • PickerFieldSk then dispatches its own value-changed custom event, with the selected value in event.detail.value.

       picker-field-sk.onValueChanged(vaadinEvent)
         |
         V
       Dispatch new CustomEvent('value-changed', { detail: { value: vaadinEvent.detail.value }})

  6. Parent Component Reaction: The parent component, listening for the value-changed event on the <picker-field-sk> element, receives the event and can act upon the selected value.

       Parent component listens for 'value-changed' --> Accesses event.detail.value
         |
         V
       Update application state
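
A hypothetical parent-component usage tying the workflow above together. The properties, methods, and value-changed event come from this module's API; the import path is an assumption.

import './modules/picker-field-sk';

const picker = document.createElement('picker-field-sk') as any;
picker.label = 'Fruit';
picker.options = ['Apple', 'Banana', 'Cherry'];
picker.helperText = 'Pick exactly one fruit.';
document.body.appendChild(picker);

// React to the user's selection.
picker.addEventListener('value-changed', (e: Event) => {
  const value = (e as CustomEvent<{ value: string }>).detail.value;
  console.log('Selected:', value);
});

// The element can also be driven programmatically:
picker.setValue('Banana');
picker.openOverlay();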

This layered approach, building upon the Vaadin ComboBox, provides a robust and themeable selection component while abstracting away the complexities of the underlying library for the consumers of picker-field-sk.

Module: /modules/pinpoint-try-job-dialog-sk

Pinpoint Try Job Dialog (pinpoint-try-job-dialog-sk)

The pinpoint-try-job-dialog-sk module provides a user interface element for initiating Pinpoint A/B try jobs.

Purpose:

The primary reason for this module's existence within the Perf application is to allow users to request additional trace data for specific benchmark runs. While Pinpoint itself supports a wider range of try job use cases, this dialog is specifically tailored for this trace generation scenario. It's important to note that this component is considered a legacy feature, and future development should favor the newer Pinpoint frontend.

How it Works:

The dialog is designed to gather the necessary parameters from the user to construct and submit a Pinpoint A/B try job request. This process involves:

  1. Initialization: The dialog can be pre-populated with initial values such as the test path, base commit, and end commit. This often happens when a user interacts with a chart tooltip and wants to investigate a specific data point further.
  2. User Input: The user can modify the pre-filled values or enter new ones. Key inputs include:
    • Base Commit: The starting commit hash for the A/B comparison.
    • Experiment Commit: The ending commit hash for the A/B comparison.
    • Tracing Arguments: A string specifying the categories and options for the trace generation. A default value is provided, and a link to Chromium source documentation offers more details on available options.
  3. Authentication: The dialog uses alogin-sk to determine the logged-in user. The user's email is included in the try job request.
  4. Submission: Upon submission, the dialog constructs a CreateLegacyTryRequest object. This object encapsulates all the necessary information for the Pinpoint backend.
    • The testPath (e.g., master/benchmark_name/story_name) is parsed to extract the configuration (e.g., benchmark_name) and the benchmark (e.g., story_name).
    • The story is typically the last segment of the testPath.
    • The extra_test_args field is formatted to include the user-provided tracing arguments.
  5. API Interaction: The dialog sends a POST request to the /_/try/ endpoint with the JSON payload.
  6. Response Handling:
    • Success: If the request is successful, the Pinpoint backend responds with a JSON object containing the jobUrl for the newly created Pinpoint job. This URL is then displayed to the user, allowing them to navigate to the Pinpoint UI to monitor the job's progress.
    • Error: If an error occurs during the request or processing, an error message is displayed to the user.
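
A rough sketch of the submission step only. The /_/try/ endpoint and the jobUrl response field come from the description above, but the payload keys below are placeholders rather than the real CreateLegacyTryRequest wire format, which is built by postTryJob() in pinpoint-try-job-dialog-sk.ts.

async function sendTryJob(baseCommit: string, expCommit: string,
                          traceArgs: string, testPath: string, user: string): Promise<string> {
  const payload = {
    // Placeholder keys for illustration only.
    base_commit: baseCommit,
    end_commit: expCommit,
    extra_test_args: traceArgs,
    test_path: testPath,
    user: user,
  };
  const resp = await fetch('/_/try/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!resp.ok) {
    throw new Error(`Try job request failed: ${resp.statusText}`);
  }
  const { jobUrl } = await resp.json(); // jobUrl links to the created Pinpoint job
  return jobUrl;
}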

Workflow:

User Interaction (e.g., click on chart tooltip)
    |
    V
Dialog Pre-populated with context (testPath, commits)
    |
    V
pinpoint-try-job-dialog-sk.open()
    |
    V
User reviews/modifies input fields (Base Commit, Exp. Commit, Trace Args)
    |
    V
User clicks "Send to Pinpoint"
    |
    V
[pinpoint-try-job-dialog-sk]
  - Gathers input values
  - Retrieves logged-in user via alogin-sk
  - Constructs `CreateLegacyTryRequest` JSON
  - Sends POST request to /_/try/
    |
    V
[Backend Pinpoint Service]
  - Processes the request
  - Creates A/B try job
  - Returns jobUrl (success) or error
    |
    V
[pinpoint-try-job-dialog-sk]
  - Displays spinner during request
  - On Success:
    - Displays link to the created Pinpoint job (jobUrl)
    - Hides spinner
  - On Error:
    - Displays error message
    - Hides spinner

Key Components/Files:

  • pinpoint-try-job-dialog-sk.ts: This is the core TypeScript file that defines the custom element's logic.
    • PinpointTryJobDialogSk class: Extends ElementSk and manages the dialog's state, user input, and interaction with the Pinpoint API.
    • template: Defines the HTML structure of the dialog using lit-html. This includes input fields for commits and tracing arguments, a submit button, a spinner for loading states, and a link to the created Pinpoint job.
    • connectedCallback(): Initializes the dialog, sets up event listeners (e.g., for form submission, closing the dialog on outside click), and fetches the logged-in user's information.
    • setTryJobInputParams(params: TryJobPreloadParams): Allows external components to pre-fill the dialog's input fields. This is crucial for integrating the dialog with other parts of the Perf UI, like chart tooltips.
    • open(): Displays the modal dialog.
    • closeDialog(): Closes the modal dialog.
    • postTryJob(): This is the central method for handling the job submission. It reads values from the input fields, constructs the CreateLegacyTryRequest payload, and makes the fetch call to the Pinpoint API. It also handles the UI updates based on the API response (showing the job URL or an error message).
    • TryJobPreloadParams interface: Defines the structure for the parameters used to pre-populate the dialog.
  • pinpoint-try-job-dialog-sk.scss: Contains the SASS/CSS styles for the dialog, ensuring it aligns with the application's visual theme. It styles the input fields, buttons, and the overall layout of the dialog.
  • index.ts: A simple entry point that imports and registers the pinpoint-try-job-dialog-sk custom element.
  • BUILD.bazel: Defines the build rules for the module, specifying its dependencies (e.g., elements-sk components like select-sk, spinner-sk, alogin-sk, and Material Web components) and how it should be compiled.

Design Decisions:

  • Based on bisect-dialog-sk: The dialog's structure and initial functionality were adapted from an existing bisect dialog. This likely accelerated development by reusing common patterns for dialog interactions and API calls.
  • Legacy Component: The explicit note to avoid building on top of this dialog indicates a strategic decision to migrate towards a newer Pinpoint frontend. This suggests that this component is maintained for existing functionality but is not the target for future enhancements related to Pinpoint interactions.
  • Specific Use Case: The dialog is narrowly focused on requesting additional traces. This simplifies the UI and the request payload, making it easier for users to achieve this specific task.
  • Client-Side Request Construction: The CreateLegacyTryRequest object is fully constructed on the client-side before being sent to the backend. This gives the frontend more control over the request parameters.
  • Standard HTML Dialog: The use of the <dialog> HTML element provides built-in modal behavior, simplifying the implementation of showing and hiding the dialog.
  • Error Handling: The dialog includes basic error handling by displaying messages returned from the API, improving the user experience when things go wrong.
  • Spinner for Feedback: The spinner-sk component provides visual feedback to the user while the API request is in progress.

This component serves as a bridge for users of the Perf application to leverage Pinpoint's capabilities for generating detailed trace information, even as the broader Pinpoint tooling evolves.

Module: /modules/pivot-query-sk

The pivot-query-sk module provides a custom HTML element for users to configure and interact with pivot table requests. Pivot tables are a powerful data summarization tool, and this element allows users to define how data should be grouped, what aggregate operations should be performed, and what summary statistics should be displayed.

The core of the module is the PivotQuerySk class, which extends ElementSk. This class manages the state of the pivot request and renders the UI for user interaction. It leverages other custom elements like multi-select-sk and select-sk to provide intuitive input controls.

Key Design Choices and Implementation Details:

  • Event-Driven Updates: The element emits a custom event, pivot-changed, whenever the user modifies any part of the pivot request. This allows consuming applications to react to changes in real-time. The event detail (PivotQueryChangedEventDetail) contains the updated pivot.Request object or null if the current configuration is invalid. This decouples the UI component from the application logic that processes the pivot request.
  • Data Binding and Rendering: The PivotQuerySk element uses Lit's html templating for rendering. It maintains internal state for the _pivotRequest (the current pivot configuration) and _paramset (the available options for grouping). When these properties are set or updated, the _render() method is called to re-render the component, ensuring the UI reflects the current state.
  • Handling Null Pivot Requests: The createDefaultPivotRequestIfNull() method ensures that if _pivotRequest is initially null, it's initialized with a default valid structure before any user interaction attempts to modify it. This prevents errors and provides a sensible starting point.
  • Dynamic Option Generation: The options for “group by” and “summary” are dynamically generated based on the provided _paramset and the existing _pivotRequest. The allGroupByOptions() method is particularly noteworthy as it ensures that even if the _paramset changes, any currently selected group_by keys in the _pivotRequest are still displayed as options. This prevents accidental data loss during _paramset updates. It achieves this by concatenating keys from both sources, sorting, and then filtering out duplicates.
  • Input Validation: The pivotRequest getter includes a call to validatePivotRequest (from pivotutil). This ensures that the component only returns a valid pivot.Request object. If the current configuration is invalid, it returns null. This promotes data integrity.

Responsibilities and Key Components:

  • pivot-query-sk.ts: This is the main file defining the PivotQuerySk custom element.

    • PivotQuerySk class:
    • Manages the pivot.Request object, which defines the grouping, operation, and summary statistics for a pivot table.
    • Takes a ParamSet as input, which provides the available keys for the “group by” selection. This ParamSet likely originates from the dataset being analyzed.
    • Renders UI controls (multi-selects and a select dropdown) for users to specify:
      • Group By Keys: Which parameters to use for grouping data rows (e.g., ‘config’, ‘os’). This uses multi-select-sk.
      • Operation: The primary aggregate function to apply (e.g., ‘avg’, ‘sum’, ‘count’). This uses a standard select element.
      • Summary Statistics: Optional additional aggregate functions to calculate for each group (e.g., ‘stddev’, ‘percentile’). This also uses multi-select-sk.
    • Emits a pivot-changed custom event when the user modifies the pivot request.
    • PivotQueryChangedEventDetail type: Defines the structure of the data passed in the pivot-changed event.
    • PivotQueryChangedEventName constant: The string name of the custom event.
    • Event Handlers (groupByChanged, operationChanged, summaryChanged): These methods are triggered by user interactions with the respective UI elements. They update the internal _pivotRequest and then call emitChangeEvent.
    • emitChangeEvent(): Constructs and dispatches the pivot-changed event.
    • Property Getters/Setters (pivotRequest, paramset): Provide controlled access to the element's core data, triggering re-renders when set.
  • pivot-query-sk.scss: Contains the styling for the pivot-query-sk element. It ensures a consistent look and feel, leveraging styles from themes_sass_lib and select_sass_lib. The layout is primarily flex-based to arrange the different selection components.

  • pivot-query-sk-demo.html and pivot-query-sk-demo.ts: These files provide a demonstration page for the pivot-query-sk element.

    • The HTML sets up a basic page structure and includes an instance of pivot-query-sk.
    • The TypeScript initializes the demo element with sample pivot.Request data and a ParamSet. It also includes an event listener for pivot-changed to display the selected pivot configuration as JSON, illustrating how to consume the element's output.

Workflow for User Interaction and Event Emission:

  1. Initialization:

    • The pivot-query-sk element is created.
    • The consuming application sets the paramset (available grouping keys) and optionally an initial pivotRequest.
    • The element renders its initial state based on these inputs.
  2. User Modifies a Selection (e.g., changes a “group by” option):

    • multi-select-sk (for “group by”) emits a selection-changed event.
    • PivotQuerySk.groupByChanged() is called.
    • createDefaultPivotRequestIfNull() ensures _pivotRequest is not null.
    • _pivotRequest.group_by is updated based on the new selection.
    • emitChangeEvent() is called.
  3. Event Emission:

    • emitChangeEvent():
      • Retrieves the current pivotRequest (which might be null if invalid).
      • Creates a new CustomEvent named pivot-changed.
      • The detail of the event is the current (potentially validated) pivotRequest.
      • The event is dispatched, bubbling up the DOM.
  4. Application Responds:

    • The consuming application, listening for pivot-changed events on the pivot-query-sk element or one of its ancestors, receives the event.
    • The application can then use the event.detail (the pivot.Request) to update its data display, fetch new data, or perform other actions.

This flow can be visualized as:

User Interaction (e.g., click on multi-select)
      |
      v
Internal element event (e.g., @selection-changed from multi-select-sk)
      |
      v
PivotQuerySk Event Handler (e.g., groupByChanged)
      |
      v
Update internal _pivotRequest state
      |
      v
PivotQuerySk.emitChangeEvent()
      |
      v
Dispatch "pivot-changed" CustomEvent (with pivot.Request as detail)
      |
      v
Consuming Application's Event Listener
      |
      v
Application processes the new pivot.Request
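
An example consumer, assuming the element is already on the page. The pivot-changed event and the null-on-invalid behavior are described above, while the import path and sample ParamSet are illustrative.

import './modules/pivot-query-sk';

const pivotQuery = document.querySelector('pivot-query-sk') as any;

// Provide the available grouping keys (a ParamSet) and optionally an initial request.
pivotQuery.paramset = { config: ['8888', 'gles'], arch: ['x86', 'arm'] };

pivotQuery.addEventListener('pivot-changed', (e: Event) => {
  const req = (e as CustomEvent).detail; // pivot.Request, or null if invalid
  if (req === null) {
    return; // current configuration is not valid yet
  }
  // e.g. refetch data using req.group_by / req.operation / req.summary.
  console.log(JSON.stringify(req));
});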

Module: /modules/pivot-table-sk

The pivot-table-sk module provides a custom HTML element, <pivot-table-sk>, designed to display pivoted data in a tabular format. This element is specifically for DataFrames that have been pivoted and contain summary values, as opposed to summary traces (which would be displayed in a plot).

Core Functionality and Design

The primary purpose of pivot-table-sk is to present complex, multi-dimensional data in an understandable and interactive table. The “why” behind its design is to offer a user-friendly way to explore summarized data that arises from pivoting operations.

Key design considerations include:

  • Data Input: It takes a DataFrame (from //perf/modules/json:index_ts_lib) and a pivot.Request (also from //perf/modules/json:index_ts_lib) as input. The pivot.Request is crucial as it dictates how the DataFrame was originally pivoted, including the group_by keys, the main operation, and the summary operations.
  • Display: The data is rendered as an HTML table. The table headers are derived from the group_by keys and the summary operations.
  • Interactivity (Sorting): Users can sort the table by clicking on column headers. The sorting mechanism is designed to be intuitive, mimicking spreadsheet behavior where subsequent sorts on different columns break ties from previous sorts.
  • Query Context: The element also displays the query parameters, the “group by” keys, the primary operation, and the summary operations that led to the current view of the data. This provides context to the user.
  • Validation: It includes a mechanism to validate if the provided pivot.Request is suitable for display as a pivot table (using validateAsPivotTable from //perf/modules/pivotutil:index_ts_lib). This prevents rendering errors or confusing displays if the input data structure isn't appropriate.

Key Components and Files

  • pivot-table-sk.ts: This is the heart of the module, defining the PivotTableSk custom element.

    • PivotTableSk class:
    • Extends ElementSk (from //infra-sk/modules/ElementSk:index_ts_lib).
    • Manages the input DataFrame (df), pivot.Request (req), and the original query string.
    • KeyValues type and keyValuesFromTraceSet function: This is a critical internal data structure. KeyValues is an object where keys are trace keys (e.g., ',arch=x86,config=8888,') and values are arrays of strings. These string arrays represent the values of the parameters specified in req.group_by, in the same order. For example, if req.group_by is ['config', 'arch'], then for the trace ',arch=arm,config=8888,', the corresponding KeyValues entry would be ['8888', 'arm']. This transformation is performed by keyValuesFromTraceSet and is essential for rendering the “key” columns of the table and for sorting by these keys.
    • SortSelection class: Represents the sorting state of a single column. It stores:
      • column: The index of the column.
      • kind: Whether the column represents ‘keyValues’ (from group_by) or ‘summaryValues’ (from summary operations).
      • dir: The sort direction (‘up’ or ‘down’).
      • It provides methods to toggleDirection, buildCompare (to create a JavaScript sort comparison function based on its state), and encode/decode for serialization.
    • SortHistory class: Manages the overall sorting state of the table.
      • It holds an array (history) of SortSelection objects.
      • The “spreadsheet-like” multi-column sorting is achieved here. When a user clicks a column to sort, that column's SortSelection is moved to the front of the history array, and its direction is toggled.
      • buildCompare in SortHistory creates a composite comparison function that iterates through the SortSelection objects in history. The first SortSelection determines the primary sort order. If it results in a tie, the second SortSelection is used to break the tie, and so on. This creates the effect of a stable sort across multiple user interactions without needing a true stable sort algorithm for each click.
      • It also provides encode/decode methods to serialize the entire sort history (e.g., for persisting sort state in a URL).
    • set() method: The primary way to provide data to the component. It initializes keyValues, sortHistory, and the main compare function. It can also accept an encodedHistory string to restore a previous sort state.
    • Rendering Logic (Templates): Uses lit-html for templating.
      • queryDefinition(): Renders the contextual information about the query and pivot operations.
      • tableHeader(), keyColumnHeaders(), summaryColumnHeaders(): Generate the table header row, including sort icons.
      • sortArrow(): Dynamically displays the correct sort icon (up arrow, down arrow, or neutral sort icon) based on the current SortHistory.
      • tableRows(), keyRowValues(), summaryRowValues(): Generate the data rows of the table, applying the current sort order.
      • displayValue(): Formats numerical values for display, converting a special sentinel value (MISSING_DATA_SENTINEL from //perf/modules/const:const_ts_lib) to ‘-’.
    • Event Emission: Emits a change event when the user sorts the table. The event detail (PivotTableSkChangeEventDetail) is the encoded SortHistory string. This allows parent components to react to sort changes and potentially persist the state.
    • Dependencies:
    • Relies on paramset-sk to display the query parameters.
    • Uses various icon elements (arrow-drop-down-icon-sk, arrow-drop-up-icon-sk, sort-icon-sk) for the sort indicators.
    • //perf/modules/json:index_ts_lib for DataFrame, TraceSet, pivot.Request types.
    • //perf/modules/pivotutil:index_ts_lib for operationDescriptions and validateAsPivotTable.
    • //perf/modules/paramtools:index_ts_lib for fromKey (to parse trace keys into parameter sets).
    • //infra-sk/modules:query_ts_lib for toParamSet (to convert a query string into a ParamSet).
  • pivot-table-sk.scss: Provides the styling for the pivot-table-sk element, including table borders, padding, text alignment, and cursor styles for interactive elements. It leverages themes from //perf/modules/themes:themes_sass_lib.

  • index.ts: A simple entry point that imports and thereby registers the pivot-table-sk custom element.

  • pivot-table-sk-demo.html & pivot-table-sk-demo.ts:

    • These files set up a demonstration page for the pivot-table-sk element.
    • pivot-table-sk-demo.ts creates sample DataFrame and pivot.Request objects and uses them to populate instances of pivot-table-sk on the demo page. This is crucial for development and visual testing. It demonstrates valid use cases, cases with invalid pivot requests, and cases with null DataFrames to ensure the component handles these scenarios gracefully.
  • Test Files (pivot-table-sk_test.ts, pivot-table-sk_puppeteer_test.ts):

    • pivot-table-sk_test.ts (Karma test): Contains unit tests for the PivotTableSk element and its internal logic, particularly the SortSelection and SortHistory classes. It verifies:
    • Correct initialization and rendering.
    • The sorting behavior when column headers are clicked (e.g., sort direction changes, correct sort icons appear, change event is emitted with the correct encoded history).
    • The buildCompare functions in SortSelection and SortHistory produce the correct sorting results for various data types and sort directions.
    • The encode and decode methods for SortSelection and SortHistory work correctly, allowing for round-tripping of sort state.
    • The keyValuesFromTraceSet function correctly transforms TraceSet data based on the pivot.Request.
    • pivot-table-sk_puppeteer_test.ts (Puppeteer test): Performs end-to-end tests by loading the demo page in a headless browser.
    • It checks if the elements render correctly on the page (smoke test).
    • It takes screenshots of the rendered component for visual regression testing.

Workflow Example: User Sorting the Table

  1. Initial State:

    • The pivot-table-sk element is initialized with a DataFrame, a pivot.Request, and an optional initial encodedHistory string.
    • pivot-table-sk creates a SortHistory object. If encodedHistory is provided, SortHistory.decode() is called. Otherwise, a default sort order is established (usually based on the order of summary columns, then key columns, all initially ‘up’).
    • SortHistory.buildCompare() generates the initial comparison function.
    • The table is rendered, sorted according to this initial comparison function. Each column header shows a default sort-icon-sk.
  2. User Clicks a Column Header (e.g., “config” key column):

    • changeSort(columnIndex, 'keyValues') is called within pivot-table-sk.
    • this.sortHistory.selectColumnToSortOn(columnIndex, 'keyValues') is invoked:
      • The SortSelection for the "config" column is found in this.sortHistory.history.
      • It's removed from its current position.
      • Its direction is toggled (e.g., from 'up' to 'down').
      • This updated SortSelection is prepended to this.sortHistory.history.

        Before: [SummaryCol0(up), SummaryCol1(up), KeyCol0(config, up), KeyCol1(arch, up)]
        Click on KeyCol0 (config):
        After:  [KeyCol0(config, down), SummaryCol0(up), SummaryCol1(up), KeyCol1(arch, up)]

    • this.compare = this.sortHistory.buildCompare(...) is called. A new composite comparison function is generated. Now, rows will primarily be sorted by “config” (descending). Ties will be broken by “SummaryCol0” (ascending), then “SummaryCol1” (ascending), and finally “KeyCol1” (ascending).
    • A CustomEvent('change') is dispatched. The event.detail contains this.sortHistory.encode(), which is a string representation of the new sort order (e.g., “dk0-su0-su1-ku1”).
    • this._render() is called, re-rendering the table with the new sort order. The “config” column header now shows an arrow-drop-down-icon-sk.
  3. User Clicks Another Column Header (e.g., “avg” summary column):

    • The process repeats. The SortSelection for the "avg" column is moved to the front of this.sortHistory.history and its direction is toggled.

        Before: [KeyCol0(config, down), SummaryCol0(avg, up), SummaryCol1(sum, up), KeyCol1(arch, up)]
        Click on SummaryCol0 (avg):
        After:  [SummaryCol0(avg, down), KeyCol0(config, down), SummaryCol1(sum, up), KeyCol1(arch, up)]

    • The table is re-rendered, now primarily sorted by “avg” (descending), with ties broken by “config” (descending), then “sum” (ascending), then “arch” (ascending).

This multi-level sorting, driven by the SortHistory maintaining the sequence of user sort actions, is a key aspect of the “how” behind the pivot-table-sk's user experience. It aims to provide a powerful yet familiar way to analyze pivoted data.
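
The tie-breaking composition described above can be summarized with a small sketch. This is not the component's actual code: rows are reduced to plain string arrays here, whereas SortSelection and SortHistory operate on the real KeyValues and summary structures.

// Rows are simplified to string arrays for this sketch.
type Row = string[];
type Compare = (a: Row, b: Row) => number;

// Comparator for a single column in a given direction.
function compareForColumn(column: number, dir: 'up' | 'down'): Compare {
  return (a, b) => {
    const cmp = a[column] < b[column] ? -1 : a[column] > b[column] ? 1 : 0;
    return dir === 'up' ? cmp : -cmp;
  };
}

// The first entry in the history gives the primary order; later entries only break ties.
function buildCompare(history: { column: number; dir: 'up' | 'down' }[]): Compare {
  const compares = history.map((s) => compareForColumn(s.column, s.dir));
  return (a, b) => {
    for (const cmp of compares) {
      const result = cmp(a, b);
      if (result !== 0) {
        return result;
      }
    }
    return 0;
  };
}

// Sort primarily by column 0 descending, breaking ties by column 1 ascending.
const rows: Row[] = [['8888', 'arm'], ['gles', 'x86'], ['8888', 'x86']];
rows.sort(buildCompare([{ column: 0, dir: 'down' }, { column: 1, dir: 'up' }]));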

Module: /modules/pivotutil

The pivotutil module provides utility functions and constants for working with pivot table requests. Its primary purpose is to ensure the validity and integrity of pivot requests before they are processed, and to offer human-readable descriptions for pivot operations. This centralization of pivot-related logic helps maintain consistency and simplifies the handling of pivot table configurations across different parts of the application.

Key Components and Responsibilities

index.ts: This is the core file of the module and contains all the exported functionalities.

  • operationDescriptions:

    • Why: Pivot operations are often represented by short, cryptic identifiers (e.g., avg, std). To improve user experience and make UIs more understandable, a mapping to human-readable names is necessary.
    • How: This is a simple JavaScript object (dictionary) where keys are the pivot.Operation enum values (imported from ../json) and values are their corresponding descriptive strings (e.g., “Mean”, “Standard Deviation”). This allows for easy lookup and display of operation names.
  • validatePivotRequest(req: pivot.Request | null): string:

    • Why: Before attempting to process a pivot request, it's crucial to ensure that the request is structurally sound and contains the minimally required information. This prevents runtime errors and provides early feedback to the user or calling code if the request is malformed.
    • How: This function performs basic validation checks on a pivot.Request object.
    • It first checks if the request itself is null. If so, it returns an error message.
    • It then verifies that the group_by property is present and is an array with at least one element. A pivot table fundamentally relies on grouping data, so this is a mandatory field.
    • If all checks pass, it returns an empty string, indicating a valid request. Otherwise, it returns a string describing the specific validation error.
    • Workflow:

        Input: pivot.Request | null
          |
          V
        Is request null? --(Yes)--> Return "Pivot request is null."
          | (No)
          V
        Is req.group_by null or empty? --(Yes)--> Return "Pivot must have at least one GroupBy."
          | (No)
          V
        Return "" (Valid)
  • validateAsPivotTable(req: pivot.Request | null): string:

    • Why: Some contexts specifically require a pivot table that displays summary values, not just a pivot plot which might only group traces without performing summary calculations. This function enforces the presence of summary operations.
    • How:
    • It first calls validatePivotRequest to ensure the basic structure of the request is valid. If validatePivotRequest returns an error, that error is immediately returned.
    • If the basic validation passes, it then checks if the summary property of the request is present and is an array with at least one element. Summary operations (like sum, average, etc.) are essential for generating the aggregated values displayed in a pivot table. Without them, the request might be valid for plotting individual traces grouped by some criteria, but not for a typical pivot table with summarized data.
    • If the summary array is missing or empty, an error message is returned. Otherwise, an empty string is returned.
    • Workflow:

        Input: pivot.Request | null
          |
          V
        Call validatePivotRequest(req) --> invalidMsg
          |
          V
        Is invalidMsg not empty? --(Yes)--> Return invalidMsg
          | (No)
          V
        Is req.summary null or empty? --(Yes)--> Return "Must have at least one Summary operation."
          | (No)
          V
        Return "" (Valid for pivot table)
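
A sketch mirroring the validation logic and error strings described above; the real functions live in //perf/modules/pivotutil and operate on the generated pivot.Request type rather than the local interface used here.

// Minimal stand-in for pivot.Request, for illustration only.
interface PivotRequest {
  group_by: string[] | null;
  operation: string;
  summary: string[] | null;
}

function validatePivotRequest(req: PivotRequest | null): string {
  if (req === null) {
    return 'Pivot request is null.';
  }
  if (!req.group_by || req.group_by.length === 0) {
    return 'Pivot must have at least one GroupBy.';
  }
  return ''; // valid
}

function validateAsPivotTable(req: PivotRequest | null): string {
  const invalidMsg = validatePivotRequest(req);
  if (invalidMsg !== '') {
    return invalidMsg;
  }
  if (!req!.summary || req!.summary.length === 0) {
    return 'Must have at least one Summary operation.';
  }
  return ''; // valid for display as a pivot table
}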

index_test.ts: This file contains unit tests for the functions in index.ts.

  • Why: To ensure the validation logic correctly identifies valid and invalid pivot requests under various conditions. This maintains the reliability of the pivotutil module.
  • How: It uses the chai assertion library to define test cases.
    • For validatePivotRequest, it tests scenarios like:
    • null request.
    • group_by being null.
    • group_by being an empty array.
    • A completely valid request.
    • For validateAsPivotTable, it builds upon the validatePivotRequest checks and adds tests for:
      • summary being null.
      • summary being an empty array.
      • A valid request with at least one summary operation.
    • Each test asserts whether the validation functions return an empty string (for valid inputs) or a non-empty error message string (for invalid inputs) as expected.

The design decision to separate validatePivotRequest and validateAsPivotTable allows for more granular validation. Some parts of an application might only need the basic validation (e.g., ensuring data can be grouped), while others specifically require summary operations for display in a tabular format. This separation provides flexibility. The use of descriptive error messages aids in debugging and user feedback.

Module: /modules/plot-google-chart-sk

The plot-google-chart-sk module provides a custom element for rendering interactive time-series charts using Google Charts. It is designed to display performance data, including anomalies and user-reported issues, and allows users to interact with the chart through panning, zooming, and selecting data points.

Key Responsibilities:

  • Data Visualization: Renders line charts based on DataTable objects, which are consumed from a Lit context (dataTableContext). This DataTable typically contains time-series data where the first column is a commit identifier (e.g., revision number or timestamp), the second is a date object, and subsequent columns represent different data traces.
  • Interactivity:
    • Panning: Allows users to pan the chart horizontally by clicking and dragging.
    • Zooming: Supports both horizontal and vertical zooming. Users can Ctrl-click and drag to select a region to zoom into. A reset button allows returning to the original view.
    • Delta Calculation: Enables users to Shift-click and drag vertically to measure the difference (both raw and percentage) between two Y-axis values.
    • Tooltip Display: Shows detailed information about a data point when the user hovers over it.
    • Data Point Selection: Allows users to click on a data point to select it, which can trigger other actions in the application.
  • Anomaly and Issue Display: Overlays icons on the chart to indicate anomalies (regressions, improvements, untriaged, ignored) and user-reported issues at specific data points. These are also consumed from Lit contexts (dataframeAnomalyContext and dataframeUserIssueContext).
  • Legend and Trace Management: Includes a side panel (side-panel-sk) that displays a legend for the plotted traces. Users can toggle the visibility of individual traces using checkboxes in the side panel.
  • Dynamic Updates: Responds to changes in data, selected ranges, and other properties by redrawing or updating the chart view.

Design Decisions and Implementation Choices:

  • Google Charts Integration: Leverages the @google-web-components/google-chart library for the core charting functionality. This provides a robust and feature-rich charting engine.
  • LitElement and Context API: Built as a LitElement custom element, making it easy to integrate into modern web applications. It utilizes Lit's Context API to consume shared data like the DataTable, anomaly information, and loading states from parent components or a centralized data store. This promotes a decoupled architecture.
  • Modular Sub-components:
    • v-resizable-box-sk: A dedicated component for the vertical selection box used in the “deltaY” mode. It calculates and displays the difference between the start and end points of the drag.
    • drag-to-zoom-box-sk: Handles the visual representation of the selection box during the drag-to-zoom interaction. It manages the display and dimensions of the box as the user drags.
    • side-panel-sk: Encapsulates the legend and trace visibility controls. This separation of concerns keeps the main chart component focused on plotting.
  • Event-Driven Communication: Emits custom events (e.g., selection-changed, plot-data-mouseover, plot-data-select) to notify parent components of user interactions and chart state changes. This allows for integration with other parts of an application.
  • Overlay for Anomalies and Issues: Anomalies and user issues are rendered as absolutely positioned md-icon elements on top of the chart. Their positions are calculated based on the chart's layout and the data point coordinates. This approach avoids modifying the Google Chart's internal rendering and allows for more flexible styling and interaction with these markers.
  • Caching and Performance:
    • Caches the Google Chart object (this.chart) and chart layout information (this.cachedChartArea) to avoid redundant lookups.
    • Maintains a removedLabelsCache to efficiently hide and show traces without reconstructing the entire DataView each time.
  • Separate Interaction Modes: The navigationMode property (pan, deltaY, dragToZoom) manages the current mouse interaction state. This simplifies event handling by directing mouse events to the appropriate logic based on the active mode.
  • Dynamic Y-Axis Title: The determineYAxisTitle method attempts to create a meaningful Y-axis title by examining the unit and improvement_direction parameters from the trace names. It displays these only if they are consistent across all visible traces.
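
For illustration, the consistency rule behind the Y-axis title can be sketched as follows. The helper, its regex, and the " - " join format are illustrative assumptions, not the component's actual code; only the rule itself (show a unit or direction only when every visible trace agrees) comes from the description above.

    const visibleTraces = [
      ',benchmark=Speedometer,unit=ms,improvement_direction=down,',
      ',benchmark=JetStream,unit=ms,improvement_direction=down,',
    ];

    // Return the value of `key` only if it is present and identical on every trace.
    function consistentParam(traceNames: string[], key: string): string | null {
      let value: string | null = null;
      for (const name of traceNames) {
        const match = name.match(new RegExp(`,${key}=([^,]*),`));
        if (!match) return null;                              // Missing on some trace: no title.
        if (value !== null && match[1] !== value) return null; // Values disagree: no title.
        value = match[1];
      }
      return value;
    }

    // Only display the parts that are consistent across all visible traces.
    const yAxisTitle = ['unit', 'improvement_direction']
      .map((key) => consistentParam(visibleTraces, key))
      .filter((v): v is string => v !== null)
      .join(' - ');
    console.log(yAxisTitle); // "ms - down"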

Key Components/Files:

  • plot-google-chart-sk.ts: The core component that orchestrates the chart display and interactions.
    • Manages the Google Chart instance.
    • Handles mouse events for panning, zooming, delta calculation, and data point interactions.
    • Consumes data (DataTable, AnomalyMap, UserIssueMap) via Lit context.
    • Renders anomaly and user issue icons as overlays.
    • Communicates with side-panel-sk to manage trace visibility.
    • Dispatches custom events for user interactions.
  • side-panel-sk.ts: Implements the side panel containing the legend and checkboxes for toggling trace visibility.
    • Generates legend entries based on the DataTable.
    • Manages the checked state of traces and communicates changes to plot-google-chart-sk.
    • Can display the calculated delta values from the v-resizable-box-sk.
  • v-resizable-box-sk.ts: A custom element for the vertical resizable selection box used during the delta calculation (Shift-click + drag).
    • Displays the selection box and calculates the raw and percentage difference between the Y-values at the start and end of the drag.
  • drag-to-zoom-box-sk.ts: A custom element for the selection box used during the drag-to-zoom interaction (Ctrl-click + drag).
    • Draws a semi-transparent rectangle indicating the area to be zoomed.
  • plot-google-chart-sk-demo.ts and plot-google-chart-sk-demo.html: Provide a demonstration page showcasing the plot-google-chart-sk element with sample data. This is crucial for development and testing.
  • index.ts: Serves as the entry point for the module, importing and registering all the custom elements defined within.

Key Workflows:

  1. Initial Chart Rendering:
     DataTable (from context) -> plot-google-chart-sk -> updateDataView()
       -> Creates google.visualization.DataView
       -> Sets columns based on domain (commit/date) and visible traces
       -> updateOptions() configures chart appearance (colors, axes, view window)
       -> plotElement.value.view = view and plotElement.value.options = options
       -> Google Chart renders.
     onChartReady():
       -> Caches chart object.
       -> Calls drawAnomaly(), drawUserIssues(), drawXbar().

  2. Panning:
     User mousedown (not Shift or Ctrl) -> onChartMouseDown(): navigationMode = 'pan'
     User mousemove -> onWindowMouseMove():
       -> Calculates deltaX based on mouse movement and current domain.
       -> Updates this.selectedRange.
       -> Calls updateOptions() to update chart's horizontal view window.
       -> Dispatches selection-changing event.
     User mouseup -> onWindowMouseUp():
       -> Dispatches selection-changed event.
       -> navigationMode = null.

  3. Drag-to-Zoom:
     User Ctrl + mousedown -> onChartMouseDown(): navigationMode = 'dragToZoom'
       -> zoomRangeBox.value.initializeShow(): Displays the drag box.
     User mousemove -> onWindowMouseMove():
       -> zoomRangeBox.value.handleDrag(): Updates the drag box dimensions.
     User mouseup -> onChartMouseUp():
       -> Calculates zoom boundaries based on drag box and isHorizontalZoom.
       -> zoomRangeBox.value.hide().
       -> showResetButton = true.
       -> updateBounds(): Updates chart's hAxis.viewWindow or vAxis.viewWindow.
       -> navigationMode = null.

  4. Delta Calculation (Shift-Click):
     User Shift + mousedown -> onChartMouseDown(): navigationMode = 'deltaY'
       -> deltaRangeBox.value.show(): Displays the vertical resizable box.
     User mousemove -> onWindowMouseMove():
       -> deltaRangeBox.value.updateSelection(): Updates box height and calculates delta.
       -> Updates sidePanel.value with delta values.
     User Shift + mousedown (again) or regular mousedown -> onChartMouseDown():
       -> Toggles deltaRangeOn. If finishing, sidePanel.value.showDelta = true.
     User mouseup (after dragging) -> onChartMouseUp():
       -> Updates sidePanel.value with final delta values.
       -> navigationMode = null.

  5. Toggling Trace Visibility:
     User clicks checkbox in side-panel-sk -> side-panel-sk dispatches side-panel-selected-trace-change.
     plot-google-chart-sk listens (sidePanelCheckboxUpdate()):
       -> Updates this.removedLabelsCache.
       -> Calls updateDataView():
         -> Recreates DataView, hiding/showing columns based on removedLabelsCache.
         -> Updates chart.

  6. Anomaly/Issue Display:
     anomalyMap or userIssues (from context) changes -> plot-google-chart-sk.willUpdate() -> plotElement.value.redraw() (if chart already rendered).
     Chart redraw triggers onChartReady():
       -> drawAnomaly() / drawUserIssues():
         -> Iterates through anomalies/issues for visible traces.
         -> Calculates screen coordinates (x, y) using chart.getChartLayoutInterface().getXLocation() and getYLocation().
         -> Clones template md-icon elements from slots.
         -> Positions the icons absolutely within anomalyDiv or userIssueDiv.
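
A page embedding plot-google-chart-sk would typically subscribe to the custom events named in these workflows. A minimal sketch follows; the exact event detail payloads are assumptions here, not documented shapes.

    import './plot-google-chart-sk'; // Registers the custom element (path assumed).

    const plot = document.querySelector('plot-google-chart-sk')!;

    // Fired when a pan or zoom settles; assumed to carry the newly visible range.
    plot.addEventListener('selection-changed', (e: Event) => {
      console.log('visible range', (e as CustomEvent).detail);
    });

    // Fired while hovering a data point; useful for driving a shared tooltip.
    plot.addEventListener('plot-data-mouseover', (e: Event) => {
      console.log('hovered point', (e as CustomEvent).detail);
    });

    // Fired when the user clicks a data point to select it.
    plot.addEventListener('plot-data-select', (e: Event) => {
      console.log('selected point', (e as CustomEvent).detail);
    });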

This detailed explanation should provide a solid understanding of the plot-google-chart-sk module's purpose, architecture, and key functionalities.

Module: /modules/plot-simple-sk

The plot-simple-sk module provides a custom HTML element for rendering 2D line graphs. It's designed to be interactive, allowing users to zoom, inspect individual data points, and highlight specific traces.

Core Functionality and Design:

The primary goal of plot-simple-sk is to display time-series data or any data that can be represented as a set of (x, y) coordinates. Key design considerations include:

  1. Performance: To handle potentially large datasets and maintain a smooth user experience, the element employs several optimization techniques:

    • Dual Canvases: It uses two <canvas> elements stacked on top of each other.
      • The bottom canvas (traces) is for drawing the static parts of the plot: the lines, axes, and dots representing data points. These are pre-rendered into Path2D objects for efficient redrawing.
      • The top canvas (overlay) is for dynamic elements that change frequently, such as crosshairs, zoom selection rectangles, and hover highlights. This separation prevents unnecessary redrawing of the entire plot.
    • Path2D Objects: Trace lines and data point dots are converted into Path2D objects. This allows the browser to optimize their rendering, leading to faster redraws compared to repeatedly issuing drawing commands.
    • k-d Tree for Point Proximity: For features like displaying information on mouse hover or selecting the nearest data point on click, a k-d tree (kd.ts) is used. This data structure allows for efficient searching of the closest point in a 2D space, crucial for interactivity with potentially many data points.
    • Debounced Redraws and Calculations: Operations like rebuilding the k-d tree (recalcSearchTask) or redrawing after a zoom (zoomTask) are often scheduled using window.setTimeout. This prevents these potentially expensive operations from blocking the main thread and ensures they only happen when necessary, improving responsiveness. requestAnimationFrame is used for mouse movement updates to synchronize with the browser's repaint cycle.
  2. Interactivity:

    • Zooming:
      • Summary and Detail Views: The plot can optionally display a “summary” area above the main “detail” area. The summary shows an overview of all data, and users can drag a region on the summary to zoom the detail view to that specific x-axis range.
      • Detail View Zoom: Users can also drag a rectangle directly on the detail view to zoom into a specific x and y range.
      • Zoom Stack: The element maintains a stack of zoom levels (detailsZoomRangesStack), allowing users to progressively zoom in and, potentially, step back out through previous zoom levels (zooming back out is not explicitly documented as a current feature).
    • Hover and Selection:
      • Moving the mouse near a trace highlights the closest data point and emits a trace_focused event.
      • Clicking on a trace selects the closest data point and emits a trace_selected event.
    • Crosshairs: When the shift key is held, crosshairs are displayed, indicating the mouse's current x and y position on the plot.
    • Highlighting Traces: Specific traces can be programmatically highlighted, making them stand out.
    • X-Bar and Bands: Vertical lines (xbar) or regions (bands) can be drawn on the plot to mark specific x-axis values or ranges.
    • Anomalies and User Issues: The plot can display markers for anomalies (regressions, improvements) and user-reported issues at specific data points.
  3. Appearance and Theming:

    • Responsive Sizing: The plot adapts to the width and height attributes of the custom element and uses ResizeObserver to redraw when its dimensions change.
    • Device Pixel Ratio: It accounts for window.devicePixelRatio to render crisply on high-DPI displays by drawing to a larger canvas and then scaling it down with CSS transforms.
    • CSS Variables for Theming: The element is designed to integrate with elements-sk/themes and uses CSS variables for colors (e.g., --on-background, --success, --failure), allowing its appearance to be customized by the surrounding application's theme. It listens for theme-chooser-toggle events to redraw when the theme changes.

Key Files and Responsibilities:

  • plot-simple-sk.ts: This is the heart of the module, defining the PlotSimpleSk custom element.

    • Rendering Logic: Contains all the drawing code for the traces, axes, labels, summary view, detail view, crosshairs, zoom indicators, anomalies, etc. It manages the two canvas contexts (ctx for traces, overlayCtx for overlays).
    • State Management: Manages the internal state, including the lineData (traces and their pre-rendered paths), labels (x-axis tick information), current _zoom state, detailsZoomRangesStack for detail view zooms, hoverPt, crosshair, highlighted traces, _xbar, _bands, and _anomalyDataMap.
    • Event Handling: Sets up event listeners for mouse interactions (move, down, up, leave, click) to handle zooming, hovering, and selection. It also listens for theme-chooser-toggle and ResizeObserver events.
    • API Methods: Exposes methods like addLines, deleteLines, removeAll, and properties like highlight, xbar, bands, zoom, anomalyDataMap, userIssueMap, and dots to control the plot's content and appearance.
    • Coordinate Transformations: Uses d3-scale (specifically scaleLinear) to map data coordinates (domain) to canvas pixel coordinates (range) and vice-versa. Functions like rectFromRange and rectFromRangeInvert handle these transformations for rectangular regions.
    • Path and Search Builders:
    • PathBuilder: A helper class to construct Path2D objects for trace lines and dots based on the current scales and data.
    • SearchBuilder: A helper class to prepare the data points for the KDTree by converting source coordinates to canvas coordinates.
    • Drawing Areas: Defines SummaryArea and DetailArea interfaces and manages their respective rectangles, axes, and scaling ranges.
  • kd.ts: Implements a k-d tree.

    • Purpose: Provides an efficient way (O(log n) on average for search) to find the nearest data point to a given mouse coordinate on the canvas. This is crucial for interactivity like mouse hovering and clicking to identify specific points on traces.
    • Implementation: It's a trimmed-down version of an existing k-d tree library, specifically tailored for finding the single closest 2D point. It takes an array of points (each with x and y properties), a distance metric function, and the dimensions to consider (['x', 'y']). The nearest() method is the primary interface used by plot-simple-sk.ts (a usage sketch follows this file list).
  • ticks.ts: Responsible for generating appropriate tick marks and labels for the time-based x-axis.

    • Purpose: Given an array of Date objects representing the x-axis values, it determines a sensible set of tick positions and their corresponding formatted string labels (e.g., “Jul”, “Mon, 8 AM”, “10:30 AM”).
    • Logic: It considers the total duration spanned by the dates and selects an appropriate time granularity (e.g., months, days, hours, minutes) for the labels using Intl.DateTimeFormat. It aims for a reasonable number of ticks (MIN_TICKS to MAX_TICKS) and uses a fixTicksLength function to thin out the ticks if too many are generated.
    • Output: The ticks() function returns an array of objects, each with an x (index in the original data) and a text (formatted label).
  • plot-simple-sk.scss: Contains the SASS/CSS styles for the plot-simple-sk element.

    • Layout: Defines the positioning of the canvas elements (absolute positioning for the overlay on top of the trace canvas).
    • Theming Integration: Imports themes.scss and uses CSS variables (e.g., var(--on-background), var(--background)) to ensure the plot's colors match the application's theme.
  • index.ts: A simple entry point that imports plot-simple-sk.ts to ensure the custom element is defined and registered with the browser.

  • Demo Files (plot-simple-sk-demo.html, plot-simple-sk-demo.ts, plot-simple-sk-demo.scss):

    • Provide a live demonstration of the plot-simple-sk element's capabilities.
    • The HTML sets up the plot elements and buttons to trigger various actions.
    • The TypeScript file (plot-simple-sk-demo.ts) contains the logic to interact with the plot, such as adding random trace data, highlighting traces, zooming, clearing the plot, and displaying anomaly markers. It also logs events emitted by the plot.
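
As referenced above, the k-d tree in kd.ts is consumed roughly as follows. The constructor arguments mirror the description (an array of points, a distance metric, and the dimensions to search); the import path and the return shape of nearest() are assumptions.

    import { KDTree } from './kd'; // Export name and path assumed.

    interface Point {
      x: number;
      y: number;
    }

    // Canvas-space coordinates of the plotted data points (as built by SearchBuilder).
    const points: Point[] = [
      { x: 10, y: 120 },
      { x: 48, y: 95 },
      { x: 81, y: 140 },
    ];

    // Squared Euclidean distance is sufficient for nearest-point comparisons.
    const distance = (a: Point, b: Point) => (a.x - b.x) ** 2 + (a.y - b.y) ** 2;

    const tree = new KDTree(points, distance, ['x', 'y']);

    // On mousemove, look up the data point closest to the cursor position.
    const nearest = tree.nearest({ x: 50, y: 100 }); // Return shape assumed.
    console.log(nearest);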

Key Workflows:

  1. Initialization and Rendering:
     ElementSk constructor -> connectedCallback -> render
     render -> _render (lit-html template instantiation) -> canvas.getContext -> updateScaledMeasurements -> updateScaleRanges -> recalcDetailPaths -> recalcSummaryPaths -> drawTracesCanvas

  2. Adding Data (addLines):
     addLines -> Convert MISSING_DATA_SENTINEL to NaN -> Store in this.lineData -> updateScaleDomains -> recalcSummaryPaths -> recalcDetailPaths -> drawTracesCanvas
     recalcDetailPaths / recalcSummaryPaths -> For each line: PathBuilder creates linePath and dotsPath.
     recalcDetailPaths -> recalcSearch (schedules recalcSearchImpl)
     recalcSearchImpl -> SearchBuilder populates points -> new KDTree

  3. Mouse Hover and Focus:
     mousemove event -> this.mouseMoveRaw updated
     raf loop -> checks this.mouseMoveRaw -> eventToCanvasPt
       -> If this.pointSearch: this.pointSearch.nearest(pt) -> updates this.hoverPt -> dispatches trace_focused event
       -> Updates this.crosshair (based on shift key and hoverPt)
       -> drawOverlayCanvas

  4. Zooming via Summary Drag:
     mousedown on summary -> this.inZoomDrag = 'summary' -> this.zoomBegin set
     mousemove (while dragging) -> raf loop:
       -> eventToCanvasPt -> clampToRect (summary area)
       -> this.summaryArea.range.x.invert(pt.x) to get source x
       -> this.zoom = [min_x, max_x] (triggers _zoomImpl via setter task)
     _zoomImpl (after timeout) -> updateScaleDomains -> recalcDetailPaths -> drawTracesCanvas
     mouseup / mouseleave -> dispatches zoom event -> this.inZoomDrag = 'no-zoom'

  5. Zooming via Detail Area Drag:
     mousedown on detail -> this.inZoomDrag = 'details' -> this.zoomRect initialized
     mousemove (while dragging) -> raf loop:
       -> eventToCanvasPt -> clampToRect (detail area)
       -> Updates this.zoomRect.width/height
       -> drawOverlayCanvas (to show the dragging rectangle)
     mouseup / mouseleave -> dispatchZoomEvent -> doDetailsZoom
     doDetailsZoom -> If zoom box is large enough: this.detailsZoomRangesStack.push(rectFromRangeInvert(...)) -> _zoomImpl

  6. Drawing Process:

    • drawTracesCanvas():

      1. Clears the appropriate part of the main trace canvas (this.ctx).
      2. Draws detail area:
        • Saves context, clips to detail rect.
        • Calls drawXAxis (for detail).
        • Iterates this.lineData: draws line.detail.linePath and line.detail.dotsPath if this.dots is true.
        • Restores context.
        • Calls drawXAxis again (to draw labels outside the clipped region).
      3. If this.summary and not dragging zoom:
        • Draws summary area similarly.
      4. Calls drawYAxis (for detail).
      5. Calls drawOverlayCanvas().
    • drawOverlayCanvas():

      1. Clears the entire overlay canvas (this.overlayCtx).
      2. If this.summary:
        • Saves context, clips to summary rect.
        • Calls drawXBar, drawBands.
        • Draws detail zoom indicator box if detailsZoomRangesStack is not empty.
        • Draws summary zoom bars and shaded regions based on this._zoom.
        • Restores context.
      3. Clips to detail rect:
        • Calls drawXBar, drawBands.
        • Draws highlighted lines.
        • Draws hovered line/dots.
        • Calls drawUserIssues, drawAnomalies.
        • If dragging zoom in detail: draws this.zoomRect (dashed).
        • If not dragging: draws crosshairs and hover label.
        • Restores context.
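
Tying the workflows above together, a consumer drives the element through its imperative API and listens for its events. A hedged sketch, assuming addLines takes a map of trace name to values plus an array of Dates for the x-axis (see plot-simple-sk.ts for the exact types):

    import './plot-simple-sk'; // Registers the custom element (path assumed).

    const plot = document.createElement('plot-simple-sk') as any; // Typed loosely for the sketch.
    plot.setAttribute('width', '800');
    plot.setAttribute('height', '400');
    plot.setAttribute('summary', ''); // Show the summary area above the detail area.
    document.body.appendChild(plot);

    // Add two traces keyed by structured trace IDs; x-axis labels are Dates.
    plot.addLines(
      {
        ',config=8888,test=draw_a_circle,': [10, 12, 11, 13],
        ',config=gles,test=draw_a_circle,': [9, 9, 10, 8],
      },
      [1, 2, 3, 4].map((d) => new Date(2024, 0, d))
    );

    // Highlight one trace and react to hover/selection.
    plot.highlight = [',config=8888,test=draw_a_circle,'];
    plot.addEventListener('trace_focused', (e: Event) => console.log('focused', (e as CustomEvent).detail));
    plot.addEventListener('trace_selected', (e: Event) => console.log('selected', (e as CustomEvent).detail));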

This structured approach allows plot-simple-sk to be both feature-rich and performant for visualizing and interacting with 2D data plots.

Module: /modules/plot-summary-sk

The plot-summary-sk module provides a custom HTML element, <plot-summary-sk>, designed to display a summary plot of performance data and allow users to select a range within that plot. This is particularly useful for visualizing trends over time or commit ranges and enabling interactive exploration of the data.

At its core, plot-summary-sk leverages the Google Charts library to render an area chart. It's designed to work with a DataFrame, a data structure commonly used in Perf for holding timeseries data. The element can display data based on either commit offsets or timestamps (domain attribute).

Key Responsibilities:

  • Data Visualization: Renders an area chart representing performance data over a specified domain (commit or date).
  • Range Selection: Allows users to interactively select a range on the plot. This selection can be initiated by dragging on the chart or by programmatically setting the selection.
  • Event Emission: Emits a summary_selected custom event when the user makes or changes a selection. This event carries details about the selected range (start, end, value, and domain).
  • Dynamic Data Loading: Optionally, it can display controls to load more data in either direction (earlier or later), integrating with a DataFrameRepository to fetch and append new data.
  • Theming: Adapts to theme changes (e.g., dark mode) by redrawing the chart with appropriate styles.
  • Responsiveness: The chart redraws itself when its container is resized, ensuring it remains visually correct.

Key Components/Files:

  • plot-summary-sk.ts: This is the main file defining the PlotSummarySk LitElement.
    • Why: It encapsulates the logic for chart rendering, user interaction, data handling, and event emission.
    • How:
    • It consumes DataFrame data (from dataTableContext) and renders it using <google-chart>.
    • It manages the display of single or all traces based on the selectedTrace property.
    • It uses an internal h-resizable-box-sk element to provide the visual selection rectangle and handles the mouse events for drawing and resizing this selection.
    • It translates between the visual coordinates of the selection box and the data values (commit offsets or timestamps) of the underlying chart.
    • It listens for google-chart-ready events to ensure operations like setting a selection programmatically happen after the chart is fully initialized.
    • It provides controlTemplate for optional “load more data” buttons, which interact with a DataFrameRepository (consumed via dataframeRepoContext).
    • It uses a ResizeObserver to detect when the element is resized and triggers a chart redraw.
    • It manages colors for different traces to ensure consistent visualization.
  • h-resizable-box-sk.ts: This file defines the HResizableBoxSk LitElement, a reusable component for creating a horizontally resizable and draggable selection box.
    • Why: To decouple the complex UI interaction logic of drawing, moving, and resizing a selection rectangle from the main plot-summary-sk component. This promotes reusability and simplifies the main component's logic.
    • How:
    • It renders a div (.surface) that represents the selection.
    • It listens for mousedown events on its container to initiate an action: ‘draw’ (if clicking outside the existing selection), ‘drag’ (if clicking inside the selection), ‘left’ (if clicking on the left edge), or ‘right’ (if clicking on the right edge).
    • It listens for mousemove events on the window to update the selection's position and size during an action. This ensures interaction continues even if the mouse moves outside the element's bounds.
    • It listens for mouseup events on the window to finalize the action and emits a selection-changed event with the new range.
    • It uses CSS to style the selection box and provide visual cues for dragging and resizing (e.g., cursor: move, cursor: ew-resize).
    • The selectionRange property (getter and setter) allows programmatic control and retrieval of the selection, defined by begin and end pixel offsets relative to the component.
  • plot-summary-sk.css.ts: Contains the CSS styles for the plot-summary-sk element, defined as a Lit css tagged template literal.
    • Why: To encapsulate the visual styling, ensuring the plot and its controls are laid out correctly and are visually consistent with the application's theme.
    • How: It uses flexbox for layout, positions the selection box (h-resizable-box-sk) absolutely over the chart, and styles the optional loading buttons and loading indicator.
  • plot-summary-sk-demo.ts and plot-summary-sk-demo.html: Provide a demonstration page for the plot-summary-sk element.
    • Why: To allow developers to see the component in action, test its features, and understand how to integrate it.
    • How: The HTML sets up multiple instances of plot-summary-sk with different configurations (e.g., domain, selectionType). The TypeScript file generates sample DataFrame objects, converts them to Google DataTable format, and populates the plot elements. It also listens for summary_selected events and displays their details.
  • Test Files (*.test.ts, *_puppeteer_test.ts):
    • Why: To ensure the component functions as expected and to prevent regressions.
    • How:
    • Unit tests (plot-summary-sk_test.ts, h_resizable_box_sk_test.ts) verify individual component logic, such as programmatic selection and state changes. They often mock dependencies like the Google Chart library or use test utilities to generate data.
    • Puppeteer tests (plot-summary-sk_puppeteer_test.ts) perform end-to-end testing by interacting with the component in a real browser environment. They simulate user actions like mouse drags and verify the emitted event details and visual output (via screenshots).

Key Workflows:

  1. Initialization and Data Display:

    [DataFrame via context or property]
           |
           v
    plot-summary-sk
           |
           v
    [willUpdate/updateDataView] --> Converts DataFrame to Google DataTable
           |
           v
    <google-chart> --> Renders area chart
           |
           v
    [google-chart-ready event] --> plot-summary-sk may apply cached selection
    
  2. User Selecting a Range by Drawing:

    User mousedowns on <plot-summary-sk> (outside existing selection in h-resizable-box-sk)
           |
           v
    h-resizable-box-sk (action = 'draw')
           |
           v
    User moves mouse (mousemove on window)
           |
           v
    h-resizable-box-sk --> Updates selection box dimensions
           |
           v
    User mouseups (mouseup on window)
           |
           v
    h-resizable-box-sk --> Emits 'selection-changed' (with pixel coordinates)
           |
           v
    plot-summary-sk (onSelectionChanged)
           |
           v
    Converts pixel coordinates to data values (commit/timestamp)
           |
           v
    Emits 'summary_selected' (with data values)
    
  3. User Resizing/Moving an Existing Selection:

    User mousedowns on <h-resizable-box-sk> (on edge for resize, or middle for drag)
           |
           v
    h-resizable-box-sk (action = 'left'/'right'/'drag')
           |
           v
    User moves mouse (mousemove on window)
           |
           v
    h-resizable-box-sk --> Updates selection box position/dimensions
           |
           v
    User mouseups (mouseup on window)
           |
           v
    h-resizable-box-sk --> Emits 'selection-changed'
           |
           v
    plot-summary-sk (onSelectionChanged) --> Converts & Emits 'summary_selected'
    
  4. Programmatic Selection:

    Application calls plotSummarySkElement.Select(beginHeader, endHeader)
    OR Application sets plotSummarySkElement.selectedValueRange = { begin: val1, end: val2 }
           |
           v
    plot-summary-sk
           |
           v
    Caches selectedValueRange (important if chart not ready)
           |
           v
    [If chart ready] --> Converts data values to pixel coordinates
           |
           v
    Sets selectionRange on <h-resizable-box-sk>

    If the chart is not ready when selectedValueRange is set, the conversion and setting of the h-resizable-box-sk selection is deferred until the google-chart-ready event fires.
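
In code, the programmatic path and the event path look roughly like this (a sketch; anything beyond the Select/selectedValueRange/summary_selected names described above is an assumption):

    import './plot-summary-sk'; // Registers the custom element (path assumed).

    const summary = document.querySelector('plot-summary-sk')! as any; // Typed loosely for the sketch.

    // Programmatic selection by value range (commit offsets or timestamps,
    // depending on the domain attribute). If the chart is not ready yet, the
    // element caches this and applies it once google-chart-ready fires.
    summary.selectedValueRange = { begin: 81000, end: 81250 };

    // React to the user drawing, dragging, or resizing a selection.
    summary.addEventListener('summary_selected', (e: Event) => {
      const detail = (e as CustomEvent).detail; // Carries start, end, value, and domain.
      console.log('selected range', detail);
    });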

The design separates the concerns of data plotting (Google Charts), interactive range selection UI (h-resizable-box-sk), and the overall orchestration and data conversion logic (plot-summary-sk). This makes the system more modular and easier to maintain. The use of LitElement and contexts allows for a reactive programming model and clean integration with other parts of the Perf application.

Module: /modules/point-links-sk

The point-links-sk module is a custom HTML element designed to display links associated with specific data points in a performance analysis context. These links often originate from ingestion files and can include commit details, build logs, or other relevant resources.

The primary purpose of this module is to provide users with quick access to contextual information related to a data point. It achieves this by:

  1. Fetching and Displaying Links: The module fetches link data from a backend API based on a commit ID and a trace ID. It then renders these links as clickable anchor elements.
  2. Generating Commit Range Links: A key feature is its ability to generate links representing the range of commits between two data points. This is particularly useful for understanding changes that might have occurred between two performance measurements.
    • If the commit hashes for a given key (e.g., “V8 Git Hash”) are different between the current and previous data points, it constructs a URL that shows the log of commits between those two specific commit hashes.
    • If the commit hashes are the same, it simply links to the individual commit, indicating no change in that specific dependency.
  3. Caching: To optimize performance and avoid redundant API calls, the module can utilize a provided cache of previously loaded commit links. If the links for a specific commit and trace ID are already in the cache, it will use those instead of re-fetching.
  4. User-Friendly Presentation: Links are presented in a list format, with a “copy to clipboard” button for each link, enhancing usability.

Key Responsibilities and Components:

  • point-links-sk.ts: This is the core file defining the PointLinksSk custom element.
    • It extends ElementSk from infra-sk.
    • load() method: This is the main public method responsible for initiating the process of fetching and displaying links. It takes the current commit ID, the previous commit ID, a trace ID, and arrays of keys to identify which links should be treated as commit ranges and which are general “useful links”. It handles the logic for checking the cache, fetching data from the API, processing commit ranges, and updating the display.
    • getLinksForPoint() and invokeLinksForPointApi() methods: These private methods handle the actual API interaction to retrieve link data. getLinksForPoint attempts to fetch from /_/links/ first and falls back to /_/details/?results=false if the initial attempt fails. It also includes workarounds for specific data inconsistencies (e.g., V8 and WebRTC URLs).
    • renderPointLinks() and renderRevisionLink() methods: These methods, along with the static template, use lit-html to generate the HTML structure for displaying the links.
    • Helper methods (getCommitIdFromCommitUrl, getRepoUrlFromCommitUrl, getFormattedCommitRangeText, extractUrlFromStringForFuchsia): These provide utility functions for parsing URLs and formatting text.
    • Data properties (commitPosition, displayUrls, displayTexts): These store the state of the component, such as the current commit and the links to be displayed.
  • point-links-sk.scss: Provides the styling for the point-links-sk element, ensuring a consistent look and feel, including styling for Material Design icons and buttons.
  • index.ts: A simple entry point that imports and thereby registers the point-links-sk custom element.
  • point-links-sk-demo.html & point-links-sk-demo.ts: These files set up a demonstration page for the point-links-sk element. The point-links-sk-demo.ts file uses fetch-mock to simulate the backend API, allowing developers to test the component's behavior in isolation. It demonstrates how to instantiate and use the point-links-sk element with different configurations.

Workflow for Loading and Displaying Links:

The typical workflow when the load() method is called can be visualized as:

Caller invokes pointLinksSk.load(currentCID, prevCID, traceID, rangeKeys, usefulKeys, cachedLinks)
    |
    V
Check if links for (currentCID, traceID) exist in `cachedLinks`
    |
    +-- YES --> Use cached links
    |           |
    |           V
    |           Render links
    |
    +-- NO ---> Fetch links for `currentCID` from API (`getLinksForPoint`)
                |
                V
                If `rangeKeys` are provided:
                |   Fetch links for `prevCID` from API (`getLinksForPoint`)
                |   For each key in `rangeKeys`:
                |       Extract current commit hash from `currentCID` links
                |       Extract previous commit hash from `prevCID` links
                |       If hashes are different:
                |           Generate "commit range" URL (e.g., .../+log/prevHash..currentHash)
                |       Else (hashes are same):
                |           Use current commit URL
                |       Add to `displayUrls` and `displayTexts`
                |
                V
                If `usefulKeys` are provided:
                |   For each key in `usefulKeys`:
                |       Add corresponding link from `currentCID` links to `displayUrls`
                |
                V
                Update cache with newly fetched/generated links for (currentCID, traceID)
                |
                V
                Render links
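
In practice a caller wires the element up as in the hedged sketch below; the commit IDs, trace ID, link keys, and cache shape are illustrative values, and the argument order follows the workflow above.

    import './point-links-sk'; // Registers the custom element (path assumed).

    const pointLinks = document.createElement('point-links-sk') as any; // Typed loosely for the sketch.
    document.body.appendChild(pointLinks);

    // Links already fetched for other points can be reused to avoid re-fetching
    // (keyed per commit/trace; exact structure assumed).
    const cachedLinks = {};

    await pointLinks.load(
      12345,                                        // current commit ID (illustrative)
      12344,                                        // previous commit ID (illustrative)
      ',arch=x86,config=8888,test=draw_a_circle,',  // trace ID (illustrative)
      ['V8 Git Hash'],                              // keys rendered as commit-range links
      ['Build Log'],                                // keys rendered as direct "useful" links (hypothetical key)
      cachedLinks
    );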

This module is designed to be flexible, allowing the consuming application to specify which types of links should be processed for commit ranges and which should be displayed as direct links. The inclusion of error handling (via errorMessage) and the fallback mechanism in API calls (/_/links/ then /_/details/) make it more robust.

Module: /modules/progress

The progress module provides a mechanism for initiating and monitoring the status of long-running tasks on the server. This is crucial for user experience, as it allows the client to display progress information and avoid appearing unresponsive during lengthy operations.

The core of this module is the startRequest function. This function is designed to handle asynchronous server-side processes that might take a significant amount of time to complete.

How startRequest Works:

  1. Initiation:

    • It begins by sending an initial POST request to a specified startingURL with a given body. This request typically triggers the long-running task on the server.
    • If a spinner-sk element is provided, it's activated to visually indicate that a process is underway.
  2. Polling:

    • The server's response to the initial request (and subsequent polling requests) is expected to be a JSON object of type progress.SerializedProgress. This object contains:
      • status: Indicates whether the task is “Running” or “Finished” (or potentially other states like “Error”).
      • messages: An array of key-value pairs providing more detailed information about the current state of the task (e.g., current step, progress percentage).
      • url: If the status is “Running”, this URL is used for the next polling request to get updated progress.
      • results: If the status is “Finished”, this field contains the final output of the long-running process.
    • If the status is “Running”, startRequest will schedule a setTimeout to make a GET request to the url provided in the response after a specified period. This creates a polling loop.
  3. Callback and Completion:

    • An optional callback function can be provided. This function is invoked after each successful fetch (both the initial request and every polling update), receiving the progress.SerializedProgress object. This allows the UI to update with the latest progress information.
    • The polling continues until the server responds with a status that is not “Running” (e.g., “Finished”).
    • Once the task is complete, the Promise returned by startRequest resolves with the final progress.SerializedProgress object.
    • If a spinner-sk was provided, it is deactivated.
  4. Error Handling:

    • If any network request fails (e.g., non-2xx HTTP status), the Promise returned by startRequest is rejected with an error.
    • The spinner (if provided) is also deactivated in case of an error.
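
A usage sketch, assuming a signature along the lines of startRequest(url, body, period, spinner, callback); the parameter order and the endpoint shown are assumptions, so consult progress.ts for the authoritative signature.

    import { startRequest } from './progress'; // Path assumed.

    const spinner = document.querySelector('spinner-sk'); // Optional spinner-sk element.

    const finished = await startRequest(
      '/_/some/long/running/endpoint', // Hypothetical starting URL.
      { q: 'config=8888' },            // Body for the initial POST.
      300,                             // Polling period in ms (assumed).
      spinner as any,
      (prog: any) => {
        // Invoked after the initial request and after every poll.
        console.log('status:', prog.status, prog.messages);
      }
    );

    // Resolves once status is no longer "Running"; results holds the final output.
    console.log('final results:', finished.results);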

Workflow Diagram:

Client UI                 startRequest Function                 Server
----------                ---------------------                 ------
   |                          |
   | -- Call startRequest --> |
   |                          | -- POST to startingURL (body) --> |
   |                          |                                   |
   |                          | <-- Response (SerializedProgress) -- |
   |                          |
   | -- (Optional) Activate -- |
   |      Spinner            |
   |                          |
   |                          | -- If status is "Running": --------> Schedule setTimeout(period)
   |                          |     |
   |                          |     V
   |                          | -- GET to progress.url -----------> |
   |                          |                                   |
   |                          | <-- Response (SerializedProgress) -- |
   |                          |     |
   |                          |     --- (Invoke callback) ---------> Client UI (Update progress)
   |                          |     |
   |                          |     --- Loop back to "If status is 'Running'"
   |                          |
   |                          | -- If status is "Finished": -------> Resolve Promise
   |                          |                                     |
   | -- (Optional) Deactivate | <-----------------------------------
   |      Spinner            |
   |                          |
   | <-- Promise Resolves ---- |
   |     (SerializedProgress)  |

Key Files:

  • progress.ts:

    • Responsibilities: Implements the core logic for initiating requests, polling for status updates, handling responses, and managing callbacks and promises. It also provides utility functions for formatting progress messages.
    • Key Components:
    • startRequest: The primary function that orchestrates the entire progress monitoring flow. It encapsulates the logic for making the initial POST request and subsequent GET requests for polling. The use of a single processFetch internal function is a design choice to reduce code duplication, as the response handling logic is identical for both the initial and polling fetches.
    • messagesToErrorString: A utility function designed to extract a user-friendly error message from the messages array within SerializedProgress. It prioritizes messages with the key “Error” but falls back to concatenating all messages if no specific error message is found. This ensures that some form of feedback is available even if the server doesn't explicitly flag an error.
    • messagesToPreString: Formats messages for display, typically within a <pre> tag, by putting each key-value pair on a new line. This is useful for presenting detailed progress logs.
    • messageByName: Allows retrieval of a specific message's value by its key from the messages array, with a fallback if the key is not found. This is useful for extracting specific pieces of information from the progress updates (e.g., the current step number).
    • Dependencies:
    • elements-sk/modules/spinner-sk: Used to visually indicate that a background task is in progress.
    • perf/modules/json: Provides the progress.SerializedProgress type definition, ensuring consistency in how progress information is structured between the client and server.
  • progress_test.ts:

    • Responsibilities: Contains unit tests for the progress.ts module.
    • Key Focus:
    • Verifies that startRequest correctly handles different server response scenarios: immediate completion, one or more polling steps, and network errors.
    • Ensures that the optional callback is invoked correctly during the polling process.
    • Tests the behavior of the message formatting utility functions (messagesToErrorString, messageByName) with various inputs.
    • Methodology: Uses fetch-mock to simulate server responses, allowing for controlled testing of the asynchronous network interactions without relying on an actual backend. This is crucial for creating reliable and fast unit tests.

The design of this module prioritizes a clear separation of concerns. startRequest focuses on the communication and polling logic, while the utility functions provide convenient ways to interpret and display the progress information received from the server. The use of Promises simplifies handling asynchronous operations, and the optional callback provides flexibility for updating the UI in real-time.

Module: /modules/query-chooser-sk

Query Chooser Element (query-chooser-sk)

The query-chooser-sk module provides a user interface element for selecting and modifying query parameters. It's designed to offer a compact way to display the currently active query and provide a mechanism to change it through a dialog.

Core Functionality and Design

The primary goal of query-chooser-sk is to present a summarized view of the current query and allow users to edit it in a more detailed interface. This is achieved by:

  1. Displaying a summary: The current query is displayed in a concise format using the paramset-sk element. This gives users a quick overview of the active filters.
  2. Providing an “Edit” button: This button triggers the display of a dialog.
  3. Embedding query-sk in a dialog: The dialog contains a query-sk element. This is where the user can interactively build or modify their query by selecting values for different parameters.
  4. Showing query match count: Alongside the query-sk element, query-count-sk is used to display how many items match the currently constructed query. This provides immediate feedback to the user as they refine their selection.
  5. Event propagation: query-chooser-sk listens for query-change events from the embedded query-sk element. When a change occurs, query-chooser-sk updates its own current_query and re-renders, effectively propagating the change. It also emits its own query-change event, allowing parent components to react to query modifications.

This design separates the concerns of displaying the current state from the more complex interaction of query building. The dialog provides a focused environment for query modification without cluttering the main UI.

Key Components and Files

  • query-chooser-sk.ts: This is the core TypeScript file defining the QueryChooserSk custom element.
    • It manages the visibility of the editing dialog.
    • It orchestrates the interaction between the summary display (paramset-sk), the query editing interface (query-sk), and the match count display (query-count-sk).
    • It defines properties like current_query, paramset, key_order, and count_url which are essential for its operation and for configuring its child elements.
    • The _editClick and _closeClick methods handle the opening and closing of the dialog.
    • The _queryChange method is crucial for reacting to changes in the embedded query-sk element and updating the current_query.
  • query-chooser-sk.html (template within query-chooser-sk.ts): This Lit HTML template defines the structure of the element.
    • It includes a div with class row to display the “Edit” button and the paramset-sk summary.
    • Another div with id dialog acts as the container for query-sk, query-count-sk, and the “Close” button. The visibility of this dialog is controlled by adding/removing the display class.
  • query-chooser-sk.scss: This file provides the styling for the element. It ensures proper layout of the button, summary, and the dialog content. It also includes theming support.
  • index.ts: A simple entry point that imports and registers the query-chooser-sk custom element.
  • query-chooser-sk-demo.html / query-chooser-sk-demo.ts: These files provide a demonstration page for the element, showcasing its usage with sample data and event handling. fetchMock is used in the demo to simulate the count_url endpoint.
  • query-chooser-sk_puppeteer_test.ts: Contains Puppeteer tests to verify the rendering and basic functionality of the element.

Workflow: Editing a Query

The typical workflow for a user interacting with query-chooser-sk is as follows:

User sees current query summary & "Edit" button
  |
  | (User clicks "Edit")
  V
Dialog appears, showing:
  - `query-sk` (for selecting parameters/values)
  - `query-count-sk` (displaying number of matches)
  - "Close" button
  |
  | (User interacts with `query-sk`, changing selections)
  V
`query-sk` emits "query-change" event
  |
  V
`query-chooser-sk` (_queryChange method):
  - Updates its `current_query` property/attribute
  - Re-renders to reflect new `current_query` in summary & `query-count-sk`
  - Emits its own "query-change" event (for parent components)
  |
  | (User is satisfied with the new query)
  V
User clicks "Close"
  |
  V
Dialog is hidden
  |
  V
`query-chooser-sk` displays the updated query summary.

The paramset attribute is crucial as it provides the available keys and values that query-sk will use to render its selection interface. The key_order attribute influences the order in which parameters are displayed within query-sk. The count_url is passed directly to query-count-sk to fetch the number of matching items for the current query.
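
A minimal configuration sketch follows; the paramset contents and the count endpoint are sample values, and the event detail shape is an assumption.

    import './query-chooser-sk'; // Registers the custom element (path assumed).

    const chooser = document.createElement('query-chooser-sk') as any; // Typed loosely for the sketch.
    chooser.paramset = {
      arch: ['x86', 'arm'],
      config: ['8888', 'gles'],
      test: ['draw_a_circle'],
    };
    chooser.key_order = ['config'];       // Parameters to list first inside query-sk.
    chooser.current_query = 'config=8888';
    chooser.count_url = '/_/count/';      // Endpoint passed through to query-count-sk (illustrative).
    document.body.appendChild(chooser);

    // React when the user edits the query in the dialog.
    chooser.addEventListener('query-change', (e: Event) => {
      console.log('new query:', (e as CustomEvent).detail); // Detail shape assumed.
    });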

Module: /modules/query-count-sk

The query-count-sk module provides a custom HTML element designed to display the number of results matching a given query. Its primary purpose is to offer a dynamic and responsive way to inform users about the scope of their queries in real-time, without requiring a full page reload or complex UI updates. This is particularly useful in applications where users frequently refine search criteria and need immediate feedback on the impact of those changes.

The core functionality revolves around the QueryCountSk class, which extends ElementSk. This class manages the state of the displayed count, handles asynchronous data fetching, and updates the UI accordingly.

Key Components and Design Decisions:

  • query-count-sk.ts: This is the heart of the module.
    • Asynchronous Data Fetching: When the current_query or url attributes change, the element initiates a POST request to the specified url.
    • The request body includes the current_query, and a default time window of the last 24 hours (begin and end timestamps). This design choice implies that the element is typically used for querying recent data.
    • To prevent race conditions and unnecessary network requests, any ongoing fetch operation is aborted if a new query is initiated. This is achieved using an AbortController. This is a crucial design decision for performance and responsiveness, especially when users rapidly change query parameters.
    • The component expects a JSON response with a count (number of matches) and a paramset (a read-only representation of parameters related to the query).
    • State Management: The _count property stores the fetched count as a string, and _requestInProgress is a boolean flag indicating whether a fetch operation is currently active. This flag is used to show/hide a loading spinner (spinner-sk).
    • Rendering: The component uses lit-html for efficient template rendering. The template displays the _count and the spinner-sk conditionally.
    • Event Emission: Upon successful data retrieval, a paramset-changed custom event is dispatched. This event carries the paramset received from the server. This allows other components on the page to react to changes in the available parameters based on the current query results. This decoupling is a key design aspect for building modular UIs.
    • Error Handling: Network errors or non-OK HTTP responses are caught, and an error message is displayed to the user via the errorMessage utility (likely from perf/modules/errorMessage). AbortErrors are handled gracefully by simply stopping the current operation without displaying an error, as this usually means the user initiated a new action.
  • query-count-sk.scss: Provides styling for the element, ensuring the count and spinner are displayed appropriately. The display: inline-block and flexbox layout for the internal div are chosen for simple alignment of the count and spinner.
  • query-count-sk-demo.html and query-count-sk-demo.ts: These files provide a demonstration and testing environment for the query-count-sk element.
    • The demo sets up a fetch-mock to simulate server responses, allowing for isolated testing of the component's behavior.
    • It showcases how to instantiate the element and interact with its attributes (url, current_query).
    • The presence of <error-toast-sk> in the demo suggests that this is the intended mechanism for displaying errors surfaced by errorMessage.
  • index.ts: A simple entry point that imports and registers the query-count-sk custom element, making it available for use in an HTML page.
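
A short usage sketch, assuming the attribute and event names described above; the endpoint is illustrative.

    import './query-count-sk'; // Registers the custom element (path assumed).

    const counter = document.createElement('query-count-sk');
    counter.setAttribute('url', '/_/count/');              // Backend endpoint (illustrative).
    counter.setAttribute('current_query', 'config=8888');  // Changing this re-triggers the fetch.
    document.body.appendChild(counter);

    // Other elements (e.g. a query builder) can narrow their options based on the
    // paramset dispatched alongside the count.
    counter.addEventListener('paramset-changed', (e: Event) => {
      console.log('paramset for matching traces:', (e as CustomEvent).detail);
    });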

Workflow for Displaying Query Count:

  1. Initialization:

    • The query-count-sk element is added to the DOM.
    • The url attribute (pointing to the backend endpoint) is set.
    Page         query-count-sk
     |                |
     |--(Set url)---->|
    
  2. Query Update:

    • The current_query attribute is set or updated (e.g., by user input in another part of the application).
    Page         query-count-sk
     |                |
     |--(Set current_query)-->|
    
  3. Data Fetching:

    • The attributeChangedCallback (or connectedCallback on initial load) triggers the _fetch() method.
    • If a previous fetch is in progress, it's aborted.
    • _requestInProgress is set to true, and the spinner becomes visible.
    • A POST request is made to this.url with the current_query and time range.
    query-count-sk                           Server
        |                                      |
        |--(Set _requestInProgress=true)------>| (Spinner shows)
        |                                      |
        |----(POST / {q: current_query, ...})-->|
    
  4. Response Handling:

    • Success:

      • The server responds with JSON: { count: N, paramset: {...} }.
      • _count is updated with N.
      • _requestInProgress is set to false (spinner hides).
      • The component re-renders to display the new count.
      • A paramset-changed event is dispatched with the paramset.
      query-count-sk                           Server
          |                                      |
          |<----(HTTP 200, {count, paramset})----|
          |                                      |
          |--(Update _count, _requestInProgress=false)-->| (Spinner hides, count updates)
          |                                      |
          |--(Dispatch 'paramset-changed')------>| (Other components may react)
      
    • Error (e.g., network issue, server error):

      • _requestInProgress is set to false (spinner hides).
      • An error message is displayed (e.g., via error-toast-sk).
      query-count-sk                           Server
          |                                      |
          |<----(HTTP Error or Network Error)----|
          |                                      |
          |--(Set _requestInProgress=false)------>| (Spinner hides)
          |                                      |
          |--(Display error message)------------>|
      
    • Abort:

      • If the fetch was aborted (e.g., new query initiated before completion), the catch block for AbortError is entered.
      • No UI update for count or error display happens; the new fetch operation takes precedence.

The design emphasizes responsiveness by aborting stale requests and provides a clear visual indication of ongoing activity (the spinner). The paramset-changed event promotes loose coupling between components, allowing other parts of the application to adapt based on the query results without direct dependencies on query-count-sk's internal implementation.

Module: /modules/regressions-page-sk

The regressions-page-sk module provides a user interface for viewing and managing performance regressions. It allows users to select a “subscription” (often representing a team or area of ownership, like “Sheriff Config 1”) and then displays a list of detected performance anomalies (regressions or improvements) associated with that subscription.

The core functionality revolves around fetching and displaying this data in a user-friendly way.

Key Responsibilities and Components:

  • regressions-page-sk.ts: This is the main TypeScript file that defines the RegressionsPageSk custom HTML element.

    • State Management (State interface, stateReflector): The component maintains its UI state (selected subscription, whether to show triaged items or improvements, and a flag for using a Skia-specific backend) in the state object. The stateReflector utility is crucial here. It synchronizes this internal state with the URL query parameters. This means a user can bookmark a specific view (e.g., a particular subscription with improvements shown) and share it, or refresh the page and return to the same state.
    • Why stateReflector? It provides a clean way to manage application state that needs to be persistent across page loads and shareable via URLs, without manually parsing and updating the URL (a wiring sketch follows this list).
    • Data Fetching (fetchRegressions, init):
    • The init method is called during component initialization and whenever the state changes significantly (like selecting a new subscription). It fetches the list of available subscriptions (sheriff lists) from either a legacy endpoint (/_/anomalies/sheriff_list) or a Skia-specific one (/_/anomalies/sheriff_list_skia) based on the state.useSkia flag. The fetched subscriptions are then sorted alphabetically for display in a dropdown.
    • The fetchRegressions method is responsible for fetching the actual anomaly data. It constructs a query based on the current state (selected subscription, filters for triaged/improvements, and a cursor for pagination). It also chooses between legacy and Skia-specific anomaly list endpoints. The fetched anomalies are then appended to the cpAnomalies array, and if a cursor is returned, a “Show More” button is made visible.
    • Why two sets of endpoints (legacy vs. Skia)? This suggests a migration path or different data sources/backends being supported, allowing the component to adapt based on configuration.
    • Rendering (template, _render): The component uses lit-html for templating. The template static method defines the HTML structure, which includes:
    • A dropdown (<select id="filter">) to choose a subscription.
    • Buttons to toggle the display of triaged items and improvements.
    • A <subscription-table-sk> to display details about the selected subscription and its associated alerts.
    • An <anomalies-table-sk> to display the list of anomalies/regressions.
    • Spinners (spinner-sk) to indicate loading states.
    • A “Show More” button for paginating through anomalies.
    • The _render() method (implicitly called by ElementSk when properties change) re-renders the component with the latest data.
    • Event Handling (filterChange, triagedChange, improvementChange): These methods handle user interactions like selecting a subscription or toggling filters. They update the component's state, trigger stateHasChanged (which in turn updates the URL and can re-fetch data), and then explicitly call fetchRegressions and _render to reflect the changes.
    • Legacy Regression Display (getRegTemplate, regRowTemplate): There's also code related to displaying regressions directly in a table within this component (the regressions property and getRegTemplate). However, the primary display of anomalies seems to be delegated to anomalies-table-sk. This older regression display logic might be for a previous version or a specific use case not currently active in the demo. The isRegressionImprovement static method determines if a given regression object represents an improvement based on direction and cluster type.
  • anomalies-table-sk (external dependency): This component is responsible for rendering the detailed table of anomalies. regressions-page-sk fetches the anomaly data and then passes it to anomalies-table-sk for display. This promotes modularity, separating data fetching/management from presentation.

  • subscription-table-sk (external dependency): This component displays information about the currently selected subscription, including any configured alerts. Similar to anomalies-table-sk, it receives data from regressions-page-sk.

  • regressions-page-sk.scss: Provides styling for the regressions-page-sk component, including colors for positive/negative changes and styles for spinners and buttons.

  • regressions-page-sk-demo.html and regressions-page-sk-demo.ts: These files set up a demonstration page for the regressions-page-sk component.

    • regressions-page-sk-demo.ts is particularly important for understanding how the component is intended to be used and tested. It initializes a global window.perf object with configuration settings that the main component might rely on (though direct usage isn't evident in regressions-page-sk.ts itself, it's a common pattern in Perf).
    • It uses fetchMock to simulate API responses for /users/login/status, /_/subscriptions, and /_/regressions (which seems to be an older endpoint pattern compared to what regressions-page-sk.ts uses). This mocking is crucial for creating a standalone demo environment.
    • Why fetchMock? It allows developers to work on and test the UI component without needing a live backend, ensuring predictable data and behavior for demos and tests.
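
The state/URL synchronization mentioned earlier in this list is typically wired as sketched below, assuming the infra-sk stateReflector(getState, setState) helper that returns a stateHasChanged callback; the exact field names and the import path are assumptions.

    import { stateReflector } from '../../../infra-sk/modules/statereflector'; // Path assumed.

    interface State {
      selectedSubscription: string;
      showTriaged: boolean;
      showImprovements: boolean;
      useSkia: boolean;
    }

    let state: State = {
      selectedSubscription: '',
      showTriaged: false,
      showImprovements: false,
      useSkia: false,
    };

    // The first callback exposes the current state for serialization into the URL;
    // the second restores state from the URL (e.g. on page load or back/forward).
    const stateHasChanged = stateReflector(
      () => ({ ...state } as any),
      (restored: any) => {
        state = restored as State;
        // Re-fetch subscriptions/anomalies for the restored selection here.
      }
    );

    // Example: toggling the "show triaged" filter updates the URL so the view can
    // be bookmarked or shared.
    state.showTriaged = !state.showTriaged;
    stateHasChanged();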

Workflow for Displaying Regressions:

  1. Initialization (connectedCallback, init):

    • regressions-page-sk element is added to the DOM.
    • stateReflector is set up to read initial state from URL or use defaults.
    • init() is called:
      • Fetches the list of available subscriptions (e.g., “Sheriff Config 1”, “Sheriff Config 2”).
      • Populates the subscription dropdown (<select id="filter">).
  2. User Selects a Subscription (filterChange):

    • User selects “Sheriff Config 2” from the dropdown.
    • filterChange("Sheriff Config 2") is triggered.
    • state.selectedSubscription is updated to “Sheriff Config 2”.
    • cpAnomalies is cleared, anomalyCursor is reset.
    • stateHasChanged() is called, updating the URL (e.g., ?selectedSubscription=Sheriff%20Config%202).
    • fetchRegressions() is called.
  3. Fetching Anomalies (fetchRegressions):

    • An API request is made to /_/anomalies/anomaly_list?sheriff=Sheriff%20Config%202 (or the Skia equivalent).
    • A loading spinner is shown.
    • The server responds with a list of anomalies and potentially a cursor for pagination.
  4. Displaying Anomalies:

    • The fetched anomalies are appended to this.cpAnomalies.
    • The subscriptionTable is updated with subscription details and alerts from the response.
    • The anomaliesTable (the anomalies-table-sk instance) is populated with this.cpAnomalies.
    • If a cursor was returned, the “Show More” button becomes visible.
    • Loading spinner is hidden.
    • The component re-renders.
    User Action                      Component State                     API Interaction                      UI Update
    -----------                      ---------------                     ---------------                      ---------
    Page Load
      |
      V
    regressions-page-sk.init()
      |                              state = {selectedSubscription:''}
      V
    fetch('/_/anomalies/sheriff_list') -> ["Sheriff1", "Sheriff2"]
      |                              subscriptionList = ["Sheriff1", "Sheriff2"]
      V
                                                                                                       Populate dropdown
                                                                                                       Disable filter buttons
    
    Selects "Sheriff1"
      |
      V
    regressions-page-sk.filterChange("Sheriff1")
      |                              state = {selectedSubscription:'Sheriff1', ...}
      |                              (URL updates via stateReflector)
      V
    regressions-page-sk.fetchRegressions()
      |                              anomaliesLoadingSpinner = true
      V
    fetch('/_/anomalies/anomaly_list?sheriff=Sheriff1') -> {anomaly_list: [...], anomaly_cursor: 'cursor123'}
      |                              cpAnomalies = [...], anomalyCursor = 'cursor123', showMoreAnomalies = true
      |                              anomaliesLoadingSpinner = false
      V
                                                                                                       Update anomaliesTable
                                                                                                       Update subscriptionTable
                                                                                                       Show "Show More" button
                                                                                                       Enable filter buttons
    Clicks "Show More"
      |
      V
    regressions-page-sk.fetchRegressions() (called by button click)
      |                              showMoreLoadingSpinner = true
      V
    fetch('/_/anomalies/anomaly_list?sheriff=Sheriff1&anomaly_cursor=cursor123') -> {anomaly_list: [more...], anomaly_cursor: null}
      |                              cpAnomalies = [all...], anomalyCursor = null, showMoreAnomalies = false
      |                              showMoreLoadingSpinner = false
      V
                                                                                                       Update anomaliesTable (append)
                                                                                                       Hide "Show More" button
    
  5. Toggling Filters (e.g., “Show Triaged”, triagedChange):

    • User clicks “Show Triaged”.
    • triagedChange() is triggered.
    • state.showTriaged is toggled.
    • Button text updates (e.g., to “Hide Triaged”).
    • stateHasChanged() updates the URL (e.g., ?selectedSubscription=Sheriff%20Config%202&showTriaged=true).
    • fetchRegressions() is called again, this time with triaged=true in the query.
    • The UI updates with the newly filtered list of anomalies.

The design separates concerns: regressions-page-sk handles overall page logic, state, and orchestration of data fetching, while specialized components like anomalies-table-sk and subscription-table-sk handle the rendering of specific data views. The use of stateReflector ensures the UI state is bookmarkable and shareable. The demo files with fetchMock are critical for isolated development and testing of the UI component.
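
To make the orchestration concrete, the following is a simplified, framework-free sketch of the select-then-fetch flow described above. The method and property names (filterChange, fetchRegressions, stateHasChanged, cpAnomalies, anomalyCursor) are taken from the description; the request parameters and response handling are assumptions, not the component's actual implementation.

```
// Simplified sketch of the select-then-fetch flow (not the real class).
class RegressionsPageSketch {
  private state = { selectedSubscription: '', showTriaged: false };

  private cpAnomalies: unknown[] = [];

  private anomalyCursor: string | null = null;

  // Supplied by stateReflector to push the new state into the URL.
  private stateHasChanged: () => void = () => {};

  async filterChange(subscription: string): Promise<void> {
    this.state.selectedSubscription = subscription;
    this.cpAnomalies = []; // clear previously loaded anomalies
    this.anomalyCursor = null; // reset pagination
    this.stateHasChanged(); // reflect the selection into the URL
    await this.fetchRegressions();
  }

  async fetchRegressions(): Promise<void> {
    const params = new URLSearchParams({ sheriff: this.state.selectedSubscription });
    if (this.state.showTriaged) params.set('triaged', 'true');
    if (this.anomalyCursor) params.set('anomaly_cursor', this.anomalyCursor);

    const resp = await fetch(`/_/anomalies/anomaly_list?${params}`);
    const body = await resp.json();
    this.cpAnomalies = this.cpAnomalies.concat(body.anomaly_list ?? []);
    this.anomalyCursor = body.anomaly_cursor || null; // null hides "Show More"
  }
}
```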

Module: /modules/report-page-sk

The report-page-sk module is designed to display a detailed report page for performance anomalies. Its primary purpose is to provide users with a comprehensive view of selected anomalies, including their associated graphs and commit information, facilitating the analysis and understanding of performance regressions or improvements.

At its core, the report-page-sk element orchestrates the display of several key pieces of information. It fetches anomaly data from a backend endpoint (/_/anomalies/group_report) based on URL parameters (like revision, anomaly IDs, bug ID, etc.). This data is then used to populate an anomalies-table-sk element, which presents a tabular view of the anomalies.

A crucial design decision is the use of an AnomalyTracker class. This class is responsible for managing the state of each anomaly, including whether it's selected (checked) by the user, its associated graph, and the relevant time range for graphing. This separation of concerns keeps the main ReportPageSk class cleaner and focuses its responsibilities on rendering and user interaction.

When an anomaly is selected in the table, report-page-sk dynamically generates and displays an explore-simple-sk graph for that anomaly. The explore-simple-sk element is configured to show data around the anomaly's occurrence, typically a week before and after, to provide context. If multiple anomalies are selected, their graphs are displayed, and their heights are adjusted to fit the available space. A key feature is the synchronized X-axis across all displayed graphs, ensuring a consistent time scale for comparison.

The page also attempts to identify and display common commits related to the selected anomalies. It fetches commit details using the lookupCids function and highlights commits that appear to be “roll” commits (e.g., “Roll repo from hash to hash”). For these roll commits, it provides a link to the underlying commit or the parent commit if the roll pattern is not directly parseable from the commit message, which can be helpful for developers to trace the source of a change.

Key Components and Responsibilities:

  • report-page-sk.ts: This is the main TypeScript file defining the ReportPageSk custom element.

    • ReportPageSk class:
    • Initialization: Fetches default configurations (/_/defaults/) and then anomaly data based on URL parameters.
    • Anomaly Management: Uses an AnomalyTracker instance to manage the state of individual anomalies (selected, graphed, time range).
    • Rendering: Dynamically renders the anomalies-table-sk and explore-simple-sk graphs based on user interactions and fetched data. It uses the lit-html library for templating.
    • Event Handling: Listens for anomalies_checked events from the anomalies-table-sk to update the displayed graphs. It also handles x-axis-toggled events from explore-simple-sk to synchronize the x-axis across multiple graphs.
    • Graph Generation: When an anomaly is selected, it creates an explore-simple-sk instance, configures its query based on the anomaly's test path, and sets the appropriate time range.
    • Commit Information: Fetches commit details relevant to the anomalies and displays a list of common commits, with special handling for “roll” commits.
    • Spinner: Shows a loading spinner (spinner-sk) during data fetching operations.
    • AnomalyTracker class:
    • State Management: Stores AnomalyDataPoint objects, each containing an Anomaly, its checked status, its associated ExploreSimpleSk graph instance (if any), and its Timerange.
    • Loading Data: Populates its internal tracker from a list of anomalies and their corresponding time ranges.
    • Accessors: Provides methods to get individual anomaly data, set/unset graphs, and retrieve lists of all or selected anomalies. This abstraction is key to decoupling the graph display logic from the raw anomaly data.
    • AnomalyDataPoint interface: Defines the structure for storing information about a single anomaly within the AnomalyTracker.
  • report-page-sk.scss: Contains the SASS/CSS styles for the report-page-sk element, including styling for the common commits section and the dialog for displaying all commits (though the dialog itself is not fully implemented in the provided showAllCommitsTemplate).

  • Data Fetching Workflow:

    1. ReportPageSk element is connected to the DOM.
    2. URL parameters (e.g., rev, anomalyIDs, bugID) are read.
    3. fetchAnomalies() is called.
      • POST request to /_/anomalies/group_report with URL parameters in the body.
      • Backend responds with anomaly_list, timerange_map, and selected_keys.
    4. AnomalyTracker is loaded with this data.
    5. anomalies-table-sk is populated.
    6. Graphs for initially selected anomalies are rendered.
  • User Interaction Workflow (Selecting an Anomaly):

    1. User checks/unchecks an anomaly in anomalies-table-sk.
    2. anomalies-table-sk fires an anomalies_checked custom event with the anomaly and its checked state.
    3. ReportPageSk listens for this event.
    4. updateGraphs() is called:
      • If checked and no graph exists:
      • addGraph() is called.
      • A new explore-simple-sk instance is created and configured.
      • The graph is added to the DOM.
      • The AnomalyTracker is updated with the new graph instance.
      • If unchecked and a graph exists:
      • The graph is removed from the DOM.
      • The AnomalyTracker is updated to remove the graph reference.
    5. updateChartHeights() is called to adjust the height of all visible graphs.

The design emphasizes dynamic content loading and interactive exploration. By using separate custom elements for the table (anomalies-table-sk) and graphs (explore-simple-sk), the module maintains a good separation of concerns and leverages reusable components. The AnomalyTracker further enhances this by encapsulating the state and logic related to individual anomalies.
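
As a mental model, the tracker can be pictured as a keyed store of per-anomaly state. The sketch below uses trimmed-down, assumed types (the real Anomaly, Timerange, and ExploreSimpleSk definitions live in Perf's own modules) and a Map keyed by anomaly id, which is an illustrative choice rather than the actual data structure.

```
// Illustrative stand-in types; the real ones come from Perf's json and explore-simple-sk modules.
interface Anomaly { id: number; test_path: string; }
interface Timerange { begin: number; end: number; }
interface Graph { remove(): void; }

interface AnomalyDataPoint {
  anomaly: Anomaly;
  selected: boolean;    // whether the row is checked in the table
  graph: Graph | null;  // the explore-simple-sk instance, if one is shown
  timerange: Timerange; // range to plot (roughly a week around the anomaly)
}

class AnomalyTrackerSketch {
  private tracker = new Map<number, AnomalyDataPoint>();

  load(anomalies: Anomaly[], timeranges: Map<number, Timerange>): void {
    anomalies.forEach((a) =>
      this.tracker.set(a.id, {
        anomaly: a,
        selected: false,
        graph: null,
        timerange: timeranges.get(a.id)!,
      }));
  }

  setGraph(id: number, graph: Graph | null): void {
    const entry = this.tracker.get(id);
    if (entry) entry.graph = graph;
  }

  selectedAnomalies(): Anomaly[] {
    return Array.from(this.tracker.values())
      .filter((e) => e.selected)
      .map((e) => e.anomaly);
  }
}
```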

Module: /modules/revision-info-sk

The revision-info-sk custom HTML element is designed to display information about anomalies detected around a specific revision. This is particularly useful for understanding the impact of a code change on performance metrics.

The core functionality revolves around fetching and presenting RevisionInfo objects. A RevisionInfo object contains details like the benchmark, bot, bug ID, start and end revisions of an anomaly, the associated test, and links to explore the anomaly further.

Key Components and Workflow:

  1. revision-info-sk.ts: This is the main TypeScript file defining the RevisionInfoSk element.

    • State Management: The element maintains its state in a State object, primarily storing the revisionId. It utilizes stateReflector from infra-sk/modules/statereflector to keep the URL in sync with the element’s state. This allows users to share links that directly open to a specific revision’s information.

      • URL change -> stateReflector updates State.revisionId -> getRevisionInfo() is called
      • User types revision ID and clicks "Get Revision Information" -> State.revisionId updated -> stateReflector updates URL -> getRevisionInfo() is called
    • Data Fetching (getRevisionInfo): When a revision ID is provided (either via URL or user input), this method is triggered.

      • It displays a spinner (spinner-sk) to indicate loading.
      • It makes a fetch request to the /_/revision/?rev=<revisionId> endpoint.
      • The JSON response, an array of RevisionInfo objects, is parsed using jsonOrThrow.
      • The fetched revisionInfos are stored, and the UI is re-rendered to display the information.
    • Rendering (template, getRevInfosTemplate, revInfoRowTemplate): Lit-html is used for templating.

      • The main template (template) includes an input field for the revision ID, a button to trigger fetching, a spinner, and a container for the revision information.
      • getRevInfosTemplate generates an HTML table if revisionInfos is populated. This table includes a header row with a “select all” checkbox and columns for bug ID, revision range, master, bot, benchmark, and test.
      • revInfoRowTemplate renders each individual RevisionInfo as a row in the table. Each row has a checkbox for selection, a link to the bug (if any), a link to explore the anomaly, and the other relevant details.
    • Multi-Graph Functionality: The element allows users to select multiple detected anomaly ranges and view them together on a multi-graph page.

      • Selection: Checkboxes (checkbox-sk) are provided for each revision info row and a “select all” checkbox. The toggleSelectAll method handles the logic for the master checkbox.
      • updateMultiGraphStatus: This method is called whenever a checkbox state changes. It checks if any revisions are selected and enables/disables the “View Selected Graph(s)” button accordingly. It also updates the selectAll state if no individual revisions are checked.
      • getGraphConfigs: This helper function takes an array of selected RevisionInfo objects and transforms them into an array of GraphConfig objects. Each GraphConfig contains the query string associated with the anomaly.
      • getMultiGraphUrl: This asynchronous method constructs the URL for the multi-graph view.
      • It calls getGraphConfigs to get the configurations for the selected revisions.
      • It calls updateShortcut (from explore-simple-sk) to generate a shortcut ID for the combined graph configurations. This typically involves a POST request to /_/shortcut/update.
      • It determines the overall time range (begin and end timestamps) encompassing all selected anomalies.
      • It gathers all unique anomaly_ids from the selected revisions to highlight them on the multi-graph page.
      • It constructs the final URL, including the begin, end timestamps, the shortcut ID, the totalGraphs, and highlight_anomalies parameters.
      • viewMultiGraph: This method is called when the “View Selected Graph(s)” button is clicked.
      • It gathers all checked RevisionInfo objects.
      • It calls getMultiGraphUrl to generate the redirect URL.
      • If a URL is successfully generated, it navigates the current window (window.open(url, '_self')) to the multi-graph page. If not, it displays an error message.
    • Styling (revision-info-sk.scss): Provides basic styling for the element, such as left-aligning table headers and styling the spinner.

  2. index.ts: Simply imports and thereby registers the revision-info-sk custom element.

  3. Demo Page (revision-info-sk-demo.html, revision-info-sk-demo.ts, revision-info-sk-demo.scss):

    • Provides a simple HTML page to showcase the revision-info-sk element.
    • The revision-info-sk-demo.ts file uses fetch-mock to mock the /_/revision/ API endpoint. This is crucial for demonstrating the element's functionality without needing a live backend. When the demo page loads and the user interacts with the element (e.g., enters a revision ID ‘12345’), the mocked response is returned.

Design Decisions and Rationale:

  • Custom Element: Encapsulating this functionality as a custom element (<revision-info-sk>) promotes reusability across different parts of the Perf application or potentially other Skia web applications.
  • State Reflection: Using stateReflector enhances user experience by allowing direct navigation to a revision's details via URL and updating the URL as the user interacts with the element. This makes sharing and bookmarking specific views straightforward.
  • Lit-html for Templating: Lit-html is chosen for its efficiency and declarative approach to building UIs, making the rendering logic concise and maintainable.
  • Asynchronous Operations: Data fetching and shortcut generation are asynchronous operations. The use of async/await makes the code easier to read and manage compared to traditional Promise chaining.
  • Dedicated Multi-Graph URL Generation: The logic for constructing the multi-graph URL is encapsulated in getMultiGraphUrl. This separates concerns and makes the process of generating the complex URL clearer. It relies on the explore-simple-sk module's updateShortcut function, promoting reuse of existing shortcut generation logic.
  • Error Handling: jsonOrThrow is used to simplify error handling for fetch requests. The viewMultiGraph method also includes basic error handling if the URL generation fails.
  • Clear Separation of Concerns: The element focuses on displaying revision information and providing navigation to related views (bug tracker, explore page, multi-graph view). It doesn't concern itself with the details of how anomalies are detected or how the multi-graph page itself functions.
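
A minimal sketch of the fetch step behind getRevisionInfo, assuming the /_/revision/?rev=<id> endpoint described above and a trimmed-down RevisionInfo shape (the real interface carries more fields, and the real element uses jsonOrThrow rather than the manual status check shown here):

```
// Trimmed-down shape for illustration only.
interface RevisionInfo {
  bug_id: string;
  start_revision: string;
  end_revision: string;
  test: string;
}

async function getRevisionInfo(revisionId: string): Promise<RevisionInfo[]> {
  const resp = await fetch(`/_/revision/?rev=${encodeURIComponent(revisionId)}`);
  if (!resp.ok) {
    // jsonOrThrow in the real element raises on non-2xx responses.
    throw new Error(`Failed to fetch revision info: ${resp.status}`);
  }
  return resp.json() as Promise<RevisionInfo[]>;
}
```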

Workflow for Displaying Revision Information:

User Interaction / URL Change
       |
       v
[revision-info-sk] stateReflector updates internal 'state.revisionId'
       |
       v
[revision-info-sk] getRevisionInfo() called
       |
       +--------------------------------+
       |                                |
       v                                v
[revision-info-sk] shows spinner     [revision-info-sk] makes fetch request to `/_/revision/?rev=<ID>`
       |                                |
       |                                v
       |                          [Backend] processes request, returns RevisionInfo[]
       |                                |
       |                                v
       +------------------> [revision-info-sk] receives JSON response, parses with jsonOrThrow
                                        |
                                        v
                         [revision-info-sk] stores 'revisionInfos', hides spinner
                                        |
                                        v
                         [revision-info-sk] re-renders using Lit-html templates to display table

Workflow for Viewing Multi-Graph:

User selects one or more revision info rows (checkboxes)
       |
       v
[revision-info-sk] updateMultiGraphStatus() enables "View Selected Graph(s)" button
       |
       v
User clicks "View Selected Graph(s)" button
       |
       v
[revision-info-sk] viewMultiGraph() called
       |
       v
[revision-info-sk] collects selected RevisionInfo objects
       |
       v
[revision-info-sk] calls getMultiGraphUrl(selectedRevisions)
       |
       +------------------------------------------------------+
       |                                                      |
       v                                                      v
[getMultiGraphUrl] calls getGraphConfigs() to create GraphConfig[] [getMultiGraphUrl] calls updateShortcut(GraphConfig[])
       |                                                      | (makes POST to /_/shortcut/update)
       |                                                      v
       |                                                [Backend] returns shortcut ID
       |                                                      |
       +-------------------------------------> [getMultiGraphUrl] constructs final URL (with begin, end, shortcut, anomaly IDs)
                                                              |
                                                              v
                                     [viewMultiGraph] receives the multi-graph URL
                                                              |
                                                              v
                                     [Browser] navigates to the generated multi-graph URL
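
The final URL assembly can be sketched roughly as follows. The query parameter names (begin, end, shortcut, totalGraphs, highlight_anomalies) come from the description above, but the destination path and the exact encoding are assumptions; the real getMultiGraphUrl also awaits updateShortcut to obtain the shortcut ID first.

```
// Sketch only: assembles the redirect URL from already-computed pieces.
// The '/u/' destination path is a placeholder assumption.
function buildMultiGraphUrl(
  shortcutId: string,
  begin: number, // earliest timestamp across the selected anomalies
  end: number, // latest timestamp across the selected anomalies
  totalGraphs: number,
  anomalyIds: string[],
): string {
  const params = new URLSearchParams({
    begin: String(begin),
    end: String(end),
    shortcut: shortcutId,
    totalGraphs: String(totalGraphs),
    highlight_anomalies: anomalyIds.join(','),
  });
  return `/u/?${params.toString()}`;
}
```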

Module: /modules/split-chart-menu-sk

The split-chart-menu-sk module provides a user interface element for selecting an attribute by which to split a chart. This is particularly useful in data visualization scenarios where users need to break down aggregated data into smaller, more specific views. For example, in a performance monitoring dashboard, a user might want to see performance metrics split by benchmark, specific test case (story), or sub-component (subtest).

The core functionality revolves around presenting a list of available attributes to the user in a dropdown menu. These attributes are dynamically derived from the underlying data. When an attribute is selected, the component emits an event, allowing other parts of the application to react and update the chart display accordingly.

Key Components and Design:

  • split-chart-menu-sk.ts: This is the main TypeScript file that defines the SplitChartMenuSk LitElement.

    • Data Consumption: The component utilizes the Lit context API (@consume) to access data from two sources: dataframeContext and dataTableContext.
    • dataframeContext provides the DataFrame (from //perf/modules/json:index_ts_lib and //perf/modules/dataframe:dataframe_context_ts_lib). The DataFrame is the source from which the list of available attributes for splitting is derived. This design decouples the menu from the specifics of data fetching and management, allowing it to focus solely on the UI aspect of attribute selection. The getAttributes function (from //perf/modules/dataframe:traceset_ts_lib) is used to extract these attributes.
    • dataTableContext provides DataTable (also from //perf/modules/dataframe:dataframe_context_ts_lib). While consumed, its direct usage within this specific component’s rendering logic isn’t immediately apparent in the provided render method, but it might be used by other parts of the application or for future enhancements.
    • User Interaction:
    • A Material Design outlined button (<md-outlined-button>) labeled “Split By” serves as the trigger to open the menu.
    • The menu itself is a Material Design menu (<md-menu>), which is populated with <md-menu-item> elements, one for each attribute retrieved from the DataFrame.
    • The menuOpen state property controls the visibility of the menu. Clicking the button toggles this state. The menu also closes itself via the @closed event.
    • Event Emission: When a user clicks on a menu item, the bubbleAttribute method is called. This method dispatches a custom event named split-chart-selection.
    • The event detail (SplitChartSelectionEventDetails) contains the selected attribute (a string).
    • The event is configured to bubble (bubbles: true) and pass through shadow DOM boundaries (composed: true), making it easy for ancestor elements to listen and react to the selection. This event-driven approach is crucial for decoupling the menu from the chart component or any other component that needs to know about the selected split attribute.
    • Styling: Styles are imported from split-chart-menu-sk.css.ts (style). This keeps the component's presentation concerns separate from its logic. The styles ensure the component is displayed as an inline block and sets a default background color, also styling the Material button.
  • split-chart-menu-sk.css.ts: This file defines the CSS styles for the component using Lit’s css tagged template literal. The primary styling focuses on the host element’s positioning and background, and on customizing the Material Design button’s border radius.

  • index.ts: This file simply imports and registers the split-chart-menu-sk custom element, making it available for use in HTML.

Workflow: Selecting a Split Attribute

  1. Initialization:

    • The split-chart-menu-sk component is rendered.
    • It consumes the DataFrame from the dataframeContext.
    • The getAttributes() method is called (implicitly via the render method's map function) to populate the list of attributes for the menu.
  2. User Interaction:

    • User clicks the “Split By” button.
    • menuClicked handler is invoked -> this.menuOpen becomes true.
    • The <md-menu> component becomes visible, displaying the list of attributes.
    User          split-chart-menu-sk        DataFrame
    |                    |                       |
    |---Clicks "Split By"->|                       |
    |                    |---Toggles menuOpen=true-->|
    |                    |                       |
    |                    |<--Displays Menu-------|
    |                    |                       |
    
  3. Attribute Selection:

    • User clicks on an attribute in the menu (e.g., "benchmark").
    • The click handler on <md-menu-item> calls this.bubbleAttribute("benchmark").
    • bubbleAttribute creates a CustomEvent('split-chart-selection', { detail: { attribute: "benchmark" } }).
    • The event is dispatched.

    ```
    User          split-chart-menu-sk        (Parent Component)
    |                    |                       |
    |---Clicks "benchmark"->|                       |
    |                    |---Calls bubbleAttribute("benchmark")-->|
    |                    |                       |
    |                    |---Dispatches "split-chart-selection" event--> (Listens for event)
    |                    |                       |                       |
    |                    |                       |                       |---Handles event, updates chart
    ```
    
  4. Menu Closes:

    • The <md-menu> component emits a closed event.
    • The menuClosed handler is invoked -> this.menuOpen becomes false.

This design ensures that split-chart-menu-sk is a self-contained, reusable UI component whose sole responsibility is to provide a way to select a splitting attribute and communicate that selection to the rest of the application via a well-defined event. The use of context for data consumption and custom events for output makes it highly decoupled and easy to integrate.

The demo page (split-chart-menu-sk-demo.html and split-chart-menu-sk-demo.ts) demonstrates how to use the component and listen for the split-chart-selection event. The Puppeteer test (split-chart-menu-sk_puppeteer_test.ts) provides a basic smoke test and a visual regression test by taking a screenshot.
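
For reference, listening for the selection in a parent component needs only a few lines; a sketch, assuming the SplitChartSelectionEventDetails shape described above:

```
// The detail payload carries the attribute chosen in the menu.
interface SplitChartSelectionEventDetails {
  attribute: string;
}

const menu = document.querySelector('split-chart-menu-sk')!;
menu.addEventListener('split-chart-selection', (e: Event) => {
  const { attribute } = (e as CustomEvent<SplitChartSelectionEventDetails>).detail;
  // React to the selection, e.g. re-split the chart by this attribute.
  console.log(`Split chart by: ${attribute}`);
});
```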

Module: /modules/subscription-table-sk

The subscription-table-sk module provides a custom HTML element designed to display information about a “subscription” and its associated “alerts”. This is particularly useful in contexts where users need to understand the configuration of automated monitoring or alerting systems.

The core functionality is encapsulated within the subscription-table-sk.ts file, which defines the SubscriptionTableSk custom element. This element is built using Lit, a library for creating fast, lightweight web components.

Why and How:

The primary goal is to present complex subscription and alert data in a user-friendly and interactive manner. Instead of a static display, this component allows for toggling the visibility of the detailed alert configurations. This design choice avoids overwhelming the user with too much information upfront, providing a cleaner initial view focused on the subscription summary.

The SubscriptionTableSk element takes Subscription and Alert[] objects as input. The Subscription object contains general information like name, contact email, revision, bug tracking details (component, hotlists, priority, severity, CC emails). The Alert[] array holds detailed configurations for individual alerts, including their query parameters, step algorithm, radius, and other specific settings.

Key Responsibilities and Components:

  • subscription-table-sk.ts:
    • SubscriptionTableSk class: This is the heart of the module. It extends ElementSk, a base class for Skia custom elements.
    • Data Handling: It stores the subscription and alerts data internally.
    • Rendering Logic (template static method): It uses Lit's html tagged template literal to define the structure and content of the element. It conditionally renders the subscription details and the alerts table based on the available data and the showAlerts state.
      • Subscription details are always visible if a subscription is loaded.
      • The alerts table is only rendered if showAlerts is true. This state is toggled by a button.
    • load(subscription: Subscription, alerts: Alert[]) method: This public method is the primary way to feed data into the component. It updates the internal state and triggers a re-render.
    • toggleAlerts() method: This method flips the showAlerts boolean flag and triggers a re-render, effectively showing or hiding the alerts table.
    • formatRevision(revision: string) method: A helper function to display the revision string as a clickable link, pointing to a specific configuration file URL. This improves usability by allowing users to quickly navigate to the source of the configuration.
    • paramset-sk integration: For displaying the alert query, it utilizes the paramset-sk element. The toParamSet utility function (from infra-sk/modules/query) is used to convert the query string into a format suitable for paramset-sk, which then renders it as a structured set of key-value pairs. This enhances readability of complex query strings.
    • Styling (subscription-table-sk.scss): This file defines the visual appearance of the element. It uses SCSS and imports styles from shared libraries (themes_sass_lib, buttons_sass_lib, select_sass_lib) to maintain a consistent look and feel with other Skia elements. The styles focus on clear presentation of information, with distinct sections for subscription details and the alerts table.

Workflow: Displaying Subscription and Alerts

  1. Initialization: An instance of subscription-table-sk is added to the DOM. <subscription-table-sk></subscription-table-sk>
  2. Data Loading: External code (e.g., in a demo page or a larger application) calls the load() method on the element instance, passing in the Subscription object and an array of Alert objects. element.load(mySubscriptionData, myAlertsData);
  3. Initial Render:
    • The SubscriptionTableSk element updates its internal subscription and alerts properties.
    • showAlerts is set to false by default upon loading new data.
    • The _render() method is called (implicitly by Lit or explicitly).
    • The template function generates the HTML:
      • Subscription details (name, email, revision, etc.) are displayed.
      • A button labeled “Show [N] Alert Configurations” is displayed.
      • The alerts table is not rendered yet.
  4. User Interaction (Toggling Alerts):
    • The user clicks the “Show [N] Alert Configurations” button.
    • The click event triggers the toggleAlerts() method, and showAlerts becomes true.
    • _render() is called again.
    • The template function now also renders the <table id="alerts-table">:
      • The table header is displayed.
      • For each Alert object in ele.alerts, a table row (<tr>) is created.
      • Cells (<td>) display various alert properties (step algorithm, radius, k, etc.).
      • The alert query is passed to a <paramset-sk> element for structured display.
    • The button label changes to “Hide Alert Configurations”.
  5. Further Toggling: Clicking the button again will hide the table, and the label will revert.

Diagram: Data Flow and Rendering

External Code ---> subscriptionTableSkElement.load(subscription, alerts)
                     |
                     V
SubscriptionTableSk Internal State:
  - this.subscription = subscription
  - this.alerts = alerts
  - this.showAlerts = false (initially or after load)
                     |
                     V
_render() ------> Lit Template Evaluation
                     |
     -------------------------------------
    |                                     |
    V (if this.subscription is not null)  V (if this.showAlerts is true)
Render Subscription Details             Render Alerts Table
  - Name, Email, Revision (formatted link)  - Iterate through this.alerts
  - Bug info, Hotlists, CCs                 - For each alert:
  - "Show/Hide Alerts" Button                 - Display properties in <td>
                                              - Use <paramset-sk> for query

Demo Page (subscription-table-sk-demo.html, subscription-table-sk-demo.ts)

The demo page serves as an example and testing ground.

  • subscription-table-sk-demo.html: Sets up the basic HTML structure, including instances of subscription-table-sk (one for light mode, one for dark mode to test theming) and buttons to interact with them. It also includes an error-toast-sk for displaying potential errors.
  • subscription-table-sk-demo.ts: Contains JavaScript to:
    • Import and register the subscription-table-sk element.
    • Define sample Subscription and Alert data.
    • Add event listeners to the “Populate Tables” button, which calls the load() method on the subscription-table-sk instances with the sample data.
    • Add event listeners to the “Toggle Alerts Table” button, which calls the toggleAlerts() method on the instances.

This setup allows developers to see the component in action and verify its functionality with predefined data.
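
A condensed sketch of that demo wiring is shown below. The sample data, the button ids, and the './index' import are placeholders; only the load() and toggleAlerts() calls mirror the element's public API as described above.

```
import './index'; // assumed to register <subscription-table-sk>

// Placeholder data; real Subscription/Alert objects carry many more fields.
const sampleSubscription = {
  name: 'Example Sheriff Config',
  contact_email: 'perf-sheriff@example.com',
  revision: 'abcdef0',
};
const sampleAlerts = [{ query: 'benchmark=Speedometer2&bot=linux-perf' }];

// Cast loosely because the element's class type is not imported in this sketch.
const table = document.querySelector('subscription-table-sk') as any;

document.querySelector('#populate')!.addEventListener('click', () => {
  table.load(sampleSubscription, sampleAlerts);
});
document.querySelector('#toggle')!.addEventListener('click', () => {
  table.toggleAlerts();
});
```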

Module: /modules/test-picker-sk

The test-picker-sk module provides a custom HTML element, <test-picker-sk>, designed to guide users in selecting a valid trace or test for plotting. It achieves this by presenting a series of dependent input fields, where the options available in each field are dynamically updated based on selections made in previous fields. This ensures that users can only construct valid combinations of parameters.

Core Functionality and Design:

The primary goal of test-picker-sk is to simplify the process of selecting a specific data series (a “trace” or “test”) from a potentially large and complex dataset. This is often necessary in performance analysis tools where data is categorized by multiple parameters (e.g., benchmark, bot, specific test, sub-test variations).

The design enforces a specific order for filling out these parameters. This hierarchical approach is crucial because the valid options for a parameter often depend on the values chosen for its preceding parameters.

Key Components and Responsibilities:

  • test-picker-sk.ts: This is the heart of the module, defining the TestPickerSk custom element.

    • FieldInfo class: This internal class is a simple data structure used to manage the state of each individual input field within the picker. It stores a reference to the PickerFieldSk element, the parameter name (e.g., “benchmark”, “bot”), and the currently selected value.
    • Dynamic Field Generation (addChildField): When a value is selected in a field, and if there are more parameters in the hierarchy, a new PickerFieldSk input is dynamically added to the UI. The options for this new field are fetched from the backend. This progressive disclosure prevents overwhelming the user with too many options at once.
    • Backend Communication (callNextParamList): The element interacts with a backend endpoint (/_/nextParamList/). This endpoint is responsible for:
    • Providing the list of valid options for the next input field based on the current selections.
    • Returning a count of how many unique traces/tests match the current partial or complete selection.
    • State Management (_fieldData, _currentIndex): The _fieldData array holds FieldInfo objects for each parameter field. _currentIndex tracks which field is currently active or the next to be added.
    • Event Handling (value-changed, plot-button-clicked):
    • It listens for value-changed events from its child picker-field-sk elements. When a value changes, it triggers logic to update subsequent fields and the match count.
    • It emits a plot-button-clicked custom event when the user clicks the “Add Graph” button. This event includes the fully constructed query string representing the selected trace.
    • Query Population (populateFieldDataFromQuery): This method allows the picker to be initialized with a pre-existing query string. It will populate the fields sequentially based on the query parameters. If a parameter in the hierarchy is missing or empty in the query, the population stops at that point.
    • Plotting Logic (onPlotButtonClick, PLOT_MAXIMUM): The “Add Graph” button is enabled only when the number of matching traces is within a manageable range (greater than 0 and less than or equal to PLOT_MAXIMUM). This prevents users from attempting to plot an overwhelming number of traces.
    • Rendering and UI Updates: The component uses the Lit library for templating and re-renders itself when its internal state changes (e.g., new fields added, count updated, request in progress). It also manages the enabled/disabled state of input fields during backend requests.
  • picker-field-sk (Dependency): While not part of this module, test-picker-sk heavily relies on the picker-field-sk element. Each parameter in the test picker is represented by an instance of picker-field-sk. This child component is responsible for displaying a label, an input field, and a dropdown menu of selectable options.

  • test-picker-sk.scss: Defines the visual styling for the test-picker-sk element and its internal components, ensuring a consistent look and feel. It styles the layout of the fields, the match count display, and the plot button.

Workflow: User Selecting a Test

  1. Initialization (initializeTestPicker):

    • test-picker-sk is given an ordered list of parameter names (e.g., ['benchmark', 'bot', 'test']) and optional default parameters.
    • test-picker-sk -> Backend (/_/nextParamList/): Requests options for the first parameter (e.g., “benchmark”) with an empty query.
    User Interface:                 Backend:
    [test-picker-sk]
        |
        initializeTestPicker(['benchmark', 'bot', 'test'], {})
        |
        ---> POST /_/nextParamList/ (q="")
                                        |
                                        (Processes request, queries data source)
                                        |
             <--- {paramset: {benchmark: ["b1", "b2"]}, count: 100}
        |
        (Renders first PickerFieldSk for "benchmark" with options "b1", "b2")
        [Benchmark: [select ▼]] [Matches: 100] [Add Graph (disabled)]
    
  2. User Selects a Value:

    • The user selects “b1” for “benchmark”.
    • The picker-field-sk for “benchmark” emits a value-changed event.
    • test-picker-sk -> Backend: Requests options for the next parameter (“bot”), now including the selection benchmark=b1 in the query.
    User Interface:
    [Benchmark: [b1      ▼]]
        | (value-changed: {value: "b1"})
    [test-picker-sk]
        |
        ---> POST /_/nextParamList/ (q="benchmark=b1")
                                        |
                                        (Processes request, filters based on benchmark=b1)
                                        |
             <--- {paramset: {bot: ["botX", "botY"]}, count: 20}
        |
        (Renders PickerFieldSk for "bot" with options "botX", "botY")
        [Benchmark: [b1      ▼]] [Bot: [select ▼]] [Matches: 20] [Add Graph (disabled)]
    
  3. Process Repeats: This continues for each parameter in the hierarchy.

  4. Final Selection and Plotting:

    • Once all necessary parameters are selected (or the user chooses to stop), the match count reflects the number of specific traces.
    • If the count is within the PLOT_MAXIMUM, the “Add Graph” button enables.
    • User clicks “Add Graph”.
    • test-picker-sk emits plot-button-clicked with the final query (e.g., benchmark=b1&bot=botX&test=testZ).
    User Interface:
    [Benchmark: [b1      ▼]] [Bot: [botX    ▼]] [Test: [testZ   ▼]] [Matches: 5] [Add Graph (enabled)]
        | (User clicks "Add Graph")
    [test-picker-sk]
        |
        emits 'plot-button-clicked' (detail: {query: "benchmark=b1&bot=botX&test=testZ"})
    

Why this Approach?

  • Guided Selection: Prevents users from creating invalid or non-existent trace combinations.
  • Performance: By fetching options incrementally, the backend doesn't need to return massive lists of all possible values for all parameters at once. Queries to the backend are progressively filtered.
  • User Experience: The interface is less cluttered as fields appear only when needed. The match count provides immediate feedback on the specificity of the selection.

The test-picker-sk-demo.html and test-picker-sk-demo.ts files provide a runnable example of the component, mocking the backend /_/nextParamList/ endpoint to showcase its functionality without needing a live backend. This is essential for development and testing. The Puppeteer and Karma tests (test-picker-sk_puppeteer_test.ts, test-picker-sk_test.ts) ensure the component behaves as expected under various conditions.
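
From the embedding page's perspective, usage reduces to initializing the picker and listening for the plot event. The sketch below assumes the initializeTestPicker signature implied by the description (an ordered parameter list plus optional defaults) and casts the element loosely because its class type is not imported here.

```
import './index'; // assumed to register <test-picker-sk>

const picker = document.querySelector('test-picker-sk') as any;

// Parameter hierarchy and (empty) defaults, as described above.
picker.initializeTestPicker(['benchmark', 'bot', 'test'], {});

// The event detail carries the fully constructed query for the selected trace.
picker.addEventListener('plot-button-clicked', (e: Event) => {
  const query = (e as CustomEvent<{ query: string }>).detail.query;
  // e.g. "benchmark=b1&bot=botX&test=testZ" -- hand it to the plotting code.
  console.log('Plot query:', query);
});
```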

Module: /modules/themes

The /modules/themes module is responsible for defining the visual styling and theming for the application. It builds upon the base theming provided by infra-sk and introduces application-specific overrides and additions.

Why and How:

The primary goal of this module is to establish a consistent and branded look and feel across the application. Instead of defining all styles from scratch, it leverages the infra-sk theming library as a foundation. This promotes code reuse and ensures that common UI elements have a familiar appearance.

The approach taken is to:

  1. Import Base Styles: The themes.scss file begins by importing the core styles from ../../../infra-sk/themes. This brings in the foundational design system, including color palettes, typography, spacing, and component styles.
  2. Import External Resources: It also imports the Material Icons font library directly from Google Fonts (https://fonts.googleapis.com/icon?family=Material+Icons). This makes a wide range of standard icons readily available for use within the application's UI.
  3. Define Application-Specific Overrides and Additions: The core principle is to only define deltas from the base infra-sk theme and global changes from elements-sk components. This means that themes.scss focuses on styling aspects that are unique to this specific application or require modifications to the default infra-sk appearance.

Key Components and Files:

  • themes.scss: This is the central SCSS (Sassy CSS) file for the module.

    • Responsibility: It orchestrates the application's theme by importing base styles, external resources, and defining application-specific styling rules.
    • Implementation Details:
    • @import '../../../infra-sk/themes';: This line incorporates the foundational theme from the infra-sk library. The relative path indicates that infra-sk is expected to be a sibling or ancestor directory in the project structure.
    • @import url('https://fonts.googleapis.com/icon?family=Material+Icons');: This directive pulls in the Material Icons font stylesheet, enabling the use of standard Google Material Design icons throughout the application.
    • body { margin: 0; padding: 0; }: This is an example of a global override. It resets the default browser margins and padding on the <body> element, providing a cleaner baseline for layout. This is a common practice to ensure consistent spacing across different browsers. Other application-specific styles would follow this pattern, targeting specific elements or defining new CSS classes.
  • BUILD.bazel: This file defines how the themes.scss file is processed and made available to the rest of the application.

    • Responsibility: It uses the sass_library rule (defined in //infra-sk:index.bzl) to compile the SCSS into CSS and declare it as a reusable library.
    • Implementation Details:
    • load("//infra-sk:index.bzl", "sass_library"): Imports the necessary Bazel rule for handling SASS compilation.
    • sass_library(name = "themes_sass_lib", ...): Defines a SASS library target named themes_sass_lib.
      • srcs = ["themes.scss"]: Specifies that themes.scss is the source file for this library.
      • visibility = ["//visibility:public"]: Makes this compiled CSS library accessible to any other part of the project.
      • deps = ["//infra-sk:themes_sass_lib"]: Declares a dependency on the infra-sk SASS library. This is crucial because themes.scss imports styles from infra-sk. The build system needs to know about this dependency to ensure infra-sk styles are available during the compilation of themes.scss.

Workflow (Styling Application):

Browser Request --> HTML Document
                     |
                     v
                 Link to Compiled CSS (from themes_sass_lib)
                     |
                     v
Application of Styles:
  1. Base browser styles
  2. infra-sk/themes.scss styles (imported)
  3. Material Icons styles (imported)
  4. modules/themes/themes.scss overrides & additions (applied last, taking precedence)
                     |
                     v
                 Rendered Page with Application-Specific Theme

In essence, this module provides a layered approach to theming. It starts with a robust base, incorporates external resources like icon fonts, and then applies specific customizations to achieve the desired visual identity for the application. The BUILD.bazel file ensures that these SASS files are correctly processed and made available as CSS to the application during the build process.

Module: /modules/trace-details-formatter

This module provides a mechanism for formatting trace details and converting trace strings into query strings. The core idea is to offer a flexible way to represent and interpret trace information, accommodating different formatting conventions, particularly for Chrome-specific trace structures.

The “why” behind this module stems from the need to handle various trace formats. Different systems or parts of the application might represent trace identifiers (which are essentially a collection of parameters) in distinct ways. This module centralizes the logic for translating between these representations. For example, a compact string representation of a trace might be used in URLs or displays, while a more structured ParamSet is needed for querying data.

The “how” is achieved through an interface TraceFormatter and concrete implementations. This allows for different formatting strategies to be plugged in as needed. The GetTraceFormatter() function acts as a factory, returning the appropriate formatter based on the application's configuration (window.perf.trace_format).

Key Components/Files:

  • traceformatter.ts: This is the central file containing the core logic.

    • TraceFormatter interface: Defines the contract for all trace formatters. It mandates two primary methods:
    • formatTrace(params: Params): string: Takes a Params object (a key-value map representing trace parameters) and returns a string representation of the trace. This is useful for displaying trace identifiers in a user-friendly or system-specific format.
    • formatQuery(trace: string): string: Takes a string representation of a trace and converts it into a query string (e.g., “key1=value1&key2=value2”). This is crucial for constructing API requests to fetch data related to a specific trace.
    • DefaultTraceFormatter class: Provides a basic implementation of TraceFormatter.
    • Its formatTrace method generates a string like “Trace ID: ,key1=value1,key2=value2,...”. This is a generic way to represent the trace parameters.
    • Its formatQuery method currently returns an empty string, indicating that this default formatter doesn't have a specific logic for converting its trace string representation back into a query.
    • ChromeTraceFormatter class: Implements TraceFormatter specifically for traces originating from Chrome's performance infrastructure.
    • Why ChromeTraceFormatter? Chrome's performance data often uses a hierarchical, slash-separated string to identify traces (e.g., master/bot/benchmark/test/subtest_1). This formatter handles this specific convention.
    • keys array: This private property (['master', 'bot', 'benchmark', 'test', 'subtest_1', 'subtest_2', 'subtest_3']) defines the expected order of parameters in the Chrome-style trace string. This order is significant for both formatting and parsing.
    • formatTrace(params: Params): string: It iterates through the predefined keys and constructs a slash-separated string from the corresponding values in the input params.
      Input Params: { master: "m", bot: "b", benchmark: "bm", test: "t" }
      keys: [ "master", "bot", "benchmark", "test", ... ]
      Output String: "m/b/bm/t"
    • formatQuery(trace: string): string: This is the inverse operation. It takes a slash-separated trace string, splits it, and maps the parts back to the predefined keys to build a ParamSet. It then converts this ParamSet into a standard URL query string.
    • Handling Statistics (Ad-hoc logic for Chromeperf/Skia bridge): A special piece of logic exists within formatQuery related to window.perf.enable_skia_bridge_aggregation. If a trace's ‘test’ value ends with a known statistic suffix (e.g., _avg, _count), this suffix is used to determine the stat parameter in the output query, and the suffix is removed from the ‘test’ parameter. If no such suffix is found, a default stat value of ‘value’ is added. This logic is a temporary measure to bridge formatting differences between Chromeperf and Skia systems and is intended to be removed once Chromeperf is deprecated.
      Input Trace String (enable_skia_bridge_aggregation = true): "master/bot/benchmark/test_name_max/subtest"
      Splits into: ["master", "bot", "benchmark", "test_name_max", "subtest"]
      Processed ParamSet: { master: ["master"], bot: ["bot"], benchmark: ["benchmark"], test: ["test_name"], stat: ["max"], subtest_1: ["subtest"] }
      Output Query: "master=master&bot=bot&benchmark=benchmark&test=test_name&stat=max&subtest_1=subtest"
    • STATISTIC_SUFFIX_TO_VALUE_MAP: A map used by ChromeTraceFormatter to translate common statistic suffixes (like “avg”, “count”) found in test names to their corresponding “stat” parameter values (like “value”, “count”).
    • traceFormatterRecords: A record (object map) that associates TraceFormat enum values (like '' for default, 'chrome' for Chrome-specific) with their corresponding TraceFormatter instances. This acts as a registry for available formatters.
    • GetTraceFormatter() function: This is the public entry point for obtaining a trace formatter. It reads window.perf.trace_format (a global configuration setting) and returns the appropriate formatter instance from traceFormatterRecords. If the format is not found, it defaults to DefaultTraceFormatter.
    Global Config: window.perf.trace_format = "chrome"
         |
         v
    GetTraceFormatter()
         |
         v
    traceFormatterRecords["chrome"]
         |
         v
    Returns new ChromeTraceFormatter() instance
    
  • traceformatter_test.ts: Contains unit tests for the ChromeTraceFormatter, specifically focusing on the formatQuery method and its logic for handling statistic suffixes under different configurations of window.perf.enable_skia_bridge_aggregation.
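
To make the two directions concrete, here is a simplified re-implementation of the Chrome-style formatting logic. It is a sketch only: it omits the statistic-suffix bridging described above and substitutes URLSearchParams for the real ParamSet/fromParamSet helpers.

```
// Simplified sketch of the Chrome-style trace formatting described above.
const KEYS = ['master', 'bot', 'benchmark', 'test', 'subtest_1', 'subtest_2', 'subtest_3'];

type Params = { [key: string]: string };

// Params -> "master/bot/benchmark/test/..." (skipping keys that are absent).
function formatTrace(params: Params): string {
  return KEYS.map((k) => params[k])
    .filter((v) => v !== undefined && v !== '')
    .join('/');
}

// "master/bot/benchmark/test/..." -> "master=...&bot=...&benchmark=...".
function formatQuery(trace: string): string {
  const parts = trace.split('/');
  const query = new URLSearchParams();
  parts.forEach((value, i) => {
    if (i < KEYS.length && value !== '') {
      query.set(KEYS[i], value);
    }
  });
  return query.toString();
}

// formatTrace({ master: 'm', bot: 'b', benchmark: 'bm', test: 't' }) === 'm/b/bm/t'
// formatQuery('m/b/bm/t') === 'master=m&bot=b&benchmark=bm&test=t'
```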

This module depends on:

  • infra-sk/modules:query_ts_lib: For the fromParamSet function, used to convert a ParamSet object into a URL query string.
  • perf/modules/json:index_ts_lib: For type definitions like Params, ParamSet, and TraceFormat.
  • perf/modules/paramtools:index_ts_lib: For the makeKey function, used by DefaultTraceFormatter to create a string representation of a Params object.
  • perf/modules/window:window_ts_lib: To access global configuration values like window.perf.trace_format and window.perf.enable_skia_bridge_aggregation.

Module: /modules/triage-menu-sk

The triage-menu-sk module provides a user interface element for managing and triaging anomalies in bulk. It's designed to streamline the process of handling multiple performance regressions or improvements detected in data.

The core purpose of this module is to allow users to efficiently take action on a set of selected anomalies. Instead of interacting with each anomaly individually, this menu provides centralized controls for common triage operations. This is crucial for workflows where many anomalies might be identified simultaneously, requiring a quick and consistent way to categorize or address them.

Key responsibilities and components:

  • triage-menu-sk.ts: This is the heart of the module, defining the TriageMenuSk custom element.

    • Anomaly Aggregation: It receives a list of Anomaly objects and associated trace_names. This allows it to operate on multiple anomalies at once.
    • Action Buttons: It renders buttons for common triage actions:
    • “New Bug”: Triggers the new-bug-dialog-sk element, allowing the user to create a new bug report associated with the selected anomalies.
    • “Existing Bug”: Triggers the existing-bug-dialog-sk element, enabling the user to link the selected anomalies to an already existing bug.
    • “Ignore”: Marks the selected anomalies as “Ignored”. This is useful for anomalies that are deemed not actionable or are false positives.
    • Nudging Functionality:
    • The NudgeEntry class and related logic (generateNudgeButtons, nudgeAnomaly, makeNudgeRequest) allow users to adjust the perceived start and end points of an anomaly. This is a subtle but important feature for refining the automated anomaly detection. The UI presents a set of buttons (e.g., -2, -1, 0, +1, +2) that shift the anomaly's boundaries.
    • The _allowNudge flag controls whether the nudge buttons are visible, allowing for contexts where nudging might not be appropriate (e.g., when multiple, disparate anomalies are selected).
    • State Management: It maintains the state of the selected anomalies (_anomalies, _trace_names) and the nudge options (_nudgeList).
    • Communication with Backend: The makeEditAnomalyRequest and makeNudgeRequest methods handle sending HTTP POST requests to the /_/triage/edit_anomalies endpoint. This endpoint is responsible for persisting the triage decisions (bug associations, ignore status, nudge adjustments) in the backend database.
    • The editAction parameter in makeEditAnomalyRequest can take values like IGNORE, RESET (to de-associate bugs), or implicitly associate with a bug ID when called from the bug dialogs.
    • Event Emission: It emits an anomaly-changed custom event. This event signals to parent components (likely a component displaying a list or plot of anomalies) that one or more anomalies have been modified and their representation needs to be updated. The event detail includes the affected traceNames, the editAction performed, and the updated anomalies.
  • Integration with Dialogs:

    • It directly embeds and interacts with new-bug-dialog-sk and existing-bug-dialog-sk. When the user clicks “New Bug” or “Existing Bug”, this element calls the respective open() methods on these dialog components.
    • It passes the currently selected anomalies and trace names to these dialogs using their setAnomalies methods, so the dialogs know which anomalies the bug report will be associated with.
  • triage-menu-sk.html (Implicit via Lit template in .ts): Defines the visual structure of the menu, including the layout of the action buttons and the nudge buttons. The rendering is dynamic based on the number of selected anomalies and whether nudging is allowed.

  • triage-menu-sk.scss: Provides the styling for the menu, ensuring it integrates visually with the surrounding application.

Key Workflow Example (Ignoring Anomalies):

  1. User Selects Anomalies: In a parent component (e.g., a plot or a list), the user selects one or more anomalies.
  2. triage-menu-sk Receives Data: The parent component calls triageMenuSkElement.setAnomalies(selectedAnomalies, correspondingTraceNames, nudgeOptions).
  3. Menu Updates: triage-menu-sk re-renders, enabling the “Ignore” button (and potentially others).
    User Action (Selects Anomalies) --> Parent Component
          |
          v
    triage-menu-sk.setAnomalies()
          |
          v
    UI Renders (Buttons enabled)
  4. User Clicks “Ignore”:
    User Click ("Ignore") --> triage-menu-sk.ignoreAnomaly()
          |
          v
    makeEditAnomalyRequest(anomalies, traces, "IGNORE")
          |
          v
    POST /_/triage/edit_anomalies
          |
      (Backend processes)
          v
    HTTP 200 OK
          |
          v
    Dispatch "anomaly-changed" event
  5. Backend Interaction: makeEditAnomalyRequest is called. It constructs a JSON payload with the anomaly keys, trace names, and the action “IGNORE”. This payload is sent to /_/triage/edit_anomalies.
  6. Event Notification: Upon a successful response from the backend, triage-menu-sk updates the local state of the anomalies (setting bug_id to -2 for ignored anomalies) and dispatches the anomaly-changed event.
  7. Parent Component Reacts: The parent component listens for anomaly-changed and updates its display to reflect that the anomalies are now ignored (e.g., by changing their color, removing them from an active list).

The design decision to have triage-menu-sk orchestrate calls to the backend and then emit a generic anomaly-changed event decouples it from the specifics of how anomalies are displayed. Parent components only need to know that anomalies have changed and can react accordingly. The use of dedicated dialog components (new-bug-dialog-sk, existing-bug-dialog-sk) encapsulates the complexity of bug reporting, keeping the triage menu itself focused on initiating these actions.
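
A rough sketch of the “Ignore” round trip follows. The endpoint path, the event name, and the event detail fields come from the description above, but the request-body field names are assumptions rather than the verified wire format.

```
// Sketch only: the body field names are assumed, not the real wire format.
async function ignoreAnomalies(
  host: HTMLElement,
  keys: number[],
  traceNames: string[],
): Promise<void> {
  const resp = await fetch('/_/triage/edit_anomalies', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ keys, trace_names: traceNames, action: 'IGNORE' }),
  });
  if (!resp.ok) throw new Error(`Triage request failed: ${resp.status}`);

  // Tell parent components that these anomalies changed so they can re-render.
  host.dispatchEvent(new CustomEvent('anomaly-changed', {
    bubbles: true,
    detail: { traceNames, editAction: 'IGNORE' },
  }));
}
```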

Module: /modules/triage-page-sk

Triage Page (triage-page-sk)

The triage-page-sk module provides the user interface for viewing and triaging regressions in performance data. It allows users to filter regressions based on time range, commit status (all, regressions, untriaged), and alert configurations. The primary goal is to present a clear overview of regressions and facilitate the process of identifying their cause and impact.

Responsibilities and Key Components

The module is responsible for:

  • Fetching and Displaying Regression Data: It communicates with a backend endpoint (/_/reg/) to retrieve regression information for a specified time range and filter criteria. This data is then rendered in a tabular format, showing commits along with any associated regressions.
  • State Management: The component's state (selected time range, filters) is reflected in the URL. This allows users to bookmark specific views or share links to particular triage scenarios. The stateReflector utility from infra-sk/modules/statereflector is used for this purpose.
  • User Interaction for Filtering: It provides UI elements (select dropdowns, date range pickers) for users to define what data they want to see. Changes to these filters trigger new data fetches.
  • Triage Workflow: When a user initiates a triage action on a specific regression, a dialog (<dialog>) containing the cluster-summary2-sk element is displayed. This dialog allows the user to view details of the regression and assign a triage status (e.g., “positive”, “negative”, “acknowledged”).
  • Communicating Triage Decisions: Once a triage status is submitted, the module sends this information to a backend endpoint (/_/triage/) to persist the decision.
  • Displaying Triage Status: Each regression in the table is visually represented by a triage-status-sk element, which shows its current triage state.

Key Files

  • triage-page-sk.ts: This is the core TypeScript file defining the TriagePageSk custom element.
    • Why: It encapsulates all the logic for data fetching, rendering, state management, and handling user interactions. It leverages Lit for templating and rendering the UI.
    • How:
    • It defines a State interface to manage the component's configuration (begin/end timestamps, subset filter, alert filter).
    • The connectedCallback initializes the stateReflector to synchronize the component's state with the URL.
    • updateRange() is a crucial method that fetches regression data from the /_/reg/ endpoint whenever the state changes (e.g., date range or filter selection). It uses the fetch API for network requests.
    • The template function (using lit/html) defines the HTML structure of the component, including the filter controls, the main table displaying regressions, and the triage dialog.
    • Event handlers like commitsChange, filterChange, rangeChange, triage_start, and triaged manage user input and interactions with child components.
    • The triage_start method is triggered when a user wants to triage a specific regression. It prepares the data for the cluster-summary2-sk element and displays the triage dialog.
    • The triaged method is called when the user submits a triage decision from the cluster-summary2-sk dialog. It sends a POST request to /_/triage/ with the triage information.
    • Helper methods like stepUpAt, stepDownAt, alertAt, etc., are used to determine how to render cells in the regression table based on the data received.
    • calc_all_filter_options dynamically generates the list of available alert filters based on categories returned from the backend.
  • triage-page-sk.scss: Contains the SASS/CSS styles for the triage-page-sk element.
    • Why: To ensure the component has a consistent and appropriate visual appearance within the application.
    • How: It defines styles for the layout of the header, filter sections, the regression table, and the triage dialog. It imports shared styles for buttons, selects, and theming.
  • triage-page-sk-demo.html / triage-page-sk-demo.ts: Provide a demonstration page for the triage-page-sk element.
    • Why: To allow developers to see the component in action and test its basic functionality in isolation.
    • How: The HTML file includes an instance of <triage-page-sk>. The TypeScript file simply imports the main component to register it.

Key Workflows

1. Initial Page Load and Data Fetch:

User navigates to page / URL with state parameters
      |
      V
triage-page-sk.connectedCallback()
      |
      V
stateReflector initializes state from URL (or defaults)
      |
      V
triage-page-sk.updateRange()
      |
      V
FETCH /_/reg/ with current state (begin, end, subset, alert_filter)
      |
      V
Backend responds with RegressionRangeResponse (header, table, categories)
      |
      V
triage-page-sk.reg is updated
      |
      V
triage-page-sk.calc_all_filter_options() (if categories present)
      |
      V
triage-page-sk._render() displays the regression table
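
Concretely, the fetch in updateRange() can be pictured as the sketch below. Whether the state is sent as a JSON POST body (as shown) or as query parameters is an assumption here, as is the exact field naming beyond begin, end, subset, and alert_filter.

    // Sketch of the regression fetch; error handling via errorMessage is elided.
    async function fetchRegressions(state: {
      begin: number;
      end: number;
      subset: string;
      alert_filter: string;
    }) {
      const resp = await fetch('/_/reg/', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(state),
      });
      if (!resp.ok) {
        throw new Error(`Failed to load regressions: ${resp.statusText}`);
      }
      // RegressionRangeResponse: header, table, categories.
      return resp.json();
    }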

2. User Changes Filter or Date Range:

User interacts with <select> (commits/filter) or <day-range-sk>
      |
      V
Event handler (e.g., commitsChange, filterChange, rangeChange) updates this.state
      |
      V
this.stateHasChanged() (triggers stateReflector to update URL)
      |
      V
triage-page-sk.updateRange()
      |
      V
FETCH /_/reg/ with new state
      |
      V
Backend responds with updated RegressionRangeResponse
      |
      V
triage-page-sk.reg is updated
      |
      V
triage-page-sk._render() re-renders the regression table with new data

3. User Initiates Triage:

User clicks on a regression in the table (within a <triage-status-sk> element)
      |
      V
<triage-status-sk> emits 'start-triage' event with details (alert, full_summary, cluster_type)
      |
      V
triage-page-sk.triage_start(event)
      |
      V
this.dialogState is populated with event.detail
      |
      V
this._render() (updates the <cluster-summary2-sk> properties within the dialog)
      |
      V
this.dialog.showModal() (displays the triage dialog)

4. User Submits Triage:

User interacts with <cluster-summary2-sk> in the dialog and clicks "Save" (or similar)
      |
      V
<cluster-summary2-sk> emits 'triaged' event with details (columnHeader, triage status)
      |
      V
triage-page-sk.triaged(event)
      |
      V
Constructs TriageRequest body (cid, triage, alert, cluster_type)
      |
      V
this.dialog.close()
      |
      V
this.triageInProgress = true; this._render() (shows spinner)
      |
      V
FETCH POST /_/triage/ with TriageRequest
      |
      V
Backend responds (e.g., with a bug link if applicable)
      |
      V
this.triageInProgress = false; this._render() (hides spinner)
      |
      V
(Optional) If json.bug exists, window.open(json.bug)
      |
      V
(Implicit) The <triage-status-sk> for the triaged item may update its display, or a full data refresh might be triggered if necessary to show the updated status.
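
The submission step in the diagram can be sketched as below. The request field names follow the workflow above, and the precise TriageRequest type comes from the generated JSON bindings, so treat this as illustrative rather than the page's literal code.

    // Sketch of submitting a triage decision to the backend.
    async function submitTriage(body: {
      cid: unknown; // commit id taken from the 'triaged' event's columnHeader
      triage: { status: string; message: string };
      alert: unknown; // the Alert config associated with the regression
      cluster_type: 'high' | 'low';
    }) {
      const resp = await fetch('/_/triage/', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
      });
      const json = await resp.json();
      // If the backend returns a bug link, open it in a new tab.
      if (json.bug) {
        window.open(json.bug, '_blank');
      }
    }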

Design Decisions

  • State Reflection in URL: The decision to reflect the component's state (date range, filters) in the URL is crucial for shareability and bookmarking. It allows users to return to a specific view of regressions or share it with colleagues.
  • Component-Based Architecture: The page is built using custom elements (triage-page-sk, commit-detail-sk, day-range-sk, triage-status-sk, cluster-summary2-sk). This promotes modularity, reusability, and separation of concerns. Each component handles a specific piece of functionality.
  • Asynchronous Operations: Data fetching and triage submissions are asynchronous operations handled using fetch and Promises. Spinners (spinner-sk) are used to provide visual feedback to the user during these operations.
  • Dedicated Triage Dialog: Instead of inline editing, a modal dialog (<dialog>) is used for the triage process. This provides a focused interface for the user to review cluster details and make a triage decision without cluttering the main regression table.
  • Dynamic Filter Options: The “Which alerts to display” filter options are dynamically populated based on the categories returned from the backend. This ensures that the filter options are relevant to the current dataset.
  • Use of Lit for Templating: Lit is used for its efficient rendering and declarative templating, making it easier to manage the UI structure and updates.

The triage-page-sk serves as the central hub for users to actively engage with and manage performance regressions, making it a critical component in the performance monitoring workflow.

Module: /modules/triage-status-sk

The triage-status-sk module provides a custom HTML element designed to visually represent and interact with the triage status of a “cluster” within the Perf application. A cluster, in this context, likely refers to a group of related performance measurements or anomalies that require user attention and classification (triaging).

Core Functionality & Design:

The primary purpose of this element is to offer a concise and interactive way for users to understand the current triage state of a cluster and to initiate the triaging process.

  1. Visual Indication: The element displays a button. The appearance of this button (specifically, an icon within it) changes based on the cluster's triage status: “positive,” “negative,” or “untriaged.” This provides an immediate visual cue to the user.

    • Why: Direct visual feedback is crucial for quickly assessing the state of many items in a list or dashboard. Instead of reading text, users can rely on familiar icons.
    • How: It leverages the tricon2-sk element to display the appropriate icon based on the triage.status property. The styling for these states is defined in triage-status-sk.scss, ensuring visual consistency with the application's theme (including dark mode).
  2. Initiating Triage: Clicking the button does not directly change the triage status within this element. Instead, it emits a custom event named start-triage.

    • Why: This follows a common pattern in web components where individual components are responsible for a specific piece of UI and interaction, but delegate more complex actions or state management to parent components or application-level logic. This keeps the triage-status-sk element focused and reusable. The actual triaging process likely involves a dialog or a more complex UI, which is beyond the scope of this simple button.
    • How: The _start_triage method is invoked on button click. This method constructs a detail object containing all relevant information about the cluster (full_summary, current triage status, alert configuration, cluster_type, and a reference to the element itself) and dispatches the start-triage CustomEvent.
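
In outline, the dispatch looks something like the sketch below, which packages the properties listed above into the event detail (a standalone function is used here for brevity; the element does this inside _start_triage()).

    // Sketch of the 'start-triage' dispatch using the detail fields listed above.
    function dispatchStartTriage(
      element: HTMLElement,
      triage: { status: 'positive' | 'negative' | 'untriaged'; message: string },
      full_summary: unknown,
      alert: unknown,
      cluster_type: 'high' | 'low'
    ): void {
      element.dispatchEvent(
        new CustomEvent('start-triage', {
          detail: { triage, full_summary, alert, cluster_type, element },
          bubbles: true,
        })
      );
    }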

Key Components & Files:

  • triage-status-sk.ts: This is the heart of the module, defining the TriageStatusSk custom element class which extends ElementSk.
    • Properties: It manages several key pieces of data as properties:
    • triage: An object of type TriageStatus (defined in perf/modules/json) holding the status (‘positive’, ‘negative’, ‘untriaged’) and a message string. This is the primary driver for the element's appearance.
    • full_summary: Potentially detailed information about the cluster, of type FullSummary.
    • alert: Information about any alert configuration associated with the cluster, of type Alert.
    • cluster_type: A string (‘high’ or ‘low’), likely indicating the priority or type of the cluster.
    • Rendering: It uses lit-html for templating (TriageStatusSk.template). The template renders a <button> containing a tricon2-sk element. The class of the button and the value of the tricon2-sk are bound to ele.triage.status, dynamically changing the appearance.
    • Event Dispatch: The _start_triage method is responsible for creating and dispatching the start-triage event.
  • triage-status-sk.scss: Defines the visual styling for the triage-status-sk element. It includes specific styles for the different triage states (.positive, .negative, .untriaged) and their hover states, ensuring they integrate with the application's themes (including dark mode variables like --positive, --negative, --surface).
  • index.ts: A simple entry point that imports and thereby registers the triage-status-sk custom element, making it available for use in HTML.
  • triage-status-sk-demo.html & triage-status-sk-demo.ts: These files provide a demonstration page for the triage-status-sk element.
    • The HTML sets up instances of the element in different theme contexts (default and dark mode).
    • The TypeScript file demonstrates how to listen for the start-triage event and how to programmatically set the triage property of the element. This is crucial for developers to understand how to integrate and use the component.
  • BUILD.bazel: Defines how the module is built and its dependencies. It specifies tricon2-sk as a UI dependency and includes necessary SASS and TypeScript libraries.
  • triage-status-sk_puppeteer_test.ts: Contains Puppeteer-based tests to ensure the element renders correctly and behaves as expected in a browser environment. This is important for maintaining code quality and preventing regressions.

Workflow Example: User Initiates Triage

User sees a triage-status-sk button (e.g., showing an 'untriaged' icon)
    |
    V
User clicks the button
    |
    V
[triage-status-sk.ts] _start_triage() method is called
    |
    V
[triage-status-sk.ts] Creates a 'detail' object with:
                     - triage: { status: 'untriaged', message: '...' }
                     - full_summary: { ... }
                     - alert: { ... }
                     - cluster_type: 'low' | 'high'
                     - element: (reference to itself)
    |
    V
[triage-status-sk.ts] Dispatches a 'start-triage' CustomEvent with the 'detail' object
    |
    V
[Parent Component/Application Logic] Listens for 'start-triage' event
    |
    V
[Parent Component/Application Logic] Receives event.detail
    |
    V
[Parent Component/Application Logic] Uses the received data to:
                                     - Open a triage dialog
                                     - Populate the dialog with cluster details
                                     - Allow user to select a new triage status
                                     - (Potentially) update the original triage-status-sk element's
                                       'triage' property after the dialog interaction is complete.

This design allows triage-status-sk to be a focused, presentational component, while the more complex logic of handling the triage process itself is managed elsewhere in the application. This promotes separation of concerns and reusability.

Module: /modules/triage2-sk

The triage2-sk module provides a custom HTML element for selecting a triage status. This element is designed to be a simple, reusable UI component for indicating whether a particular item is “positive”, “negative”, or “untriaged”. Its primary purpose is to offer a standardized way to represent and interact with triage states across different parts of the Perf application.

The core of the module is the triage2-sk custom element, defined in triage2-sk.ts. This element leverages the Lit library for templating and rendering. It presents three buttons, each representing one of the triage states:

  • Positive: Indicated by a check circle icon (<check-circle-icon-sk>).
  • Negative: Indicated by a cancel icon (<cancel-icon-sk>).
  • Untriaged: Indicated by a help icon (<help-icon-sk>).

The “why” behind this design is to provide a clear visual representation of the current triage status and an intuitive way for users to change it. By using distinct icons and styling for each state, the element aims to reduce ambiguity.

Key Implementation Details:

  • triage2-sk.ts: This is the main TypeScript file defining the TriageSk class, which extends ElementSk.

    • State Management: The current triage state is managed by the value attribute (and corresponding property). It can be one of “positive”, “negative”, or “untriaged”. If no value is provided, it defaults to “untriaged”.
    • Event Emission: When the user clicks a button to change the triage state, the element dispatches a custom event named change. The detail property of this event contains the new triage status as a string (e.g., “positive”). This allows parent components to react to changes in the triage status.

      User clicks "Positive" button
            |
            V
      triage2-sk sets its 'value' attribute to "positive"
            |
            V
      triage2-sk dispatches a 'change' event with detail: "positive"
    • Rendering: The template static method uses Lit's html tagged template literal to define the structure of the element. It dynamically sets the selected attribute on the appropriate button based on the current value.
    • Attribute Observation: The element observes the value attribute. When this attribute changes (either programmatically or through user interaction), the attributeChangedCallback is triggered, which re-renders the component and dispatches the change event.
    • Type Safety: The isStatus function ensures that the value property is always one of the allowed Status types, defaulting to “untriaged” if an invalid value is encountered. This contributes to the robustness of the component.
  • triage2-sk.scss: This file contains the SASS styles for the triage2-sk element.

    • Theming: It defines styles for both a legacy color scheme and a theme-based color scheme (including dark mode). This ensures the component integrates visually with the rest of the application, regardless of the active theme. The styling differentiates the selected button and provides hover effects for better user feedback. The fill colors of the icons change based on the triage state (e.g., green for positive, red for negative).
  • index.ts: This file serves as the entry point for the module, exporting the TriageSk class and ensuring the custom element is defined.

  • Demo and Testing:

    • triage2-sk-demo.html and triage2-sk-demo.ts: Provide a simple demonstration page showcasing the element in various states and how to listen for the change event. This is useful for manual testing and visual inspection.
    • triage2-sk_test.ts: Contains Karma unit tests that verify the event emission and value changes of the component.
    • triage2-sk_puppeteer_test.ts: Includes Puppeteer-based end-to-end tests that check the rendering of the component in a browser environment and capture screenshots for visual regression testing.

The design choice of using custom elements and Lit allows for a modular and maintainable component that can be easily integrated into larger applications. The clear separation of concerns (logic in TypeScript, styling in SASS, and structure in the template) follows common best practices for web component development.
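
Typical host-page wiring, sketched under the assumption that the element has already been registered by importing the module, looks like this:

    // Create the element, set its value, and react to user-driven changes.
    const triage = document.createElement('triage2-sk');
    triage.setAttribute('value', 'untriaged');
    document.body.appendChild(triage);

    triage.addEventListener('change', (e: Event) => {
      const newStatus = (e as CustomEvent<string>).detail; // 'positive' | 'negative' | 'untriaged'
      console.log(`Triage status changed to ${newStatus}`);
    });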

Module: /modules/tricon2-sk

The tricon2-sk module provides a custom HTML element <tricon2-sk> designed to visually represent triage states. This component is crucial for user interfaces where quick identification of an item's status (e.g., in a bug tracker, code review system, or monitoring dashboard) is necessary.

The core idea is to offer a standardized, reusable icon that clearly communicates whether an item is “positive,” “negative,” or “untriaged.” This avoids inconsistencies and reduces cognitive load for users who frequently interact with such systems.

Key Components and Responsibilities:

  • tricon2-sk.ts: This is the heart of the module. It defines the TriconSk class, which extends ElementSk (a base class for custom elements in the Skia infrastructure).

    • Purpose: To render one of three specific icons based on its value attribute.
    • Implementation:
    • It utilizes the lit-html library for templating, allowing for efficient rendering and updates.
    • A static template function determines which icon to display (check-circle-icon-sk for “positive”, cancel-icon-sk for “negative”, and help-icon-sk for “untriaged” or any other value). This design centralizes the icon selection logic.
    • The value attribute is the primary interface for controlling the displayed icon. Changes to this attribute trigger a re-render via attributeChangedCallback and _render().
    • The connectedCallback ensures that the value property is properly initialized if set before the element is attached to the DOM.
    • Dependencies: It imports specific icon components (check-circle-icon-sk, cancel-icon-sk, help-icon-sk) from the elements-sk module, promoting modularity and reuse of existing icon assets.
  • tricon2-sk.scss: This file handles the styling of the tricon2-sk element and its internal icons.

    • Purpose: To define the colors of the icons based on their state and to ensure they adapt correctly to different themes (e.g., light and dark mode).
    • Implementation:
    • It uses SASS for more organized and maintainable styles.
    • Crucially, it defines CSS variables (e.g., --green, --red, --brown) for the icon fill colors. This allows themes (defined in themes.scss) to override these colors easily.
    • Specific styles are also provided for when the element is within a .body-sk context and when .darkmode is applied to .body-sk. This ensures the icons maintain appropriate contrast and visibility across different UI themes. The fallback hardcoded colors (#388e3c, etc.) provide a default styling if CSS variables are not defined by a theme.
  • index.ts: This file serves as the main entry point for the module when it's imported. Its sole responsibility is to import tricon2-sk.ts, which in turn registers the <tricon2-sk> custom element. This is a common pattern for organizing custom element definitions.

  • tricon2-sk-demo.html and tricon2-sk-demo.ts: These files create a demonstration page for the <tricon2-sk> element.

    • Purpose: To showcase the different states of the tricon2-sk element and how it appears in various theming contexts (default, with colors.css theming, and with themes.css in both light and dark modes). This is invaluable for development, testing, and documentation.
    • How it works: The HTML file directly uses the <tricon2-sk> element with different value attributes. The accompanying TypeScript file simply imports the index.ts of the tricon2-sk module to ensure the custom element is defined before the browser tries to render it.
  • tricon2-sk_puppeteer_test.ts: This file contains automated UI tests for the tricon2-sk element using Puppeteer.

    • Purpose: To verify that the element renders correctly in different states and to capture screenshots for visual regression testing.
    • How it works: It loads the demo page (tricon2-sk-demo.html) in a headless browser, checks if the expected number of tricon2-sk elements are present (a basic smoke test), and then takes a screenshot of the page. This ensures that changes to the component's appearance are caught early.

Workflow: Displaying a Triage Icon

  1. Usage: An application includes the <tricon2-sk> element in its HTML, setting the value attribute:

    <tricon2-sk value="positive"></tricon2-sk>
    
  2. Element Initialization (tricon2-sk.ts):

    • The TriconSk class is instantiated.
    • connectedCallback is called, ensuring the value property is synchronized with the attribute.
    • _render() is called.
  3. Template Selection (tricon2-sk.ts):

    • The static template function is invoked.
    • Based on this.value (e.g., “positive”), it returns the corresponding HTML template: html<check-circle-icon-sk></check-circle-icon-sk>.
  4. Icon Rendering:

    • The selected icon component (e.g., <check-circle-icon-sk>) renders itself.
  5. Styling (tricon2-sk.scss):

    • CSS rules are applied. For example, if the value is "positive":

      tricon2-sk {
        check-circle-icon-sk {
          fill: var(--green); // Initially attempts to use the CSS variable
        }
      }

    • If themes are active (e.g., .body-sk.darkmode), more specific rules might override the fill color:

      .body-sk.darkmode tricon2-sk {
        check-circle-icon-sk {
          fill: #4caf50; // Specific dark mode color
        }
      }

Diagram: Attribute Change leading to Icon Update

[User/Application sets/changes 'value' attribute on <tricon2-sk>]
       |
       v
[<tricon2-sk> element]
       |
       +---------------------+
       | attributeChangedCallback() is triggered |
       +---------------------+
              |
              v
       [this._render()]
              |
              v
       [TriconSk.template(this)]  <-- Reads current 'this.value'
              |
              +-------------+-------------+
              | (value is   | (value is   | (value is other)
              | "positive") | "negative") |
              v             v             v
      [Returns    [Returns    [Returns
       <check-...>] <cancel-...>] <help-...>]
              |
              v
[lit-html updates the DOM with the new icon template]
              |
              v
[Browser renders the new icon with appropriate CSS styles]

The design decision to use distinct, imported icon components (check-circle-icon-sk, etc.) rather than, for example, a single SVG sprite or dynamically generating SVG paths, promotes better separation of concerns. Each icon can be managed and updated independently. The use of CSS variables for theming is a standard and flexible approach, allowing consuming applications to easily adapt the icon colors to their specific look and feel without modifying the component's core logic or styles directly.
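
The icon-selection logic amounts to a small switch over value. The sketch below captures the idea in plain Lit terms; the import path and exact template shape in the real element may differ.

    import { html } from 'lit';

    // Simplified stand-in for TriconSk.template: choose an icon element by value.
    const triconTemplate = (value: string) => {
      switch (value) {
        case 'positive':
          return html`<check-circle-icon-sk></check-circle-icon-sk>`;
        case 'negative':
          return html`<cancel-icon-sk></cancel-icon-sk>`;
        default:
          // 'untriaged' and any unrecognized value fall back to the help icon.
          return html`<help-icon-sk></help-icon-sk>`;
      }
    };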

Module: /modules/trybot

The trybot module provides utilities for processing and analyzing results from Perf trybots. Trybots are automated systems that run performance tests on code changes before they are submitted. This module focuses on calculating and presenting metrics that help developers understand the performance impact of their changes.

The core functionality revolves around aggregating and averaging stddevRatio values across different parameter combinations. The stddevRatio is a key metric representing the change in performance relative to the standard deviation of the baseline. A positive stddevRatio generally indicates a performance regression, while a negative value suggests an improvement.

The primary goal is to help developers quickly identify which aspects of their change (represented by key-value parameters like model=GCE or test=MyBenchmark) are contributing most significantly to performance changes, both positive and negative. By grouping results by these parameters and calculating average stddevRatio, the module provides a summarized view that highlights potential problem areas or confirms expected improvements.

Key Components and Files:

  • calcs.ts: This file contains the logic for performing calculations on trybot results.

    • byParams(res: TryBotResponse): AveForParam[]: This is the central function of the module.

    • Why: Developers need a way to understand the overall performance impact of their changes across various configurations (e.g., different devices, tests, or operating systems). Simply looking at individual trace results can be overwhelming. This function provides a summarized view by grouping results by their parameters.

    • How:

      1. It takes a TryBotResponse object, which contains a list of individual test results (res.results). Each result includes a stddevRatio and a set of params (key-value pairs describing the test configuration).

      2. It iterates through each result and then through each key-value pair within that result's params.

      3. For each unique key=value string (e.g., “model=GCE”), it maintains a running total of stddevRatio values, the count of traces contributing to this total (n), and counts of traces with positive (high) or negative (low) stddevRatio. This aggregation happens in the runningTotals object.

        Input TryBotResponse.results:
        [
          { params: {arch: "arm", os: "android"}, stddevRatio: 1.5 },
          { params: {arch: "x86", os: "linux"}, stddevRatio: -0.5 },
          { params: {arch: "arm", os: "ios"}, stddevRatio: 2.0 }
        ]
        
        -> runningTotals intermediate state (simplified):
           "arch=arm": { totalStdDevRatio: 3.5, n: 2, high: 2, low: 0 }
           "os=android": { totalStdDevRatio: 1.5, n: 1, high: 1, low: 0 }
           "arch=x86": { totalStdDevRatio: -0.5, n: 1, high: 0, low: 1 }
           "os=linux": { totalStdDevRatio: -0.5, n: 1, high: 0, low: 1 }
           "os=ios": { totalStdDevRatio: 2.0, n: 1, high: 1, low: 0 }
        
      4. After processing all results, it calculates the average stddevRatio for each key=value pair by dividing totalStdDevRatio by n.

      5. It constructs an array of AveForParam objects. Each object represents a key=value parameter and includes its calculated average stddevRatio, the total number of traces (n) that matched this parameter, and the counts of high and low stddevRatio traces.

      6. Finally, it sorts this array in descending order based on the aveStdDevRatio. This crucial step brings the parameters with the largest average stddevRatio (the most likely regressions) to the top, making them easy to identify.

    • AveForParam interface: Defines the structure for the output of byParams. It holds the aggregated average stddevRatio for a specific keyValue pair, along with counts of traces.

    • runningTotal interface: An internal helper interface used during the aggregation process within byParams to keep track of sums and counts before the final average is computed.

  • calcs_test.ts: This file contains unit tests for the functions in calcs.ts.

    • Why: To ensure the correctness of the calculation logic, especially for edge cases (e.g., empty input) and the core averaging and sorting functionality.
    • How: It uses chai for assertions. Tests cover scenarios like:
    • Empty input to byParams should return an empty list.
    • Correct calculation of average stddevRatio for multiple traces sharing common parameters. For example, if two traces have test=1, their stddevRatio values should be averaged for the test=1 entry in the output.
    • Ensuring the output is correctly sorted by aveStdDevRatio in descending order.

Key Workflows/Processes:

Calculating Average StdDevRatio by Parameter:

TryBotResponse
     |
     v
byParams(response)
     |
     | 1. Initialize `runningTotals` (empty map)
     |
     | 2. For each `result` in `response.results`:
     |    |
     |    |-> For each `param` (key-value pair) in `result.params`:
     |         |
     |         |--> Generate `runningTotalsKey` (e.g., "model=GCE")
     |         |--> Retrieve or create `runningTotal` entry for `runningTotalsKey`
     |         |--> Update `totalStdDevRatio`, `n`, `high`, `low` in the entry
     |
     | 3. Initialize `ret` (empty array of AveForParam)
     |
     | 4. For each `runningTotalKey` in `runningTotals`:
     |    |
     |    |-> Calculate `aveStdDevRatio` = `runningTotal.totalStdDevRatio` / `runningTotal.n`
     |    |-> Create `AveForParam` object
     |    |-> Push to `ret`
     |
     | 5. Sort `ret` by `aveStdDevRatio` (descending)
     |
     v
Array of AveForParam

This workflow allows users to quickly pinpoint which configuration parameters (like specific device models, operating systems, or test names) are associated with the most significant average performance changes in a given trybot run. The sorting ensures that the most impactful parameters are immediately visible.
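
The aggregation can be condensed into the sketch below. The types are simplified stand-ins for the module's TryBotResponse and AveForParam, and whether a stddevRatio of exactly zero counts as high or low is an assumption.

    interface ResultSketch {
      params: { [key: string]: string };
      stddevRatio: number;
    }

    interface AveForParamSketch {
      keyValue: string;
      aveStdDevRatio: number;
      n: number;
      high: number;
      low: number;
    }

    // Group results by "key=value", average stddevRatio, and sort descending.
    function byParamsSketch(results: ResultSketch[]): AveForParamSketch[] {
      const totals = new Map<string, { total: number; n: number; high: number; low: number }>();
      for (const r of results) {
        for (const [key, value] of Object.entries(r.params)) {
          const k = `${key}=${value}`;
          const t = totals.get(k) || { total: 0, n: 0, high: 0, low: 0 };
          t.total += r.stddevRatio;
          t.n += 1;
          if (r.stddevRatio > 0) t.high += 1;
          if (r.stddevRatio < 0) t.low += 1;
          totals.set(k, t);
        }
      }
      const ret: AveForParamSketch[] = [];
      totals.forEach((t, keyValue) =>
        ret.push({ keyValue, aveStdDevRatio: t.total / t.n, n: t.n, high: t.high, low: t.low })
      );
      // Largest average stddevRatio (the most likely regressions) first.
      return ret.sort((a, b) => b.aveStdDevRatio - a.aveStdDevRatio);
    }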

Module: /modules/trybot-page-sk

The trybot-page-sk module provides a user interface for analyzing performance regressions. It allows users to select either a specific commit from the repository or a trybot run (representing a potential code change) and then analyze performance metrics associated with that selection. The core purpose is to help developers identify and understand performance impacts before or after code submission.

Key Responsibilities and Components:

  • User Input and Selection:

    • The page is organized into two main tabs: “Commit” and “TryBot”. This separation allows users to focus on either analyzing historical performance data or evaluating the impact of pending changes.
    • Commit Analysis: Users can select a specific commit using the commit-detail-picker-sk element. This allows them to investigate performance regressions that might have been introduced by a particular code change.
    • TryBot Analysis: (The “TryBot” tab is present in the UI template but its functionality for selecting trybot runs, CLs, and patch numbers is not fully detailed in the provided trybot-page-sk.ts. It appears to be a planned feature or a more complex interaction than commit selection.) The underlying TryBotRequest interface includes fields like cl and patch_number, indicating the intent to support this.
    • Once a commit (or eventually a trybot run) is selected, users define the scope of the analysis by specifying a query using query-sk. This query filters the performance traces to be considered (e.g., focusing on specific benchmarks, configurations, or architectures).
    • The paramset-sk and query-count-sk elements provide feedback on the current query, showing the matching parameters and the number of traces that fit the criteria. This helps users refine their query to target the relevant data.
  • Data Fetching and Processing:

    • When the user clicks the “Run” button, the run method is invoked. This method constructs a TryBotRequest object based on the user's selections (commit number, query, or eventually CL/patch details).
    • It sends this request to the /_/trybot/load/ backend endpoint. This endpoint is responsible for fetching the relevant performance data (trace values, headers, parameter sets) for the specified commit/trybot and query. The startRequest utility handles the asynchronous request and displays progress using a spinner-sk.
    • The response (TryBotResponse) contains the performance data, including:
    • results: An array of individual trace results, each containing parameter values (params), actual metric values (values), and a stddevRatio (how many standard deviations the trace's value is from the median of its historical data).
    • paramset: The complete set of parameters found across all returned traces.
    • header: Information about the data points in each trace, likely including timestamps.
    • The received data is then processed. Notably, the byParams function (from ../trybot/calcs) is used to aggregate results by parameter key-value pairs, calculating average standard deviation ratios, counts, and high/low values for each group. This helps identify which parameters are most strongly correlated with performance changes.
  • Results Display and Visualization:

    • The results are presented in two tabs: “Individual” and “By Params”.
    • Individual Tab:
      • Lists individual traces that match the query, showing their parameters, standard deviation ratio, and an option to plot them.
      • To avoid overwhelming the user, only the head and tail of long lists are displayed.
      • Clicking the plot icon (timeline-icon-sk) for a trace renders its values over time on a plot-simple-sk element. Users can CTRL-click to plot multiple traces on the same graph for comparison.
      • The table intelligently displays parameter values, showing “〃” if a value is the same as the row above it and “∅” if a parameter doesn't exist for a trace.
    • By Params Tab:
      • Displays the aggregated results from the byParams calculation. For each parameter key-value pair (e.g., “config=gles”), it shows the average standard deviation ratio, the number of traces (N) in that group, and the highest/lowest individual trace values.
      • This view helps quickly identify which specific parameter values are associated with significant performance deviations.
      • Similar to the individual tab, users can click a plot icon to visualize a group of traces. Up to maxByParamsPlot traces from the selected group (sorted by stddevRatio) are plotted on a separate plot-simple-sk.
      • When a trace is focused on the “By Params” plot (e.g., by hovering), its full trace ID and its parameter set are displayed below the plot using by-params-traceid and by-params-paramset respectively. paramset-sk is used to display the parameters, highlighting the ones belonging to the focused trace.
  • State Management:

    • The component uses stateReflector to synchronize its internal state (this.state, which is a TryBotRequest object) with the URL. This means that the selected commit, query, and analysis type (“commit” or “trybot”) are reflected in the URL query parameters. This allows users to bookmark or share specific analysis views.
    • Changes to the commit selection, query, or tab selection trigger stateHasChanged(), which updates the URL via stateReflector and re-renders the component.
  • Styling and Structure:

    • The trybot-page-sk.scss file defines the visual appearance and layout of the component, including styles for the query section, results tables, and plot areas.
    • The component is built using Lit templates, enabling reactive updates to the DOM when the underlying state changes.

Workflow Example (Commit Analysis):

  1. User Selects Tab: User ensures the “Commit” tab is selected.

    [tabs-sk] --selects index 0--> [TrybotPageSk.tabSelected]
        --> state.kind = "commit"
        --> stateHasChanged()

  2. User Selects Commit: User interacts with commit-detail-picker-sk.

    [commit-detail-picker-sk] --commit-selected event--> [TrybotPageSk.commitSelected]
        --> state.commit_number = selected_commit_offset
        --> stateHasChanged()
        --> _render() (UI updates to show query section)

  3. User Enters Query: User types into query-sk.

    [query-sk] --query-change event--> [TrybotPageSk.queryChange]
        --> state.query = new_query_string
        --> stateHasChanged()
        --> _render() (paramset-sk summary updates)
    
    [query-sk] --query-change-delayed event--> [TrybotPageSk.queryChangeDelayed]
        --> [query-count-sk].current_query = new_query_string (triggers count update)
    
  4. User Clicks “Run”:

    [Run Button] --click--> [TrybotPageSk.run]
        --> spinner-sk.active = true
        --> startRequest('/_/trybot/load/', state, ...)
        --> HTTP POST to backend with { kind: "commit", commit_number: X, query: "Y" }
        <-- Backend responds with TryBotResponse (trace data, paramset, header)
        --> results = TryBotResponse
        --> byParams = byParams(results)
        --> spinner-sk.active = false
        --> _render() (results tables and plot areas become visible and populated)

  5. User Interacts with Results:

    • Plotting Individual Trace:

      [Timeline Icon in Individual Table] --click--> [TrybotPageSk.plotIndividualTrace(event, index)]
          --> individualPlot.addLines(...)
          --> displayedTrace = true
          --> _render() (individual plot becomes visible)

    • Plotting By Params Group:

      [Timeline Icon in By Params Table] --click--> [TrybotPageSk.plotByParamsTraces(event, index)]
          --> Filters results.results for matching key=value
          --> byParamsPlot.addLines(...)
          --> byParamsParamSet.paramsets = [ParamSet of plotted traces]
          --> displayedByParamsTrace = true
          --> _render() (by params plot and its paramset become visible)

    • Focusing Trace on By Params Plot:

      [by-params-plot] --trace_focused event--> [TrybotPageSk.byParamsTraceFocused]
          --> byParamsTraceID.innerText = focused_trace_name
          --> byParamsParamSet.highlight = fromKey(focused_trace_name)
          --> _render() (updates highlighted params in by-params-paramset)

The design emphasizes providing both a high-level overview of potential regression areas (via “By Params”) and the ability to drill down into individual trace performance. The use of stddevRatio as a primary metric helps quantify the significance of observed changes.
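
To make the “Run” step concrete, the sketch below shows the shape of the request it issues. In the real page startRequest() wraps this with progress polling and a spinner; a plain fetch stands in for it here, and the optional cl/patch_number fields reflect the planned trybot mode.

    interface TryBotRequestSketch {
      kind: 'commit' | 'trybot';
      commit_number: number;
      query: string;
      cl?: string;
      patch_number?: number;
    }

    // Plain-fetch stand-in for startRequest('/_/trybot/load/', ...).
    async function runAnalysis(req: TryBotRequestSketch) {
      const resp = await fetch('/_/trybot/load/', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(req),
      });
      if (!resp.ok) {
        throw new Error(`trybot load failed: ${resp.statusText}`);
      }
      // TryBotResponse: results, paramset, header.
      return resp.json();
    }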

Module: /modules/user-issue-sk

User Issue Management Element (user-issue-sk)

The user-issue-sk module provides a custom HTML element for associating and managing Buganizer issues with specific data points in the Perf application. This allows users to directly link performance regressions or anomalies to their corresponding bug reports, enhancing traceability and collaboration.

Why: Tracking issues related to performance data is crucial for effective debugging and resolution. This element centralizes the issue linking process within the Perf UI, providing a seamless experience for users to add, view, and remove bug associations.

How:

The core functionality revolves around the UserIssueSk LitElement class. This class manages the display and interaction logic for associating a Buganizer issue with a data point identified by its trace_key and commit_position.

Key Responsibilities and Components:

  • User Authentication: The element first checks if a user is logged in using alogin-sk. This is essential because only logged-in users can add or remove issue associations. If a user is not logged in, they can only view existing issue links.
  • State Management:
    • bug_id: This property determines the element's display.
    • bug_id === 0: Indicates no Buganizer issue is associated with the data point. The element will display an “Add Bug” button (if the user is logged in).
    • bug_id > 0: An existing Buganizer issue is linked. The element will display a link to the bug and, if the user is logged in, a “close” icon to remove the association.
    • bug_id === -1: This is a special state where the element renders nothing, effectively hiding itself. This might be used in scenarios where issue linking is not applicable.
    • _text_input_active: A boolean flag that controls the visibility of the input field for entering a new bug ID.
  • Rendering Logic: The render() method dynamically chooses between two main templates based on the bug_id and login status:
    • addIssueTemplate(): Shown when bug_id === 0 and the user is logged in. It initially displays an “Add Bug” button. Clicking this button reveals an input field for the bug ID and confirm/cancel icons.
    • showLinkTemplate(): Shown when bug_id > 0. It displays a formatted link to the Buganizer issue (using AnomalySk.formatBug). If the user is logged in, a “close” icon is also displayed to allow removal of the issue link.
  • API Interaction:
    • addIssue(): Triggered when a user submits a new bug ID. It makes a POST request to the /_/user_issue/save endpoint with the trace_key, commit_position, and the new issue_id.
    • removeIssue(): Triggered when a logged-in user clicks the “close” icon next to an existing bug link. It makes a POST request to the /_/user_issue/delete endpoint with the trace_key and commit_position.
  • Event Dispatching: After successfully adding or removing an issue, the element dispatches a custom event named user-issue-changed. This event bubbles up and carries a detail object containing the trace_key, commit_position, and the new bug_id. This allows parent components or other parts of the application to react to changes in issue associations (e.g., by refreshing a list of user-reported issues).
  • Error Handling: Uses the errorMessage utility from perf/modules/errorMessage to display feedback to the user in case of API errors or invalid input.

Key Files:

  • user-issue-sk.ts: This is the heart of the module. It defines the UserIssueSk LitElement, including its properties, styles, templates, and logic for interacting with the backend API and handling user input. The design focuses on conditional rendering based on the bug_id and user login status. The API calls are standard fetch requests.
  • index.ts: A simple entry point that imports and registers the user-issue-sk custom element, making it available for use in HTML.
  • BUILD.bazel: Defines the build dependencies for the element, including alogin-sk for authentication, anomaly-sk for bug link formatting, icon elements for the UI, and Lit libraries for web component development.

Workflows:

  1. Adding a New Issue:

     User (logged in) sees “Add Bug” button
       -> User clicks “Add Bug”
       -> activateTextInput() is called
       -> _text_input_active becomes true
       -> Element re-renders to show input field, check icon, close icon
       -> User types bug ID into input field
       -> changeHandler() updates _input_val
       -> User clicks check icon
       -> addIssue() is called
       -> Input validation (is _input_val > 0?)
       -> POST request to /_/user_issue/save with trace_key, commit_position, input_val
       -> On success:
            -> bug_id is updated with _input_val
            -> _input_val reset to 0
            -> _text_input_active set to false
            -> user-issue-changed event is dispatched
            -> Element re-renders to show the new bug link and remove icon
       -> On failure:
            -> errorMessage is displayed
            -> hideTextInput() is called (resets state)

  2. Viewing an Existing Issue:

     Element is initialized with bug_id > 0
       -> render() calls showLinkTemplate()
       -> A link to perf.bug_host_url + bug_id is displayed
       -> If user is logged in, a “close” icon is also displayed

  3. Removing an Existing Issue:

     User (logged in) sees bug link and “close” icon
       -> User clicks “close” icon
       -> removeIssue() is called
       -> POST request to /_/user_issue/delete with trace_key, commit_position
       -> On success:
            -> bug_id is set to 0
            -> _input_val reset to 0
            -> _text_input_active set to false
            -> user-issue-changed event is dispatched
            -> Element re-renders to show “Add Bug” button
       -> On failure:
            -> errorMessage is displayed

The design prioritizes a clear separation of concerns: display logic is handled by LitElement's templating system, state is managed through properties, and backend interactions are encapsulated in dedicated asynchronous methods. The use of custom events allows for loose coupling with other components that might need to react to changes in issue associations.
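
The add path can be sketched as follows; the request field names follow the description above, and the real element additionally gates this behind the login check and input validation.

    // Sketch: associate a Buganizer issue with a data point, then notify listeners.
    async function addIssueSketch(
      host: HTMLElement,
      trace_key: string,
      commit_position: number,
      issue_id: number
    ): Promise<void> {
      const resp = await fetch('/_/user_issue/save', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ trace_key, commit_position, issue_id }),
      });
      if (!resp.ok) {
        throw new Error(`Saving issue failed: ${resp.statusText}`);
      }
      // Let parent components react to the new association.
      host.dispatchEvent(
        new CustomEvent('user-issue-changed', {
          detail: { trace_key, commit_position, bug_id: issue_id },
          bubbles: true,
        })
      );
    }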

Module: /modules/window

The window module is designed to provide utility functions related to the browser's window object, specifically focusing on parsing and interpreting configuration data embedded within it. This approach centralizes the logic for accessing and processing global configurations, making it easier to manage and test.

A key responsibility of this module is to extract and process build tag information. This information is often embedded in the window.perf.image_tag global variable, which is expected to be an SkPerfConfig object (defined in //perf/modules/json:index_ts_lib). The getBuildTag function is the primary component for this task.

The getBuildTag function takes an image tag string as input (or defaults to window.perf?.image_tag). Its core purpose is to parse this string and categorize the build tag. The function employs a specific parsing logic based on the structure of the image tag:

  1. Initial Validation:

    • The function first splits the input tag string by the @ character.
    • If there are fewer than two parts (i.e., no @ or @ is the first/last character), it's considered an invalid tag.
    • It then checks if the second part (after @) starts with tag:. If not, it's also an invalid tag.
    Input Tag String
          |
          V
    Split by '@'
          |
          V
    Check for at least 2 parts AND second part starts with "tag:"
          |
          +-- No --> Invalid Tag
          |
          V
    Proceed to type determination
    
  2. Tag Type Determination: Based on the prefix of the raw tag (the part after tag:):

    • Git Tag: If the raw tag starts with tag:git-, it's classified as a 'git' type. The function extracts the first 7 characters of the Git hash.

      rawTag starts with "tag:git-"
            |
            V
      Type: 'git'
      Tag: First 7 chars of Git hash

    • Louhi Build Tag: If the raw tag has a specific length (>= 38 characters) and contains louhi at a particular position (substring from index 25 to 30), it's classified as a 'louhi' type. The function extracts a 7-character identifier (substring from index 31 to 38) which typically represents a hash or version.

      rawTag length >= 38 AND rawTag[25:30] == "louhi"
            |
            V
      Type: 'louhi'
      Tag: rawTag[31:38]

    • Regular Tag: If neither of the above conditions is met, it's considered a generic 'tag' type. The function returns the portion of the string after tag:.

      Neither Git nor Louhi
            |
            V
      Type: 'tag'
      Tag: rawTag after "tag:"

This structured approach ensures that different build tag formats can be reliably identified and their relevant parts extracted. The decision to differentiate between ‘git’, ‘louhi’, and generic ‘tag’ types allows downstream consumers of this information to handle them appropriately. For instance, a ‘git’ tag might be used to link to a specific commit, while a ‘louhi’ tag might indicate a specific build from an internal CI system.
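
Put together, the parsing can be sketched as below. The { type, tag } return shape is an assumption for illustration; the exported function's actual signature may differ.

    type BuildTagType = 'git' | 'louhi' | 'tag' | 'invalid';

    // Sketch of getBuildTag following the rules described above.
    function getBuildTagSketch(imageTag: string): { type: BuildTagType; tag: string } {
      const parts = imageTag.split('@');
      if (parts.length < 2 || !parts[1].startsWith('tag:')) {
        return { type: 'invalid', tag: '' };
      }
      const rawTag = parts[1];
      if (rawTag.startsWith('tag:git-')) {
        // First 7 characters of the Git hash.
        return { type: 'git', tag: rawTag.slice(8, 15) };
      }
      if (rawTag.length >= 38 && rawTag.substring(25, 30) === 'louhi') {
        return { type: 'louhi', tag: rawTag.substring(31, 38) };
      }
      return { type: 'tag', tag: rawTag.slice(4) };
    }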

The module also extends the global Window interface to declare the perf: SkPerfConfig property. This is a TypeScript feature that provides type safety when accessing window.perf, ensuring that developers are aware of its expected structure.

The window_test.ts file provides unit tests for the getBuildTag function, covering various scenarios including valid git tags, Louhi build tags, arbitrary tags, and different forms of invalid tags. These tests are crucial for verifying the correctness of the parsing logic and ensuring that changes to the function do not introduce regressions. The use of chai for assertions is a standard practice for testing in this environment.

Module: /modules/word-cloud-sk

The word-cloud-sk module provides a custom HTML element designed to visualize key-value pairs and their relative frequencies. This is particularly useful for displaying data from clusters or other datasets where understanding the distribution of different attributes is important.

The core idea is to present this frequency information in an easily digestible format, combining textual representation with a simple bar graph for each item. This allows users to quickly grasp the prevalence of certain key-value pairs within a dataset.

Key Components and Responsibilities:

  • word-cloud-sk.ts: This is the heart of the module, defining the WordCloudSk custom element which extends ElementSk.

    • Why: It encapsulates the logic for rendering the word cloud. By extending ElementSk, it leverages common functionalities provided by the infra-sk library for custom elements.
    • How: It uses the lit-html library for templating. The items property, an array of ValuePercent objects (defined in //perf/modules/json:index_ts_lib), is the primary input. Each ValuePercent object contains a value (the key-value string) and a percent (its frequency).
    • The rendering logic iterates through the items and creates a table row for each. Each row displays the key-value string, its percentage as text, and a horizontal bar whose width is proportional to the percentage.
    • The connectedCallback ensures that if the items property is set before the element is fully connected to the DOM, it's properly upgraded and the element is rendered.
    • The _render() method is called whenever the items property changes, ensuring the display is updated.
  • word-cloud-sk.scss: This file contains the SASS styles for the word-cloud-sk element.

    • Why: It provides the visual appearance of the word cloud, ensuring it's readable and visually distinct.
    • How: It defines styles for the table, table cells, and the percentage bar. It uses CSS variables for theming (e.g., --light-gray, --on-surface, --primary), allowing the component to adapt to different themes (like light and dark mode) defined in //perf/modules/themes:themes_sass_lib and //elements-sk/modules:colors_sass_lib.
    • Specific styles are applied for font family, size, padding, borders, and the background color and height of the percentage bar.
  • word-cloud-sk-demo.html and word-cloud-sk-demo.ts: These files provide a demonstration page for the word-cloud-sk element.

    • Why: They serve as a live example of how to use the component and allow for easy visual testing and development.
    • How: word-cloud-sk-demo.html includes multiple instances of the <word-cloud-sk> tag, some within sections with different theming (e.g., dark mode). word-cloud-sk-demo.ts then selects these instances and populates their items property with sample data. This demonstrates how the component can be instantiated and how data is passed to it.
  • index.ts: This file simply imports and thereby registers the word-cloud-sk custom element.

    • Why: It acts as the entry point for the element, ensuring it's defined when the module is imported.

Workflow: Data Display

The primary workflow involves providing data to the word-cloud-sk element and its subsequent rendering:

  1. Instantiation: An instance of <word-cloud-sk> is created in HTML.

    <word-cloud-sk></word-cloud-sk>
    
  2. Data Provision: The items property of the element is set with an array of ValuePercent objects.

    // In JavaScript/TypeScript:
    const wordCloudElement = document.querySelector('word-cloud-sk');
    wordCloudElement.items = [
      { value: 'arch=x86', percent: 100 },
      { value: 'config=565', percent: 60 },
      // ... more items
    ];
    
  3. Rendering (_render() called in word-cloud-sk.ts):

    • The WordCloudSk element iterates through the _items array.
    • For each item:
      • A table row (<tr>) is generated.
      • The item.value is displayed in the first cell (<td>).
      • The item.percent is displayed as text (e.g., “60%”) in the second cell.
      • A <div> element is created in the third cell. Its width style is set to item.percent pixels, creating a visual bar representation of the percentage.

    The overall structure rendered looks like this (simplified):

    <table>
      <tr> <!-- For item 1 -->
        <td class="value">[item1.value]</td>
        <td class="textpercent">[item1.percent]%</td>
        <td class="percent">
          <div style="width: [item1.percent]px"></div>
        </td>
      </tr>
      <tr> <!-- For item 2 -->
        <td class="value">[item2.value]</td>
        <td class="textpercent">[item2.percent]%</td>
        <td class="percent">
          <div style="width: [item2.percent]px"></div>
        </td>
      </tr>
      <!-- ... more rows -->
    </table>
    

This process ensures that whenever the input data changes, the visual representation of the word cloud is automatically updated. The use of CSS variables for styling allows the component to seamlessly integrate into applications with different visual themes.
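
In spirit, the row rendering is a simple map over items; the sketch below mirrors the structure shown above but is not the element's literal template.

    import { html } from 'lit';

    interface ValuePercentSketch {
      value: string;
      percent: number;
    }

    // One table row per item: the value, a textual percent, and a width-scaled bar.
    const rowsTemplate = (items: ValuePercentSketch[]) => html`
      <table>
        ${items.map(
          (item) => html`
            <tr>
              <td class="value">${item.value}</td>
              <td class="textpercent">${item.percent}%</td>
              <td class="percent">
                <div style="width: ${item.percent}px"></div>
              </td>
            </tr>
          `
        )}
      </table>
    `;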

Module: /nanostat

Nanostat

nanostat is a command-line tool designed to compare and analyze the results of Skia's nanobench benchmark. It takes two JSON files generated by nanobench as input, representing “old” and “new” benchmark runs, and provides a statistical summary of the performance changes between them. This is particularly useful for developers to understand the performance impact of their code changes.

Why it exists

When making changes to a codebase, especially one as performance-sensitive as a graphics library like Skia, it's crucial to measure the impact on performance. Nanobench produces detailed raw data, but interpreting this data directly can be cumbersome. nanostat was created to:

  1. Automate Statistical Analysis: Apply statistical tests (Mann-Whitney U test or Welch's T-test) to determine if observed differences in benchmark results are statistically significant or likely due to random variation.
  2. Summarize Changes: Present a concise, human-readable summary of performance changes, highlighting significant regressions or improvements.
  3. Facilitate Quick Comparisons: Enable developers to quickly compare benchmark runs before and after a code change, streamlining the performance analysis workflow.
  4. Provide Filtering and Sorting: Offer options to filter out insignificant changes, remove outliers, and sort results based on various criteria (e.g., by the magnitude of change or by test name).

How it works

The core workflow of nanostat involves several steps:

  1. Input: It accepts two file paths as command-line arguments, pointing to the “old” and “new” nanobench JSON output files.

    nanostat [options] old.json new.json
    
  2. Parsing: The loadFileByName function in main.go is responsible for opening and parsing these JSON files. It uses the perf/go/ingest/format.ParseLegacyFormat function to interpret the nanobench output structure and then perf/go/ingest/parser.GetSamplesFromLegacyFormat to extract the raw sample values for each benchmark test. Each file's data is converted into a parser.SamplesSet, which is a map where keys are test identifiers and values are slices of performance measurements (samples).

  3. Statistical Analysis: The samplestats.Analyze function (from the perf/go/samplestats module) is the heart of the comparison. It takes the two parser.SamplesSet (before and after samples) and a samplestats.Config object as input. The configuration includes:

    • Alpha: The significance level (default 0.05). A p-value below alpha indicates a significant difference.
    • IQRR: A boolean indicating whether to apply the Interquartile Range Rule to remove outliers from the sample data before analysis.
    • All: A boolean determining if all results (significant or not) should be displayed.
    • Test: The type of statistical test to perform (Mann-Whitney U test or Welch's T-test).
    • Order: The function used to sort the output rows.

    For each common benchmark test found in both input files, samplestats.Analyze calculates statistics for both sets of samples (mean, percentage deviation) and then performs the chosen statistical test to compare the two distributions. This yields a p-value.

  4. Filtering and Sorting: Based on the config, samplestats.Analyze filters out rows where the change is not statistically significant (if config.All is false). The remaining rows are then sorted according to config.Order.

  5. Output Formatting: The formatRows function in main.go takes the analyzed and sorted samplestats.Row data and prepares it for display.

    • It identifies “important keys” from the benchmark parameters (e.g., config, name, test). These are keys whose values differ across the benchmark results, helping to distinguish them.
    • It constructs a header line for the output table.
    • For each row of results, it formats the old and new means, standard deviations, the percentage delta, the p-value, sample sizes, and the important key values.
    • If a change is not significant (p-value > alpha), the delta is shown as “~” unless the --all flag is used.
    • The formatted strings are then printed to stdout using text/tabwriter to create a well-aligned table.

    Example output line:

             old          new  delta     stats            name
      2.15 ±  5%   2.00 ±  2%   -7%   (p=0.001, n=10+ 8)  tabl_digg.skp
    
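To make the workflow above concrete, here is a minimal, self-contained Go sketch of its shape. The SamplesSet stand-in, the hard-coded samples, and the fixed p-value are illustrative assumptions only; the real tool works with parser.SamplesSet, derives the p-value via samplestats.Analyze, and formats rows with formatRows.

    // Illustrative sketch of the nanostat pipeline shape; not the real code.
    package main

    import (
        "fmt"
        "os"
        "sort"
        "text/tabwriter"
    )

    // SamplesSet mirrors the idea of parser.SamplesSet: test ID -> raw samples.
    type SamplesSet map[string][]float64

    func mean(xs []float64) float64 {
        sum := 0.0
        for _, x := range xs {
            sum += x
        }
        return sum / float64(len(xs))
    }

    func main() {
        alpha := 0.05

        // In the real tool these come from loadFileByName(old.json) and
        // loadFileByName(new.json).
        before := SamplesSet{"tabl_digg.skp": {2.1, 2.2, 2.15}}
        after := SamplesSet{"tabl_digg.skp": {2.0, 1.98, 2.02}}

        // Compare every test present in both runs, in a stable order.
        names := []string{}
        for name := range before {
            if _, ok := after[name]; ok {
                names = append(names, name)
            }
        }
        sort.Strings(names)

        // text/tabwriter keeps the columns aligned, as in the example output.
        w := tabwriter.NewWriter(os.Stdout, 0, 8, 2, ' ', 0)
        fmt.Fprintln(w, "old\tnew\tdelta\tname")
        for _, name := range names {
            oldMean, newMean := mean(before[name]), mean(after[name])
            delta := 100 * (newMean - oldMean) / oldMean

            // The real tool derives a p-value from a Mann-Whitney U test or
            // Welch's T-test; a fixed stand-in is used here.
            p := 0.001
            deltaStr := "~" // printed when p > alpha and -all is not set
            if p <= alpha {
                deltaStr = fmt.Sprintf("%+.1f%%", delta)
            }
            fmt.Fprintf(w, "%.2f\t%.2f\t%s\t%s\n", oldMean, newMean, deltaStr, name)
        }
        w.Flush()
    }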

Key Components and Files

  • main.go: This is the entry point of the application.

    • Responsibilities:
    • Parses command-line arguments and flags (-alpha, -sort, -iqrr, -all, -test).
    • Validates user input and displays usage information if necessary.
    • Calls loadFileByName to load and parse the input JSON files.
    • Constructs the samplestats.Config based on the provided flags.
    • Invokes samplestats.Analyze to perform the statistical comparison.
    • Calls formatRows to format the results for display.
    • Uses text/tabwriter to print the formatted output to the console.
    • Key functions:
    • actualMain(stdout io.Writer): Contains the main logic, allowing stdout to be replaced for testing.
    • loadFileByName(filename string) parser.SamplesSet: Reads a nanobench JSON file, parses it, and extracts the performance samples. It leverages perf/go/ingest/format and perf/go/ingest/parser.
    • formatRows(config samplestats.Config, rows []samplestats.Row) []string: Takes the analysis results and formats them into a slice of strings, ready for tabular display. It intelligently includes relevant parameter keys in the output.
  • main_test.go: Contains unit tests for nanostat.

    • Responsibilities:
    • Ensures that nanostat produces the expected output for various command-line flag combinations and input files.
    • Uses golden files (testdata/*.golden) to compare actual output against expected output.
    • Key functions:
    • TestMain_DifferentFlags_ChangeOutput(t *testing.T): The main test function that sets up different test cases.
    • check(t *testing.T, name string, args ...string): A helper function that runs nanostat with specified arguments, captures its output, and compares it against a corresponding golden file.
  • README.md: Provides user-facing documentation on how to install and use nanostat, including examples and descriptions of command-line options.

  • Makefile: Contains targets for building, testing, and regenerating test data (golden files). The regenerate-testdata target is crucial for updating the golden files when the tool's output format or logic changes.

  • BUILD.bazel: Defines how to build and test the nanostat binary and its library using the Bazel build system. It lists dependencies on other Skia modules, such as:

    • //go/paramtools: Used in formatRows to work with parameter sets from benchmark results.
    • //perf/go/ingest/format: Used for parsing the legacy nanobench JSON format.
    • //perf/go/ingest/parser: Used to extract sample data from the parsed format.
    • //perf/go/samplestats: Provides the core statistical analysis functions (samplestats.Analyze, samplestats.Order, samplestats.Test).

Dependencies and Design Choices

  • perf/go/samplestats: nanostat heavily relies on this module for the actual statistical computations. This promotes code reuse and separation of concerns, keeping nanostat focused on command-line parsing, file I/O, and output formatting.
  • perf/go/ingest/format and perf/go/ingest/parser: These modules handle the complexities of interpreting the nanobench JSON structure, abstracting this detail away from nanostat's main logic.
  • Command-line Flags: The tool offers a range of flags to customize its behavior (-alpha, -iqrr, -all, -sort, -test). This flexibility allows users to tailor the analysis to their specific needs. For example, the -iqrr flag allows for more robust analysis by removing potential outlier data points that could skew results. The -test flag allows users to choose between parametric (T-test) and non-parametric (U-test) statistical tests, depending on the assumptions they are willing to make about their data's distribution.
  • Tabular Output: Using text/tabwriter provides a clean, aligned, and easy-to-read output format, which is essential for quickly scanning and understanding the performance changes.
  • Golden File Testing: The use of golden files in main_test.go is a good practice for testing command-line tools. It makes it easy to verify that changes to the code don't unintentionally alter the output format or the results of the analysis. The Makefile target regenerate-testdata simplifies updating these files when intended changes occur.
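
The golden-file setup described above can be sketched roughly as follows. The helper name, flags, and golden-file names below are assumptions for illustration, not the actual contents of main_test.go; the real check helper runs actualMain(stdout io.Writer) with different flag combinations and compares against testdata/*.golden.

    // Illustrative sketch of the golden-file testing pattern; not the real tests.
    package main

    import (
        "bytes"
        "os"
        "path/filepath"
        "strings"
        "testing"
    )

    // runTool stands in for invoking actualMain with the given arguments and
    // capturing its output; it just echoes them so the sketch is self-contained.
    func runTool(args ...string) string {
        var buf bytes.Buffer
        buf.WriteString("nanostat " + strings.Join(args, " ") + "\n")
        return buf.String()
    }

    // check runs the tool with the given arguments and compares the captured
    // output against testdata/<name>.golden.
    func check(t *testing.T, name string, args ...string) {
        t.Helper()
        got := runTool(args...)

        want, err := os.ReadFile(filepath.Join("testdata", name+".golden"))
        if err != nil {
            t.Fatalf("failed to read golden file: %v", err)
        }
        if got != string(want) {
            t.Errorf("%s: got:\n%s\nwant:\n%s", name, got, want)
        }
    }

    func TestDifferentFlags(t *testing.T) {
        check(t, "defaults", "old.json", "new.json")
        check(t, "all", "-all", "old.json", "new.json")
    }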

Module: /pages

The /pages module is responsible for defining the HTML structure and initial JavaScript and CSS for all the user-facing pages of the Skia Performance application. Each page represents a distinct view or functionality within the application, such as viewing alerts, exploring performance data, or managing regressions.

The core design philosophy is to keep the HTML files minimal and delegate the rendering and complex logic to custom HTML elements (Skia Elements). This promotes modularity and reusability of UI components.

Key Components and Responsibilities:

  • HTML Files (e.g., alerts.html, newindex.html):
    • These files serve as the entry point for each page.
    • They define the basic HTML structure (<head>, <body>).
    • Crucially, they include a perf-scaffold-sk custom element. This element acts as a common layout wrapper for all pages, providing consistent navigation, header, footer, and potentially other shared UI elements.
    • Inside the perf-scaffold-sk, they embed the primary custom element specific to that page's functionality (e.g., <alerts-page-sk>, <explore-sk>).
    • They include Go template placeholders like {%- template "googleanalytics" . -%} and {% .Nonce %} for server-side rendering of common snippets and security nonces.
    • A window.perf = {%.context %}; script tag is used to pass initial data or configuration from the server (Go backend) to the client-side JavaScript. This context likely contains information needed by the page-specific custom element to initialize itself.
  • TypeScript Files (e.g., alerts.ts, newindex.ts):
    • These files are the JavaScript entry points for each page.
    • Their primary responsibility is to import the necessary custom elements. This ensures that the browser knows how to render elements like <perf-scaffold-sk> and the page-specific custom element (e.g., ../modules/alerts-page-sk).
    • Importing these elements executes their associated JavaScript, which registers them with the browser and makes them functional.
  • SCSS Files (e.g., alerts.scss, newindex.scss):
    • These files provide page-specific styling.
    • Currently, they all primarily @import 'body';, which means they inherit base body styles from body.scss.
    • If a page required unique styling beyond what the custom elements or body.scss provide, those styles would be defined here.
  • body.scss:
    • This file defines global, minimal styles for the <body> element, such as removing default margins and padding. This ensures a consistent baseline across all pages.
  • BUILD.bazel:
    • This file defines how each page is built using the sk_page rule from //infra-sk:index.bzl.
    • For each page, it specifies:
    • html_file: The entry HTML file.
    • ts_entry_point: The entry TypeScript file.
    • scss_entry_point: The entry SCSS file.
    • sk_element_deps: A list of dependencies on other modules that provide the custom HTML elements used by the page. This is crucial for ensuring that elements like perf-scaffold-sk and page-specific elements (e.g., alerts-page-sk) are compiled and available.
    • sass_deps: Dependencies for SCSS, typically including :body_sass_lib which refers to the body.scss file.
    • Other build-related configurations like assets_serving_path, nonce, and production_sourcemap.

Workflow for a Page Request:

  1. User navigates to a URL (e.g., /alerts).
  2. The server (Go backend) maps this URL to the corresponding HTML file (e.g., alerts.html).
  3. The Go backend processes the HTML template, injecting data for {% .context %}, the {% .Nonce %}, and other templates like “googleanalytics” and “cookieconsent”.
  4. The processed HTML is sent to the browser.

     User Request ----> Go Backend ----> Template Processing ----> HTML Response
            (URL Routing)                (alerts.html + context)   (Injects window.perf data, nonce)
  5. The browser parses the HTML.
  6. When the browser encounters <script src="alerts.js"></script> (or the equivalent generated by the build system), it fetches and executes the JavaScript bundle compiled from alerts.ts.
  7. alerts.ts imports ../modules/perf-scaffold-sk and ../modules/alerts-page-sk. This registers these custom elements with the browser.

     Browser Receives HTML -> Parses HTML -> Encounters <script> for alerts.ts
       |
       -> Fetches and Executes alerts.ts
            |
            -> import '../modules/perf-scaffold-sk';
            -> import '../modules/alerts-page-sk';
               (Custom elements are now defined)
  8. The browser then renders the custom elements (<perf-scaffold-sk> and <alerts-page-sk>). The JavaScript logic within these custom elements takes over, potentially fetching more data via AJAX using the initial window.perf context if needed, and populating the page content.

     Custom Elements Registered -> Browser renders <perf-scaffold-sk> and <alerts-page-sk>
       |
       -> JavaScript within these elements executes
          (e.g., reads window.perf, makes AJAX calls, builds UI)
  9. The SCSS file (alerts.scss) is also linked in the HTML (via the build system), and its styles (including those from body.scss) are applied.
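
For step 3 above, here is a rough sketch of what the template processing might look like on the Go side, assuming html/template with the {% %} delimiters seen in the page sources; the template fragment and data values are illustrative, not the actual backend code.

    // Minimal sketch of server-side template processing with {% %} delimiters.
    package main

    import (
        "html/template"
        "os"
    )

    func main() {
        // A tiny fragment standing in for a page template such as alerts.html.
        const page = `<script nonce="{% .Nonce %}">window.perf = {% .context %};</script>
    <perf-scaffold-sk><alerts-page-sk></alerts-page-sk></perf-scaffold-sk>
    `

        // Switch from the default {{ }} delimiters to the {% %} style used by
        // the page templates.
        tmpl := template.Must(template.New("alerts").Delims("{%", "%}").Parse(page))

        data := map[string]interface{}{
            "Nonce": "abc123",
            // template.JS marks the serialized context as safe to embed as the
            // window.perf value.
            "context": template.JS(`{"demo": true}`),
        }
        if err := tmpl.Execute(os.Stdout, data); err != nil {
            panic(err)
        }
    }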

This structure allows for a clean separation of concerns:

  • HTML provides the basic skeleton and server-side data injection points.
  • TypeScript/JavaScript (via custom elements) handles all dynamic behavior, UI rendering, and interaction logic.
  • SCSS handles the styling.

The help.html page is slightly different as it directly embeds more static content (help text and examples) within its HTML structure using Go templating ({% range ... %}). However, it still utilizes the perf-scaffold-sk for consistent page layout and imports its JavaScript for any scaffold-related functionalities.

The newindex.html and multiexplore.html pages additionally include a div with id="sidebar_help" within the perf-scaffold-sk. This suggests that the perf-scaffold-sk might have a designated area or slot where page-specific help content can be injected, or that the page-specific JavaScript (explore-sk.ts or explore-multi-sk.ts) might dynamically populate or interact with this sidebar content.

Module: /res

Resource Module (/res)

High-Level Overview

The /res module serves as a centralized repository for static assets required by the application. Its primary purpose is to provide a consistent and organized location for resources such as images, icons, and potentially other static files that are part of the user interface or overall application branding. By co-locating these assets, the module simplifies resource management, facilitates easier updates, and ensures that all parts of the application can reliably access necessary visual or static elements.

Design Decisions and Implementation Choices

The decision to have a dedicated /res module stems from the need to separate static content from dynamic code. This separation offers several benefits:

  1. Organization: Grouping all static assets in one place makes the project structure cleaner and easier to navigate. Developers know exactly where to look for or add new resources.
  2. Maintainability: When assets need to be updated (e.g., a new logo, a changed icon), modifications are localized to this module, reducing the risk of inadvertently affecting other parts of the codebase.
  3. Build Process Optimization: Build tools can often be configured to handle static assets differently (e.g., copying them directly to the output directory, optimizing images). Having a dedicated module simplifies the configuration of such processes.
  4. Caching and Delivery: Web servers and content delivery networks (CDNs) can be more effectively configured to cache and serve static assets when they are located in a well-defined directory.

The internal structure of /res is designed to categorize different types of assets. For instance, images are placed within a dedicated img subdirectory. This categorization aids in discoverability and allows for type-specific processing or handling if needed in the future.

Key Components/Files/Submodules

  • /res/img (Submodule/Directory):
    • Responsibility: This submodule is dedicated to storing all image assets used by the application. This includes logos, icons, background images, and any other visual elements that are not dynamically generated.
    • Why: Separating images into their own directory within /res keeps the root of the resource module clean and allows for specific image-related build optimizations or management strategies. For example, image compression tools or sprite generation scripts could target this directory specifically.
    • Key Files:
    • /res/img/favicon.ico:
      • Responsibility: This specific file provides the “favorite icon” or “favicon” for the application. Web browsers display this icon in various places, such as the browser tab, bookmarks bar, and address bar history. It's a small but important branding element that helps users quickly identify the application among many open tabs or saved links.
      • Why: The .ico format is the traditional and most widely supported format for favicons, ensuring compatibility across different browsers and platforms. Placing it directly in the img directory makes it easily discoverable by build tools and web servers, which often look for favicon.ico in standard locations. Its presence here ensures that the application has a visual identifier in browser contexts.

Workflows and Processes

A typical workflow involving the /res module might look like this:

  1. Asset Creation/Acquisition: A designer creates a new icon or a new version of the application logo.

    Designer           Developer
       |                   |
    [New Image Asset] --> [Receives Asset]
    
  2. Asset Placement: The developer places the new image file (e.g., new_icon.png) into the appropriate subdirectory within /res, likely /res/img/.

    Developer
       |
    [Places new_icon.png into /res/img/]
    
  3. Referencing the Asset: Application code (e.g., HTML, CSS, JavaScript) that needs to display this icon will reference it using a path relative to how the assets are served.

    Application Code (e.g., HTML)
       |
    <img src="/path/to/res/img/new_icon.png">
    

    (Note: The exact /path/to/ depends on how the web server or build system exposes the /res directory.)

  4. Build Process: During the application build, files from the /res module are typically copied to a public-facing directory in the build output.

    Build System
       |
    [Reads /res/img/new_icon.png] --> [Copies to /public_output/img/new_icon.png]
    
  5. Client Request: When a user accesses the application, their browser requests the asset.

     User's Browser                                      Web Server
        |                                                    |
     [Requests /public_output/img/new_icon.png] ----> [Serves new_icon.png]
        |                                                    |
     [Displays new_icon.png] <-------------------------------+

This workflow highlights how the /res module acts as the source of truth for static assets, which are then processed and served to the end-user. The favicon.ico follows a similar, often more implicit, path as browsers automatically request it from standard locations.

Module: /samplevariance

The samplevariance module is a command-line tool designed to analyze the variance of benchmark samples, specifically those generated by nanobench and stored in Google Cloud Storage (GCS). Nanobench typically produces multiple samples (e.g., 10) for each benchmark execution. This tool facilitates the examination of these samples across a large corpus of historical benchmark runs.

The primary motivation for this tool is to identify benchmarks exhibiting high variance in their results. High variance can indicate instability in the benchmark itself, the underlying system, or the measurement process. By calculating statistics like the ratio of the median to the minimum value for each set of samples, samplevariance helps pinpoint traces that warrant further investigation.

The core workflow involves:

  1. Initialization: Parsing command-line flags to determine the GCS location of benchmark data, output destination (stdout or a file), filtering criteria for traces, and the number of top results to display.
  2. File Discovery: Listing all relevant JSON files from the specified GCS bucket and prefix.
  3. Data Processing (Concurrent): Distributing the discovered filenames to a pool of worker goroutines. Each worker:
    • Downloads a JSON file from GCS.
    • Parses the legacy nanobench format to extract benchmark results.
    • Filters traces based on the user-provided criteria.
    • For each matching trace, calculates the median and minimum of its samples.
    • Computes the ratio of median to minimum.
    • Stores this information as a sampleInfo struct.
  4. Aggregation and Sorting: Collecting all sampleInfo structs from the workers and sorting them in descending order based on the calculated median/min ratio. This brings the traces with the highest variance to the top.
  5. Output: Writing the sorted results to a CSV file (or stdout), including the trace identifier, minimum value, median value, and the median/min ratio.
[Flags] -> initialize() -> (ctx, bucket, objectPrefix, traceFilter, outputWriter)
                               |
                               v
filenamesFromBucketAndObjectPrefix(ctx, bucket, objectPrefix) -> [filenames]
                               |
                               v
samplesFromFilenames(ctx, bucket, traceFilter, [filenames])
  |
  |--> [gcsFilenameChannel] -> Worker Goroutine 1 -> traceInfoFromFilename() -> [sampleInfo] --\
  |                                                                                           |
  |--> [gcsFilenameChannel] -> Worker Goroutine 2 -> traceInfoFromFilename() -> [sampleInfo] ----> [aggregatedSamples] (mutex protected)
  |                                                                                           |
  |--> ... (up to workerPoolSize)                                                             |
  |                                                                                           |
  |--> [gcsFilenameChannel] -> Worker Goroutine N -> traceInfoFromFilename() -> [sampleInfo] --/
                               |
                               v
                             Sort([aggregatedSamples])
                               |
                               v
                             writeCSV([sortedSamples], topN, outputWriter) -> CSV Output

Key components and their responsibilities:

  • main.go: This is the entry point of the application and orchestrates the entire process.

    • main(): Drives the overall workflow: initialization, fetching filenames, processing samples, sorting, and writing the output.
    • initialize(): Handles command-line argument parsing. It sets up the GCS client, determines the input GCS path (defaulting to yesterday’s data if not specified), parses the trace filter query, and configures the output writer (stdout or a specified file). The choice to default to yesterday’s data provides a convenient way to monitor recent benchmark stability without requiring explicit date specification.
    • filenamesFromBucketAndObjectPrefix(): Interacts with GCS to list all object names (filenames) under the specified bucket and prefix. It uses GCS client library features to efficiently retrieve only the names, minimizing data transfer.
    • samplesFromFilenames(): Manages the concurrent processing of benchmark files. It creates a channel (gcsFilenameChannel) to distribute filenames to a pool of worker goroutines (workerPoolSize). An errgroup is used to manage these goroutines and propagate any errors. A mutex protects the shared samples slice where results from workers are aggregated. This concurrent design is crucial for performance when dealing with a large number of benchmark files.
    • traceInfoFromFilename(): This function is executed by each worker goroutine. It takes a single GCS filename, reads the corresponding object from the bucket, parses the JSON content using format.ParseLegacyFormat (from perf/go/ingest/format) and parser.GetSamplesFromLegacyFormat (from perf/go/ingest/parser). For each trace that matches the traceFilter (a query.Query object from go/query), it sorts the sample values, calculates the median (using stats.Sample.Quantile from go-moremath/stats) and minimum, and then computes their ratio. The use of established libraries for parsing and statistical calculation ensures correctness and leverages existing, tested code.
    • writeCSV(): Formats the processed sampleInfo data into CSV format and writes it to the designated output writer. It includes a header row and then iterates through the sampleInfo slice, writing each entry. It also handles the --top flag to limit the number of output rows.
    • sampleInfo: A simple struct to hold the calculated statistics (trace ID, median, min, ratio) for a single benchmark trace's samples.
    • sampleInfoSlice: A helper type that implements sort.Interface to allow sorting sampleInfo slices by the ratio field in descending order. This is key to presenting the most variant traces first.
  • main_test.go: Contains unit tests for the writeCSV function. These tests verify that the CSV output is correctly formatted under different conditions, such as when writing all samples, a limited number of top samples, or when the number of samples is less than the requested top N. This ensures the output formatting logic is robust.

The design decision to use a worker pool (workerPoolSize) for processing files in parallel significantly speeds up the analysis, especially when dealing with numerous benchmark result files often found in GCS. The use of golang.org/x/sync/errgroup simplifies error handling in concurrent operations. Filtering capabilities (via the --filter flag and go/query) allow users to narrow down the analysis to specific subsets of benchmarks, making the tool more flexible and targeted. The output as a CSV file makes it easy to import the results into spreadsheets or other data analysis tools for further examination.
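
The fan-out/fan-in shape described for samplesFromFilenames can be sketched as below. This is a simplified stand-in: it fakes the per-file work and uses sort.Slice rather than the sampleInfoSlice sort.Interface implementation, so it illustrates the pattern, not the actual code.

    // Filenames are distributed over a channel to a fixed pool of workers, each
    // worker computes a median/min ratio, results are appended under a mutex,
    // and the aggregate is sorted with the largest ratios first.
    package main

    import (
        "context"
        "fmt"
        "sort"
        "sync"

        "golang.org/x/sync/errgroup"
    )

    const workerPoolSize = 4

    type sampleInfo struct {
        traceID string
        min     float64
        median  float64
        ratio   float64 // median / min
    }

    // statsFor computes min, median, and their ratio for one set of samples.
    // The real tool uses stats.Sample.Quantile from go-moremath for the median.
    func statsFor(samples []float64) (min, median, ratio float64) {
        sorted := append([]float64(nil), samples...)
        sort.Float64s(sorted)
        min = sorted[0]
        median = sorted[len(sorted)/2]
        return min, median, median / min
    }

    func samplesFromFilenames(ctx context.Context, filenames []string) ([]sampleInfo, error) {
        ch := make(chan string)
        var mutex sync.Mutex
        var results []sampleInfo

        g, ctx := errgroup.WithContext(ctx)
        _ = ctx // the real workers pass ctx to the GCS client when reading objects
        for i := 0; i < workerPoolSize; i++ {
            g.Go(func() error {
                for filename := range ch {
                    // Stand-in for: download the object, parse the legacy
                    // nanobench format, and apply the trace filter.
                    samples := []float64{1.0, 1.2, 1.1}
                    min, median, ratio := statsFor(samples)

                    mutex.Lock()
                    results = append(results, sampleInfo{
                        traceID: filename, // the real tool records the trace key
                        min:     min,
                        median:  median,
                        ratio:   ratio,
                    })
                    mutex.Unlock()
                }
                return nil
            })
        }

        // Fan out the filenames, then close the channel so the workers exit.
        for _, filename := range filenames {
            ch <- filename
        }
        close(ch)

        if err := g.Wait(); err != nil {
            return nil, err
        }

        // Highest-variance traces (largest median/min ratio) first.
        sort.Slice(results, func(i, j int) bool { return results[i].ratio > results[j].ratio })
        return results, nil
    }

    func main() {
        results, err := samplesFromFilenames(context.Background(), []string{"a.json", "b.json"})
        if err != nil {
            panic(err)
        }
        for _, r := range results {
            fmt.Printf("%s,%g,%g,%g\n", r.traceID, r.min, r.median, r.ratio)
        }
    }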

Module: /scripts

The /scripts module provides tooling to support the data ingestion pipeline for Skia Perf. The primary focus is on automating the process of transferring processed data to the designated cloud storage location for further analysis and visualization within the Skia performance monitoring system.

The key responsibility of this module is to ensure reliable and timely delivery of performance data. This is achieved by interacting with Google Cloud Storage (GCS) using the gsutil command-line tool.

The main component within this module is the upload_extracted_json_files.sh script.

upload_extracted_json_files.sh

This shell script is responsible for uploading JSON files, which are assumed to be the output of a preceding data extraction or processing phase, to a specific Google Cloud Storage bucket (gs://skia-perf/nano-json-v1/).

Design Rationale and Implementation Details:

  • Why a shell script? Shell scripting is a straightforward and widely available tool for automating command-line operations, making it suitable for tasks like file transfers to cloud storage. It avoids the need for more complex programming language environments for this specific, relatively simple task.
  • Why gsutil? gsutil is the standard command-line tool for interacting with Google Cloud Storage. It provides robust features for uploading, downloading, and managing data in GCS buckets.
  • Why -m (parallel uploads)? The -m flag in gsutil cp enables parallel uploads. This is a crucial performance optimization, especially when dealing with a potentially large number of JSON files. By uploading multiple files concurrently, the overall time taken for the transfer is significantly reduced.
  • Why cp -r (recursive copy)? The -r flag ensures that the entire directory structure under downloads/ is replicated in the destination GCS path. This is important for maintaining the organization of the data and potentially for downstream processing that might rely on the file paths.
  • Why the specific GCS path structure (gs://skia-perf/nano-json-v1/$(date -u --date +1hour +%Y/%m/%d/%H))?
    • gs://skia-perf/nano-json-v1/: This is the base path in the GCS bucket designated for “nano” format JSON files, version 1. This structured naming helps in organizing different types and versions of data within the bucket.
    • $(date -u --date +1hour +%Y/%m/%d/%H): This part dynamically generates a timestamped subdirectory structure.
    • date -u: Ensures the date is in UTC, providing a consistent timezone regardless of where the script is run.
    • --date +1hour: This is a deliberate choice to place the data into the next hour's ingestion slot. This likely provides a buffer, ensuring that all data generated within a given hour is reliably captured and processed for that hour, even if the script runs slightly before or after the hour boundary. It helps prevent data from being missed or attributed to the wrong time window due to minor timing discrepancies in script execution.
    • +%Y/%m/%d/%H: Formats the date and time into a hierarchical path (e.g., 2023/10/27/15). This organization is beneficial for:
      • Data partitioning: Makes it easy to query or process data for specific time ranges.
      • Data lifecycle management: Facilitates policies for archiving or deleting older data based on these time-based folders.
      • Browseability: Improves human readability and navigation within the GCS bucket.
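
For illustration, the same “next hour” path computation can be expressed in Go; this is simply an equivalent of the date expression above, not code that exists in this module:

    // Compute the hour-bucketed GCS destination used by the upload script:
    // current UTC time advanced by one hour, formatted as YYYY/MM/DD/HH.
    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        slot := time.Now().UTC().Add(time.Hour).Format("2006/01/02/15")
        fmt.Println("gs://skia-perf/nano-json-v1/" + slot + "/")
        // e.g. gs://skia-perf/nano-json-v1/2023/10/27/15/
    }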

Workflow:

The script executes a simple, linear workflow:

  1. Source: Identifies the downloads/ directory in the current working directory as the source of JSON files.

     [Local Filesystem]
        |
     ./downloads/ (contains *.json files)

  2. Destination Path Generation: Dynamically constructs the target GCS path using the current UTC time, advanced by one hour, and formatted as YYYY/MM/DD/HH.

     date command ---> YYYY/MM/DD/HH (e.g., 2023/10/27/15)
        |
     Target GCS Path: gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH/

  3. Upload: Uses gsutil to recursively copy all contents from downloads/ to the generated GCS path, utilizing parallel uploads for efficiency.

     ./downloads/* ---(gsutil -m cp -r)---> gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH/

This script assumes that the downloads/ directory exists in the location where the script is executed and contains the JSON files ready for upload. It also presumes that the user running the script has the necessary gsutil tool installed and configured with appropriate permissions to write to the specified GCS bucket.

Module: /secrets

The /secrets module is responsible for managing the creation and configuration of secrets required for various Skia Perf services to operate. These secrets primarily involve Google Cloud service accounts and OAuth credentials for email sending. The scripts in this module automate the setup of these credentials, ensuring that services have the necessary permissions to interact with Google Cloud APIs and other resources.

The design philosophy emphasizes secure and automated credential management. Instead of manual creation and configuration of secrets, these scripts provide a repeatable and version-controlled way to provision them. This reduces the risk of human error and ensures that services are configured with the principle of least privilege. For instance, service accounts are granted only the specific roles they need to perform their tasks.

Key Components and Scripts:

1. Service Account Creation Scripts:

  • create-flutter-perf-service-account.sh: This script provisions a Google Cloud service account specifically for the Flutter Perf instance. It leverages a common script (../../kube/secrets/add-service-account.sh) to handle the underlying gcloud commands.

    • Why: Flutter Perf needs its own identity to interact with Google Cloud services like Pub/Sub (for message queuing) and Cloud Trace (for application performance monitoring). Separating this into its own service account adheres to the principle of least privilege and allows for more granular permission management.
    • How: It calls the add-service-account.sh script, passing in parameters like the project ID, the desired service account name (“flutter-perf-service-account”), a descriptive display name, and the necessary IAM roles (roles/pubsub.editor, roles/cloudtrace.agent).
  • create-perf-cockroachdb-backup-service-account.sh: This script creates a dedicated service account for the Perf CockroachDB backup cronjob.

    • Why: The backup process requires permissions to write data to Google Cloud Storage. A dedicated service account ensures that only the backup job has these specific permissions, enhancing security. If the backup job's credentials were compromised, the blast radius would be limited to storage object administration.
    • How: Similar to the Flutter Perf service account, it utilizes ../../kube/secrets/add-service-account.sh. It specifies the service account name (“perf-cockroachdb-backup”) and the roles/storage.objectAdmin role, which grants permissions to manage objects in Cloud Storage buckets.
  • create-perf-ingest-sa.sh: This script is responsible for creating the perf-ingest service account. This account is used by the Perf ingestion service, which processes and stores performance data.

    • Why: The ingestion service needs to publish messages to Pub/Sub topics, send trace data to Cloud Trace, and read data from specific Google Cloud Storage buckets (gs://skia-perf, gs://cluster-telemetry-perf). A dedicated service account with these precise permissions is crucial for security and operational clarity. It also leverages Workload Identity, a more secure way for Kubernetes workloads to access Google Cloud services.
    • How:
    • It sources configuration (../kube/config.sh) and utility functions (../bash/ramdisk.sh) for environment setup.
    • Creates the service account (perf-ingest) using gcloud iam service-accounts create.
    • Assigns necessary IAM roles:
      • roles/pubsub.editor: To publish messages to Pub/Sub.
      • roles/cloudtrace.agent: To send trace data.
    • Configures Workload Identity by binding the Kubernetes service account (default/perf-ingest in the skia-public namespace) to the Google Cloud service account. This allows pods running as perf-ingest in Kubernetes to impersonate the perf-ingest Google Cloud service account without needing to mount service account key files directly.

      Kubernetes Pod (default/perf-ingest) ----> Impersonates ----> Google Cloud SA (perf-ingest@skia-public.iam.gserviceaccount.com)
                                                                       |
                                                                       +----> Accesses GCP Resources (Pub/Sub, Cloud Trace, GCS)
    • Grants objectViewer permissions on specific GCS buckets using gsutil iam ch.
    • Creates a JSON key file for the service account (perf-ingest.json).
    • Creates a Kubernetes secret named perf-ingest from this key file using kubectl create secret generic. This secret can then be used by deployments that might not be able to use Workload Identity directly or for other specific use cases.
    • Operations are performed in a temporary ramdisk (/tmp/ramdisk) to avoid leaving sensitive key files on persistent storage.
  • create-perf-sa.sh: This script creates the primary skia-perf service account. This is a general-purpose service account for the main Perf application.

    • Why: The main Perf application requires permissions for Pub/Sub, Cloud Trace, and reading from the gs://skia-perf bucket. Similar to perf-ingest, this service account uses Workload Identity for enhanced security when running within Kubernetes.
    • How: The process is very similar to create-perf-ingest-sa.sh:
    • Sources configuration and sets up a ramdisk.
    • Creates the skia-perf service account.
    • Assigns roles/cloudtrace.agent and roles/pubsub.editor.
    • Configures Workload Identity, binding the Kubernetes service account (default/skia-perf) to the skia-perf Google Cloud service account.
    • Grants objectViewer on the gs://skia-perf GCS bucket.
    • Creates a JSON key and stores it as a Kubernetes secret named skia-perf.

2. Email Secrets Creation:

  • create-email-secrets.sh: This script facilitates the creation of Kubernetes secrets necessary for Perf to send emails via Gmail. This typically involves an OAuth 2.0 flow.
    • Why: Perf needs to send email notifications (e.g., for alerts). Using Gmail programmatically requires proper authentication, which is achieved through OAuth 2.0. Storing these credentials as Kubernetes secrets makes them securely available to the Perf application pods.
    • How: This script guides the user through a semi-automated process:
    • It takes the email address to be authenticated as an argument (e.g., alertserver@skia.org).
    • It converts the email address into a Kubernetes-friendly secret name format (e.g., alertserver-skia-org).
    • It prompts the user to download the client_secret.json file (obtained from the Google Cloud Console after enabling the Gmail API and creating OAuth 2.0 client credentials) to /tmp/ramdisk.
    • It then instructs the user to run the three_legged_flow Go program (which must be built and installed separately from ../go/email/three_legged_flow). This program initiates the OAuth 2.0 three-legged authentication flow.

      User Action: Run three_legged_flow --> Browser opens for Google Auth --> User authenticates as specified email
                                                                                  |
                                                                                  v
                                                                  three_legged_flow generates client_token.json
    • Once client_token.json (containing the authorization token and refresh token) is generated in /tmp/ramdisk, the script uses kubectl create secret generic to create a Kubernetes secret named perf-${EMAIL}-secrets. This secret contains both client_secret.json and client_token.json.
    • Crucially, it then removes the client_token.json file from the local filesystem because it contains a sensitive refresh token. The source of truth for this token becomes the Kubernetes secret.
    • The use of /tmp/ramdisk ensures that sensitive downloaded and generated files are stored in memory and are less likely to be inadvertently persisted.

The common pattern across these scripts is the use of gcloud for Google Cloud resource management and kubectl for interacting with Kubernetes to store the secrets. The use of a ramdisk for temporary storage of sensitive files like service account keys and OAuth tokens is a security best practice. Workload Identity is preferred for service accounts running in GKE, reducing the need to manage and distribute service account key files.