# Module: /
## Skia Perf Technical Documentation
### 1. High-Level Overview
**Project Objectives:** Skia Perf is a performance monitoring system designed to
ingest, store, analyze, and visualize performance data for various projects,
with a primary focus on Skia and related systems (e.g., Flutter, Android,
Chrome). Its core objectives are:
1. **Centralized Performance Data Storage:** Provide a robust and scalable
repository for performance metrics collected from diverse benchmark runs.
2. **Interactive Data Exploration:** Offer web-based dashboards that allow
users to query, explore, and visualize performance trends over time, across
different configurations and code revisions.
3. **Automated Regression Detection:** Implement algorithms to automatically
identify statistically significant performance regressions (and
improvements) as new data is ingested.
4. **Alerting and Notification:** Notify relevant stakeholders (developers,
performance engineers) about detected regressions.
5. **Triage and Investigation Support:** Provide tools to help users triage
regressions, associate them with code changes, and track their resolution.
6. **Integration with Developer Workflows:** Connect performance data with
version control systems (Git) and issue trackers.
**Functionality:** Perf consists of several key components that work together:
- **Data Ingestion:** Processes performance data files (typically JSON)
uploaded to Google Cloud Storage (GCS). These files are parsed, validated,
and their metrics are stored against corresponding Git commit hashes.
- **Data Storage:** Uses SQL databases (primarily CockroachDB, with support
for Spanner) to store metadata (commits, alert configurations, regression
statuses) and specialized trace data.
- **Clustering & Regression Detection:** Employs k-means clustering and step
detection algorithms on time-series trace data to identify regressions or
significant performance shifts. This can be run continuously or triggered by
new data events.
- **Frontend UI:** A web application (built with Go on the backend and
TypeScript/Lit/Web Components on the frontend) that provides interactive
dashboards for:
- Plotting performance metrics over commit ranges.
- Querying traces based on various parameters.
- Configuring alerts.
- Triaging detected regressions.
- Viewing commit details and associated performance changes.
- **Alerting System:** Allows users to define alerts based on specific queries
and thresholds. When regressions matching these alerts are found,
notifications can be sent (e.g., email, issue tracker integration).
- **Command-Line Tools:** Provides `perfserver` (to run the different
services) and `perf-tool` (for administrative tasks, data inspection, and
database backups/restores).
### 2. Project-Specific Terminology
- **Trace:** A single time series of performance measurements for a specific
test under a specific configuration (e.g., memory usage for `draw_a_circle`
on `arch=x86,config=8888`). Trace IDs are structured key-value strings like
`,arch=x86,config=8888,test=draw_a_circle,units=ms,`.
- **CommitNumber:** An internal, monotonically increasing integer assigned by
Perf to each Git commit as it's processed. This provides a linear sequence
for ordering data.
- **Tile:** A logical grouping of commits. Trace data is stored in relation to
these tiles. The `tile_size` (number of commits per tile) is configurable
and affects how data is sharded and queried.
- **ParamSet:** A collection of unique parameter key-value pairs observed in
the data within a certain commit range (or tile range). Used to populate UI
query builders.
- **DataFrame:** A tabular data structure, similar to R's dataframes or Pandas
DataFrames, used on the backend and frontend. It holds trace values indexed
by commit and trace ID, along with header information (commit details).
- **Cluster / Clustering:** The process of grouping similar traces together
using k-means clustering. This is a core part of regression detection, as a
significant change in the centroid of a cluster can indicate a regression.
- **Regression (Statistic):** A numerical value (StepSize / LeastSquaresError)
  calculated for a cluster's centroid after fitting a step function. It
  measures how closely the centroid's behavior resembles a step change; high
  absolute values are "Interesting." (A minimal sketch of this calculation
  appears just after this list.)
- **Alert (Configuration):** A user-defined configuration that specifies a
query to select traces, a detection algorithm, grouping parameters, and
notification settings for finding regressions.
- **Ingestion Format:** A specific JSON structure (documented in `FORMAT.md`)
that Perf expects for input data files.
- **Shortcut:** A saved URL or configuration, often represented by a short,
hashed ID, for quickly accessing a specific view or set of traces.
- **Triage:** The process of reviewing a detected regression, determining if
it's a genuine issue or an expected change/noise, and marking it accordingly
(e.g., "Bug," "Ignore").
### 3. Overall Architecture
Perf follows a services-oriented architecture, where the main `perfserver`
executable can run in different modes (frontend, ingest, cluster, maintenance).
Data flows from external benchmark systems into Perf, where it's processed,
stored, analyzed, and finally presented to users.
**Data Flow and Main Components:**
```
External Benchmark Systems
|
V
[Data Files (JSON) in Perf Ingestion Format]
| (Uploaded to Google Cloud Storage - GCS)
V
GCS Bucket (e.g., gs://skia-perf/nano-json-v1/)
| (Pub/Sub event on new file arrival)
V
Perf Ingest Service(s) (`perfserver ingest` mode)
| - Parses JSON files (see /go/ingest/parser)
| - Validates data (see /go/ingest/format)
| - Associates data with Git commits (see /go/git)
| - Writes trace data to TraceStore (SQL, tiled) (see /go/tracestore)
| - Updates ParamSets (for UI query builders)
| - (Optionally) Emits Pub/Sub events for "Event Driven Alerting"
V
SQL Database (CockroachDB / Spanner)
| - Trace Data (values, parameters, indexed by commit/tile)
| - Commit Information (hashes, timestamps, messages)
| - Alert Configurations
| - Regression Records (details of detected regressions, triage status)
| - Shortcuts, User Favorites, etc.
|
+<--> Perf Cluster Service(s) (`perfserver cluster` or `perfserver frontend --do_clustering` mode)
| - Loads Alert configurations
| - Queries TraceStore for relevant data
| - Performs clustering (k-means) (see /go/clustering2, /go/ctrace2)
| - Fits step functions to cluster centroids (see /go/stepfit)
| - Calculates Regression statistic
| - Stores "Interesting" clusters/regressions in the database
| - Sends notifications (email, issue tracker) (see /go/notify)
|
+<--> Perf Frontend Service (`perfserver frontend` mode)
| - Serves HTML, CSS, JS (see /pages, /modules)
| - Handles API requests from the UI (see /go/frontend, /API.md)
| - Queries database for trace data, alert configs, regressions
| - Formats data for UI display (often as DataFrames)
| - Manages user authentication (via X-WEBAUTH-USER header)
|
+<--> Perf Maintenance Service (`perfserver maintenance` mode)
- Git repository synchronization
- Database schema migrations (see /migrations)
- Old data cleanup
- Cache refreshing (e.g., ParamSet cache)
```
**Rationale for Key Architectural Choices:**
- **Decoupled Ingestion via GCS and Pub/Sub:**
- **Why:** This decouples data producers from Perf's internal processing.
Producers only need to drop files in a GCS bucket. Pub/Sub provides a
scalable and reliable way to notify ingesters about new files, allowing
multiple ingester instances to pull work.
- **How:** Ingesters subscribe to a Pub/Sub topic. GCS is configured to
publish a message to this topic when a new file is finalized in the
designated ingestion bucket/prefix.
- **SQL Database for Structured Data:**
- **Why:** SQL databases like CockroachDB and Spanner provide
transactional consistency, scalability, and powerful querying
capabilities needed for metadata, alert configurations, and regression
tracking. CockroachDB offers PostgreSQL compatibility, so widely used
PostgreSQL tooling and client libraries work with it. Spanner provides
horizontal scalability for very large datasets.
- **How:** Go's `database/sql` package is used, with schema defined and
managed by `/go/sql` and migration scripts in `/migrations`.
- **Specialized TraceStore:**
- **Why:** Performance trace data is time-series and can be voluminous. A
generic relational model might not be optimal for the typical queries
(fetching traces over commit ranges for specific parameter sets). The
tiled approach with inverted indexes for parameters is designed for more
efficient retrieval.
- **How:** The `TraceStore` (/go/tracestore) implementation uses SQL
  tables but structures them to represent tiles of commits. `ParamSets`
  and `Postings` tables act as inverted indexes for fast lookup of traces
  matching specific key-value parameters (a small sketch of this lookup
  appears just after this list).
- **Monolithic Executable (`perfserver`) with Modes:**
- **Why:** Simplifies deployment and reduces the number of distinct
binaries to manage. A single executable can be configured to run as a
frontend, an ingester, a clusterer, or a maintenance task.
- **How:** `perfserver` uses command-line flags and subcommands to
determine its operational mode. Configuration files (`/configs/*.json`)
further dictate behavior within each mode.
- **K-Means Clustering for Regression Detection:**
- **Why:** K-means is a well-understood clustering algorithm suitable for
grouping traces with similar performance characteristics. Changes in
these groups over time can signal regressions. Traces are normalized
before clustering to make them comparable despite different scales.
- **How:** Implemented in `/go/clustering2` and `/go/kmeans`. `ctrace2`
handles trace normalization.
- **Frontend/Backend Separation:**
- **Why:** Standard practice for web applications. Allows independent
development and scaling of the UI and the backend logic.
- **How:** Backend (Go) serves JSON APIs. Frontend (TypeScript/Lit)
consumes these APIs to render interactive views.
- **Event-Driven Alerting (Optional):**
- **Why:** For very large and sparse datasets (like Android), continuous
clustering over all alerts can be resource-intensive and slow.
Event-driven alerting processes only the data relevant to recently
updated traces, reducing latency and computational load.
- **How:** Ingesters publish Pub/Sub events containing IDs of updated
traces. Clusterers subscribe to these events and run relevant alert
configurations only for the affected data.
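The inverted-index idea behind the `Postings` table can be sketched in a few
lines of Go. This is an illustration of the lookup strategy only; the real
`SQLTraceStore` stores posting rows in SQL tables keyed by tile, and the trace
IDs and parameters below are made up.

```go
package main

import "fmt"

// A toy inverted index: each key=value pair maps to the sorted list of trace
// IDs (within one tile) whose keys contain that pair. This sketches the idea
// behind the Postings table; the real SQLTraceStore keeps these rows in SQL.
var postings = map[string][]int{
	"arch=x86":    {1, 2, 5, 7},
	"arch=arm":    {3, 4, 6},
	"config=8888": {2, 3, 5, 7},
	"config=565":  {1, 4, 6},
}

// intersect merges two sorted posting lists, keeping IDs present in both.
func intersect(a, b []int) []int {
	out := []int{}
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	// "All traces where arch=x86 AND config=8888" resolves to an
	// intersection of two posting lists: [2 5 7].
	fmt.Println(intersect(postings["arch=x86"], postings["config=8888"]))
}
```

Because each posting list is sorted, the intersection is a single linear
merge, which keeps multi-parameter queries cheap even when a tile holds many
traces.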
### 4. Module Responsibilities and Key Components
This section focuses on significant modules beyond simple file/directory
descriptions.
- **`/go/config`**:
- **Responsibility:** Defines and validates the structure for instance
configuration files (`InstanceConfig`). This is the central place where
all settings for a Perf deployment (database, ingestion sources, Git
repo, UI features, notification settings) are specified.
- **Why:** Configuration files allow Perf to be deployed for different
projects with different data sources and requirements without code
changes. A strongly-typed Go struct ensures that configurations are
well-defined and can be validated.
- **How:** `InstanceConfig` is a Go struct with fields for various aspects
of the system. JSON files in `/configs` are unmarshaled into this
struct. The module provides functions to load and validate these
configurations.
- **`/go/ingest`**:
- **Responsibility:** Orchestrates the entire data ingestion pipeline.
This includes watching for new files, parsing them according to the
`format.Format` specification, extracting performance metrics and
metadata, associating them with Git commits, and writing the data to the
`TraceStore`.
- **Why:** This module is the entry point for all performance data into
the Perf system. It needs to be robust, handle various data formats
(though primarily the standard JSON format), and ensure data integrity.
- **Key Sub-components:**
- `ingest/format`: Defines the expected structure of input JSON files
(`format.Format` Go struct) and provides validation. This ensures data
consistency.
- `ingest/parser`: Contains logic to parse the `format.Format` structure
and extract individual trace measurements and their associated
parameters.
- `ingest/process`: Coordinates the steps: reading from a source (e.g.,
GCS via `/go/file`), parsing, resolving commit information (via
`/go/git`), and writing to the `TraceStore`.
- **Workflow:**
* A `Source` (e.g., `GCSSource` via PubSub) indicates a new file.
* `process` reads the file.
* `parser` and `format` validate and extract `Result`s.
* For each `Result`, its `git_hash` is resolved to a `CommitNumber` using
`/go/git`.
* Traces are constructed and written to `/go/tracestore`.
- **`/go/tracestore`**:
- **Responsibility:** Manages the storage and retrieval of performance
trace data. This is a critical component for efficient querying.
- **Why:** Trace data is time-series and multi-dimensional. The
`TraceStore` is designed to efficiently retrieve trace values for
specific parameter combinations over ranges of commits.
- **How:** It uses a "tiled" storage approach. Commits are grouped into
tiles.
- `TraceValues` table: Stores the actual metric values, often sharded by
tile.
- `ParamSets` table: Stores unique key-value pairs found in trace
identifiers within each tile.
- `Postings` table: An inverted index mapping (tile, param_key,
param_value) to a list of trace IDs that contain that key-value pair
within that tile. This structure allows queries like "get all traces
where `config=8888` and `arch=x86`" to be resolved efficiently by
intersecting posting lists. `SQLTraceStore` is the primary
implementation using the SQL database.
- **`/go/git`**:
- **Responsibility:** Interacts with Git repositories to fetch commit
information (hashes, authors, timestamps, messages). It also caches this
information in the SQL database to avoid repeated Git operations.
- **Why:** Perf needs to correlate performance data with specific code
changes. This module provides the link between `git_hash` values in
ingested data and Perf's internal `CommitNumber` sequence.
- **How:** It can use either a local Git checkout (via `git` CLI) or a
Gitiles service API. It maintains a `Commits` table in the SQL database,
mapping commit hashes to `CommitNumber`s and storing other metadata. It
periodically updates its local Git repository clone or queries Gitiles
for new commits.
- **`/go/regression`**:
- **Responsibility:** Handles the detection, storage, and management of
performance regressions.
- **Why:** This is a core function of Perf. It provides the logic to
identify when performance has changed significantly and to track the
status of these findings.
- **How:**
- It uses clustering results (from `/go/clustering2`) and step-fit
analysis (from `/go/stepfit`) to identify "Interesting" clusters.
- `Store` interface (implemented by `sqlregression2store`): Persists
information about detected regressions, including the cluster summary,
owning alert, commit hash, regression statistic, and triage status
(`New`, `Ignore`, `Bug`).
- The "Alerting" algorithm described in `DESIGN.md` (comparing new
interesting clusters with existing ones based on trace fingerprints) is
implemented here to manage the lifecycle of a regression.
- **Key Workflow for Alerting/Regression Tracking** (a minimal sketch of the
  fingerprint comparison appears at the end of this section):
  1. Run clustering (e.g., hourly or event-driven).
  2. Identify "Interesting" new clusters (high |Regression| score).
  3. For each new Interesting cluster, compare its fingerprint (top N traces)
     with the existing relevant clusters in the database.
  4. No match: store a new Regression with status "New".
  5. Match found: update the existing Regression if the new one has a better
     |Regression| score, keeping the triage status of the existing one.
- **`/go/frontend`**:
- **Responsibility:** Implements the backend for the Perf web user
interface. It handles HTTP requests, interacts with data stores
(TraceStore, AlertStore, RegressionStore, etc.), processes data, and
serves JSON responses to the frontend.
- **Why:** This module connects the user's browser interactions to Perf's
data and analytical capabilities.
- **How:** It uses Go's standard `net/http` package to define HTTP
handlers for various API endpoints (e.g., fetching data for plots,
listing alerts, updating triage statuses). It authenticates users based
on the `X-WEBAUTH-USER` header. It often fetches data, converts it into
`DataFrame` structures, and then serializes these to JSON for the
frontend.
- **`/modules` (Frontend TypeScript)**:
- **Responsibility:** Contains the TypeScript source code for all frontend
custom elements (web components) and UI logic. These modules are
compiled into JavaScript and CSS that run in the user's browser.
- **Why:** This is where the user interface is built. Modularity (one
component per file/directory) makes the frontend codebase manageable.
Custom elements (often using Lit) provide encapsulation and reusability.
- **How:** Each subdirectory typically defines one or more custom elements
(e.g., `plot-simple-sk`, `alert-config-sk`, `query-sk`). These elements
handle rendering, user interaction, and making API calls to the Go
backend.
- `perf-scaffold-sk`: Provides the main page layout (header, sidebar,
content area).
- `explore-simple-sk` / `explore-sk`: Core components for querying data
and displaying plots.
- `json/index.ts`: Contains TypeScript interfaces mirroring Go backend
structs for type-safe API communication. This is crucial for ensuring
frontend and backend data structures are compatible. It's often
generated from Go source using `/go/ts/ts.go`.
- **`/pages`**:
- **Responsibility:** Defines the top-level HTML structure for each
distinct page of the Perf application (e.g., alerts page, exploration
page).
- **Why:** These files serve as the entry points for specific views. They
are kept minimal, primarily including the `perf-scaffold-sk` and the
main page-specific custom element.
- **How:** Each HTML file (e.g., `alerts.html`) includes the
`perf-scaffold-sk` and the relevant page element (e.g.,
`<alerts-page-sk>`). An associated TypeScript file (e.g., `alerts.ts`)
imports the necessary custom element definitions. Server-side Go
templates inject initial context data (`window.perf = {%.context %};`)
into the HTML.
- **`DESIGN.md`**:
- **Significance:** This document is the primary source for understanding
the high-level architecture, design rationale, and core algorithms of
Perf, particularly for clustering and alerting.
- **Key Concepts Explained:**
- **Clustering:** Details the use of k-means clustering on normalized
traces, the Euclidean distance metric, and the calculation of the
"Regression" statistic (StepSize / LeastSquaresError) to identify
"Interesting" clusters.
- **Alerting Algorithm:** Explains how Perf identifies and tracks unique
regressions over time by fingerprinting clusters and comparing new
interesting clusters to existing ones. It outlines the schema for the
`clusters` table (though the actual schema is in `/go/sql` and may have
evolved to `Regressions` table).
- **Event Driven Alerting:** Describes an alternative to continuous
clustering, triggered by PubSub events when new data arrives. This is
beneficial for large, sparse datasets.
- **`FORMAT.md`**:
- **Significance:** Defines the precise JSON structure that Perf ingesters
expect for input data files.
- **Key Elements:** Specifies fields like `git_hash`, `key` (for global
parameters), and `results` (an array of measurements). Each result can
have its own `key` (for test-specific parameters like `test` name and
`units`) and either a single `measurement` or a more complex
`measurements` object for statistics (min, max, median). This document
is crucial for data producers who need to integrate with Perf.
- **`BUILD.bazel` (Root)**:
- **Significance:** Defines how the Perf application is built using Bazel.
It specifies container images (`perfserver`, `backendserver`) that
package the Go executables and necessary static resources (configs,
frontend assets).
- **How:** Uses `skia_app_container` rules to assemble Docker images. It
copies the `perfserver` and `perf-tool` binaries, configuration files
from `/configs`, and compiled frontend assets (HTML, JS, CSS from
`/pages` built output) into the image. The `entrypoint` for the
`perfserver` image is the `perfserver` executable itself.
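The regression de-duplication step described under `/go/regression`
(comparing a new Interesting cluster's fingerprint with stored regressions)
can be illustrated with a small Go sketch. The types and the matching rule
below are simplified stand-ins for the real `regression.Store`
implementations; only the core idea, match by shared fingerprint traces and
keep the existing triage status, is shown.

```go
package main

import "fmt"

// A cluster is fingerprinted by its top-N trace IDs. In this sketch a new
// Interesting cluster that shares any fingerprint trace with a stored
// regression is treated as the same regression; the real matching rule may
// differ in detail.

type Regression struct {
	Fingerprint []string // Top-N trace IDs of the cluster.
	Score       float64  // The |Regression| statistic.
	Triage      string   // "New", "Ignore", or "Bug".
}

func sharesTrace(a, b []string) bool {
	seen := map[string]bool{}
	for _, id := range a {
		seen[id] = true
	}
	for _, id := range b {
		if seen[id] {
			return true
		}
	}
	return false
}

// upsert either records a new regression or refreshes an existing one,
// keeping the existing triage status.
func upsert(existing []Regression, incoming Regression) []Regression {
	for i := range existing {
		if sharesTrace(existing[i].Fingerprint, incoming.Fingerprint) {
			if incoming.Score > existing[i].Score {
				incoming.Triage = existing[i].Triage // Preserve triage.
				existing[i] = incoming
			}
			return existing
		}
	}
	incoming.Triage = "New"
	return append(existing, incoming)
}

func main() {
	db := []Regression{{Fingerprint: []string{",arch=x86,test=decode,"}, Score: 3.1, Triage: "Bug"}}
	db = upsert(db, Regression{Fingerprint: []string{",arch=x86,test=decode,"}, Score: 4.2})
	fmt.Printf("%+v\n", db) // The stored regression is updated but keeps Triage "Bug".
}
```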
### 5. Key Workflows Illustrated (Pseudographic Diagrams)
**A. New Alert Creation via UI and API:**
```
User (in Perf UI, e.g., on /alerts page)
|
| Fills out Alert configuration form (<alert-config-sk> element)
| Clicks "Save"
|
V
Frontend JS (<alert-config-sk>)
|
| 1. If new alert, GET /_/alert/new
| (Server responds with a pre-populated Alert JSON with id: -1)
|
| 2. Modifies this Alert JSON based on form input
|
| 3. POST modified Alert JSON to /_/alert/update
| (Authorization: Bearer token if auth is enabled)
|
V
Perf Backend (`/go/frontend/service.go` - UpdateAlertHandler)
|
| Receives Alert JSON
| If alert.ID == -1, it's a new alert.
| Validates Alert configuration
| Persists Alert to SQL Database (via `alerts.Store`)
| Responds 200 OK
|
V
SQL Database (Alerts Table)
|
| New Alert record is created or existing one updated.
```
**Rationale:**
- The `GET /_/alert/new` step is a convenience. It provides the frontend with
a valid `Alert` structure, including any instance-default values,
simplifying new alert creation logic on the client.
- Using `id: -1` to signify a new alert during the `POST` to `/_/alert/update`
  is a common pattern that lets a single endpoint handle both creation and
  updates; the backend inspects the ID to determine the correct action (a
  minimal handler sketch follows this list).
- The API interactions are documented in `API.md`.
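Here is a minimal Go sketch of the handler-side convention: one endpoint that
creates when `id == -1` and updates otherwise. The `Alert` struct, routing,
and responses are simplified placeholders, not the actual `/go/frontend` or
`/go/alerts` code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Alert mirrors just enough of the Alert JSON for this sketch; the real
// struct lives in /go/alerts and has many more fields.
type Alert struct {
	ID    int64  `json:"id"`
	Query string `json:"query"`
}

// updateAlertHandler sketches the id == -1 convention: a single endpoint
// handles both creation and updates. Auth and persistence are omitted.
func updateAlertHandler(w http.ResponseWriter, r *http.Request) {
	var a Alert
	if err := json.NewDecoder(r.Body).Decode(&a); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if a.ID == -1 {
		// New alert: insert and let the store assign a real ID.
		fmt.Fprintln(w, "created new alert")
	} else {
		// Existing alert: update the row with this ID.
		fmt.Fprintf(w, "updated alert %d\n", a.ID)
	}
}

func main() {
	http.HandleFunc("/_/alert/update", updateAlertHandler)
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```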
**B. Data Ingestion and Event-Driven Regression Detection:**
```
Benchmark System
|
| Produces performance_data.json (Perf Ingestion Format)
| Uploads to GCS: gs://[bucket]/[path]/YYYY/MM/DD/HH/performance_data.json
|
V
Google Cloud Storage
|
| File "OBJECT_FINALIZE" event
| Publishes message to PubSub Topic (e.g., "perf-ingestion-topic")
|
V
Perf Ingest Service(s) (Subscribed to "perf-ingestion-topic")
|
| 1. Receives PubSub message (contains GCS file path)
| 2. Downloads performance_data.json from GCS
| 3. Parses JSON, validates data (see /go/ingest/format, /go/ingest/parser)
| 4. Looks up git_hash in /go/git to get CommitNumber
| 5. Writes trace data to TraceStore (SQL tables)
| 6. If Event Driven Alerting enabled for this instance:
| Constructs a list of Trace IDs updated by this file
| Publishes message (containing gzipped Trace IDs) to another PubSub Topic (e.g., "trace-update-topic")
|
V
Perf Cluster Service(s) (Subscribed to "trace-update-topic")
|
| 1. Receives PubSub message (with updated Trace IDs)
| 2. For each Alert Configuration (/go/alerts):
| If Alert's query matches any of the updated Trace IDs:
| Run clustering & regression detection for THIS Alert,
| focusing on the commit range and data relevant to the updated traces.
| (Reduces scope compared to full continuous clustering)
| 3. If regressions found:
| Store in SQL Database (Regressions table)
| Send notifications (email, issue tracker)
```
**Rationale:**
- **GCS as Entry Point:** As described in `FORMAT.md` and `DESIGN.md`, GCS is
the standard way data enters Perf. The YYYY/MM/DD/HH path structure is a
convention.
- **Pub/Sub for Decoupling and Scalability:** Ingesters don't need to poll
GCS. Pub/Sub handles event delivery, and multiple ingesters can process
files in parallel.
- **Event-Driven Clustering Optimization:** `DESIGN.md` explicitly states this
is for large/sparse datasets. Sending only updated Trace IDs significantly
narrows the scope of clustering for each event, making it faster and less
resource-intensive than re-clustering everything. PubSub's 10MB message
limit is considered for gzipped trace ID lists.
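A sketch of the ingester-side hand-off for event-driven alerting, assuming
the standard `cloud.google.com/go/pubsub` client: gzip the updated trace IDs
and publish them to the trace-update topic. The project ID, topic name, and
newline-joined encoding here are placeholders; the actual event encoding
lives in `/go/ingestevents`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"context"
	"log"
	"strings"

	"cloud.google.com/go/pubsub"
)

// publishUpdatedTraces gzips the list of trace IDs the ingester just wrote
// and publishes it to a Pub/Sub topic that the clusterers subscribe to.
func publishUpdatedTraces(ctx context.Context, traceIDs []string) error {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(strings.Join(traceIDs, "\n"))); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}

	client, err := pubsub.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		return err
	}
	defer client.Close()

	res := client.Topic("trace-update-topic").Publish(ctx, &pubsub.Message{Data: buf.Bytes()})
	_, err = res.Get(ctx) // Block until the publish is acknowledged.
	return err
}

func main() {
	ids := []string{",arch=x86,config=8888,test=decode,", ",arch=x86,config=8888,test=encode,"}
	if err := publishUpdatedTraces(context.Background(), ids); err != nil {
		log.Fatal(err)
	}
}
```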
This documentation provides a comprehensive starting point for a software
engineer to understand the Skia Perf project. It covers its purpose,
architecture, core concepts, and the rationale behind key design and
implementation choices, referencing existing documentation and source code
structure where appropriate.
# Module: /cockroachdb
The `/cockroachdb` module provides a set of shell scripts designed to facilitate
interaction with a CockroachDB instance, specifically one named
`perf-cockroachdb`, which is presumed to be running within a Kubernetes cluster.
These scripts abstract away some of the complexities of `kubectl` commands,
offering streamlined access for common database operations.
The primary motivation behind these scripts is to simplify development and
administrative workflows. Instead of requiring users to remember and type
lengthy `kubectl` commands with specific flags and resource names, these scripts
provide convenient, single-command access points.
**Key Components and Responsibilities:**
- **`admin.sh`**: This script focuses on providing access to the CockroachDB
administrative web interface.
- **Why**: The web UI is a crucial tool for monitoring database health,
performance, and managing cluster settings. Direct access via `kubectl
port-forward` can be cumbersome to set up repeatedly.
- **How**: It executes `kubectl port-forward` to map the local port `8080`
to the port `8080` of the `perf-cockroachdb-0` pod. Crucially, it then
immediately attempts to open this local address in Google Chrome,
providing an instant user experience. This assumes Google Chrome is
installed and available in the system's PATH. The flow is: the user runs
`admin.sh`; the script executes `kubectl port-forward perf-cockroachdb-0
8080`, so local port 8080 forwards to the CockroachDB pod's port 8080; the
script then executes `google-chrome http://localhost:8080`, and the
CockroachDB Admin UI opens in Chrome.
- **`connect.sh`**: This script is designed to provide a SQL shell connection
to the CockroachDB instance.
- **Why**: Developers and administrators frequently need to execute SQL
queries directly against the database for debugging, data manipulation,
or schema inspection. Setting up an interactive `kubectl run` command
with the correct image and arguments can be error-prone.
- **How**: It uses `kubectl run` to create a temporary, interactive pod
named `androidx-cockroachdb`. This pod uses the
`cockroachdb/cockroach:v19.2.5` Docker image. The `--rm` flag ensures
the pod is deleted after the session ends, and `--restart=Never`
prevents it from being restarted. The crucial part is the command passed
to the pod: `sql --insecure --host=perf-cockroachdb-public`. This starts
the CockroachDB SQL client, connecting insecurely to the database
service exposed at `perf-cockroachdb-public`. The flow is: the user runs
`connect.sh`; `kubectl run` creates the temporary pod `androidx-cockroachdb`;
the CockroachDB SQL client starts inside the pod and connects to
`perf-cockroachdb-public`, giving the user an interactive SQL shell; when the
user exits the shell, the pod is deleted.
- **`skia-infra-public-port-forward.sh`**: This script sets up a port forward
for direct database connections, typically for use with a local CockroachDB
SQL client or other database tools.
- **Why**: While `connect.sh` provides an in-cluster SQL shell, sometimes
a direct connection from the local machine is preferred, for instance,
to use graphical SQL clients or specific client libraries that are not
available within the temporary pod created by `connect.sh`. The
`perf-cockroachdb` instance is likely within a private network in the
Kubernetes cluster (namespace `perf`), and this script makes it
accessible locally.
- **How**: It leverages a helper script `../../kube/attach.sh
skia-infra-public` (the details of which are outside this module's scope
but presumably handles Kubernetes context or authentication for the
`skia-infra-public` cluster). This helper script is then used to execute
`kubectl port-forward` specifically for the `perf-cockroachdb-0` pod
within the `perf` namespace. It maps local port `25000` to the pod's
CockroachDB port `26257`. The script also helpfully prints instructions
for the user on how to connect using the `cockroach sql` command once
the port forward is active. The `set -e` command ensures the script
exits immediately if any command fails, and `set -x` enables command
tracing for debugging. The flow is: the user runs
`skia-infra-public-port-forward.sh`; the script prints connection
instructions and then executes `../../kube/attach.sh skia-infra-public
kubectl port-forward -n perf perf-cockroachdb-0 25000:26257`; once the port
forward is established (local port 25000 to `perf-cockroachdb-0:26257` in the
`perf` namespace), the user can run `cockroach sql --insecure
--host=127.0.0.1:25000` in another terminal.
These scripts collectively aim to make interacting with the `perf-cockroachdb`
instance as straightforward as possible by encapsulating the necessary `kubectl`
commands and providing context-specific instructions or actions. They rely on
the Kubernetes cluster being correctly configured and accessible, and on
`kubectl` and potentially `google-chrome` being available on the user's system.
# Module: /configs
The `/configs` directory houses JSON configuration files for various instances
of the Perf performance monitoring system. Each file defines the specific
behavior and data sources for a particular Perf deployment. These configurations
are crucial for tailoring Perf to different projects and environments, enabling
developers and performance engineers to monitor and analyze performance data
effectively.
The core idea is to provide a declarative way to set up a Perf instance. Instead
of hardcoding settings, these JSON files act as blueprints. Each file serializes
to and from a Go struct named `config.InstanceConfig`. This struct serves as the
canonical schema for all instance configurations, and its Go documentation
provides detailed explanations of each field. This approach ensures consistency
and makes it easier to manage and evolve the configuration options.
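As a sketch of how these files map onto Go structs, the snippet below
unmarshals a config fragment into a deliberately tiny stand-in for
`config.InstanceConfig`. Only a few of the JSON keys described below are
included, and the URLs and values are placeholders; the authoritative schema
is the Go struct in `/go/config`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// instanceConfig is a tiny stand-in for config.InstanceConfig, covering only
// a few of the documented JSON keys.
type instanceConfig struct {
	URL             string `json:"URL"`
	Contact         string `json:"contact"`
	DataStoreConfig struct {
		DataStoreType string `json:"datastore_type"`
		TileSize      int32  `json:"tile_size"`
	} `json:"data_store_config"`
}

func main() {
	// A fragment in the shape of the files in /configs (placeholder values).
	raw := []byte(`{
	  "URL": "https://perf.example.org",
	  "contact": "perf-team@example.org",
	  "data_store_config": {"datastore_type": "cockroachdb", "tile_size": 256}
	}`)

	var cfg instanceConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s uses %s with tile_size=%d\n",
		cfg.URL, cfg.DataStoreConfig.DataStoreType, cfg.DataStoreConfig.TileSize)
}
```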
**Key Components and Responsibilities:**
The primary responsibility of this module is to define and store these instance
configurations. Each JSON file represents a distinct Perf instance, often
corresponding to a specific project or a particular version of a project (e.g.,
a public vs. internal build, or a stable vs. experimental branch).
- **Instance-Specific Configuration Files (e.g., `android2.json`,
`chrome-public.json`):**
- **Why:** Each project or system being monitored by Perf has unique
requirements. These include where its performance data is stored (e.g.,
GCS buckets), how it's ingested (e.g., Pub/Sub topics), which Git
repository tracks its code changes, how users authenticate, and how
notifications for regressions are handled.
- **How:** These files use a JSON structure that maps directly to the
`config.InstanceConfig` Go struct.
- `URL`: The public-facing URL of the Perf instance.
- `data_store_config`: Defines the backend database (e.g., CockroachDB,
Spanner), connection strings, and parameters like `tile_size` which can
impact query performance and data retrieval efficiency. The choice
between CockroachDB and Spanner often depends on scalability needs and
existing infrastructure.
- `ingestion_config`: Specifies how performance data is brought into Perf.
This includes the `source_type` (e.g., `gcs` for Google Cloud Storage,
`dir` for local directories), the specific `sources` (e.g., GCS bucket
paths or local file paths), and Pub/Sub topics for real-time ingestion.
This section is vital for connecting Perf to the data producers.
- `git_repo_config`: Links Perf to the source code repository. This allows
Perf to correlate performance data with specific code changes (commits).
It includes the repository `url`, the `provider` (e.g., `gitiles`,
`git`), and sometimes a `commit_number_regex` to extract meaningful
commit identifiers from commit messages.
- `notify_config`: Configures how alerts and notifications are sent when
regressions are detected. This can range from `none` to `html_email`,
`markdown_issuetracker`, or `anomalygroup`. It often includes templates
for notification subjects and bodies, leveraging placeholders like `{{
.Alert.DisplayName }}` to include dynamic information.
- `auth_config`: Defines the authentication mechanism, commonly using a
header like `X-WEBAUTH-USER` for integration with existing
authentication systems.
- `query_config`: Customizes how users can query and view data, including
which parameters are available for filtering (`include_params`), default
selections, and URL value defaults to tailor the user experience. It can
also include caching configurations (e.g., using Redis) to improve query
performance by specifying `cache_config` with `level1_cache_key` and
`level2_cache_key`.
- `anomaly_config`: Contains settings related to anomaly detection, such
as `settling_time` which defines how long Perf waits before considering
new data for anomaly detection, helping to avoid flagging transient
issues.
- Other fields like `contact`, `ga_measurement_id` (for Google Analytics),
`feedback_url`, `trace_sample_proportion` (to control the volume of
detailed trace data collected), and `favorites` (for pre-defined links
on the Perf UI) further customize the instance.
- **Example Workflow (Data Ingestion and Alerting for `android2.json`):**
* **Data Production:** Android benchmarks generate performance data.
* **Data Upload:** This data is uploaded to GCS buckets specified in
`ingestion_config.source_config.sources` (e.g.,
`gs://android-perf-2/android2`).
* **Pub/Sub Notification:** A message is sent to the Pub/Sub topic
`perf-ingestion-android2-production`.
* **Perf Ingestion Service:** The Perf ingestion service, subscribed to
this topic, reads the new data file from GCS.
* **Data Processing & Storage:** Perf processes the data, associates it
with the corresponding commit from the `git_repo_config` (e.g.,
`https://android.googlesource.com/platform/superproject`), and stores it
in the CockroachDB instance defined in `data_store_config`.
* **Anomaly Detection:** Perf's anomaly detection algorithms analyze the
new data points.
* **Regression Found:** A regression is detected based on the
`anomaly_config` settings.
* **Notification Sent:** A notification is generated according to
`notify_config`. For `android2.json`, this means an issue is filed in an
issue tracker (`"notifications": "markdown_issuetracker"`) with a
subject and body formatted using the provided templates, including
details like affected tests and devices.
- **`local.json`:**
- **Why:** Provides a standardized configuration for local development and
manual testing of Perf. It's designed to be self-contained and not rely
on external production services.
- **How:** It typically points the `ingestion_config` to a local directory
(`integration/data`) that contains sample data. This data is often the
same data used for unit tests, ensuring consistency between testing
environments. The database connection will also point to a local
instance.
- **`demo.json` and `demo_spanner.json`:**
- **Why:** These configurations are likely used for demonstration purposes
or for setting up small-scale, illustrative Perf instances. They
showcase Perf's capabilities with sample data.
- **How:** Similar to `local.json`, `demo.json` uses a local directory for
data ingestion (`"./demo/data/"`) and a local CockroachDB instance.
`demo_spanner.json` is analogous but configured to use Spanner as the
backend, demonstrating flexibility in data store choices. They often
include simpler `git_repo_config` pointing to public demo repositories
(e.g., `https://github.com/skia-dev/perf-demo-repo.git`). The
`favorites` section in `demo.json` shows how to add curated links to the
Perf UI.
- **`/spanner` subdirectory:**
- **Why:** This subdirectory groups configurations for Perf instances that
specifically use Google Cloud Spanner as their backend data store.
Spanner is chosen for its scalability, strong consistency, and global
distribution capabilities, making it suitable for large-scale Perf
deployments.
- **How:** Files within this directory (e.g.,
`spanner/chrome-public.json`, `spanner/skia-public.json`) will have
their `data_store_config.datastore_type` set to `"spanner"`. They often
include Spanner-specific settings or optimizations. For example,
`enable_follower_reads` might be set to `true` in `data_store_config`
for Spanner instances to distribute read load. Many of these
configurations also define `redis_config` within their
`query_config.cache_config` to further enhance query performance for
frequently accessed data.
- The `optimize_sqltracestore` flag, often set to `true` in Spanner
configurations, indicates that specific optimizations for the SQL-based
trace store are enabled, likely tailored to Spanner's characteristics.
- Configurations like `chrome-internal.json` and `chrome-public.json`
demonstrate sophisticated setups, including:
- `commit_number_regex` in `git_repo_config` to extract structured commit
positions.
- `temporal_config` for integrating with Temporal workflows for tasks like
regression grouping and bisection.
- `enable_sheriff_config` to integrate with sheriffing systems for
managing alerts.
- `trace_format: "chrome"` indicates that the performance data adheres to
the Chrome trace event format.
The choice of fields and their values within each JSON file reflects a series of
design decisions aimed at balancing flexibility, performance, and operational
manageability for each specific Perf instance. For instance, the `tile_size` in
`data_store_config` is adjusted based on expected data characteristics and query
patterns. Similarly, `trace_sample_proportion` is set to manage storage costs
and processing load while still capturing enough data for meaningful analysis.
The `notify_config` templates are crafted to provide actionable information to
developers when regressions occur.
# Module: /csv2days
## csv2days Module Documentation
### Overview
The `csv2days` module is a command-line utility designed to process CSV files
downloaded from the Perf performance monitoring system. Its primary purpose is
to simplify time-series data by consolidating multiple data points from the same
calendar day into a single representative value. This is particularly useful
when analyzing performance trends over longer periods, where daily granularity
is sufficient and finer-grained timestamps can introduce noise or unnecessary
complexity.
The core problem this module solves is the overabundance of data points when
Perf exports data at a high temporal resolution (e.g., multiple commits per
day). For certain types of analysis, this level of detail is not required and
can make it harder to discern broader trends. `csv2days` transforms such CSVs by
keeping only the first encountered data column for each unique day and
aggregating subsequent values from the same day into that single column using a
"max" aggregation strategy.
### Design and Implementation
The module operates as a streaming processor. It reads the input CSV file row by
row, processes the header to determine which columns to modify or drop, and then
transforms each subsequent data row accordingly before writing it to standard
output.
**Key Design Choices:**
1. **Command-Line Interface:** The tool is designed as a simple command-line
application for ease of integration into scripting workflows. It takes an
input file path via the `--in` flag and outputs the transformed CSV to
`stdout`. This follows common Unix philosophies for tool interoperability.
2. **Streaming Processing:** Instead of loading the entire CSV into memory,
which could be problematic for very large files, `csv2days` processes the
file line by line. This makes the tool memory-efficient.
3. **Date-Based Grouping:** The core logic revolves around identifying columns
that represent timestamps. It uses a regular expression (`datetime`) to
match RFC3339 formatted dates in the header row. The date part (YYYY-MM-DD)
of these timestamps is used for grouping.
4. **"First Seen" Column for a Day:** For each unique calendar day encountered
in the header, only the first column corresponding to that day is retained
in the output header. Subsequent columns from the same day are marked for
removal.
5. **"Max" Aggregation:** When multiple columns from the same day are
encountered in a data row, the values from these columns are aggregated. The
`csv2days` tool currently implements a "max" aggregation strategy: for the
set of values corresponding to a single day, the maximum numerical value is
chosen. If non-numerical values are encountered, the first value in the
sequence is typically used.
6. **Reverse Sorted Index Removal:** When removing columns, the indices of
columns to be skipped (`skipCols`) are sorted in reverse order. This is
crucial because removing an element from a slice shifts the indices of
subsequent elements. Processing removals from right-to-left (largest index
to smallest) ensures that the indices remain valid throughout the removal
process.
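The two choices above, "max" aggregation over a same-day run of cells and
reverse-sorted index removal, can be sketched as follows. This is an
illustration of the approach, not the actual `main.go` code.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// maxOf returns the string form of the largest parseable float in cells,
// falling back to the first cell if nothing parses.
func maxOf(cells []string) string {
	best, found := 0.0, false
	for _, c := range cells {
		if v, err := strconv.ParseFloat(c, 64); err == nil && (!found || v > best) {
			best, found = v, true
		}
	}
	if !found {
		return cells[0]
	}
	return strconv.FormatFloat(best, 'f', -1, 64)
}

// removeIndexes deletes the given column indexes from a row. The indexes are
// sorted in reverse so that removing index 3 cannot invalidate index 2.
func removeIndexes(row []string, skipCols []int) []string {
	sort.Sort(sort.Reverse(sort.IntSlice(skipCols)))
	for _, i := range skipCols {
		row = append(row[:i], row[i+1:]...)
	}
	return row
}

func main() {
	// Columns 1-3 belong to the same day; keep column 1 with the max and
	// drop columns 2 and 3.
	row := []string{"my_test", "1.5", "2.25", "0.75"}
	row[1] = maxOf(row[1:4])
	fmt.Println(removeIndexes(row, []int{2, 3})) // [my_test 2.25]
}
```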
**Workflow:**
The main workflow within `transformCSV` can be visualized as follows:
```
Read Input CSV File (--in flag)
|
v
Parse Header Row
|
+----------------------------------------------------------------------+
| Identify Timestamp Columns (using RFC3339 regex) |
| For each timestamp: |
| Extract Date (YYYY-MM-DD) |
| If new date: |
| Add Date to Output Header |
| Record current column as start of a new "run" for this day |
| Else (same date as previous timestamp): |
| Mark current column for skipping (`skipCols`) |
| Increment length of current day's "run" (`runLengths`) |
| |
| Non-timestamp columns are added to Output Header as-is |
+----------------------------------------------------------------------+
|
v
Write Transformed Header to Output
|
v
Sort `skipCols` in Reverse Order
|
v
For each Data Row in Input CSV:
|
+----------------------------------------------------------------------+
| Apply "Max" Aggregation: |
| For each "run" of columns belonging to the same day (from header): |
| Find the maximum numerical value in the corresponding cells |
| Replace the first cell of the run with this max value |
+----------------------------------------------------------------------+
|
v
Remove Skipped Columns (based on `skipCols` from header processing)
|
v
Write Transformed Data Row to Output
|
v
Flush Output Buffer
```
### Key Components/Files
- **`main.go`**: This is the heart of the module.
- **`main()` function**: Handles command-line flag parsing (`--in` for the
input CSV file). It orchestrates the reading of the input file and calls
`transformCSV` to perform the core logic. Error handling and logging are
also managed here.
- **`transformCSV(input io.Reader, output io.Writer) error`**: This is the
core function responsible for the CSV transformation.
- It initializes `csv.Reader` for input and `csv.Writer` for output.
- **Header Processing**: It reads the first line (header) of the CSV. It
iterates through the header cells.
- A regular expression (`datetime = regexp.MustCompile(...)`) is used
to identify columns containing RFC3339 timestamps.
- It maintains `lastDate` to detect when a new day starts in the
header sequence.
- `skipCols` (a slice of integers) stores the indices of columns that
represent subsequent entries for an already seen day and should thus
be removed from the data rows.
- `runLengths` (a map of `int` to `int`) stores, for each column that
starts a sequence of same-day entries, how many columns belong to
that day. This is used later for aggregation. For example, if
columns 5, 6, and 7 are all for "2023-01-15", `runLengths[5]` would
be `3`.
- The output header (`outHeader`) is constructed by keeping the date
part (YYYY-MM-DD) for the first occurrence of each day and omitting
subsequent columns for the same day. Non-date columns are passed
through unchanged.
- **Data Row Processing**: It then reads the rest of the CSV file row by
row.
- `applyMaxToRuns(s []string, runLengths map[int]int) []string`: For
each "run" of columns identified in the header as belonging to the
same day, this function takes the corresponding values from the
current data row and replaces the value in the first column of that
run with the maximum of those values. The `max(s []string) string`
helper function is used here to find the maximum float value,
falling back to the first string if parsing fails.
- `removeAllIndexesFromSlices(s []string, skipCols []int) []string`:
After aggregation, this function removes the data cells
corresponding to the `skipCols` identified during header processing.
It uses `removeValueFromSliceAtIndex` repeatedly. It's crucial that
`skipCols` is sorted in reverse order for this to work correctly.
- The transformed row is then written to the output CSV.
- **Helper Functions**:
- `removeValueFromSliceAtIndex(s []string, index int) []string`: A utility
to remove an element at a specific index from a string slice.
- `max(s []string) string`: Iterates through a slice of strings, attempts
to parse them as floats, and returns the string representation of the
maximum float found. If no floats are found or parsing errors occur, it
defaults to returning the first string in the input slice. This function
underpins the aggregation logic.
- **`main_test.go`**: Contains unit tests for the `transformCSV` function.
- `TestTransformCSV_HappyPath`: Provides a simple input CSV string and the
expected output string. It then calls `transformCSV` with these and
asserts that the actual output matches the expected output. This serves
as a concrete example of the module's behavior.
- **`BUILD.bazel`**: Defines how the `csv2days` Go binary and its associated
library and tests are built using Bazel. It specifies source files,
dependencies (like `skerr`, `sklog`, `util`), and visibility.
The design decision to use `strconv.ParseFloat` and handle potential errors by
continuing or defaulting implies that the tool is somewhat lenient with
non-numeric data in columns expected to be numeric. The "max" operation will
effectively ignore non-convertible strings unless all strings in a run are
non-convertible, in which case the first string is chosen.
# Module: /demo
The `demo` module provides the necessary data and tools to showcase the
capabilities of the Perf performance monitoring system. Its primary purpose is
to offer a tangible and reproducible example of how Perf ingests and processes
performance data. This allows users and developers to understand Perf's
functionality without needing to set up a complex real-world data pipeline.
The core of this module revolves around a set of pre-generated data files and a
Go program to create them.
**Key Components:**
- **`/demo/data/` (Directory):** This directory houses the actual demo data
files in JSON format. Each file represents performance measurements
associated with a specific commit hash.
- **Why:** These static files serve as the input for a 'dir' type ingester
in a demo Perf instance. They are structured according to the
`format.Format` specification (defined in `perf/go/ingest/format`),
which Perf understands. This allows for a simple and direct way to feed
data into Perf for demonstration purposes.
- **How:** Each JSON file (e.g., `demo_data_commit_1.json`) contains a
`git_hash`, `key` (identifying the test environment like architecture
and configuration), and `results`. The `results` section includes
measurements for various tests (like "encode" and "decode") across
different units (like "ms" and "kb"). Some files also include `links`
which can point to external resources relevant to the data point or the
overall commit. The data in these files is designed to show some
variation over commits to demonstrate Perf's ability to track changes
and detect regressions/improvements. For instance, the `decode`
measurement and `encodeMemory` show a deliberate shift in values
starting from `demo_data_commit_6.json`.
- **`generate_data.go`:** This Go program is responsible for creating the JSON
data files located in the `/demo/data/` directory.
- **Why:** While the static data files are sufficient for running the
demo, this program provides the means to regenerate or modify the demo
dataset. This is crucial if the demo requirements change, if new Perf
features need to be showcased with different data patterns, or if the
underlying `format.Format` evolves. It ensures the demo data remains
relevant and can be adapted.
- **How:**
* It defines a list of Git commit hashes. These hashes are specifically
chosen from the `skia-dev/perf-demo-repo` repository, establishing a
direct link between the performance data and a version control history,
a common scenario in real-world Perf usage.
* It iterates through these hashes. For each hash:
  - It programmatically generates performance values (e.g., `encode`,
    `decode`, `encodeMemory`). The generation includes some randomness
    (`rand.Float32()`) to make the data appear more realistic.
  - A deliberate change in the data generation logic is introduced for
    commits at index 5 and onwards (e.g., `multiplier = 1.2`), which leads to
    a noticeable shift in `decode` and `encodeMemory` values in the
    corresponding JSON files. This demonstrates how Perf can track and
    visualize such changes.
  - It populates a `format.Format` struct (from
    `go.skia.org/infra/perf/go/ingest/format`) with the generated data,
    including the Git hash, environment keys, and the measurement results.
  - The `format.Format` struct is then marshaled into JSON with indentation
    for readability.
* Finally, the JSON data is written to a file named according to the commit
  sequence (e.g., `demo_data_commit_1.json`) within the `data` subdirectory.
  The program uses the `runtime.Caller(0)` function to determine its own
  location, ensuring that the `data` directory is created relative to the Go
  file itself, making the script more portable.
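The generation step can be sketched as follows. The types below are a
simplified stand-in for `format.Format`, covering only the documented
`git_hash`, `key`, and `results` fields, and the hash is a placeholder.

```go
package main

import (
	"encoding/json"
	"log"
	"math/rand"
	"os"
)

// result and demoFile mirror only the documented JSON shape; the real
// definitions live in perf/go/ingest/format.
type result struct {
	Key         map[string]string `json:"key"`
	Measurement float32           `json:"measurement"`
}

type demoFile struct {
	GitHash string            `json:"git_hash"`
	Key     map[string]string `json:"key"`
	Results []result          `json:"results"`
}

func main() {
	multiplier := float32(1.0) // Bump to 1.2 for later commits to create a visible step.
	f := demoFile{
		GitHash: "0000000000000000000000000000000000000001", // placeholder hash
		Key:     map[string]string{"arch": "x86", "config": "8888"},
		Results: []result{
			{Key: map[string]string{"test": "encode", "units": "ms"}, Measurement: 10 + rand.Float32()},
			{Key: map[string]string{"test": "decode", "units": "ms"}, Measurement: multiplier * (20 + rand.Float32())},
		},
	}
	b, err := json.MarshalIndent(f, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("demo_data_commit_1.json", b, 0o644); err != nil {
		log.Fatal(err)
	}
}
```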
**Workflow for Demo Data Usage:**
```
generate_data.go --(generates)--> /demo/data/*.json files
|
V
Perf Ingester (type 'dir', configured to read from /demo/data/)
|
V
Perf System (stores, analyzes, and visualizes the data)
```
The demo data is specifically designed to be used in conjunction with the
`perf/configs/demo.json` configuration file and the
`https://github.com/skia-dev/perf-demo-repo.git` repository. This linkage
provides a complete, albeit simplified, end-to-end scenario for demonstrating
Perf.
# Module: /go
This main module, located at `/go`, serves as the root for all Go language
components of the Perf performance monitoring system. It encompasses a wide
array of functionalities, from data ingestion and storage to analysis, alerting,
and user interface backend logic. The design promotes modularity, with specific
responsibilities delegated to sub-modules.
The system is designed to handle large volumes of performance data, track it
against code revisions, detect regressions automatically, and provide tools for
developers and performance engineers to investigate and manage performance.
### Key Design Philosophies and Architectural Choices:
1. **Modularity:** The system is broken down into numerous sub-modules (e.g.,
`/go/alerts`, `/go/ingest`, `/go/regression`, `/go/frontend`), each with a
well-defined responsibility. This promotes separation of concerns, making
the system easier to develop, test, and maintain.
2. **Interface-Based Design:** Many modules define interfaces for their core
   components (e.g., `tracestore.Store`, `alerts.Store`, `regression.Store`).
   This allows different implementations to be swapped in (e.g., SQL-based
   stores vs. in-memory mocks for testing) and promotes loose coupling (a
   minimal sketch of this pattern appears after this list).
3. **Configuration-Driven Behavior:** The `/go/config` module defines a
comprehensive `InstanceConfig` structure, which is loaded from a JSON file.
This configuration dictates many aspects of an instance's behavior,
including database connections, data sources, alert settings, and UI
features. This allows for flexible deployment and customization of Perf
instances.
4. **Asynchronous Processing and Workflows:** For long-running tasks like data
ingestion, regression detection, and bisection, the system leverages
asynchronous processing.
- Go routines are widely used for concurrent operations.
- The `/go/progress` module provides a mechanism for tracking and
reporting the status of such tasks to the UI.
- The `/go/workflows` module utilizes Temporal to orchestrate complex,
multi-step processes like triggering bisections and processing their
results. Temporal provides resilience and fault tolerance for these
critical operations.
5. **Data Storage and Retrieval:**
- **SQL Database:** A relational database (primarily targeting
CockroachDB, with Spanner compatibility) is the main persistence layer
for most structured data, including alert configurations (`/go/alerts`),
regression details (`/go/regression`), commit information (`/go/git`),
user favorites (`/go/favorites`), subscriptions (`/go/subscription`),
and more. The `/go/sql` module manages the database schema.
- **Trace Data (`/go/tracestore`):** Performance trace data is stored in a
tiled fashion, with inverted indexes to allow for efficient querying.
This specialized storage approach is optimized for time-series
performance metrics.
- **File Storage (GCS):** Raw ingested data files and potentially other
large artifacts are often stored in Google Cloud Storage. The `/go/file`
and `/go/filestore` modules provide abstractions for interacting with
these files.
6. **Caching:** Various caching strategies are employed to improve performance:
- In-memory LRU caches for frequently accessed data (e.g., in `/go/git`,
`/go/progress`).
- A dedicated `/go/tracecache` for trace IDs.
- The `/go/psrefresh` module manages caching of `ParamSet`s (used for UI
query builders), potentially using Redis (`/go/redis`).
- `/go/graphsshortcut` offers an in-memory cache for graph shortcuts,
especially for development.
7. **External Service Integration:**
- **Git:** The `/go/git` module interacts with Git repositories (via local
CLI or Gitiles API) to fetch commit information.
- **Issue Trackers:** Modules like `/go/issuetracker` and `/go/culprit`
integrate with issue tracking systems (e.g., Buganizer) for automated
bug filing.
- **Chrome Perf:** The `/go/chromeperf` module allows communication with
the Chrome Performance Dashboard for reporting regressions or fetching
anomaly data.
- **Pinpoint:** The `/go/pinpoint` module provides a client for the
Pinpoint bisection service.
- **LUCI Config:** The `/go/sheriffconfig` module integrates with LUCI
Config for managing alert configurations.
8. **Command-Line Tools:**
- `/go/perfserver`: The main executable for running different Perf
services (frontend, ingestion, clustering, maintenance).
- `/go/perf-tool`: A CLI for various administrative and data inspection
tasks.
- `/go/initdemo`: A tool to initialize a database for demo or development.
- `/go/ts`: A utility to generate TypeScript definitions from Go structs
for frontend type safety.
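As an illustration of the interface-based design in point 2 above, here is a
minimal sketch in which callers depend on a small store interface so that an
in-memory implementation can stand in for a SQL-backed one during tests. The
`FavoriteStore` interface here is invented for the sketch; the real
interfaces are the ones named above.

```go
package main

import "fmt"

// FavoriteStore is an invented, minimal interface for this sketch.
type FavoriteStore interface {
	Add(name, url string) error
	List() []string
}

// memStore is the kind of in-memory implementation used in tests.
type memStore struct{ favorites map[string]string }

func newMemStore() *memStore { return &memStore{favorites: map[string]string{}} }

func (m *memStore) Add(name, url string) error {
	m.favorites[name] = url
	return nil
}

func (m *memStore) List() []string {
	out := []string{}
	for name := range m.favorites {
		out = append(out, name)
	}
	return out
}

// addDefaults only sees the interface; a production build would pass a
// SQL-backed implementation instead (see /go/builders).
func addDefaults(s FavoriteStore) {
	_ = s.Add("Main dashboard", "/e/")
}

func main() {
	var s FavoriteStore = newMemStore()
	addDefaults(s)
	fmt.Println(s.List())
}
```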
### Core Workflows (Conceptual High-Level):
1. **Data Ingestion:**
```
External Data Source (e.g., GCS event)
|
V
/go/file (Source Interface: DirSource, GCSSource) --> Raw File Data
|
V
/go/ingest/process (Orchestrator)
|
+--> /go/ingest/parser (Parses file based on /go/ingest/format) --> Extracted Traces & Metadata
|
+--> /go/git (Resolves Git hash to CommitNumber)
|
V
/go/tracestore (Writes traces, updates inverted index & ParamSets)
|
V
/go/ingestevents (Publishes event: "File Ingested")
```
2. **Regression Detection (Event-Driven Example):**
```
/go/ingestevents (Receives "File Ingested" event)
|
V
/go/regression/continuous (Controller)
|
+--> /go/alerts (Loads matching Alert configurations)
|
+--> /go/dfiter & /go/dataframe & /go/dfbuilder (Prepare DataFrames for analysis)
|
V
/go/regression/detector (Core detection logic)
|
+--> /go/clustering2 (KMeans clustering)
|
+--> /go/stepfit (Individual trace step detection)
|
V
Detected Regressions
|
+--> /go/regression (Store results using Store interface, e.g., sqlregression2store)
|
+--> /go/notify (Format & send notifications via Email, IssueTracker, Chromeperf)
|
+--> /go/workflows (MaybeTriggerBisectionWorkflow for potential bisection)
```
3. **User Interaction (Frontend Request for Graph):**
```
User in Browser (Requests graph)
|
V
/go/frontend (HTTP Handlers, e.g., graphApi)
|
+--> /go/ui/frame (ProcessFrameRequest)
| |
| +--> /go/dataframe/dfbuilder (Builds DataFrame based on query)
| | |
| | +--> /go/tracestore (Fetch trace data)
| | +--> /go/git (Fetch commit data)
| |
| +--> /go/calc (If formulas are used)
| |
| +--> /go/pivot (If pivot table requested)
| |
| +--> /go/anomalies (Fetch anomaly data to overlay)
|
V
FrameResponse (JSON data for UI) --> User in Browser
```
4. **Automated Bisection via Temporal Workflow:**
   - `/go/workflows.MaybeTriggerBisectionWorkflow` is triggered by a
     significant regression and waits for related anomalies to group.
   - `/go/anomalygroup` loads the anomaly group details.
   - If `GroupAction == BISECT`:
     - `/go/gerrit` (Activity): get commit hashes from commit positions.
     - Executes Pinpoint's `CulpritFinderWorkflow` as a child workflow;
       Pinpoint performs the bisection.
     - Pinpoint calls back to `/go/workflows.ProcessCulpritWorkflow`, which
       uses `/go/culprit` (Activity) to persist the culprit and notify users.
   - If `GroupAction == REPORT`:
     - `/go/culprit` (Activity): notify users about the anomaly group.
### Sub-Module Summaries (Illustrative, not exhaustive):
- **/go/alertfilter**: Constants for alert filtering modes (e.g., `ALL`,
`OWNER`). Ensures consistent filter definitions.
- **/go/alerts**: Manages `Alert` configurations, their storage
(`sqlalertstore`), and efficient retrieval (`ConfigProvider` with caching).
Defines how performance regressions are detected.
- **/go/anomalies**: Retrieves anomaly data, often by proxying to Chrome Perf,
with a caching layer to improve performance.
- **/go/anomalygroup**: Groups related anomalies to consolidate actions like
bug filing or bisection. Uses a gRPC service and SQL store.
- **/go/backend**: A gRPC backend service for internal, non-UI-facing APIs,
promoting stable interfaces.
- **/go/builders**: Centralized factory for creating core components (data
stores, Git client) based on instance configuration, preventing cyclic
dependencies.
- **/go/bug**: Generates URLs for reporting bugs to issue trackers using
configurable URI templates.
- **/go/calc**: Evaluates formulas on trace data (not covered in detail in
  the provided docs).
- **/go/chromeperf**: Client for interacting with the Chrome Performance
Dashboard API (reporting regressions, fetching anomalies).
- **/go/clustering2**: Implements k-means clustering for grouping similar
performance traces.
- **/go/config**: Defines and validates the `InstanceConfig` structure (loaded
from JSON) that governs a Perf instance.
- **/go/ctrace2**: Adapts trace data (normalization, handling missing points)
for use with k-means clustering.
- **/go/culprit**: Manages identified culprits (commits causing regressions),
their storage, and notification.
- **/go/dataframe**: Provides the `DataFrame` structure for handling
performance data in a tabular, commit-centric way, inspired by R's
dataframes.
- **/go/dfbuilder**: Constructs `DataFrame` objects from `TraceStore`,
handling query logic and data aggregation.
- **/go/dfiter**: Iterates over `DataFrame`s, typically by slicing a larger
fetched frame. Used in regression detection.
- **/go/dryrun**: Allows testing alert configurations without creating actual
alerts, simulating regression detection.
- **/go/favorites**: Manages user-saved favorite configurations/views, stored
in SQL.
- **/go/file**: Defines `File` and `Source` interfaces for abstracting file
access from different origins (local, GCS via Pub/Sub).
- **/go/filestore**: Implements `fs.FS` for local and GCS file access,
providing a unified way to read files.
- **/go/frontend**: Backend for the Perf web UI, handling HTTP requests,
rendering templates, and interacting with data stores.
- **/go/git**: Abstraction for Git repository interaction, caching commit data
in SQL, with providers for local CLI and Gitiles.
- **/go/graphsshortcut**: Manages shortcuts for collections of graph
configurations, using hashed IDs for de-duplication.
- **/go/ingest**: Orchestrates the data ingestion pipeline: reading files,
parsing formats, and writing to `TraceStore`.
- **/go/ingestevents**: Defines and handles serialization/deserialization of
ingestion completion events for PubSub.
- **/go/initdemo**: CLI tool to initialize a database (CockroachDB or Spanner
emulator) with the Perf schema.
- **/go/issuetracker**: Provides an interface and implementation for
interacting with Google Issue Tracker (Buganizer).
- **/go/kmeans**: Generic k-means clustering algorithm implementation using
interfaces for flexibility.
- **/go/maintenance**: Runs background tasks like Git repo sync, regression
migration, query cache refresh, and old data deletion.
- **/go/notify**: Framework for formatting and sending notifications (email,
issue tracker) about regressions.
- **/go/notifytypes**: Defines constants for different notification mechanisms
and data providers.
- **/go/perf-tool**: CLI for Perf administration (config management, data
inspection, database maintenance).
- **/go/perfclient**: Client for pushing performance data (typically from
trybots) to Perf's GCS ingestion endpoint.
- **/go/perfresults**: Fetches, parses, and processes performance results from
Telemetry benchmarks (Chromium).
- **/go/perfserver**: Main executable for Perf, consolidating frontend,
ingestion, clustering, and maintenance services.
- **/go/pinpoint**: Client for the Pinpoint (Chromeperf) bisection service.
- **/go/pivot**: Aggregates and summarizes trace data within a `DataFrame`
based on specified grouping criteria (like pivot tables).
- **/go/progress**: Tracks the progress of long-running backend tasks and
exposes it to the UI via HTTP polling.
- **/go/psrefresh**: Manages and caches `paramtools.ParamSet` instances (used
for UI query builders) to improve performance.
- **/go/redis**: Manages interaction with Redis for caching, primarily to
support the query UI.
- **/go/regression**: Core module for detecting, storing, and managing
performance regressions.
- **/go/samplestats**: Performs statistical analysis on sets of performance
data to identify significant changes between "before" and "after" states.
- **/go/sheriffconfig**: Manages Sheriff Configurations (alerting rules
defined in Protobuf, stored in LUCI Config), importing them into Perf.
- **/go/shortcut**: Manages shortcuts for lists of trace keys, using hashed
IDs.
- **/go/sql**: Central module for SQL database schema management (definition,
generation, validation, migration).
- **/go/stepfit**: Analyzes time-series data to detect significant changes
("steps") using various statistical algorithms.
- **/go/subscription**: Manages alerting subscriptions, defining how to react
to anomalies (e.g., bug filing details).
- **/go/tracecache**: Caches trace identifiers for specific tiles and queries
to improve performance.
- **/go/tracefilter**: Filters trace data based on hierarchical paths,
identifying "leaf" traces.
- **/go/tracesetbuilder**: Efficiently constructs `TraceSet` and
`ReadOnlyParamSet` objects from multiple, potentially disparate chunks of
trace data using a worker pool.
- **/go/tracestore**: Defines interfaces and SQL implementations for storing
and retrieving performance trace data, using a tiled storage approach.
- **/go/tracing**: Initializes and configures distributed tracing capabilities
using OpenCensus.
- **/go/trybot**: Manages performance data from trybots (pre-submit tests),
including ingestion, storage, and analysis.
- **/go/ts**: Utility to generate TypeScript definition files from Go structs
for frontend type safety.
- **/go/types**: Defines core data types used throughout Perf (e.g.,
`CommitNumber`, `TileNumber`, `Trace`).
- **/go/ui**: Handles frontend requests and prepares data for display,
bridging UI interactions with backend data sources.
- **/go/urlprovider**: Generates URLs for various pages within the Perf
application consistently.
- **/go/userissue**: Manages associations between specific data points (trace
key + commit position) and Buganizer issues.
- **/go/workflows**: Defines and implements Temporal workflows for automating
tasks like bisection triggering and culprit processing.
This comprehensive suite of modules works together to provide the Skia Perf
performance monitoring system.
# Module: /go/alertfilter
This module, `go/alertfilter`, provides constants that define different
filtering modes for alerts. These constants are used throughout the Perf
application to control which alerts are displayed or processed.
The primary motivation behind this module is to centralize the definition of
alert filtering options. By having these constants in a dedicated module, we
avoid scattering magic strings like "ALL" or "OWNER" throughout the codebase.
This improves maintainability, reduces the risk of typos, and makes it easier to
understand and modify the filtering logic. If new filtering modes are needed in
the future, they can be added here, providing a single source of truth.
**Key Components/Files:**
- **`alertfilter.go`**: This is the sole file in this module. It defines the
string constants used for alert filtering.
- **`ALL`**: This constant represents a filter that includes all alerts,
irrespective of their owner or other properties. It is used when a user
or a system process needs to view or operate on the entire set of active
alerts.
- **`OWNER`**: This constant represents a filter that includes only alerts
assigned to a specific owner. This is crucial for user-specific views
where individuals only want to see alerts relevant to their
responsibilities.
**Workflow/Usage Example:**
Imagine a user interface for viewing alerts. The user might have a dropdown to
select how they want to filter the alerts.
```
User Interface:
[Alert List]
Filter: [Dropdown: "ALL", "OWNER"]
Backend Logic:
func GetAlerts(filterMode string, userID string) []Alert {
if filterMode == alertfilter.ALL {
// Fetch all alerts from the database.
return database.GetAllAlerts()
} else if filterMode == alertfilter.OWNER {
// Fetch alerts owned by the current user.
return database.GetAlertsByOwner(userID)
}
// ... other filter modes or error handling.
return nil
}
```
In this scenario, the backend uses the constants from the `alertfilter` module
to determine the correct query to execute against the database. This ensures
consistency and clarity in how filtering is applied.
# Module: /go/alerts
The `/go/alerts` module is responsible for managing alert configurations within
the Perf application. These configurations define the conditions under which
users or systems should be notified about performance regressions. The module
handles the definition, storage, retrieval, and caching of these alert
configurations.
A core design principle is the separation of concerns between defining an
alert's structure (`config.go`), providing access to these configurations
(`configprovider.go`), and persisting them (`store.go` and its SQL
implementation in `sqlalertstore`). This modularity allows for flexibility in
how alerts are stored (e.g., potentially different database backends) and
accessed.
**Key Components and Responsibilities:**
- **`config.go`**: This file defines the `Alert` struct, which is the central
data structure representing a single alert configuration.
- **Why**: It encapsulates all the parameters necessary to define an
alert, such as the query to select relevant performance traces, the
notification destination (email or issue tracker), thresholds for
triggering, clustering algorithms, and the desired action (e.g., report,
bisect).
- **How**: The `Alert` struct includes fields for:
- `IDAsString`: A string representation of the alert's unique identifier.
This is used for JSON serialization to avoid potential issues with large
integer handling in JavaScript. The `BadAlertID` and
`BadAlertIDAsAsString` constants represent an invalid/uninitialized ID.
- `Query`: A URL-encoded string that defines the criteria for selecting
traces from the performance data.
- `GroupBy`: A comma-separated list of parameter keys. If specified, the
`Query` is expanded into multiple sub-queries, one for each unique
combination of values for the `GroupBy` keys found in the data. This
allows for more granular alerting. The `GroupCombinations` and
`QueriesFromParamset` methods handle this expansion.
- `Alert`: The email address for notifications.
- `IssueTrackerComponent`: The ID of the issue tracker component to file
bugs against. A custom `SerializesToString` type is used for this field
to handle JSON serialization of the int64 component ID as a string, with
`0` serializing to `""`.
- `DirectionAsString`: Specifies whether to alert on upward (`UP`),
downward (`DOWN`), or both (`BOTH`) changes in performance. This
replaces the deprecated `StepUpOnly` boolean.
- `StateAsString`: Indicates if the alert is `ACTIVE` or `DELETED`. This
is managed internally and affects whether an alert is processed.
- `Action`: Defines what action to take when an anomaly is detected (e.g.,
`types.AlertActionReport`, `types.AlertActionBisect`).
- Other fields like `Interesting`, `Algo`, `Step`, `Radius`, `K`,
`Sparse`, `MinimumNum`, `Category` control the specifics of regression
detection and reporting.
- The file also defines enums like `Direction` and `ConfigState` and
helper functions for ID conversion and validation (`Validate`). The
`Validate` function ensures consistency, for example, that `GroupBy`
keys do not also appear in the main `Query`.
- **`store.go`**: This file defines the `Store` interface, which abstracts the
persistence mechanism for `Alert` configurations.
- **Why**: Decoupling the alert logic from the specific storage
implementation (e.g., SQL, Datastore) makes the system more adaptable
and testable.
- **How**: The `Store` interface specifies methods for:
- `Save`: Saving a new or updating an existing alert. It takes a
`SaveRequest` which includes the `Alert` configuration and an optional
`SubKey` (linking the alert to a subscription).
- `ReplaceAll`: Atomically replacing all existing alerts with a new set.
This is useful for bulk updates, often tied to configuration
subscriptions. It requires a `pgx.Tx` to ensure transactional integrity.
- `Delete`: Marking an alert as deleted.
- `List`: Retrieving alerts, with an option to include deleted ones.
Alerts are typically sorted by `DisplayName`.
- `ListForSubscription`: Retrieving all active alerts associated with a
specific subscription name.
- **`configprovider.go`**: This file implements a `ConfigProvider` that serves
`Alert` configurations, incorporating a caching layer.
- **Why**: To provide efficient and responsive access to alert
configurations, especially in a high-traffic system. Repeatedly fetching
from the underlying `Store` for every request would be inefficient.
- **How**:
- `configProviderImpl` implements the `ConfigProvider` interface.
- It maintains two internal caches (`cache_active` for active alerts and
`cache_all` for all alerts including deleted ones) using the
`configCache` struct.
- Upon initialization (`NewConfigProvider`), it performs an initial
refresh and starts a background goroutine that periodically calls
`Refresh` to update the caches from the `Store`.
- `GetAllAlertConfigs` and `GetAlertConfig` serve data from these caches.
- A `sync.RWMutex` is used to protect concurrent access to the caches.
- The `Refresh` method explicitly fetches data from the `alertStore` and
updates both caches.
- The refresh interval is configurable.
- **Submodule `sqlalertstore`**: This submodule provides a SQL-based
implementation of the `alerts.Store` interface.
- **`sqlalertstore.go`**:
- **Why**: To persist alert configurations in a relational database
(specifically CockroachDB, with Spanner compatibility).
- **How**: The `SQLAlertStore` struct holds a database connection pool
(`pool.Pool`) and a map of SQL statements.
- Alerts are stored as JSON strings in the `Alerts` table (schema
defined in `sqlalertstore/schema/schema.go`). This simplifies schema
evolution of the `Alert` struct itself, as changes to the struct
don't always require immediate SQL schema migrations, though it
makes querying based on specific alert fields harder directly in
SQL.
- `Save`: For new alerts (ID is `BadAlertIDAsAsString`), it performs
an `INSERT` and retrieves the generated ID. For existing alerts, it
performs an `UPSERT` (or an `INSERT ... ON CONFLICT DO UPDATE` for
Spanner).
- `Delete`: Marks an alert as deleted by setting its `config_state` to
`1` (representing `alerts.DELETED`) and updates `last_modified`. It
doesn't physically remove the row.
- `ReplaceAll`: Within a transaction, it first marks all existing
active alerts as deleted, then inserts the new set of alerts.
- `List` and `ListForSubscription`: Query the `Alerts` table,
deserialize the JSON `alert` column into `alerts.Alert` structs, and
sort them by `DisplayName`.
- **`spanner.go`**: Contains Spanner-specific SQL statements. This is
necessary because CockroachDB and Spanner have slightly different SQL
syntax for certain operations like UPSERTs and RETURNING clauses. The
correct set of statements is chosen in `sqlalertstore.New` based on the
`dbType`.
- **`sqlalertstore/schema/schema.go`**: Defines the Go struct
`AlertSchema` representing the `Alerts` table in the SQL database. Key
fields include `id`, `alert` (TEXT, storing the JSON serialized
`alerts.Alert`), `config_state` (INT), `last_modified` (INT, Unix
timestamp), `sub_name`, and `sub_revision`.
**Key Workflows:**
1. **Creating/Updating an Alert:**
- User/System constructs an `alerts.Alert` struct.
- `alerts.Store.Save()` is called.
- If SQL-backed:
- `sqlalertstore.Save()` serializes the `Alert` to JSON.
- If `IDAsString` is `BadAlertIDAsAsString`, an `INSERT` statement is
executed, and the new ID is populated back into the `Alert` struct.
- Otherwise, an `UPSERT` or `INSERT ... ON CONFLICT DO UPDATE`
statement is executed.
- The `ConfigProvider`'s cache will eventually be updated during its next
refresh cycle.
```
[Client/Service] -- Alert Data --> [alerts.Store.Save()]
|
v
[sqlalertstore.Save()] -- Serializes Alert to JSON --> [Database]
| (If new, DB returns ID)
<---------------------------------------
| (Updates Alert struct with ID)
v
[ConfigProvider.Refresh() periodically] --> [alerts.Store.List()]
|
v
[sqlalertstore.List()] --> [Database]
| (Reads & deserializes)
v
[ConfigProvider Cache Update]
```
2. **Retrieving All Active Alerts:**
- A service requests alert configurations via
`alerts.ConfigProvider.GetAllAlertConfigs(ctx, false)`.
- `configProviderImpl.GetAllAlertConfigs()` checks its `cache_active`.
- If the cache is up-to-date (within refresh interval), it returns the
cached `[]*Alert`.
- If the cache needs refresh (or it's the first call), the background
refresher (or an explicit `Refresh` call) would have populated it by:
- Calling `alerts.Store.List(ctx, false)`.
- Which in turn calls `sqlalertstore.List(ctx, false)`.
- `sqlalertstore` queries the database for alerts where
`config_state = 0` (ACTIVE), deserializes them, and returns the
list.
```
[Service] -- Request All Active Alerts --> [ConfigProvider.GetAllAlertConfigs(includeDeleted=false)]
| (Checks cache_active)
|
+-- [Cache Hit] ----> Returns cached []*Alert
|
+-- [Cache Miss/Stale (via periodic Refresh)]
|
v
[alerts.Store.List(includeDeleted=false)]
|
v
[sqlalertstore.List(includeDeleted=false)] -- SQL Query (WHERE config_state=0) --> [Database]
| (Reads & deserializes)
v
[Updates & Returns from Cache]
```
3. **Expanding `GroupBy` Queries:**
- When an alert with a `GroupBy` clause is processed (e.g., by the
regression detection system), `Alert.QueriesFromParamset(paramset)` is
called.
- `Alert.GroupCombinations(paramset)` is invoked to find all unique
combinations of values for the keys specified in `GroupBy` from the
provided `paramtools.ReadOnlyParamSet`.
- For each combination, a new query string is generated by taking the
original `Alert.Query` and appending the key-value pairs from the
combination.
- This results in a list of specific queries to be executed against the
trace data.
```
[Alert Processing System] -- Has Alert with GroupBy="config,arch", Query="metric=latency" & ParamSet --> [Alert.QueriesFromParamset()]
|
v
[Alert.GroupCombinations()]
| (e.g., finds {config:A, arch:X}, {config:B, arch:X})
v
[Generates specific queries:]
- "metric=latency&config=A&arch=X"
- "metric=latency&config=B&arch=X"
|
<-- Returns []string (list of queries)
```
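To make this expansion concrete, the sketch below re-implements the idea in plain Go. It is illustrative only: `expandGroupBy` is a hypothetical helper, not the actual `Alert.GroupCombinations`/`QueriesFromParamset` code, and it ignores details such as error handling and query validation.

```go
package main

import (
	"fmt"
	"net/url"
)

// expandGroupBy appends every combination of values for the GroupBy keys
// (taken from the paramset) to the base query. Hypothetical helper; the real
// logic lives in Alert.GroupCombinations / Alert.QueriesFromParamset.
func expandGroupBy(baseQuery string, groupBy []string, paramset map[string][]string) []string {
	combos := []url.Values{{}}
	for _, key := range groupBy {
		var next []url.Values
		for _, combo := range combos {
			for _, value := range paramset[key] {
				c := url.Values{}
				for k, v := range combo {
					c[k] = append([]string{}, v...)
				}
				c.Set(key, value)
				next = append(next, c)
			}
		}
		combos = next
	}
	queries := make([]string, 0, len(combos))
	for _, combo := range combos {
		queries = append(queries, baseQuery+"&"+combo.Encode())
	}
	return queries
}

func main() {
	paramset := map[string][]string{
		"config": {"A", "B"},
		"arch":   {"X"},
	}
	for _, q := range expandGroupBy("metric=latency", []string{"config", "arch"}, paramset) {
		fmt.Println(q)
	}
	// Output:
	// metric=latency&arch=X&config=A
	// metric=latency&arch=X&config=B
}
```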
The use of `SerializesToString` for `IssueTrackerComponent` highlights a common
challenge when interfacing Go backend systems with JavaScript frontends:
JavaScript's limitations with handling large integer IDs. Serializing them as
strings is a robust workaround.
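As a rough illustration of that workaround, the snippet below defines a hypothetical `componentID` type that marshals an int64 as a JSON string, with `0` serializing to `""` as described above. It is a stand-in for the idea, not the actual `SerializesToString` implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// componentID is a hypothetical stand-in for SerializesToString: an int64
// that marshals to a JSON string, with 0 becoming "".
type componentID int64

func (c componentID) MarshalJSON() ([]byte, error) {
	if c == 0 {
		return []byte(`""`), nil
	}
	return []byte(strconv.Quote(strconv.FormatInt(int64(c), 10))), nil
}

type alertJSON struct {
	IssueTrackerComponent componentID `json:"issue_tracker_component"`
}

func main() {
	withID, _ := json.Marshal(alertJSON{IssueTrackerComponent: 1325852})
	fmt.Println(string(withID)) // {"issue_tracker_component":"1325852"}

	zero, _ := json.Marshal(alertJSON{})
	fmt.Println(string(zero)) // {"issue_tracker_component":""}
}
```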
The existence of a `mock` subdirectory with generated mocks for `Store` and
`ConfigProvider` (using `stretchr/testify/mock`) is standard Go practice,
facilitating unit testing of components that depend on these interfaces without
needing a real database or complex setup.
# Module: /go/anomalies
The `/go/anomalies` module is responsible for retrieving anomaly data. Anomalies
represent significant deviations in performance metrics. This module acts as an
intermediary between the application and the `chromeperf` service, which is the
source of truth for anomaly data. It provides an abstraction layer, potentially
including caching, to optimize anomaly retrieval.
### Key Components and Responsibilities
**1. `anomalies.go`:**
- **Purpose:** Defines the `Store` interface. This interface dictates the
contract for any component that aims to provide anomaly data. It ensures
that different implementations (e.g., a cached store or a direct passthrough
store) can be used interchangeably.
- **Why:** Separating the interface from the implementation promotes loose
coupling and testability. It allows for different strategies for fetching
anomalies without changing the consuming code.
- **Key Methods:**
- `GetAnomalies`: Retrieves anomalies for a list of trace names within a
specific commit position range. This is useful for analyzing performance
regressions or improvements tied to code changes.
- `GetAnomaliesInTimeRange`: Fetches anomalies within a given time window.
This is helpful for time-based analysis, independent of specific commit
versions.
- `GetAnomaliesAroundRevision`: Finds anomalies that occurred near a
particular revision (commit). This helps pinpoint performance changes
related to a specific code submission.
**2. `impl.go`:**
- **Purpose:** Provides a basic, non-caching implementation of the `Store`
interface. It directly forwards requests to the
`chromeperf.AnomalyApiClient`.
- **Why:** This serves as a foundational implementation. It's simple and
directly reflects the capabilities of the underlying `chromeperf` service.
It can be used when caching is not desired or not yet implemented.
- **How:** Each method in the `store` struct (the implementation of `Store`)
makes a corresponding call to the `ChromePerf` client. For example,
`GetAnomalies` calls `ChromePerf.GetAnomalies`. Error handling is included
to log failures from the `chromeperf` service. Trace names are sorted before
being passed to `chromeperf`, which may be a requirement of the `chromeperf`
API or an optimization for it.
**3. `/go/anomalies/cache/cache.go`:**
- **Purpose:** Implements a caching layer for the `Store` interface. This is
designed to improve performance by reducing the number of direct calls to
the `chromeperf` service, which can be network-intensive.
- **Why:** Repeatedly fetching the same anomaly data can be inefficient. A
cache stores frequently accessed or recent anomalies locally, leading to
faster response times and reduced load on the `chromeperf` service.
- **How:**
- **LRU Cache:** Uses two Least Recently Used (LRU) caches: `testsCache`
for anomalies queried by trace names and commit ranges, and
`revisionCache` for anomalies queried around a specific revision. LRU
ensures that the least accessed items are evicted when the cache reaches
its `cacheSize` limit.
- **Cache Invalidation:**
- **TTL (Time-To-Live):** Cache entries have a `cacheItemTTL`. A periodic
`cleanupCache` goroutine removes entries older than this TTL. This
ensures that stale data doesn't persist indefinitely.
- **`invalidationMap`:** This map tracks trace names for which anomalies
have been modified (e.g., an alert was updated). If a trace name is in
this map, any cached anomalies for that trace are considered invalid and
will be re-fetched from `chromeperf`.
- The `invalidationMap` itself is cleared periodically
(`invalidationCleanupPeriod`) to prevent it from growing too large.
This is a trade-off: it's simpler and has lower memory overhead but
can lead to inaccuracies if a trace is invalidated and then the map
is cleared before the next fetch for that trace.
- **Metrics:** Tracks the `numEntriesInCache` to monitor cache
utilization.
- **Key Methods (`store` struct in `cache.go`):**
- `GetAnomalies`:
* Attempts to retrieve anomalies from `testsCache`.
* Checks the `invalidationMap`. If a trace is marked invalid, it's treated
as a cache miss.
* For any cache misses or invalidated traces, it fetches the data from
`as.ChromePerf.GetAnomalies`.
* Populates the `testsCache` with newly fetched data:
```
Client Request (traceNames, startCommit, endCommit)
    |
    v
[Cache Store] -- GetAnomalies()
    |
+---------------------------------+
| For each traceName:             | ----> Cache Hit? -----> Add to Result
|  1. Check testsCache            |          |
|     (Key: trace:start:end)      |          No (Cache Miss or Invalidated)
|  2. Check invalidationMap       |          |
+---------------------------------+          |
    | (traceNamesMissingFromCache)           |
    v                                        |
[ChromePerf Client] -- GetAnomalies() <------+
    |
    v
[Cache Store] -- Add new data to testsCache
    |
    v
Return Combined Result
```
- `GetAnomaliesInTimeRange`: This method currently bypasses the cache and
directly calls `as.ChromePerf.GetAnomaliesTimeBased`. The decision to
not cache time-based queries might be due to the potentially large and
less frequently reused nature of such requests, or it might be a feature
planned for later.
- `GetAnomaliesAroundRevision`: Similar to `GetAnomalies`, it first checks
`revisionCache`. If it's a miss, it fetches from
`as.ChromePerf.GetAnomaliesAroundRevision` and updates the cache.
- `InvalidateTestsCacheForTraceName`: Adds a `traceName` to the
`invalidationMap`. This is likely called when an external event (e.g.,
user updating an anomaly in Chrome Perf) indicates that the cached data
for this trace is no longer accurate.
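The sketch below distills the keying (`trace:start:end`), TTL, and invalidation-map ideas described above into a few lines of Go. It is not the real `/go/anomalies/cache` code: the actual implementation uses LRU caches, a background cleanup goroutine, and the `chromeperf` anomaly types, all of which are elided here.

```go
package anomalycache

import (
	"fmt"
	"sync"
	"time"
)

// entry and cache are simplified stand-ins for the structures in cache.go.
type entry struct {
	anomalies interface{} // stand-in for the chromeperf anomaly payload
	added     time.Time
}

type cache struct {
	mutex        sync.Mutex
	ttl          time.Duration
	entries      map[string]entry
	invalidation map[string]bool // traceName -> must be re-fetched
}

func newCache(ttl time.Duration) *cache {
	return &cache{
		ttl:          ttl,
		entries:      map[string]entry{},
		invalidation: map[string]bool{},
	}
}

func key(traceName string, startCommit, endCommit int) string {
	return fmt.Sprintf("%s:%d:%d", traceName, startCommit, endCommit)
}

// get returns cached anomalies, or false if the entry is missing, expired,
// or the trace has been invalidated (forcing a re-fetch from chromeperf).
func (c *cache) get(traceName string, startCommit, endCommit int) (interface{}, bool) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	if c.invalidation[traceName] {
		return nil, false
	}
	e, ok := c.entries[key(traceName, startCommit, endCommit)]
	if !ok || time.Since(e.added) > c.ttl {
		return nil, false
	}
	return e.anomalies, true
}

// add stores freshly fetched anomalies. (In the real cache the invalidation
// map is only cleared periodically, not on insert.)
func (c *cache) add(traceName string, startCommit, endCommit int, anomalies interface{}) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.entries[key(traceName, startCommit, endCommit)] = entry{anomalies: anomalies, added: time.Now()}
}

// invalidate marks a trace so its cached anomalies are ignored until re-fetched.
func (c *cache) invalidate(traceName string) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.invalidation[traceName] = true
}
```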
**4. `/go/anomalies/mock/Store.go`:**
- **Purpose:** Provides a mock implementation of the `Store` interface,
generated using the `testify/mock` library.
- **Why:** Essential for unit testing. It allows other components that depend
on the `anomalies.Store` to be tested in isolation, without needing a real
`chromeperf` instance or a fully functional cache. Developers can define
expected calls and return values for the mock store.
- **How:** It's an auto-generated file. The `mock.Mock` struct from
`stretchr/testify` is embedded, providing methods like `On()`, `Return()`,
and `AssertExpectations()` to control and verify the mock's behavior during
tests.
### Design Decisions and Rationale
- **Interface-based Design (`anomalies.Store`):** This is a common and robust
pattern in Go. It allows for flexibility in how anomalies are fetched and
managed. For example, a new caching strategy or a different backend data
source could be implemented without affecting code that consumes anomalies,
as long as the new implementation adheres to the `Store` interface.
- **Caching Strategy (`cache.go`):**
- **LRU:** A good general-purpose caching algorithm when memory is limited
and recent/frequently accessed items are more likely to be requested
again.
- **TTL for Cache Items:** Prevents indefinitely storing stale data.
- **`invalidationMap`:** A pragmatic approach to handling external data
modifications. While not perfectly accurate (invalidates all anomalies
for a trace even if only one changed, and susceptible to the
`invalidationCleanupPeriod` timing), it's simpler and less
memory-intensive than more granular invalidation schemes. This suggests
a balance was struck between accuracy, complexity, and resource usage.
- **Separate Caches (`testsCache`, `revisionCache`):** Likely done because
the query patterns and cache keys for these two types of requests are
different. `testsCache` uses a composite key
(`traceName:startCommit:endCommit`), while `revisionCache` uses the
`revision` number as the key.
- **Error Handling:** The implementations generally log errors from
`chromeperf` but often return an empty `AnomalyMap` or `nil` slice to the
caller in case of an error from the underlying service. This design choice
means that callers might receive no data instead of an error, simplifying
the caller's error handling logic but potentially obscuring issues if not
monitored through logs.
- **Sorting Trace Names:** Before calling `chromeperf.GetAnomalies` or
`chromeperf.GetAnomaliesTimeBased`, the list of `traceNames` is sorted. This
could be a requirement of the `chromeperf` API for deterministic behavior,
or an optimization to improve `chromeperf`'s internal processing or caching.
- **Tracing (`go.opencensus.io/trace`):** Spans are added to some methods
(`GetAnomaliesInTimeRange`, `GetAnomaliesAroundRevision`). This is crucial
for observability, allowing developers to track the performance and flow of
requests through the system, especially in a distributed environment.
### Workflows
**Typical Anomaly Retrieval (with Cache):**
1. A service needs anomalies (e.g., for displaying on a dashboard).
2. It calls one of the `GetAnomalies*` methods on an `anomalies.Store` instance
(which is likely the cached `store` from `cache.go`).
3. **Cache Check:**
- The cached `store` first checks its internal LRU cache(s) (`testsCache`
or `revisionCache`) for the requested data.
- For `GetAnomalies`, it also consults the `invalidationMap` to see if any
relevant traces have been marked as stale.
4. **Cache Hit:** If valid data is found in the cache, it's returned directly.
```
Caller -> anomalies.Store.GetAnomalies(traces, range)
    |
    v
Cache.GetAnomalies()
    |
    +--> Check testsCache (e.g., trace1:100:200) -> Found & Valid
    |
    +--> Check testsCache (e.g., trace2:100:200) -> Not Found or Invalid
    |
Return cached data for trace1
```
5. **Cache Miss / Stale Data:** If data is not in the cache or is marked stale:
   - The cached `store` makes a network request to the
     `chromeperf.AnomalyApiClient`.
   - The response from `chromeperf` is received.
   - This new data is added to the LRU cache for future requests.
   - The data is returned to the caller.
```
Caller -> anomalies.Store.GetAnomalies(traces, range)
    |
    v
Cache.GetAnomalies()
    |
    +--> Check testsCache (e.g., trace1:100:200) -> Found & Valid (Data for trace1)
    |
    +--> Check testsCache (e.g., trace2:100:200) -> Not Found or Invalid
    |        |
    |        v
    |    [ChromePerf API] -- GetAnomalies(trace2, range)
    |        |
    |        v
    |    Cache.Add(trace2_data)
    |        |
    v        v
Combine trace1_data & trace2_data
    |
    v
Return to Caller
```
**Cache Invalidation Workflow:**
1. An external event occurs (e.g., a user triages an anomaly in the Chrome Perf
UI, which modifies its state).
2. A mechanism (not detailed within this module, but implied) detects this
change.
3. This mechanism calls `cache.store.InvalidateTestsCacheForTraceName(ctx,
"affected_trace_name")`.
4. The `affected_trace_name` is added to the `invalidationMap` in the
`cache.store`.
5. **Next `GetAnomalies` call for `affected_trace_name`:**
- Even if `testsCache` contains an entry for this trace and range, the
presence of `affected_trace_name` in `invalidationMap` will cause a
cache miss.
- Data will be re-fetched from `chromeperf`.
- The `invalidationMap` entry for `affected_trace_name` typically remains
until the `invalidationMap` is periodically cleared.
This module effectively decouples the rest of the Perf application from the
direct complexities of interacting with `chromeperf` for anomaly data, offering
performance benefits through caching and a consistent interface for data
retrieval.
# Module: /go/anomalygroup
The `anomalygroup` module is designed to group related anomalies (regressions in
performance metrics) together. This grouping allows for consolidated actions
like filing a single bug report for multiple related regressions or triggering a
single bisection job to find the common culprit for a set of anomalies. This
approach aims to reduce noise and improve the efficiency of triaging performance
regressions.
The core idea is to identify anomalies that share common characteristics, such
as the subscription (alert configuration), benchmark, and commit range. When a
new anomaly is detected, the system attempts to find an existing group that
matches these criteria. If a suitable group is found, the new anomaly is added
to it. Otherwise, a new group is created.
The module defines a gRPC service for managing anomaly groups, a storage
interface for persisting group data, and utilities for processing regressions
and interacting with the grouping logic.
### Key Components and Responsibilities
#### `store.go`: Anomaly Group Storage Interface
The `store.go` file defines the `Store` interface, which outlines the contract
for persisting and retrieving anomaly group data. This abstraction allows for
different storage backends (e.g., SQL databases) to be used.
**Key Responsibilities:**
- **Creating new anomaly groups:** When a new anomaly doesn't fit into an
existing group, a new group record needs to be created. This involves
storing metadata about the group, such as the subscription details,
benchmark, initial commit range, and the action to be taken (e.g., REPORT,
BISECT).
- **Loading anomaly groups:** Retrieving group information by its unique ID is
essential for processing and taking actions on the group.
- **Finding existing groups:** This is a crucial part of the grouping logic.
When a new anomaly is detected, the store is queried to find existing groups
that match criteria like subscription, revision, domain (master), benchmark,
commit range, and action type.
- **Updating anomaly groups:** Groups are dynamic. As new anomalies are added,
or as actions are taken (e.g., bisection started, bug filed), the group
record needs to be updated. This includes:
- Adding new anomaly IDs to the group.
- Adding culprit commit IDs once a bisection identifies them.
- Storing the ID of a bisection job associated with the group.
- Storing the ID of a reported issue (bug) associated with the group.
The `Store` interface ensures that the core logic for anomaly grouping is
decoupled from the specific implementation of data persistence.
#### `sqlanomalygroupstore/sqlanomalygroupstore.go`: SQL-backed Anomaly Group Store
This file provides a concrete implementation of the `Store` interface using a
SQL database (specifically designed with CockroachDB and Spanner in mind).
**Implementation Details:**
- **Schema:** The SQL schema for anomaly groups is defined in
`sqlanomalygroupstore/schema/schema.go`. It includes fields for the group
ID, creation time, list of anomaly IDs, metadata (stored as JSONB), common
commit range, action type, and associated IDs for bisections, issues, and
culprits.
- **Database Operations:**
- `Create`: Inserts a new row into the `AnomalyGroups` table. It takes
parameters like subscription details, benchmark, commit range, and
action, and stores them. The group metadata (subscription name,
revision, domain, benchmark) is marshaled into a JSON string before
insertion.
- `LoadById`: Selects an anomaly group from the database based on its ID.
It retrieves core attributes of the group.
- `UpdateBisectID`, `UpdateReportedIssueID`, `AddAnomalyID`,
`AddCulpritIDs`: These methods execute SQL UPDATE statements to modify
specific fields of an existing anomaly group record. They handle array
appends for lists like `anomaly_ids` and `culprit_ids`, with specific
syntax considerations for different SQL databases (e.g., Spanner's
`COALESCE` for array concatenation).
- `FindExistingGroup`: Constructs a SQL SELECT query with WHERE clauses to
match the provided criteria (subscription, revision, domain, benchmark,
commit range overlap, and action). This allows finding groups that a new
anomaly might belong to.
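The following is a rough sketch of the kind of query `FindExistingGroup` runs, written against the standard `database/sql` package. The table and column names, the JSONB layout of `group_meta_data`, and the placeholder style are assumptions made for illustration; the real statements live in `sqlanomalygroupstore.go` and differ per database type.

```go
package anomalygroupsql

import (
	"context"
	"database/sql"
)

// findExistingGroupIDs sketches the matching criteria described above:
// same subscription, revision, domain, benchmark, and action, plus an
// overlapping commit range. Schema names here are illustrative assumptions.
func findExistingGroupIDs(ctx context.Context, db *sql.DB, subName, subRevision, domain, benchmark, action string, startCommit, endCommit int64) ([]string, error) {
	const q = `
SELECT id
FROM AnomalyGroups
WHERE group_meta_data->>'subscription_name' = $1
  AND group_meta_data->>'subscription_revision' = $2
  AND group_meta_data->>'domain' = $3
  AND group_meta_data->>'benchmark' = $4
  AND action = $5
  -- The commit ranges overlap: the group's range starts before the new
  -- anomaly's range ends, and ends after the new range starts.
  AND common_rev_start <= $7
  AND common_rev_end >= $6`
	rows, err := db.QueryContext(ctx, q, subName, subRevision, domain, benchmark, action, startCommit, endCommit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```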
**Design Choices:**
- **UUIDs for IDs:** Using UUIDs for group IDs, anomaly IDs, and culprit IDs
ensures global uniqueness.
- **JSONB for Metadata:** Storing `group_meta_data` as JSONB provides
flexibility in the metadata stored without requiring schema changes for
minor additions.
- **Array Columns:** Storing `anomaly_ids` and `culprit_ids` as array types in
the database is a natural way to represent lists of associated entities.
- **Database Type Abstraction:** While targeting SQL, there are minor
conditional logic snippets (e.g., for array appending in Spanner vs.
CockroachDB) to handle database-specific syntax, indicated by `dbType`
checks.
#### `service/service.go`: gRPC Service Implementation
This file implements the `AnomalyGroupServiceServer` interface defined by the
protobuf definitions in `proto/v1/anomalygroup_service.proto`. It acts as the
entry point for external systems to interact with the anomaly grouping
functionality.
**Responsibilities:**
- **Exposing Store Operations via gRPC:** The service methods largely delegate
to the corresponding methods of the `anomalygroup.Store` interface. For
example, `CreateNewAnomalyGroup` calls `anomalygroupStore.Create`.
- **Handling gRPC Requests and Responses:** It translates incoming gRPC
requests into calls to the store and formats the store's output into gRPC
responses.
- **`FindTopAnomalies` Logic:** This method involves more than a simple store
passthrough.
1. It loads the specified anomaly group.
2. It retrieves all regressions (anomalies) associated with that group
using the `regression.Store`.
3. It sorts these regressions based on the percentage change in their
median values (from `median_before` to `median_after`).
4. It formats the top N regressions (or all if N is not specified or is too
large) into the `ag.Anomaly` protobuf message format, extracting
relevant paramset values (bot, benchmark, story, measurement, stat).
- **`FindIssuesFromCulprits` Logic:**
1. Loads the specified anomaly group.
2. Retrieves the culprit IDs associated with the group.
3. Uses the `culprit.Store` to get the details of these culprits.
4. For each culprit, it checks its `GroupIssueMap` to find any issue IDs
that are specifically associated with the given anomaly group ID. This
allows correlation between a group (potentially containing multiple
anomalies that led to a bisection) and the issues filed for the culprits
found by that bisection.
**Design Choices:**
- **Dependency Injection:** The service takes instances of
`anomalygroup.Store`, `culprit.Store`, and `regression.Store` as
dependencies, promoting testability and decoupling.
- **Metric Collection:** It increments a counter (`newGroupCounter`) whenever
a new group is created, allowing for monitoring of the system's behavior.
#### `proto/v1/anomalygroup_service.proto`: Protocol Buffer Definitions
This file defines the gRPC service `AnomalyGroupService` and the message types
used for requests and responses. This is the contract for how clients interact
with the anomaly grouping system.
**Key Messages:**
- `AnomalyGroup`: Represents a group of anomalies, including its ID, the
action to take, lists of associated anomaly and culprit IDs, reported issue
ID, and metadata like subscription and benchmark names.
- `Anomaly`: Represents a single regression, including its start and end
commit positions, a `paramset` (key-value pairs describing the test),
improvement direction, and median values before and after the regression.
- `GroupActionType`: An enum defining the possible actions for a group
(NOACTION, REPORT, BISECT).
- Request/Response Messages: Specific messages for each RPC method (e.g.,
`CreateNewAnomalyGroupRequest`, `FindExistingGroupsResponse`).
**Purpose:**
- Defines a clear, language-agnostic API for the service.
- Ensures type safety and structured data exchange.
#### `notifier/anomalygroupnotifier.go`: Anomaly Group Notifier
This component implements the `notify.Notifier` interface. It's invoked when a
new regression is detected by the alerting system. Its primary role is to
integrate the regression detection with the anomaly grouping logic.
**Workflow when `RegressionFound` is called:**
1. Receive details of a newly detected regression (commit information, alert
configuration, cluster summary, trace data, regression ID).
2. Extract the `paramset` from the trace data.
3. Validate the `paramset` to ensure it contains required keys (e.g., master,
bot, benchmark, test, subtest_1). This is important because the grouping and
subsequent actions (like bisection) rely on these parameters.
4. Determine the `testPath` from the `paramset`. This path is used in finding
or creating anomaly groups.
5. Call `grouper.ProcessRegressionInGroup` (which eventually calls
`utils.ProcessRegression`) to handle the grouping logic for this new
regression.
**Design Choices:**
- **Interface Implementation:** Adheres to the `notify.Notifier` interface,
allowing it to be plugged into the existing notification pipeline of the
performance monitoring system.
- **Delegation to `AnomalyGrouper`:** It delegates the core grouping logic to
an `AnomalyGrouper` instance (typically `utils.AnomalyGrouperImpl`). This
keeps the notifier focused on the integration aspect.
- **Handling of Summary Traces:** It explicitly ignores regressions found on
summary-level traces (traces representing an aggregation of multiple
specific tests), as anomaly grouping is typically more meaningful for
specific test cases.
#### `utils/anomalygrouputils.go`: Anomaly Grouping Utilities
This file contains the core logic for processing a new regression and
integrating it into an anomaly group.
**`ProcessRegression` Function - Key Steps:**
1. **Synchronization:** Uses a `sync.Mutex` (`groupingMutex`). This is a
critical point: it aims to prevent race conditions when multiple regressions
are processed concurrently, especially around creating new groups. _However,
the comment notes that with multiple containers, this mutex might not be
sufficient and needs review._
2. **Client Initialization:** Creates an `AnomalyGroupServiceClient` to
communicate with the gRPC service.
3. **Find Existing Group:** Calls the `FindExistingGroups` gRPC method to see
if the new anomaly fits into any current groups based on subscription,
revision, action type, commit range overlap, and test path.
4. **Group Creation or Update:**
   - **If no existing group is found:**
     - Calls `CreateNewAnomalyGroup` to create a new group.
     - Calls `UpdateAnomalyGroup` to add the current `anomalyID` to this
       newly created group.
     - **Triggers a Temporal Workflow:** Initiates a `MaybeTriggerBisection`
       workflow. This workflow is responsible for deciding whether to start a
       bisection or file a bug based on the group's action type and other
       conditions.
     ```
     Regression Detected --> FindExistingGroups
         |
         +-- No Group Found --> CreateNewAnomalyGroup
                                   --> UpdateAnomalyGroup (add anomaly)
                                   --> Start Temporal Workflow (MaybeTriggerBisection)
     ```
   - **If existing group(s) are found:**
     - For each matching group:
       - Calls `UpdateAnomalyGroup` to add the current `anomalyID` to that
         group.
       - Calls `FindIssuesToUpdate` to determine if any existing bug reports
         (either the group's own `ReportedIssueId` or issues linked via
         culprits) should be updated with information about this new anomaly.
       - If issues are found, it uses the `issuetracker` to add a comment to
         each relevant issue.
     ```
     Regression Detected --> FindExistingGroups
         |
         +-- Group(s) Found --> For each group:
                                   |
                                   +-- UpdateAnomalyGroup (add anomaly)
                                   +-- FindIssuesToUpdate --> If issues exist --> Add Comment to Issue(s)
     ```
5. **Return Group ID(s):** Returns a comma-separated string of group IDs the
anomaly was associated with.
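The branching in steps 3 and 4 above can be summarized in a short Go sketch. The `groupClient` interface and its method names are simplified stand-ins for the real gRPC client, issue tracker, and Temporal calls; they are assumptions for illustration, not the actual `utils.ProcessRegression` API.

```go
package groupingsketch

import "context"

// groupClient is a simplified stand-in for the anomaly group gRPC client,
// issue tracker, and Temporal interactions used by the real code.
type groupClient interface {
	FindExistingGroups(ctx context.Context, anomalyID string) ([]string, error)
	CreateNewAnomalyGroup(ctx context.Context) (string, error)
	UpdateAnomalyGroup(ctx context.Context, groupID, anomalyID string) error
	StartBisectionWorkflow(ctx context.Context, groupID string) error // MaybeTriggerBisection
	FindIssuesToUpdate(ctx context.Context, groupID string) ([]string, error)
	AddComment(ctx context.Context, issueID, anomalyID string) error
}

// processRegression mirrors the decision flow described above: add the
// anomaly to every matching group and update related issues, or create a
// new group and kick off the asynchronous bisection/report workflow.
func processRegression(ctx context.Context, c groupClient, anomalyID string) ([]string, error) {
	groups, err := c.FindExistingGroups(ctx, anomalyID)
	if err != nil {
		return nil, err
	}
	if len(groups) == 0 {
		groupID, err := c.CreateNewAnomalyGroup(ctx)
		if err != nil {
			return nil, err
		}
		if err := c.UpdateAnomalyGroup(ctx, groupID, anomalyID); err != nil {
			return nil, err
		}
		return []string{groupID}, c.StartBisectionWorkflow(ctx, groupID)
	}
	for _, groupID := range groups {
		if err := c.UpdateAnomalyGroup(ctx, groupID, anomalyID); err != nil {
			return nil, err
		}
		issues, err := c.FindIssuesToUpdate(ctx, groupID)
		if err != nil {
			return nil, err
		}
		for _, issueID := range issues {
			if err := c.AddComment(ctx, issueID, anomalyID); err != nil {
				return nil, err
			}
		}
	}
	return groups, nil
}
```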
**`FindIssuesToUpdate` Function:**
This helper determines which existing issue tracker IDs should be updated with
information about a new anomaly being added to a group.
- If the `group_action` is `REPORT` and `reported_issue_id` is set on the
group, that issue ID is returned.
- If the `group_action` is `BISECT`, it calls the `FindIssuesFromCulprits`
gRPC method. This method looks up culprits associated with the group and
then checks if those culprits have specific issues filed for them in the
context of _this particular group_. This is important because a single
culprit (commit) might be associated with multiple anomaly groups, and each
might have its own context or bug report.
**Design Choices:**
- **Centralized Grouping Logic:** This package encapsulates the
decision-making process of whether to create a new group or add to an
existing one.
- **Temporal Workflow Integration:** Offloads the decision and execution of
bisection or bug filing to a Temporal workflow. This makes the process
asynchronous and more resilient.
- **Issue Tracker Interaction:** Directly interacts with the issue tracker to
update existing bugs, keeping them relevant as new, related anomalies are
found.
### Mocking Strategy
The module extensively uses mocks for testing:
- **`mocks/Store.go`:** A mock implementation of the `anomalygroup.Store`
interface, generated by `testify/mock`. Used in `service/service_test.go`.
- **`proto/v1/mocks/AnomalyGroupServiceServer.go`:** A mock for the gRPC
server interface `AnomalyGroupServiceServer`, generated by `testify/mock`
(with manual adjustments noted in the file). Used by clients or other
services that might call this gRPC service.
- **`utils/mocks/AnomalyGrouper.go`:** A mock for the `AnomalyGrouper`
interface, used in `notifier/anomalygroupnotifier_test.go`.
This approach allows for unit testing components in isolation by providing
controlled behavior for their dependencies.
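As a small, hedged example of how such testify-based mocks are typically driven (the generated mocks in this module have more methods and different signatures), a hand-rolled mock for a one-method store might be used in a test like this:

```go
package anomalygroup_test

import (
	"context"
	"testing"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
)

// Store is a tiny stand-in interface, not the real anomalygroup.Store.
type Store interface {
	LoadById(ctx context.Context, id string) (string, error)
}

// MockStore mirrors the shape of a testify/mockery-generated mock.
type MockStore struct {
	mock.Mock
}

func (m *MockStore) LoadById(ctx context.Context, id string) (string, error) {
	args := m.Called(ctx, id)
	return args.String(0), args.Error(1)
}

func TestLoadById_UsesMock(t *testing.T) {
	m := &MockStore{}
	// Declare the expected call and its canned return values.
	m.On("LoadById", mock.Anything, "group-1").Return("group-1", nil)

	got, err := m.LoadById(context.Background(), "group-1")
	require.NoError(t, err)
	require.Equal(t, "group-1", got)
	m.AssertExpectations(t)
}
```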
### Overall Workflow Example (Simplified)
1. **Anomaly Detection:** Perf system detects a new regression (anomaly).
2. **Notification:** `AnomalyGroupNotifier.RegressionFound` is called.
3. **Preprocessing:** The notifier extracts `paramset`, validates it, and
derives `testPath`.
4. **Grouping Logic (`utils.ProcessRegression`):**
- The system queries `AnomalyGroupService.FindExistingGroups` using the
anomaly's properties (subscription, commit range, test path, action
type).
- **Scenario A: No existing group:**
- `AnomalyGroupService.CreateNewAnomalyGroup` is called.
- The new anomaly ID is added to this group via
`AnomalyGroupService.UpdateAnomalyGroup`.
- A Temporal workflow (`MaybeTriggerBisection`) is started for this
new group.
- **Scenario B: Existing group(s) found:**
- The new anomaly ID is added to each matching group via
`AnomalyGroupService.UpdateAnomalyGroup`.
- `utils.FindIssuesToUpdate` is called for each group.
- If the group's action is `REPORT` and it has a `ReportedIssueId`,
that issue is updated.
- If the group's action is `BISECT`,
`AnomalyGroupService.FindIssuesFromCulprits` is called. If it
returns issue IDs associated with this group's culprits, those
issues are updated.
5. **Temporal Workflow (`MaybeTriggerBisection` - not detailed here but
implied):**
- Based on the group's `GroupActionType`:
- If `BISECT`: It might check conditions (e.g., number of anomalies in
the group) and then trigger a bisection job (e.g., Pinpoint) using
`AnomalyGroupService.FindTopAnomalies` to pick the most significant
anomaly. The bisection ID is then saved to the group.
- If `REPORT`: It might check conditions and then file a bug using
`AnomalyGroupService.FindTopAnomalies` to gather details. The issue
ID is saved to the group.
This system aims to automate and streamline the handling of performance
regressions by intelligently grouping them and initiating appropriate follow-up
actions.
# Module: /go/backend
The `/go/backend` module implements a gRPC-based backend service for Perf. This
service is designed to host API endpoints that are not directly user-facing,
promoting a separation of concerns and enabling better scalability and
maintainability.
**Core Purpose and Design Philosophy:**
The primary motivation for this backend service is to create a stable, internal
API layer. This decouples user-facing components (like the frontend) from the
direct implementation details of various backend tasks. For instance, if Perf
needs to trigger a Pinpoint job, the frontend doesn't interact with Pinpoint or
a workflow engine like Temporal directly. Instead, it makes a gRPC call to an
endpoint on this backend service. The backend service then handles the
interaction with the underlying system (e.g., Temporal).
This design offers several advantages:
- **Interface Stability:** If the underlying implementation for a task changes
(e.g., replacing Temporal with another workflow orchestrator), the gRPC
contract exposed by the backend service can remain the same. This minimizes
changes required in calling services.
- **Load Offloading:** Computationally intensive operations that might
otherwise burden the frontend can be delegated to this backend service.
Examples include dry-running regression detection.
- **Centralized Internal Logic:** It provides a dedicated place for internal,
non-UI-facing business logic.
**Key Components and Responsibilities:**
- **`backend.go`**: This is the heart of the backend service.
- **`Backend` struct:** Encapsulates the state and configuration of the
backend application, including gRPC server settings, ports, and loaded
configuration.
- **`BackendService` interface:** Defines a contract for any service that
wishes to be hosted by this backend. Each such service must provide its
gRPC service descriptor, registration logic, and an authorization
policy. This interface-based approach allows for modular addition of new
functionalities (an illustrative sketch follows this component list).
- The `GetAuthorizationPolicy()` method returns a
`shared.AuthorizationPolicy` which specifies whether unauthenticated
access is allowed and which user roles are authorized to call the
service or specific methods within it.
- `RegisterGrpc()` is responsible for registering the specific gRPC
service implementation with the main gRPC server.
- `GetServiceDescriptor()` provides metadata about the gRPC service.
- **`initialize()` function:** This is a crucial setup function. It:
- Initializes common application components (like Prometheus metrics).
- Loads and validates the application configuration (from a JSON file,
e.g., `demo.json`).
- Instantiates various data stores (for anomaly groups, culprits,
subscriptions, regressions) by using builder functions that typically
read connection details from the loaded configuration. This allows for
flexibility in choosing data store implementations (e.g., Spanner,
CockroachDB).
- Sets up a culprit notifier, which is responsible for sending
notifications about identified culprits.
- Initializes a Temporal client if the `NotifyConfig.Notifications` is set
to `AnomalyGrouper`, as this indicates that anomaly grouping workflows
managed by Temporal are in use.
- Dynamically configures and registers all `BackendService`
implementations. This involves setting up authorization rules based on
the policy defined by each service and then registering their gRPC
handlers.
- Starts listening for gRPC connections on the configured port.
- **`configureServices()` and `registerServices()`:** These helper
functions iterate over the list of `BackendService` implementations to
set up authorization and register them with the main gRPC server.
- **`configureAuthorizationForService()`:** This function applies the
authorization policies defined by each individual service to the gRPC
server's authorization policy. It uses `grpcsp.ServerPolicy` to define
which roles can access the service or specific methods.
- **`New()` constructor:** Creates and initializes a new `Backend`
instance. It takes various store implementations and a notifier as
arguments, allowing for dependency injection, particularly useful for
testing. If these are `nil`, they are typically created within
`initialize()` based on the configuration.
- **`ServeGRPC()` and `Serve()`:** These methods start the gRPC server and
block until it's shut down.
- **`Cleanup()`:** Handles graceful shutdown of the gRPC server.
- **`pinpoint.go`**: This file defines a wrapper for the actual Pinpoint
service implementation (which resides in `pinpoint/go/service`).
- **`pinpointService` struct:** Implements the `BackendService` interface.
- **`NewPinpointService()`:** Creates a new instance, taking a Temporal
provider and a rate limiter as arguments. This indicates that Pinpoint
operations might be rate-limited and potentially involve Temporal
workflows.
- It defines an authorization policy requiring users to have at least
`roles.Editor` to access Pinpoint functionalities. This is a good
example of how specific services define their own access control rules.
- **`shared/authorization.go`**:
- **`AuthorizationPolicy` struct:** A simple struct used by
`BackendService` implementations to declare their authorization
requirements. This includes whether unauthenticated access is permitted,
a list of roles authorized for the entire service, and a map for
method-specific role authorizations. This promotes a consistent way for
services to define their security posture.
- **`client/backendclientutil.go`**: This utility file provides helper
functions for creating gRPC clients to connect to the backend service itself
(or specific services hosted by it).
- **`getGrpcConnection()`:** Abstracts the logic for establishing a gRPC
connection. It handles both insecure (typically for local
development/testing) and secure connections. For secure connections, it
uses TLS (with `InsecureSkipVerify: true` as it's intended for internal
GKE cluster communication) and OAuth2 for authentication, obtaining
tokens for the service account running the client process.
- **`NewPinpointClient()`, `NewAnomalyGroupServiceClient()`,
`NewCulpritServiceClient()`:** These are factory functions that simplify
the creation of typed gRPC clients for the specific services hosted on
the backend. They first check if the backend service is
configured/enabled before attempting to create a connection. This
pattern makes it easy for other internal services to consume the APIs
provided by this backend.
- **`backendserver/main.go`**: This is the entry point for the backend server
executable.
- It uses the `urfave/cli` library to define a command-line interface.
- The `run` command initializes and starts the `Backend` service using the
`backend.New()` constructor and then calls `b.Serve()`.
- It primarily parses command-line flags (defined in
`config.BackendFlags`) and passes them to the `backend` package. It
doesn't instantiate stores or notifiers directly, relying on the
`backend.New` (and subsequently `initialize`) to create them based on
the loaded configuration if `nil` is passed.
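To illustrate the `BackendService` contract described above, here is a minimal sketch of a hypothetical service implementation. The `AuthorizationPolicy` struct and the method signatures are simplified stand-ins inferred from this description, not the actual definitions in `/go/backend` and `shared/authorization.go`.

```go
package backendsketch

import (
	"google.golang.org/grpc"
)

// AuthorizationPolicy and BackendService are simplified stand-ins; the real
// definitions live in /go/backend and /go/backend/shared.
type AuthorizationPolicy struct {
	AllowUnauthenticated  bool
	AuthorizedRoles       []string
	MethodAuthorizedRoles map[string][]string
}

type BackendService interface {
	GetAuthorizationPolicy() AuthorizationPolicy
	RegisterGrpc(server *grpc.Server)
	GetServiceDescriptor() grpc.ServiceDesc
}

// echoService is a toy example of how a new service would plug into the
// backend: it declares who may call it and how to register its handlers.
type echoService struct{}

func (e *echoService) GetAuthorizationPolicy() AuthorizationPolicy {
	return AuthorizationPolicy{
		AllowUnauthenticated: false,
		AuthorizedRoles:      []string{"editor"}, // e.g. roles.Editor in the real code
	}
}

func (e *echoService) RegisterGrpc(server *grpc.Server) {
	// In real code this calls the generated registration function, e.g.
	// pb.RegisterEchoServiceServer(server, e).
	_ = server
}

func (e *echoService) GetServiceDescriptor() grpc.ServiceDesc {
	// In real code this returns the generated service descriptor, e.g.
	// pb.EchoService_ServiceDesc.
	return grpc.ServiceDesc{ServiceName: "perf.v1.EchoService"}
}
```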
**Workflow Example: Handling a gRPC Request**
1. A client (e.g., the Perf frontend or another internal service) uses a
generated gRPC client stub (potentially created with helpers from
`client/backendclientutil.go`) to make a call to a specific method on a
service hosted by the backend (e.g., `Pinpoint.ScheduleJob`).
2. The gRPC request arrives at the `Backend` server's listener (`b.lisGRPC`).
3. The `grpc.Server` routes the request to the appropriate service
implementation (e.g., `pinpointService`).
4. **Authentication/Authorization (via `grpcsp.ServerPolicy`):** Before the
service method is executed, the `UnaryInterceptor` configured in
`backend.go` (which uses `b.serverAuthPolicy`) intercepts the call.
   ```
   Incoming gRPC Request --> UnaryInterceptor (grpcsp)
       |
       V
   Check Auth Policy for Service/Method
   (defined by pinpointService.GetAuthorizationPolicy())
       |
       V
   Allow/Deny --> Yes: Proceed to service method
                  No:  Return error
   ```
5. If authorized, the corresponding method on the `pinpointService` (which
delegates to the actual `pinpoint_service.PinpointServer` implementation) is
invoked.
6. The service method performs its logic (e.g., interacting with Temporal to
schedule a Pinpoint job, querying data stores).
7. A response is sent back to the client.
**Configuration and Initialization:**
The system relies heavily on a configuration file (specified by
`flags.ConfigFilename`, often `demo.json` for local development as seen in
`backend_test.go` and `testdata/demo.json`). This file dictates:
- Data store connection strings and types (`data_store_config`).
- Notification settings (`notify_config`).
- The backend service's own host URL (`backend_host_url`), which it might use
if it needs to call itself or if other components need to discover it.
- Temporal configuration (`temporal_config` - though not explicitly in
`demo.json`, it's checked in `backend.go`).
The `initialize` function in `backend.go` is responsible for parsing this
configuration and setting up all necessary dependencies like database
connections, the Temporal client, and the culprit notifier. The use of builder
functions (e.g., `builders.NewAnomalyGroupStoreFromConfig`) allows the system to
be flexible with regard to the actual implementations of these components, as
long as they conform to the required interfaces.
This backend module serves as a crucial intermediary, enhancing the robustness
and maintainability of the Perf system by providing a well-defined internal API
layer.
# Module: /go/bug
The `go/bug` module is designed to facilitate the creation of URLs for reporting
bugs or regressions identified within the Skia performance monitoring system.
Its primary purpose is to dynamically generate these URLs based on a predefined
template and specific details about the identified issue. This approach allows
for flexible integration with various bug tracking systems, as the URL structure
can be configured externally.
**Core Functionality and Design:**
The module centers around the concept of URI templates. Instead of hardcoding
URL formats for specific bug trackers, it uses a template string that contains
placeholders for relevant information. This makes the system adaptable to
changes in bug tracker URL schemes or the adoption of new trackers without
requiring code modifications.
The key function, `Expand`, takes a URI template and populates it with details
about the regression. These details include:
1. **`clusterLink`**: A URL pointing to the specific performance data cluster
that exhibits the regression. This provides direct context for anyone
investigating the bug.
2. **`c provider.Commit`**: Information about the specific commit suspected of
causing the regression. This includes the commit's URL, allowing for easy
navigation to the code change. The use of the `provider.Commit` type from
`perf/go/git/provider` indicates an integration with a system that can
furnish commit details.
3. **`message`**: A user-provided message describing the regression. This
allows the reporter to add specific observations or context.
The `Expand` function utilizes the `gopkg.in/olivere/elastic.v5/uritemplates`
library to perform the actual substitution of placeholders in the template
string with the provided values. This library handles URL encoding of the
substituted values, ensuring the generated URL is valid.
**Key Components/Files:**
- **`bug.go`**: This file contains the core logic for expanding URI templates.
- `Expand(uriTemplate string, clusterLink string, c provider.Commit,
message string) string`: This is the primary function responsible for
generating the bug reporting URL. It takes the template and the
contextual information as input and returns the fully formed URL. If the
template expansion fails (e.g., due to a malformed template), it logs an
error using `go.skia.org/infra/go/sklog` and returns an empty string or
a partially formed URL depending on the nature of the error.
- `ExampleExpand(uriTemplate string) string`: This function serves as a
utility or example for demonstrating how to use the `Expand` function.
It calls `Expand` with pre-defined example data for the cluster link,
commit, and message. This can be useful for testing the template
expansion logic or for providing a quick way to see how a given template
would be populated.
- **`bug_test.go`**: This file contains unit tests for the functionality in
`bug.go`.
- `TestExpand(t *testing.T)`: This test function verifies that the
`Expand` function correctly substitutes the provided values into the URI
template and produces the expected URL. It uses the
`github.com/stretchr/testify/assert` library for assertions, ensuring
that the generated URL matches the anticipated output, including proper
URL encoding.
**Workflow:**
A typical workflow involving this module would be:
1. **Configuration**: An external system (e.g., the Perf frontend) is
configured with a URI template for the desired bug tracking system. This
template will contain placeholders like `{cluster_url}`, `{commit_url}`, and
`{message}`. Example Template:
`https://bugtracker.example.com/new?summary=Regression%20Found&description=Regression%20details:%0ACluster:%20{cluster_url}%0ACommit:%20{commit_url}%0AMessage:%20{message}`
2. **Regression Identification**: A user or an automated system identifies a
performance regression.
3. **Information Gathering**: The system gathers the necessary information:
- The URL to the Perf cluster graph showing the regression.
- Details of the commit suspected to have introduced the regression.
- An optional message from the user.
4. **URL Generation**: The `Expand` function in `go/bug` is called with the
configured URI template and the gathered information.
```
template := "https://bugtracker.example.com/new?summary=Regression%20Found&description=Cluster:%20{cluster_url}%0ACommit:%20{commit_url}%0AMessage:%20{message}"
clusterURL := "https://perf.skia.org/t/?some_params"
commitData := provider.Commit{URL: "https://skia.googlesource.com/skia/+show/abcdef123"}
userMessage := "Significant drop in frame rate on TestXYZ."
bugReportURL := bug.Expand(template, clusterURL, commitData, userMessage)
```
5. **Redirection/Display**: The generated `bugReportURL` is then presented to
the user, who can click it to navigate to the bug tracker with the
pre-filled information.
This design decouples the bug reporting logic from the specifics of any single
bug tracking system, promoting flexibility and maintainability. The use of a
standard URI template expansion library ensures robustness in URL generation.
# Module: /go/builders
The `builders` module is responsible for constructing various core components of
the Perf system based on instance configuration. This centralized approach to
object creation prevents cyclical dependencies that could arise if configuration
objects were directly responsible for building the components they configure.
The module acts as a factory, taking an `InstanceConfig` and returning fully
initialized and operational objects like data stores, file sources, and caches.
The primary design goal is to decouple the configuration of Perf components from
their instantiation. This allows for cleaner dependencies and makes it easier to
manage the lifecycle of different parts of the system. For example, a
`TraceStore` needs a database connection, but the `InstanceConfig` that defines
the database connection string shouldn't also be responsible for creating the
`TraceStore` itself. The `builders` module bridges this gap.
Key components and their instantiation logic:
- **`builders.go`**: This is the central file containing all the builder
functions.
- **Database Pool (`NewDBPoolFromConfig`)**: This function is crucial as
many other components rely on a database connection. It establishes a
connection pool to the configured database (e.g., CockroachDB, Spanner).
- **Why**: A connection pool is used to manage database connections
efficiently, reusing existing connections to reduce the overhead of
establishing new ones for each request.
- **How**: It parses the connection string from the `InstanceConfig`,
configures pool parameters like maximum and minimum connections, and
sets up a logging adapter (`pgxLogAdaptor`) to integrate database logs
with the application's logging system.
- **Singleton**: A key design choice here is the `singletonPool`. This
ensures that only one database connection pool is created per
application instance, preventing resource exhaustion and ensuring
consistent database interaction. A mutex (`singletonPoolMutex`) protects
the creation of this singleton.
- **Schema Check**: Optionally, it can verify that the connected database
schema matches the expected schema defined for the application. This is
important for ensuring data integrity and compatibility.
- **Timeout Wrapper**: The raw database pool is wrapped with a
`timeout.New` wrapper. This enforces that all database operations are
performed within a context that has a timeout, preventing indefinite
blocking. `InstanceConfig --> NewDBPoolFromConfig -->
pgxpool.ParseConfig | +-> pgxpool.ConnectConfig --> rawPool | +->
timeout.New(rawPool) --> singletonPool (if schema check passes)`
- **PerfGit (`NewPerfGitFromConfig`)**: Constructs a `perfgit.Git` object,
which provides an interface to Git repository data.
- **Why**: Perf needs to associate performance data with specific code
revisions.
- **How**: It first obtains a database pool using `getDBPool` (which in
turn uses `NewDBPoolFromConfig`) and then instantiates `perfgit.New`
with this pool and the instance configuration.
- **TraceStore (`NewTraceStoreFromConfig`)**: Creates a
`tracestore.TraceStore` for managing performance trace data.
- **Why**: This is the core component for storing and retrieving
time-series performance metrics.
- **How**: It gets a database pool and a `TraceParamStore` (for managing
trace parameter sets) and then instantiates the appropriate
`sqltracestore`.
- **MetadataStore (`NewMetadataStoreFromConfig`)**: Creates a
`tracestore.MetadataStore` for managing metadata associated with traces.
- **How**: Similar to `TraceStore`, it obtains a database pool and then
creates an `sqltracestore.NewSQLMetadataStore`.
- **AlertStore, RegressionStore, ShortcutStore, GraphsShortcutStore,
AnomalyGroupStore, CulpritStore, SubscriptionStore, FavoriteStore,
UserIssueStore**: These functions follow a similar pattern: they obtain
a database pool via `getDBPool` and then instantiate their respective
SQL-backed store implementations (e.g., `sqlalertstore`,
`sqlregression2store`).
- **Why**: These stores manage various aspects of Perf's functionality,
such as alerting configurations, regression tracking, saved shortcuts,
etc. Centralizing their creation based on the common database
configuration simplifies the system.
- **RegressionStore Variation**: `NewRegressionStoreFromConfig` has a
conditional logic based on `instanceConfig.UseRegression2` to
instantiate either `sqlregression2store` or `sqlregressionstore`. This
allows for migrating to a new regression store implementation controlled
by configuration.
- **GraphsShortcutStore Caching**: `NewGraphsShortcutStoreFromConfig` can
return a cached version
(`graphsshortcutstore.NewCacheGraphsShortcutStore`) if `localToProd` is
true, indicating a local development or testing environment where a
simpler in-memory cache might be preferred over a database-backed store.
- **Source (`NewSourceFromConfig`)**: Creates a `file.Source` which
defines where Perf ingests data from (e.g., Google Cloud Storage, local
directories).
- **Why**: Perf needs to be flexible in terms of where it reads input data
files.
- **How**: It uses a `switch` statement based on
`instanceConfig.IngestionConfig.SourceConfig.SourceType` to instantiate
either a `gcssource` or a `dirsource`.
- **IngestedFS (`NewIngestedFSFromConfig`)**: Creates a `fs.FS` (file
system interface) that provides access to already ingested files.
- **Why**: To provide a consistent way to access files regardless of their
underlying storage (GCS or local).
- **How**: Similar to `NewSourceFromConfig`, it switches on the source
type to return a GCS or local file system implementation.
- **Cache (`GetCacheFromConfig`)**: Returns a `cache.Cache` instance
(either Redis-backed or local in-memory).
- **Why**: Caching is used to improve the performance of frequently
accessed data or computationally intensive queries.
- **How**: It checks `instanceConfig.QueryConfig.CacheConfig.Type` to
determine whether to create a `redisCache` (connecting to a Google Cloud
Redis instance) or a `localCache`.
The `getDBPool` helper function is used internally by many builder functions. It
acts as a dispatcher based on `instanceConfig.DataStoreConfig.DataStoreType`,
calling `NewDBPoolFromConfig` with appropriate schema checking flags. This
abstracts the direct call to `NewDBPoolFromConfig` and centralizes the logic for
selecting the database type.
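The singleton pool pattern described above can be sketched roughly as follows.
The schema check, pool sizing, logging adapter, and timeout wrapper are
omitted, so this is an illustration rather than the real `NewDBPoolFromConfig`:

```
package buildersexample

import (
	"context"
	"sync"

	"github.com/jackc/pgx/v4/pgxpool"
)

var (
	singletonPool      *pgxpool.Pool
	singletonPoolMutex sync.Mutex
)

// newDBPool returns a process-wide database pool, creating it on first use.
// Subsequent calls reuse the same pool, mirroring the singletonPool behavior
// described above.
func newDBPool(ctx context.Context, connectionString string) (*pgxpool.Pool, error) {
	singletonPoolMutex.Lock()
	defer singletonPoolMutex.Unlock()
	if singletonPool != nil {
		return singletonPool, nil
	}
	cfg, err := pgxpool.ParseConfig(connectionString)
	if err != nil {
		return nil, err
	}
	// Pool parameters, the logging adapter, the schema check, and the
	// timeout wrapper would be configured here in the real implementation.
	pool, err := pgxpool.ConnectConfig(ctx, cfg)
	if err != nil {
		return nil, err
	}
	singletonPool = pool
	return singletonPool, nil
}
```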
The test file (`builders_test.go`) ensures that these builder functions
correctly instantiate objects and handle different configurations, including
invalid ones. A notable aspect of the tests is the management of the
`singletonPool`. Since `NewDBPoolFromConfig` creates a singleton, tests that
require fresh database instances must explicitly clear this singleton
(`singletonPool = nil`) before calling the builder to avoid reusing a connection
from a previous test. This is handled in `newDBConfigForTest`.
# Module: /go/chromeperf
The `chromeperf` module facilitates interaction with the Chrome Perf backend,
which is the system of record for performance data for Chromium. This module
allows Perf to send and receive data from Chrome Perf.
## Key Responsibilities
The primary responsibility of this module is to abstract the communication
details with the Chrome Perf API. It provides a typed Go interface to various
Chrome Perf endpoints, handling request formatting, authentication, and response
parsing.
This interaction is crucial for:
- **Reporting Regressions:** When Perf detects a performance regression, it
needs to inform Chrome Perf to create an alert and potentially file a bug.
- **Fetching Anomaly Data:** Perf needs to retrieve information about existing
anomalies and alert groups from Chrome Perf to display them in its UI or use
them in its analysis. This includes details about the commit range, affected
tests, and associated bug IDs.
- **Maintaining Test Path Consistency:** Chrome Perf and Perf may have
slightly different representations of test paths (e.g., due to character
restrictions). This module, in conjunction with the `sqlreversekeymapstore`
submodule, helps manage these differences.
## Key Components
### `chromeperfClient.go`
This file defines the generic `ChromePerfClient` interface and its
implementation, `chromePerfClientImpl`. This is the core component responsible
for making HTTP GET and POST requests to the Chrome Perf API.
**Why:** Abstracting the HTTP client allows for easier testing (by mocking the
client) and centralizes the logic for handling authentication (using OAuth2
Google default token source) and constructing target URLs.
**How:**
- It uses `google.DefaultTokenSource` for authentication.
- `generateTargetUrl` constructs the correct API endpoint URL, differentiating
between the Skia-Bridge proxy
(`https://skia-bridge-dot-chromeperf.appspot.com`) and direct calls to the
legacy Chrome Perf endpoint (`https://chromeperf.appspot.com`). The
Skia-Bridge is generally preferred.
- `SendGetRequest` and `SendPostRequest` handle the actual HTTP communication,
JSON marshalling/unmarshalling, and basic error handling, including checking
for accepted HTTP status codes.
Example workflow for a POST request:
```
Caller -> chromePerfClient.SendPostRequest(ctx, "anomalies", "add", requestBody, &responseObj, []int{200})
|
| (Serializes requestBody to JSON)
v
|--------------------------------------------------------------------------------------------------------|
| generateTargetUrl("https://skia-bridge-dot-chromeperf.appspot.com/anomalies/add") |
|--------------------------------------------------------------------------------------------------------|
|
v
httpClient.Post(targetUrl, "application/json", jsonBody)
|
v
(HTTP Request to Chrome Perf API)
|
v
(Receives HTTP Response)
|
v
(Checks if response status code is in acceptedStatusCodes)
|
v
(Deserializes response body into responseObj)
|
v
Caller (receives populated responseObj or error)
```
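The pattern above (serialize the request, POST it, check the status code
against the accepted list, then deserialize the response) can be sketched
generically with the standard library. This is an illustration, not the actual
`chromePerfClientImpl` code, which also handles authentication and URL
construction:

```
package chromeperfexample

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// sendPost mirrors the SendPostRequest flow described above: it marshals
// reqBody to JSON, POSTs it, verifies the status code, and decodes the
// response into respBody.
func sendPost(ctx context.Context, client *http.Client, url string, reqBody, respBody interface{}, acceptedStatusCodes []int) error {
	b, err := json.Marshal(reqBody)
	if err != nil {
		return fmt.Errorf("marshalling request: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(b))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	accepted := false
	for _, code := range acceptedStatusCodes {
		if resp.StatusCode == code {
			accepted = true
			break
		}
	}
	if !accepted {
		return fmt.Errorf("unexpected status code %d", resp.StatusCode)
	}
	return json.NewDecoder(resp.Body).Decode(respBody)
}
```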
### `anomalyApi.go`
This file builds upon `chromeperfClient.go` to provide a specialized client for
interacting with the `/anomalies` endpoint in Chrome Perf. It defines the
`AnomalyApiClient` interface and its implementation `anomalyApiClientImpl`.
**Why:** This client encapsulates the logic specific to anomaly-related
operations, such as formatting requests for reporting regressions or fetching
anomaly details, and parsing the specific JSON structures returned by these
endpoints. It also handles the translation between Perf's trace identifiers and
Chrome Perf's `test_path` format.
**How:**
- **`ReportRegression`**: Constructs a `ReportRegressionRequest` and sends it
to the `anomalies/add` endpoint. This is how Perf informs Chrome Perf about
a new regression.
- **`GetAnomalyFromUrlSafeKey`**: Fetches details for a specific anomaly using
its key from the `anomalies/get` endpoint.
- **`GetAnomalies`**: Retrieves anomalies for a list of tests within a
specific commit range (`min_revision`, `max_revision`) by calling the
`anomalies/find` endpoint.
- It performs a crucial translation step: `traceNameToTestPath` converts
Perf's comma-separated key-value trace names (e.g.,
`,benchmark=Blazor,bot=MacM1,...`) into Chrome Perf's slash-separated
`test_path` (e.g., `ChromiumPerf/MacM1/Blazor/...`).
- It also handles potential discrepancies in commit numbers if Chrome Perf
returns commit hashes. It uses `perfGit.CommitNumberFromGitHash` to
resolve these.
- **`GetAnomaliesTimeBased`**: Similar to `GetAnomalies`, but fetches
anomalies based on a time range (`start_time`, `end_time`) by calling the
`anomalies/find_time` endpoint.
- **`GetAnomaliesAroundRevision`**: Fetches anomalies that occurred around a
specific revision number.
- **`traceNameToTestPath`**: This function is key for interoperability. It
parses a Perf trace name (which is a string of key-value pairs) and
constructs the corresponding `test_path` string that Chrome Perf expects. It
also handles an experimental feature (`EnableSkiaBridgeAggregation`) which
can modify how test paths are generated, particularly for aggregated
statistics (e.g., ensuring `testName_avg` is used if the `stat` is `value`).
- The logic for `statToSuffixMap` and `hasSuffixInTestValue` addresses
historical inconsistencies where test names in Perf might or might not
include statistical suffixes (like `_avg`, `_max`). The goal is to
derive the correct Chrome Perf `test_path`.
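The translation performed by `traceNameToTestPath` can be sketched as below.
The segment ordering (master/bot/benchmark/test/subtests) is inferred from the
example above, the `master` handling is an assumption, and the statistical
suffix and aggregation logic are omitted; the real function is more involved:

```
package chromeperfexample

import (
	"fmt"
	"strings"
)

// toTestPath converts a Perf trace name like
// ",benchmark=Blazor,bot=MacM1,test=foo,subtest_1=bar," into a
// slash-separated Chrome Perf test_path.
func toTestPath(traceName, master string) (string, error) {
	kv := map[string]string{}
	for _, part := range strings.Split(strings.Trim(traceName, ","), ",") {
		pair := strings.SplitN(part, "=", 2)
		if len(pair) != 2 {
			return "", fmt.Errorf("malformed key-value pair %q", part)
		}
		kv[pair[0]] = pair[1]
	}
	segments := []string{master, kv["bot"], kv["benchmark"], kv["test"], kv["subtest_1"]}
	nonEmpty := make([]string, 0, len(segments))
	for _, s := range segments {
		if s != "" {
			nonEmpty = append(nonEmpty, s)
		}
	}
	return strings.Join(nonEmpty, "/"), nil
}
```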
Workflow for fetching anomalies:
```
Perf UI/Backend -> anomalyApiClient.GetAnomalies(ctx, ["trace_A,key=val", "trace_B,key=val"], 100, 200)
|
v
(For each traceName)
traceNameToTestPath("trace_A,key=val") -> "chromeperf/test/path/A"
|
v
chromeperfClient.SendPostRequest(ctx, "anomalies", "find", {Tests: ["path/A", "path/B"], MinRevision: "100", MaxRevision: "200"}, &anomaliesResponse, ...)
|
v
(Parses anomaliesResponse, potentially resolving commit hashes to commit numbers)
|
v
Perf UI/Backend (receives AnomalyMap)
```
### `alertGroupApi.go`
This file provides a client for interacting with Chrome Perf's `/alert_group`
API, specifically to get details about alert groups. An alert group in Chrome
Perf typically corresponds to a set of related anomalies (regressions).
**Why:** When Perf displays information about an alert (which might have
originated from Chrome Perf), it needs to fetch details about the associated
alert group, such as the specific anomalies included, the commit range, and
other metadata.
**How:**
- **`GetAlertGroupDetails`**: Takes an alert group key and calls the
`alert_group/details` endpoint on Chrome Perf.
- The `AlertGroupDetails` struct holds the response, including a map of
`Anomalies` (where the value is the Chrome Perf `test_path`) and start/end
commit numbers/hashes.
- **`GetQueryParams` and `GetQueryParamsPerTrace`**: These methods are
utilities to transform the `AlertGroupDetails` into query parameters that
can be used to construct URLs for Perf's own explorer page. This allows
users to easily navigate from a Chrome Perf alert to viewing the
corresponding data in Perf.
- `GetQueryParams` aggregates all test path components (masters, bots,
benchmarks, etc.) from all anomalies in the group into a single set of
parameters.
- `GetQueryParamsPerTrace` generates a separate set of query parameters
for _each_ individual anomaly in the alert group.
- They parse the slash-separated `test_path` from Chrome Perf back into
individual components.
Workflow for getting alert group details:
```
Perf Backend (e.g., when processing an incoming alert from Chrome Perf)
|
v
alertGroupApiClient.GetAlertGroupDetails(ctx, "chrome_perf_group_key")
|
v
chromeperfClient.SendGetRequest(ctx, "alert_group", "details", {key: "chrome_perf_group_key"}, &alertGroupResponse)
|
v
(alertGroupResponse is populated)
|
v
alertGroupResponse.GetQueryParams(ctx) -> Perf Explorer URL query params
```
### `store.go` and the `sqlreversekeymapstore` submodule
`store.go` defines the `ReverseKeyMapStore` interface. The
`sqlreversekeymapstore` directory and its `schema` subdirectory provide an
SQL-based implementation of this interface.
**Why:** Test paths in Chrome Perf can contain characters that are considered
"invalid" or are handled differently by Perf's parameter parsing (e.g., Perf's
trace keys are comma-separated key-value pairs, and the values themselves should
ideally not interfere with this). When data is ingested into Perf from Chrome
Perf, or when Perf constructs test paths to query Chrome Perf, these "invalid"
characters in Chrome Perf test path components (like subtest names) might be
replaced (e.g., with underscores).
This creates a problem: if Perf has `test/foo_bar` and Chrome Perf has
`test/foo?bar`, Perf needs a way to know that `foo_bar` corresponds to `foo?bar`
when querying Chrome Perf. The `ReverseKeyMapStore` is designed to store these
mappings.
**How:**
- `sqlreversekeymapstore/schema/schema.go` defines the SQL table schema
`ReverseKeyMapSchema` with columns:
- `ModifiedValue`: The value as it appears in Perf (e.g., `foo_bar`).
- `ParamKey`: The parameter key this value belongs to (e.g., `subtest_1`).
- `OriginalValue`: The original value as it was in Chrome Perf (e.g.,
`foo?bar`).
- The primary key is a combination of `ModifiedValue` and `ParamKey`.
- `sqlreversekeymapstore/sqlreversekeymapstore.go` implements the
`ReverseKeyMapStore` interface using a SQL database (configurable for
CockroachDB or Spanner via different SQL statements).
- `Create`: Inserts a new mapping. If a mapping for the `ModifiedValue`
and `ParamKey` already exists (conflict), it does nothing. This is
important because the mapping should be stable.
- `Get`: Retrieves the `OriginalValue` given a `ModifiedValue` and
`ParamKey`.
This store is likely used during the process of converting between Perf trace
parameters and Chrome Perf test paths, especially when generating requests _to_
Chrome Perf. If a parameter value in Perf might have been modified from its
Chrome Perf original, this store can be queried to get the original value needed
for the Chrome Perf API call. The exact point of integration for creating these
mappings (i.e., when `Create` calls are made) is not explicitly detailed within
this module but would typically happen when Perf first encounters/ingests a test
path from Chrome Perf that requires modification.
For example, if `anomalyApi.go` needs to construct a `test_path` to query Chrome
Perf based on parameters from Perf:
1. Perf has params: `test=my_test, subtest_1=value_with_question_mark`
2. When constructing the `test_path` segment for `subtest_1`:
   - Call `reverseKeyMapStore.Get(ctx, "value_with_question_mark", "subtest_1")`.
   - If it returns an original value like `"value?with?question?mark"`, use
     that for the Chrome Perf API call.
   - Otherwise, use `"value_with_question_mark"`.
The `store.go` file simply defines the interface, allowing for different backend
implementations of this mapping store if needed, though `sqlreversekeymapstore`
is the provided concrete implementation.
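A hedged sketch of how such a lookup might be used when building a Chrome Perf
`test_path` segment is shown below. The interface shape is inferred from the
description above and may not match `store.go` exactly:

```
package chromeperfexample

import "context"

// ReverseKeyMapLookup is a hypothetical stand-in for the ReverseKeyMapStore
// interface described above.
type ReverseKeyMapLookup interface {
	// Get returns the original Chrome Perf value for a (modified value,
	// param key) pair, or an empty string if no mapping exists.
	Get(ctx context.Context, modifiedValue, paramKey string) (string, error)
}

// testPathSegment prefers the original Chrome Perf value when a mapping
// exists, and otherwise falls back to the Perf value.
func testPathSegment(ctx context.Context, store ReverseKeyMapLookup, paramKey, perfValue string) string {
	original, err := store.Get(ctx, perfValue, paramKey)
	if err != nil || original == "" {
		return perfValue
	}
	return original
}
```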
# Module: /go/clustering2
## Overview
The `clustering2` module is responsible for grouping similar performance traces
together using k-means clustering. This helps in identifying patterns and
regressions in performance data by analyzing the collective behavior of traces
rather than individual ones. The core idea is to represent each trace as a point
in a multi-dimensional space and then find `k` clusters of these points.
## Design and Implementation
### Why K-Means?
K-means is a well-understood and relatively efficient clustering algorithm
suitable for the scale of performance data encountered. It partitions data into
`k` distinct, non-overlapping clusters. Each data point belongs to the cluster
with the nearest mean (cluster centroid). This approach allows for the
summarization of large numbers of traces into a smaller set of representative
"shapes" or behaviors.
### Key Components and Files
#### `clustering.go`
This file contains the primary logic for performing k-means clustering on
performance traces.
- **`ClusterSummary`**: This struct represents a single cluster found by the
k-means algorithm.
- `Centroid`: The average shape of all traces in this cluster. This is the
core representation of the cluster's behavior.
- `Keys`: A list of identifiers for the traces belonging to this cluster.
These are sorted by their distance to the `Centroid`, allowing users to
quickly see the most representative traces. This is not serialized to
JSON to keep the payload manageable, as it can be very large.
- `Shortcut`: An identifier for a pre-computed set of `Keys`, used for
efficient retrieval and display in UIs.
- `ParamSummaries`: A breakdown of the parameter key-value pairs present
in the cluster and their prevalence (see `valuepercent.go`). This helps
in understanding what distinguishes this cluster (e.g., "all traces in
this cluster are for `arch=x86`").
- `StepFit`: Contains information about how well the `Centroid` fits a
step function. This is crucial for identifying regressions or
improvements that manifest as sudden shifts in performance.
- `StepPoint`: The specific data point (commit/timestamp) where the step
(if any) in the `Centroid` is detected.
- `Num`: The total number of traces in this cluster.
- `Timestamp`: Records when the cluster analysis was performed.
- `NotificationID`: Stores the ID of any alert or notification sent
regarding a significant step change detected in this cluster.
- **`ClusterSummaries`**: A container for all the `ClusterSummary` objects
produced by a single clustering run, along with metadata like the `K` value
used and the `StdDevThreshold`.
- **`CalculateClusterSummaries` function**: This is the main entry point for
the clustering process.
- **Trace Conversion**: It takes a `dataframe.DataFrame` (which holds
traces and their metadata) and converts each trace into a
`kmeans.Clusterable` object. The `ctrace2.NewFullTrace` function is used
here, which likely involves some form of normalization or feature
extraction to make traces comparable. The `stddevThreshold` parameter is
used during this conversion, potentially to filter out noisy or flat
traces.
- **Initial Centroid Selection (`chooseK`)**: K-means requires an initial
set of `k` centroids. This function randomly selects `k` traces from the
input data to serve as the initial centroids. Random selection is a
common and simple initialization strategy.
- **K-Means Iteration**:
- The `kmeans.Do` function performs one iteration of the k-means
algorithm:
1. Assign each observation (trace) to the nearest centroid.
2. Recalculate the centroids based on the mean of the observations
assigned to them. The `ctrace2.CalculateCentroid` function is likely
responsible for computing the mean of a set of traces.
- This process is repeated for a maximum of `MAX_KMEANS_ITERATIONS` or
until the change in `totalError` (sum of squared distances from each
point to its centroid) between iterations falls below `KMEAN_EPSILON`.
This convergence criterion prevents unnecessary computations once the
clusters stabilize.
- A `Progress` callback can be provided to monitor the clustering process,
reporting the `totalError` at each iteration.
- **Summary Generation (`getClusterSummaries`)**: After the k-means
algorithm converges, this function takes the final centroids and the
original observations to generate `ClusterSummary` objects for each
cluster.
- For each cluster, it identifies the member traces.
- It calculates `ParamSummaries` (see `valuepercent.go`) to describe the
common characteristics of traces in that cluster.
- It performs step detection (`stepfit.GetStepFitAtMid`) on the cluster's
centroid to identify significant performance shifts. The `interesting`
parameter likely defines a threshold for what constitutes a noteworthy
step change, and `stepDetection` specifies the algorithm or method used
for step detection.
- It sorts the traces within each cluster by their distance to the
centroid, ensuring `ClusterSummary.Keys` lists the most representative
traces first. A limited number of sample keys
(`config.MaxSampleTracesPerCluster`) are stored.
- Finally, the resulting `ClusterSummary` objects are sorted, likely by
the magnitude or significance of the detected step
(`StepFit.Regression`), to highlight the most impactful changes first.
- **Constants**:
- `K`: The default number of clusters to find. 50 is chosen as a balance
between granularity and computational cost.
- `MAX_KMEANS_ITERATIONS`: A safeguard against non-converging k-means
runs.
- `KMEAN_EPSILON`: A threshold to determine convergence, balancing
precision with computation time.
#### `valuepercent.go`
This file defines how to summarize and present the parameter distributions
within a cluster.
- **`ValuePercent` struct**: Represents a specific parameter key-value pair
(e.g., "config=8888") and the percentage of traces in a cluster that have
this pair. This provides a quantitative measure of how characteristic a
parameter is for a given cluster.
- **`SortValuePercentSlice` function**: This is crucial for making the
`ParamSummaries` in `ClusterSummary` human-readable and informative. The
goal is to:
1. Group parameter values by their key (e.g., all "config=..." values
together).
2. Within each key group, sort by the percentage (highest first).
3. Sort the key groups themselves by the highest percentage of their top
value. If percentages are equal, an alphabetical sort of the value is
used as a tie-breaker.
This complex sorting logic ensures that the most dominant and distinguishing
parameters for a cluster are presented prominently. For example:
```
config=8888 90%
config=565 10%
arch=x86 80%
arch=arm 20%
```
Here, "config" is listed before "arch" because its top value ("config=8888")
has a higher percentage (90%) than the top value for "arch" ("arch=x86" at
80%).
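A minimal sketch of this ordering, assuming a `ValuePercent` struct with a
`Value` ("key=value") field and an integer `Percent` field (the real field
names and types in `valuepercent.go` may differ):

```
package clusteringexample

import (
	"sort"
	"strings"
)

// ValuePercent pairs a "key=value" string with the percentage of traces in a
// cluster that have it.
type ValuePercent struct {
	Value   string
	Percent int
}

// sortValuePercent groups entries by their parameter key, sorts each group by
// percentage (descending, with value as the tie-breaker), and orders the
// groups by the percentage of their top entry, mirroring the ordering
// described above.
func sortValuePercent(entries []ValuePercent) []ValuePercent {
	groups := map[string][]ValuePercent{}
	for _, e := range entries {
		key := strings.SplitN(e.Value, "=", 2)[0]
		groups[key] = append(groups[key], e)
	}
	keys := make([]string, 0, len(groups))
	for key, group := range groups {
		sort.Slice(group, func(i, j int) bool {
			if group[i].Percent != group[j].Percent {
				return group[i].Percent > group[j].Percent
			}
			return group[i].Value < group[j].Value
		})
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool {
		return groups[keys[i]][0].Percent > groups[keys[j]][0].Percent
	})
	sorted := make([]ValuePercent, 0, len(entries))
	for _, key := range keys {
		sorted = append(sorted, groups[key]...)
	}
	return sorted
}
```

Applied to the example above, the `config` entries come out before the `arch`
entries because the top `config` value (90%) beats the top `arch` value (80%).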
### Workflow: Calculating Cluster Summaries
```
Input: DataFrame (traces, headers), K, StdDevThreshold, ProgressCallback, InterestingThreshold, StepDetectionMethod
1. [clustering.go: CalculateClusterSummaries]
a. Initialize empty list of observations.
b. For each trace in DataFrame.TraceSet:
i. Create ClusterableTrace (ctrace2.NewFullTrace) using trace data and StdDevThreshold.
ii. Add to observations list.
c. If no observations, return error.
d. [clustering.go: chooseK]
i. Randomly select K observations to be initial centroids.
e. Initialize lastTotalError = 0.0
f. Loop MAX_KMEANS_ITERATIONS times OR until convergence:
i. [kmeans.Do] -> new_centroids
1. Assign each observation to its closest centroid (from previous iteration or initial).
2. Recalculate centroids (ctrace2.CalculateCentroid) based on assigned observations.
ii. [kmeans.TotalError] -> currentTotalError
iii. If ProgressCallback provided, call it with currentTotalError.
iv. If |currentTotalError - lastTotalError| < KMEAN_EPSILON, break loop.
v. lastTotalError = currentTotalError
g. [clustering.go: getClusterSummaries] -> clusterSummaries
i. [kmeans.GetClusters] -> allClusters (list of observations per centroid)
ii. For each cluster in allClusters and its corresponding centroid:
1. Create new ClusterSummary.
2. [clustering.go: getParamSummaries] (using cluster members) -> ParamSummaries
a. [clustering.go: GetParamSummariesForKeys]
i. Count occurrences of each param=value in cluster keys.
ii. Convert counts to ValuePercent structs.
iii. [valuepercent.go: SortValuePercentSlice] -> sorted ParamSummaries.
3. [stepfit.GetStepFitAtMid] (on centroid values, StdDevThreshold, InterestingThreshold, StepDetectionMethod) -> StepFit, StepPoint.
4. Set ClusterSummary.Num = number of members in cluster.
5. Sort cluster members by distance to centroid.
6. Populate ClusterSummary.Keys with top N sorted member keys.
7. Populate ClusterSummary.Centroid with centroid values.
iii. Sort all ClusterSummary objects (e.g., by StepFit.Regression).
h. Populate ClusterSummaries struct with results, K, and StdDevThreshold.
i. Return ClusterSummaries.
Output: ClusterSummaries object or error.
```
This process effectively transforms raw trace data into a structured summary
that highlights significant patterns and changes, facilitating performance
analysis and regression detection.
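The convergence criterion (stop after `MAX_KMEANS_ITERATIONS` iterations or
when the change in total error drops below `KMEAN_EPSILON`) can be illustrated
with a generic loop. This sketch does not use the real `perf/go/kmeans`
package, whose API may differ; the assignment and centroid recomputation step
is abstracted away:

```
package clusteringexample

import "math"

const (
	maxKMeansIterations = 100  // stand-in for MAX_KMEANS_ITERATIONS
	kMeansEpsilon       = 0.01 // stand-in for KMEAN_EPSILON
)

// iterate runs one assignment/recompute step and reports the new centroids
// plus the total error (sum of squared distances to centroids).
type iterate func(centroids [][]float32) (newCentroids [][]float32, totalError float64)

// converge repeatedly applies step until the error stabilizes or the
// iteration budget is exhausted, mirroring the loop in
// CalculateClusterSummaries.
func converge(centroids [][]float32, step iterate, progress func(totalError float64)) [][]float32 {
	lastTotalError := 0.0
	for i := 0; i < maxKMeansIterations; i++ {
		var totalError float64
		centroids, totalError = step(centroids)
		if progress != nil {
			progress(totalError)
		}
		if math.Abs(totalError-lastTotalError) < kMeansEpsilon {
			break
		}
		lastTotalError = totalError
	}
	return centroids
}
```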
# Module: /go/config
The `/go/config` module defines the configuration structure for Perf instances
and provides utilities for loading, validating, and managing these
configurations. It plays a crucial role in customizing the behavior of a Perf
instance, from data ingestion and storage to alert notifications and UI
presentation.
**Core Responsibilities and Design:**
The primary responsibility of this module is to define and manage the
`InstanceConfig` struct. This struct is a comprehensive container for all
settings that govern a Perf instance. The design emphasizes:
1. **Centralized Configuration:** By consolidating all instance-specific
settings into a single `InstanceConfig` struct (`config.go`), the module
provides a single source of truth. This simplifies understanding the state
of an instance and reduces the chances of configuration drift.
2. **Typed Configuration:** Using Go structs with explicit types ensures that
configuration values are of the expected format, catching many potential
errors at compile-time or during validation. This is preferable to using
untyped maps or generic configuration formats.
3. **JSON Serialization/Deserialization:** Configuration files are expected to
be in JSON format. The module uses standard Go `encoding/json` for this,
making it easy to create, read, and modify configurations.
4. **Schema Validation:** To ensure the integrity and correctness of
configuration files, the module employs JSON Schema validation
(`/go/config/validate/validate.go`,
`/go/config/validate/instanceConfigSchema.json`).
- A JSON schema (`instanceConfigSchema.json`) formally defines the
structure and types of the `InstanceConfig`. This schema is
automatically generated from the Go struct definition using the
`/go/config/generate/main.go` program, ensuring the schema stays in sync
with the code.
- The `validate.InstanceConfigFromFile` function uses this schema to
validate a configuration file before attempting to deserialize it. This
allows for early detection of malformed or incomplete configurations.
5. **Command-Line Flag Integration:** The module defines structs like
`BackendFlags`, `FrontendFlags`, `IngestFlags`, and `MaintenanceFlags`
(`config.go`). These structs group related command-line flags and provide
methods (`AsCliFlags`) to convert them into `cli.Flag` slices, compatible
with the `github.com/urfave/cli/v2` library. This design keeps flag
definitions organized and associated with the components they configure.
6. **Extensibility:** The `InstanceConfig` is designed to be extensible. New
configuration options can be added as new fields to the relevant
sub-structs. The JSON schema generation and validation mechanisms will
automatically adapt to these changes.
**Key Components and Files:**
- **`config.go`:** This is the heart of the module.
- It defines the main `InstanceConfig` struct, which aggregates various
sub-configuration structs like `AuthConfig`, `DataStoreConfig`,
`IngestionConfig`, `GitRepoConfig`, `NotifyConfig`,
`IssueTrackerConfig`, `AnomalyConfig`, `QueryConfig`, `TemporalConfig`,
and `DataPointConfig`. Each of these sub-structs groups settings related
to a specific aspect of the Perf system (e.g., authentication, data
storage, data ingestion).
- It defines various enumerated types (e.g., `DataStoreType`,
`SourceType`, `GitAuthType`, `GitProvider`, `TraceFormat`) to provide
clear and constrained options for certain configuration values.
- It includes `DurationAsString`, a custom type for handling
  `time.Duration` serialization and deserialization as strings in JSON,
  which is more human-readable than nanosecond integers. It also provides
  a custom JSON schema for this type (a sketch of this pattern appears
  after this list).
- It defines structs for command-line flags used by different Perf
services (backend, frontend, ingest, maintenance). This helps in
organizing and parsing command-line arguments.
- Global constants like `MaxSampleTracesPerCluster`, `MinStdDev`,
`GotoRange`, and `QueryMaxRunTime` are defined here, providing default
values or limits used across the application.
- **`/go/config/validate/validate.go`:**
- This file contains the logic for validating an `InstanceConfig` beyond
what the JSON schema can enforce. This includes semantic checks, such as
ensuring that required fields are present based on the values of other
fields (e.g., API keys for issue tracker notifications).
- The `InstanceConfigFromFile` function is the primary entry point for
loading and validating a configuration file. It first performs schema
validation and then calls the `Validate` function for further business
logic checks.
- It also validates the Go text templates used in `NotifyConfig` by
attempting to format them with sample data. This helps catch template
syntax errors early.
- **`/go/config/validate/instanceConfigSchema.json`:**
- This is an automatically generated JSON Schema file that defines the
expected structure and data types for `InstanceConfig` JSON files. It is
used by `validate.go` to perform initial validation of configuration
files.
- **`/go/config/generate/main.go`:**
- This is a small utility program that generates the
`instanceConfigSchema.json` file based on the `InstanceConfig` struct
definition in `config.go`. This ensures that the schema is always
up-to-date with the Go code. The `//go:generate` directive at the top of
the file allows for easy regeneration of the schema.
- **`config_test.go` and `/go/config/validate/validate_test.go`:**
- These files contain unit tests for the configuration loading,
serialization/deserialization (especially for custom types like
`DurationAsString`), and validation logic. The tests for `validate.go`
include checks against actual configuration files used in production
(`//perf:configs`), ensuring that the validation logic is robust and
correctly handles real-world scenarios.
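As referenced above, here is a minimal sketch of a `DurationAsString`-style
type. It assumes the type wraps `time.Duration` and round-trips through
`time.Duration.String()` and `time.ParseDuration`; the real type in
`config.go` may differ in detail:

```
package configexample

import (
	"encoding/json"
	"time"
)

// DurationAsString marshals a time.Duration as a human-readable string such
// as "2h15m" instead of a nanosecond integer.
type DurationAsString time.Duration

func (d DurationAsString) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(d).String())
}

func (d *DurationAsString) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = DurationAsString(parsed)
	return nil
}
```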
**Workflows:**
**1. Loading and Validating a Configuration File:**
```
User provides config file path (e.g., "configs/nano.json")
|
V
Application calls validate.InstanceConfigFromFile("configs/nano.json")
|
V
validate.go: Reads the JSON file content.
|
V
validate.go: Validates content against instanceConfigSchema.json (using jsonschema.Validate).
| \
| (If schema violation) \
V V
Error returned with schema violations. Deserializes JSON into config.InstanceConfig struct.
|
V
validate.go: Calls Validate(instanceConfig) for further business logic checks.
| (e.g., API key presence, template validity)
|
| (If validation error)
V
Error returned.
|
V (If all valid)
Returns the populated config.InstanceConfig struct.
|
V
Application sets config.Config = returnedInstanceConfig
|
V
Perf instance uses config.Config for its operations.
```
**2. Generating the JSON Schema:**
This is typically done during development when the `InstanceConfig` struct
changes.
```
Developer modifies config.InstanceConfig struct in config.go
|
V
Developer runs `go generate` in the /go/config/generate directory (or via bazel)
|
V
/go/config/generate/main.go: Calls jsonschema.GenerateSchema("../validate/instanceConfigSchema.json", &config.InstanceConfig{})
|
V
jsonschema library: Introspects the config.InstanceConfig struct and its fields.
|
V
jsonschema library: Generates a JSON Schema definition.
|
V
/go/config/generate/main.go: Writes the generated schema to /go/config/validate/instanceConfigSchema.json.
```
The design prioritizes robustness through schema and semantic validation,
maintainability through structured Go types and centralized configuration, and
ease of use through standard JSON format and command-line flag integration. The
separation of schema generation (`generate` subdirectory) and validation
(`validate` subdirectory) keeps concerns distinct.
# Module: /go/ctrace2
## ctrace2 Module Documentation
### Overview
The `ctrace2` module provides the functionality to adapt trace data (represented
as a series of floating-point values) for use with k-means clustering
algorithms. The primary goal is to transform raw trace data into a format that
is suitable for distance calculations and centroid computations, which are
fundamental operations in k-means. This involves normalization and handling of
missing data points.
### Why and How
In performance analysis, traces often represent measurements over time or across
different configurations. Clustering these traces helps identify groups of
similar performance characteristics. However, raw trace data might have issues
that hinder effective clustering:
1. **Varying Scales:** Different traces might have values in vastly different
ranges, leading to biased distance calculations where traces with larger
absolute values dominate.
2. **Missing Data:** Traces can have missing data points, which need to be
handled appropriately during normalization and distance computation.
3. **Zero Standard Deviation:** Traces with constant values (zero standard
deviation) can cause division by zero errors during normalization.
The `ctrace2` module addresses these by:
- **Normalization:** Each trace is normalized to have a standard deviation of
1.0. This ensures that the scale of the values does not disproportionately
influence the clustering. The `vec32.Norm` function from the `go/vec32`
module is leveraged for this. Before normalization, any missing data points
(`vec32.MissingDataSentinel`) are filled in using `vec32.Fill`, which likely
interpolates or uses a similar strategy to replace them.
- **Minimum Standard Deviation:** To prevent division by zero or issues with
extremely small standard deviations, a `minStdDev` parameter is used during
normalization. If the calculated standard deviation of a trace is below this
minimum, the `minStdDev` value is used instead. This is a practical approach
to handle traces with very little variation without excluding them from
clustering.
- **`ClusterableTrace` Structure:** This structure wraps the trace data (`Key`
and `Values`) and implements the `kmeans.Clusterable` and `kmeans.Centroid`
interfaces from the `perf/go/kmeans` module. This makes `ClusterableTrace`
instances directly usable by the k-means algorithm.
### Responsibilities and Key Components
- **`ctrace.go`:** This is the core file of the module.
- **`ClusterableTrace` struct:**
- **Purpose:** Represents a single trace that is ready for clustering. It
holds a `Key` (a string identifier for the trace) and `Values` (a slice
of `float32` representing the normalized data points).
- **Why:** This struct is designed to be directly consumable by the
k-means clustering algorithm by implementing necessary interfaces.
- **`Distance(c kmeans.Clusterable) float64` method:** Calculates the
Euclidean distance between the current `ClusterableTrace` and another
`ClusterableTrace`. This is crucial for the k-means algorithm to
determine how similar two traces are. The calculation assumes that both
traces have the same number of data points (a guarantee maintained by
`NewFullTrace`). The distance is the standard Euclidean distance:
`Distance = Sqrt(Sum over i of (trace1.Values[i] - trace2.Values[i])^2)`.
- **`AsClusterable() kmeans.Clusterable` method:** Returns the
`ClusterableTrace` itself, satisfying the `kmeans.Centroid` interface
requirement.
- **`Dup(newKey string) *ClusterableTrace` method:** Creates a deep copy
of the `ClusterableTrace` with a new key. This is useful when you need
to manipulate a trace without affecting the original.
- **`NewFullTrace(key string, values []float32, minStdDev float32)
*ClusterableTrace` function:**
- **Purpose:** The primary factory function for creating
`ClusterableTrace` instances from raw trace data.
- **How:**
  1. It takes a `key` (string identifier), raw `values` (`[]float32`), and a
     `minStdDev`.
  2. Creates a copy of the input `values` to avoid modifying the original
     slice.
  3. Calls `vec32.Fill()` on the copied values. This step handles missing
     data points by filling them, likely through interpolation or a similar
     imputation technique provided by the `go/vec32` module.
  4. Calls `vec32.Norm()` on the filled values, using `minStdDev`. This
     normalizes the trace data so that its standard deviation is effectively
     1.0 (or adjusted if the original standard deviation was below
     `minStdDev`).
  5. Returns a new `ClusterableTrace` with the provided `key` and the
     processed (filled and normalized) `values`.
  In short: `copied = copy(raw_values); filled = vec32.Fill(copied);
  normalized = vec32.Norm(filled, minStdDev); result = ClusterableTrace{Key:
  key, Values: normalized}`.
- **`CalculateCentroid(members []kmeans.Clusterable) kmeans.Centroid`
function:**
- **Purpose:** Implements the `kmeans.CalculateCentroid` function type.
Given a slice of `ClusterableTrace` instances (which are members of a
cluster), it computes their centroid.
- **How:**
  1. It initializes a new slice of `float32` (`mean`) with the same length as
     the `Values` of the first member trace.
  2. It iterates through each member trace in the `members` slice.
  3. For each member, it iterates through its `Values` and adds each value to
     the corresponding element in the `mean` slice.
  4. After summing up all values component-wise, it divides each element in
     the `mean` slice by the total number of `members` to get the average
     value for each dimension.
  5. It returns a new `ClusterableTrace` representing the centroid. The key
     for this centroid trace is set to `CENTROID_KEY` ("special_centroid").
  In short: `mean[i] = (Sum over members of member.Values[i]) / len(members)`,
  returned as `ClusterableTrace{Key: CENTROID_KEY, Values: mean}`.
- **`CENTROID_KEY` constant:**
- **Purpose:** Defines a standard key ("special_centroid") to be used for
traces that represent the centroid of a cluster.
- **Why:** This provides a consistent way to identify centroid traces if
they are, for example, added back into a collection of traces (e.g., in
a DataFrame).
The interaction with the `go/vec32` module is crucial for data preprocessing
(filling missing values and normalization), while the `perf/go/kmeans` module
provides the interfaces that `ctrace2` implements to be compatible with k-means
clustering algorithms.
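As a plain-Go illustration of the distance and centroid math described above
(not the module's actual code, which also handles trace keys, filling, and
normalization):

```
package ctraceexample

import "math"

// distance returns the Euclidean distance between two traces of equal length.
func distance(a, b []float32) float64 {
	sum := 0.0
	for i := range a {
		d := float64(a[i] - b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// centroid returns the component-wise mean of a non-empty set of traces, all
// of equal length.
func centroid(members [][]float32) []float32 {
	mean := make([]float32, len(members[0]))
	for _, m := range members {
		for i, v := range m {
			mean[i] += v
		}
	}
	for i := range mean {
		mean[i] /= float32(len(members))
	}
	return mean
}
```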
# Module: /go/culprit
The `culprit` module is responsible for identifying, storing, and notifying
about commits that are likely causes of performance regressions. It integrates
with anomaly detection and subscription systems to automate the process of
pinpointing culprits and alerting relevant parties.
## Key Responsibilities
- **Culprit Identification:** While the actual bisection logic might reside
elsewhere, this module is responsible for receiving information about
potential culprit commits.
- **Culprit Persistence:** Storing identified culprits in a database, linking
them to the anomaly groups they are associated with.
- **Notification:** Generating and sending notifications (e.g., creating
issues in an issue tracker) when new culprits are found or when new anomaly
groups are reported.
- **Data Formatting:** Formatting notification messages (subjects and bodies)
based on configurable templates.
## Key Components and Files
### `store.go` & `sqlculpritstore/sqlculpritstore.go`
- **Purpose:** These files define the interface and implementation for storing
and retrieving culprit data. The primary goal is to persist information
about commits identified as culprits, associating them with specific anomaly
groups and any filed issues.
- **How it Works:**
- `store.go` defines the `Store` interface, which outlines the contract
for culprit data operations like `Get`, `Upsert`, and `AddIssueId`.
- `sqlculpritstore/sqlculpritstore.go` provides a SQL-based implementation
of this interface. It uses a SQL database (configured via `pool.Pool`)
to store culprit information.
- The `Upsert` method is crucial. It either inserts a new culprit record
or updates an existing one if a commit has already been identified as a
culprit for a different anomaly group. This prevents duplicate culprit
entries for the same commit. It also links the culprit to the
`anomaly_group_id`.
- The `AddIssueId` method updates a culprit record to include the ID of an
issue (e.g., a bug tracker ticket) that was created for it, and also
maintains a map between the anomaly group and the issue ID. This is
important for tracking and referencing.
- The database schema (defined in `sqlculpritstore/schema/schema.go`)
includes fields for commit details (host, project, ref, revision),
associated anomaly group IDs, and associated issue IDs. An index on
`(revision, host, project, ref)` helps in efficiently querying for
existing culprits.
- **Design Choices:**
- Using an interface (`Store`) decouples the rest of the module from the
specific database implementation, allowing for easier testing and
potential future changes in the storage backend.
- The `Upsert` logic is designed to handle cases where the same commit
might be identified as a culprit for multiple regressions (different
anomaly groups). Instead of creating duplicate entries, it appends the
new `anomaly_group_id` to the existing record.
- Storing `group_issue_map` as JSONB allows flexible storage of the
mapping between anomaly groups and the specific issue filed for that
group in the context of this culprit.
### `formatter/formatter.go`
- **Purpose:** This component is responsible for constructing the content
(subject and body) of notifications. It allows for customizable message
formats.
- **How it Works:**
- Defines the `Formatter` interface with methods
`GetCulpritSubjectAndBody` (for new culprit notifications) and
`GetReportSubjectAndBody` (for new anomaly group reports).
- `MarkdownFormatter` is the concrete implementation. It uses Go's
`text/template` package to render notification messages.
- Templates for subjects and bodies can be provided via `InstanceConfig`.
If not provided, default templates are used.
- `TemplateContext` and `ReportTemplateContext` provide the data that can
be used within the templates (e.g., commit details, subscription
information, anomaly group details).
- Helper functions like `buildCommitURL`, `buildAnomalyGroupUrl`, and
`buildAnomalyDetails` are available within the templates to construct
URLs and format anomaly details.
- **Design Choices:**
- The use of interfaces and templates promotes flexibility. Users can
define their own notification formats without modifying the core
notification logic.
- Default templates ensure that the system can function even without
explicit template configuration.
- Separating formatting from the transport mechanism (how notifications
are sent) adheres to the single responsibility principle.
- **`formatter/noop.go`**: Provides a `NoopFormatter` that generates empty
subjects and bodies, useful for disabling notifications or for testing
scenarios where actual formatting is not needed.
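To illustrate the template-driven formatting described above, here is a
minimal `text/template` sketch. The context fields and template strings are
hypothetical; the real `TemplateContext`, defaults, and helper functions in
`formatter.go` differ:

```
package culpritexample

import (
	"bytes"
	"text/template"
)

// templateContext is a hypothetical stand-in for the TemplateContext passed
// to the subject and body templates.
type templateContext struct {
	CommitURL        string
	SubscriptionName string
}

// Hypothetical defaults, analogous to the configurable subject/body templates
// described above.
const (
	defaultSubject = "New culprit found for {{.SubscriptionName}}"
	defaultBody    = "A culprit commit was identified: {{.CommitURL}}"
)

// render parses and executes a single template against the context.
func render(tmpl string, ctx templateContext) (string, error) {
	t, err := template.New("notification").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// getCulpritSubjectAndBody mirrors the Formatter contract: it returns the
// rendered subject and body for a culprit notification.
func getCulpritSubjectAndBody(ctx templateContext) (string, string, error) {
	subject, err := render(defaultSubject, ctx)
	if err != nil {
		return "", "", err
	}
	body, err := render(defaultBody, ctx)
	if err != nil {
		return "", "", err
	}
	return subject, body, nil
}
```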
### `transport/transport.go`
- **Purpose:** This component handles the actual sending of notifications to
external systems, primarily issue trackers.
- **How it Works:**
- Defines the `Transport` interface with the `SendNewNotification` method.
- `IssueTrackerTransport` is the concrete implementation for interacting
with an issue tracker (e.g., Google Issue Tracker/Buganizer).
- It uses the `go.skia.org/infra/go/issuetracker/v1` client library.
- Authentication is handled using an API key retrieved via the `secret`
package.
- When `SendNewNotification` is called, it constructs an
`issuetracker.Issue` object based on the provided subject, body, and
subscription details (like component ID, priority, CCs, hotlists).
- It then calls the issue tracker API to create a new issue.
- Metrics (`SendNewNotificationSuccess`, `SendNewNotificationFail`) are
recorded to monitor the success rate of sending notifications.
- **Design Choices:**
- The `Transport` interface allows for different notification mechanisms
to be plugged in (e.g., email, Slack) in the future.
- Configuration for the issue tracker (API key, secret project/name) is
externalized, promoting better security and manageability.
- Error handling and metrics provide visibility into the notification
delivery process.
- **`transport/noop.go`**: Provides a `NoopTransport` that doesn't actually
send any notifications, useful for disabling notifications or for testing.
### `notify/notify.go`
- **Purpose:** This component orchestrates the notification process by
combining a `Formatter` and a `Transport`.
- **How it Works:**
- Defines the `CulpritNotifier` interface with methods
`NotifyCulpritFound` and `NotifyAnomaliesFound`.
- `DefaultCulpritNotifier` implements this interface. It takes a
`formatter.Formatter` and a `transport.Transport` as dependencies.
- The `GetDefaultNotifier` factory function determines which `Formatter`
and `Transport` to use based on the
`InstanceConfig.IssueTrackerConfig.NotificationType`. If `NoneNotify`,
it uses `NoopFormatter` and `NoopTransport`. If `IssueNotify`, it sets
up `MarkdownFormatter` and `IssueTrackerTransport`.
- `NotifyCulpritFound`:
* Calls the formatter's `GetCulpritSubjectAndBody` to get the message
content.
* Calls the transport's `SendNewNotification` to send the message.
* Returns the ID of the created issue (or an empty string if no
notification was sent).
- `NotifyAnomaliesFound`:
* Calls the formatter's `GetReportSubjectAndBody`.
* Calls the transport's `SendNewNotification`.
* Returns the ID of the created issue.
- **Design Choices:**
- Decouples the high-level notification logic from the specifics of
message formatting and sending.
- Configuration-driven selection of formatter and transport makes the
notification behavior adaptable.
### `service/service.go`
- **Purpose:** Implements the gRPC service defined in `culprit.proto`. This is
the main entry point for external systems (like a bisection service or an
anomaly detection pipeline) to interact with the culprit module.
- **How it Works:**
- Implements the `pb.CulpritServiceServer` interface.
- It depends on `anomalygroup.Store`, `culprit.Store`,
`subscription.Store`, and `notify.CulpritNotifier`.
- **`PersistCulprit` RPC:**
* Calls `culpritStore.Upsert` to save the identified culprit commits and
associate them with the `anomaly_group_id`.
* Calls `anomalygroupStore.AddCulpritIDs` to link the newly
created/updated culprit IDs back to the anomaly group. The flow is:
`Client (e.g., Bisection Service) -> PersistCulpritRequest{Commits,
AnomalyGroupID} -> culpritService.PersistCulprit ->
culpritStore.Upsert(AnomalyGroupID, Commits) (returns CulpritIDs) ->
anomalygroupStore.AddCulpritIDs(AnomalyGroupID, CulpritIDs) ->
PersistCulpritResponse{CulpritIDs} -> Client`.
- **`GetCulprit` RPC:**
* Calls `culpritStore.Get` to retrieve culprit details by their IDs.
- **`NotifyUserOfCulprit` RPC:**
* Retrieves culprit details using `culpritStore.Get`.
* Loads the corresponding `AnomalyGroup` using
`anomalygroupStore.LoadById`.
* Loads the `Subscription` associated with the anomaly group using
`subscriptionStore.GetSubscription`.
* Calls `notifier.NotifyCulpritFound` for each culprit to send a
notification (e.g., file a bug).
* Calls `culpritStore.AddIssueId` to store the generated issue ID with the
culprit and the specific anomaly group. The flow is:
`Client (e.g., the Bisection Service after PersistCulprit) ->
NotifyUserOfCulpritRequest{CulpritIDs, AnomalyGroupID} ->
culpritService.NotifyUserOfCulprit -> culpritStore.Get(CulpritIDs)
(returns Culprits) -> anomalygroupStore.LoadById(AnomalyGroupID)
(returns AnomalyGroup) ->
subscriptionStore.GetSubscription(AnomalyGroup.SubName,
AnomalyGroup.SubRev) (returns Subscription) -> for each Culprit:
notifier.NotifyCulpritFound(Culprit, Subscription) (returns IssueID),
then culpritStore.AddIssueId(Culprit.ID, IssueID, AnomalyGroupID) ->
NotifyUserOfCulpritResponse{IssueIDs} -> Client`.
- **`NotifyUserOfAnomaly` RPC:**
* Loads the `AnomalyGroup` and its associated `Subscription`.
* Calls `notifier.NotifyAnomaliesFound` to send a notification about the
group of anomalies (e.g., file a summary bug). The flow is:
`Client (e.g., an Anomaly Detection Service) ->
NotifyUserOfAnomalyRequest{AnomalyGroupID, Anomalies[]} ->
culpritService.NotifyUserOfAnomaly ->
anomalygroupStore.LoadById(AnomalyGroupID) (returns AnomalyGroup) ->
subscriptionStore.GetSubscription(AnomalyGroup.SubName,
AnomalyGroup.SubRev) (returns Subscription) ->
notifier.NotifyAnomaliesFound(AnomalyGroup, Subscription, Anomalies[])
(returns IssueID) -> NotifyUserOfAnomalyResponse{IssueID} -> Client`.
- `PrepareSubscription` is a helper function used to potentially override
or mock subscription details for testing or during transitional phases
before full sheriff configuration is active. This is a temporary
measure.
- **Design Choices:**
- Clear separation of concerns: the service layer orchestrates actions by
calling appropriate stores and notifiers.
- gRPC provides a well-defined, language-agnostic interface for the
service.
- The authorization policy (`GetAuthorizationPolicy`) is currently set to
allow unauthenticated access, which might need to be revisited for
production environments.
### `proto/v1/culprit_service.proto`
- **Purpose:** Defines the gRPC service contract for culprit-related
operations.
- **Key Messages and RPCs:**
- `Commit`: Represents a source code commit.
- `Culprit`: Represents an identified culprit commit, including its ID,
the commit details, associated anomaly group IDs, and issue IDs. It also
includes `group_issue_map` to track which issue was filed for which
anomaly group in the context of this culprit.
- `Anomaly`: Represents a detected performance anomaly (duplicated from
anomalygroup service for potential independent evolution).
- `PersistCulpritRequest`/`Response`: For storing new culprits.
- `GetCulpritRequest`/`Response`: For retrieving existing culprits.
- `NotifyUserOfAnomalyRequest`/`Response`: For triggering notifications
about a new set of anomalies (anomaly group).
- `NotifyUserOfCulpritRequest`/`Response`: For triggering notifications
about newly identified culprits.
- **Design Choices:**
- Proto definitions provide a clear, typed contract for communication.
- The `Anomaly` message is duplicated from the `anomalygroup` service.
This choice was made to allow the `culprit` service and `anomalygroup`
service to evolve their respective `Anomaly` definitions independently
if needed in the future, avoiding tight coupling.
- The `group_issue_map` in the `Culprit` message is important for
scenarios where a single culprit might be associated with multiple
anomaly groups, and each of those (culprit, group) pairs might result in
a distinct bug being filed.
### Mocks (`mocks/` subdirectories)
- These directories contain generated mock implementations for the interfaces
defined within the `culprit` module (e.g., `Store`, `Formatter`,
`Transport`, `CulpritNotifier`, `CulpritServiceServer`).
- **Purpose:** Facilitate unit testing by allowing dependencies to be easily
mocked. This is standard practice for writing testable Go code. They are
generated using tools like `mockery`.
## Overall Workflow Example: Finding and Notifying a Culprit
1. **Anomaly Detection:** An external system detects a performance regression
and groups related anomalies into an `AnomalyGroup`.
2. **Bisection (External):** A bisection process is triggered (potentially
externally) to find the commit(s) responsible for the anomalies in the
`AnomalyGroup`.
3. **Persist Culprit:** The bisection service calls the
`CulpritService.PersistCulprit` RPC with the identified `Commit`(s) and the
`AnomalyGroupID`.
- `culpritService` uses `culpritStore.Upsert` to save these commits as
`Culprit` records, linking them to the `AnomalyGroupID`.
- It then calls `anomalygroupStore.AddCulpritIDs` to update the
`AnomalyGroup` record with the IDs of these new culprits.
4. **Notify User of Culprit:** The bisection service (or another orchestrator)
then calls `CulpritService.NotifyUserOfCulprit` RPC with the `CulpritID`(s)
and the `AnomalyGroupID`.
- `culpritService` retrieves the full `Culprit` details and the associated
`Subscription`.
- `DefaultCulpritNotifier` is invoked:
- `MarkdownFormatter` generates the subject and body for the
notification.
- `IssueTrackerTransport` sends this formatted message to the issue
tracker, creating a new bug.
- The ID of the created bug is returned.
- `culpritService` calls `culpritStore.AddIssueId` to associate this bug
ID with the specific `Culprit` and `AnomalyGroupID`.
This flow ensures that culprits are stored, linked to their regressions, and
users are notified through the configured channels. The modular design allows
for flexibility in how each step (storage, formatting, transport) is
implemented.
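The following Go sketch condenses the `PersistCulprit` portion of this flow.
The store interfaces, request shape, and method signatures are simplified
stand-ins for illustration, not the generated proto types or the actual store
APIs.

```go
// A minimal sketch of the PersistCulprit orchestration described above.
package culpritsketch

import (
	"context"
	"fmt"
)

// Commit is a simplified stand-in for the proto Commit message.
type Commit struct{ Host, Project, Ref, Revision string }

type culpritStore interface {
	// Upsert saves culprit commits for an anomaly group and returns their IDs.
	Upsert(ctx context.Context, anomalyGroupID string, commits []Commit) ([]string, error)
}

type anomalyGroupStore interface {
	// AddCulpritIDs links culprit IDs back to the anomaly group.
	AddCulpritIDs(ctx context.Context, anomalyGroupID string, culpritIDs []string) error
}

type service struct {
	culprits      culpritStore
	anomalyGroups anomalyGroupStore
}

// PersistCulprit stores the commits as culprits and links them to the group.
func (s *service) PersistCulprit(ctx context.Context, anomalyGroupID string, commits []Commit) ([]string, error) {
	ids, err := s.culprits.Upsert(ctx, anomalyGroupID, commits)
	if err != nil {
		return nil, fmt.Errorf("upserting culprits: %w", err)
	}
	if err := s.anomalyGroups.AddCulpritIDs(ctx, anomalyGroupID, ids); err != nil {
		return nil, fmt.Errorf("linking culprits to anomaly group: %w", err)
	}
	return ids, nil
}
```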
# Module: /go/dataframe
The `dataframe` module provides the `DataFrame` data structure and related
functionality for handling and manipulating performance trace data. It is a core
component for querying, analyzing, and visualizing performance metrics within
the Skia Perf system.
**Key Design Principles:**
- **Tabular Data Representation:** Inspired by R's DataFrame, this module
represents performance data as a table. Rows correspond to individual traces
(identified by a structured key), and columns represent distinct commit
points or patch levels. This structure facilitates efficient querying and
analysis of time-series data across different configurations.
- **TraceSet and ParamSet:** A `DataFrame` encapsulates a `types.TraceSet`,
which is a map of trace keys to their corresponding performance values. It
also maintains a `paramtools.ReadOnlyParamSet`, which describes the unique
parameter key-value pairs present in the `TraceSet`. This allows for
efficient filtering and aggregation based on trace characteristics.
- **Commit-Centric Columns:** The columns of a `DataFrame` are defined by
`ColumnHeader` structs, each containing a commit offset and a timestamp.
This ties the performance data directly to specific points in the codebase's
history.
- **Data Retrieval Abstraction:** The `DataFrameBuilder` interface decouples
the `DataFrame` creation logic from the underlying data source. This allows
for different implementations to fetch data (e.g., from a database) while
providing a consistent API for consumers.
- **Efficiency for Common Operations:** The module provides functions for
common data manipulation tasks like merging DataFrames (`Join`), filtering
traces (`FilterOut`), slicing data (`Slice`), and compressing data by
removing empty columns (`Compress`). These operations are designed with
performance considerations in mind.
**Key Components and Files:**
- **`dataframe.go`**: This is the central file defining the `DataFrame` struct
and its associated methods.
- **`DataFrame` struct**:
- `TraceSet`: Stores the actual performance data, mapping trace keys
(strings representing parameter combinations like
",arch=x86,config=8888,") to `types.Trace` (slices of float32 values).
- `Header`: A slice of `*ColumnHeader` pointers, defining the columns of
the DataFrame. Each `ColumnHeader` links a column to a specific commit
(`Offset`) and its `Timestamp`.
- `ParamSet`: A `paramtools.ReadOnlyParamSet` that contains all unique
key-value pairs from the keys in `TraceSet`. This is crucial for
understanding the dimensions of the data and for building UI controls
for filtering. It's rebuilt by `BuildParamSet()`.
- `Skip`: An integer indicating if any commits were skipped during data
retrieval to keep the DataFrame size manageable (related to
`MAX_SAMPLE_SIZE`).
- **`DataFrameBuilder` interface**: Defines the contract for objects that
can construct `DataFrame` instances. This allows for different data
sources or retrieval strategies. Key methods include:
- `NewFromQueryAndRange`: Creates a DataFrame based on a query and a time
range.
- `NewFromKeysAndRange`: Creates a DataFrame for specific trace keys over
a time range.
- `NewNFromQuery` / `NewNFromKeys`: Creates a DataFrame with the N most
recent data points for matching traces or specified keys.
- `NumMatches` / `PreflightQuery`: Used to estimate the size of the data
that a query will return, often for UI feedback or to refine queries.
- **`ColumnHeader` struct**: Represents a single column in the DataFrame,
typically corresponding to a commit. It contains:
- `Offset`: A `types.CommitNumber` identifying the commit.
- `Timestamp`: The timestamp of the commit in seconds since the Unix
epoch.
- **Key Functions**:
- `NewEmpty()`: Creates an empty DataFrame.
- `NewHeaderOnly()`: Creates a DataFrame with populated headers (commits
within a time range) but no trace data. This can be useful for setting
up the structure before fetching actual data.
- `FromTimeRange()`: Retrieves commit information (headers and commit
numbers) for a given time range from a `perfgit.Git` instance. This is a
foundational step in populating the `Header` of a DataFrame.
- `MergeColumnHeaders()`: A utility function that takes two slices of
`ColumnHeader` and merges them into a single sorted slice, returning
mapping indices to reconstruct traces. This is essential for the `Join`
operation.
- `Join()`: Combines two DataFrames into a new DataFrame. It merges their
headers and trace data. If traces exist in one DataFrame but not the
other for a given key, missing data points (`vec32.MissingDataSentinel`)
      are inserted. The `ParamSet` of the resulting DataFrame is the union of
      the input ParamSets (the column merge itself is sketched after this
      list).

```
DataFrame A (Header: [C1, C3], TraceX: [v1, v3])
DataFrame B (Header: [C2, C3], TraceX: [v2', v3'])
        |
        V  Join(A, B)
Joined DataFrame (Header: [C1, C2, C3], TraceX: [v1, v2', v3/v3'])
(TraceY present in only A or B is padded with missing data)
```
- `BuildParamSet()`: Recalculates the `ParamSet` for a DataFrame based on
the current keys in its `TraceSet`. This is called after operations like
`FilterOut` that might change the set of traces.
- `FilterOut()`: Removes traces from the `TraceSet` based on a provided
`TraceFilter` function. It then calls `BuildParamSet()` to update the
`ParamSet`.
- `Slice()`: Returns a new DataFrame that is a view into a sub-section of
the original DataFrame's columns. The underlying trace data is sliced,
not copied, for efficiency.
- `Compress()`: Creates a new DataFrame by removing any columns (and
corresponding data points in traces) that contain only missing data
sentinels across all traces. This helps in reducing data size and
focusing on relevant data points.
- **`dataframe_test.go`**: Contains unit tests for the functionality in
`dataframe.go`. These tests cover various scenarios, including empty
DataFrames, different merging and joining cases, filtering, slicing, and
compression. The tests often use `gittest` for creating mock Git
repositories to test time range queries.
- **`/go/dataframe/mocks/DataFrameBuilder.go`**: This file contains a mock
implementation of the `DataFrameBuilder` interface, generated using the
`testify/mock` library. This mock is used in tests of other packages that
depend on `DataFrameBuilder`, allowing them to simulate DataFrame creation
without needing a real data source or Git repository.
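The following sketch illustrates the kind of two-pointer merge that
`MergeColumnHeaders` is described as performing, using simplified local types;
the real function's signature and return values may differ.

```go
// A minimal sketch of merging two sorted ColumnHeader slices into a single
// sorted union, plus index maps that Join-style code can use to realign traces.
package dataframesketch

type ColumnHeader struct {
	Offset    int64 // Commit number.
	Timestamp int64 // Seconds since the Unix epoch.
}

// mergeHeaders returns the sorted union of a and b, and maps from each input's
// column index to its position in the merged header.
func mergeHeaders(a, b []*ColumnHeader) (merged []*ColumnHeader, aMap, bMap map[int]int) {
	aMap = map[int]int{}
	bMap = map[int]int{}
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		switch {
		case j >= len(b) || (i < len(a) && a[i].Offset < b[j].Offset):
			aMap[i] = len(merged)
			merged = append(merged, a[i])
			i++
		case i >= len(a) || b[j].Offset < a[i].Offset:
			bMap[j] = len(merged)
			merged = append(merged, b[j])
			j++
		default: // The same commit appears in both headers.
			aMap[i] = len(merged)
			bMap[j] = len(merged)
			merged = append(merged, a[i])
			i++
			j++
		}
	}
	return merged, aMap, bMap
}
```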
**Workflows:**
1. **Fetching Data for Display/Analysis:**
- A client (e.g., a web UI) specifies a query and a time range.
- An implementation of `DataFrameBuilder` (e.g., one that queries a
CockroachDB instance) uses `NewFromQueryAndRange`.
- Internally, this likely involves:
1. Resolving the time range to a list of commits using `FromTimeRange`
(which calls `perfgit.Git.CommitSliceFromTimeRange`). This populates
the `Header`.
2. Querying the data source for traces matching the query and falling
within the identified commit range.
3. Populating the `TraceSet`.
4. Building the `ParamSet` using `BuildParamSet()`.
- The resulting `DataFrame` is returned.
```
Client Request (Query, TimeRange)
|
V
DataFrameBuilder.NewFromQueryAndRange(ctx, begin, end, query, ...)
|
+-> FromTimeRange(ctx, git, begin, end, ...) // Get commit headers
| |
| V
| perfgit.Git.CommitSliceFromTimeRange()
| |
| V
| [ColumnHeader{Offset, Timestamp}, ...]
|
+-> DataSource.QueryTraces(query, commit_numbers) // Fetch trace data
| |
| V
| types.TraceSet
|
+-> DataFrame.BuildParamSet() // Populate ParamSet
|
V
DataFrame{Header, TraceSet, ParamSet}
```
2. **Joining DataFrames (e.g., from different sources or queries):**
- Two `DataFrame` instances, `dfA` and `dfB`, are available.
- `Join(dfA, dfB)` is called.
- `MergeColumnHeaders(dfA.Header, dfB.Header)` creates a unified header
and maps to align traces.
- A new `TraceSet` is built. For each key:
- If a key is in `dfA` but not `dfB`, its trace is copied, padded with
missing values for columns unique to `dfB`.
- If a key is in `dfB` but not `dfA`, its trace is copied, padded with
missing values for columns unique to `dfA`.
- If a key is in both, values are merged based on the unified header.
- The `ParamSet`s of `dfA` and `dfB` are combined.
- A new, joined `DataFrame` is returned.
3. **Filtering Data:**
- A `DataFrame` `df` exists.
- A `TraceFilter` function `myFilter` is defined (e.g., to remove traces
with all zero values).
- `df.FilterOut(myFilter)` is called.
- The method iterates through `df.TraceSet`. If `myFilter` returns `true`
for a trace, that trace is deleted from the `TraceSet`.
- `df.BuildParamSet()` is called to reflect the potentially reduced set of
parameters.
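A minimal sketch of the `FilterOut` workflow above, with a stand-in trace type
and missing-data sentinel in place of `types.Trace` and
`vec32.MissingDataSentinel`:

```go
// A sketch of deleting traces that match a TraceFilter, as FilterOut does.
package tracefiltersketch

const missingDataSentinel = float32(1e32) // Stand-in sentinel value.

type TraceSet map[string][]float32

// TraceFilter returns true if the trace should be removed.
type TraceFilter func(trace []float32) bool

// allMissing reports whether a trace contains no real data points.
func allMissing(trace []float32) bool {
	for _, v := range trace {
		if v != missingDataSentinel {
			return false
		}
	}
	return true
}

// filterOut deletes every trace for which f returns true. In the real
// DataFrame this is followed by BuildParamSet() to refresh the ParamSet.
func filterOut(ts TraceSet, f TraceFilter) {
	for key, trace := range ts {
		if f(trace) {
			delete(ts, key)
		}
	}
}
```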
**Constants:**
- `DEFAULT_NUM_COMMITS`: Default number of commits to fetch when using methods
like `NewNFromQuery`. Set to 50.
- `MAX_SAMPLE_SIZE`: A limit on the number of commits (columns) a DataFrame
might contain, especially when downsampling. Set to 5000. (Note: The
`downsample` parameter in `FromTimeRange` is currently ignored, meaning this
might not be strictly enforced by that specific function directly but could
be a target for other parts of the system or future enhancements.)
# Module: /go/dfbuilder
The `dfbuilder` module is responsible for constructing `DataFrame` objects.
`DataFrames` are fundamental data structures in Perf, representing a collection
of performance traces (time series data) along with their associated parameters
and commit information. This module acts as an intermediary between the raw
trace data stored in a `TraceStore` and the higher-level analysis and
visualization components that consume `DataFrames`.
The core design revolves around efficiently fetching and organizing trace data
based on various querying criteria. This involves interacting with a
`perfgit.Git` instance to resolve commit ranges and timestamps, and a
`tracestore.TraceStore` to retrieve the actual trace data.
**Key Responsibilities and Components:**
- **`dfbuilder.go`**: This is the central file implementing the
`DataFrameBuilder` interface.
- **`builder` struct**: This struct holds the necessary dependencies like
`perfgit.Git`, `tracestore.TraceStore`, `tracecache.TraceCache`, and
configuration parameters (e.g., `tileSize`, `numPreflightTiles`,
`QueryCommitChunkSize`). It also maintains metrics for various DataFrame
construction operations.
- **Construction (`NewDataFrameBuilderFromTraceStore`)**: Initializes a
`builder` instance. An important configuration here is
`filterParentTraces`. If enabled, the builder will attempt to remove
redundant parent traces when child traces (more specific traces) exist.
For example, if traces for `test=foo,subtest=bar` and `test=foo` both
exist, the latter might be filtered out if `filterParentTraces` is true.
- **Fetching by Time Range and Query (`NewFromQueryAndRange`)**:
- **Why**: This is a common use case where users want to see traces
matching a specific query (e.g., `config=8888`) within a given time
period.
    - **How**:
      1. It first uses `dataframe.FromTimeRange` (which internally queries
         `perfgit.Git`) to get a list of `ColumnHeader` (commit information)
         and `CommitNumber`s within the specified time range. It also handles
         downsampling if requested.
      2. It then determines the relevant tiles to query from the `TraceStore`
         based on the commit numbers (`sliceOfTileNumbersFromCommits`).
      3. The core data fetching happens in the `new` method. This method
         queries the `TraceStore` for matching traces _per tile_ concurrently
         using `errgroup.Group` for parallelism. This is a key optimization to
         speed up data retrieval, especially over large time ranges spanning
         multiple tiles. (A minimal sketch of this per-tile fan-out appears
         after the component list below.)
      4. A `tracesetbuilder.TraceSetBuilder` is used to efficiently aggregate
         the traces fetched from different tiles into a single
         `types.TraceSet` and `paramtools.ParamSet`.
      5. Finally, it constructs and returns a compressed `DataFrame`.

```
NewFromQueryAndRange
  | -> dataframe.FromTimeRange (get commits in time range from Git)
  | -> sliceOfTileNumbersFromCommits (determine tiles to query)
  | -> new (concurrently query TraceStore for each tile)
  |      | -> TraceStore.QueryTraces (for each tile)
  |      | -> tracesetbuilder.Add (aggregate results)
  | -> tracesetbuilder.Build
  | -> DataFrame.Compress
```
- **Fetching by Keys and Time Range (`NewFromKeysAndRange`)**:
- **Why**: Used when the specific trace keys are already known, and data
for these keys is needed within a time range.
- **How**: Similar to `NewFromQueryAndRange` in terms of getting commit
information for the time range. However, instead of querying by a
`query.Query` object, it directly calls `TraceStore.ReadTraces` for each
relevant tile, providing the list of trace keys. Results are then
aggregated. This is generally faster if the exact trace keys are known
as it avoids the overhead of query parsing and matching within the
`TraceStore`.
- **Fetching N Most Recent Data Points (`NewNFromQuery`,
`NewNFromKeys`)**:
- **Why**: Often, users are interested in the N most recent data points
for a query or a set of keys, typically for displaying recent trends or
for alert evaluation.
    - **How**: These methods work by iterating backward in time, tile by tile
      (or by `QueryCommitChunkSize` if configured), until `N` data points are
      collected for the matching traces.
      1. It starts from a given `end` time (or the latest commit if `end` is
         zero).
      2. It determines an initial `beginIndex` and `endIndex` for commit
         numbers. The `QueryCommitChunkSize` can influence this `beginIndex`
         to fetch a larger chunk of commits at once, potentially improving
         parallelism in the `new` method.
      3. In a loop:
         - It fetches commit headers and indices for the current
           `beginIndex`-`endIndex` range.
         - It calls the `new` method (for `NewNFromQuery`) or a similar
           tile-based fetching logic (for `NewNFromKeys`) to get a `DataFrame`
           for this smaller range.
         - It counts non-missing data points in the fetched `DataFrame`. If no
           data is found for `maxEmptyTiles` consecutive attempts, it stops to
           prevent searching indefinitely through sparse data.
         - It appends the data from the fetched `DataFrame` to the result
           `DataFrame`, working backward from the `N`th slot.
         - It then adjusts `beginIndex` and `endIndex` to move to the previous
           chunk of commits/tiles.
      4. If `filterParentTraces` is enabled, it calls `filterParentTraces` to
         remove redundant parent traces from the final `TraceSet`.
      5. The resulting `DataFrame` might have traces of length less than `N`
         if not enough data points were found. It trims the traces if
         necessary.

```
NewNFromQuery (or NewNFromKeys)
  | -> findIndexForTime (get commit number for 'end' time)
  | -> Loop (until N points are found or maxEmptyTiles reached):
  |      | -> fromIndexRange (get commits for current chunk)
  |      | -> new (or similar logic for keys) (fetch data for this chunk)
  |      | -> Aggregate data into result DataFrame
  |      | -> Update beginIndex/endIndex to previous chunk
  | -> [Optional] filterParentTraces
  | -> Trim traces if fewer than N points found
```
- **Preflighting Queries (`PreflightQuery`)**:
- **Why**: Before executing a potentially expensive query to fetch a full
`DataFrame`, it's useful to get an estimate of how many traces will
match and what the resulting `ParamSet` will look like. This allows UIs
to present filter options dynamically.
    - **How**:
      1. It fetches the latest tile number from the `TraceStore`.
      2. It queries the `numPreflightTiles` most recent tiles (concurrently)
         for trace IDs matching the query `q`. This uses `getTraceIds`, which
         first attempts to fetch from `tracecache` and falls back to
         `TraceStore.QueryTracesIDOnly`.
      3. The trace IDs (which are `paramtools.Params`) found are used to build
         up a `ParamSet`.
      4. The count of matching traces from the tile with the most matches is
         taken as the estimated count.
      5. Crucially, for parameter keys _present in the input query `q`_, it
         replaces the values in the computed `ParamSet` with _all_ values for
         those keys from the `referenceParamSet`. This ensures that the UI can
         still offer all possible filter options for parameters the user has
         already started filtering on.
      6. The resulting `ParamSet` is normalized.

```
PreflightQuery
  | -> TraceStore.GetLatestTile
  | -> Loop (for numPreflightTiles, concurrently):
  |      | -> getTraceIds(TileN, query)  // Checks tracecache first, then TraceStore.QueryTracesIDOnly
  |      |      | -> [If cache miss] TraceStore.QueryTracesIDOnly
  |      |      | -> [If cache miss & tracecache enabled] tracecache.CacheTraceIds
  |      | -> Aggregate Params into a new ParamSet -> Update max count
  | -> Update ParamSet with values from referenceParamSet for keys in the original query
  | -> Normalize ParamSet
```
- **Counting Matches (`NumMatches`)**:
- **Why**: A simpler version of `PreflightQuery` that only returns the
estimated number of matching traces.
- **How**: It queries the two most recent tiles using
`TraceStore.QueryTracesIDOnly` and returns the higher of the two counts.
- **Parent Trace Filtering (`filterParentTraces` function)**:
- **Why**: To reduce data redundancy and present a cleaner set of traces
to the user, especially in UIs where deeply nested subtests can create
many similar-looking parent traces.
- **How**: It uses `tracefilter.NewTraceFilter()`. For each trace key in
the input `TraceSet`:
1. The key is parsed into `paramtools.Params`.
2. A "path" is constructed from the parameter values based on a
predefined order of keys (e.g., "master", "bot", "benchmark",
"test", "subtest_1", ...).
3. This path and the original trace key are added to the `traceFilter`.
4. After processing all keys, `traceFilter.GetLeafNodeTraceKeys()`
returns only the keys corresponding to the most specific (leaf)
traces in the hierarchical structure implied by the paths.
5. A new `TraceSet` is built containing only these leaf node traces.
- **Caching (`getTraceIds`, `cacheTraceIdsIfNeeded`)**:
- **Why**: `QueryTracesIDOnly` can still be somewhat expensive if
performed frequently on the same tiles and queries (e.g., during
`PreflightQuery`). Caching the results (the list of matching trace
IDs/params) can significantly speed this up.
- **How**: The `getTraceIds` function first attempts to retrieve trace IDs
from the `tracecache.TraceCache`. If there's a cache miss or the cache
is not configured, it queries the `TraceStore`. If a database query was
performed and the cache is configured, `cacheTraceIdsIfNeeded` is called
to store the results in the cache for future requests. The cache key is
typically a combination of the tile number and the query string.
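The per-tile concurrency described for the `new` method can be sketched as
follows. The trace-store interface and the merge step are simplified
stand-ins; the real code aggregates results with `tracesetbuilder` and aligns
values to commit columns.

```go
// A minimal sketch of fanning out one TraceStore query per tile with errgroup.
package dfbuildersketch

import (
	"context"
	"sync"

	"golang.org/x/sync/errgroup"
)

type TraceSet map[string][]float32

type traceStore interface {
	// QueryTraces returns the traces in one tile that match the query.
	QueryTraces(ctx context.Context, tile int, query string) (TraceSet, error)
}

// queryTilesConcurrently issues one query per tile and merges the results.
func queryTilesConcurrently(ctx context.Context, store traceStore, tiles []int, query string) (TraceSet, error) {
	var mu sync.Mutex
	result := TraceSet{}
	g, ctx := errgroup.WithContext(ctx)
	for _, tile := range tiles {
		tile := tile // Capture the loop variable for the goroutine.
		g.Go(func() error {
			traces, err := store.QueryTraces(ctx, tile, query)
			if err != nil {
				return err
			}
			mu.Lock()
			defer mu.Unlock()
			for k, v := range traces {
				// Simplified aggregation; the real code uses tracesetbuilder.
				result[k] = append(result[k], v...)
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return result, nil
}
```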
**Design Choices and Trade-offs:**
- **Tile-Based Processing**: The `TraceStore` organizes data into tiles. Most
`dfbuilder` operations that involve fetching data across a range of commits
are designed to process these tiles concurrently. This improves performance
by parallelizing I/O and computation.
- **`tracesetbuilder`**: This utility is used to efficiently merge trace data
coming from different tiles (which might have different sets of commits)
into a coherent `TraceSet` and `ParamSet`.
- **`QueryCommitChunkSize`**: This parameter in `NewNFromQuery` allows
fetching data in larger chunks than a single tile. This can increase
parallelism in the underlying `new` method call, but fetching too large a
chunk might lead to excessive memory usage or longer latency for the first
chunk.
- **`maxEmptyTiles` / `newNMaxSearch`**: When searching backward for N data
points, these constants prevent indefinite searching if the data is very
sparse or the query matches very few traces.
- **`singleTileQueryTimeout`**: This guards against queries on individual
tiles taking too long, which could happen with "bad" tiles containing
excessive data or due to backend issues. This is particularly important for
operations like `NewNFromQuery` or `PreflightQuery` which might issue many
such single-tile queries.
- **Caching for `PreflightQuery`**: `PreflightQuery` is often called by UIs to
populate filter options. Caching the results of `QueryTracesIDOnly` (which
provides the raw data for `ParamSet` construction in preflight) via
`tracecache` helps make these UI interactions faster.
- **Parent Trace Filtering**: This is an opinionated feature that aims to
improve usability by default. The specific heuristic for identifying
"parent" vs. "child" traces is based on a predefined order of parameter
keys.
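A minimal sketch of the cache-first lookup pattern described for
`getTraceIds`, with hypothetical cache and store interfaces standing in for
`tracecache.TraceCache` and `TraceStore.QueryTracesIDOnly`:

```go
// A sketch of "check cache, fall back to store, then populate the cache".
package preflightsketch

import (
	"context"
	"fmt"
)

type Params map[string]string

type traceIDCache interface {
	GetTraceIds(ctx context.Context, tile int, query string) ([]Params, bool, error)
	CacheTraceIds(ctx context.Context, tile int, query string, ids []Params) error
}

type idOnlyStore interface {
	QueryTracesIDOnly(ctx context.Context, tile int, query string) ([]Params, error)
}

// getTraceIDs returns cached trace IDs when available and otherwise queries
// the store, caching the result for future preflight queries.
func getTraceIDs(ctx context.Context, cache traceIDCache, store idOnlyStore, tile int, query string) ([]Params, error) {
	if cache != nil {
		if ids, ok, err := cache.GetTraceIds(ctx, tile, query); err == nil && ok {
			return ids, nil
		}
	}
	ids, err := store.QueryTracesIDOnly(ctx, tile, query)
	if err != nil {
		return nil, fmt.Errorf("querying tile %d: %w", tile, err)
	}
	if cache != nil {
		// Cache errors are non-fatal; the IDs were still retrieved.
		_ = cache.CacheTraceIds(ctx, tile, query, ids)
	}
	return ids, nil
}
```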
The `dfbuilder_test.go` file provides comprehensive unit tests for these
functionalities, covering various scenarios including empty queries, queries
matching data in single or multiple tiles, N-point queries, and preflight
operations with and without caching. It uses `gittest` for creating a mock Git
history and `sqltest` (for Spanner) or mock implementations for the `TraceStore`
and `TraceCache`.
# Module: /go/dfiter
## `dfiter` Module Documentation
### Overview
The `dfiter` module is responsible for efficiently creating and providing
`dataframe.DataFrame` objects, which are fundamental data structures used in
regression detection within the Perf application. It acts as an iterator,
allowing consuming code to process DataFrames one by one. This is particularly
useful for performance reasons, as constructing and holding all possible
DataFrames in memory simultaneously could be resource-intensive.
The core purpose of `dfiter` is to abstract away the complexities of fetching
and structuring data from the underlying trace store and Git history. It ensures
that DataFrames are generated with the correct dimensions and data points based
on user-defined queries, commit ranges, and alert configurations.
### Design and Implementation Choices
The `dfiter` module employs a "slicing" strategy for generating DataFrames. This
means it typically fetches a larger, encompassing DataFrame from the
`dataframe.DataFrameBuilder` and then yields smaller, overlapping
sub-DataFrames.
**Why this approach?**
- **Efficiency:** Fetching a larger chunk of data once from the database (via
`DataFrameBuilder`) is often more efficient than making numerous small
queries. The slicing operation itself is a relatively cheap in-memory
operation.
- **Context for Regression Detection:** Regression detection algorithms often
need to look at data points before and after a specific commit (the "radius"
of an alert). The slicing approach naturally provides this sliding window of
context.
**Key Components and Responsibilities:**
- **`DataFrameIterator` Interface:**
- **Why:** Defines a standard contract for iterating over DataFrames. This
promotes loose coupling, allowing different implementations of DataFrame
generation if needed in the future, and simplifies how other parts of
the system consume DataFrames.
- **How:** It provides two methods:
- `Next() bool`: Advances the iterator to the next DataFrame. Returns
`true` if a next DataFrame is available, `false` otherwise.
- `Value(ctx context.Context) (*dataframe.DataFrame, error)`: Returns the
current DataFrame.
- **`dataframeSlicer` struct:**
- **Why:** This is the concrete implementation of `DataFrameIterator`. It
embodies the slicing strategy described above.
- **How:** It holds a reference to a larger, source `dataframe.DataFrame`
(`df`), the desired `size` of the sliced DataFrames (determined by
`alert.Radius`), and the current `offset` for slicing. The `Next()`
method checks if another slice of the specified `size` can be made, and
`Value()` performs the actual slicing using `df.Slice()`.
- **`NewDataFrameIterator` Function:**
- **Why:** This is the factory function for creating `DataFrameIterator`
instances. It encapsulates the logic for determining how the initial,
larger DataFrame should be fetched based on the input parameters.
- **How:**
* **Parameter Parsing:** Parses the input `queryAsString` into a
`query.Query` object.
    * **Mode Determination (Implicit):** The function behaves differently
      based on `domain.Offset`:
      - **`domain.Offset == 0` (Continuous/Sliding Window Mode):**
        - This mode is typically used for ongoing regression detection across
          a range of recent commits.
        - It fetches a DataFrame of `domain.N` commits ending at `domain.End`.
        - **Settling Time:** If `anomalyConfig.SettlingTime` is configured, it
          adjusts `domain.End` to exclude very recent data points that might
          not have "settled" (e.g., due to data ingestion delays or pending
          backfills). This prevents alerts on potentially incomplete or
          volatile fresh data.
        - The `dataframeSlicer` will then produce overlapping DataFrames of
          size `2*alert.Radius + 1`.
      - **`domain.Offset != 0` (Specific Commit/Exact DataFrame Mode):**
        - This mode is used when analyzing a specific commit or a small, fixed
          window around it (e.g., when a user clicks on a specific point in a
          chart to see its details or re-runs detection for a particular
          regression).
        - It aims to return a _single_ DataFrame.
        - The size of this DataFrame is `2*alert.Radius + 1`.
        - To determine the `End` time for fetching data, it calculates the
          commit `alert.Radius` positions _after_ the `domain.Offset`. This
          ensures the commit at `domain.Offset` is centered within the radius.
          For example, if `domain.Offset` is commit 21 and `alert.Radius` is
          3, it will fetch data up to commit 24 (`21 + 3`). The resulting
          DataFrame will then contain commits `[18, 19, 20, 21, 22, 23, 24]`.
          This is a specific requirement to ensure consistency with how
          different step detection algorithms expect their input DataFrames.
* **Data Fetching:** Uses the injected `dataframe.DataFrameBuilder`
(`dfBuilder`) to construct the initial DataFrame
(`dfBuilder.NewNFromQuery`). This involves querying the trace store and
potentially Git history.
* **Data Sufficiency Check:** Verifies if the fetched DataFrame contains
enough data points (at least `2*alert.Radius + 1` commits). If not, it
returns `ErrInsufficientData`. This is crucial because regression
detection algorithms require a minimum amount of data to operate
correctly.
* **Metrics:** Records the number of floating-point values queried from
the database using
`metrics2.GetCounter("perf_regression_detection_floats")`. This helps in
monitoring the data processing load.
* **Iterator Instantiation:** Creates and returns a `dataframeSlicer`
instance initialized with the fetched DataFrame and the calculated slice
size.
- **`ErrInsufficientData`:**
- **Why:** A specific error type to indicate that while the queries were
successful, the available data didn't meet the minimum requirements
(e.g., not enough commits within the requested range or matching the
query). This allows calling code to handle this scenario gracefully,
perhaps by informing the user or adjusting parameters.
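Based on the behavior described above, the slicing iterator can be sketched
roughly as follows; a simplified frame type stands in for
`dataframe.DataFrame`, and the real `Value` returns that type instead.

```go
// A minimal sketch of the dataframeSlicer: Next reports whether a full window
// remains, and Value returns the current window and advances the offset.
package dfitersketch

import "context"

type frame struct {
	Header []int64 // One entry per commit column.
}

// Slice returns a view of size columns starting at offset.
func (f *frame) Slice(offset, size int) *frame {
	return &frame{Header: f.Header[offset : offset+size]}
}

type dataframeSlicer struct {
	df     *frame
	size   int // 2*alert.Radius + 1 in the real implementation.
	offset int
}

func (d *dataframeSlicer) Next() bool {
	return d.offset+d.size <= len(d.df.Header)
}

func (d *dataframeSlicer) Value(_ context.Context) (*frame, error) {
	sub := d.df.Slice(d.offset, d.size)
	d.offset++
	return sub, nil
}
```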
### Key Workflows
**1. Continuous Regression Detection (Sliding Window):**
This typically happens when `domain.Offset` is 0.
```
[Caller] [NewDataFrameIterator] [DataFrameBuilder]
| -- Request with query, domain (N, End), alert (Radius) --> | |
| | -- Parse query |
| | -- (If anomalyConfig.SettlingTime > 0) Adjust domain.End --> |
| | -- dfBuilder.NewNFromQuery(ctx, domain.End, q, domain.N) --> |
| | | -- Query TraceStore
| | | -- Build large DataFrame
| | | <----- DataFrame (df)
| | -- Check if len(df.Header) >= 2*Radius+1 |
| | -- (If insufficient) Return ErrInsufficientData ----------- |
| | -- Create dataframeSlicer(df, size=2*Radius+1, offset=0) |
| <----------------- DataFrameIterator (slicer) ------------- | |
[Caller] [dataframeSlicer]
| |
| -- it.Next() ---------------------------------------------> |
| | -- return offset+size <= len(df.Header)
| <------------------------------ true ---------------------- |
| -- it.Value() --------------------------------------------> |
| | -- subDf = df.Slice(offset, size)
| | -- offset++
| <-------------------------- subDf, nil -------------------- |
| -- (Process subDf) |
| ... (loop Next()/Value() until Next() returns false) ... |
```
**2. Specific Commit Analysis (Exact DataFrame):**
This typically happens when `domain.Offset` is non-zero.
```
[Caller] [NewDataFrameIterator] [Git] [DataFrameBuilder]
| -- Request with query, domain (Offset), alert (Radius) --> | | |
| | -- Parse query | |
| | -- targetCommitNum = domain.Offset + alert.Radius | |
| | -- perfGit.CommitFromCommitNumber(targetCommitNum) ------> | |
| | | -- Lookup commit |
| | <----------------------------- commitDetails, nil --------- | |
| | -- dfBuilder.NewNFromQuery(ctx, commitDetails.Timestamp, | |
| | q, n=2*Radius+1) ------------> | |
| | | | -- Query TraceStore
| | | | -- Build DataFrame (size 2*R+1)
| | <-------------------------------------------------------- DataFrame (df) ----- |
| | -- Check if len(df.Header) >= 2*Radius+1 | |
| | -- (If insufficient) Return ErrInsufficientData --------- | |
| | -- Create dataframeSlicer(df, size=2*Radius+1, offset=0) | |
| <----------------------- DataFrameIterator (slicer) ------ | | |
[Caller] [dataframeSlicer]
| |
| -- it.Next() ---------------------------------------------> |
| | -- return offset+size <= len(df.Header) (true for the first call)
| <------------------------------ true ---------------------- |
| -- it.Value() --------------------------------------------> |
| | -- subDf = df.Slice(offset, size) (returns the whole df)
| | -- offset++
| <-------------------------- subDf, nil -------------------- |
| -- (Process subDf) |
| -- it.Next() ---------------------------------------------> |
| | -- return offset+size <= len(df.Header) (false for subsequent calls)
| <------------------------------ false --------------------- |
```
This design allows for flexible and efficient generation of DataFrames tailored
to the specific needs of regression detection, whether it's scanning a wide
range of recent commits or focusing on a particular point in time. The use of an
iterator pattern also helps manage memory consumption by processing DataFrames
sequentially.
# Module: /go/dryrun
## Dryrun Module Documentation
### Overview
The `dryrun` module provides the capability to test an alert configuration and
preview the regressions it would identify without actually creating an alert or
sending notifications. This is a crucial tool for developers and performance
engineers to fine-tune alert parameters and ensure they accurately capture
relevant performance changes.
The core idea is to simulate the regression detection process for a given alert
configuration over a historical range of data. This allows users to iterate on
alert definitions, observe the potential impact of those definitions, and avoid
alert fatigue caused by poorly configured alerts.
### Responsibilities and Key Components
The primary responsibility of the `dryrun` module is to handle HTTP requests for
initiating and reporting the progress of these alert simulations.
#### Key Files and Components:
- **`dryrun.go`**: This is the heart of the `dryrun` module. It defines the
`Requests` struct, which manages the state and dependencies required for
processing dry run requests. It also contains the HTTP handler
(`StartHandler`) that orchestrates the dry run process.
- **`Requests` struct**:
- **Why**: Encapsulates all necessary dependencies (like `perfgit.Git` for
Git interactions, `shortcut.Store` for shortcut lookups,
`dataframe.DataFrameBuilder` for data retrieval, `progress.Tracker` for
reporting progress, and `regression.ParamsetProvider` for accessing
parameter sets) into a single unit. This promotes modularity and makes
it easier to manage and test the dry run functionality.
- **How**: It is instantiated via the `New` function, which takes these
dependencies as arguments. This allows for dependency injection, making
the component more testable and flexible.
- **`StartHandler` function**:
- **Why**: This is the entry point for initiating a dry run. It handles
the incoming HTTP request, validates the alert configuration, and kicks
off the asynchronous regression detection process.
- **How**:
1. It decodes the alert configuration from the HTTP request body.
2. It performs initial validation on the alert query and other
parameters. If validation fails, an error is immediately reported to
the client.
3. It uses a `progress.Tracker` to allow clients to monitor the status
of the long-running dry run operation.
4. Crucially, it launches the actual regression detection in a separate
goroutine. This is essential because regression detection can be a
time-consuming process, and blocking the HTTP handler would lead to
timeouts and poor user experience.
5. It defines a `detectorResponseProcessor` callback function. This
function is invoked by the underlying
`regression.ProcessRegressions` function whenever potential
regressions are found.
- **Why (callback)**: This design decouples the core regression
detection logic from the specifics of how dry run results are
formatted and reported. It allows the `regression` module to
focus on detection, while the `dryrun` module handles the
presentation and progress updates for the dry run scenario.
- **How (callback)**: The callback processes the raw
`ClusterResponse` objects from the regression detection,
converts them into user-friendly `RegressionAtCommit` structures
(which include commit details and the detected regression), and
updates the `Progress` object with these results. This enables
real-time feedback to the user as regressions are identified.
6. The `regression.ProcessRegressions` function is then called in the
goroutine, passing the alert request, the callback, and other
necessary dependencies. This function iterates through the relevant
data, applies the alert's clustering and detection logic, and
invokes the callback for each identified cluster.
7. The handler immediately responds to the client with the initial
`Progress` object, allowing the client to start polling for updates.
- **`RegressionAtCommit` struct**:
- **Why**: Provides a structured way to represent a regression found at a
specific commit. This includes both the commit information (`CID`) and
the details of the regression itself (`Regression`).
- **How**: It's a simple struct used for marshalling the results into JSON
for the client.
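The asynchronous handler pattern described above can be sketched as follows;
the progress tracker, detection function, and their signatures are
hypothetical simplifications of `progress.Tracker` and
`regression.ProcessRegressions`.

```go
// A sketch of "validate, register progress, detect in a goroutine, stream
// partial results via a callback" as used by the dry run handler.
package dryrunsketch

import "context"

type progressTracker interface {
	Results(results []string) // Accumulates user-visible results.
	Finished()
	Error(msg string)
}

// detectFn stands in for the regression-detection entry point; it invokes
// onCluster for each potential regression it finds.
type detectFn func(ctx context.Context, alertQuery string, onCluster func(result string)) error

// startDryRun kicks off detection asynchronously and returns immediately so
// the HTTP handler can respond while the client polls for progress updates.
func startDryRun(ctx context.Context, alertQuery string, detect detectFn, progress progressTracker) {
	go func() {
		results := []string{}
		err := detect(ctx, alertQuery, func(result string) {
			results = append(results, result)
			progress.Results(results) // Stream partial results to the client.
		})
		if err != nil {
			progress.Error(err.Error())
			return
		}
		progress.Finished()
	}()
}
```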
### Workflows
#### Dry Run Initiation and Processing:
```
Client (UI/API) --HTTP POST /dryrun/start with AlertConfig--> Requests.StartHandler
|
V
[Validate AlertConfig]
|
+----------------------------------+----------------------------------+
| (Validation Fails) | (Validation Succeeds)
V V
[Update Progress with Error] [Add to Progress Tracker]
| |
V V
Respond to Client with Error Progress Launch Goroutine: regression.ProcessRegressions(...)
|
V
[Iterate through data, detect regressions]
|
V
For each potential regression cluster:
Invoke `detectorResponseProcessor` callback
|
V
Callback: [Convert ClusterResponse to RegressionAtCommit]
[Update Progress with new RegressionAtCommit]
|
V
(Client polls for Progress updates)
|
V
When ProcessRegressions completes:
[Update Progress: Finished or Error]
```
The `StartHandler` effectively acts as a controller that receives the request,
performs initial setup and validation, and then delegates the heavy lifting of
regression detection to the `regression.ProcessRegressions` function, ensuring
the HTTP request can return quickly while the background processing continues.
The callback mechanism allows the `dryrun` module to react to findings from the
`regression` module in a way that's specific to the dry run use case (i.e.,
accumulating and formatting results for client display).
# Module: /go/favorites
## Favorites Module
The `favorites` module provides functionality for users to save and manage
"favorite" configurations or views within the Perf application. This allows
users to quickly return to specific data explorations or commonly used settings.
The core design philosophy is to provide a persistent storage mechanism for
user-specific preferences related to application state (represented as URLs).
This is achieved through a `Store` interface, which abstracts the underlying
data storage, and a concrete SQL-based implementation.
### Key Components and Responsibilities
- **`store.go`**: This file defines the central `Store` interface.
- **Why**: The interface decouples the business logic of managing
favorites from the specific database implementation. This promotes
testability (using mocks) and allows for potential future changes to the
storage backend without impacting the core application logic.
- **How**: It specifies the fundamental CRUD (Create, Read, Update,
Delete) operations for favorites, along with a `List` operation to
retrieve all favorites for a specific user and a `Liveness` check.
- `Favorite`: This struct represents a single favorite item, containing
fields like `ID`, `UserId`, `Name`, `Url`, `Description`, and
`LastModified`. The `Url` is a key piece of data, as it allows the
application to reconstruct the state the user wants to save.
- `SaveRequest`: This struct is used for creating and updating favorites,
encapsulating the data needed for these operations, notably excluding
the `ID` (which is generated or already known) and `LastModified` (which
is handled by the store).
  - **Liveness**: The `Liveness` method is a bit of an outlier: it simply
    checks the health of the database connection. It was placed in this store
    somewhat arbitrarily; because the favorites store serves no essential
    function compared to the more critical stores, it is a relatively safe
    place to perform this check without impacting core performance data
    operations.
- **`sqlfavoritestore/sqlfavoritestore.go`**: This file provides the SQL
implementation of the `Store` interface.
- **Why**: CockroachDB (or a similar SQL database) is used as the
persistent storage for favorites. This choice provides a robust,
scalable, and transactional way to manage this data.
- **How**:
- It defines SQL statements for each operation in the `Store` interface.
These statements interact with a `Favorites` table.
- The `FavoriteStore` struct holds a database connection pool
(`pool.Pool`).
- Methods like `Get`, `Create`, `Update`, `Delete`, and `List` execute
their corresponding SQL statements against the database.
- Timestamps (`LastModified`) are handled automatically during create and
update operations to track when a favorite was last changed.
- Error handling is done using `skerr.Wrapf` to provide context to any
database errors.
- **`sqlfavoritestore/schema/schema.go`**: This file defines the SQL schema
for the `Favorites` table.
- **Why**: It provides a structured, Go-based representation of the
database table. This can be useful for schema management, migrations,
and ORM-like interactions (though a full ORM isn't used here).
- **How**: The `FavoriteSchema` struct uses struct tags (`sql:"..."`) to
define column names, types, constraints (like `PRIMARY KEY`, `NOT
NULL`), and indexes. The `byUserIdIndex` is crucial for efficiently
listing favorites for a specific user.
- **`mocks/Store.go`**: This file contains a generated mock implementation of
the `Store` interface.
- **Why**: Mocks are essential for unit testing components that depend on
the `Store` interface. They allow tests to simulate different store
behaviors (e.g., successful operations, errors) without requiring an
actual database connection.
- **How**: This file is auto-generated by the `mockery` tool. It provides
a `Store` struct that embeds `mock.Mock` from the `testify` library.
Each method of the interface has a corresponding mock function that can
be configured to return specific values or errors.
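A sketch of the `Store` interface and `Favorite` struct as described above;
the exact field types and method signatures are assumptions for illustration,
with `store.go` remaining the authoritative definition.

```go
// An illustrative shape for the favorites Store interface and its data types.
package favoritessketch

import (
	"context"
	"time"
)

type Favorite struct {
	ID           string
	UserId       string
	Name         string
	Url          string // The saved application state.
	Description  string
	LastModified time.Time // Assumed type; the store sets this on writes.
}

// SaveRequest carries the user-supplied fields for Create and Update; ID and
// LastModified are intentionally excluded.
type SaveRequest struct {
	UserId      string
	Name        string
	Url         string
	Description string
}

type Store interface {
	Get(ctx context.Context, id string) (*Favorite, error)
	Create(ctx context.Context, req *SaveRequest) error
	Update(ctx context.Context, req *SaveRequest, id string) error
	Delete(ctx context.Context, id string) error
	List(ctx context.Context, userId string) ([]*Favorite, error)
	// Liveness checks that the underlying database connection is healthy.
	Liveness(ctx context.Context) error
}
```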
### Key Workflows
**1. Creating a New Favorite:**
```
User Action (e.g., clicks "Save as Favorite" in UI)
|
V
Application Handler
|
V
[favorites.Store.Create] is called with user ID, name, URL, description
|
V
[sqlfavoritestore.FavoriteStore.Create]
|
V
Generates current timestamp for LastModified
|
V
Executes INSERT SQL statement:
INSERT INTO Favorites (user_id, name, url, description, last_modified) VALUES (...)
|
V
Database stores the new favorite record
|
V
Returns success/error to Application Handler
```
**2. Listing User's Favorites:**
```
User navigates to "My Favorites" page
|
V
Application Handler
|
V
[favorites.Store.List] is called with the current user's ID
|
V
[sqlfavoritestore.FavoriteStore.List]
|
V
Executes SELECT SQL statement:
SELECT id, name, url, description FROM Favorites WHERE user_id=$1
|
V
Database returns rows matching the user ID
|
V
[sqlfavoritestore.FavoriteStore.List] scans rows into []*favorites.Favorite
|
V
Returns list of favorites to Application Handler
|
V
UI displays the list
```
**3. Retrieving a Specific Favorite (e.g., when a user clicks on a favorite to
load it):**
```
User clicks on a specific favorite in their list
|
V
Application Handler (obtains favorite ID)
|
V
[favorites.Store.Get] is called with the favorite ID
|
V
[sqlfavoritestore.FavoriteStore.Get]
|
V
Executes SELECT SQL statement:
SELECT id, user_id, name, url, description, last_modified FROM Favorites WHERE id=$1
|
V
Database returns the single matching favorite row
|
V
[sqlfavoritestore.FavoriteStore.Get] scans row into a *favorites.Favorite struct
|
V
Returns the favorite object to Application Handler
|
V
Application uses the `Url` from the favorite object to restore the application state
```
# Module: /go/file
The `file` module and its submodules are responsible for providing a unified
interface for accessing files from different sources, such as local directories
or Google Cloud Storage (GCS). This abstraction allows the Perf ingestion system
to treat files consistently regardless of their origin.
## Core Concepts
The central idea is to define a `file.Source` interface that abstracts the
origin of files. Implementations of this interface are then responsible for
monitoring their respective sources (e.g., a GCS bucket via Pub/Sub
notifications, or a local directory) and emitting `file.File` objects through a
channel when new files become available.
The `file.File` struct encapsulates the essential information about a file: its
name, an `io.ReadCloser` for its contents, its creation timestamp, and
optionally, the associated `pubsub.Message` if the file originated from a GCS
Pub/Sub notification. This optional field is crucial for acknowledging the
message after successful processing, or nack'ing it if an error occurs, ensuring
reliable message handling in a distributed system.
### `file.go`
This file defines the core `File` struct and the `Source` interface.
- **`File` struct:** Represents a single file.
- `Name`: The identifier for the file (e.g., `gs://bucket/object` or a
local path).
- `Contents`: An `io.ReadCloser` to read the file's content. This design
allows for streaming file data, which is memory-efficient, especially
for large files. The consumer is responsible for closing this reader.
- `Created`: The timestamp when the file was created or last modified
(depending on the source).
- `PubSubMsg`: A pointer to a `pubsub.Message`. This is populated if the
file notification came from a Pub/Sub message (e.g., GCS object change
notifications). It's used to `Ack` or `Nack` the message, indicating
successful processing or a desire to retry/dead-letter.
- **`Source` interface:** Defines the contract for file sources.
- `Start(ctx context.Context) (<-chan File, error)`: This method initiates
the process of watching for new files. It returns a read-only channel
(`<-chan File`) through which `File` objects are sent as they are
discovered. The method is designed to be called only once per `Source`
instance. This design ensures that the resource setup and monitoring
logic (like starting a Pub/Sub subscription listener or initiating a
directory walk) is done once.
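Based on the description above, the core types look roughly like the following
sketch; treat field and method shapes as illustrative rather than the exact
definitions in `file.go`.

```go
// A sketch of the File struct and Source interface emitted by file sources.
package filesketch

import (
	"context"
	"io"
	"time"

	"cloud.google.com/go/pubsub"
)

// File represents a single file emitted by a Source.
type File struct {
	Name      string          // e.g. "gs://bucket/object" or a local path.
	Contents  io.ReadCloser   // The consumer must Close() this after reading.
	Created   time.Time       // Creation (or modification) time of the file.
	PubSubMsg *pubsub.Message // Non-nil when the file came via a GCS notification.
}

// Source emits Files as they become available. Start is called only once per
// Source instance.
type Source interface {
	Start(ctx context.Context) (<-chan File, error)
}
```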
## Implementations of `file.Source`
### `dirsource`
The `dirsource` submodule provides an implementation of `file.Source` that reads
files from a local filesystem directory.
- **Purpose:** Primarily intended for testing and demonstration purposes. It
allows developers to simulate file ingestion locally without needing to set
up GCS or Pub/Sub.
- **Mechanism:**
- `New(dir string)`: Constructs a `DirSource` for a given directory path.
It resolves the path to an absolute path.
- `Start(_ context.Context)`: When called, it initiates a `filepath.Walk`
over the specified directory.
- For each regular file encountered, it opens the file and creates a
`file.File` object.
- The `ModTime` of the file is used as the `Created` timestamp, which is a
known simplification for its intended use cases.
- The `file.File` objects are sent to an unbuffered channel.
- The channel is closed after the directory walk is complete.
- **Limitations:**
- It performs a one-time walk of the directory. It does not watch for new
files or changes to existing files after the initial walk.
- It uses the file's modification time as the creation time.
- **Workflow:**

```
New(directory) -> DirSource instance
  |
  V
DirSource.Start() --> Goroutine starts
  |
  V
filepath.Walk(directory)
  |
  +---------------------------+
  |                           |
  V                           V
For each file:              For each directory:
  os.Open(path)               (skip)
  Create file.File{Name, Contents, ModTime}
  Send file.File to channel
  |
  V
Caller receives file.File from channel
```
### `gcssource`
The `gcssource` submodule implements `file.Source` for files stored in Google
Cloud Storage, using Pub/Sub notifications for new file events.
- **Purpose:** This is the production-grade implementation for ingesting files
from GCS. It's designed to be robust and scalable.
- **Mechanism:**
- `New(ctx context.Context, instanceConfig *config.InstanceConfig)`:
- Initializes GCS and Pub/Sub clients using default application
credentials.
- Constructs a Pub/Sub subscription name. It can either use a
pre-configured `Subscription` name from `instanceConfig` or generate one
based on the `Topic` (often adding a suffix like `-prod` or using a
round-robin scheme for load distribution if multiple ingester instances
are running).
- Creates a `sub.Subscription` object to manage receiving messages from
the configured Pub/Sub topic/subscription. A key configuration here is
`ReceiveSettings.MaxExtension = -1`. This disables automatic ack
deadline extension by the Pub/Sub client library. The rationale is that
the `gcssource` itself will explicitly `Ack` or `Nack` messages. If
automatic extension were enabled and the processing of a file took
longer than the extension period, the message might be redelivered while
still being processed, leading to duplicate processing or other issues.
By disabling it, the ingester has full control over the message
lifecycle.
- Initializes a `filter.Filter` based on `AcceptIfNameMatches` and
`RejectIfNameMatches` regular expressions provided in the
`instanceConfig`. This allows for fine-grained control over which files
are processed based on their GCS object names.
- Determines if dead-lettering is enabled based on the instance
configuration.
- `Start(ctx context.Context)`:
- Creates the output channel for `file.File` objects.
- Launches a goroutine that continuously calls
`s.subscription.Receive(ctx, s.receiveSingleEventWrapper)`.
- The `Receive` method blocks until a message is available or the
context is cancelled.
- `receiveSingleEventWrapper` is called for each Pub/Sub message.
- **File Event Processing (`receiveSingleEventWrapper` and
`receiveSingleEvent`):**
1. **Deserialize Event:** The Pub/Sub message `Data` is expected to be a
JSON payload describing a GCS object event (specifically, `{"bucket":
"...", "name": "..."}`).
2. **Filename Construction:** A `gs://` URI is constructed from the bucket
and name.
3. **Filename Filtering:** The `filter.Filter` (configured with regexes) is
applied. If the filename is rejected, the message is acked (as there's
no point retrying), and processing stops for this event.
4. **Source Prefix Check:** The filename is checked against the `Sources`
list in `instanceConfig.IngestionConfig.SourceConfig.Sources`. These are
typically `gs://` prefixes. If the filename doesn't match any of these
prefixes, it's considered an unexpected file, the message is acked, and
processing stops. This ensures that the ingester only processes files
from explicitly configured GCS locations.
5. **Fetch GCS Object Attributes:** `obj.Attrs(ctx)` is called to get
metadata like the creation time. If this fails (e.g., object deleted
between notification and processing, or transient network error), the
message is nacked (if dead-lettering is not enabled) or handled by the
dead-letter policy, as retrying might succeed.
6. **Stream GCS Object Contents:** `obj.NewReader(ctx)` is called to get an
`io.ReadCloser` for the file's content. If this fails, the message is
nacked (or dead-lettered).
7. **Send `file.File`:** A `file.File` struct is created with the GCS path,
the reader, the `attrs.Created` time, and the original `pubsub.Message`.
This `file.File` is sent to the `fileChannel`.
8. **Message Acknowledgement:**
- The `receiveSingleEvent` function returns `true` if the initial
stages of processing (up to sending to the channel) were successful
and the message should be acked from Pub/Sub's perspective (meaning
it was valid, filtered appropriately, and the object was
accessible). It returns `false` for transient errors where a retry
might help (e.g., failing to get object attributes or reader).
- The `receiveSingleEventWrapper` then uses this boolean:
- If dead-lettering is enabled (`s.deadLetterEnabled`):
- If `receiveSingleEvent` returned `false` (transient error or
should retry), the message is `Nack()`-ed. This typically sends
it to a dead-letter topic if configured, or allows Pub/Sub to
redeliver it after a backoff.
- If `receiveSingleEvent` returned `true`, the message is _not_
explicitly `Ack()`-ed here. The acknowledgement is deferred to
the consumer of the `file.File` (i.e., the ingester). This is a
critical design choice: the message is only truly "done" when
the file content has been fully processed by the downstream
system.
- If dead-lettering is _not_ enabled:
- If `receiveSingleEvent` returned `true`, the message is
`Ack()`-ed.
- If `receiveSingleEvent` returned `false`, the message is
`Nack()`-ed.
- **Key Design Choices:**
- **Decoupling from Pub/Sub Ack/Nack:** The `gcssource` itself doesn't
always immediately `Ack` messages upon successful GCS interaction.
Instead, it passes the `*pubsub.Message` along in the `file.File`
struct. This allows the ultimate consumer of the file's content (e.g.,
the Perf ingestion pipeline) to `Ack` the message only after it has
successfully processed and stored the data. This provides end-to-end
processing guarantees. If processing fails downstream, the message can
be `Nack`-ed, leading to a retry or dead-lettering.
- **Filtering:** Multiple layers of filtering (regex-based `filter.Filter`
and prefix-based `SourceConfig.Sources`) ensure that only desired files
are processed.
- **Error Handling:** Distinguishes between errors that warrant an `Ack`
(e.g., file explicitly filtered out) and those that warrant a `Nack`
(e.g., transient GCS errors), especially when dead-letter queues are in
use.
- **Scalability:** Uses a configurable number of parallel receivers
(`maxParallelReceives`) for Pub/Sub messages, although currently set
to 1. This can be tuned for performance.
- **Workflow (Simplified):**

```
New(config) -> GCSSource instance (GCS/PubSub clients, filter initialized)
  |
  V
GCSSource.Start() --> Goroutine starts PubSub subscription.Receive loop
  |
  V
PubSub message arrives
  |
  V
receiveSingleEventWrapper(msg)
  |
  V
receiveSingleEvent(msg)
  |
  +-> Deserialize msg data (JSON: bucket, name) -> Error? Ack, return.
  |
  +-> Filter filename (regex) -> Rejected? Ack, return.
  |
  +-> Check if filename matches config.Sources prefixes -> No match? Ack, return.
  |
  +-> GCS: storageClient.Object(bucket, name).Attrs() -> Error? Nack (retryable), return.
  |
  +-> GCS: object.NewReader() -> Error? Nack (retryable), return.
  |
  V
Create file.File{Name, Contents, Created, PubSubMsg: msg}
Send file.File to fileChannel
  |
  V
Caller receives file.File from channel
(Caller later Acks/Nacks msg via file.File.PubSubMsg)
```
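The acknowledgement policy described above can be condensed into the following
sketch; `msgAcker` is a stand-in for the `Ack`/`Nack` methods on
`*pubsub.Message`, and `ackable` is the boolean returned by
`receiveSingleEvent`.

```go
// A sketch of the ack/nack decision in receiveSingleEventWrapper.
package gcssourcesketch

type msgAcker interface {
	Ack()
	Nack()
}

// handleResult applies the acknowledgement policy. When dead-lettering is
// enabled and the event was accepted, the message is neither Acked nor Nacked
// here: the downstream consumer of file.File acks it after processing.
func handleResult(msg msgAcker, ackable bool, deadLetterEnabled bool) {
	if deadLetterEnabled {
		if !ackable {
			msg.Nack() // Retry or route to the dead-letter topic.
		}
		// ackable: defer the Ack to the consumer of the file contents.
		return
	}
	if ackable {
		msg.Ack()
	} else {
		msg.Nack()
	}
}
```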
This modular approach to file sourcing makes the Perf ingestion system flexible
and easier to test and maintain. New file sources can be added by simply
implementing the `file.Source` interface.
# Module: /go/filestore
The `filestore` module provides an abstraction layer for interacting with
different file storage systems. It defines a common interface, leveraging Go's
`io/fs.FS`, allowing the application to read files regardless of whether they
are stored locally or in a cloud storage service like Google Cloud Storage
(GCS). This design promotes flexibility and testability by decoupling file
access logic from the specific storage implementation.
The primary goal is to enable Perf, the performance monitoring system, to
seamlessly access data files from various sources. Perf often deals with large
datasets and trace files, which might be stored in GCS for scalability and
durability or locally during development and testing. By using this module, Perf
components can be written to consume data using the standard `fs.FS` interface
without needing to know the underlying storage details.
Key components:
- **`local`**: This submodule provides an implementation of `fs.FS` for the
local file system.
- **Why**: It's essential for local development, testing, and scenarios
where data is directly available on the machine running Perf.
- **How**: The `local.New(rootDir string)` function initializes a
`filesystem` struct. This struct stores the absolute path to a `rootDir`
and uses `os.DirFS(rootPath)` to create an `fs.FS` instance scoped to
that directory. When `Open(name string)` is called, it calculates the
path relative to `rootDir` and then uses the underlying `os.DirFS` to
open the file. This ensures that file access is contained within the
specified root directory.
- The `local.go` file contains the `filesystem` struct and its methods.
The core logic resides in the `New` function for initialization and the
`Open` method for file access. `filepath.Abs` and `filepath.Rel` are
used to correctly handle and relativize paths.
- **`gcs`**: This submodule implements `fs.FS` for Google Cloud Storage.
- **Why**: GCS is a common choice for storing large amounts of data in a
scalable and accessible manner. Perf relies on GCS for storing trace
files and other performance artifacts.
- **How**: The `gcs.New(ctx context.Context)` function initializes a
`filesystem` struct. It authenticates with GCS using
`google.DefaultTokenSource` to obtain an OAuth2 token source and then
creates a `*storage.Client`. The `Open(name string)` method expects a
GCS URI (e.g., `gs://bucket-name/path/to/file`). It parses this URI into
a bucket name and object path using `parseNameIntoBucketAndPath`. Then,
it uses the `storage.Client` to get a `*storage.Reader` for the
specified object. This reader is wrapped in a custom `file` struct which
implements `fs.File`.
- The `gcs.go` file defines the `filesystem` struct, which holds the
`*storage.Client`, and the `file` struct, which wraps `*storage.Reader`.
The `New` function handles GCS client initialization and authentication.
The `Open` method is responsible for parsing GCS URIs and obtaining a
reader for the object. Notably, the `Stat()` method for `gcs.file` is
intentionally not implemented (returns `ErrNotImplemented`) because
Perf's current usage patterns do not require it, simplifying the
implementation. The `parseNameIntoBucketAndPath` helper function is
crucial for translating the GCS URI format into the bucket and object
path components required by the GCS client library.
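As an illustration of the URI handling, the following sketch splits a `gs://` name into its bucket and object path, which is roughly the job `parseNameIntoBucketAndPath` performs before calling the GCS client (the real helper's signature and error handling may differ):

```go
package example

import (
	"fmt"
	"strings"
)

// parseGCSURI splits "gs://bucket/path/to/object" into bucket and object path.
// Illustrative only; the real helper is parseNameIntoBucketAndPath in gcs.go.
func parseGCSURI(name string) (bucket, path string, err error) {
	const prefix = "gs://"
	if !strings.HasPrefix(name, prefix) {
		return "", "", fmt.Errorf("not a GCS URI: %q", name)
	}
	parts := strings.SplitN(strings.TrimPrefix(name, prefix), "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("malformed GCS URI: %q", name)
	}
	return parts[0], parts[1], nil
}
```

For example, `parseGCSURI("gs://my-bucket/data/some_trace.json")` yields `"my-bucket"` and `"data/some_trace.json"`.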
**Workflow: Opening a File (Conceptual)**
The client code (e.g., a component within Perf) would typically decide which
filestore implementation to use based on configuration or the nature of the file
path.
1. **Initialization**:
- For local files: `fsImpl, err := local.New("/path/to/data/root")`
- For GCS files: `fsImpl, err := gcs.New(context.Background())`
2. **File Access**:
   - The client calls `file, err := fsImpl.Open("relative/path/to/file.json")` (for local) or `file, err := fsImpl.Open("gs://my-bucket/data/some_trace.json")` (for GCS).
3. **Behind the Scenes**:
   - **Local**:

     ```
     local.Open("relative/path/to/file.json")
         |
         v
     Calculates absolute path based on rootDir
         |
         v
     Calls os.DirFS(rootDir).Open("relative/path/to/file.json")
         |
         v
     Returns fs.File (os.File)
     ```

   - **GCS**:

     ```
     gcs.Open("gs://my-bucket/data/some_trace.json")
         |
         v
     parseNameIntoBucketAndPath("gs://my-bucket/data/some_trace.json") --> "my-bucket", "data/some_trace.json"
         |
         v
     gcsClient.Bucket("my-bucket").Object("data/some_trace.json").NewReader()
         |
         v
     Wraps storage.Reader in gcs.file
         |
         v
     Returns fs.File (gcs.file)
     ```
4. **Reading Data**:
- The client can then use the returned `fs.File` (e.g.,
`file.Read(buffer)`) in a standard way, irrespective of whether it's an
`os.File` or a `gcs.file` wrapping a `storage.Reader`.
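Because both backends satisfy `fs.FS`, the reading code itself can stay backend-agnostic. A minimal helper, assuming only the standard `io/fs` interfaces, might look like this:

```go
package example

import (
	"io"
	"io/fs"
)

// readAll reads a file through any fs.FS implementation, whether it came from
// local.New(...) or gcs.New(...); the caller never touches os.File or
// storage.Reader directly.
func readAll(fsys fs.FS, name string) ([]byte, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}
```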
This abstraction allows Perf to be agnostic to the underlying storage mechanism
when reading files, simplifying its data processing pipelines.
# Module: /go/frontend
The `frontend` module serves as the backbone for the Perf web UI. It's
responsible for handling HTTP requests, rendering HTML templates, and
interacting with various backend services and data stores to provide a
comprehensive performance analysis platform.
The design philosophy emphasizes a separation of concerns. The core
`frontend.go` file initializes and wires together various components, while the
`api` subdirectory houses specific handlers for different categories of user
interactions (e.g., alerts, graphs, regressions). This modular approach
simplifies development, testing, and maintenance.
**Key Components and Responsibilities:**
- **`frontend.go`**:
- **Initialization (`New`, `initialize`)**: This is the entry point. It
sets up logging, metrics, reads configuration (`config.Config`),
initializes database connections (TraceStore, AlertStore,
RegressionStore, etc.), and establishes connections to external services
like Git and potentially Chrome Perf.
- **Template Handling (`loadTemplates`, `templateHandler`)**: It loads
HTML templates from the `dist` directory (produced by the build system).
These templates are Go templates, allowing for dynamic data injection.
Snippets for Google Analytics (`googleanalytics.html`) and cookie
consent (`cookieconsent.html`) are embedded and can be included in the
rendered pages.
- **Page Context (`getPageContext`)**: This crucial function generates a
JavaScript object (`window.perf`) that is embedded in every HTML page.
This object contains configuration values and settings that the
client-side JavaScript needs to function correctly, such as API URLs,
display preferences, and feature flags. This avoids hardcoding such
values in the JavaScript and allows for easier configuration.
- **Routing (`GetHandler`, `getFrontendApis`)**: It defines the HTTP
routes and associates them with their respective handler functions. This
is where the `chi` router is configured. It also instantiates and registers
all the API handlers from the `api` sub-module.
- **Authentication and Authorization (`loginProvider`,
`RoleEnforcedHandler`)**: It integrates with an authentication system
(e.g., `proxylogin`) to determine user identity and roles.
`RoleEnforcedHandler` is a middleware to protect certain endpoints based
on user roles.
- **Long-Running Task Management (`progressTracker`)**: For operations
that might take a significant amount of time (e.g., generating complex
data frames for graphs, running regression detection), it uses a
`progress.Tracker`. This allows the frontend to initiate a task, return
an ID to the client, and let the client poll for status and results,
preventing HTTP timeouts for long operations.
- _Workflow Example (Frame Request):_
1. Client POSTs to `/_/frame/start` with query details.
2. `frameStartHandler` creates a `progress` object, adds it to
`progressTracker`.
3. A goroutine is launched to process the frame request using
`frame.ProcessFrameRequest`.
4. `frameStartHandler` immediately returns the `progress` object's ID.
5. Client polls `/_/status/{id}`.
6. Client fetches results from `/_/frame/results/{id}` (managed by
`progressTracker`) once finished.
- **Redirections (`gotoHandler`, old URL handlers)**: Handles redirects
for old URLs to new ones and provides a `/g/` endpoint to navigate to
specific views based on a Git hash.
- **Liveness Probe (`liveness`)**: Provides a `/liveness` endpoint that
checks the health of critical dependencies (like the database
connection) for Kubernetes.
- **`api` (subdirectory)**: This directory contains the specific HTTP handlers
for various features of Perf. Each API is typically encapsulated in its own
file (e.g., `alertsApi.go`, `graphApi.go`) and implements the `FrontendApi`
interface, primarily its `RegisterHandlers` method. This design promotes
modularity.
- **`alertsApi.go`**: Manages CRUD operations for alert configurations
(`alerts.Alert`). It interacts with `alerts.ConfigProvider` (for
fetching configurations, potentially cached) and `alerts.Store` (for
persistence). It also handles trying out bug filing and notification
sending for alerts. Includes endpoints to list subscriptions and manage
dry-run requests for alert configurations.
- **`anomaliesApi.go`**: Provides endpoints for fetching anomaly data. It
has two modes of operation:
- **Legacy (Chromeperf-backed)**: Proxies requests to an external
Chromeperf instance for sheriff lists, anomaly lists, and group reports.
This was likely an initial integration or for instances that rely on
Chromeperf's anomaly detection. The test name cleaning logic
(`cleanTestName`) addresses potential incompatibilities in test naming
conventions or characters between systems.
- **Skia-internal**: Fetches sheriff (subscription) lists and associated
alerts directly from the instance's own database (`subscription.Store`,
`alerts.Store`). This allows Perf instances to manage their own anomaly
data.
- **`favoritesApi.go`**: Manages user-specific and instance-wide favorite
links. User favorites are stored in `favorites.Store`, while
instance-wide favorites can be defined in the main configuration file
(`config.Config.Favorites`). It provides endpoints to list, create,
delete, and update favorites.
- **`graphApi.go`**: Handles requests related to plotting graphs.
- **Frame Requests (`frameStartHandler`)**: As described above, this
initiates the potentially long process of fetching trace data and
constructing a `dataframe.DataFrame`. It uses
`dfbuilder.DataFrameBuilder` for this.
- **Commit Information (`cidHandler`, `cidRangeHandler`,
`shiftHandler`)**: Provides details about specific commits or ranges of
commits by interacting with `perfgit.Git`.
- **Trace Details (`detailsHandler`, `linksHandler`)**: Fetches raw data
or metadata for a specific trace point at a particular commit. This
involves reading from `tracestore.TraceStore` and potentially the
`ingestedFS` (filesystem where raw ingested data is stored) to get
information like associated benchmark links from the original JSON
files.
- **`pinpointApi.go`**: Facilitates interaction with the Pinpoint
bisection service. It allows users to create bisection jobs (to identify
the commit that caused a performance regression) or try jobs (to test a
patch). It can proxy requests to a legacy Pinpoint service or a newer
backend service.
- **`queryApi.go`**: Supports the query construction UI.
- **Parameter Set (`initpageHandler`, `getParamSet`)**: Provides the
initial set of queryable parameters (keys and their possible values) to
populate the UI. This uses `psrefresh.ParamSetRefresher` which
periodically updates this canonical paramset based on recent data,
ensuring the UI reflects available data.
- **Query Preflighting/Counting (`countHandler`,
`nextParamListHandler`)**: As the user builds a query in the UI, these
handlers can estimate the number of matching traces or provide the next
relevant parameter values based on the current partial query. This gives
users immediate feedback. The `nextParamListHandler` is tailored for UIs
where parameter selection is ordered (e.g., Chromeperf's UI).
- **`regressionsApi.go`**: Deals with detected regressions.
- **Listing/Counting Regressions (`regressionRangeHandler`,
`regressionCountHandler`, `alertsHandler`, `regressionsHandler`)**:
Fetches regression data from `regression.Store` based on time ranges,
alert configurations, or subscriptions. It can filter by user ownership
or category.
- **Triage (`triageHandler`)**: Allows users (editors) to mark regressions
as triaged (e.g., "positive", "negative", "ignored") and associate them
with bug reports. If a regression is marked as negative, it can generate
a bug report URL using a configurable template.
- **Manual Clustering (`clusterStartHandler`)**: Allows users to initiate
the regression detection process for a specific query or set of
parameters. This is also a long-running operation managed by
`progressTracker`.
- **Anomaly/Group Redirection (`anomalyHandler`,
`alertGroupQueryHandler`)**: Provides redirect URLs to the appropriate
graph view for a given anomaly ID or alert group ID from Chromeperf.
This involves generating graph shortcuts.
- **`sheriffConfigApi.go`**: Handles interactions related to LUCI Config
for sheriff configurations.
- **Metadata (`getMetadataHandler`)**: Provides metadata to LUCI Config,
indicating which configuration files (e.g., `skia-sheriff-configs.cfg`)
Perf owns and the URL for validating changes to these files. This is
part of an automated config management system.
- **Validation (`validateConfigHandler`)**: Receives configuration content
from LUCI Config and validates it (e.g., using
`sheriffconfig.ValidateContent`). Returns success or a structured error
message.
- **`shortcutsApi.go`**: Manages the creation and retrieval of shortcuts.
- **Key Shortcuts (`keysHandler`)**: Allows storing a set of trace keys
(queries) and getting a short ID for them. This is used, for example, by
the "Share" button on the explore page.
- **Graph Shortcuts (`getGraphsShortcutHandler`,
`createGraphsShortcutHandler`)**: Manages shortcuts for more complex
graph configurations, which can include multiple queries and formulas.
These are used for sharing multi-graph views.
- **`triageApi.go`**: Provides endpoints for triaging anomalies,
specifically those originating from or managed by Chromeperf. This
includes filing new bugs, associating anomalies with existing bugs, and
performing actions like ignoring or nudging anomalies. It interacts with
`chromeperf.ChromePerfClient` and potentially an
`issuetracker.IssueTracker` implementation.
- **`userIssueApi.go`**: Manages user-reported issues (Buganizer
annotations) associated with specific data points (a trace at a commit).
This allows users to link external bug reports directly to performance
data points in the UI. It uses `userissue.Store` for persistence.
The overall goal of the `frontend` module is to provide a responsive and
informative user interface by efficiently querying and presenting performance
data, while also enabling users to configure alerts, triage regressions, and
collaborate on performance analysis. The interaction with various stores and
services is abstracted to keep the request handling logic focused.
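The long-running request pattern used by handlers such as `frameStartHandler` can be summarized with a small, self-contained sketch. The handler paths, the `tracker` type, and the status strings below are illustrative stand-ins, not the actual `progress.Tracker` API:

```go
package example

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// tracker is a toy stand-in for progress.Tracker: it stores the status of
// long-running tasks keyed by an ID the client polls with.
type tracker struct {
	mu     sync.Mutex
	nextID int
	status map[string]string
}

func newTracker() *tracker { return &tracker{status: map[string]string{}} }

func (t *tracker) start(run func(id string)) string {
	t.mu.Lock()
	t.nextID++
	id := fmt.Sprintf("task-%d", t.nextID)
	t.status[id] = "Running"
	t.mu.Unlock()
	go run(id) // The actual work happens off the request goroutine.
	return id
}

func (t *tracker) set(id, s string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.status[id] = s
}

func (t *tracker) get(id string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.status[id]
}

// startHandler kicks off the work and immediately returns an ID, so the HTTP
// request never waits for the (potentially slow) computation.
func startHandler(t *tracker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := t.start(func(id string) {
			// ... long-running work: build the DataFrame, run clustering, etc. ...
			t.set(id, "Finished")
		})
		_ = json.NewEncoder(w).Encode(map[string]string{"id": id})
	}
}

// statusHandler is what the client polls until the task reports "Finished".
func statusHandler(t *tracker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := r.URL.Query().Get("id")
		_ = json.NewEncoder(w).Encode(map[string]string{"id": id, "status": t.get(id)})
	}
}
```

The key point is that the HTTP handler returns immediately with an ID, and the expensive work proceeds on its own goroutine while the client polls for status.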
# Module: /go/git
The `go/git` module provides an abstraction layer for interacting with Git
repositories. It is designed to efficiently retrieve and cache commit
information, which is essential for performance analysis in Skia Perf. The
primary goal is to offer a consistent interface for accessing commit data,
regardless of whether the underlying data source is a local Git checkout or a
remote Gitiles API.
**Design Decisions and Implementation Choices:**
- **Database Caching:** To avoid repeated and potentially slow Git operations,
commit information is cached in an SQL database. This allows for quick
lookups of commit details, commit numbers, and commit ranges. The schema for
this database is defined in `/go/git/schema/schema.go`.
- **Provider Abstraction:** The module utilizes a `provider.Provider`
interface (defined in `/go/git/provider/provider.go`). This allows for
different implementations of how Git data is fetched. Currently, two
providers are implemented:
- `git_checkout`: Interacts with a local Git repository by shelling out to
`git` commands. This is suitable for environments where a local checkout
is available and preferred.
- `gitiles`: Uses the Gitiles API to fetch commit data. This is useful
when direct repository access is not feasible or when leveraging
Google's infrastructure for Git operations. The choice of provider is
determined by the instance configuration, as seen in
`/go/git/providers/builder.go`.
- **Commit Numbering:**
- **Sequential:** By default, the system assigns sequential integer
`CommitNumber`s to commits as they are ingested. This provides a simple,
ordered way to refer to commits.
- **Repo-Supplied:** The system can also be configured to extract commit
numbers directly from commit messages using a regular expression
(specified in `instanceConfig.GitRepoConfig.CommitNumberRegex`). This is
useful for repositories like Chromium that embed a commit position in
their commit messages (see the sketch after this list). The
`repoSuppliedCommitNumber` flag in `impl.go` controls this behavior.
- **LRU Cache:** In addition to the database cache, an in-memory LRU (Least
Recently Used) cache (`cache` in `impl.go`) is used for frequently accessed
commit details (`CommitFromCommitNumber`). This further speeds up lookups
for commonly requested commits. The size of this cache is defined by
`commitCacheSize`.
- **Background Polling:** The `StartBackgroundPolling` method in `impl.go`
initiates a goroutine that periodically calls the `Update` method. This
ensures that the local database cache stays synchronized with the remote
repository.
- **SQL Statements:** All SQL queries are predefined as constants in
`impl.go`. This helps in organizing and managing the queries. Separate
statements are defined for different SQL dialects if needed (e.g., `insert`
vs `insertSpanner`).
- **Error Handling:** The `BadCommit` constant provides a sentinel value for
functions returning `provider.Commit` to indicate an error or an invalid
commit.
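As a minimal illustration of the repo-supplied commit numbering described above, the sketch below extracts a number from a Chromium-style `Cr-Commit-Position` footer. The actual pattern is instance-configured via `GitRepoConfig.CommitNumberRegex`, so both the regex and the error handling here are assumptions:

```go
package example

import (
	"regexp"
	"strconv"
)

// Illustrative only: a Chromium-style footer line embedding a commit position.
// The real pattern comes from instanceConfig.GitRepoConfig.CommitNumberRegex
// and may look quite different.
var commitNumberRegex = regexp.MustCompile(`Cr-Commit-Position: refs/heads/\w+@\{#(\d+)\}`)

// commitNumberFromBody extracts the repo-supplied commit number from a commit
// message body, or returns ok=false when the footer is absent.
func commitNumberFromBody(body string) (int, bool) {
	m := commitNumberRegex.FindStringSubmatch(body)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}
```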
**Key Responsibilities and Components:**
- **`interface.go` (Git Interface):**
- Defines the `Git` interface, which is the public contract for this
module. It specifies all the operations that can be performed to
retrieve commit information.
- This interface decouples the consumers of Git data from the specific
implementation details (e.g., whether data comes from a local repo or
Gitiles).
- **`impl.go` (Git Implementation):**
- Contains the `Impl` struct, which is the primary implementation of the
`Git` interface.
- **Data Synchronization (`Update` method):** This is a crucial method
responsible for fetching new commits from the configured
`provider.Provider` and storing them in the SQL database. It determines
the last known commit and fetches all subsequent commits.
- If `repoSuppliedCommitNumber` is true, it parses the commit number from
the commit body using `commitNumberRegex`.
- It handles potential race conditions where multiple services might try
to update simultaneously by checking if a commit already exists before
insertion.
- **Commit Retrieval Methods:** Implements various methods for fetching
commit data, such as:
- `CommitNumberFromGitHash`: Retrieves the sequential `CommitNumber` for a
given Git hash.
- `CommitFromCommitNumber`: Retrieves the full `provider.Commit` details
for a given `CommitNumber`. Uses the LRU cache.
- `CommitNumberFromTime`: Finds the `CommitNumber` closest to (but not
after) a given timestamp.
- `CommitSliceFromTimeRange`, `CommitSliceFromCommitNumberRange`: Fetches
slices of commits based on time or commit number ranges.
- `GitHashFromCommitNumber`: Retrieves the Git hash for a given
`CommitNumber`.
- `PreviousGitHashFromCommitNumber`,
`PreviousCommitNumberFromCommitNumber`: Finds the Git hash or commit
number of the commit immediately preceding a given commit number.
- `CommitNumbersWhenFileChangesInCommitNumberRange`: Identifies commit
numbers within a range where a specific file was modified. This involves
converting commit numbers to hashes and then querying the
`provider.Provider`.
- **URL Generation (`urlFromParts`):** Constructs a URL to view a specific
commit, respecting configurations like `DebouceCommitURL` or custom
`CommitURL` formats.
- **Metrics:** Collects various metrics (e.g., `updateCalled`,
`commitNumberFromGitHashCalled`) to monitor the usage and performance of
different operations.
- **`provider/provider.go` (Provider Interface and Commit Struct):**
- Defines the `provider.Provider` interface, which abstracts the source of
Git commit data. Implementations of this interface (like `git_checkout`
and `gitiles`) handle the actual fetching of data.
- Defines the `provider.Commit` struct, which is the standard
representation of a commit used throughout the `go/git` module and its
providers. It includes fields like `GitHash`, `Timestamp`, `Author`,
`Subject`, and `Body`. The `Body` is particularly important when
`repoSuppliedCommitNumber` is true, as it's parsed to extract the commit
number.
- **`providers/builder.go` (Provider Factory):**
- Contains the `New` function, which acts as a factory for creating
`provider.Provider` instances based on the
`instanceConfig.GitRepoConfig.Provider` setting. This allows the system
to dynamically choose between `git_checkout` or `gitiles` (or
potentially other future providers).
- **`providers/git_checkout/git_checkout.go` (CLI Git Provider):**
- Implements `provider.Provider` by executing `git` command-line
operations.
- Handles cloning the repository if it doesn't exist.
- Manages Git authentication (e.g., via Gerrit) if configured.
- `CommitsFromMostRecentGitHashToHead`: Uses `git rev-list` to get commit
information.
- `GitHashesInRangeForFile`: Uses `git log` to find changes to a specific
file.
- `parseGitRevLogStream`: A helper function to parse the output of `git
rev-list --pretty`.
- **`providers/gitiles/gitiles.go` (Gitiles Provider):**
- Implements `provider.Provider` by interacting with a Gitiles API
endpoint.
- `CommitsFromMostRecentGitHashToHead`: Uses `gr.LogFnBatch` to fetch
commits in batches. It handles logic for main branches versus other
branches and respects the `startCommit`.
- `GitHashesInRangeForFile`: Uses `gr.Log` with appropriate path
filtering.
- `Update` is a no-op for Gitiles as the API always provides the latest
data.
- **`schema/schema.go` (Database Schema):**
- Defines the `Commit` struct with SQL annotations, representing the
structure of the `Commits` table in the database. This table stores the
cached commit information.
- **`gittest/gittest.go` (Test Utilities):**
- Provides helper functions (`NewForTest`) for setting up test
environments. This includes creating a temporary Git repository,
populating it with commits, and initializing a test database. This is
crucial for writing reliable unit and integration tests for the `go/git`
module and its components.
- **`mocks/Git.go` (Mock Implementation):**
- Provides a mock implementation of the `Git` interface, generated by
`mockery`. This is used in tests of other modules that depend on
`go/git`, allowing them to isolate their tests from actual Git
operations or database interactions.
**Key Workflows:**
1. **Initial Population / Update:**
```
Application -> Impl.Update()
|
'-> Provider.Update() (e.g., git pull for git_checkout)
|
'-> Impl.getMostRecentCommit() (from local DB)
|
'-> Provider.CommitsFromMostRecentGitHashToHead(mostRecentDBHash, ...)
|
'-> (For each new commit from Provider)
|
'-> [If repoSuppliedCommitNumber] Impl.getCommitNumberFromCommit(commit.Body)
|
'-> Impl.CommitNumberFromGitHash(commit.GitHash) (Check if already exists)
|
'-> DB.Exec(INSERT INTO Commits ...)
```
2. **Fetching Commit Details by CommitNumber:**
```
Application -> Impl.CommitFromCommitNumber(commitNum)
|
'-> Check LRU Cache (cache.Get(commitNum))
| |
| '-> [If found] Return cached provider.Commit
|
'-> [If not in LRU] DB.QueryRow(SELECT ... FROM Commits WHERE commit_number=$1)
|
'-> Construct provider.Commit
|
'-> Add to LRU Cache (cache.Add(commitNum, commit))
|
'-> Return provider.Commit
```
3. **Finding Commits Where a File Changed:**
   ```
   Application -> Impl.CommitNumbersWhenFileChangesInCommitNumberRange(beginNum, endNum, file)
                     |
                     '-> Impl.PreviousGitHashFromCommitNumber(beginNum) -> beginHash
                     |     (or Impl.GitHashFromCommitNumber if beginNum is 0 and the start commit is used)
                     |
                     '-> Impl.GitHashFromCommitNumber(endNum) -> endHash
                     |
                     '-> Provider.GitHashesInRangeForFile(beginHash, endHash, file) -> changedGitHashes[]
                     |
                     '-> (For each changedGitHash)
                     |      |
                     |      '-> Impl.CommitNumberFromGitHash(changedGitHash) -> commitNum
                     |      |
                     |      '-> Add commitNum to result list
                     |
                     '-> Return result list
   ```
This structure allows Perf to efficiently query and manage Git commit
information, supporting its core functionality of tracking performance data
across different versions of the codebase.
# Module: /go/graphsshortcut
The `graphsshortcut` module provides a mechanism for storing and retrieving
shortcuts for graph configurations in Perf. Users often define complex sets of
graphs for analysis. Instead of redefining these configurations each time or
relying on cumbersome URL sharing, this module allows users to save a collection
of graph configurations and access them via a unique, shorter identifier. This
significantly improves usability and sharing of common graph views.
The core idea is to represent a set of graphs, each with its own configuration
(queries, formulas, keys), as a `GraphsShortcut` object. This object can then be
persisted and retrieved using a `Store` interface. A key design decision is the
generation of a unique ID for each `GraphsShortcut`. This ID is a hash (MD5) of
the content of the shortcut, ensuring that identical graph configurations will
always have the same ID. This also provides a form of de-duplication. To ensure
consistent ID generation, the queries and formulas within each graph
configuration are sorted alphabetically before hashing. However, the order of
the `GraphConfig` objects within a `GraphsShortcut` _does_ affect the generated
ID.
```
User defines graph configurations --> [GraphsShortcut object] -- InsertShortcut --> [Store] --> Generates ID (MD5 hash) --> Persists (ID, Shortcut)
^
|
User provides ID -------------------> [Store] -- GetShortcut --------+------> [GraphsShortcut object] --> Display Graphs
```
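The ID scheme can be illustrated with a self-contained sketch. The types below merely mirror the shapes described in this module, and the exact serialization hashed by the real `GetID()` may differ; the point is that per-graph queries and formulas are sorted before hashing while the order of the graphs themselves is preserved:

```go
package example

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
)

// GraphConfig and GraphsShortcut mirror the structures described in
// graphsshortcut.go for illustration only.
type GraphConfig struct {
	Queries  []string
	Formulas []string
	Keys     string
}

type GraphsShortcut struct {
	Graphs []GraphConfig
}

// id computes a content-derived identifier in the spirit of GetID(): queries
// and formulas are sorted inside each GraphConfig so their internal order does
// not change the hash, while the order of the GraphConfigs themselves does.
func (g GraphsShortcut) id() string {
	h := md5.New()
	for _, gc := range g.Graphs {
		q := append([]string{}, gc.Queries...)
		f := append([]string{}, gc.Formulas...)
		sort.Strings(q)
		sort.Strings(f)
		fmt.Fprintf(h, "%v|%v|%s;", q, f, gc.Keys)
	}
	return hex.EncodeToString(h.Sum(nil))
}
```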
### Key Components:
- **`graphsshortcut.go`**: This file defines the central data structures and
the `Store` interface.
- `GraphConfig`: Represents the configuration for a single graph. It
contains:
- `Queries`: A slice of strings, where each string represents a query used
to fetch data for the graph.
- `Formulas`: A slice of strings, representing any formulas applied to the
data.
- `Keys`: A string, likely representing a pre-selected set of traces or
keys to focus on.
- `GraphsShortcut`: This is the primary object that is stored and
retrieved. It's essentially a list of `GraphConfig` objects.
- `GetID()`: A method on `GraphsShortcut` that calculates a unique MD5
hash based on its content. This method is crucial for identifying and
de-duplicating shortcuts. It sorts queries and formulas within each
`GraphConfig` before hashing to ensure that the order of these internal
elements doesn't change the ID.
- `Store`: An interface defining the contract for persisting and
retrieving `GraphsShortcut` objects. It has two methods:
- `InsertShortcut`: Takes a `GraphsShortcut` and stores it, returning its
generated ID.
- `GetShortcut`: Takes an ID and returns the corresponding
`GraphsShortcut`.
- **`graphsshortcutstore/`**: This subdirectory contains implementations of
the `graphsshortcut.Store` interface.
- **`graphsshortcutstore.go` (`GraphsShortcutStore`)**: This provides an
SQL-backed implementation of the `Store`.
- **Why SQL?**: SQL databases offer robust, persistent storage suitable
for production environments where data integrity and concurrent access
are important.
- **How it works**:
- It uses a connection pool (`sql.Pool`) to manage database
connections.
- `InsertShortcut`: Marshals the `GraphsShortcut` object into JSON and
stores it as a string in the `GraphsShortcuts` table along with its
pre-computed ID. It uses `ON CONFLICT (id) DO NOTHING` to avoid
errors if the same shortcut (and thus same ID) is inserted multiple
times.
- `GetShortcut`: Retrieves the JSON string from the database based on
the ID and unmarshals it back into a `GraphsShortcut` object.
- **`cachegraphsshortcutstore.go` (`cacheGraphsShortcutStore`)**: This
provides an in-memory cache-backed implementation of the `Store`.
- **Why a cache implementation?**: This is primarily useful for local
development or testing scenarios, especially when connecting to a
production database. It allows developers to use features that rely on
graph shortcuts (like multigraph) without needing write access (or
breakglass permissions) to the production database. The shortcuts are
stored locally and ephemerally.
- **How it works**:
- It utilizes a generic `cache.Cache` client.
- `InsertShortcut`: Marshals the `GraphsShortcut` to JSON and stores
it in the cache using the shortcut's ID as the cache key.
- `GetShortcut`: Retrieves the JSON string from the cache by ID and
unmarshals it.
- **`schema/schema.go`**: Defines the SQL table schema for
`GraphsShortcuts`. The table primarily stores the `id` (TEXT, PRIMARY
KEY) and the `graphs` (TEXT, storing the JSON representation of the
`GraphsShortcut`).
- **`graphsshortcuttest/graphsshortcuttest.go`**: This file provides a suite
of common tests that can be run against any implementation of the
`graphsshortcut.Store` interface.
- **Why shared tests?**: This promotes consistency and ensures that all
store implementations adhere to the same contract. It makes it easier to
add new store implementations and verify their correctness.
- **Key Tests**:
- `InsertGet`: Verifies that a shortcut can be inserted and then
retrieved, and that the retrieved shortcut is identical to the original
(accounting for sorted queries/formulas).
- `GetNonExistent`: Ensures that attempting to retrieve a shortcut with an
unknown ID results in an error.
- **`mocks/Store.go`**: This file contains a mock implementation of the
`graphsshortcut.Store` interface, generated by the `testify/mock` library.
- **Why mocks?**: Mocks are essential for unit testing components that
depend on the `Store` interface without needing a real database or
cache. They allow for controlled testing of different scenarios, such as
simulating errors from the store.
In summary, the `graphsshortcut` module provides a flexible way to save and
share complex graph views by defining a clear data structure (`GraphsShortcut`),
a standardized way to identify them (`GetID`), and an interface (`Store`) for
various persistence mechanisms, with current implementations for SQL databases
and in-memory caches.
# Module: /go/ingest
The `/go/ingest` module is responsible for the entire process of taking
performance data files, parsing them, and storing the data into a trace store.
This involves identifying the format of the input file, extracting relevant
measurements and metadata, associating them with specific commits, and then
writing this information to the configured data storage backend.
A key design principle is to support multiple ingestion file formats and to be
resilient to errors in individual files. The system attempts to parse files in a
specific order, falling back to legacy formats if the primary parsing fails.
This allows for graceful upgrades of the ingestion format over time without
breaking existing data producers.
The ingestion process also handles trybot data, extracting issue and patchset
information, which is crucial for pre-submit performance analysis.
## Key Components and Files
### `/go/ingest/filter/filter.go`
This component provides a mechanism to selectively process or ignore input files
based on their names using regular expressions.
**Why:** In many scenarios, not all files in a data source are relevant for
performance analysis. For example, temporary files, logs, or files matching
specific patterns might need to be excluded. This filter allows for fine-grained
control over which files are ingested.
**How:**
- It uses two regular expressions: `accept` and `reject`.
- An `accept` regex, if provided, means only filenames matching this regex
will be considered for processing. If empty, all files are initially
accepted.
- A `reject` regex, if provided, means any filename matching this regex will
be ignored, even if it matched the `accept` regex. If empty, no files are
rejected based on this rule.
- The `Reject(name string) bool` method implements this logic: a file is
rejected if it _doesn't_ match the `accept` regex (if one is provided) OR if
it _does_ match the `reject` regex (if one is provided).
**Workflow:**
```
File Name -> Filter.Reject()
|
+-- accept_regex_exists? -- Yes -> name_matches_accept? -- No -> REJECT
| |
| +-------------------------- Yes --+
+----------------------------- No -----------------------------+
|
V
reject_regex_exists? -- Yes -> name_matches_reject? -- Yes -> REJECT
| |
| +-- No --+
+----------------------------- No -----+
|
V
ACCEPT
```
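A minimal sketch of this accept/reject logic, using only the standard `regexp` package (the real `filter.Filter` constructor and field names may differ):

```go
package example

import "regexp"

// filter mirrors the accept/reject behavior described above: an empty accept
// pattern accepts everything, an empty reject pattern rejects nothing.
type filter struct {
	accept *regexp.Regexp // nil means "accept all".
	reject *regexp.Regexp // nil means "reject none".
}

func newFilter(accept, reject string) (*filter, error) {
	f := &filter{}
	var err error
	if accept != "" {
		if f.accept, err = regexp.Compile(accept); err != nil {
			return nil, err
		}
	}
	if reject != "" {
		if f.reject, err = regexp.Compile(reject); err != nil {
			return nil, err
		}
	}
	return f, nil
}

// Reject reports whether the file name should be skipped.
func (f *filter) Reject(name string) bool {
	if f.accept != nil && !f.accept.MatchString(name) {
		return true
	}
	if f.reject != nil && f.reject.MatchString(name) {
		return true
	}
	return false
}
```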
### `/go/ingest/format/format.go` and `/go/ingest/format/legacyformat.go`
These files define the structure of the data files that the ingestion system can
understand. `format.go` defines the current standard format (Version 1), while
`legacyformat.go` defines an older format primarily used by nanobench.
**Why:** A well-defined input format is essential for reliable data ingestion.
Versioning allows the format to evolve while maintaining backward compatibility
or clear error handling for older, unsupported versions. The current format
(`Format` struct) is designed to be flexible, allowing for common metadata (like
git hash, issue/patchset), global key-value pairs applicable to all results, and
a list of individual results. Each result can have its own set of keys and
either a single measurement or a map of "sub-measurements" (e.g., min, max,
median for a single test). This structure allows for rich and varied performance
data to be represented. The legacy format (`BenchData`) exists to support older
systems that still produce data in that schema.
**How:**
- **`format.go` (Version 1):**
- `Format` struct: The top-level structure. Contains `Version`, `GitHash`,
optional trybot info (`Issue`, `Patchset`), a global `Key` map, a slice
of `Result` structs, and global `Links`.
- `Result` struct: Represents one or more measurements. It has its own
`Key` map (which gets merged with the global `Key`), and critically,
either a single `Measurement` (float32) or a `Measurements` map.
- `SingleMeasurement` struct: Used within `Measurements` map. It allows
associating a `value` (e.g., "min", "median") with a `Measurement`
(float32) and optional `Links`. This is how multiple metrics for a
single conceptual test run are represented.
- `Parse(r io.Reader)`: Decodes JSON data from a reader into a `Format`
struct. It specifically checks `fileFormat.Version ==
FileFormatVersion`.
- `Validate(r io.Reader)`: Uses a JSON schema (`formatSchema.json`) to
validate the structure of the input data. This ensures that incoming
files adhere to the expected contract, preventing malformed data from
causing issues downstream.
- `GetLinksForMeasurement(traceID string)`: Retrieves links associated
with a specific measurement, combining global links with
measurement-specific ones.
- **`legacyformat.go`:**
- `BenchData` struct: Defines the older nanobench format. It has fields
like `Hash`, `Issue`, `PatchSet`, `Key`, `Options`, and `Results`. The
`Results` are nested maps leading to `BenchResult`.
- `BenchResult`: A map representing individual test results, typically
`map[string]interface{}` where values are float64s, except for an
"options" key.
- `ParseLegacyFormat(r io.Reader)`: Decodes JSON data into a `BenchData`
struct.
The system will first attempt to parse an input file using `format.Parse`. If
that fails (e.g., due to a version mismatch or JSON parsing error), it may then
attempt to parse it using `format.ParseLegacyFormat` as a fallback.
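To make the Version 1 shape concrete, the sketch below declares local mirror types and builds an example file containing both a single-measurement result and a multi-metric result. Field names and types here are illustrative; the authoritative definitions are in `format.go` and `formatSchema.json`:

```go
package example

// These types mirror the shapes described above purely for illustration.

type SingleMeasurement struct {
	Value       string // e.g. "min", "median".
	Measurement float32
}

type Result struct {
	Key          map[string]string              // Merged with the file-level Key.
	Measurement  float32                        // Used when there is one value...
	Measurements map[string][]SingleMeasurement // ...or a map of sub-measurements.
}

type Format struct {
	Version  int
	GitHash  string
	Issue    string // Optional trybot info.
	Patchset string
	Key      map[string]string // Applies to every Result.
	Results  []Result
	Links    map[string]string
}

func exampleFile() Format {
	return Format{
		Version: 1,
		GitHash: "8c2e...abcd", // Placeholder hash.
		Key:     map[string]string{"arch": "x86", "config": "8888"},
		Results: []Result{
			// A single measurement for one test.
			{Key: map[string]string{"test": "draw_a_circle", "units": "ms"}, Measurement: 1.23},
			// Several sub-measurements (min/median/max) for one test.
			{
				Key: map[string]string{"test": "draw_my_logo", "units": "ms"},
				Measurements: map[string][]SingleMeasurement{
					"stat": {
						{Value: "min", Measurement: 1.1},
						{Value: "median", Measurement: 1.5},
						{Value: "max", Measurement: 2.0},
					},
				},
			},
		},
	}
}
```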
### `/go/ingest/format/formatSchema.json`
This file contains the JSON schema definition for the `Format` struct defined in
`format.go`.
**Why:** A JSON schema provides a formal, machine-readable definition of the
expected data structure. This is used for validation, ensuring that ingested
files conform to the specified format. This helps catch errors early and
provides clear feedback on what is wrong with a non-conforming file.
**How:** It's a standard JSON Schema file. The `format.Validate` function uses
this schema to check the structure and types of the fields in an incoming JSON
file. The schema is embedded into the Go binary.
### `/go/ingest/format/generate/main.go`
This is a utility program used to automatically generate `formatSchema.json`
from the Go `Format` struct definition.
**Why:** Manually keeping a JSON schema synchronized with Go struct definitions
is error-prone. This generator ensures that the schema always accurately
reflects the Go types.
**How:** It uses the `go.skia.org/infra/go/jsonschema` library, which can
reflect on Go structs and produce a corresponding JSON schema. The
`//go:generate` directive in the file allows this program to be run easily
(e.g., via `go generate`).
### `/go/ingest/parser/parser.go`
This is the core component responsible for taking an input file (as
`file.File`), attempting to parse it using the defined formats, and extracting
the performance data into a standardized intermediate representation.
**Why:** This component decouples the specifics of file formats from the process
of writing data to the trace store. It handles the logic of trying different
parsers, extracting common information like Git hashes and trybot details, and
transforming the data into lists of parameter maps (`paramtools.Params`) and
corresponding measurement values (`float32`). It also enforces rules like branch
name filtering and parameter key/value validation.
**How:**
- **`New(...)`**: Initializes a `Parser` with instance-specific
configurations, such as recognized branch names and a regex for invalid
characters in parameter keys/values.
- **`Parse(ctx context.Context, file file.File)`**: This is the main entry
point for processing a regular data file.
1. It first attempts to parse the file using `extractFromVersion1File`
(which uses `format.Parse`).
2. If that fails, it falls back to `extractFromLegacyFile` (which uses
`format.ParseLegacyFormat`).
3. It checks if the branch name (if present in the file's common keys) is
in the allowed list. If not, it returns `ErrFileShouldBeSkipped`.
4. It ensures that the extracted parameter keys and values are valid,
potentially modifying them using `query.ForceValidWithRegex` based on
the `invalidParamCharRegex` from the instance configuration. This is
crucial because trace IDs (which are derived from these parameters)
often have restrictions on allowed characters.
5. Returns `params` (a slice of `paramtools.Params`), `values` (a slice of
`float32`), the `gitHash`, any global `links` from the file, and an
error.
- **`ParseTryBot(file file.File)`**: A specialized function to extract only
the `Issue` and `Patchset` information from a file, trying both V1 and
legacy formats. This is likely used for systems that only need to identify
the tryjob associated with a file without processing all the measurement
data.
- **`ParseCommitNumberFromGitHash(gitHash string)`**: Extracts an integer
commit number from a specially formatted git hash string (e.g., "CP:12345"
-> 12345). This supports systems that use such commit identifiers.
- Helper functions like `getParamsAndValuesFromLegacyFormat` and
`getParamsAndValuesFromVersion1Format` do the actual work of traversing the
parsed file structures (`BenchData` or `Format`) and flattening them into
the `params` and `values` slices.
- For the V1 format, it iterates through `f.Results`. If a `Result` has a
single `Measurement`, it combines `f.Key` and `result.Key` to form the
`paramtools.Params`.
- If a `Result` has `Measurements` (a map of `string` to
`[]SingleMeasurement`), it iterates through this map. For each entry, it
takes the map's key and the `Value` from `SingleMeasurement` to add more
key-value pairs to the `paramtools.Params`.
- **`GetSamplesFromLegacyFormat(b *format.BenchData)`**: Extracts raw sample
data (if present) from the legacy format. This seems to be for specific use
cases where individual sample values, rather than just aggregated metrics,
are needed.
**Key Workflow (Simplified `Parse`):**
```
Input: file.File
Output: ([]paramtools.Params, []float32, gitHash, links, error)
1. Read file contents.
2. Attempt Parse as Version 1 Format:
`f, err := format.Parse(contents)`
If success:
`params, values := getParamsAndValuesFromVersion1Format(f, p.invalidParamCharRegex)`
`gitHash = f.GitHash`
`links = f.Links`
`commonKeys = f.Key`
Else (error):
Reset reader.
Attempt Parse as Legacy Format:
`benchData, err := format.ParseLegacyFormat(contents)`
If success:
`params, values := getParamsAndValuesFromLegacyFormat(benchData)`
`gitHash = benchData.Hash`
`links = nil` (legacy format doesn't have global links in the same way)
`commonKeys = benchData.Key`
Else (error):
Return error.
3. `branch, ok := p.checkBranchName(commonKeys)`
If !ok:
Return `ErrFileShouldBeSkipped`.
4. If len(params) == 0:
Return `ErrFileShouldBeSkipped`.
5. Return `params, values, gitHash, links, nil`.
```
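The `"CP:12345"`-style handling mentioned for `ParseCommitNumberFromGitHash` amounts to stripping a prefix and parsing an integer. A minimal sketch (the real function's prefix handling and errors may differ):

```go
package example

import (
	"strconv"
	"strings"
)

// commitNumberFromCPHash turns a "CP:12345"-style identifier into its integer
// commit number, mirroring what ParseCommitNumberFromGitHash is described as
// doing.
func commitNumberFromCPHash(gitHash string) (int, error) {
	return strconv.Atoi(strings.TrimPrefix(gitHash, "CP:"))
}
```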
### `/go/ingest/process/process.go`
This component orchestrates the entire ingestion pipeline. It takes files from a
source (e.g., a directory, GCS bucket), uses the `parser` to extract data,
interacts with `git` to resolve commit information, and then writes the
processed data to a `tracestore.TraceStore` and `tracestore.MetadataStore`. It
also handles sending Pub/Sub events for ingested files.
**Why:** This provides the high-level control flow for ingestion. It manages
concurrency (multiple worker goroutines), error handling at a macro level
(retries for writing to the store), and integration with external systems like
Git and Pub/Sub.
**How:**
- **`Start(...)`**:
1. Initializes tracing, Pub/Sub client (if a topic is configured), the
`file.Source` (to get files), the `tracestore.TraceStore` and
`tracestore.MetadataStore` (to write data), and `perfgit.Git` (to map
git hashes to commit numbers).
2. Starts a number of `worker` goroutines specified by
`numParallelIngesters`.
3. Each `worker` listens on a channel provided by the `file.Source`.
- **`worker(...)`**:
1. Creates a `parser.Parser` instance.
2. Enters a loop, receiving `file.File` objects from the channel.
3. For each file, it calls `workerInfo.processSingleFile`.
- **`workerInfo.processSingleFile(f file.File)`**: This is the heart of the
per-file processing.
1. Increments metrics for files received.
2. Calls `p.Parse(ctx, f)` to get `params`, `values`, `gitHash`, and
`fileLinks`.
3. Handles errors from `Parse`:
- If `parser.ErrFileShouldBeSkipped`, acks the Pub/Sub message (if
any) and skips.
- For other parsing errors, increments metrics and nacks the Pub/Sub
message (if dead-lettering is enabled, allowing for retries or
manual inspection).
4. If `gitHash` is empty, logs an error and nacks.
5. If the Git repo supplies commit numbers directly (e.g. "CP:12345"), it
calls `p.ParseCommitNumberFromGitHash`.
6. Calls `g.GetCommitNumber(ctx, gitHash, commitNumberFromFile)` to resolve
the `gitHash` (or verify the supplied commit number) against the Git
repository. It includes logic to update the local Git repository clone
if the hash isn't initially found. If the commit cannot be resolved, it
logs an error, acks the Pub/Sub message (as retrying won't help for an
unknown commit), and skips.
7. Builds a `paramtools.ParamSet` from all the extracted `params`.
8. Writes the data to the `tracestore.TraceStore` using `store.WriteTraces`
or `store.WriteTraces2` (depending on
`instanceConfig.IngestionConfig.TraceValuesTableInlineParams`). This
involves retries in case of transient store errors.
- `WriteTraces2` suggests an optimized path where some parameter data
might be stored directly with trace values, potentially for
performance reasons.
9. If writing fails after retries, increments metrics and nacks.
10. If writing succeeds, acks the Pub/Sub message and increments success
metrics.
11. Calls `sendPubSubEvent` to publish information about the ingested file
(trace IDs, paramset, filename) to a configured Pub/Sub topic. This
allows other services to react to new data ingestion.
12. If `fileLinks` were present in the input, it calls
`metadataStore.InsertMetadata` to store these links.
- **`sendPubSubEvent(...)`**: If a `FileIngestionTopicName` is configured,
this function constructs an `ingestevents.IngestEvent` containing the trace
IDs, the overall `ParamSet` for the file, and the filename. It then
publishes this event to the specified Pub/Sub topic.
**Overall Ingestion Workflow:**
```
File Source (e.g., GCS bucket watcher)
|
v
[ file.File channel ] -> Worker Goroutine(s)
|
v
processSingleFile(file)
|
+--------------------------+--------------------------+
| | |
v v v
Parser.Parse(file) --> Git.GetCommitNumber(hash) --> TraceStore.WriteTraces(...)
| ^ | | ^
| | (if parsing fails)| | | (retries)
| +-------------------| (update repo if needed) | |
| | | |
+-----> ParamSet Creation +--------------------------+ |
| |
v |
sendPubSubEvent (if success) ------------------------------+
|
v
MetadataStore.InsertMetadata (if links exist)
```
This architecture allows for robust and scalable ingestion of performance data
from various sources and formats, with clear separation of concerns between
parsing, data transformation, Git interaction, and storage. The use of Pub/Sub
facilitates downstream processing and real-time reactions to newly ingested
data.
# Module: /go/ingestevents
The `ingestevents` module is designed to facilitate the communication of
ingestion completion events via PubSub. This is a critical part of the
event-driven alerting system within Perf, where the completion of data ingestion
for a file triggers subsequent processes like regression detection in a
clusterer.
The core of this module revolves around the `IngestEvent` struct. This struct
encapsulates the necessary information to be transmitted when a file has been
successfully ingested. It includes:
- `TraceIDs`: A slice of strings representing all the unencoded trace
identifiers found within the ingested file. These IDs are fundamental for
identifying the specific data points that have been processed.
- `ParamSet`: An unencoded, read-only representation of the
`paramtools.ParamSet` that summarizes the `TraceIDs`. This provides a
consolidated view of the parameters associated with the ingested traces.
- `Filename`: The name of the file that was ingested. This helps in tracking
the source of the ingested data.
To handle the transmission of `IngestEvent` data over PubSub, the module
provides two key functions:
- `CreatePubSubBody`: This function takes an `IngestEvent` struct as input and
prepares it for PubSub transmission. The "how" here involves a two-step
process:
1. The `IngestEvent` is first encoded into a JSON format. This provides a
structured and widely compatible representation of the data.
2. The resulting JSON data is then compressed using gzip. The "why" for
this step is to ensure that the message size stays within the PubSub
message size limits (currently 10MB). This is particularly important
when dealing with files that contain a large number of traces, as the
raw JSON representation could exceed the limit. The function returns the
gzipped JSON data as a byte slice.
```
IngestEvent (struct) ---> JSON Encoding ---> Gzip Compression ---> []byte (for PubSub)
```
- `DecodePubSubBody`: This function performs the reverse operation. It takes a
byte slice (presumably received from a PubSub message) and decodes it back
into an `IngestEvent` struct. The process is:
1. The input byte slice is first decompressed using gzip.
2. The decompressed data, which is expected to be in JSON format, is then
decoded into an `IngestEvent` struct. Error handling is incorporated at
each step to manage potential issues during decompression or JSON
decoding.
```
[]byte (from PubSub) ---> Gzip Decompression ---> JSON Decoding ---> IngestEvent (struct)
```
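A self-contained sketch of this round trip, using only the standard library; the `IngestEvent` mirror below simplifies `ParamSet` to a plain map, and the real `CreatePubSubBody`/`DecodePubSubBody` signatures live in `ingestevents.go`:

```go
package example

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
)

// IngestEvent mirrors the struct described above (TraceIDs, ParamSet,
// Filename); ParamSet is simplified to a map for this sketch.
type IngestEvent struct {
	TraceIDs []string
	ParamSet map[string][]string
	Filename string
}

// encodeBody JSON-encodes the event and gzips it, keeping large trace lists
// under the Pub/Sub message size limit.
func encodeBody(ev IngestEvent) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(ev); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeBody reverses encodeBody: gunzip, then JSON-decode.
func decodeBody(b []byte) (IngestEvent, error) {
	var ev IngestEvent
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return ev, err
	}
	defer zr.Close()
	if err := json.NewDecoder(zr).Decode(&ev); err != nil {
		return ev, err
	}
	return ev, nil
}
```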
The primary responsibility of this module is therefore to provide a standardized
and efficient way to serialize and deserialize ingestion event information for
PubSub communication. The design choice of using JSON for structure and gzip for
compression balances readability, interoperability, and an efficient use of
PubSub resources.
The file `ingestevents.go` contains the definition of the `IngestEvent` struct
and the implementation of the `CreatePubSubBody` and `DecodePubSubBody`
functions. The corresponding test file, `ingestevents_test.go`, ensures that the
encoding and decoding processes work correctly, verifying that an `IngestEvent`
can be successfully round-tripped through the serialization and deserialization
process.
# Module: /go/initdemo
The `initdemo` module provides a command-line application designed to initialize
a database instance, specifically targeting CockroachDB or a Spanner emulator,
for demonstration or development purposes.
Its primary purpose is to automate the creation of a named database and the
application of the latest database schema. This ensures a consistent and
ready-to-use database environment, removing the manual steps often required for
setting up a database for applications like Skia Perf.
The core functionality revolves around connecting to a specified database URL,
attempting to create the database (gracefully handling cases where it already
exists), and then executing the appropriate schema definition. The choice of
schema (standard SQL or Spanner-specific) is determined by a command-line flag.
**Key Components and Responsibilities:**
- **`main.go`**: This is the entry point and sole Go source file for the
application.
- **Flag Parsing**: It defines and parses command-line flags to configure
the database connection and behavior.
- `--databasename`: Specifies the name of the database to be created
(defaults to "demo"). This allows users to customize the database name
for different environments or purposes.
- `--database_url`: Provides the connection string for the CockroachDB
instance (defaults to a local instance
`postgresql://root@127.0.0.1:26257/?sslmode=disable`). This allows
connection to different database servers or configurations.
- `--spanner`: A boolean flag that, when set, instructs the application to
use the Spanner-specific schema. This is crucial for ensuring
compatibility when targeting a Spanner emulator, which may have
different SQL syntax or feature support compared to CockroachDB.
- **Database Connection**: It establishes a connection to the database
using the `pgxpool` library, which is a PostgreSQL driver and connection
pool for Go. This library was chosen for its robustness and performance
in handling PostgreSQL-compatible databases like CockroachDB.
- **Database Creation**: It attempts to execute a `CREATE DATABASE` SQL
statement. The implementation includes error handling to log an
informational message if the database already exists, rather than
failing, making the script idempotent in terms of database creation.
- **Database Selection (CockroachDB specific)**: If not targeting Spanner,
it executes `SET DATABASE` to switch the current session's context to
the newly created (or existing) database. This is a CockroachDB-specific
command.
- **Schema Selection**: Based on the `--spanner` flag, it selects the
appropriate schema definition.
- If `--spanner` is false, it uses `sql.Schema` from the `//perf/go/sql`
module, which contains the standard SQL schema for Perf.
- If `--spanner` is true, it uses `spanner.Schema` from the
`//perf/go/sql/spanner` module, which contains the schema adapted for
Spanner. This separation allows maintaining distinct schema versions
tailored to the nuances of each database system.
- **Schema Application**: It executes the selected schema DDL statements
against the connected database. This step creates all the necessary
tables, indexes, and other database objects required by the Perf
application.
- **Connection Closure**: Finally, it closes the database connection pool
to release resources.
**Workflow:**
The typical workflow of the `initdemo` application can be visualized as:
1. **Parse flags**: Application start -> read `--databasename`, `--database_url`, and `--spanner`.
2. **Connect to the database**: use `--database_url` with `pgxpool.Connect()` to obtain a connection pool (`conn`).
3. **Create the database**: execute `CREATE DATABASE <name>`. On success, continue; if the database already exists, log "Database <name> already exists." and continue.
4. **Set the active database (if not Spanner)**: when `--spanner` is false, execute `SET DATABASE <name>`, calling `sklog.Fatal()` on error; when `--spanner` is true, skip this step.
5. **Select the schema**: `dbSchema = spanner.Schema` if `--spanner` is true, otherwise `dbSchema = sql.Schema`.
6. **Apply the schema**: execute the schema DDL using `conn`, calling `sklog.Fatal()` on error.
7. **Close the connection**: `conn.Close()` -> application end.
This process ensures that a target database is either created or confirmed to
exist, and then the correct schema is applied, making it ready for use. The
choice of using `pgxpool` for database interaction and providing separate schema
definitions for standard SQL and Spanner demonstrates a design focused on
supporting multiple database backends for the Perf system. The error handling,
particularly for the database creation step, aims for robust and user-friendly
operation.
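A compressed sketch of the connect, create-database, and apply-schema flow, assuming pgx v4's `pgxpool` API; the stand-in schema string, the hard-coded flag values, and the `log`-based error handling are simplifications of what `main.go` actually does:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v4/pgxpool"
)

func main() {
	ctx := context.Background()
	databaseURL := "postgresql://root@127.0.0.1:26257/?sslmode=disable"
	databaseName := "demo"
	schema := "CREATE TABLE IF NOT EXISTS Example (id INT PRIMARY KEY)" // Stand-in for sql.Schema.

	conn, err := pgxpool.Connect(ctx, databaseURL)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Idempotent creation: an "already exists" error is only informational.
	if _, err := conn.Exec(ctx, fmt.Sprintf("CREATE DATABASE %s", databaseName)); err != nil {
		log.Printf("Database %s already exists: %v", databaseName, err)
	}

	// CockroachDB-specific: switch the session to the target database.
	if _, err := conn.Exec(ctx, fmt.Sprintf("SET DATABASE = %s", databaseName)); err != nil {
		log.Fatal(err)
	}

	// Apply the schema DDL.
	if _, err := conn.Exec(ctx, schema); err != nil {
		log.Fatal(err)
	}
}
```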
# Module: /go/issuetracker
## Perf Issue Tracker Module
This module provides an interface and implementation for interacting with the
Google Issue Tracker API, specifically tailored for Perf's needs. The primary
goal is to abstract the complexities of the Issue Tracker API and provide a
simpler, more focused way to retrieve issue details and add comments to existing
issues. This allows other parts of the Perf system to integrate with issue
tracking without needing to directly handle API authentication, request
formatting, or response parsing.
### Core Functionality and Design
The module is designed around the `IssueTracker` interface, which defines the
core operations:
1. **Listing Issues (`ListIssues`)**: This function allows retrieving details
for a set of specified issue IDs.
- **Why**: Perf often needs to fetch information about bugs that have been
filed (e.g., to display their status or link to them from alerts).
Providing a bulk retrieval mechanism based on IDs is efficient.
- **How**: The implementation takes a `ListIssuesRequest` containing a
slice of integer issue IDs. It constructs a query string by joining
these IDs with " | " (OR operator in Issue Tracker query language) and
prepending "id:()". This formatted query is then sent to the Issue
Tracker API.
   - **Example Workflow**:

     ```
     Perf System --- ListIssuesRequest (IDs: [123, 456]) ---> issuetracker module
         |
         v
     Construct query: "id:(123 | 456)"
         |
         v
     issueTrackerImpl --- GET request ---> Issue Tracker API
         |
         v
     Perf System <--- []*issuetracker.Issue <--- response parsing <--- API response
     ```
2. **Creating Comments (`CreateComment`)**: This function allows adding a new
comment to an existing issue.
- **Why**: Perf might need to automatically update bugs with new
information, such as when a regression is fixed or when more data about
an alert becomes available.
- **How**: It takes a `CreateCommentRequest` containing the `IssueId` and
the `Comment` string. The implementation constructs an
`issuetracker.IssueComment` object and uses the Issue Tracker client
library to post this comment to the specified issue.
   - **Example Workflow**:

     ```
     Perf System --- CreateCommentRequest (ID: 789, Comment: "...") ---> issuetracker module
         |
         v
     issueTrackerImpl --- POST request ---> Issue Tracker API
         |
         v
     Perf System <--- CreateCommentResponse <--- response parsing <--- API response
     ```
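The query construction for `ListIssues` is simple enough to show directly; here is a sketch of the joining logic described above (the real implementation may differ in details):

```go
package example

import (
	"fmt"
	"strconv"
	"strings"
)

// issueQuery builds the Issue Tracker query described above, e.g.
// issueQuery([]int{123, 456}) == "id:(123 | 456)".
func issueQuery(ids []int) string {
	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		parts = append(parts, strconv.Itoa(id))
	}
	return fmt.Sprintf("id:(%s)", strings.Join(parts, " | "))
}
```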
### Key Components
- **`issuetracker.go`**:
- **`IssueTracker` interface**: Defines the contract for interacting with
the issue tracker. This allows for decoupling the client code from the
specific implementation and facilitates testing using mocks.
- **`issueTrackerImpl` struct**: The concrete implementation of the
`IssueTracker` interface. It holds an instance of the
`issuetracker.Service` client, which is the generated Go client for the
Google Issue Tracker API.
- **`NewIssueTracker` function**: This is the factory function for
creating an `issueTrackerImpl` instance.
- **Authentication**: It handles the authentication by fetching an API key
from Google Secret Manager. The secret project and name are configurable
via `config.IssueTrackerConfig`. It then uses `google.DefaultClient`
with the "https://www.googleapis.com/auth/buganizer" scope to obtain an
authenticated HTTP client. This client and the API key are then used to
initialize the `issuetracker.Service`.
- **Configuration**: The `BasePath` of the `issuetracker.Service` is
explicitly set to "https://issuetracker.googleapis.com" to ensure it
points to the correct API endpoint.
- **Request/Response Structs (`ListIssuesRequest`, `CreateCommentRequest`,
`CreateCommentResponse`)**: These simple structs define the data
structures for requests and responses, making the interface clear and
easy to use. They are designed to be minimal and specific to the needs
of the Perf system.
- **`mocks/IssueTracker.go`**:
- This file contains a mock implementation of the `IssueTracker`
interface, generated using the `testify/mock` library.
- **Why**: Mocks are crucial for unit testing components that depend on
the `issuetracker` module. They allow tests to simulate various
responses (success, failure, specific data) from the issue tracker
without making actual API calls. This makes tests faster, more reliable,
and independent of external services.
- **How**: The `IssueTracker` mock struct embeds `mock.Mock` and provides
mock implementations for `ListIssues` and `CreateComment`. The
`NewIssueTracker` function in this file is a constructor for the mock,
which also sets up test cleanup to assert that all expected mock calls
were made.
### Design Decisions and Trade-offs
- **Interface-based design**: Using an interface (`IssueTracker`) promotes
loose coupling and testability. Consumers depend on the abstraction rather
than the concrete implementation.
- **Simplified API**: The module exposes only the functionality currently
needed by Perf (listing issues by ID and creating comments). It doesn't
attempt to be a full-fledged Issue Tracker client, which simplifies its own
implementation and usage. If more advanced features are needed in the
future, the interface can be extended.
- **Secret Management for API Key**: Storing the API key in Google Secret
Manager is a security best practice, preventing it from being hardcoded or
checked into version control.
- **Error Handling**: The module uses `skerr.Wrapf` to wrap errors, providing
context and making debugging easier. It also includes input validation for
`CreateCommentRequest` to prevent invalid API calls.
- **Logging**: Debug logs (`sklog.Debugf`) are included to trace requests and
responses, which can be helpful during development and troubleshooting.
The module relies on the external `go.skia.org/infra/go/issuetracker/v1`
library, which is the auto-generated client for the Google Issue Tracker API.
This design choice leverages existing, well-tested client libraries instead of
reimplementing API interaction from scratch.
# Module: /go/kmeans
## K-Means Clustering Module
This module provides a generic implementation of the k-means clustering
algorithm. The primary goal is to offer a flexible way to group a set of data
points (observations) into a predefined number of clusters (k) based on their
similarity. The "similarity" is determined by a distance metric, and the
"center" of each cluster is represented by a centroid.
### Design and Implementation Choices
The module is designed with **generality** in mind. Instead of being tied to a
specific data type or distance metric, it uses interfaces (`Clusterable`,
`Centroid`) and a function type (`CalculateCentroid`). This approach allows
users to define their own data structures and distance calculations, making the
k-means algorithm applicable to a wide variety of problems.
**Interfaces for Flexibility:**
- **`Clusterable`**: This is a marker interface. Any data type that needs to
be clustered must satisfy this interface. In practice, this means you can
use `interface{}` and then perform type assertions within your custom
distance and centroid calculation functions. This design choice prioritizes
ease of use for simple cases, where the same type might represent both an
observation and a centroid.
- **`Centroid`**: This interface defines the contract for centroids.
- `AsClusterable() Clusterable`: This method is crucial for situations
where a centroid itself can be treated as a data point (e.g., when
calculating distances or when a centroid is part of the initial
observation set). It allows the algorithm to seamlessly integrate
centroids into lists of clusterable items. If a centroid cannot be
meaningfully converted to a `Clusterable`, it returns `nil`.
- `Distance(c Clusterable) float64`: This method is the core of the
similarity measure. It calculates the distance between the centroid and
a given `Clusterable` data point. The user provides the specific
implementation for this, enabling the use of various distance metrics
(Euclidean, Manhattan, etc.).
- **`CalculateCentroid func([]Clusterable) Centroid`**: This function type
defines how a new centroid is computed from a set of `Clusterable` items
belonging to a cluster. This allows users to implement different strategies
for centroid calculation, such as taking the mean, median, or other
representative points.
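As a concrete illustration of these three abstractions, the following is a
minimal sketch of clustering 2D points with Euclidean distance. The import
path `go.skia.org/infra/perf/go/kmeans` and the assumption that `Clusterable`
is an empty marker interface are inferred from the description above and
should be checked against the actual package.
```go
package main

import (
	"fmt"
	"math"

	"go.skia.org/infra/perf/go/kmeans" // assumed import path
)

// Point serves as both an observation and a centroid.
type Point struct {
	X, Y float64
}

// AsClusterable lets a Point centroid also be treated as an observation.
func (p Point) AsClusterable() kmeans.Clusterable { return p }

// Distance returns the Euclidean distance to another Point.
func (p Point) Distance(c kmeans.Clusterable) float64 {
	o := c.(Point)
	return math.Hypot(p.X-o.X, p.Y-o.Y)
}

// calculateCentroid returns the mean of the given points as the new centroid.
func calculateCentroid(members []kmeans.Clusterable) kmeans.Centroid {
	var sum Point
	for _, m := range members {
		p := m.(Point)
		sum.X += p.X
		sum.Y += p.Y
	}
	n := float64(len(members))
	return Point{X: sum.X / n, Y: sum.Y / n}
}

func main() {
	observations := []kmeans.Clusterable{
		Point{0, 0}, Point{0, 1}, Point{10, 10}, Point{10, 11},
	}
	centroids := []kmeans.Centroid{Point{0, 0}, Point{10, 10}}
	centroids, clusters := kmeans.KMeans(observations, centroids, 2, 10, calculateCentroid)
	fmt.Println(len(centroids), len(clusters))
}
```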
**Lloyd's Algorithm Implementation:**
The core clustering logic is implemented in the `Do` function, which performs a
single iteration of Lloyd's algorithm. This is a common and relatively
straightforward iterative approach to k-means.
The `KMeans` function orchestrates multiple iterations of `Do`. A key design
consideration here is the **convergence criterion**. Currently, it runs for a
fixed number of iterations (`iters`). A more sophisticated approach would be to
iterate until the total error (or the change in centroid positions) falls below
a certain threshold, indicating that the clusters have stabilized. This was
likely deferred for simplicity in the initial implementation, but it is an
important consideration for practical applications, both to avoid unnecessary
computation and to avoid premature termination.
**Why modify centroids in-place in `Do`?**
The `Do` function modifies the `centroids` slice passed to it. The documentation
explicitly advises calling it as `centroids = Do(observations, centroids, f)`.
This design choice might have been made for efficiency, avoiding the allocation
of a new centroids slice in every iteration if the number of centroids remains
the same. However, it also means the caller needs to be aware of this side
effect. The function does return the potentially new slice of centroids, which
is important because centroids can be "lost" if a cluster becomes empty.
### Key Responsibilities and Components
- **`kmeans.go`**: This is the sole source file and contains all the logic for
the k-means algorithm.
- **`Clusterable` (interface)**: Defines the contract for data points that
can be clustered. Its main purpose is to allow generic collections of
items.
- **`Centroid` (interface)**: Defines the contract for cluster centers,
including how to calculate their distance to data points and how to
treat them as data points themselves.
- **`CalculateCentroid` (function type)**: A user-provided function that
defines the logic for computing a new centroid from a group of data
points. This separation of concerns is key to the module's flexibility.
- **`closestCentroid(observation Clusterable, centroids []Centroid) (int,
float64)`**: A helper function that finds the index of the centroid
closest to a given observation and the distance to it. This is a
fundamental step in assigning observations to clusters.
- **`Do(observations []Clusterable, centroids []Centroid, f
CalculateCentroid) []Centroid`**:
- **Responsibility**: Performs a single iteration of the k-means algorithm
(Lloyd's algorithm).
    - **How it works**:
      1. Assigns each observation to its nearest centroid, forming temporary
         clusters: `Observations --> [Find Closest Centroid for each] -->
         Temporary Cluster Assignments`.
      2. For each temporary cluster, it recalculates a new centroid using the
         user-provided `f` function: `Temporary Cluster Assignments --> [Group
         by Cluster] --> Sets of Clusterable items --> [Apply 'f'] --> New
         Centroids`.
      3. If a cluster becomes empty (no observations are closest to its
         centroid), that centroid is effectively removed in this iteration, as
         `f` will not be called for an empty set of `Clusterable` items, and
         `newCentroids` will not include it.
- **Design Rationale**: Encapsulates one core step of the iterative
process, making the overall `KMeans` function clearer. The in-place
modification (and return value) addresses the potential for the number
of centroids to change.
- **`GetClusters(observations []Clusterable, centroids []Centroid)
([][]Clusterable, float64)`**:
    - **Responsibility**: Organizes the observations into their final clusters
      based on the provided (presumably converged) centroids and calculates the
      total error: the sum of each observation's distance to its assigned
      centroid, under whatever distance metric the caller implements.
- **How it works**:
1. Initializes a list of clusters, where each cluster initially
contains only its centroid (if `AsClusterable()` is not nil).
2. Iterates through all observations, assigning each to its closest
centroid and adding it to the corresponding cluster list.
3. Accumulates the distance from each observation to its assigned
centroid to compute the `totalError`.
- **Design Rationale**: Provides a way to retrieve the actual cluster
memberships after the algorithm has run. The inclusion of the centroid
as the first element in each returned cluster is a convention for easy
identification.
- **`KMeans(observations []Clusterable, centroids []Centroid, k, iters
int, f CalculateCentroid) ([]Centroid, [][]Clusterable)`**:
- **Responsibility**: The main entry point for running the k-means
algorithm for a specified number of iterations.
    - **How it works**: Repeatedly applies `Do`: `Initial Centroids --(iter
      1)--> Do() --(updates)--> Centroids' --(iter 2)--> Do() --(updates)-->
      Centroids'' ... --(iter 'iters')--> Do() --(updates)--> Final Centroids`,
      then calls `GetClusters()` on the final centroids to produce the final
      clusters.
- **Design Rationale**: Provides a simple interface to run the entire
process. The fixed number of iterations (`iters`) is a straightforward
stopping condition, though, as mentioned, convergence-based stopping
would be more robust. The `k` parameter seems redundant given that the
initial number of centroids is determined by `len(centroids)`. If `k`
was intended to specify the _desired_ number of clusters and the initial
`centroids` were just starting points, the implementation would need to
handle cases where `len(centroids)` != `k`. However, the current `Do`
function naturally adjusts the number of centroids if some clusters
become empty.
- **`TotalError(observations []Clusterable, centroids []Centroid)
float64`**:
- **Responsibility**: Calculates the sum of distances from each
observation to its closest centroid. This is often used as a measure of
the "goodness" of the clustering.
- **How it works**: It simply calls `GetClusters` and returns the
`totalError` computed by it.
- **Design Rationale**: Provides a convenient way to evaluate the
clustering quality without needing to manually iterate and sum
distances.
### Key Workflows
**1. Single K-Means Iteration (`Do` function):**
```
Input: Observations (O), Current Centroids (C_curr), CalculateCentroid function (f)
1. For each Observation o in O:
Find c_closest in C_curr such that Distance(o, c_closest) is minimized.
Assign o to the cluster associated with c_closest.
---> Result: A mapping of each Observation to a Centroid index.
2. Initialize NewCentroids (C_new) as an empty list.
3. For each unique Centroid index j (from 0 to k-1):
a. Collect all Observations (O_j) assigned to cluster j.
b. If O_j is not empty:
Calculate new_centroid_j = f(O_j).
Add new_centroid_j to C_new.
---> Potentially, some original centroids might not have any observations assigned,
so C_new might have fewer centroids than C_curr.
Output: New Centroids (C_new)
```
**2. Full K-Means Clustering (`KMeans` function):**
```
Input: Observations (O), Initial Centroids (C_init), Number of Iterations (iters), CalculateCentroid function (f)
1. Set CurrentCentroids = C_init.
2. Loop 'iters' times:
CurrentCentroids = Do(O, CurrentCentroids, f) // Perform one iteration
---> CurrentCentroids are updated.
3. FinalCentroids = CurrentCentroids.
4. Clusters, TotalError = GetClusters(O, FinalCentroids)
---> Assigns each observation to its final cluster based on FinalCentroids.
The first element of each sub-array in Clusters is the centroid itself.
Output: FinalCentroids, Clusters
```
The unit tests in `kmeans_test.go` provide excellent examples of how to
implement the `Clusterable`, `Centroid`, and `CalculateCentroid` requirements
for a simple 2D point scenario. They demonstrate the expected behavior of the
`Do` and `KMeans` functions, including edge cases like empty inputs or losing
centroids when clusters become empty.
# Module: /go/maintenance
## Maintenance Module Documentation
### High-Level Overview
The `maintenance` module in Perf is responsible for executing a set of
long-running background processes that are essential for the health and
operational integrity of a Perf instance. These tasks ensure that data is kept
up-to-date, system configurations are current, and storage is managed
efficiently. The module is designed to be started once and run continuously,
performing its duties at predefined intervals.
### Design Rationale and Implementation Choices
The core design principle behind the `maintenance` module is to centralize
various periodic tasks that would otherwise be scattered or require manual
intervention. By consolidating these operations, the system becomes more robust
and easier to manage.
Key design choices include:
- **Asynchronous Operations:** Most maintenance tasks are designed to run in
separate goroutines, triggered by timers. This allows the main application
thread (if any) to remain responsive and prevents one maintenance task from
blocking others.
- **Configurability via Flags and Instance Configuration:** The behavior of
the maintenance tasks (e.g., whether to perform regression migration,
refresh query cache, or delete old data) is controlled by command-line flags
(`config.MaintenanceFlags`) and the instance-specific configuration
(`config.InstanceConfig`). This provides flexibility for different Perf
deployments and operational needs.
- **Dependency Injection:** Components like database connections
(`builders.NewDBPoolFromConfig`), Git interfaces
(`builders.NewPerfGitFromConfig`), and caching mechanisms
(`builders.GetCacheFromConfig`) are created and passed into the respective
maintenance tasks. This promotes modularity and testability.
- **Error Handling and Logging:** Each maintenance task incorporates error
handling and logging (`sklog`) to provide visibility into its operations and
to aid in diagnosing issues. While errors in one task might be logged, the
overall `Start` function aims to keep other independent tasks running.
- **Idempotency (Implicit):** While not explicitly stated for all tasks, many
maintenance operations are inherently idempotent or designed to be safe to
run repeatedly (e.g., schema migration, data deletion based on age).
- **Phased Introduction of Features:** Features like regression migration or
Sheriff config integration are gated by flags (`flags.MigrateRegressions`,
`instanceConfig.EnableSheriffConfig`). This allows for gradual rollouts and
testing in production environments.
### Responsibilities and Key Components
The `maintenance` module orchestrates several distinct background processes.
**1. Core Initialization and Schema Management (`maintenance.go`)**
- **Why:** Before any maintenance tasks can run, essential services like
tracing need to be initialized. Crucially, the database schema must be
validated and migrated to the expected version. This ensures that all
subsequent database operations are performed against a compatible and
up-to-date schema.
- **How:**
- `tracing.Init`: Sets up the distributed tracing system.
- `builders.NewDBPoolFromConfig`: Establishes a connection pool to the
database.
- `expectedschema.ValidateAndMigrateNewSchema`: Checks the current
database schema version against the expected version defined in the
codebase. If they don't match, it applies the necessary migrations to
bring the schema up to date. This is a critical step to prevent data
corruption or application errors due to schema mismatches.
**2. Git Repository Synchronization (`maintenance.go`)**
- **Why:** Perf relies on an up-to-date view of the monitored Git repository
to associate performance data with specific commits. This process ensures
that new commits are continuously ingested into the Perf system.
- **How:**
- `builders.NewPerfGitFromConfig`: Creates an instance of `perfgit.Git`,
which provides an interface to the Git repository.
- `g.StartBackgroundPolling(ctx, gitRepoUpdatePeriod)`: This method
launches a goroutine within the `perfgit` component. This goroutine
periodically fetches the latest changes from the remote Git repository
(origin) and updates the local representation, typically also updating a
`Commits` table in the database with new commit information. The
`gitRepoUpdatePeriod` constant (e.g., 1 minute) defines how frequently
this update occurs.
**3. Regression Schema Migration (`maintenance.go`)**
- **Why:** Over time, the way regression data is stored might need to be
changed for performance, new features, or data integrity reasons. This
component handles the migration of existing regression data from an older
schema/table to a newer one. This is often a long-running process for
instances with a large history of regressions.
- **How:**
- Controlled by the `flags.MigrateRegressions` flag.
- `migration.New`: Creates a `Migrator` instance, likely configured with
database connections for both the old and new regression storage
mechanisms.
- `migrator.RunPeriodicMigration(regressionMigratePeriod,
regressionMigrationBatchSize)`: Starts a goroutine that, at intervals
defined by `regressionMigratePeriod`, processes a
`regressionMigrationBatchSize` number of regressions, moving them from
the old storage to the new. This batching approach prevents overwhelming
the database and allows the migration to proceed incrementally.
**4. Sheriff Configuration Import (`maintenance.go`)**
- **Why:** Perf allows defining alert configurations (Sheriff configs) that
specify how and when alerts should be triggered for performance regressions.
These configurations can be managed externally (e.g., via LUCI Config). This
component ensures that Perf stays synchronized with the latest
configurations.
- **How:**
- Conditional on `instanceConfig.EnableSheriffConfig` and a non-empty
`instanceConfig.InstanceName`.
- It initializes `AlertStore` and `SubscriptionStore` for managing alert
and subscription data within Perf.
- `luciconfig.NewApiClient`: Creates a client to communicate with the LUCI
Config service.
- `sheriffconfig.New`: Initializes the `SheriffConfig` service, which
encapsulates the logic for fetching, parsing, and applying Sheriff
configurations.
- `sheriffConfig.StartImportRoutine(configImportPeriod)`: Launches a
goroutine that periodically (every `configImportPeriod`) polls the LUCI
Config service for the specified instance. If new or updated
configurations are found, they are processed and stored/updated in
Perf's database (e.g., in the `Alerts` and `Subscriptions` tables).
**5. Query Cache Refresh (`maintenance.go`)**
- **Why:** To speed up common queries (e.g., retrieving the set of available
trace parameters, known as ParamSets), Perf can cache this information. This
component is responsible for periodically rebuilding and refreshing these
caches.
- **How:**
- Controlled by the `flags.RefreshQueryCache` flag.
- `builders.NewTraceStoreFromConfig`: Gets an interface to the trace data.
- `dfbuilder.NewDataFrameBuilderFromTraceStore`: Creates a utility for
building data frames from traces, which is likely used to derive the
ParamSet.
- `psrefresh.NewDefaultParamSetRefresher`: Initializes a component
specifically designed to refresh ParamSets. It uses the
`DataFrameBuilder` to scan trace data and determine the current set of
unique parameter key-value pairs.
- `psRefresher.Start(time.Hour)`: Starts a goroutine to refresh the
primary ParamSet (perhaps stored directly in the database or an
in-memory representation) hourly.
- `builders.GetCacheFromConfig`: If a distributed cache like Redis is
configured, this obtains a client for it.
- `psrefresh.NewCachedParamSetRefresher`: Wraps the primary `psRefresher`
with a caching layer.
- `cacheParamSetRefresher.StartRefreshRoutine(redisCacheRefreshPeriod)`:
Starts another goroutine that takes the ParamSet generated by
`psRefresher` and populates the external cache (e.g., Redis) at
`redisCacheRefreshPeriod` intervals (e.g., every 4 hours). This provides
a faster lookup path for frequently accessed ParamSet data.
Workflow:
```
Trace Data --> DataFrameBuilder --> ParamSetRefresher (generates primary ParamSet)
|
v
CachedParamSetRefresher --> External Cache (e.g., Redis)
```
**6. Old Data Deletion (`deletion/deleter.go`, `maintenance.go`)**
- **Why:** Over time, Perf accumulates a large amount of data, including
regression information and associated shortcuts (which are often links or
identifiers for specific data views). To manage storage costs and maintain
system performance, very old data that is unlikely to be accessed needs to
be periodically deleted.
- **How:**
- Controlled by the `flags.DeleteShortcutsAndRegressions` flag.
- **`deletion.New(db, ...)`:** Initializes a `Deleter` object. This object
encapsulates the logic for identifying and removing outdated regressions
and shortcuts. It takes a database connection pool (`db`) and the
datastore type. Internally, it creates instances of `sqlregressionstore`
and `sqlshortcutstore` to interact with the respective database tables.
- **`deleter.RunPeriodicDeletion(deletionPeriod, deletionBatchSize)`:**
This method in `maintenance.go` calls the `RunPeriodicDeletion` method
on the `Deleter` instance.
- Inside `deleter.go`, `RunPeriodicDeletion` starts a goroutine.
- This goroutine ticks at intervals specified by `deletionPeriod` (e.g.,
every 15 minutes).
- On each tick, it calls `d.DeleteOneBatch(deletionBatchSize)`.
- **`Deleter.DeleteOneBatch(shortcutBatchSize)`:**
* Calls `d.getBatch(ctx, shortcutBatchSize)` to identify a batch of
regressions and shortcuts eligible for deletion.
- **`Deleter.getBatch(...)`:**
- Finds the oldest commit number present in the `Regressions`
table.
- Iteratively queries the `Regressions` table for ranges of
commits, starting from the oldest.
- For each regression found, it checks the timestamp of its `Low`
and `High` `StepPoint`s.
- If a `StepPoint`'s timestamp is older than the defined `ttl`
(Time-To-Live, currently -18 months), the associated shortcut
and the commit number of the regression are marked for deletion.
- It continues collecting these until the number of shortcuts to
be deleted reaches approximately `shortcutBatchSize`.
- Returns the list of commit numbers whose regressions will be
deleted and the list of shortcut IDs to be deleted.
* Calls `d.deleteBatch(ctx, commitNumbers, shortcuts)` to perform the
actual deletion.
- **`Deleter.deleteBatch(...)`:**
- Starts a database transaction.
- Iterates through the `commitNumbers` and calls
`d.regressionStore.DeleteByCommit()` for each, removing the
regression data associated with that commit.
- Iterates through the `shortcuts` and calls
`d.shortcutStore.DeleteShortcut()` for each, removing the
shortcut entry.
- If all deletions are successful, it commits the transaction. If
any error occurs, it rolls back the transaction to ensure data
consistency.
Deletion Workflow:
```
Timer (every deletionPeriod) --> DeleteOneBatch
|
v
getBatch (identifies old data based on TTL)
|
| Returns (commit_numbers_to_delete, shortcut_ids_to_delete)
v
deleteBatch (deletes in a transaction)
|
+--> RegressionStore.DeleteByCommit
+--> ShortcutStore.DeleteShortcut
```
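The transactional delete step at the bottom of this diagram can be sketched
generically. This is not the `Deleter`'s actual code: it uses `database/sql`
directly with assumed table and column names, whereas the real implementation
goes through `sqlregressionstore` and `sqlshortcutstore`.
```go
package main

import (
	"context"
	"database/sql"
)

// deleteBatch removes a batch of regressions and shortcuts in a single
// transaction, rolling back if any individual delete fails. Table and
// column names are placeholders.
func deleteBatch(ctx context.Context, db *sql.DB, commitNumbers []int64, shortcutIDs []string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	for _, c := range commitNumbers {
		if _, err := tx.ExecContext(ctx, `DELETE FROM Regressions WHERE commit_number=$1`, c); err != nil {
			_ = tx.Rollback()
			return err
		}
	}
	for _, id := range shortcutIDs {
		if _, err := tx.ExecContext(ctx, `DELETE FROM Shortcuts WHERE id=$1`, id); err != nil {
			_ = tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}

func main() {}
```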
The `ttl` variable in `deleter.go` is set to -18 months, meaning regressions
and their associated shortcuts older than 1.5 years are targeted for
deletion. This value was determined based on stakeholder requirements for
data retention.
The `select {}` at the end of the `Start` function in `maintenance.go` is a
common Go idiom to make the main goroutine (the one that called `Start`) block
indefinitely. Since all the actual work is done in background goroutines
launched by `Start`, this prevents the `Start` function from returning and thus
keeps the maintenance processes alive.
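A compact, self-contained illustration of this pattern (not the actual `Start`
function): each maintenance task runs on its own ticker in a background
goroutine, and the caller blocks forever on an empty `select`.
```go
package main

import (
	"log"
	"time"
)

// startPeriodic runs task every period in its own goroutine.
func startPeriodic(name string, period time.Duration, task func()) {
	go func() {
		for range time.Tick(period) {
			log.Printf("running %s", name)
			task()
		}
	}()
}

func Start() {
	startPeriodic("git repo sync", time.Minute, func() { /* poll the repo */ })
	startPeriodic("old data deletion", 15*time.Minute, func() { /* delete one batch */ })

	// Block forever; all the real work happens in the goroutines above.
	select {}
}

func main() {
	Start()
}
```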
# Module: /go/notify
The `notify` module in Perf is responsible for handling notifications related to
performance regressions. It provides a flexible framework for formatting and
sending notifications through various channels like email, issue trackers, or
custom endpoints like Chromeperf.
**Core Concepts and Design:**
The notification system is built around a few key abstractions:
1. **`Notifier` Interface (`notify.go`):** This is the central interface for
sending notifications. It defines methods for:
- `RegressionFound`: Called when a new regression is detected.
- `RegressionMissing`: Called when a previously detected regression is no
longer found (e.g., due to new data or fixes).
- `ExampleSend`: Used for sending test/dummy notifications to verify
configuration.
- `UpdateNotification`: For updating an existing notification (e.g.,
adding a comment to an issue).
2. **`Formatter` Interface (`notify.go`):** This interface is responsible for
constructing the content (body and subject) of a notification.
Implementations exist for:
- `HTMLFormatter` (`html.go`): Generates HTML-formatted notifications,
suitable for email.
   - `MarkdownFormatter` (`markdown.go`): Generates Markdown-formatted
     notifications, suitable for issue trackers or other systems that support
     Markdown.
   - Both formatters use Go's `text/template` package, allowing for
     customizable notification messages. Templates can access a
     `TemplateContext` (or an `AndroidBugTemplateContext` for Android-specific
     notifications), which provides data about the regression, commit, alert,
     etc.
3. **`Transport` Interface (`notify.go`):** This interface defines how a
formatted notification is actually sent. Implementations include:
- `EmailTransport` (`email.go`): Sends notifications via email using the
`emailclient` module.
- `IssueTrackerTransport` (`issuetracker.go`): Interacts with an issue
tracking system (configured for Google's Issue Tracker/Buganizer) to
create or update issues. It uses the `go/issuetracker/v1` client and
requires an API key for authentication.
- `NoopTransport` (`noop.go`): A "do nothing" implementation, useful for
disabling notifications or for testing.
4. **`NotificationDataProvider` Interface (`notification_provider.go`):** This
interface is responsible for gathering the necessary data to populate the
notification templates.
- The `defaultNotificationDataProvider` uses a `Formatter` to generate the
notification body and subject based on `RegressionMetadata`.
- `androidNotificationProvider` (`android_notification_provider.go`) is a
specialized provider for Android-specific bug reporting. It uses its own
`AndroidBugTemplateContext` which includes Android-specific details like
Build ID diff URLs. It leverages the `MarkdownFormatter` for content
generation but with Android-specific templates.
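The following is an intentionally simplified rendering of how these
abstractions compose. The real interfaces in `notify.go` take richer arguments
(commit details, alert configuration, cluster summaries, and so on); only the
composition is shown, and the names here are a sketch, not the module's
actual declarations.
```go
package notifysketch

import "context"

// RegressionMetadata stands in for the richer struct defined in
// common/notificationData.go.
type RegressionMetadata struct {
	AlertName  string
	CommitHash string
}

// Formatter turns regression metadata into a notification body and subject.
type Formatter interface {
	FormatNewRegression(ctx context.Context, md RegressionMetadata) (body, subject string, err error)
}

// Transport delivers an already-formatted notification and returns an ID
// that can be used to update it later (e.g., an issue number).
type Transport interface {
	SendNewRegression(ctx context.Context, body, subject string) (notificationID string, err error)
}

// notifier composes a Formatter and a Transport.
type notifier struct {
	formatter Formatter
	transport Transport
}

// RegressionFound formats the notification and hands it to the transport.
func (n *notifier) RegressionFound(ctx context.Context, md RegressionMetadata) error {
	body, subject, err := n.formatter.FormatNewRegression(ctx, md)
	if err != nil {
		return err
	}
	_, err = n.transport.SendNewRegression(ctx, body, subject)
	return err
}
```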
**Workflow for Sending a Notification (Simplified):**
1. A regression is detected (e.g., by the `alerter` module).
2. The `Notifier`'s `RegressionFound` method is called with details about the
regression (commit, alert configuration, cluster summary, etc.).
3. The `Notifier` (typically `defaultNotifier`) uses its
`NotificationDataProvider` to get the raw notification data (body and
subject).
- The `NotificationDataProvider` populates a context object (e.g.,
`TemplateContext` or `AndroidBugTemplateContext`).
- It then uses a `Formatter` (e.g., `MarkdownFormatter`) to execute the
appropriate template with this context, producing the final body and
subject.
4. The `Notifier` then calls its `Transport`'s `SendNewRegression` method,
passing the formatted body and subject.
5. The `Transport` implementation handles the actual sending (e.g., makes an
API call to the issue tracker or sends an email).
```
Regression Detected --> Notifier.RegressionFound(...)
|
v
NotificationDataProvider.GetNotificationDataRegressionFound(...)
|
| (Populates Context, e.g., TemplateContext)
v
Formatter.FormatNewRegressionWithContext(...)
| (Uses Go templates)
v
Formatted Body & Subject
|
v
Transport.SendNewRegression(body, subject)
|
+------------------> EmailTransport --> Email Server
|
+------------------> IssueTrackerTransport --> Issue Tracker API
|
+------------------> NoopTransport --> (Does nothing)
```
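The formatting step in the middle of this diagram amounts to executing a Go
`text/template` against a context struct. A generic sketch follows; the field
names, template text, and URL are illustrative, not the real `TemplateContext`
or default templates.
```go
package main

import (
	"os"
	"text/template"
)

// templateContext is an illustrative stand-in for the data passed to a
// notification template.
type templateContext struct {
	Alert  string
	Commit string
	URL    string
}

const subjectTmpl = `Regression found for {{ .Alert }} at {{ .Commit }} ({{ .URL }})`

func main() {
	t := template.Must(template.New("subject").Parse(subjectTmpl))
	_ = t.Execute(os.Stdout, templateContext{
		Alert:  "memory regression on arch=x86",
		Commit: "abc123",
		URL:    "https://example.com/explore?keys=XYZ", // placeholder link
	})
}
```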
**Key Files and Responsibilities:**
- **`notify.go`**:
- Defines the core interfaces: `Notifier`, `Formatter`, `Transport`.
- Provides the `defaultNotifier` implementation, which orchestrates the
notification process by composing a `NotificationDataProvider`,
`Formatter`, and `Transport`.
- Contains the `New()` factory function that constructs the appropriate
`Notifier` based on the `NotifyConfig`. This is the main entry point for
creating a notifier.
- Defines `TemplateContext` used by generic formatters.
- Includes logic in `getRegressionMetadata` to fetch additional
information like source file links from `TraceStore` if the alert is for
an individual trace.
- **`notification_provider.go`**:
- Defines the `NotificationDataProvider` interface.
- Provides `defaultNotificationDataProvider` which uses a generic
`Formatter`.
- The purpose is to abstract the data gathering logic for notifications,
allowing for different data providers (like the Android-specific one)
without changing the core `Notifier` or `Transport` mechanisms.
- **`android_notification_provider.go`**:
- Implements `NotificationDataProvider` specifically for Android bug
creation.
- Uses `AndroidBugTemplateContext` to provide Android-specific data to
templates, such as `GetBuildIdUrlDiff` for generating links to compare
Android build CLs.
- Relies on `MarkdownFormatter` but configures it with Android-specific
notification templates defined in the `NotifyConfig`. This allows
Android teams to customize their bug reports.
- **`markdown.go` & `html.go`**:
- Implement the `Formatter` interface for Markdown and HTML respectively.
- Define default templates for new regressions and when regressions go
missing.
- `MarkdownFormatter` can be configured with custom templates via
`NotifyConfig`. It also provides a `buildIDFromSubject` template
function, specifically designed for Android's commit message format, to
extract build IDs.
- `viewOnDashboard` is a utility function to construct a URL to the Perf
explore page for the given regression.
- **`email.go` & `issuetracker.go` & `noop.go`**:
- Implement the `Transport` interface.
- `email.go`: Uses `emailclient` to send emails. Splits
comma/space-separated recipient lists.
- `issuetracker.go`: Interacts with the Google Issue Tracker API. It
requires API key secrets (configured via `NotifyConfig`) and uses OAuth2
for authentication. It can create new issues and update existing ones
(e.g., to mark them obsolete).
- `noop.go`: A null implementation for disabling notifications.
- **`chromeperfnotifier.go`**:
- Implements the `Notifier` interface directly, without using the
`Formatter` or `Transport` abstractions in the same way as
`defaultNotifier`. This is because it communicates directly with the
Chrome Performance Dashboard's Anomaly API.
- It translates Perf's regression data into the format expected by the
Chromeperf API (`ReportRegression`).
- Includes logic (`isParamSetValid`, `getTestPath`) to ensure the data
conforms to Chromeperf's requirements (e.g., specific param keys like
`master`, `bot`, `benchmark`, `test`).
- Determines if a regression is an improvement based on the
`improvement_direction` parameter and the step direction.
- **`commitrange.go`**:
- Provides `URLFromCommitRange`, a utility function to generate a URL for
a commit or a range of commits. If a `commitRangeURLTemplate` is
provided (e.g., via configuration), it will be used to create a URL
showing the diff between two commits. Otherwise, it defaults to the
individual commit's URL. This is used by formatters to create links in
notifications.
- **`common/notificationData.go`**:
- Defines `NotificationData` (simple struct for body and subject) and
`RegressionMetadata` (a comprehensive struct holding all relevant
information about a regression needed for notification generation). This
promotes a common data structure for passing regression details.
**Configuration and Customization (`NotifyConfig`):**
The behavior of the `notify` module is heavily influenced by
`config.NotifyConfig`. This configuration allows users to:
- Choose the notification type (`Notifications` field): `None`, `HTMLEmail`,
`MarkdownIssueTracker`, `ChromeperfAlerting`, `AnomalyGrouper`.
- Specify the `NotificationDataProvider`: `DefaultNotificationProvider` or
`AndroidNotificationProvider`.
- Customize the subject and body of notifications using Go templates
(`Subject`, `Body`, `MissingSubject`, `MissingBody`). This is particularly
relevant for `MarkdownFormatter` and `androidNotificationProvider`.
- Provide settings for `IssueTrackerTransport` (API key secret locations).
This design allows for flexibility in how notifications are generated and
delivered, catering to different needs and integrations. For instance, the
Android team can have highly customized bug reports, while other users might
prefer standard email notifications. The `ChromeperfNotifier` demonstrates a
direct integration with another system, bypassing some of the general-purpose
formatting/transport layers when a specific API is targeted.
# Module: /go/notifytypes
## Perf Notifytypes Module
### Overview
The `notifytypes` module in Perf defines the various types of notification
mechanisms that can be triggered in response to performance regressions or other
significant events. It also defines types for data providers that supply the
necessary information for these notifications. This module serves as a central
point for enumerating and categorizing notification strategies, enabling
flexible and extensible notification handling within the Perf system.
### Why: Design Decisions
The primary goal of this module is to provide a structured and type-safe way to
manage notification types.
- **Extensibility:** By defining notification types as constants of a custom
`Type` string, new notification methods can be easily added in the future
without requiring significant code changes in consuming modules. This
promotes loose coupling and allows the notification system to evolve
independently.
- **Clarity and Readability:** Using named constants (e.g., `HTMLEmail`,
`MarkdownIssueTracker`) instead of raw strings makes the code more
self-documenting and reduces the likelihood of errors due to typos.
- **Centralized Definition:** Having all notification types defined in one
place simplifies maintenance and provides a clear overview of the available
notification options.
- **Separation of Concerns:** The `NotificationDataProviderType` allows for
different sources or formats of data to be used for generating
notifications, separating the concern of _what_ data is needed from _how_
the notification is delivered. This is crucial, for example, when different
platforms (like Android) might require specific data formatting or
additional information.
### How: Implementation Choices
- **`Type` (string alias):** The `Type` is defined as an alias for `string`.
This allows for string-based storage and transmission of notification types
(e.g., in configuration files or database entries) while still providing a
degree of type safety within Go code.
- **Constants for Notification Types:** Specific notification mechanisms are
defined as constants of type `Type`. This ensures that only valid,
predefined notification types can be used.
- `HTMLEmail`: Indicates notifications sent as HTML-formatted emails. This
is suitable for rich content and direct user communication.
- `MarkdownIssueTracker`: Represents notifications formatted in Markdown,
intended for integration with issue tracking systems. This facilitates
automated ticket creation or updates.
- `ChromeperfAlerting`: Specifies that regression data should be sent to
the Chromeperf alerting system. This allows for integration with a
specialized alerting infrastructure.
- `AnomalyGrouper`: Designates that regressions should be processed by an
anomaly grouping logic, which then determines the appropriate action.
This enables more sophisticated handling of multiple related anomalies.
- `None`: A special type indicating that no notification should be sent.
This is useful for disabling notifications in certain contexts or for
configurations where alerting is not desired.
- **`AllNotifierTypes` Slice:** This public variable provides a convenient way
for other parts of the system to iterate over or validate against all known
notification types.
- **`NotificationDataProviderType` (string alias):** Similar to `Type`, this
defines the kind of data provider to use for notifications.
- `DefaultNotificationProvider`: Represents the standard or default data
provider.
- `AndroidNotificationProvider`: Indicates a specialized data provider
tailored for Android-specific notification requirements. This might
involve fetching different metrics, formatting data in a particular way,
or including Android-specific metadata.
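Put together, the module's declarations amount to roughly the following
sketch. The identifiers match the description above; the underlying string
values shown here are illustrative, not necessarily the exact ones used.
```go
package notifytypes

// Type names a notification mechanism.
type Type string

const (
	HTMLEmail            Type = "html_email"
	MarkdownIssueTracker Type = "markdown_issuetracker"
	ChromeperfAlerting   Type = "chromeperf"
	AnomalyGrouper       Type = "anomaly_grouper"
	None                 Type = "none"
)

// AllNotifierTypes lists every valid Type, e.g. for UI display or validation.
var AllNotifierTypes = []Type{HTMLEmail, MarkdownIssueTracker, ChromeperfAlerting, AnomalyGrouper, None}

// NotificationDataProviderType names the data provider used to populate
// notification templates.
type NotificationDataProviderType string

const (
	DefaultNotificationProvider NotificationDataProviderType = "default"
	AndroidNotificationProvider NotificationDataProviderType = "android"
)
```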
### Responsibilities and Key Components
- **`notifytypes.go`:** This is the sole file in the module and contains all
the definitions.
- **Defines Notification Types:** Its primary responsibility is to
enumerate the supported notification mechanisms (`HTMLEmail`,
`MarkdownIssueTracker`, `ChromeperfAlerting`, `AnomalyGrouper`, `None`).
This acts as a contract for other modules that implement or consume
notification functionalities.
- **Defines Data Provider Types:** It also defines the types of data
providers (`DefaultNotificationProvider`, `AndroidNotificationProvider`)
that can be used to source information for notifications. This allows
the notification system to adapt to different data sources or formats.
- **Provides an Exhaustive List:** The `AllNotifierTypes` variable makes
it easy for other components to get a list of all valid notification
types, for example, for display in a UI or for validation purposes.
### Key Workflows/Processes
While this module itself doesn't implement workflows, it underpins them. A
typical conceptual workflow where these types would be used is:
1. **Regression Detected:** The Perf system identifies a performance
   regression.
2. **Configuration Checked:** The system checks the configuration associated
   with the metric/test that regressed. This configuration specifies a
   `notifytypes.Type` (e.g., `HTMLEmail`).
3. **Notifier Selected:** Based on the `notifytypes.Type` from the
   configuration, the appropriate notifier implementation is selected.
4. **Data Provider Selected (if applicable):** If the configuration also
   specifies a `notifytypes.NotificationDataProviderType` (e.g.,
   `AndroidNotificationProvider`), the corresponding data provider is chosen.
5. **Notification Sent:** The selected notifier uses the data (potentially from
   the selected data provider) to construct and send the notification (e.g.,
   an email is delivered).
For example, if a regression is detected for an Android benchmark and the
configuration specifies `HTMLEmail` as the `Type` and
`AndroidNotificationProvider` as the `NotificationDataProviderType`:
`Regression Event` -> `Config: {Type: HTMLEmail, DataProvider:
AndroidNotificationProvider}` -> `Select EmailNotifier` -> `Select
AndroidDataProvider` -> `AndroidDataProvider fetches data` -> `EmailNotifier
formats and sends HTML email`
# Module: /go/perf-tool
The `perf-tool` module provides a command-line interface (CLI) for interacting
with various aspects of the Perf performance monitoring system. It allows
developers and administrators to manage configurations, inspect data, perform
database maintenance tasks, and validate ingestion files.
The primary motivation behind `perf-tool` is to offer a centralized and
scriptable way to perform common Perf operations that would otherwise require
manual intervention or direct database interaction. This simplifies workflows
and enables automation of routine tasks.
The core functionality is organized into subcommands, each addressing a specific
area of Perf:
- **`config`**: Manages Perf instance configurations.
- `create-pubsub-topics-and-subscriptions`: Sets up the necessary Google
Cloud Pub/Sub topics and subscriptions required for data ingestion. This
is crucial for ensuring that Perf instances can receive and process
performance data.
- `validate`: Checks the syntax and validity of a Perf instance
configuration file. This helps prevent deployment of misconfigured
instances.
- **`tiles`**: Interacts with the tiled data storage used by Perf's
`tracestore`. Tiles are segments of time-series data.
- `last`: Displays the index of the most recent tile, providing insight
into the current state of data ingestion.
- `list`: Shows a list of recent tiles and the number of traces they
contain, useful for understanding data volume and distribution.
- **`traces`**: Allows querying and exporting trace data.
- `list`: Retrieves and displays the IDs of traces that match a given
query within a specific tile. This is useful for ad-hoc data
exploration.
- `export`: Exports trace data matching a query and commit range to a JSON
file. This enables external analysis or data migration.
- **`ingest`**: Manages the data ingestion process.
- `force-reingest`: Triggers the re-ingestion of data files from Google
Cloud Storage (GCS) for a specified time range. This is useful for
reprocessing data after configuration changes or to fix ingestion
errors. The workflow is:
* Parse start and stop time parameters.
* Iterate through configured GCS source prefixes.
* For each prefix, determine hourly GCS directories within the time range.
* List files in each directory.
* For each file, create a Pub/Sub message with the GCS object attributes
(bucket and name).
* Publish these messages to the configured ingestion topic. This simulates
the GCS notification events that trigger ingestion.
- `validate`: Validates the format and content of an ingestion file
against the expected schema and parsing rules. This helps ensure data
quality before ingestion.
- **`database`**: Provides tools for backing up and restoring Perf database
components. This is critical for disaster recovery and data migration.
- `backup`:
- `alerts`: Backs up alert configurations to a zip file.
- `shortcuts`: Backs up saved shortcut configurations to a zip file.
- `regressions`: Backs up regression data (detected performance changes)
and associated shortcuts to a zip file. It backs up data up to a
specified date (defaulting to four weeks ago). The process involves
iterating backward through commits in batches, fetching regressions for
each commit range, and storing them along with any shortcuts referenced
in those regressions.
- `restore`:
- `alerts`: Restores alert configurations from a backup file.
- `shortcuts`: Restores shortcut configurations from a backup file.
- `regressions`: Restores regression data and their associated shortcuts
from a backup file. It's important to note that restoring regressions
also attempts to re-create the associated shortcuts.
- **`trybot`**: Contains experimental functionality related to trybot
(pre-submit testing) data.
- `reference`: Generates a synthetic nanobench reference file. This file
is constructed by loading a specified trybot results file, identifying
all trace IDs within it, and then fetching historical sample data for
these traces from the main Perf instance (specifically, from the last N
ingested files). The aggregated historical samples are then formatted
into a new nanobench JSON file. This allows for comparing trybot results
against a baseline derived from recent production data using tools like
`nanostat`.
- **`markdown`**: Generates Markdown documentation for the `perf-tool` CLI
itself.
The `main.go` file sets up the CLI application using the `urfave/cli` library.
It defines flags, commands, and subcommands, and maps them to corresponding
functions in the `application` package. It handles flag parsing, configuration
loading (from a file, with optional connection string overrides), and
initialization of logging.
The `application/application.go` file defines the `Application` interface and
its concrete implementation `app`. This interface abstracts the core logic for
each command, promoting testability and separation of concerns. The `app` struct
implements methods that interact with various Perf components like `tracestore`,
`alertStore`, `shortcutStore`, `regressionStore`, and GCS.
Key design choices include:
- **Interface-based application logic (`Application` interface):** This allows
for mocking the application logic during testing (as seen in `main_test.go`
and `application/mocks/Application.go`), ensuring that the CLI command
parsing and flag handling can be tested independently of the actual backend
operations.
- **Configuration-driven:** Most operations require an instance configuration
file (`--config_filename`), which defines data store connections, GCS
sources, etc. This makes the tool adaptable to different Perf deployments.
- **Use of helper builders:** Functions from `perf/go/builders` are used to
instantiate components like `TraceStore`, `AlertStore`, etc., based on the
provided instance configuration. This centralizes component creation logic.
- **Zip format for backups:** Database backups for alerts, shortcuts, and
regressions are stored in zip files. Inside these zip files, data is
typically serialized using `encoding/gob`. This provides a simple and
portable backup solution.
- **Batching for large operations:** When backing up regressions, data is
fetched in batches of commits (`regressionBatchSize`) to manage memory and
avoid overwhelming the database.
- **Pub/Sub for re-ingestion:** The `ingest force-reingest` command leverages
Pub/Sub by publishing messages that mimic GCS notifications, effectively
triggering the standard ingestion pipeline.
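A minimal sketch of that last point, not perf-tool's actual code: it publishes
one synthetic ingestion event per GCS file, assuming the ingestion pipeline
consumes standard GCS-notification-style attributes (`bucketId`/`objectId`).
The project, topic, bucket, and object names are placeholders.
```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()
	client, err := pubsub.NewClient(ctx, "my-gcp-project") // placeholder project
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	topic := client.Topic("perf-ingestion-topic") // placeholder topic

	// One message per GCS object found under the configured source prefix.
	objects := []string{"ingest/2024/01/02/03/nanobench/results.json"} // placeholder
	for _, name := range objects {
		res := topic.Publish(ctx, &pubsub.Message{
			Attributes: map[string]string{
				"bucketId": "my-perf-bucket", // placeholder bucket
				"objectId": name,
			},
		})
		if _, err := res.Get(ctx); err != nil {
			log.Fatalf("publish failed: %v", err)
		}
	}
}
```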
The `application/mocks/Application.go` file contains a mock implementation of
the `Application` interface, generated by the `mockery` tool. This is used in
`main_test.go` to test the command-line argument parsing and dispatch logic
without actually performing the underlying operations.
# Module: /go/perfclient
The `perfclient` module provides an interface for sending performance data to
Skia Perf's ingestion system. The primary goal of this module is to abstract the
complexities of interacting with Google Cloud Storage (GCS), which is the
underlying mechanism Perf uses for data ingestion. By providing a dedicated
client, it simplifies the process for other applications and services that need
to report performance metrics.
The core design centers around a `ClientInterface` and its concrete
implementation, `Client`. This approach allows for easy mocking and testing,
promoting loose coupling between the `perfclient` and its consumers.
**Key Components and Responsibilities:**
- **`perf_client.go`**:
- **`ClientInterface`**: This interface defines the contract for pushing
performance data. The key method is `PushToPerf`. The decision to use an
interface here is crucial for testability and dependency injection. It
allows consumers to use a real GCS-backed client in production and a
mock client in tests.
- **`Client`**: This struct is the concrete implementation of
`ClientInterface`. It holds a `gcs.GCSClient` instance, which is
responsible for the actual communication with Google Cloud Storage, and
a `basePath` string that specifies the root directory within the GCS
bucket where performance data will be stored. The constructor `New`
takes these as arguments, allowing users to configure the GCS bucket and
the top-level folder for their data.
- **`PushToPerf` method**: This is the workhorse of the module.
* It takes a `time.Time` object (`now`), a `folderName`, a `filePrefix`,
and a `format.BenchData` struct (which represents the performance
metrics).
* The `format.BenchData` is first marshaled into a JSON string. This is
the standard format Perf expects for ingestion.
* The JSON data is then compressed using `gzip`. This is a performance
optimization, as GCS can automatically decompress gzipped files with the
correct `ContentEncoding` header, reducing storage costs and transfer
times.
* A deterministic GCS object path is constructed using the `objectPath`
helper function. This path incorporates the `basePath`, the current
timestamp (formatted as `YYYY/MM/DD/HH/`), the `folderName`, and a
filename composed of the `filePrefix`, an MD5 hash of the JSON data, and
a millisecond-precision timestamp. The inclusion of the MD5 hash helps
in avoiding duplicate uploads of identical data and can be useful for
debugging or data verification. The timestamp in the path and filename
ensures that data from different runs or times are stored separately and
can be easily queried.
* Finally, the compressed data is uploaded to GCS using the
`storageClient.SetFileContents` method. Crucially, it sets
`ContentEncoding: "gzip"` and `ContentType: "application/json"` in the
`gcs.FileWriteOptions`. This metadata informs GCS about the compression
and data type, enabling features like automatic decompression.
- **`objectPath` function**: This helper function is responsible for
constructing the unique GCS path for each performance data file. The
rationale for this specific path structure
(`basePath/YYYY/MM/DD/HH/folderName/filePrefix_hash_timestamp.json`) is
to organize data chronologically and by task, making it easier to
browse, query, and manage within GCS. The hash ensures uniqueness and
integrity.
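    A sketch of this path construction and the gzip handling follows this
    list.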
- **`mock_perf_client.go`**:
- **`MockPerfClient`**: This provides a mock implementation of
`ClientInterface` using the `testify/mock` library. This is essential
for unit testing components that depend on `perfclient` without
requiring actual GCS interaction. It allows developers to define
expected calls to `PushToPerf` and verify that their code interacts with
the client correctly. The `NewMockPerfClient` constructor returns a
pointer to ensure that the methods provided by `mock.Mock` (like `On`
and `AssertExpectations`) are accessible.
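The path construction and compression described under `perf_client.go` above
can be sketched as follows. The filename layout and timestamp format are
approximations of the documented scheme, not the module's literal code.
```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/md5"
	"fmt"
	"path"
	"time"
)

// objectPath builds a GCS object name of roughly the documented shape:
// basePath/YYYY/MM/DD/HH/folderName/filePrefix_<md5>_<millis>.json
func objectPath(basePath string, now time.Time, folderName, filePrefix string, jsonBody []byte) string {
	hash := fmt.Sprintf("%x", md5.Sum(jsonBody))
	name := fmt.Sprintf("%s_%s_%d.json", filePrefix, hash, now.UnixMilli())
	return path.Join(basePath, now.UTC().Format("2006/01/02/15"), folderName, name)
}

// gzipJSON compresses the JSON payload so it can be uploaded with
// ContentEncoding: "gzip" and ContentType: "application/json".
func gzipJSON(jsonBody []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(jsonBody); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	body := []byte(`{"results": {}}`)
	gz, err := gzipJSON(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(objectPath("perf-data", time.Now(), "my-task", "bench", body), len(gz))
}
```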
**Workflow: Pushing Performance Data**
The primary workflow involves a client application using `perfclient` to send
performance data:
```
Client App perfclient.Client gcs.GCSClient
| | |
| -- Call PushToPerf(now, | |
| folder, prefix, data) ->| |
| | -- Marshal data to JSON |
| | -- Compress JSON (gzip) |
| | -- Construct GCS objectPath |
| | (includes time, folder, |
| | prefix, data hash) |
| | |
| | -- Call SetFileContents(path, |
| | options, compressed_data) -> |
| | | -- Upload to GCS
| | | with gzip encoding
| | | and JSON content type
| | <-------------------------------| -- Return success/error
| <--------------------------| |
| -- Receive success/error | |
```
The design emphasizes creating a clear separation of concerns: the `perfclient`
handles the formatting, compression, and path generation logic specific to
Perf's ingestion requirements, while the underlying `gcs.GCSClient` handles the
raw GCS communication. This makes the `perfclient` a focused and reusable
component for any system needing to integrate with Skia Perf.
# Module: /go/perfresults
## Module Overview
The `perfresults` module is responsible for fetching, parsing, and processing
performance results data generated by Telemetry-based benchmarks in the Chromium
project. This data typically resides in `perf_results.json` files. The module
provides functionalities to:
1. **Load Performance Data**: Retrieve performance results from various
sources, primarily Buildbucket builds. This involves interacting with
Buildbucket to get build information, Swarming to identify relevant tasks
and their outputs, and RBE-CAS (Content Addressable Storage) to download the
actual `perf_results.json` files.
2. **Parse Performance Data**: Interpret the structure of `perf_results.json`
files. These files contain sets of histograms, where each histogram
represents a specific benchmark measurement. The parser extracts these
histograms and associated metadata.
3. **Process and Transform Data**: Convert the parsed performance data into a
format suitable for ingestion by other systems, such as the Perf ingestion
pipeline. This includes aggregating histogram samples (e.g., calculating
mean, max, min) and structuring the data according to a defined schema.
The primary goal is to provide a reliable and efficient way to access and
utilize Chromium's performance data for analysis and monitoring.
## Design Decisions and Implementation Choices
### Data Loading Workflow
The process of loading performance results from a Buildbucket build involves
several steps:
`Buildbucket ID -> BuildInfo -> Swarming Task ID -> Child Swarming Task IDs ->
CAS Outputs -> PerfResults`
1. **Buildbucket Interaction (`buildbucket.go`)**:
- **Why**: Buildbucket is the entry point for CI/CQ builds. It contains
information about the build, including the associated Swarming task and
crucial metadata like git revision and commit position.
- **How**: The `bbClient` interacts with the Buildbucket PRPC API to fetch
build details using a given `buildID`. It specifically requests fields
like `builder`, `status`, `infra.backend.task.id` (for the Swarming task
ID), `output.properties` (for git revision information), and
`input.properties` (for `perf_dashboard_machine_group`).
- The `BuildInfo` struct is populated with this information, providing a
consolidated view of the build's context. The `GetPosition()` method on
`BuildInfo` is crucial as it determines the commit identifier (either
commit position or git hash) used for associating the performance data
with a specific point in the codebase.
2. **Swarming Interaction (`swarming.go`)**:
- **Why**: The main Buildbucket task often spawns multiple child Swarming
tasks, each running a subset of benchmarks. We need to identify all
these child tasks to gather all performance results.
- **How**: The `swarmingClient` uses the Swarming PRPC API.
- `findChildTaskIds`: Given a parent Swarming task ID (obtained from
`BuildInfo`), this function lists all child tasks by querying for
tasks with a matching `parent_task_id` tag. The query is scoped by
the parent task's creation and completion timestamps to narrow down
the search.
- `findTaskCASOutputs`: For each child task ID, this function
retrieves the task result, specifically looking for the
`CasOutputRoot`. This reference points to the RBE-CAS location where
the task's output files (including `perf_results.json`) are stored.
3. **RBE-CAS Interaction (`rbecas.go`)**:
- **Why**: `perf_results.json` files are stored in RBE-CAS. RBE-CAS
provides efficient and reliable storage for large build artifacts.
- **How**: The `RBEPerfLoader` uses the RBE SDK to interact with CAS.
- `fetchPerfDigests`: Given a CAS reference (pointing to the root
directory of a task's output), this function:
* Reads the root `Directory` proto.
* Retrieves the entire directory tree using `GetDirectoryTree`.
* Flattens the tree to get a map of file paths to their digests.
* Filters for files named `perf_results.json`. The path structure is
expected to be `benchmark_name/perf_results.json`, allowing
association of results with a specific benchmark.
- `loadPerfResult`: Given a digest for a `perf_results.json` file,
this reads the blob from CAS and parses it using `NewResults`.
- `LoadPerfResults`: This orchestrates the loading for multiple CAS
references (from multiple child Swarming tasks). It iterates through
each CAS reference, fetches the digests of `perf_results.json`
files, loads each file, and then merges results from the same
benchmark. Merging is important because a single benchmark might
have its results split across multiple files or tasks.
4. **Orchestration (`perf_loader.go`)**:
- **Why**: A central loader is needed to tie together the interactions
with Buildbucket, Swarming, and RBE-CAS.
- **How**: The `loader.LoadPerfResults` method coordinates the entire
workflow:
1. Initializes `bbClient` to get `BuildInfo`.
2. Initializes `swarmingClient` to find child task IDs and then their
CAS outputs.
3. It performs a sanity check (`checkCasInstances`) to ensure all CAS
outputs come from the same RBE instance, simplifying client
initialization.
4. Initializes `RBEPerfLoader` (via `rbeProvider` for testability) for
the determined CAS instance.
5. Calls `RBEPerfLoader.LoadPerfResults` with the list of CAS
references to fetch and parse all `perf_results.json` files.
- The use of `rbeProvider` is a good example of dependency injection,
allowing tests to mock the RBE-CAS interaction.
### Performance Data Parsing (`perf_results_parser.go`)
- **Why**: `perf_results.json` files have a specific, somewhat complex
structure. A dedicated parser is needed to extract meaningful data
(histograms and their metadata).
- **How**:
- The `PerfResults` struct is the main container, holding a map of
`TraceKey` to `Histogram`.
- `TraceKey` uniquely identifies a trace, composed of `ChartName` (metric
name), `Unit`, `Story` (user journey/test case), `Architecture`, and
`OSName`. These fields are extracted from the histogram's own properties
and its associated "diagnostics" which are references to other metadata
objects within the JSON file.
- `Histogram` stores the `SampleValues` (the actual measurements).
- **Streaming JSON Decoding**: `NewResults` uses `json.NewDecoder` to
process the input `io.Reader` in a streaming fashion.
- **Why Streaming?**: `perf_results.json` files can be very large (10MB+).
Reading the entire file into memory before parsing would be inefficient
and could lead to high memory usage. Streaming allows processing the
JSON array element by element.
- **Implementation**:
1. It first expects and consumes the opening `[` of the JSON array.
2. It then iterates while `decoder.More()` is true, decoding each
element into a `singleEntry` struct.
3. `singleEntry` is a union-like struct that can hold different types
of objects found in the JSON (histograms, generic sets, date ranges,
related name maps). This is determined by checking fields like
`Name` (present for histograms) or `Type`.
4. If an entry is a histogram (`entry.Name != ""`), it's converted to
`TraceKey` and `Histogram` via
`histogramRaw.asTraceKeyAndHistogram`. This conversion involves
looking up GUIDs from the histogram's `Diagnostics` map in a locally
maintained `metadata` map (`md`).
5. Other entry types (`GenericSet`, `DateRange`, `RelatedNameMap`) are
stored in the `md` map, keyed by their `GUID`, so they can be
referenced by histograms later in the stream.
6. Parsed histograms are merged into `pr.Histograms`. If a `TraceKey`
already exists, sample values are appended.
7. Finally, it consumes the closing `]` of the JSON array.
- **Aggregation**: The `Histogram` type provides methods for common
aggregations (Min, Max, Mean, Stddev, Sum, Count). `AggregationMapping`
provides a convenient way to access these aggregation functions by
string keys, which is used by downstream consumers like the ingestion
module.
- **Legacy `UnmarshalJSON`**: An `UnmarshalJSON` method exists, which
reads the entire byte slice into memory. This is less efficient and
marked for deprecation in favor of `NewResults`.
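A minimal sketch of the streaming approach described above, assuming a simplified entry shape (the real `singleEntry`, diagnostics lookup, and GUID-keyed metadata handling are more involved):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// entry is a simplified stand-in for the union-like singleEntry struct:
// a histogram carries a Name, while other metadata objects carry a GUID/Type.
type entry struct {
	Name         string    `json:"name"`
	Unit         string    `json:"unit"`
	SampleValues []float64 `json:"sampleValues"`
	GUID         string    `json:"guid"`
	Type         string    `json:"type"`
}

// decodeStream reads a JSON array element by element instead of loading the
// whole document into memory.
func decodeStream(r io.Reader) (map[string][]float64, error) {
	dec := json.NewDecoder(r)
	// Consume the opening '[' of the array.
	if _, err := dec.Token(); err != nil {
		return nil, err
	}
	histograms := map[string][]float64{}
	for dec.More() {
		var e entry
		if err := dec.Decode(&e); err != nil {
			return nil, err
		}
		if e.Name != "" {
			// Merge samples for histograms that share the same name.
			histograms[e.Name] = append(histograms[e.Name], e.SampleValues...)
		}
		// Non-histogram entries (generic sets, date ranges, ...) would be
		// stored in a GUID-keyed metadata map here.
	}
	// Consume the closing ']' of the array.
	if _, err := dec.Token(); err != nil {
		return nil, err
	}
	return histograms, nil
}

func main() {
	input := `[{"name":"timeToFirstPaint","unit":"ms","sampleValues":[1.5,2.0]},
	           {"guid":"abc","type":"GenericSet"}]`
	h, err := decodeStream(strings.NewReader(input))
	fmt.Println(h, err)
}
```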
### Data Ingestion Preparation (`ingest/`)
This submodule focuses on transforming the parsed `PerfResults` into the
`format.Format` structure required by the Perf ingestion system.
- **`json.go` (`ConvertPerfResultsFormat`)**:
- **Why**: The raw `PerfResults` structure is not directly ingestible. It
needs to be reshaped.
- **How**:
* Iterates through each `(TraceKey, Histogram)` pair in the input
`PerfResults`.
* For each pair, it creates a `format.Result`. The `Key` map within
`format.Result` is populated from `TraceKey` fields (chart, unit, story,
arch, os).
* The `Measurements` map within `format.Result` is populated by calling
`toMeasurement` on the `Histogram`.
* `toMeasurement` iterates through `perfresults.AggregationMapping`,
applying each aggregation function to the histogram's samples. Each
resulting aggregation (e.g., "max", "mean") becomes a
`format.SingleMeasurement` with the aggregation type as its `Value` and
the computed metric as its `Measurement`.
* The final `format.Format` object includes the version, commit hash
(`GitHash`), and any provided headers and links.
- **`gcs.go`**:
- **Why**: Provides utilities for determining the correct Google Cloud
Storage (GCS) path where the transformed JSON files should be stored.
This is based on conventions used by the Perf ingestion system.
- **How**:
- `convertPath`: Constructs a GCS path like
`gs://<bucket>/ingest/<time_path>/<build_info_path>/<benchmark>`.
- `convertTime`: Formats a `time.Time` into `YYYY/MM/DD/HH` (UTC).
- `convertBuildInfo`: Formats `BuildInfo` into
`<MachineGroup>/<BuilderName>`. It defaults `MachineGroup` to
"ChromiumPerf" and `BuilderName` to "BuilderNone" if they are empty.
- `isInternal`: Determines if the results are internal or public based on
the `BuilderName`. It checks against a list of known external bot
configurations (`pinpoint/go/bot_configs`). If not found, it defaults to
internal. This determines whether `PublicBucket` (`chrome-perf-public`)
or `InternalBucket` (`chrome-perf-non-public`) is used.
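A hedged sketch of this path construction; the bucket name, defaults, and helper shape mirror the description above but are not the actual `gcs.go` functions:

```go
package main

import (
	"fmt"
	"time"
)

// gcsPath assembles an ingestion path of the form
// gs://<bucket>/ingest/<YYYY/MM/DD/HH>/<MachineGroup>/<BuilderName>/<benchmark>.
func gcsPath(bucket string, t time.Time, machineGroup, builderName, benchmark string) string {
	if machineGroup == "" {
		machineGroup = "ChromiumPerf" // default per the description above
	}
	if builderName == "" {
		builderName = "BuilderNone"
	}
	timePath := t.UTC().Format("2006/01/02/15") // YYYY/MM/DD/HH in UTC
	return fmt.Sprintf("gs://%s/ingest/%s/%s/%s/%s",
		bucket, timePath, machineGroup, builderName, benchmark)
}

func main() {
	t := time.Date(2024, 5, 17, 9, 30, 0, 0, time.UTC)
	fmt.Println(gcsPath("chrome-perf-public", t, "", "", "speedometer3"))
	// gs://chrome-perf-public/ingest/2024/05/17/09/ChromiumPerf/BuilderNone/speedometer3
}
```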
## Key Components and Files
- **`perf_loader.go`**: Orchestrates the loading of performance results from
Buildbucket. `NewLoader().LoadPerfResults()` is the main entry point.
- **`buildbucket.go`**: Handles interaction with the Buildbucket API to fetch
build metadata. Defines `BuildInfo`.
- **`swarming.go`**: Handles interaction with the Swarming API to find child
tasks and their CAS outputs.
- **`rbecas.go`**: Handles interaction with RBE-CAS to download and parse
`perf_results.json` files. Defines `RBEPerfLoader`.
- **`perf_results_parser.go`**: Parses the content of `perf_results.json`
files. Defines `PerfResults`, `TraceKey`, `Histogram`, and the streaming
`NewResults` parser.
- **`ingest/json.go`**: Transforms parsed `PerfResults` into the
`format.Format` structure for ingestion.
- **`ingest/gcs.go`**: Provides utilities to determine GCS paths for storing
transformed results.
- **`cli/main.go`**: A command-line interface utility that uses the
`perfresults` library to fetch results for a given Buildbucket ID and
outputs them as JSON files in the ingestion format. This serves as a
practical example and a tool for ad-hoc data retrieval.
- **`testdata/`**: Contains JSON files used for replaying HTTP and gRPC
interactions during tests (`*.json`, `*.rpc`), and sample
`perf_results.json` files for parser testing. `replay_test.go` sets up the
replay mechanism.
## Workflows
### Primary Workflow: Loading Perf Results from Buildbucket
```
User/System --Buildbucket ID--> perf_loader.LoadPerfResults()
|
+--> buildbucket.findBuildInfo() --PRPC call--> Buildbucket API
| (Returns BuildInfo: Swarming Task ID, Git Revision, Machine Group, etc.)
|
+--> swarming.findChildTaskIds() --PRPC call--> Swarming API (using Parent Task ID)
| (Returns list of Child Swarming Task IDs)
|
+--> swarming.findTaskCASOutputs() --PRPC calls--> Swarming API (for each Child Task ID)
| (Returns list of CASReference objects)
|
(Error if CAS instances differ for CASReferences)
|
+--> rbecas.RBEPerfLoader.LoadPerfResults() (with list of CASReferences)
|
+--> For each CASReference:
| |
| +--> rbecas.fetchPerfDigests() --RBE SDK calls--> RBE-CAS
| | (Returns map of benchmark_name to digest of perf_results.json)
| |
| +--> For each (benchmark_name, digest):
| |
| +--> rbecas.loadPerfResult() --RBE SDK call (ReadBlob)--> RBE-CAS
| | |
| | +--> perf_results_parser.NewResults() (Parses JSON stream)
| | (Returns PerfResults object for this file)
| |
| +--> (Merge with existing PerfResults for the same benchmark_name)
|
(Returns map[benchmark_name]*PerfResults and BuildInfo)
```
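The merge step at the bottom of the diagram amounts to appending sample values for trace keys that have already been seen; a simplified sketch using plain strings and float slices in place of the real `TraceKey`/`Histogram` types:

```go
package main

import "fmt"

// mergeResults folds histograms parsed from one perf_results.json file into
// the accumulated results for a benchmark, appending samples when a trace
// key has already been seen in an earlier file or task.
func mergeResults(accumulated, incoming map[string][]float64) {
	for key, samples := range incoming {
		accumulated[key] = append(accumulated[key], samples...)
	}
}

func main() {
	acc := map[string][]float64{"timeToFirstPaint": {1.2}}
	mergeResults(acc, map[string][]float64{"timeToFirstPaint": {1.4, 1.6}})
	fmt.Println(acc) // map[timeToFirstPaint:[1.2 1.4 1.6]]
}
```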
### CLI Workflow: Fetching and Converting Perf Results
```
CLI User --Build ID, Output Dir--> cli/main.main()
|
+--> perfresults.NewLoader().LoadPerfResults(Build ID)
| (Executes the Primary Workflow described above)
| (Returns BuildInfo, map[benchmark]*PerfResults)
|
+--> For each (benchmark, perfResult) in results:
|
+--> ingest.ConvertPerfResultsFormat(perfResult, buildInfo.GetPosition(), headers, links)
| (Transforms PerfResults to ingest.Format)
|
+--> Marshal ingest.Format to JSON
|
+--> Write JSON to output file: <outputDir>/<benchmark>_<BuildID>.json
|
+--> Print output filename to stdout
```
### Temporal Worker (Placeholder)
The `workflows/worker/main.go` file sets up a Temporal worker. Currently, it's a
basic skeleton that initializes a worker and connects to a Temporal server. It
doesn't register any specific activities or workflows from the `perfresults`
module itself. Its presence suggests an intention to integrate `perfresults`
functionalities into Temporal workflows in the future, possibly for automated
ingestion or processing tasks. The worker itself is a generic Temporal worker
setup.
## Testing Strategy
The module employs a robust testing strategy:
- **Unit Tests**: Each Go file generally has a corresponding `_test.go` file
with unit tests for its specific logic. For example,
`perf_results_parser_test.go` tests the JSON parsing, and
`buildbucket_test.go` tests `BuildInfo` logic.
- **Replay Testing (`replay_test.go`, `testdata/`)**:
- **Why**: Directly calling external services (Buildbucket, Swarming,
RBE-CAS) in tests makes them slow, flaky, and dependent on external
state. Replay testing records actual interactions once and then
"replays" them during subsequent test runs.
- **How**:
- HTTP interactions (with Buildbucket and Swarming PRPC servers) are
replayed using `cloud.google.com/go/httpreplay`. Recorded interactions
are stored as `.json` files in `testdata/`.
- gRPC interactions (with RBE-CAS) are replayed using
`cloud.google.com/go/rpcreplay`. Recorded interactions are stored as
gzipped `.rpc` files in `testdata/`.
- A command-line flag (`-record_path`) controls whether tests run in
replay mode (reading from `testdata/`) or record mode (writing new
replay files to the specified path). This allows updating replay files
when external APIs change or new test cases are needed.
- `setupReplay()` and `newRBEReplay()` in `replay_test.go` are helper
functions that configure the HTTP client and RBE client for either
recording or replaying.
- **Test Data (`testdata/perftest/`)**: Contains various `perf_results.json`
files (e.g., `full.json`, `empty.json`, `merged.json`) to test different
scenarios for the `perf_results_parser.go`. This ensures the parser
correctly handles different valid and edge-case inputs.
- **Example Usage as Test (`cli/main.go`)**: The CLI itself serves as an
integration test for the core loading and conversion logic. Its tests
(`perf_loader_test.go` for example) often use the replay mechanism to test
the end-to-end flow from Build ID to parsed `PerfResults`.
This combination ensures both isolated unit correctness and reliable integration
testing without external dependencies during typical test runs.
# Module: /go/perfserver
The `perfserver` module serves as the central executable for the Perf
performance monitoring system. It consolidates various essential components into
a single command-line tool, simplifying deployment and management. The primary
goal is to provide a unified entry point for running the web UI, data ingestion
processes, regression detection, and maintenance tasks. This approach avoids the
complexity of managing multiple separate services and their configurations.
The module leverages the `urfave/cli` library to define and manage sub-commands,
each corresponding to a distinct functional area of Perf. This design allows for
clear separation of concerns while maintaining a single binary. Configuration
for each sub-command is handled through flags, with the `config` package
providing structured types for these flags.
Key components and their responsibilities:
- **`main.go`**: This is the entry point of the `perfserver` executable.
- **Why**: It orchestrates the initialization and execution of the
different Perf sub-systems.
- **How**: It defines a `cli.App` with several sub-commands:
- **`frontend`**: This sub-command launches the main web user interface
for Perf.
- **Why**: To provide users with a visual way to explore performance
data, configure alerts, and view regressions.
- **How**: It initializes and runs the `frontend` component (from
`//perf/go/frontend`). Configuration is passed via
`config.FrontendFlags`. The `frontend` component itself handles
serving HTTP requests and rendering the UI.
- **`maintenance`**: This sub-command starts background maintenance tasks.
- **Why**: Certain operations, like data cleanup, schema migrations,
or periodic recalculations, are necessary for the long-term health
and efficiency of the Perf system. These tasks often need to be run
as singletons to avoid conflicts.
- **How**: It initializes and runs the `maintenance` component (from
`//perf/go/maintenance`). It first validates the instance
configuration (using `//perf/go/config/validate`) and then starts
the maintenance routines. Prometheus metrics are exposed for
monitoring.
- **`ingest`**: This sub-command runs the data ingestion process.
- **Why**: To continuously import performance data from various
sources (e.g., build artifacts, test results) and populate the
central data store (TraceStore).
- **How**: It initializes and runs the ingestion process logic (from
`//perf/go/ingest/process`). Similar to `maintenance`, it validates
the instance configuration. It supports parallel ingestion for
improved throughput. Prometheus metrics are also exposed.
- Data Ingestion Workflow:
  ```
  Configured Sources --> [Ingest Process] --Parses/Validates--> [TraceStore]
                               |                                     |
                      Handles incoming files                   Populates data
  ```
- **`cluster`**: This sub-command runs the regression detection process.
- **Why**: To automatically analyze incoming performance data against
configured alerts and identify significant performance regressions.
- **How**: Interestingly, this sub-command also utilizes the
`frontend.New` and `f.Serve()` mechanism, similar to the `frontend`
sub-command. This suggests that the regression detection logic might
be tightly coupled with or exposed through the same underlying
service framework as the main UI, potentially for sharing
configuration or common infrastructure. It uses
`config.FrontendFlags` but specifically for clustering-related
settings (indicated by `AsCliFlags(true)`).
- Regression Detection Workflow:
  ```
  [TraceStore] --New Data--> [Cluster Process] --Applies Alert Rules--> [Alerts/Notifications]
        ^                          |
        |                          | Identifies Regressions
        +--------------------------+
  ```
- **`markdown`**: A utility sub-command to generate Markdown documentation
for `perfserver` itself.
- **Why**: To provide up-to-date command-line help in a portable
format.
- **How**: It uses the `ToMarkdown()` method provided by the
`urfave/cli` library.
- **Logging**: The `Before` hook in the `cli.App` configures `sklog` to
output logs to standard output, ensuring that operational messages from
any sub-command are visible.
- **Configuration Loading**: For sub-commands like `ingest` and
`maintenance`, instance configuration is loaded from a specified file
(`ConfigFilename` flag) and validated using `//perf/go/config/validate`.
The database connection string can be overridden via a command-line
flag.
- **Metrics**: The `ingest` and `maintenance` sub-commands initialize
Prometheus metrics, allowing for monitoring of their operational health
and performance.
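A minimal sketch of this sub-command layout using `urfave/cli` (v2 API assumed); the command names, flags, and stubbed actions are illustrative and not the actual `perfserver` definitions:

```go
package main

import (
	"fmt"
	"os"

	"github.com/urfave/cli/v2"
)

func main() {
	app := &cli.App{
		Name:  "perfserver-sketch",
		Usage: "illustrative multi-command layout",
		// Before runs ahead of every sub-command; perfserver uses this hook
		// to route log output to stdout.
		Before: func(c *cli.Context) error {
			fmt.Println("configuring logging...")
			return nil
		},
		Commands: []*cli.Command{
			{
				Name:  "frontend",
				Usage: "run the web UI",
				Flags: []cli.Flag{
					&cli.StringFlag{Name: "config_filename"}, // illustrative flag
				},
				Action: func(c *cli.Context) error {
					fmt.Println("would start frontend with", c.String("config_filename"))
					return nil
				},
			},
			{
				Name:  "ingest",
				Usage: "run the data ingestion process",
				Action: func(c *cli.Context) error {
					fmt.Println("would start ingestion")
					return nil
				},
			},
		},
	}
	if err := app.Run(os.Args); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```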
The design emphasizes modularity by delegating the core logic of each function
(UI, ingestion, clustering, maintenance) to dedicated packages
(`//perf/go/frontend`, `//perf/go/ingest/process`, `//perf/go/maintenance`).
`perfserver` acts as the conductor, parsing command-line arguments, loading
appropriate configurations, and invoking the correct sub-system. This structure
makes the overall Perf system more maintainable and easier to understand, as
each component has a well-defined responsibility.
# Module: /go/pinpoint
The `/go/pinpoint` module provides a Go client for interacting with the Pinpoint
service, which is part of Chromeperf. Pinpoint is a performance testing and
analysis tool used to identify performance regressions and improvements. This
client enables other Go applications within the Skia infrastructure to
programmatically trigger Pinpoint jobs.
**Core Functionality:**
The primary purpose of this module is to abstract the complexities of making
HTTP requests to the Pinpoint API. It handles authentication, request
formatting, and response parsing. This allows other services to easily initiate
two main types of Pinpoint jobs:
1. **Bisect Jobs:** These jobs are used to identify the specific commit that
caused a performance regression or improvement between two given git
revisions. The client constructs the appropriate URL and parameters for the
`pinpointURL` endpoint.
2. **Try Jobs (A/B Testing):** These jobs compare the performance of a base
commit (or patch) against an experimental commit (or patch). This is
particularly useful for evaluating the performance impact of a pending code
change. The client uses the `pinpointLegacyURL` for these types of jobs.
**Design Decisions and Implementation Choices:**
- **Separate Endpoints for Bisect and Try Jobs:** The Pinpoint service has
distinct API endpoints for creating bisect jobs (`pinpointURL`) and legacy
try jobs (`pinpointLegacyURL`). The client reflects this by having separate
methods (`CreateBisect` and `CreateTryJob`) and corresponding request URL
builder functions (`buildBisectRequestURL` and `buildTryJobRequestURL`).
This design choice directly maps to the underlying Pinpoint API structure,
making it clear which type of job is being created.
- **URL Parameter Encoding:** Both types of Pinpoint jobs are initiated via
HTTP GET requests where all parameters are encoded in the URL query string.
The `buildBisectRequestURL` and `buildTryJobRequestURL` functions are
responsible for constructing these URLs by populating `url.Values` and then
encoding them. This is a direct consequence of how the Pinpoint API is
designed.
- **Authentication:** The client utilizes Google's default token source for
authentication (`google.DefaultTokenSource`) with the
`auth.ScopeUserinfoEmail` scope. This is a standard approach for
service-to-service authentication within the Google Cloud ecosystem,
ensuring secure communication with the Pinpoint API.
- **Metrics Collection:** The client integrates with `go/metrics2` to track
the number of times bisect and try jobs are called and the number of times
these calls fail. This is crucial for monitoring the reliability and usage
of the Pinpoint integration.
- **Error Handling:** The module uses `go/skerr` for wrapping errors. This
provides more context to errors, making debugging easier. For example, if a
Pinpoint request fails, the HTTP status code and response body are included
in the error message.
- **Dependency on `pinpoint/go/bot_configs`:** For try jobs, the `target`
parameter is required by the Pinpoint API. This `target` is derived from the
`Configuration` (bot) and `Benchmark` using the
`bot_configs.GetIsolateTarget` function. This indicates a specific
configuration setup for running the performance tests.
- **`test_path` Parameter for Bisect Jobs:** The Pinpoint API requires a
`test_path` parameter for bisect jobs. This parameter is constructed by
joining several components like "ChromiumPerf", configuration, benchmark,
chart, and story. This specific formatting is a legacy requirement of the
Chromeperf API.
- **Mandatory `bug_id` for Bisect Jobs:** The Pinpoint API mandates the
`bug_id` parameter for bisect jobs. If not provided by the caller, the
client defaults it to `"null"`. This reflects a specific constraint of the
upstream service.
- **`tags` Parameter:** Both job types include a `tags` parameter set to
`{"origin":"skia_perf"}`. This helps in tracking and filtering jobs
originating from the Skia infrastructure within the Pinpoint system.
**Key Components/Files:**
- **`pinpoint.go`:** This is the sole Go file in the module and contains all
the logic.
- **`Client` struct:** Represents the Pinpoint client. It holds the
authenticated `http.Client` and counters for metrics.
- **`New()` function:** The constructor for the `Client`. It initializes
the HTTP client with appropriate authentication.
- **`CreateLegacyTryRequest` and `CreateBisectRequest` structs:** Define
the structure of the data required to create try jobs and bisect jobs,
respectively. These fields directly map to the parameters expected by
the Pinpoint API.
- **`CreatePinpointResponse` struct:** Defines the structure of the JSON
response from Pinpoint, which includes the `JobID` and `JobURL`.
- **`CreateTryJob()` method:**
- Takes a `CreateLegacyTryRequest` and a `context.Context`.
- Calls `buildTryJobRequestURL` to construct the request URL.
- Makes an HTTP POST request to the `pinpointLegacyURL` (the job parameters
  are carried in the URL query string, but the Pinpoint endpoint expects a
  POST for job creation).
- Parses the JSON response into a `CreatePinpointResponse`.
- Handles errors and increments metrics.
- **`CreateBisect()` method:**
- Similar to `CreateTryJob()`, but takes a `CreateBisectRequest`.
- Calls `buildBisectRequestURL`.
- Makes an HTTP POST request to the `pinpointURL`.
- Parses the response and handles errors/metrics.
- **`buildTryJobRequestURL()` function:**
- Takes a `CreateLegacyTryRequest`.
- Validates required fields like `Benchmark` and `Configuration`.
- Retrieves the `target` using `bot_configs.GetIsolateTarget`.
- Populates `url.Values` with all relevant parameters from the request,
including hardcoded values like `comparison_mode` and `tags`.
- Returns the fully formed URL string.
- **`buildBisectRequestURL()` function:**
- Takes a `CreateBisectRequest`.
- Populates `url.Values` with parameters from the request.
- Sets a default value for `bug_id` if not provided.
- Constructs the `test_path` parameter based on available request fields.
- Includes the `tags` parameter.
- Returns the fully formed URL string.
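A small sketch of the URL-construction pattern these builder functions follow; the endpoint and most parameter names here are illustrative rather than the exact Pinpoint API fields:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildRequestURL shows the general shape of buildBisectRequestURL /
// buildTryJobRequestURL: populate url.Values, apply defaults, and encode
// everything into the query string of the target endpoint.
func buildRequestURL(endpoint, startCommit, endCommit, benchmark, bugID string) string {
	v := url.Values{}
	v.Set("start_git_hash", startCommit) // illustrative parameter names
	v.Set("end_git_hash", endCommit)
	v.Set("benchmark", benchmark)
	if bugID == "" {
		bugID = "null" // the real client defaults bug_id to "null" for bisects
	}
	v.Set("bug_id", bugID)
	v.Set("tags", `{"origin":"skia_perf"}`)
	// test_path follows a legacy Chromeperf convention: slash-joined components.
	v.Set("test_path", strings.Join([]string{"ChromiumPerf", "linux-perf", benchmark}, "/"))
	return endpoint + "?" + v.Encode()
}

func main() {
	fmt.Println(buildRequestURL("https://pinpoint.example/api/new",
		"abc123", "def456", "speedometer3", ""))
}
```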
**Key Workflows:**
1. **Creating a Bisect Job:**
```
Application Code go/pinpoint.Client Pinpoint API
---------------- ------------------ ------------
1. CreateBisectRequest data ---->
2. Calls client.CreateBisect() -->
3. buildBisectRequestURL()
(constructs URL with params)
4. HTTP POST to pinpointURL -------->
5. Processes request
6. Returns JSON response
<----------------------------------- 7. Receives HTTP response
8. Parses JSON into
CreatePinpointResponse
<--------------------------------- 9. Returns CreatePinpointResponse
```
2. **Creating a Try Job (A/B Test):**
```
Application Code go/pinpoint.Client Pinpoint API (Legacy)
---------------- ------------------ ---------------------
1. CreateLegacyTryRequest data ->
2. Calls client.CreateTryJob() -->
3. buildTryJobRequestURL()
(gets 'target' from bot_configs,
constructs URL with params)
4. HTTP POST to pinpointLegacyURL ----->
5. Processes request
6. Returns JSON response
<---------------------------------------- 7. Receives HTTP response
8. Parses JSON into
CreatePinpointResponse
<--------------------------------- 9. Returns CreatePinpointResponse
```
# Module: /go/pivot
## Pivot Module Documentation
### High-Level Overview
The `pivot` module provides functionality analogous to pivot tables in
spreadsheets or `GROUP BY` operations in SQL. Its primary purpose is to
aggregate and summarize trace data within a `DataFrame` based on specified
grouping criteria and operations. This allows users to transform raw trace data
into more insightful, summarized views, facilitating comparisons and analysis
across different dimensions of the data. For example, one might want to compare
the performance of 'arm' architecture machines against 'intel' architecture
machines by summing or averaging their respective performance metrics.
### Design and Implementation
The core of the `pivot` module revolves around the `Request` struct and the
`Pivot` function.
**`Request` Struct:**
The `Request` struct encapsulates the parameters for a pivot operation. It
defines:
- **`GroupBy`**: A slice of strings representing the parameter keys to group
the traces by. This is the fundamental dimension along which the data will
be aggregated. For instance, if `GroupBy` is `["arch"]`, all traces with the
same 'arch' value will be grouped together.
- **`Operation`**: An `Operation` type (e.g., `Sum`, `Avg`, `Geo`) that
specifies how the values within each group of traces should be combined.
This operation is applied to each point in the traces within a group,
resulting in a new, summarized trace for that group.
- **`Summary`**: An optional slice of `Operation` types. If provided, these
operations are applied to the _resulting_ traces from the `GroupBy` step.
Each `Summary` operation generates a single value (a column in the final
output if viewed as a table) for each grouped trace. If `Summary` is empty,
the output is a `DataFrame` where each row is a summarized trace (suitable
for plotting).
**`Pivot` Function Workflow:**
The `Pivot` function executes the aggregation and summarization process. Here's
a breakdown of its key steps and the reasoning behind them:
1. **Input Validation (`req.Valid()`):**
- **Why:** To ensure the request is well-formed before proceeding with
potentially expensive computations. This prevents errors due to missing
`GroupBy` keys or invalid `Operation` or `Summary` values.
- **How:** It checks if `GroupBy` is non-empty and if the specified
`Operation` and `Summary` operations are among the predefined valid
operations (`AllOperations`).
2. **Initialization and Grouping Structure (`groupedTraceSets`):**
- **Why:** To efficiently organize traces into their respective groups. A
map is used where keys are the group identifiers (e.g., ",arch=arm,")
and values are `types.TraceSet` containing traces belonging to that
group.
- **How:**
- It pre-populates `groupedTraceSets` by determining all possible
unique combinations of values for the `GroupBy` keys present in the
input `DataFrame`'s `ParamSet`. This is done using
`df.ParamSet.CartesianProduct(req.GroupBy)`. This pre-population
ensures that even groups with no matching traces are considered,
although they will be filtered out later if they remain empty.
- It then iterates through each trace in the input `DataFrame`
(`df.TraceSet`).
- For each trace, it extracts the relevant parameter values specified
in `req.GroupBy` to form a `groupKey` using `groupKeyFromTraceKey`.
This function ensures that only traces containing _all_ the
`GroupBy` keys contribute to a group. If a trace is missing a
`GroupBy` key, it's ignored.
- The trace is then added to the `types.TraceSet` associated with its
`groupKey` in `groupedTraceSets`.
```
Input DataFrame (df.TraceSet)
|
v
For each traceID, trace in df.TraceSet:
Parse traceID into params
groupKey = groupKeyFromTraceKey(params, req.GroupBy)
If groupKey is valid:
Add trace to groupedTraceSets[groupKey]
|
v
Grouped Traces (groupedTraceSets)
```
3. **Applying the GroupBy Operation:**
- **Why:** To perform the primary aggregation based on the
`req.Operation`.
- **How:**
- It iterates through the `groupedTraceSets`.
- For each non-empty group, it applies the `groupByOperation` function
corresponding to `req.Operation` (obtained from `opMap`) to the
`types.TraceSet` of that group. The `opMap` is a crucial design
choice, mapping `Operation` constants to their respective
implementation functions (one for grouping traces, another for
summarizing single traces). This provides a clean and extensible way
to manage different aggregation functions.
- The result of this operation is a single summarized trace for that
group, which is stored in the `ret.TraceSet` of the new `DataFrame`.
- Context cancellation (`ctx.Err()`) is checked periodically to allow
for early termination if the operation is cancelled.
```
Grouped Traces (groupedTraceSets)
|
v
For each groupID, traces in groupedTraceSets:
If len(traces) > 0:
summarizedTrace = opMap[req.Operation].groupByOperation(traces)
ret.TraceSet[groupID] = summarizedTrace
|
v
DataFrame with GroupBy Applied (ret)
```
4. **Building ParamSet for the Result:**
- **Why:** The resulting `DataFrame` needs its own `ParamSet` reflecting
the new structure where trace keys only contain the `GroupBy`
parameters.
- **How:** `ret.BuildParamSet()` is called.
5. **Applying Summary Operations (Optional):**
- **Why:** To further reduce the data into single summary values per group
if `req.Summary` is specified. This is useful for generating tabular
summaries rather than plots.
- **How:**
- If `req.Summary` is empty, the original `DataFrame`'s `Header` is
used for the new `DataFrame`, and the function returns. The result
is a `DataFrame` of summarized traces.
- If `req.Summary` is not empty:
- It iterates through each summarized trace in `ret.TraceSet`.
- For each trace, it creates a new `types.Trace` (called
`summaryValues`) whose length is equal to the number of `Summary`
operations.
- For each `Operation` in `req.Summary`, it applies the corresponding
`summaryOperation` function (from `opMap`) to the current grouped
trace. The result is stored in `summaryValues`.
- The original summarized trace in `ret.TraceSet[groupKey]` is
replaced with `summaryValues`.
- The `Header` of the `ret` `DataFrame` is rebuilt. Each column in the
header now corresponds to one of the `Summary` operations, with
offsets from 0 to `len(req.Summary) - 1`.
```
DataFrame with GroupBy Applied (ret)
|
v
If len(req.Summary) > 0:
For each groupKey, trace in ret.TraceSet:
summaryValues = new Trace of length len(req.Summary)
For i, op in enumerate(req.Summary):
summaryValues[i] = opMap[op].summaryOperation(trace)
ret.TraceSet[groupKey] = summaryValues
Adjust ret.Header to match Summary operations
|
v
Final Pivoted DataFrame (ret)
```
**Operations (`Operation` type and `opMap`):**
The module defines a set of standard operations like `Sum`, `Avg`, `Geo`, `Std`,
`Count`, `Min`, `Max`.
- **Why:** To provide common aggregation methods.
- **How:**
- Each `Operation` is a string constant.
- The `opMap` is a map where each `Operation` key maps to an
`operationFunctions` struct. This struct holds two function pointers:
- `groupByOperation`: Takes a `types.TraceSet` (a group of traces) and
returns a single aggregated `types.Trace`. These functions are typically
sourced from the `go/calc` module.
- `summaryOperation`: Takes a single `[]float32` (a trace) and returns a
single `float32` summary value. These functions are typically sourced
from `go/vec32` or defined locally (like `stdDev`).
- This design makes it easy to add new operations by defining the constant
and adding corresponding entries to `opMap` with the appropriate
implementation functions.
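A condensed sketch of this dispatch pattern with only a `Sum` entry and simplified trace types; the real `opMap` draws its grouping functions from `go/calc` and its summary functions from `go/vec32`:

```go
package main

import "fmt"

type Operation string

const Sum Operation = "sum"

// operationFunctions pairs the two flavors of an operation: one that folds a
// whole group of traces into a single trace, and one that reduces a single
// trace to one summary value.
type operationFunctions struct {
	groupByOperation func(traces [][]float32) []float32
	summaryOperation func(trace []float32) float32
}

var opMap = map[Operation]operationFunctions{
	Sum: {
		groupByOperation: func(traces [][]float32) []float32 {
			if len(traces) == 0 {
				return nil
			}
			out := make([]float32, len(traces[0]))
			for _, t := range traces {
				for i, v := range t {
					out[i] += v
				}
			}
			return out
		},
		summaryOperation: func(trace []float32) float32 {
			var total float32
			for _, v := range trace {
				total += v
			}
			return total
		},
	},
}

func main() {
	group := [][]float32{{1, 2, 3}, {4, 5, 6}}
	summed := opMap[Sum].groupByOperation(group)
	fmt.Println(summed)                              // [5 7 9]
	fmt.Println(opMap[Sum].summaryOperation(summed)) // 21
}
```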
**Error Handling:**
- **Why:** To provide clear feedback on invalid inputs or issues during
processing.
- **How:** The `Pivot` function returns an error if `req.Valid()` fails or if
an error occurs during grouping (e.g., a `GroupBy` key is not found in the
`ParamSet` of the input DataFrame). Context cancellation is also handled,
allowing long-running pivot operations to be interrupted. Errors are wrapped
using `skerr.Wrap` to provide context.
### Key Components and Files
- **`pivot.go`**: This is the main file containing all the logic for the pivot
functionality.
- **`Request` struct**: Defines the parameters for a pivot operation. Its
design allows for flexible grouping and summarization.
- **`Operation` type and constants**: Define the set of available
aggregation operations.
- **`opMap` variable**: A critical data structure mapping `Operation`
types to their respective implementation functions for both grouping and
summarizing. This is the heart of how different operations are
dispatched.
- **`Pivot` function**: The primary public function that performs the
pivot operation. Its step-by-step process of grouping, applying the main
operation, and then optionally applying summary operations is central to
its functionality.
- **`groupKeyFromTraceKey` function**: A helper function responsible for
constructing the group identifier for each trace based on the `GroupBy`
keys. It handles cases where a trace might not have all the required
keys.
- **`Valid()` method on `Request`**: Ensures that the pivot request is
well-formed before processing begins.
- **`pivot_test.go`**: Contains unit tests for the `pivot` module.
- **Why:** To ensure the correctness and robustness of the pivot logic
under various scenarios, including valid inputs, invalid inputs, edge
cases (like empty groups or traces not matching any group), and context
cancellation.
- **How:** It uses the `testify` assertion library and defines test cases
that cover different aspects of the `Request` validation,
`groupKeyFromTraceKey` logic, and the `Pivot` function itself with
various combinations of `Operation` and `Summary` settings. The
`dataframeForTesting()` helper function provides a consistent dataset
for testing.
This module is designed to be a general-purpose tool for transforming and
understanding large datasets of traces by allowing users to aggregate data along
arbitrary dimensions and apply various statistical operations.
# Module: /go/progress
The `/go/progress` module provides a mechanism for tracking the progress of
long-running tasks on the backend and exposing this information to the UI. This
is crucial for user experience in applications where operations like data
queries or complex computations can take a significant amount of time. Without
progress tracking, users might perceive the application as unresponsive or
encounter timeouts.
### Why: The Need for Asynchronous Task Monitoring
Many backend operations, such as those initiated by API endpoints like
`/frame/start` or `/dryrun/start`, are asynchronous. The initial HTTP request
might return quickly, but the actual work continues in the background. This
module addresses the need to:
1. **Provide Feedback:** Inform the user that a task is ongoing and how it's
progressing.
2. **Avoid Timeouts:** Prevent HTTP requests from timing out while waiting for
a long task to complete. The UI can poll for updates instead of holding a
connection open.
3. **Communicate Complex State:** Allow tasks to report detailed, multi-stage
progress information, not just a simple percentage. For example, a "dry run"
might involve several distinct steps, each with its own status and relevant
data.
### How: Design and Implementation
The core idea is to represent the state of a long-running task as a `Progress`
object. This object can be updated by the task as it executes. A `Tracker` then
manages multiple `Progress` objects, making them accessible via HTTP polling.
**Key Components:**
- **`progress.go`**: Defines the `Progress` interface and its concrete
implementation `progress`.
- **`Progress` Interface**: This is the central abstraction for a single
long-running task.
- **Why an interface?** It allows for potential future extensions or
alternative implementations (e.g., different storage mechanisms for
progress data if needed, though the current implementation is
in-memory).
- **Key Methods:**
- `Message(key, value string)`: Allows the task to report arbitrary
key-value string pairs. This is flexible enough to accommodate
diverse progress information (e.g., current step, commit being
processed, number of items filtered). If a key already exists, its
value is updated.
- `Results(interface{})`: Stores intermediate or final results of the
task. The `interface{}` type allows any JSON-serializable data to be
stored. This is useful for showing partial results or accumulating
data incrementally.
- `Error(string)`: Marks the task as failed and stores an error
message.
- `Finished()`: Marks the task as successfully completed.
- `FinishedWithResults(interface{})`: Atomically sets the results and
marks the task as finished. This is preferred over separate
`Results()` and `Finished()` calls to avoid race conditions where
the UI might poll between the two calls.
- `Status() Status`: Returns the current status (`Running`,
`Finished`, `Error`).
- `URL(string)`: Sets the URL that the client should poll for further
updates. This is typically set by the `Tracker`.
- `JSON(w io.Writer) error`: Serializes the current progress state
(status, messages, results, next URL) into JSON and writes it to the
provided writer.
- **`progress` struct (concrete implementation)**:
- Uses a `sync.Mutex` to ensure thread-safe updates to its internal
`SerializedProgress` state. This is critical because long-running tasks
often execute in separate goroutines, and the `Progress` object might be
accessed concurrently by the task updating its state and by the
`Tracker` serving HTTP requests.
- Maintains its state in a `SerializedProgress` struct, which is designed
for easy JSON serialization.
- **State Transitions:** A `Progress` object starts in the `Running`
state. Once it transitions to `Finished` or `Error`, it becomes
immutable. Any attempt to modify it (e.g., calling `Message()` or
`Results()` again) will result in a panic. This design simplifies
reasoning about the lifecycle of a task's progress.
- **`SerializedProgress` struct**: Defines the JSON structure sent to the
client. It includes the `Status`, an array of `Message` (key-value
pairs), the `Results` (if any), and the `URL` for the next poll.
- **`Status` enum**: `Running`, `Finished`, `Error`.
- **`tracker.go`**: Defines the `Tracker` interface and its concrete
implementation `tracker`.
- **`Tracker` Interface**: Manages a collection of `Progress` objects.
- `Add(prog Progress)`: Registers a new `Progress` object with the
tracker. The tracker assigns a unique ID to this progress and sets its
polling URL.
- `Handler(w http.ResponseWriter, r *http.Request)`: An HTTP handler
function that clients use to poll for progress updates. It extracts the
progress ID from the request URL, retrieves the corresponding `Progress`
object, and sends its JSON representation.
- `Start(ctx context.Context)`: Starts a background goroutine for periodic
cleanup of completed tasks from the cache.
- **`tracker` struct (concrete implementation)**:
- **`lru.Cache`**: Uses a Least Recently Used (LRU) cache
(`github.com/hashicorp/golang-lru`) to store `cacheEntry` objects.
- **Why LRU?** To prevent unbounded memory growth if many tasks are
tracked. Older, completed tasks are eventually evicted.
- **`basePath`**: A string prefix for the polling URLs (e.g.,
`/_/status/`). Each progress object gets a unique ID appended to this
base path to form its polling URL.
- **`cacheEntry` struct**: Wraps a `Progress` object and a `Finished`
timestamp. The timestamp is used by the cleanup routine to determine
when a completed task can be removed from the cache.
- **Cleanup Mechanism**:
- The `Start` method launches a goroutine that periodically calls
`singleStep`.
- `singleStep` iterates through the cache:
- It updates the `Finished` timestamp in a `cacheEntry` when the
corresponding `Progress` object transitions out of the `Running`
state.
- It removes entries from the cache if they have been in a `Finished`
or `Error` state for longer than `cacheDuration` (currently 5
minutes). This prevents the cache from holding onto completed tasks
indefinitely.
- This ensures that resources are eventually freed up while still
allowing clients a reasonable window to fetch the final results of a
completed task.
- **UUIDs for IDs**: Uses `github.com/google/uuid` to generate unique IDs
for each tracked `Progress`. This makes the polling URLs distinct and
hard to guess.
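A bare-bones sketch of the mutex-guarded pattern the `progress` struct follows; the field and method names mirror the description above but this is not the real implementation:

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"sync"
)

type serialized struct {
	Status   string            `json:"status"`
	Messages map[string]string `json:"messages"`
	Results  interface{}       `json:"results,omitempty"`
	URL      string            `json:"url"`
}

// progress guards all state with a mutex because the long-running task and
// the HTTP polling handler touch it from different goroutines.
type progress struct {
	mu    sync.Mutex
	state serialized
}

func newProgress() *progress {
	return &progress{state: serialized{Status: "Running", Messages: map[string]string{}}}
}

func (p *progress) Message(key, value string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state.Status != "Running" {
		panic("progress is immutable once finished") // mirrors the documented behavior
	}
	p.state.Messages[key] = value
}

// FinishedWithResults sets the results and the final status atomically so a
// poll can never observe results without the Finished status.
func (p *progress) FinishedWithResults(results interface{}) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.state.Results = results
	p.state.Status = "Finished"
}

func (p *progress) JSON(w io.Writer) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	return json.NewEncoder(w).Encode(p.state)
}

func main() {
	p := newProgress()
	p.Message("Step", "Processing item X")
	p.FinishedWithResults(map[string]int{"count": 42})
	_ = p.JSON(os.Stdout)
}
```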
### Key Workflows
1. **Starting and Tracking a Long-Running Task:**
```
Backend HTTP Handler (e.g., /api/start_long_task)
|
| 1. Create a new Progress object:
| prog := progress.New()
|
| 2. Add it to the global Tracker instance:
| trackerInstance.Add(prog) // Tracker sets prog.URL() internally
|
| 3. Respond to the initial HTTP request with the Progress JSON.
| // The client now has prog.URL() to poll.
| prog.JSON(w)
|
V
Goroutine (executing the long-running task)
|
| 1. Periodically update progress:
| prog.Message("Step", "Processing item X")
| prog.Message("PercentComplete", "30%")
| prog.Results(partialData) // Optional: intermediate results
|
| 2. When finished:
| If error:
| prog.Error("Something went wrong")
| Else:
| prog.FinishedWithResults(finalData)
```
2. **Client Polling for Updates:**
```
Client (e.g., browser UI)
|
| 1. Receives initial response with prog.URL (e.g., /_/status/some-uuid)
|
| 2. Makes a GET request to prog.URL
V
Backend Tracker.Handler
|
| 1. Extracts "some-uuid" from the request path.
|
| 2. Looks up the Progress object in its cache using "some-uuid".
| If not found --> HTTP 404 Not Found
|
| 3. Calls prog.JSON(w) to send the current state.
V
Client
|
| 1. Receives JSON with current status, messages, results.
|
| 2. If Status is "Running", schedules another poll to prog.URL.
|
| 3. If Status is "Finished" or "Error", displays final results/error and stops polling.
```
3. **Tracker Cache Management (Background Process):**
```
Tracker.Start()
|
V
Goroutine (periodic execution, e.g., every minute)
|
| Calls tracker.singleStep()
| |
| V
| Iterate through cache entries:
| - If Progress.Status() is not "Running" AND cacheEntry.Finished is zero:
| Set cacheEntry.Finished = now()
| - If cacheEntry.Finished is not zero AND now() > cacheEntry.Finished + cacheDuration:
| Remove entry from cache
| - Update metrics (numEntriesInCache)
|
V
(Loop back to periodic execution)
```
This system provides a robust and flexible way to communicate the progress of
backend tasks to the user interface, improving the overall user experience for
operations that might otherwise seem opaque or unresponsive. The use of JSON for
data interchange makes it easy for web frontends to consume the progress
information.
# Module: /go/psrefresh
The `psrefresh` module is designed to manage and provide access to
`paramtools.ParamSet` instances, which are collections of key-value pairs
representing the parameters of traces in a performance monitoring system. The
primary goal is to efficiently retrieve and cache these parameter sets,
especially for frequently accessed queries, to reduce database load and improve
response times.
The module addresses the need for up-to-date parameter sets by periodically
fetching data from a trace store (represented by the `OPSProvider` interface).
It combines parameter sets from recent time intervals (tiles) to provide a
comprehensive view of available parameters.
A key challenge is handling potentially large and complex parameter sets. To
mitigate this, the module offers a caching layer (`CachedParamSetRefresher`).
This caching mechanism is configurable and can pre-populate caches (e.g., local
in-memory or Redis) with filtered parameter sets based on predefined query
levels. This pre-population significantly speeds up queries that match these
common filter patterns.
**Key Components and Responsibilities:**
- **`psrefresh.go`**:
- Defines the core interfaces `OPSProvider` and `ParamSetRefresher`.
- `OPSProvider`: Abstractly represents a source of ordered parameter sets
(e.g., a trace data store). It provides methods to get the latest "tile"
(a time-based segment of data) and the parameter set for a specific
tile. This abstraction allows `psrefresh` to be independent of the
underlying data storage implementation.
- `ParamSetRefresher`: Defines the contract for components that can
provide the full parameter set and parameter sets filtered by a query.
It also includes a `Start` method to initiate the refresh process.
- Implements `defaultParamSetRefresher`, which is the standard
implementation of `ParamSetRefresher`.
- **Why**: This struct is responsible for the fundamental logic of
periodically fetching parameter sets from the `OPSProvider`. It merges
parameter sets from a configurable number of recent tiles to create a
comprehensive view.
- **How**: It uses a background goroutine (`refresh`) that periodically
calls `oneStep`. The `oneStep` method fetches the latest tile, then
iterates backward through the configured number of previous tiles,
retrieving and merging their parameter sets using
`paramtools.ParamSet.AddParamSet`. The resulting merged set is then
normalized and stored.
- A `sync.Mutex` is used to protect concurrent access to the `ps`
(paramtools.ReadOnlyParamSet) field, ensuring thread safety when
`GetAll` is called.
- `GetParamSetForQuery` delegates the actual filtering and counting of
traces to a `dataframe.DataFrameBuilder`, demonstrating a separation of
concerns.
- `UpdateQueryValueWithDefaults` is a helper to automatically add default
parameter selections to queries if configured, simplifying common query
patterns.
- **`cachedpsrefresh.go`**:
- Implements `CachedParamSetRefresher`, which wraps a
`defaultParamSetRefresher` and adds a caching layer.
- **Why**: To improve performance for common queries by avoiding repeated
database lookups or expensive filtering operations. For frequently
accessed subsets of data (e.g., specific benchmarks or configurations),
retrieving pre-computed parameter sets from a cache is much faster.
- **How**: It takes a `cache.Cache` instance (which could be local, Redis,
etc.) and a `defaultParamSetRefresher`.
- `PopulateCache`: This is a crucial method that proactively fills the
cache. It uses the `QueryCacheConfig` (part of `config.QueryConfig`) to
determine which levels of parameter sets to cache.
- It starts by getting the full parameter set from the underlying
`psRefresher`.
- It then iterates through configured "Level 1" parameter keys and
their specified values. For each combination, it performs a
`PreflightQuery` (via the `dfBuilder`) to get the filtered parameter
set and the count of matching traces.
- Both the filtered parameter set (as a string) and the count are
stored in the cache using distinct keys.
- If "Level 2" keys and values are configured, it recursively calls
`populateChildLevel` to cache parameter sets for combinations of
Level 1 and Level 2 parameters.
- The cache keys are generated by `paramSetKey` and `countKey`,
ensuring a consistent naming scheme.
- `GetParamSetForQuery`: When a query is made,
`getParamSetForQueryInternal` first tries to retrieve the result from
the cache.
- It determines the appropriate cache key based on the query
parameters and the configured cache levels (`getParamSetKey`). It
only attempts to serve from the cache if the query matches the
configured cache levels (1 or 2 parameters, potentially adjusted for
default parameters).
- If a cache hit occurs, it reconstructs the `paramtools.ParamSet`
from the cached string and retrieves the count.
- If there's a cache miss or an error, it falls back to the underlying
`psRefresher.GetParamSetForQuery`.
- `StartRefreshRoutine`: This method starts a goroutine that periodically
calls `PopulateCache` to keep the cached data fresh.
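A simplified sketch of the merge performed by `oneStep`, using plain `map[string][]string` values in place of `paramtools.ParamSet`; the real code walks tiles via the `OPSProvider` and relies on `AddParamSet`, `Normalize`, and `Freeze`:

```go
package main

import (
	"fmt"
	"sort"
)

// addParamSet merges src into dst, deduplicating values per key, roughly
// what paramtools.ParamSet.AddParamSet does.
func addParamSet(dst, src map[string][]string) {
	for key, values := range src {
		seen := map[string]bool{}
		for _, v := range dst[key] {
			seen[v] = true
		}
		for _, v := range values {
			if !seen[v] {
				dst[key] = append(dst[key], v)
				seen[v] = true
			}
		}
	}
}

func main() {
	merged := map[string][]string{}
	tiles := []map[string][]string{
		{"arch": {"x86"}, "config": {"8888"}},
		{"arch": {"arm", "x86"}, "config": {"gles"}},
	}
	// Walk the most recent N tiles and fold their param sets together.
	for _, tilePS := range tiles {
		addParamSet(merged, tilePS)
	}
	for _, k := range []string{"arch", "config"} {
		sort.Strings(merged[k]) // normalize for stable output
		fmt.Println(k, merged[k])
	}
}
```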
**Key Workflows:**
1. **Initialization and Periodic Refresh (Default Refresher):**
```
NewDefaultParamSetRefresher(opsProvider, ...) -> pf
pf.Start(refreshPeriod)
-> pf.oneStep() // Initial fetch
-> opsProvider.GetLatestTile() -> latestTile
-> LOOP (numParamSets times):
-> opsProvider.GetParamSet(tile) -> individualPS
-> mergedPS.AddParamSet(individualPS)
-> tile = tile.Prev()
-> mergedPS.Normalize()
-> pf.ps = mergedPS.Freeze()
-> GO pf.refresh()
-> LOOP (every refreshPeriod):
-> pf.oneStep() // Subsequent fetches
```
2. **Cache Population (Cached Refresher):**
```
NewCachedParamSetRefresher(defaultRefresher, cacheImpl) -> cr
cr.StartRefreshRoutine(cacheRefreshPeriod)
-> cr.PopulateCache() // Initial population
-> defaultRefresher.GetAll() -> fullPS
-> // For each configured Level 1 key/value:
-> qValues = {level1Key: [level1Value]}
-> defaultRefresher.UpdateQueryValueWithDefaults(qValues) // If applicable
-> query.New(qValues) -> lv1Query
-> defaultRefresher.dfBuilder.PreflightQuery(ctx, lv1Query, fullPS) -> count, filteredPS
-> psCacheKey = paramSetKey(qValues, [level1Key])
-> cr.addToCache(ctx, psCacheKey, filteredPS.ToString(), count)
-> // If Level 2 is configured:
-> cr.populateChildLevel(ctx, level1Key, level1Value, filteredPS, level2Key, level2Values)
-> // For each configured Level 2 value:
-> qValues = {level1Key: [level1Value], level2Key: [level2Value]}
-> ... (similar PreflightQuery and addToCache)
-> GO LOOP (every cacheRefreshPeriod):
-> cr.PopulateCache() // Subsequent cache refreshes
```
3. **Querying with Cache:**
   ```
   cr.GetParamSetForQuery(ctx, queryObj, queryValues)
     -> cr.getParamSetForQueryInternal(ctx, queryObj, queryValues)
       -> cr.getParamSetKey(queryValues) -> cacheKey, err
       -> IF cacheKey is valid AND exists:
         -> cache.GetValue(ctx, cacheKey) -> cachedParamSetString
         -> cache.GetValue(ctx, countKey(cacheKey)) -> cachedCountString
         -> paramtools.FromString(cachedParamSetString) -> paramSet
         -> strconv.ParseInt(cachedCountString) -> count
         -> RETURN count, paramSet, nil
       -> ELSE (cache miss or invalid key for caching):
         -> defaultRefresher.GetParamSetForQuery(ctx, queryObj, queryValues) -> count, paramSet, err
         -> RETURN count, paramSet, err
   ```
The use of `config.QueryConfig` and `config.Experiments` allows for
instance-specific tuning of caching behavior (which keys/values to pre-populate)
and handling of default parameters. The separation between
`defaultParamSetRefresher` and `CachedParamSetRefresher` promotes modularity,
allowing the caching layer to be optional or replaced with different caching
strategies if needed.
# Module: /go/redis
The `redis` module in Skia Perf is designed to manage interactions with Redis
instances, primarily to support and optimize the query UI. It leverages Redis
for caching frequently accessed data, thereby improving the responsiveness and
performance of the Perf frontend.
The core idea is to periodically fetch information about available Redis
instances within a Google Cloud Project and then interact with a specific,
configured Redis instance to store or retrieve cached data. This cached data
typically represents results of expensive computations or frequently requested
data points, like recent trace data for specific queries.
**Key Responsibilities and Components:**
- **`redis.go`**: This is the central file of the module.
- **`RedisWrapper` interface**: Defines the contract for Redis-related
operations. This abstraction allows for easier testing and potential
future replacements of the underlying Redis client implementation. The
key methods are:
- `StartRefreshRoutine`: Initiates a background process (goroutine) that
periodically discovers and interacts with the configured Redis instance.
- `ListRedisInstances`: Retrieves a list of all Redis instances available
within a specified GCP project and location.
- **`RedisClient` struct**: This is the concrete implementation of the
`RedisWrapper` interface.
- It holds a `gcp_redis.CloudRedisClient` for interacting with the Google
Cloud Redis API (e.g., listing instances).
- It also has a reference to `tracestore.TraceStore`, which is likely used
to fetch the data that needs to be cached in Redis.
- The `tilesToCache` field suggests that the caching strategy might
involve pre-calculating and storing "tiles" of data, which is a common
pattern in Perf systems for displaying graphs over time.
- **`NewRedisClient`**: The constructor for `RedisClient`.
- **`StartRefreshRoutine`**:
  - **Why**: To ensure that Perf is always aware of the correct Redis
    instance to use and to periodically update the cache. Network
    configurations or instance details might change, and this routine
    helps adapt to such changes.
  - **How**: It takes a `refreshPeriod` and a `config.InstanceConfig`
    (which is actually `redis_client.RedisConfig` in the current
    implementation, indicating the target project, zone, and instance
    name). It then starts a goroutine that, at regular intervals defined
    by `refreshPeriod`:
    * Calls `ListRedisInstances` to get all Redis instances in the
      configured project/zone.
    * Iterates through the instances to find the one matching the
      `config.Instance` name.
    * If the target instance is found, it calls `RefreshCachedQueries`.
    ```
    [StartRefreshRoutine]
        |
        V
    (Goroutine - Ticks every 'refreshPeriod')
        |
        V
    [ListRedisInstances] -> (GCP API Call) -> [List of Redis Instances]
        |
        V
    (Find Target Instance by Name)
        |
        V
    (If Target Found) [RefreshCachedQueries]
    ```
- **`ListRedisInstances`**:
- **Why**: To discover available Redis instances within the specified
GCP project and location. This is the first step before Perf can
connect to and use a specific Redis instance.
- **How**: It uses the `gcpClient` (an instance of
`cloud.google.com/go/redis/apiv1.CloudRedisClient`) to make an API
call to GCP to list instances under the given `parent` (e.g.,
"projects/my-project/locations/us-central1"). It iterates through
the results and returns a slice of `redispb.Instance` objects.
- **`RefreshCachedQueries`**:
- **Why**: This is the heart of the caching mechanism. Its purpose is
to update the data stored in the target Redis instance. The specific
data to be cached would depend on the needs of the Perf query UI.
- **How**:
* It establishes a connection to the specified Redis instance
(`instance.Host` and `instance.Port`) using
`github.com/redis/go-redis/v9`.
* It acquires a mutex (`r.mutex.Lock()`) to prevent concurrent
modifications to the cache or shared resources, though the current
implementation only has placeholder logic.
* The current implementation contains placeholder logic:
- It attempts to `GET` a key named "FullPS".
- It then `SET`s the key "FullPS" to the current time, with an
expiration of 30 seconds.
* **Future Work (as hinted by `TODO(wenbinzhang)` and
`tilesToCache`)**: This method is expected to be expanded to:
- Identify which queries or data segments are good candidates for
caching.
- Fetch the necessary data, potentially using the `traceStore`.
- Store this data in Redis, likely with appropriate keys and
expiration times. The `tilesToCache` parameter suggests it might
pre-cache a certain number of recent "tiles" of trace data.
- **`mocks/RedisWrapper.go`**: This file contains a mock implementation of the
`RedisWrapper` interface, generated by the `mockery` tool.
- **Why**: To facilitate unit testing of components that depend on
`RedisWrapper`. By using a mock, tests can simulate various Redis
behaviors (e.g., successful connection, instance not found, errors)
without needing an actual Redis instance or GCP connectivity.
- **How**: It provides a `RedisWrapper` struct that embeds `mock.Mock`
from the `testify` library. For each method in the `RedisWrapper`
interface, there's a corresponding method in the mock that records calls
and can be configured to return specific values or errors, allowing test
authors to define expected interactions.
**Design Decisions and Rationale:**
- **Interface-based Design (`RedisWrapper`)**: Using an interface decouples
the rest of the Perf system from the concrete Redis client implementation.
This is good for:
- **Testability**: As seen with the `mocks` package.
- **Flexibility**: If Skia decides to switch to a different Redis client
library or even a different caching technology in the future, the
changes would be localized to the implementation of `RedisWrapper`
without affecting its consumers.
- **Periodic Refresh Routine**: Instead of connecting to Redis on-demand for
every operation or assuming a static configuration, the
`StartRefreshRoutine` provides a more robust approach.
- It handles potential changes in the Redis instance's availability or
address.
- It centralizes the logic for keeping the cache up-to-date.
- **Separation of Concerns**:
- The module clearly separates GCP Redis instance management (listing
instances via GCP API) from data interaction with a specific Redis
instance (using a Redis client library like `go-redis`).
- **Use of Standard Libraries**:
- `cloud.google.com/go/redis/apiv1` for GCP infrastructure management.
- `github.com/redis/go-redis/v9` for standard Redis data operations. This
ensures reliance on well-maintained and feature-rich libraries.
**Workflow: Cache Refresh Process**
The primary workflow driven by this module is the periodic refresh of cached
data:
```
System Starts
|
V
Initialize RedisClient (NewRedisClient)
|
V
Call StartRefreshRoutine
|
V
[Background Goroutine - Loop every 'refreshPeriod']
|
|--> 1. List GCP Redis Instances (ListRedisInstances)
| - Input: GCP project, location
| - Output: List of *redispb.Instance
|
|--> 2. Identify Target Redis Instance
| - Based on configuration (e.g., instance name)
|
|--> 3. If Target Instance Found: Refresh Cache (RefreshCachedQueries)
|
|--> a. Connect to Target Redis (using go-redis)
| - Host, Port from *redispb.Instance
|
|--> b. Determine data to cache (e.g., recent trace data for popular queries)
| - Likely involves `traceStore`
|
|--> c. Write data to Redis (SET commands)
| - Use appropriate keys and expiration times
|
|--> (Current placeholder: SET "FullPS" = current_time with 30s TTL)
```
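A sketch of the current placeholder step in `RefreshCachedQueries` using `github.com/redis/go-redis/v9` directly; the host and port would come from the discovered `redispb.Instance`, and error handling is trimmed:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func refreshCachedQueries(ctx context.Context, host string, port int32) error {
	rdb := redis.NewClient(&redis.Options{
		Addr: fmt.Sprintf("%s:%d", host, port),
	})
	defer rdb.Close()

	// Read the previous marker value, ignoring a miss on the first run.
	if prev, err := rdb.Get(ctx, "FullPS").Result(); err == nil {
		fmt.Println("previous FullPS:", prev)
	}
	// Write the marker with a 30-second TTL, mirroring the placeholder logic.
	return rdb.Set(ctx, "FullPS", time.Now().String(), 30*time.Second).Err()
}

func main() {
	if err := refreshCachedQueries(context.Background(), "10.0.0.5", 6379); err != nil {
		fmt.Println("refresh failed:", err)
	}
}
```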
This module provides the foundational components for integrating Redis as a
caching layer in Skia Perf, aiming to improve UI performance by serving
frequently requested data quickly from an in-memory store. The current
implementation focuses on instance discovery and has placeholder logic for the
actual caching, which is expected to be expanded based on Perf's specific
caching needs.
# Module: /go/regression
The `/go/regression` module is responsible for detecting, storing, and managing
performance regressions in Skia. It analyzes performance data over time,
identifies significant changes (regressions or improvements), and provides
mechanisms for triaging and tracking these changes.
**Core Functionality & Design:**
The primary goal is to automatically flag performance changes that might
indicate a problem or an unexpected improvement. This involves:
1. **Data Analysis:** Analyzing time-series performance data (traces) across
different commits.
2. **Clustering:** Grouping similar traces together to identify patterns and
changes affecting multiple tests or configurations.
3. **Step Detection:** Identifying abrupt changes (steps) in performance
metrics that signify a potential regression or improvement.
4. **Alerting & Notification:** Informing relevant parties when a potential
regression is detected.
5. **Persistence:** Storing detected regressions and their triage status.
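As a toy illustration of step detection (not the actual `StepFit` algorithm), one can scan a trace for the split point that maximizes the difference between the mean before and the mean after the split:

```go
package main

import "fmt"

// largestStep returns the index at which splitting the trace yields the
// biggest jump between the mean of the points before and after the split,
// along with that jump. A toy stand-in for real step detection.
func largestStep(trace []float32) (bestIdx int, bestStep float32) {
	for i := 1; i < len(trace); i++ {
		before := mean(trace[:i])
		after := mean(trace[i:])
		step := after - before
		if abs(step) > abs(bestStep) {
			bestIdx, bestStep = i, step
		}
	}
	return bestIdx, bestStep
}

func mean(xs []float32) float32 {
	var sum float32
	for _, x := range xs {
		sum += x
	}
	return sum / float32(len(xs))
}

func abs(x float32) float32 {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	trace := []float32{10, 10, 10, 10, 14, 14, 14}
	idx, step := largestStep(trace)
	fmt.Printf("step of %+.1f at commit offset %d\n", step, idx) // step of +4.0 at commit offset 4
}
```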
**Key Components & Files:**
- **`detector.go`**: This file contains the core logic for processing
regression detection requests.
- **Why:** It orchestrates the process of fetching data, applying
clustering algorithms, and identifying regressions. It's designed to
handle potentially large datasets and long-running analyses.
- **How:** `ProcessRegressions` is the main entry point. It takes a
`RegressionDetectionRequest` (which specifies the alert configuration
and the time domain to analyze) and a `DetectorResponseProcessor`
callback.
- It can expand a single alert configuration with `GroupBy` parameters
into multiple, more specific requests using
`allRequestsFromBaseRequest`. This allows for targeted analysis of
specific trace groups.
- It iterates over data using `dfiter.DataFrameIterator`, which provides
dataframes for analysis.
- For each dataframe, it filters out traces with too much missing data
(`tooMuchMissingData`) to ensure the reliability of the detection.
- It applies a clustering algorithm (either K-Means via
`clustering2.CalculateClusterSummaries` or individual StepFit via
`StepFit`) to identify clusters of traces exhibiting similar behavior.
The choice of K (number of clusters for K-Means) can be automatic or
user-specified.
- It generates `RegressionDetectionResponse` objects containing the
cluster summaries and the relevant data frame. These responses are
passed to the `DetectorResponseProcessor`.
- Shortcuts for identified clusters are created using `shortcutFromKeys`
for easier referencing.
  - **Workflow:** `RegressionDetectionRequest` -> expand (if `GroupBy`) into
    multiple requests; then, for each request: `DataFrameIterator` ->
    `DataFrame` -> filter traces -> apply clustering (KMeans or StepFit) ->
    `ClusterSummaries`, which feed both shortcut creation and the
    `DetectorResponseProcessor`.
- **`regression.go`**: Defines the primary data structures for representing
regressions and their triage status.
  - **Why:** Provides a standardized way to model regressions, including
    details about the low (downward step) and high (upward step) components,
    the associated data frame, and triage information.
- **How:**
- `Regression`: The central struct holding `Low` and `High`
`ClusterSummary` objects (from `clustering2`), the `FrameResponse` (data
context), and `TriageStatus` for both low and high. It also includes
fields for the newer `regression2` schema (like `Id`, `CommitNumber`,
`AlertId`, `MedianBefore`, `MedianAfter`).
- `TriageStatus`: Represents whether a regression is `Untriaged`,
`Positive` (expected/acceptable), or `Negative` (a bug).
- `AllRegressionsForCommit`: A container for all regressions found for a
specific commit, keyed by the alert ID.
- `Merge`: A method to combine information from two `Regression` objects,
typically used when new data provides a more significant regression for
an existing alert.
- **`types.go`**: Defines the `Store` interface, which abstracts the
  persistence layer for regressions (a trimmed sketch of this interface
  appears after this list).
- **Why:** Decouples the regression detection logic from the specific
database implementation, allowing for different storage backends.
- **How:** The `Store` interface specifies methods for:
- `Range`: Retrieving regressions within a commit range.
- `SetHigh`/`SetLow`: Storing newly detected high/low regressions.
- `TriageHigh`/`TriageLow`: Updating the triage status of regressions.
- `Write`: Bulk writing of regressions.
- `GetRegressionsBySubName`, `GetByIDs`: Retrieving regressions based on
subscription names or specific IDs (primarily for the `regression2`
schema).
- `GetOldestCommit`, `GetRegression`: Utility methods for fetching
specific data.
- `DeleteByCommit`: Removing regressions associated with a commit.
- **`stepfit.go`**: Implements an alternative regression detection strategy
that analyzes each trace individually using step fitting.
- **Why:** Useful when `GroupBy` is used in an alert, or when K-Means
clustering is not the desired approach. It focuses on finding
significant steps in individual time series.
- **How:** The `StepFit` function iterates through each trace in the input
`DataFrame`.
- For each trace, it calls `stepfit.GetStepFitAtMid` to determine if
there's a significant step (low or high) around the midpoint of the
trace.
- If an "interesting" step is found (based on `stddevThreshold` and
`interesting` parameters), the trace is added to either the `low` or
`high` `ClusterSummary`.
- The `low` and `high` summaries collect all traces that show a downward
or upward step, respectively.
- Parametric summaries (`ParamSummaries`) are generated for the keys
within these clusters.
- **`fromsummary.go`**: Provides a utility function to convert a
`RegressionDetectionResponse` into a `Regression` object.
- **Why:** Bridges the output of the detection process with the structured
`Regression` type used for storage and display.
- **How:** `RegressionFromClusterResponse` takes a
`RegressionDetectionResponse`, an `alerts.Alert` configuration, and a
`perfgit.Git` instance.
- It identifies the commit at the midpoint of the response's data frame.
- It iterates through the `ClusterSummary` objects in the response.
- If a cluster's step point matches the midpoint commit and meets the
alert's criteria (minimum number of traces, direction), it populates the
`Low` or `High` fields of the `Regression` object. It prioritizes the
regression with the largest absolute magnitude if multiple are found.
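For orientation, here is a trimmed sketch of the `Store` interface described
under `types.go`. The placeholder types and every signature are assumptions
inferred from the method descriptions above; `types.go` is the authoritative
definition.
```go
package regression

import "context"

// Placeholder types so the sketch is self-contained; the real definitions
// live in Perf's types, clustering2, and frame packages.
type (
	CommitNumber            int32
	TriageStatus            struct{}
	ClusterSummary          struct{}
	FrameResponse           struct{}
	AllRegressionsForCommit struct{}
)

// Store is a trimmed, illustrative sketch of the persistence interface; the
// method set comes from the prose above, the signatures are assumptions.
type Store interface {
	// Range retrieves regressions for commits in [begin, end].
	Range(ctx context.Context, begin, end CommitNumber) (map[CommitNumber]*AllRegressionsForCommit, error)

	// SetLow/SetHigh store a newly detected low/high step for an alert at a commit.
	SetLow(ctx context.Context, commit CommitNumber, alertID string, df *FrameResponse, low *ClusterSummary) error
	SetHigh(ctx context.Context, commit CommitNumber, alertID string, df *FrameResponse, high *ClusterSummary) error

	// TriageLow/TriageHigh update the triage status of an existing regression.
	TriageLow(ctx context.Context, commit CommitNumber, alertID string, tr TriageStatus) error
	TriageHigh(ctx context.Context, commit CommitNumber, alertID string, tr TriageStatus) error

	// Write bulk-writes regressions; DeleteByCommit removes those for a commit.
	Write(ctx context.Context, regressions map[CommitNumber]*AllRegressionsForCommit) error
	DeleteByCommit(ctx context.Context, commit CommitNumber) error
}
```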
**Submodules:**
- **`continuous/` (`continuous.go`)**: Manages the continuous, background
detection of regressions.
- **Why:** Ensures that new performance data is promptly analyzed for
regressions as it arrives, without requiring manual intervention.
- **How:**
- `Continuous` struct: Holds dependencies like `perfgit.Git`,
`regression.Store`, `alerts.ConfigProvider`, `notify.Notifier`, etc.
- `Run()`: The main entry point, which starts either event-driven or
polling-based regression detection.
- **Event-Driven (`RunEventDrivenClustering`)**:
- Listens to Pub/Sub messages from `FileIngestionTopicName` indicating
new data ingestion (`ingestevents.IngestEvent`).
- For each event, it identifies matching alert configurations using
`getTraceIdConfigsForIngestEvent` (which calls
`matchingConfigsFromTraceIDs`).
- `matchingConfigsFromTraceIDs` refines alert queries if `GroupBy` is
present to be more specific to the incoming trace.
- It then calls `ProcessAlertConfig` (or `ProcessAlertConfigForTraces`
if `StepFitGrouping` is used) for each matching config and the
specific traces.
- **Polling (`RunContinuousClustering`)**:
- Periodically (defined by `pollingDelay`), fetches all alert
configurations using `buildConfigAndParamsetChannel`.
- Shuffles the configs to distribute the load if multiple detectors
are running.
- Calls `ProcessAlertConfig` for each configuration.
- `ProcessAlertConfig()`:
- Sets the current config being processed.
- Optionally performs a "smoketest" for `GroupBy` alerts to ensure the
query is valid and returns data.
- Calls `regression.ProcessRegressions` to perform the actual
detection.
- The `clusterResponseProcessor` (which is `reportRegressions`) is
called with the detection results.
- `reportRegressions()`:
- For each detected regression (`RegressionDetectionResponse`), it
determines the commit and previous commit details.
- It checks if the regression meets the alert criteria (direction,
minimum number).
- It calls `updateStoreAndNotification` to persist the regression and
send notifications.
- `updateStoreAndNotification()`:
- Checks if the regression already exists in the `regression.Store`.
- If new, it stores the regression (using `store.SetLow` or
`store.SetHigh`) and sends a notification via
`notifier.RegressionFound`. The notification ID is stored with the
regression.
- If existing, but the direction of the regression has changed (e.g.,
was low, now also high), it updates the store and the notification
using `notifier.UpdateNotification`.
- If existing and the direction is the same, it only updates the
store.
- **Key Decision:** The system supports both event-driven (preferred for
responsiveness) and polling-based detection (as a fallback or for
periodic full checks). The choice is controlled by the
`EventDrivenRegressionDetection` flag.
  - **Workflow (Event-Driven):** Pub/Sub message (new data) -> decode
    `IngestEvent` -> get matching alert configs; then, for each (config,
    matched traces) pair: `ProcessAlertConfig` ->
    `regression.ProcessRegressions` -> `reportRegressions` ->
    `updateStoreAndNotification`, which writes to the `Store` and calls the
    `Notifier`.
- **`migration/` (`migrator.go`)**: Handles the data migration from an older
`regressions` table schema to the newer `regressions2` schema.
- **Why:** Facilitates schema evolution without data loss. The newer
schema (`Regression2Schema`) aims to store regression data more
granularly, typically one row per detected step (high or low), rather
than combining high and low for the same commit/alert into a single JSON
blob.
- **How:**
- `RegressionMigrator`: Contains instances of the legacy
`sqlregressionstore.SQLRegressionStore` and the new
`sqlregression2store.SQLRegression2Store`.
- `RunPeriodicMigration`: Sets up a ticker to periodically run
`RunOneMigration`.
- `RunOneMigration` / `migrateRegressions`:
- Fetches a batch of unmigrated regressions from the legacy store
(`legacyStore.GetRegressionsToMigrate`).
- For each legacy `Regression` object:
- It begins a database transaction.
- It populates fields specific to the `regression2` schema (e.g.,
`Id`, `PrevCommitNumber`, `MedianBefore`, `MedianAfter`,
`IsImprovement`, `ClusterType`) if they are not already present from
the legacy data. This is crucial as the
`sqlregression2store.WriteRegression` expects these.
- The `sqlregression2store.WriteRegression` function might split a
single legacy `Regression` object (if it has both `High` and `Low`
components) into two separate entries in the `Regressions2` table,
one for `HighClusterType` and one for `LowClusterType`.
- It then marks the corresponding row in the legacy `Regressions`
table as migrated using `legacyStore.MarkMigrated`, storing the new
regression ID.
- Commits the transaction. If any step fails, the transaction is
rolled back.
- **Key Decision:** Migration is performed in batches and within
transactions to ensure atomicity and prevent data duplication or loss
during the migration process.
- **`sqlregressionstore/`**: Implements the `regression.Store` interface using
a generic SQL database. This is the _older_ SQL storage mechanism.
- **Why:** Provides a persistent storage solution for regressions
identified by the detection system.
- **How:**
- `SQLRegressionStore`: The main struct, holding a database connection
pool (`pool.Pool`) and prepared SQL statements. It supports different
SQL dialects (e.g., CockroachDB via `statements`, Spanner via
`spannerStatements`).
- The schema (`sqlregressionstore/schema/RegressionSchema.go`) typically
stores one row per `(commit_number, alert_id)` pair. The actual
`regression.Regression` object (which might contain both high and low
details, along with the frame) is serialized into a JSON string and
stored in a `regression TEXT` column.
    - `readModifyWrite`: A core helper function that encapsulates the common
      pattern of reading a `Regression` from the DB, allowing a callback to
      modify it, and then writing it back. This is done within a transaction
      to prevent lost updates (a generic sketch of this pattern appears after
      this list). If `mustExist` is true, it errors if the regression isn't
      found; otherwise, it creates a new one.
- `SetHigh`/`SetLow`: Use `readModifyWrite` to update the `High` or `Low`
part of the JSON-serialized `Regression` object. They also update the
triage status to `Untriaged` if it was previously `None`.
- `TriageHigh`/`TriageLow`: Use `readModifyWrite` to update the
`HighStatus` or `LowStatus` within the JSON-serialized `Regression`.
- `GetRegressionsToMigrate`: Fetches regressions that haven't been
migrated to the `regression2` schema.
- `MarkMigrated`: Updates a row to indicate it has been migrated, storing
the new `regression_id` from the `regression2` table.
- **Limitation:** Storing the entire `Regression` object as JSON can make
querying for specific aspects of the regression (e.g., only high
regressions, or regressions with a specific triage status) less
efficient and more complex. This is one of the motivations for the
`sqlregression2store`.
- **`sqlregression2store/`**: Implements the `regression.Store` interface
using a newer SQL schema (`Regressions2`).
- **Why:** Addresses limitations of the older `sqlregressionstore` by
storing regression data in a more normalized and queryable way.
- **How:**
- `SQLRegression2Store`: The main struct.
- Schema (`sqlregression2store/schema/Regression2Schema.go`): Designed to
store each regression step (high or low) as a separate row. Key columns
include `id` (UUID, primary key), `commit_number`, `prev_commit_number`,
`alert_id`, `creation_time`, `median_before`, `median_after`,
`is_improvement`, `cluster_type` (e.g., "high", "low"),
`cluster_summary` (JSONB), `frame` (JSONB), `triage_status`, and
`triage_message`.
- `writeSingleRegression`: The core writing function. It takes a
`regression.Regression` object and writes its relevant parts (either
high or low, but not both in the same DB row) to the `Regressions2`
table.
- `convertRowToRegression`: Converts a database row from `Regressions2`
back into a `regression.Regression` object. Depending on the
`cluster_type` in the row, it populates either the `High` or `Low` part
of the `Regression` object.
- `SetHigh`/`SetLow`:
- These methods now interact with `updateBasedOnAlertAlgo`.
- `updateBasedOnAlertAlgo`: This function is crucial. It considers the
`Algo` type of the alert (`KMeansGrouping` vs. `StepFitGrouping`).
- For `KMeansGrouping`, it expects to potentially update an existing
regression for the same `(commit_number, alert_id)` as new data
might refine the cluster. It uses `readModifyWriteCompat` to achieve
this.
- For `StepFitGrouping` (individual trace analysis), it generally
expects to create a _new_ regression entry if one doesn't exist for
the exact frame, avoiding updates to pre-existing ones unless it's
truly a new detection.
- The `updateFunc` passed to `updateBasedOnAlertAlgo` populates the
necessary fields in the `regression.Regression` object (e.g.,
setting `r.High` or `r.Low`, and calling
`populateRegression2Fields`).
- `populateRegression2Fields`: This helper populates the fields specific
to the `Regressions2` schema (like `PrevCommitNumber`, `MedianBefore`,
`MedianAfter`, `IsImprovement`) from the `ClusterSummary` and
`FrameResponse` within the `Regression` object.
- `WriteRegression` (used by migrator): If a legacy `Regression` object
has both `High` and `Low` components, this function splits it and calls
`writeSingleRegression` twice, creating two rows in `Regressions2`.
- `Range`: When retrieving regressions, if multiple rows from
`Regressions2` correspond to the same `(commit_number, alert_id)` (e.g.,
one for high, one for low), it merges them back into a single
`regression.Regression` object for compatibility with how the rest of
the system might expect the data.
- **Key Improvement:** Storing regression components (high/low) as
separate rows with dedicated columns for medians, triage status, etc.,
allows for much more efficient and direct SQL querying compared to
parsing JSON in the older store.
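The `readModifyWrite` helper in `sqlregressionstore` (referenced earlier in this
list) follows a standard transactional read-modify-write shape. The sketch below
illustrates that shape with `database/sql` and a CockroachDB-style `UPSERT`; the
table layout mirrors the prose, but the actual implementation uses the module's
`pool.Pool` and its own prepared statements.
```go
package sqlregressionstore

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
)

// Regression stands in for regression.Regression; fields are elided here.
type Regression struct{}

// readModifyWrite sketches the pattern described above: read the serialized
// Regression, let the caller mutate it, and write it back in one transaction
// so concurrent writers cannot silently overwrite each other.
func readModifyWrite(ctx context.Context, db *sql.DB, commitNumber int64, alertID string, mustExist bool, modify func(r *Regression)) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // No-op once Commit has succeeded.

	// 1. Read the JSON-serialized Regression for (commit_number, alert_id).
	var serialized string
	err = tx.QueryRowContext(ctx,
		`SELECT regression FROM Regressions WHERE commit_number = $1 AND alert_id = $2`,
		commitNumber, alertID).Scan(&serialized)

	r := &Regression{}
	switch {
	case err == sql.ErrNoRows:
		if mustExist {
			return fmt.Errorf("no regression for commit %d, alert %s", commitNumber, alertID)
		}
		// Otherwise continue with a freshly created Regression.
	case err != nil:
		return err
	default:
		if err := json.Unmarshal([]byte(serialized), r); err != nil {
			return err
		}
	}

	// 2. Let the caller modify it (set High/Low, change triage status, ...).
	modify(r)

	// 3. Write it back within the same transaction.
	b, err := json.Marshal(r)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`UPSERT INTO Regressions (commit_number, alert_id, regression) VALUES ($1, $2, $3)`,
		commitNumber, alertID, string(b)); err != nil {
		return err
	}
	return tx.Commit()
}
```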
**Overall Workflow Example (Simplified):**
1. **Continuous Detection (`continuous.go`):**
- New data arrives (e.g., via Pub/Sub).
- `Continuous` identifies relevant `alerts.Alert` configurations.
- `ProcessAlertConfig` is called.
2. **Regression Processing (`detector.go`):**
- `ProcessRegressions` fetches data, builds `DataFrame`s.
- Clustering (KMeans or `stepfit.go`) is applied.
- `RegressionDetectionResponse`s are generated.
3. **Reporting & Storing (`continuous.go` calls back into `regression`
store):**
- `reportRegressions` processes these responses.
- `updateStoreAndNotification` interacts with a `regression.Store`
implementation (e.g., `sqlregression2store.go`):
- Checks if the regression is new or an update.
- Calls `SetLow` or `SetHigh` on the store.
- The store (`sqlregression2store`) writes the data to the
`Regressions2` table, potentially creating a new row or updating an
existing one based on the alert's algorithm type.
- A notification might be sent.
The system is designed to be modular, with interfaces like `regression.Store`
and `alerts.ConfigProvider` allowing for flexibility in implementation details.
The migration path from `sqlregressionstore` to `sqlregression2store` highlights
the evolution towards a more structured and queryable data model for
regressions.
# Module: /go/samplestats
The `samplestats` module is designed to perform statistical analysis on sets of
performance data, specifically to identify significant changes between two
sample sets, often referred to as "before" and "after" states. This is crucial
for detecting regressions or improvements in performance metrics over time or
across different code versions.
The core functionality revolves around comparing these two sets of samples for
each trace (a unique combination of parameters identifying a specific test or
metric). It calculates various statistical metrics for each set and then employs
statistical tests to determine if the observed differences are statistically
significant.
**Key Design Choices and Implementation Details:**
- **Statistical Significance:** The module uses p-values to determine
significance. A user-configurable alpha level (defaulting to 0.05) acts as
the threshold. If the calculated p-value for a trace is below this alpha,
the change is considered significant.
- **Choice of Statistical Tests:** The module offers two common statistical
tests:
- **Mann-Whitney U Test (default):** This is a non-parametric test used to
compare two independent samples. It's often preferred when the data
doesn't necessarily follow a normal distribution.
- **Two Sample Welch's t-test:** This parametric test is used to compare
the means of two independent samples, particularly when their variances
might be unequal. The choice allows users to select the most appropriate
test based on the characteristics of their data.
- **Outlier Removal:** An optional Interquartile Range Rule (IQRR) can be
applied to remove outliers from the sample data before calculating
statistics. This helps in reducing the influence of extreme values that
might skew the results. The decision to make this optional acknowledges that
outlier removal isn't always desired or appropriate.
- **Delta Calculation:** For changes deemed significant, the module calculates
the percentage change in the mean between the "before" and "after" samples.
If a change isn't significant, the delta is reported as NaN (Not a Number),
clearly distinguishing it from actual zero-percentage changes.
- **Configurability:** The `Config` struct provides a centralized way to
control the analysis process. This includes setting the alpha level,
choosing the statistical test, enabling outlier removal, and deciding
whether to include all traces in the output or only those with significant
changes. This configurability makes the module adaptable to various analysis
needs.
- **Result Structure:** The `Result` struct encapsulates the outcome of the
analysis, including a list of `Row` structs (one per trace) and a count of
skipped traces. Each `Row` contains the trace identifier, its parameters,
the calculated metrics for both "before" and "after" samples, the percentage
delta, the p-value, and any informational notes (e.g., errors during
statistical test calculation). This structured output facilitates further
processing or display of the results.
- **Sorting:** The results can be sorted based on different criteria, with the
default being by the calculated `Delta`. This allows users to quickly
identify the most impactful changes. The `Order` type and functions like
`ByName`, `ByDelta`, and `Reverse` provide a flexible sorting mechanism.
**Responsibilities and Key Components:**
- **`analyze.go`**: This is the heart of the module.
- **`Analyze` function**: This is the primary entry point. It takes the
`Config` and two maps of samples (`before` and `after`, where keys are
trace IDs and values are `parser.Samples`).
- It iterates through all unique trace IDs present in either the "before"
or "after" sets.
- For each trace, it retrieves the corresponding samples, skipping the
trace if data isn't present in both sets.
- It calls `calculateMetrics` (from `metrics.go`) for both "before" and
"after" samples.
- Based on the `Config.Test` setting, it performs either the Mann-Whitney
U test or the Two Sample Welch's t-test using functions from the
`github.com/aclements/go-moremath/stats` library.
- It compares the resulting p-value with the configured alpha level. If `p
< alpha`, it calculates the percentage `Delta` between the means.
Otherwise, `Delta` is `NaN`.
- It constructs a `Row` struct with all the calculated information.
- It optionally filters out rows where no significant change was detected
if `Config.All` is false.
- Finally, it sorts the resulting `Row`s based on `Config.Order` (or by
`Delta` if no order is specified) using the `Sort` function from
`sort.go`.
- It returns a `Result` struct containing the list of `Row`s and the count
of skipped traces.
- **`Config` struct**: Defines the parameters that control the analysis,
such as `Alpha` for p-value cutoff, `Order` for sorting, `IQRR` for
outlier removal, `All` for including all results, and `Test` for
selecting the statistical test.
- **`Result` struct**: Encapsulates the output of the `Analyze` function,
holding the `Rows` of analysis data and the `Skipped` count.
- **`Row` struct**: Represents the analysis results for a single trace,
including its name, parameters, "before" and "after" `Metrics`, the
percentage `Delta`, the `P` value, and any `Note`.
- **`metrics.go`**: This file is responsible for calculating basic statistical
metrics from a given set of sample values.
- **`calculateMetrics` function**: Takes a `Config` (primarily to check
`IQRR`) and `parser.Samples`.
- If `Config.IQRR` is true, it applies the Interquartile Range Rule to
filter out outliers from `samples.Values`. The values within 1.5 \* IQR
from the first and third quartiles are retained.
- It then calculates the `Mean`, `StdDev` (standard deviation), and
`Percent` (coefficient of variation: `StdDev / Mean * 100`) of the
(potentially filtered) values.
- It returns these calculated statistics in a `Metrics` struct, along with
the (potentially filtered) `Values`.
- **`Metrics` struct**: Holds the calculated `Mean`, `StdDev`, raw
`Values` (after potential outlier removal), and `Percent` (coefficient
of variation).
- **`sort.go`**: This file provides utilities for sorting the results (`Row`
slices).
- **`Order` type**: A function type `func(rows []Row, i, j int) bool`
defining a less-than comparison for sorting `Row`s.
- **`ByName` function**: An `Order` implementation that sorts rows
alphabetically by `Row.Name`.
- **`ByDelta` function**: An `Order` implementation that sorts rows by
`Row.Delta`. It specifically places `NaN` delta values (insignificant
changes) at the beginning.
- **`Reverse` function**: A higher-order function that takes an `Order`
and returns a new `Order` that represents the reverse of the input
order.
- **`Sort` function**: A convenience function that sorts a slice of `Row`s
in place using `sort.SliceStable` and a given `Order`.
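The sorting helpers just described are small higher-order functions over `Row`
slices. A condensed sketch follows, with `Row` trimmed to the two fields the
sketch needs so it stands alone; the real structs carry more fields.
```go
package samplestats

import (
	"math"
	"sort"
)

// Row is trimmed to the fields the sketch needs.
type Row struct {
	Name  string
	Delta float64
}

// Order is a less-than comparison over rows.
type Order func(rows []Row, i, j int) bool

// ByName orders rows alphabetically by trace name.
func ByName(rows []Row, i, j int) bool { return rows[i].Name < rows[j].Name }

// ByDelta orders rows by Delta, placing NaN (insignificant) deltas first.
func ByDelta(rows []Row, i, j int) bool {
	di, dj := rows[i].Delta, rows[j].Delta
	if math.IsNaN(di) != math.IsNaN(dj) {
		return math.IsNaN(di)
	}
	return di < dj
}

// Reverse returns an Order that inverts the given comparison.
func Reverse(order Order) Order {
	return func(rows []Row, i, j int) bool { return order(rows, j, i) }
}

// Sort stably sorts rows in place using the given Order.
func Sort(rows []Row, order Order) {
	sort.SliceStable(rows, func(i, j int) bool { return order(rows, i, j) })
}
```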
**Illustrative Workflow (Simplified `Analyze` Process):**
```
Input: before_samples, after_samples, config
For each trace_id in (before_samples keys + after_samples keys):
If trace_id not in before_samples OR trace_id not in after_samples:
Increment skipped_count
Continue
before_metrics = calculateMetrics(config, before_samples[trace_id])
after_metrics = calculateMetrics(config, after_samples[trace_id])
If config.Test == UTest:
p_value = MannWhitneyUTest(before_metrics.Values, after_metrics.Values)
Else (config.Test == TTest):
p_value = TwoSampleWelchTTest(before_metrics.Values, after_metrics.Values)
alpha = config.Alpha (or defaultAlpha if config.Alpha is 0)
If p_value < alpha:
delta = ((after_metrics.Mean / before_metrics.Mean) - 1) * 100
Else:
delta = NaN
If NOT config.All:
Continue // Skip if not showing all results and change is not significant
Add new Row{Name: trace_id, Delta: delta, P: p_value, ...} to results_list
Sort results_list using config.Order (or ByDelta by default)
Return Result{Rows: results_list, Skipped: skipped_count}
```
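To make the `calculateMetrics` step in the workflow concrete, here is a rough
sketch of the IQRR filter followed by the mean, standard deviation, and
coefficient-of-variation calculation. The quartile estimator and edge-case
handling are simplified assumptions; `metrics.go` is the authoritative source.
```go
package samplestats

import (
	"math"
	"sort"
)

// Metrics mirrors the struct described above.
type Metrics struct {
	Mean    float64
	StdDev  float64
	Percent float64   // Coefficient of variation: StdDev / Mean * 100.
	Values  []float64 // Possibly filtered values.
}

// calculateMetrics sketches the statistics computed per trace. applyIQRR
// corresponds to Config.IQRR in the prose.
func calculateMetrics(applyIQRR bool, values []float64) Metrics {
	if applyIQRR {
		values = iqrrFilter(values)
	}
	if len(values) < 2 {
		return Metrics{Values: values}
	}
	mean := 0.0
	for _, v := range values {
		mean += v
	}
	mean /= float64(len(values))

	variance := 0.0
	for _, v := range values {
		variance += (v - mean) * (v - mean)
	}
	stdDev := math.Sqrt(variance / float64(len(values)-1))

	return Metrics{
		Mean:    mean,
		StdDev:  stdDev,
		Percent: stdDev / mean * 100,
		Values:  values,
	}
}

// iqrrFilter keeps values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The naive
// index-based quartiles here are an approximation.
func iqrrFilter(values []float64) []float64 {
	if len(values) < 4 {
		return values
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	q1 := sorted[len(sorted)/4]
	q3 := sorted[(3*len(sorted))/4]
	lo, hi := q1-1.5*(q3-q1), q3+1.5*(q3-q1)

	filtered := make([]float64, 0, len(values))
	for _, v := range values {
		if v >= lo && v <= hi {
			filtered = append(filtered, v)
		}
	}
	return filtered
}
```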
# Module: /go/sheriffconfig
The `sheriffconfig` module is responsible for managing configurations for Skia
Perf's anomaly detection and alerting system. These configurations, known as
"Sheriff Configs," are defined in Protocol Buffer format and are typically
stored in LUCI Config. This module handles fetching these configurations,
validating them, and transforming them into a format suitable for storage and
use by other Perf components, specifically the `alerts` and `subscription`
modules.
The core idea is to allow users to define rules for which performance metrics
they care about and how anomalies in those metrics should be detected and
handled. This provides a flexible and centralized way to manage alerting for a
large number of performance tests.
**Key Responsibilities and Components:**
- **Protocol Buffer Definitions (`/proto/v1`):**
- This directory defines the structure of Sheriff Configurations using
Protocol Buffers. This is the "source of truth" for what constitutes a
valid configuration.
- `sheriff_config.proto`: Defines the main messages like `SheriffConfig`,
`Subscription`, `AnomalyConfig`, and `Rules`.
- `SheriffConfig`: The top-level message, containing a list of
`Subscription`s. This represents the entire set of alerting
configurations for a Perf instance.
- `Subscription`: Represents a user's or team's interest in a specific set
of metrics. It includes details for creating bug reports (e.g., contact
email, bug component, labels, priority, severity) and a list of
`AnomalyConfig`s that define how to detect anomalies for the metrics
covered by this subscription.
- `AnomalyConfig`: Specifies the parameters for anomaly detection for a
particular subset of metrics. This includes:
- `Rules`: Define which metrics this `AnomalyConfig` applies to, using
`match` and `exclude` patterns. These patterns are query strings
(e.g., "master=ChromiumPerf&benchmark=Speedometer2").
- Detection parameters: `step` (algorithm for step detection),
`radius` (commits to consider), `threshold` (sensitivity),
`minimum_num` (number of interesting traces to trigger an alert),
`sparse` (handling of missing data), `k` (for K-Means clustering),
`group_by` (for breaking down clustering), `direction` (up, down, or
both), `action` (no action, triage, or bisect), and `algo`
(clustering algorithm like StepFit or KMeans).
- `Rules`: Contains lists of `match` and `exclude` strings. Match strings
define positive criteria for selecting metrics, while exclude strings
define negative criteria. The combination allows for precise targeting
of metrics.
- `sheriff_config.pb.go`: The Go code generated from
`sheriff_config.proto`. This provides the Go structs and methods to work
with these configurations programmatically.
- `generate.go`: Contains `go:generate` directives used to regenerate
`sheriff_config.pb.go` whenever `sheriff_config.proto` changes. This
ensures the Go code stays in sync with the proto definition.
- **Validation (`/validate`):**
- `validate.go`: This is crucial for ensuring the integrity and
correctness of Sheriff Configurations before they are processed or
stored. It performs a series of checks:
- **Pattern Validation:** Ensures that `match` and `exclude` strings in
`Rules` are well-formed query strings. It checks for valid regex if a
value starts with `~`. It also enforces that exclude patterns only
target a single key-value pair.
- **AnomalyConfig Validation:** Ensures that each `AnomalyConfig` has at
least one `match` pattern.
- **Subscription Validation:** Verifies that essential fields like `name`,
`contact_email`, `bug_component`, and `instance` are present. It also
checks that each subscription has at least one `AnomalyConfig`.
- **SheriffConfig Validation:** Ensures there's at least one
`Subscription` and that all subscription names within a config are
unique.
- `DeserializeProto`: A helper function to convert a base64 encoded string
(as typically retrieved from LUCI Config) into a `SheriffConfig`
protobuf message.
- **Service (`/service`):**
- `service.go`: This component orchestrates the process of fetching
Sheriff Configurations from LUCI Config, processing them, and storing
them in the database.
- **`New` function:** Initializes the `sheriffconfigService`, taking
dependencies like a database connection pool (`sql.Pool`),
`subscription.Store`, `alerts.Store`, and a `luciconfig.ApiClient`. If
no `luciconfig.ApiClient` is provided, it creates one.
- **`ImportSheriffConfig` method:** This is the main entry point for
importing configurations.
* It uses the `luciconfig.ApiClient` to fetch configurations from a
specified LUCI Config path (e.g., "skia-sheriff-configs.cfg").
* For each fetched configuration file content:
- It calls `processConfig`.
* It then inserts all derived `subscription_pb.Subscription` objects into
the `subscriptionStore` and all `alerts.SaveRequest` objects into the
`alertStore` within a single database transaction. This ensures
atomicity – either all changes are saved, or none are.
- **`processConfig` method:**
* Deserializes the raw configuration content (string) into a
`pb.SheriffConfig` protobuf message using `prototext.Unmarshal`.
* Validates the deserialized `pb.SheriffConfig` using
`validate.ValidateConfig`.
* Iterates through each `pb.Subscription` in the config:
- It filters subscriptions based on the `instance` field, only
processing those matching the service's configured instance (e.g.,
"chrome-internal"). This allows multiple Perf instances to share a
config file but only import relevant subscriptions.
- It calls `makeSubscriptionEntity` to convert the `pb.Subscription`
into a `subscription_pb.Subscription` (the format used by the
`subscription` module).
- **Revision Check:** Crucially, it checks if a subscription with the
same name and revision already exists in the `subscriptionStore`. If
it does, it means this specific version of the subscription has
already been imported, so it's skipped. This prevents redundant
database writes and processing if the LUCI config file hasn't
actually changed for that subscription.
- If the subscription is new or has a new revision, it calls
`makeSaveRequests` to generate `alerts.SaveRequest` objects for each
alert defined within that subscription.
- **`makeSubscriptionEntity` function:** Transforms a `pb.Subscription`
(from Sheriff Config proto) into a `subscription_pb.Subscription` (for
the `subscription` datastore), mapping fields and applying default
priorities/severities if not specified.
- **`makeSaveRequests` function:**
* Iterates through each `pb.AnomalyConfig` within a `pb.Subscription`.
* For each `match` rule within the `pb.AnomalyConfig.Rules`:
- Calls `buildQueryFromRules` to construct the actual query string
that will be used to select metrics for this alert.
- Calls `createAlert` to create an `alerts.Alert` object, populating
it with parameters from the `pb.AnomalyConfig` and the parent
`pb.Subscription`.
- Wraps the `alerts.Alert` in an `alerts.SaveRequest` along with the
subscription name and revision.
- **`createAlert` function:** Populates an `alerts.Alert` struct. This
involves:
- Mapping enum values from the Sheriff Config proto (e.g.,
`AnomalyConfig_Step`, `AnomalyConfig_Direction`, `AnomalyConfig_Action`,
`AnomalyConfig_Algo`) to their corresponding internal types used by the
`alerts` module (e.g., `alerts.Direction`,
`types.RegressionDetectionGrouping`, `types.StepDetection`,
`types.AlertAction`). This is done using maps like `directionMap`,
`clusterAlgoMap`, etc.
- Applying default values for parameters like `radius`, `minimum_num`,
`sparse`, `k`, `group_by` if they are not explicitly set in the
`AnomalyConfig`.
  - **`buildQueryFromRules` function:** Constructs a canonical query string
    from a `match` string and a list of `exclude` strings. It parses them as
    URL query parameters, combines them (with `!` for excludes), sorts the
    parts alphabetically, and joins them with `&`. This ensures that
    equivalent rules always produce the same query string (a rough sketch of
    this canonicalization follows this list).
- **`getPriorityFromProto` and `getSeverityFromProto` functions:** Convert
the enum values for priority and severity from the proto definition to
the integer values expected by the `subscription` module, applying
defaults if the proto value is "unspecified."
- **`StartImportRoutine` and `ImportSheriffConfigOnce`:** Provide
functionality to periodically fetch and import configurations, making
the system self-updating when LUCI configs change.
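A rough sketch of the canonicalization performed by `buildQueryFromRules` is
shown below. The exclude encoding used here (a `!` prefix on the value) is an
assumption; only the parse, sort, and join steps are taken from the description
above.
```go
package sheriffconfig

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// canonicalQuery sketches the match/exclude canonicalization. It is not the
// real buildQueryFromRules; the exclude encoding is an assumption.
func canonicalQuery(match string, excludes []string) (string, error) {
	parts := []string{}

	matchValues, err := url.ParseQuery(match)
	if err != nil {
		return "", err
	}
	for key, values := range matchValues {
		for _, value := range values {
			parts = append(parts, fmt.Sprintf("%s=%s", key, value))
		}
	}

	for _, exclude := range excludes {
		// Validation guarantees each exclude targets a single key=value pair.
		excludeValues, err := url.ParseQuery(exclude)
		if err != nil {
			return "", err
		}
		for key, values := range excludeValues {
			for _, value := range values {
				parts = append(parts, fmt.Sprintf("%s=!%s", key, value))
			}
		}
	}

	// Sorting guarantees equivalent rules produce identical query strings.
	sort.Strings(parts)
	return strings.Join(parts, "&"), nil
}
```
Under these assumptions, `match: "master=ChromiumPerf&benchmark=Speedometer2"`
with `exclude: ["bot=android"]` would canonicalize to
`benchmark=Speedometer2&bot=!android&master=ChromiumPerf`.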
**Workflow: Importing a Sheriff Configuration**
```
LUCI Config Change (e.g., new revision of skia-sheriff-configs.cfg)
|
v
Sheriffconfig Service (triggered by timer or manual call)
|
|--- 1. luciconfigApiClient.GetProjectConfigs("skia-sheriff-configs.cfg") --> Fetches raw config content + revision
|
v
For each config file content:
|
|--- 2. processConfig(configContent, revision)
| |
| |--- 2a. prototext.Unmarshal(configContent) --> pb.SheriffConfig
| |
| |--- 2b. validate.ValidateConfig(pb.SheriffConfig) --> Error or OK
| |
| v
| For each pb.Subscription in pb.SheriffConfig:
| |
| |--- 2c. If subscription.Instance != service.Instance --> Skip
| |
| |--- 2d. subscriptionStore.GetSubscription(name, revision) --> ExistingSubscription?
| |
| |--- 2e. If ExistingSubscription == nil (new or updated):
| | |
| | |--- makeSubscriptionEntity(pb.Subscription, revision) --> subscription_pb.Subscription
| | |
| | |--- makeSaveRequests(pb.Subscription, revision)
| | | |
| | | v
| | | For each pb.AnomalyConfig in pb.Subscription:
| | | |
| | | v
| | | For each matchRule in pb.AnomalyConfig.Rules:
| | | |
| | | |--- buildQueryFromRules(matchRule, excludeRules) --> queryString
| | | |
| | | |--- createAlert(queryString, pb.AnomalyConfig, pb.Subscription, revision) --> alerts.Alert
| | | |
| | | ---> Collect alerts.SaveRequest
| | |
| | ---> Collect subscription_pb.Subscription
|
v
Database Transaction (BEGIN)
|
|--- 3. subscriptionStore.InsertSubscriptions(collected_subscriptions)
|
|--- 4. alertStore.ReplaceAll(collected_save_requests)
|
Database Transaction (COMMIT or ROLLBACK)
```
This module acts as a critical bridge, translating human-readable (and
machine-parsable via proto) alerting definitions into the concrete data
structures used by Perf's backend alerting and subscription systems. The
validation step is key to preventing malformed configurations from breaking the
alerting pipeline. The revision checking mechanism ensures efficiency by only
processing changes.
# Module: /go/shortcut
The `shortcut` module provides functionality for creating, storing, and
retrieving "shortcuts". A shortcut is essentially a named list of trace keys.
These trace keys typically represent specific performance metrics or
configurations. The primary purpose of shortcuts is to provide a convenient way
to refer to a collection of traces with a short, memorable identifier, rather
than having to repeatedly specify the full list of keys. This is particularly
useful for sharing links to specific views in the Perf UI or for programmatic
access to predefined sets of performance data.
The core component is the `Store` interface, defined in `shortcut.go`. This
interface abstracts the underlying storage mechanism, allowing different
implementations to be used (e.g., in-memory for testing, SQL database for
production). The key operations defined by the `Store` interface are:
- `Insert`: Adds a new shortcut to the store. It takes an `io.Reader`
containing the shortcut data (typically JSON) and returns a unique ID for
the shortcut.
- `InsertShortcut`: Similar to `Insert`, but takes a `Shortcut` struct
directly.
- `Get`: Retrieves a shortcut given its ID.
- `GetAll`: Returns a channel that streams all stored shortcuts. This is
useful for tasks like data migration.
- `DeleteShortcut`: Removes a shortcut from the store.
A `Shortcut` itself is a simple struct containing a slice of strings, where each
string is a trace key.
The generation of shortcut IDs is handled by the `IDFromKeys` function. This
function takes a `Shortcut` struct, sorts its keys alphabetically (to ensure
that the order of keys doesn't affect the ID), and then computes an MD5 hash of
the concatenated keys. A prefix "X" is added to this hash for historical
reasons, maintaining compatibility with older systems. This deterministic ID
generation ensures that the same set of keys will always produce the same
shortcut ID.
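The ID derivation is small enough to sketch directly. The exact byte stream fed
to the hash is an assumption (the real `IDFromKeys` may join keys differently),
but the sort, MD5, and "X" prefix match the description above.
```go
package shortcut

import (
	"crypto/md5"
	"fmt"
	"sort"
)

// Shortcut is a named list of trace keys.
type Shortcut struct {
	Keys []string `json:"keys"`
}

// idFromKeys sketches the deterministic ID scheme: sort the keys so their
// order does not matter, hash them, and add the historical "X" prefix.
func idFromKeys(s *Shortcut) string {
	keys := append([]string(nil), s.Keys...)
	sort.Strings(keys)

	h := md5.New()
	for _, key := range keys {
		_, _ = h.Write([]byte(key + "\n")) // Separator choice is an assumption.
	}
	return fmt.Sprintf("X%x", h.Sum(nil))
}
```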
Workflow for creating and retrieving a shortcut:
1. **Creation**: `Client Code` ---(JSON data or `Shortcut` struct)--->
   `Store.Insert` or `Store.InsertShortcut` ---(generates an ID via
   `IDFromKeys`, marshals to JSON if needed)---> `Underlying Storage (e.g.,
   SQL DB)` ---(returns the shortcut ID)---> `Client Code`
2. **Retrieval**: `Client Code` ---(shortcut ID)---> `Store.Get` ---(queries
   by ID)---> `Underlying Storage (e.g., SQL DB)` ---(returns the stored
   JSON)---> `Store` ---(unmarshals to a `Shortcut` struct, sorts keys)--->
   `Client Code` (receives the `Shortcut` struct)
The `sqlshortcutstore` subdirectory provides a concrete implementation of the
`Store` interface using an SQL database (specifically designed for CockroachDB,
as indicated by test setup and migration references). The `sqlshortcutstore.go`
file contains the logic for interacting with the database, including SQL
statements for inserting, retrieving, and deleting shortcuts. Shortcut data is
stored as JSON strings in the database. The schema for the `Shortcuts` table is
implicitly defined by the SQL statements and further clarified in
`sqlshortcutstore/schema/schema.go`, which defines a `ShortcutSchema` struct
mirroring the table structure (though this struct is primarily for documentation
or ORM-like purposes and not directly used in the raw SQL interaction in
`sqlshortcutstore.go`).
Testing is a significant aspect of this module:
- `shortcut_test.go` contains unit tests for the `IDFromKeys` function,
ensuring its correctness and deterministic behavior.
- `shortcuttest` provides a suite of common tests (`InsertGet`,
`GetNonExistent`, `GetAll`, `DeleteShortcut`) that can be run against any
implementation of the `shortcut.Store` interface. This promotes consistency
and ensures that different store implementations behave as expected. The
`InsertGet` test, for example, verifies that a stored shortcut can be
retrieved and that the keys are sorted upon retrieval, even if they were not
sorted initially.
- `sqlshortcutstore_test.go` utilizes the tests from `shortcuttest` to
validate the `SQLShortcutStore` implementation against a test database.
- `mocks/Store.go` provides a mock implementation of the `Store` interface,
generated by the `mockery` tool. This is useful for testing components that
depend on `shortcut.Store` without needing a real storage backend.
# Module: /go/sql
The `go/sql` module serves as the central hub for managing the SQL database
schema used by the Perf application. It defines the structure of the database
tables and provides utilities for schema generation, validation, and migration.
This module ensures that the application's database schema is consistent,
well-defined, and can evolve smoothly over time.
**Key Responsibilities and Components:**
- **Schema Definition (`schema.go`, `spanner/schema_spanner.go`):**
- **Why:** These files contain the SQL `CREATE TABLE` statements that
define the structure of all tables used by Perf. Having the schema
defined in code (generated from Go structs) provides a single source of
truth and allows for easier version control and programmatic
manipulation.
- **How:**
- `schema.go`: Defines the schema for CockroachDB.
- `spanner/schema_spanner.go`: Defines the schema for Spanner. Spanner has
slightly different SQL syntax and features (e.g., `TTL INTERVAL`),
necessitating a separate schema definition.
- The schema is not written manually but is _generated_ by the `tosql`
utility (see below). This ensures that the SQL schema accurately
reflects the Go struct definitions in other modules (e.g.,
`perf/go/alerts/sqlalertstore/schema`).
- Along with the `CREATE TABLE` statements, these files also export slices
of strings representing the column names for each table. This can be
useful for constructing SQL queries programmatically.
- **Table Struct Definition (`tables.go`):**
- **Why:** This file defines a Go struct `Tables` which aggregates all the
individual table schema structs from various Perf sub-modules (like
`alerts`, `anomalygroup`, `git`, etc.).
- **How:** The `Tables` struct serves as the input to the `tosql` schema
generator. By referencing schema structs from other modules, it ensures
that the generated SQL schema is consistent with how data is represented
and manipulated throughout the application. The `//go:generate`
directives at the top of this file trigger the `tosql` utility to
regenerate the schema files when necessary.
- **Schema Generation Utility (`tosql/main.go`):**
- **Why:** Manually writing and maintaining complex SQL schemas is
error-prone. This utility automates the generation of the SQL schema
files (`schema.go` and `spanner/schema_spanner.go`) from the Go struct
definitions.
- **How:** It takes the `sql.Tables` struct (defined in `tables.go`) as
input and uses the `go/sql/exporter` module to translate the Go struct
tags and field types into corresponding SQL `CREATE TABLE` statements.
It supports different SQL dialects (CockroachDB and Spanner) and can
handle specific features like Spanner's TTL (Time To Live) for tables.
The `schemaTarget` flag controls which database dialect is generated.
- **Expected Schema and Migration (`expectedschema/`):**
- **Why:** As the application evolves, the database schema needs to
change. This submodule manages schema migrations, ensuring that the live
database can be updated to new versions without downtime or data loss.
It also validates that the current database schema matches an expected
version.
- **How:**
- **`embed.go`:** This file uses `go:embed` to embed JSON representations
of the _current_ (`schema.json`, `schema_spanner.json`) and _previous_
(`schema_prev.json`, `schema_prev_spanner.json`) expected database
schemas. These JSON files are generated by the `exportschema` utility.
`Load()` and `LoadPrev()` functions provide access to these deserialized
schema descriptions.
- **`migrate.go`:** This is the core of the schema migration logic.
      - It defines SQL statements (`FromLiveToNext`, `FromNextToLive`, and
        their Spanner equivalents) that describe how to upgrade the database
        from the "previous" schema version to the "next" (current) schema
        version, and how to roll back that change (a hypothetical example of
        such a pair appears after this list). **Crucially, schema changes
        must be backward and forward compatible** because during a
        deployment, old and new versions of the application might run
        concurrently.
- `ValidateAndMigrateNewSchema` is the key function. It:
* Loads the "next" (target) and "previous" expected schemas from the
embedded JSON files.
* Gets the _actual_ schema description from the live database.
* Compares the actual schema with the previous and next expected
schemas.
- If `actual == next`, no migration is needed.
- If `actual == prev` and `actual != next`, it executes the
`FromLiveToNext` SQL statements to upgrade the database schema.
- If `actual` matches neither `prev` nor `next`, it indicates an
unexpected schema state and returns an error, preventing
application startup. This is a critical safety check.
      - The migration process is designed to be run by a maintenance task
        during deployment: old instances (frontend, ingesters) keep running,
        the maintenance task runs `ValidateAndMigrateNewSchema`, and only
        then do the new instances (frontend, ingesters) start:
```
Deployment Starts
|
V
Maintenance Task Runs
|
+------------------------------------+
| Calls ValidateAndMigrateNewSchema |
+------------------------------------+
|
V
Is schema == previous_expected_schema? --Yes--> Apply `FromLiveToNext` SQL
| No |
V V
Is schema == current_expected_schema? ---Yes---> Migration Successful / No Action
| No
V
Error: Schema mismatch! Halt.
|
V
New Application Instances Start (if migration was successful)
```
- **Test files (`migrate_test.go`, `migrate_spanner_test.go`):** These
files contain unit tests to verify the schema migration logic for both
CockroachDB and Spanner. They test scenarios where no migration is
needed, migration is required, and the schema is in an invalid state.
- **Schema Export Utility (`exportschema/main.go`):**
- **Why:** The `expectedschema` submodule needs JSON representations of
the "current" and "previous" database schemas to perform validation and
migration. This utility generates these JSON files.
- **How:** It takes the `sql.Tables` struct (for CockroachDB) or
`spanner.Schema` (for Spanner) and uses the `go/sql/schema/exportschema`
module to serialize the schema description into a JSON format. The
output of this utility is typically checked into version control as
`schema.json`, `schema_prev.json`, etc., within the `expectedschema`
directory. The typical workflow for a schema change involves:
* Make schema changes in relevant Go structs (e.g., add a new field to
`alerts.AlertSchema`).
* Run `go generate ./...` within `perf/go/sql/` to regenerate `schema.go`
and `spanner/schema_spanner.go`.
* Copy the _old_ `expectedschema/schema.json` to
`expectedschema/schema_prev.json` (and similarly for Spanner).
* Run the `exportschema` binary (e.g., `bazel run
//perf/go/sql/exportschema -- --out
perf/go/sql/expectedschema/schema.json`) to generate the new
`expectedschema/schema.json`.
* Update the `FromLiveToNext` and `FromNextToLive` SQL statements in
`expectedschema/migrate.go`.
* Update test constants in `sql_test.go` (`LiveSchema`, `DropTables`) if
necessary.
- **Testing Utilities (`sqltest/sqltest.go`):**
- **Why:** Provides standardized ways to set up temporary CockroachDB or
Spanner emulator instances for testing components that interact with the
database.
- **How:**
- `NewCockroachDBForTests`: Sets up a connection to a local CockroachDB
instance (managed by `cockroachdb_instance.Require`), creates a new
temporary database for the test, applies the current `sql.Schema`, and
registers a cleanup function to drop the database after the test.
- `NewSpannerDBForTests`: Similarly, sets up a connection to a local
Spanner emulator (via PGAdapter, required by `pgadapter.Require`),
applies the current `spanner.Schema`, and prepares it for tests.
- These functions abstract away the complexities of emulator management
and initial schema setup, making tests cleaner and more reliable.
- **Schema Tests (`sql_test.go`):**
- **Why:** Verifies that the schema migration scripts correctly transform
a database from a "live-like" previous state to the current expected
state.
- **How:**
- Defines constants like `DropTables` (to clean up) and `LiveSchema` /
`LiveSchemaSpanner`. `LiveSchema` represents the schema _before_ the
latest change defined in `expectedschema/migrate.go`'s `FromLiveToNext`.
- The tests typically:
1. Create a test database.
2. Apply `DropTables` to ensure a clean slate.
3. Apply `LiveSchema` to simulate the state of the database _before_
the pending migration.
4. Execute `expectedschema.FromLiveToNext` (or its Spanner equivalent).
5. Fetch the schema description from the migrated database.
6. Compare this migrated schema with the schema obtained by applying
`sql.Schema` (or `spanner.Schema`) directly to a fresh database
(which represents the target state). They should be identical.
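To make the compatibility requirement in `expectedschema/migrate.go` concrete
(as flagged earlier in this list), here is a hypothetical upgrade/rollback pair.
The column, table, and declaration style are invented for illustration; real
migrations are defined in `migrate.go` and will look different.
```go
package expectedschema

// Hypothetical example only: adding a nullable column with a default is
// backward compatible (old binaries simply ignore the new column) and forward
// compatible (new binaries tolerate rows written before the migration).
const (
	// FromLiveToNext upgrades the live schema to the next version.
	FromLiveToNext = `
		ALTER TABLE Alerts ADD COLUMN IF NOT EXISTS description STRING DEFAULT '';
	`

	// FromNextToLive rolls the change back if the deployment is reverted.
	FromNextToLive = `
		ALTER TABLE Alerts DROP COLUMN IF EXISTS description;
	`
)
```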
This comprehensive approach to schema management ensures that Perf's database
can be reliably deployed, maintained, and evolved. The separation of concerns
(schema definition, generation, validation, migration, and testing) makes the
system robust and easier to understand.
# Module: /go/stepfit
The `stepfit` module is designed to analyze time-series data, specifically
performance traces, to detect significant changes or "steps." It employs various
statistical algorithms to determine if a step up (performance improvement), a
step down (performance regression), or no significant change has occurred in the
data. This module is crucial for automated performance monitoring, allowing for
the identification of impactful changes in system behavior.
The core idea is to fit a step function to the input trace data. A step function
is a simple function that is constant except for a single jump (the "step") at a
particular point (the "turning point"). The module calculates the best fit for
such a function and then evaluates the characteristics of this fit to determine
the nature and significance of the step.
**Key Components and Logic:**
The primary entity in this module is the `StepFit` struct. It encapsulates the
results of the step detection analysis:
- `LeastSquares`: This field stores the Least Squares Error (LSE) of the
fitted step function. A lower LSE generally indicates a better fit of the
step function to the data. It's important to note that not all step
detection algorithms calculate or use LSE; in such cases, this field is set
to `InvalidLeastSquaresError`.
- `TurningPoint`: This integer indicates the index in the input trace where
the step function changes its value. It essentially marks the location of
the detected step.
- `StepSize`: This float represents the magnitude of the change in the step
function. A negative `StepSize` implies a step _up_ in the trace values
(conventionally a performance regression, e.g., increased latency).
Conversely, a positive `StepSize` indicates a step _down_ (conventionally a
performance improvement, e.g., decreased latency).
- `Regression`: This value is a metric used to quantify the significance or
"interestingness" of the detected step. Its calculation varies depending on
the chosen `stepDetection` algorithm.
- For the `OriginalStep` algorithm, it's calculated as `StepSize / LSE`
(or `StepSize / stddevThreshold` if LSE is too small). A larger absolute
value of `Regression` implies a more significant step.
- For other algorithms like `AbsoluteStep`, `PercentStep`, and
`CohenStep`, `Regression` is directly related to the `StepSize` (or a
normalized version of it).
- For `MannWhitneyU`, `Regression` represents the p-value of the test.
- `Status`: This is an enumerated type (`StepFitStatus`) indicating the
overall assessment of the step:
- `LOW`: A step down was detected, often interpreted as a performance
improvement.
- `HIGH`: A step up was detected, often interpreted as a performance
regression.
- `UNINTERESTING`: No significant step was found.
The main function responsible for performing the analysis is `GetStepFitAtMid`.
It takes the following inputs:
- `trace`: A slice of `float32` representing the time-series data to be
analyzed.
- `stddevThreshold`: A threshold for standard deviation. This is used in the
`OriginalStep` algorithm for normalizing the trace and as a floor for
standard deviation in other algorithms like `CohenStep` to prevent division
by zero or near-zero values.
- `interesting`: A threshold value used to determine if a calculated
`Regression` value is significant enough to be classified as `HIGH` or
`LOW`. The exact interpretation of this threshold depends on the
`stepDetection` algorithm.
- `stepDetection`: An enumerated type (`types.StepDetection`) specifying which
algorithm to use for step detection.
**Workflow of `GetStepFitAtMid`:**
1. **Initialization and Preprocessing:**
- A new `StepFit` struct is initialized with `Status` set to
`UNINTERESTING`.
- If the trace length is less than `minTraceSize` (currently 3), the
function returns the initialized `StepFit` as there isn't enough data to
analyze.
- **Trace Normalization/Adjustment:**
- If `stepDetection` is `types.OriginalStep`, the input `trace` is
duplicated and normalized (mean centered and scaled by its standard
deviation, unless the standard deviation is below
`stddevThreshold`).
- For all other `stepDetection` types, if the trace has an odd length,
the last element is dropped to make the trace length even. This is
because these algorithms typically compare the first half of the
trace with the second half.
2. **Step Detection Algorithm Execution:** The function then proceeds based on
   the selected `stepDetection` algorithm. The core logic involves splitting
   the (potentially modified) trace roughly in half at the `TurningPoint`
   (which is `len(trace) / 2`) and comparing statistics of the two halves (a
   simplified sketch of this comparison follows this list).
- **`types.OriginalStep`:**
- Calculates the mean of the first half (`y0`) and the second half
(`y1`) of the (normalized) trace.
- Computes the Sum of Squared Errors (SSE) for fitting `y0` to the
first half and `y1` to the second half. The `LeastSquares` error
(`lse`) is derived from this SSE.
- `StepSize` is `y0 - y1`.
     - `Regression` is calculated as `StepSize / lse` (or
       `StepSize / stddevThreshold` if `lse` is too small). _Note: The
       original implementation has a slight deviation from the standard
       definition of standard error in this calculation._
- **`types.AbsoluteStep`:**
- `StepSize` is `y0 - y1`.
- `Regression` is simply the `StepSize`.
- The step is considered interesting if the absolute value of
`StepSize` meets the `interesting` threshold.
- **`types.Const`:**
- This algorithm behaves differently. It focuses on the absolute value
of the trace point at the `TurningPoint` (`trace[i]`).
- `StepSize` is `abs(trace[i]) - interesting`.
- `Regression` is `-1 * abs(trace[i])`. This is done so that larger
deviations (regressions) result in more negative `Regression`
values, which are then flagged as `HIGH`.
- **`types.PercentStep`:**
- `StepSize` is `(y0 - y1) / y0`, representing the percentage change
relative to the mean of the first half.
- Handles potential `Inf` or `NaN` results from the division (e.g., if
`y0` is zero).
- `Regression` is the `StepSize`.
- **`types.CohenStep`:**
- Calculates Cohen's d, a measure of effect size.
- `StepSize` is `(y0 - y1) / s_pooled`, where `s_pooled` is the pooled
standard deviation of the two halves (or `stddevThreshold` if
`s_pooled` is too small or NaN).
- `Regression` is the `StepSize`.
- **`types.MannWhitneyU`:**
- Performs a Mann-Whitney U test (a non-parametric test) to determine
if the two halves of the trace come from different distributions.
- `StepSize` is `y0 - y1`.
- `Regression` is the p-value of the test.
- `LeastSquares` is set to the U-statistic from the test.
3. **Status Determination:**
- For `types.MannWhitneyU`:
- If `Regression` (p-value) is less than or equal to the `interesting`
threshold (e.g., 0.05), a significant difference is detected.
- The `Status` (`HIGH` or `LOW`) is then determined by the sign of
`StepSize`. If `StepSize` is negative (step up), `Status` is `HIGH`.
Otherwise, it's `LOW`.
- The `Regression` value is then negated if the status is `HIGH` to
align with the convention that more negative values are "worse."
- For all other algorithms:
- If `Regression` is greater than or equal to `interesting`, `Status`
is `LOW`.
- If `Regression` is less than or equal to `-interesting`, `Status` is
`HIGH`.
- Otherwise, `Status` remains `UNINTERESTING`.
4. **Return Result:** The populated `StepFit` struct, containing
`LeastSquares`, `TurningPoint`, `StepSize`, `Regression`, and `Status`, is
returned.
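A simplified sketch of the half-comparison and status logic for the simpler
algorithms (AbsoluteStep, PercentStep, CohenStep) follows. It omits
OriginalStep's normalization, the Const and MannWhitneyU paths, and the real
types and constants, all of which differ in `stepfit.go`.
```go
package stepfit

import "math"

type StepFitStatus string

const (
	LOW           StepFitStatus = "Low"
	HIGH          StepFitStatus = "High"
	UNINTERESTING StepFitStatus = "Uninteresting"
)

// stepAtMid sketches the half-comparison. The trace is assumed to already
// have an even length, per the preprocessing step above.
func stepAtMid(trace []float32, algo string, interesting, stddevThreshold float32) (stepSize, regression float32, status StepFitStatus) {
	i := len(trace) / 2 // The turning point.
	y0 := mean(trace[:i])
	y1 := mean(trace[i:])

	switch algo {
	case "AbsoluteStep":
		stepSize = y0 - y1
	case "PercentStep":
		stepSize = (y0 - y1) / y0
		if math.IsInf(float64(stepSize), 0) || math.IsNaN(float64(stepSize)) {
			stepSize = 0 // One possible guard against y0 == 0.
		}
	case "CohenStep":
		pooled := pooledStdDev(trace[:i], trace[i:])
		if pooled < stddevThreshold || math.IsNaN(float64(pooled)) {
			pooled = stddevThreshold
		}
		stepSize = (y0 - y1) / pooled
	}
	regression = stepSize

	// A large positive regression is a step down (LOW); a large negative one
	// is a step up (HIGH).
	switch {
	case regression >= interesting:
		status = LOW
	case regression <= -interesting:
		status = HIGH
	default:
		status = UNINTERESTING
	}
	return stepSize, regression, status
}

func mean(xs []float32) float32 {
	var sum float32
	for _, x := range xs {
		sum += x
	}
	return sum / float32(len(xs))
}

// pooledStdDev assumes both halves have the same length.
func pooledStdDev(a, b []float32) float32 {
	return float32(math.Sqrt((variance(a) + variance(b)) / 2))
}

func variance(xs []float32) float64 {
	m := float64(mean(xs))
	var sum float64
	for _, x := range xs {
		d := float64(x) - m
		sum += d * d
	}
	return sum / float64(len(xs)-1)
}
```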
**Design Rationale:**
- **Multiple Algorithms:** The inclusion of various step detection algorithms
(`OriginalStep`, `AbsoluteStep`, `PercentStep`, `CohenStep`, `MannWhitneyU`)
provides flexibility. Different datasets and performance characteristics may
be better suited to different statistical approaches. For instance,
`MannWhitneyU` is non-parametric and makes fewer assumptions about the data
distribution, which can be beneficial for noisy or non-Gaussian data.
`AbsoluteStep` and `PercentStep` offer simpler, more direct ways to define a
regression based on absolute or relative changes.
- **Centralized Logic:** The `GetStepFitAtMid` function consolidates the logic
for all supported algorithms, making it easier to manage and extend.
- **Clear `StepFit` Structure:** The `StepFit` struct provides a well-defined
way to communicate the results of the analysis, separating the raw metrics
(like `StepSize`, `LeastSquares`) from the final interpretation (`Status`).
- **`interesting` Threshold:** The `interesting` parameter allows users to
customize the sensitivity of the step detection. This is crucial because
what constitutes a "significant" change can vary greatly depending on the
context of the performance metric being monitored.
- **`stddevThreshold`:** This parameter helps in handling cases with very low
variance, preventing numerical instability (like division by zero) and
ensuring that normalization in `OriginalStep` behaves reasonably.
- **Focus on the Middle:** The `GetStepFitAtMid` name implies that the step
detection is focused around the middle of the trace. This is a common
approach for detecting a single, prominent step. More complex scenarios with
multiple steps would require different techniques.
**Why specific implementation choices?**
- **Normalization in `OriginalStep`:** Normalizing the trace in the
`OriginalStep` algorithm (as described in the linked blog post) aims to make
the detection less sensitive to the absolute scale of the data and more
focused on the relative change.
- **Symmetric Traces for Non-`OriginalStep`:** For algorithms other than
`OriginalStep`, ensuring an even trace length by potentially dropping the
last point simplifies the division of the trace into two equal halves for
comparison.
- **Handling of `Inf` and `NaN` in `PercentStep`:** Explicitly checking for
and handling `Inf` and `NaN` values that can arise from division by zero
(when `y0` is zero) makes the `PercentStep` calculation more robust.
- **`Regression` as p-value for `MannWhitneyU`:** Using the p-value as the
`Regression` metric for `MannWhitneyU` directly reflects the statistical
significance of the observed difference between the two halves of the trace.
The `interesting` threshold then acts as the significance level (alpha).
- **`InvalidLeastSquaresError`:** This constant provides a clear way to
indicate when LSE is not applicable or not calculated by a particular
algorithm, avoiding confusion with a calculated LSE of 0 or a negative
value.
In essence, the `stepfit` module provides a toolkit for identifying abrupt
changes in performance data, offering different lenses (algorithms) through
which to view and quantify these changes. The design prioritizes flexibility in
algorithm choice and user-configurable sensitivity to cater to diverse
performance analysis needs.
# Module: /go/subscription
The `subscription` module manages alerting configurations, known as
subscriptions, for anomalies detected in performance data. It provides the means
to define, store, and retrieve these configurations.
The core concept is that a "subscription" dictates how the system should react
when an anomaly is found. This includes details like which bug tracker component
to file an issue under, what labels to apply, who to CC on the bug, and the
priority/severity of the issue. This allows for automated and consistent
handling of performance regressions.
Subscriptions are versioned using an `infra_internal` Git hash (revision). This
allows for tracking changes to subscription configurations over time and ensures
that the correct configuration is used based on the state of the infrastructure
code.
**Key Components and Files:**
- **`store.go`**: Defines the `Store` interface. This interface is the central
abstraction for interacting with subscription data. It dictates the
operations that any concrete subscription storage implementation must
provide. This design choice allows for flexibility in the underlying storage
mechanism (e.g., SQL database, in-memory store for testing).
- **Why an interface?** Decouples the business logic from the specific
storage implementation. This promotes testability (using mocks) and
allows for easier migration to different database technologies in the
future if needed.
- **Key methods:**
- `GetSubscription`: Retrieves a specific version of a subscription.
- `GetActiveSubscription`: Retrieves the currently active version of a
subscription by its name. This is likely the most common retrieval
method for active alerting.
- `InsertSubscriptions`: Allows for batch insertion of new subscriptions.
This is typically done within a database transaction to ensure atomicity
– either all subscriptions are inserted, or none are. This is crucial
when updating configurations, as it prevents a partially updated state.
The implementation in `sqlsubscriptionstore` deactivates all existing
subscriptions before inserting the new ones as active, effectively
replacing the entire active set.
- `GetAllSubscriptions`: Retrieves all historical versions of all
subscriptions.
- `GetAllActiveSubscriptions`: Retrieves all currently active
subscriptions. This is useful for systems that need to know all current
alerting rules.
- **`proto/v1/subscription.proto`**: Defines the structure of a `Subscription`
using Protocol Buffers. This is the canonical data model for subscriptions.
- **Why Protocol Buffers?** Provides a language-neutral, platform-neutral,
extensible mechanism for serializing structured data. This is beneficial
for potential interoperability with other services or for persisting
data in a well-defined format. It also generates efficient serialization
and deserialization code.
- **Key fields:** `name`, `revision`, `bug_labels`, `hotlists`,
`bug_component`, `bug_priority`, `bug_severity`, `bug_cc_emails`,
`contact_email`. Each field directly maps to a configuration aspect for
bug filing and contact information.
- **`sqlsubscriptionstore/sqlsubscriptionstore.go`**: Provides a concrete
implementation of the `Store` interface using an SQL database (specifically
designed for CockroachDB, as indicated by the use of `pgx`).
- **Why SQL?** Relational databases offer robust data integrity,
transaction support (ACID properties), and powerful querying
capabilities, which are well-suited for managing structured
configuration data like subscriptions.
- **How it works:** It defines SQL statements for each operation in the
`Store` interface. When inserting subscriptions, it first deactivates
all existing subscriptions and then inserts the new ones as active. This
ensures that only the latest set of configurations is considered active.
- The `is_active` boolean column in the database schema
(`sqlsubscriptionstore/schema/schema.go`) is key to this "active
version" concept.
- **`sqlsubscriptionstore/schema/schema.go`**: Defines the SQL table schema
for storing subscriptions.
- **Key design choice:** The primary key is a composite of `name` and
`revision`. This allows multiple versions of the same named subscription
to exist, identified by their revision. The `is_active` field
differentiates the current version from historical ones.
- **`mocks/Store.go`**: Contains a mock implementation of the `Store`
interface, generated by the `mockery` tool.
- **Why mocks?** Essential for unit testing components that depend on the
`Store` interface without requiring an actual database connection. This
makes tests faster, more reliable, and isolates the unit under test.
**Key Workflows:**
1. **Updating Subscriptions:** This typically happens when configurations in
`infra_internal` are changed.
```
External Process (e.g., config syncer)
|
v
Reads new subscription definitions (likely from files)
|
v
Parses definitions into []*pb.Subscription
|
v
Calls store.InsertSubscriptions(ctx, newSubscriptions, tx)
|
|--> [SQL Transaction Start]
| |
| v
| sqlsubscriptionstore: Deactivate all existing subscriptions (UPDATE Subscriptions SET is_active=false WHERE is_active=true)
| |
| v
| sqlsubscriptionstore: Insert each new subscription with is_active=true (INSERT INTO Subscriptions ...)
| |
| v
|--> [SQL Transaction Commit/Rollback]
```
This ensures that the update is atomic. If any part fails, the transaction
is rolled back, leaving the previous set of active subscriptions intact.
2. **Anomaly Detection Triggering Alerting:**
   - The Anomaly Detector identifies an anomaly and the relevant subscription
     name (e.g., based on metric patterns).
   - It calls `store.GetActiveSubscription(ctx, subscriptionName)`.
   - `sqlsubscriptionstore` retrieves the active subscription
     (`SELECT ... FROM Subscriptions WHERE name=$1 AND is_active=true`).
   - The Anomaly Detector uses the `pb.Subscription` details (bug component,
     labels, etc.) to file a bug.
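
The sketch below shows, under assumption, how an anomaly-handling component might consume the `Store` interface in this workflow. It is not the actual Perf code: the `Subscription` struct is a hand-written stand-in for the generated proto type, the `Store` interface is trimmed to one method, and the bug filing is stubbed with a print.

```go
package main

import (
	"context"
	"fmt"
)

// Subscription mirrors a subset of the proto fields listed above; the field
// names are illustrative Go equivalents, not the generated pb code.
type Subscription struct {
	Name         string
	Revision     string
	BugComponent string
	BugLabels    []string
	BugCcEmails  []string
}

// Store is a trimmed-down stand-in for the interface described in store.go.
type Store interface {
	GetActiveSubscription(ctx context.Context, name string) (*Subscription, error)
}

// handleAnomaly routes a detected anomaly using the active subscription for
// the given name; the bug filing itself is stubbed out with a print.
func handleAnomaly(ctx context.Context, store Store, subName string) error {
	sub, err := store.GetActiveSubscription(ctx, subName)
	if err != nil {
		return fmt.Errorf("loading active subscription %q: %w", subName, err)
	}
	if sub == nil {
		return nil // no active configuration; nothing to file
	}
	fmt.Printf("file bug in component %q, labels %v, cc %v\n",
		sub.BugComponent, sub.BugLabels, sub.BugCcEmails)
	return nil
}

// fakeStore is an in-memory Store used only to make the sketch runnable.
type fakeStore struct{ active map[string]*Subscription }

func (f fakeStore) GetActiveSubscription(_ context.Context, name string) (*Subscription, error) {
	return f.active[name], nil
}

func main() {
	store := fakeStore{active: map[string]*Subscription{
		"skia-render": {Name: "skia-render", BugComponent: "Skia>Perf", BugLabels: []string{"Perf-Regression"}},
	}}
	_ = handleAnomaly(context.Background(), store, "skia-render")
}
```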
This module provides a robust and versioned way to manage alerting rules,
ensuring that performance regressions are handled consistently and routed
appropriately. The separation of interface and implementation, along with the
use of Protocol Buffers, contributes to a maintainable and extensible system.
# Module: /go/tracecache
## TraceCache Module Documentation
The `tracecache` module provides a mechanism for caching trace identifiers
(trace IDs) associated with specific tiles and queries. This caching layer
significantly improves performance by reducing the need to repeatedly compute or
fetch trace IDs, which can be a computationally expensive operation.
**Core Functionality & Design Rationale:**
The primary purpose of `tracecache` is to store and retrieve lists of trace IDs.
Trace IDs are represented as `paramtools.Params`, which are essentially
key-value pairs that uniquely identify a specific trace within the performance
monitoring system.
The caching strategy is built around the concept of a "tile" and a "query."
- **Tile:** In the context of Skia Perf, a tile represents a chunk of commit
history. Caching trace IDs per tile allows for efficient retrieval of
relevant traces when analyzing a specific range of commits.
- **Query:** A query, represented by `query.Query`, defines the specific
parameters used to filter traces. Different queries will yield different
sets of trace IDs.
By combining the tile number and a string representation of the query, a unique
cache key is generated. This ensures that cached data is specific to the exact
combination of commit range and filter criteria.
The module relies on an external caching implementation provided via the
`go/cache.Cache` interface. This design choice promotes flexibility, allowing
different caching backends (e.g., in-memory, Redis, Memcached) to be used
without modifying the `tracecache` logic itself. This separation of concerns is
crucial for adapting to various deployment environments and performance
requirements.
**Key Components:**
- **`traceCache.go`**: This is the sole file in the module and contains the
implementation of the `TraceCache` struct and its associated methods.
- **`TraceCache` struct**:
- Holds an instance of `cache.Cache`. This is the underlying cache client
used for storing and retrieving data.
- **`New(cache cache.Cache) *TraceCache`**:
- The constructor for `TraceCache`. It takes a `cache.Cache` instance as
an argument, which will be used for all caching operations. This
dependency injection allows the caller to provide any cache
implementation that conforms to the `cache.Cache` interface.
- **`CacheTraceIds(ctx context.Context, tileNumber types.TileNumber, q
*query.Query, traceIds []paramtools.Params) error`**:
- This method is responsible for storing a list of trace IDs into the
cache.
- It first generates a unique `cacheKey` using the `tileNumber` and the
`query.Query`.
- The `traceIds` (a slice of `paramtools.Params`) are then serialized into
a JSON string using the `toJSON` helper function. This serialization is
necessary because most cache backends store data as strings or byte
arrays. JSON is chosen for its human-readability and widespread support.
- Finally, it uses the `cacheClient.SetValue` method to store the JSON
string under the generated `cacheKey`.
- **`GetTraceIds(ctx context.Context, tileNumber types.TileNumber, q
*query.Query) ([]paramtools.Params, error)`**:
- This method retrieves a list of trace IDs from the cache.
- It generates the `cacheKey` in the same way as `CacheTraceIds`.
- It then attempts to fetch the value associated with this key using
`cacheClient.GetValue`.
- If the value is not found in the cache (i.e., `cacheJson` is empty), it
returns `nil` for both the trace IDs and the error, indicating a cache
miss.
- If a value is found, it deserializes the JSON string back into a slice
of `paramtools.Params` using `json.Unmarshal`.
- **`traceIdCacheKey(tileNumber types.TileNumber, q query.Query)
string`**:
- A private helper function that constructs the cache key. It combines the
`tileNumber` (an integer) and a string representation of the
`query.Query` (obtained via `q.KeyValueString()`) separated by an
underscore. This format ensures uniqueness and provides some
human-readable context within the cache keys.
- **`toJSON(obj interface{}) (string, error)`**:
- A private generic helper function to marshal any given object into a
JSON string. This is used specifically for serializing the
`[]paramtools.Params` before caching.
**Workflow for Caching Trace IDs:**
1. **Input:** `tileNumber`, `query.Query`, `[]paramtools.Params` (trace IDs to
cache)
2. `CacheTraceIds` is called.
3. `traceIdCacheKey(tileNumber, query)` generates a unique key. `tileNumber +
"_" + query.KeyValueString() ---> cacheKey`
4. `toJSON(traceIds)` serializes the list of trace IDs into a JSON string.
`[]paramtools.Params --json.Marshal--> jsonString`
5. `t.cacheClient.SetValue(ctx, cacheKey, jsonString)` stores the JSON string
in the underlying cache.
**Workflow for Retrieving Trace IDs:**
1. **Input:** `tileNumber`, `query.Query`
2. `GetTraceIds` is called.
3. `traceIdCacheKey(tileNumber, query)` generates the cache key (same logic as
above). `tileNumber + "_" + query.KeyValueString() ---> cacheKey`
4. `t.cacheClient.GetValue(ctx, cacheKey)` attempts to retrieve the value from
the cache. `cacheClient --GetValue(cacheKey)--> jsonString (or empty if not
found)`
5. **If `jsonString` is empty (cache miss):** Return `nil`, `nil`.
6. **If `jsonString` is not empty (cache hit):**
`json.Unmarshal([]byte(jsonString), &traceIds)` deserializes the JSON string
back into `[]paramtools.Params`. `jsonString --json.Unmarshal-->
[]paramtools.Params`
7. Return the deserialized `[]paramtools.Params` and `nil` error.
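
A minimal, self-contained sketch of these two workflows is shown below. It substitutes a plain map for the `go/cache.Cache` backend and a hand-written `Params` alias for `paramtools.Params`; the query string passed to `cacheKey` is a hypothetical example of what `query.Query.KeyValueString()` might return.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Params stands in for paramtools.Params in this sketch.
type Params map[string]string

// cacheKey mirrors the key format described above:
// "<tileNumber>_<query key-value string>".
func cacheKey(tileNumber int, queryKV string) string {
	return fmt.Sprintf("%d_%s", tileNumber, queryKV)
}

// memCache is a trivial in-memory stand-in for the go/cache.Cache interface.
type memCache map[string]string

func (m memCache) SetValue(_ context.Context, k, v string) error          { m[k] = v; return nil }
func (m memCache) GetValue(_ context.Context, k string) (string, error)   { return m[k], nil }

func main() {
	ctx := context.Background()
	c := memCache{}

	traceIDs := []Params{{"arch": "x86", "config": "8888"}}
	key := cacheKey(42, "arch=x86&config=8888") // hypothetical KeyValueString output

	// CacheTraceIds: serialize to JSON and store under the key.
	b, _ := json.Marshal(traceIDs)
	_ = c.SetValue(ctx, key, string(b))

	// GetTraceIds: fetch and deserialize; an empty value means a cache miss.
	if v, _ := c.GetValue(ctx, key); v != "" {
		var got []Params
		_ = json.Unmarshal([]byte(v), &got)
		fmt.Println(got)
	}
}
```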
# Module: /go/tracefilter
## Tracefilter Module Documentation
The `tracefilter` module provides a mechanism to organize and filter trace data
based on their hierarchical paths. The core idea is to represent traces within a
tree structure, where each node in the tree corresponds to a segment of the
trace's path. This allows for efficient filtering of traces, specifically to
identify "leaf" traces – those that do not have any further sub-paths.
This approach is particularly useful in scenarios where traces have a
parent-child relationship implied by their path structure. For instance, in
performance analysis, a trace like `/root/p1/p2/p3/t1` might represent a
specific test (`t1`) under a series of nested configurations (`p1`, `p2`, `p3`).
If there's another trace `/root/p1/p2`, it could be considered a "parent" or an
aggregate trace. The `tracefilter` helps in identifying only the most specific,
or "leaf," traces, effectively filtering out these higher-level parent traces.
### Key Components and Responsibilities
The primary component is the `TraceFilter` struct.
**`TraceFilter` struct:**
- **Purpose:** Represents a node within the trace path tree.
- **Fields:**
- `traceKey`: A string identifier associated with the trace path ending at
this node. For the root of the tree, this is initialized to "HEAD".
- `value`: The string value of the current path segment this node
represents.
- `children`: A map where keys are the next path segments and values are
pointers to child `TraceFilter` nodes. This map forms the branches of
the tree.
- **Why this structure?**
- A tree is a natural way to represent hierarchical path data.
- Using a map for `children` allows for efficient lookup and addition of
child nodes based on the next path segment.
- Storing the `traceKey` at each node allows associating an identifier
with a complete path as it's being built.
**`NewTraceFilter()` function:**
- **Purpose:** Acts as the constructor for the `TraceFilter` tree.
- **How it works:** It initializes a root `TraceFilter` node. The `traceKey`
is set to "HEAD" as a sentinel value for the root, and its `children` map is
initialized as empty, ready to have paths added to it.
- **Why this design?** Provides a clear and simple entry point for creating a
new filter structure.
**`AddPath(path []string, traceKey string)` method:**
- **Purpose:** Adds a new trace, defined by its `path` (a slice of strings
representing path segments) and its unique `traceKey`, to the filter tree.
- **How it works:**
1. It traverses the tree, creating new nodes as needed for each segment in
the input `path`.
2. If a segment in the `path` already exists as a child of the current
node, it moves to that existing child.
3. If a segment does not exist, a new `TraceFilter` node is created for
that segment, its `value` is set to the segment string, its `traceKey`
is set to the input `traceKey`, and it's added to the `children` map of
the current node.
4. This process repeats recursively for the remaining segments in the
`path`.
- **Why this design?**
- This incremental build process efficiently constructs the tree by
reusing existing nodes for common path prefixes.
- The recursive nature elegantly handles paths of arbitrary length.
- Associating the `traceKey` with each newly created node ensures that
even intermediate nodes (which might later become leaves if no further
sub-paths are added) have an associated key.
```
Example: Adding path ["root", "p1", "p2"] with key "keyA"
Initial Tree:
(HEAD)
After AddPath(["root", "p1", "p2"], "keyA"):
(HEAD)
|
+-- ("root", key="keyA")
|
+-- ("p1", key="keyA")
|
+-- ("p2", key="keyA") <- Leaf node initially
```
If we then add `["root", "p1", "p2", "t1"]` with key `"keyB"`:
```
(HEAD)
|
+-- ("root", key="keyB") // traceKey updated if path is prefix
|
+-- ("p1", key="keyB")
|
+-- ("p2", key="keyB")
|
+-- ("t1", key="keyB") <- New leaf node
```
  _Note: `AddPath` updates the `traceKey` of an existing node whenever a newly
  added path passes through it, so the `traceKey` stored at an intermediate
  node reflects the most recently added path that shares it as a prefix._
  However, the primary use of `GetLeafNodeTraceKeys` relies only on the
  `traceKey` of nodes that end up as leaves.
**`GetLeafNodeTraceKeys()` method:**
- **Purpose:** Retrieves the `traceKey`s of all traces that are considered
"leaf" nodes in the tree. A leaf node is a node that has no children.
- **How it works:**
1. It performs a depth-first traversal of the tree.
2. If the current node has no children (i.e., `len(tf.children) == 0`), its
`traceKey` is considered a leaf key and is added to the result list.
3. If the current node has children, the method recursively calls itself on
each child node and aggregates the results.
- **Why this design?**
- This is the core filtering logic. By only returning keys from nodes
without children, it effectively filters out traces that serve as
prefixes (parents) to other, more specific traces.
- Recursion is a natural fit for traversing tree structures.
```
Workflow for GetLeafNodeTraceKeys:
Start at (CurrentNode)
|
V
Is CurrentNode a leaf (no children)?
|
+-- YES --> Add CurrentNode.traceKey to results
|
+-- NO --> For each ChildNode in CurrentNode.children:
|
V
Recursively call GetLeafNodeTraceKeys on ChildNode
|
V
Append results from ChildNode to overall results
|
V
Return aggregated results
```
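
The sketch below is a simplified re-implementation of the tree described above, written only to illustrate `AddPath` and `GetLeafNodeTraceKeys`; it follows the documented behavior but is not the module's actual source.

```go
package main

import "fmt"

// TraceFilter is an illustrative node in the trace-path tree.
type TraceFilter struct {
	traceKey string
	value    string
	children map[string]*TraceFilter
}

func NewTraceFilter() *TraceFilter {
	return &TraceFilter{traceKey: "HEAD", children: map[string]*TraceFilter{}}
}

// AddPath walks or creates one node per path segment, recording traceKey at
// each node it touches.
func (tf *TraceFilter) AddPath(path []string, traceKey string) {
	if len(path) == 0 {
		return
	}
	child, ok := tf.children[path[0]]
	if !ok {
		child = &TraceFilter{value: path[0], children: map[string]*TraceFilter{}}
		tf.children[path[0]] = child
	}
	child.traceKey = traceKey
	child.AddPath(path[1:], traceKey)
}

// GetLeafNodeTraceKeys returns the traceKeys of all nodes with no children.
func (tf *TraceFilter) GetLeafNodeTraceKeys() []string {
	if len(tf.children) == 0 {
		return []string{tf.traceKey}
	}
	keys := []string{}
	for _, c := range tf.children {
		keys = append(keys, c.GetLeafNodeTraceKeys()...)
	}
	return keys
}

func main() {
	tf := NewTraceFilter()
	tf.AddPath([]string{"config", "test_group", "test1"}, "keyA")
	tf.AddPath([]string{"config", "test_group"}, "keyB")
	tf.AddPath([]string{"config", "test_group", "test2"}, "keyC")
	fmt.Println(tf.GetLeafNodeTraceKeys()) // keyA and keyC, in map-iteration order
}
```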
### Example Scenario and How it Works
Consider the following traces and their paths:
1. `traceA`: path `["config", "test_group", "test1"]`, key `"keyA"`
2. `traceB`: path `["config", "test_group"]`, key `"keyB"`
3. `traceC`: path `["config", "test_group", "test2"]`, key `"keyC"`
4. `traceD`: path `["config", "other_group", "test3"]`, key `"keyD"`
5. **Tree Construction (`AddPath` calls):**
- `tf.AddPath(["config", "test_group", "test1"], "keyA")`
- `tf.AddPath(["config", "test_group"], "keyB")`
- When this is added, the node for `"test_group"` initially created by
`keyA` will have its `traceKey` updated to `"keyB"`.
- `tf.AddPath(["config", "test_group", "test2"], "keyC")`
- `tf.AddPath(["config", "other_group", "test3"], "keyD")`
The tree would look something like this (simplified, showing relevant
traceKeys for leaf potential):
```
(HEAD)
|
+-- ("config")
|
+-- ("test_group", traceKey likely updated by "keyB" during AddPath)
| |
| +-- ("test1", traceKey="keyA") <-- Leaf
| |
| +-- ("test2", traceKey="keyC") <-- Leaf
|
+-- ("other_group")
|
+-- ("test3", traceKey="keyD") <-- Leaf
```
6. **Filtering (`GetLeafNodeTraceKeys()` call):**
- When `GetLeafNodeTraceKeys()` is called on the root:
- It traverses to `"config"`.
- It traverses to `"test_group"`. This node has children (`"test1"`
and `"test2"`), so its key (`"keyB"`) is _not_ added.
- It traverses to `"test1"`. This is a leaf. `"keyA"` is added.
- It traverses to `"test2"`. This is a leaf. `"keyC"` is added.
- It traverses to `"other_group"`.
- It traverses to `"test3"`. This is a leaf. `"keyD"` is added.
The result would be `["keyA", "keyC", "keyD"]`. Notice that `"keyB"` is
excluded because the path `["config", "test_group"]` has sub-paths
(`.../test1` and `.../test2`), making it a non-leaf node in the context of
trace specificity.
This module provides a clean and efficient way to identify the most granular
traces in a dataset where hierarchy is defined by path structure.
# Module: /go/tracesetbuilder
The `tracesetbuilder` module is designed to efficiently construct a
`types.TraceSet` and its corresponding `paramtools.ReadOnlyParamSet` from
multiple, potentially disparate, sets of trace data. This is particularly useful
when dealing with performance data that might arrive in chunks (e.g., from
different "Tiles" of data) and needs to be aggregated into a coherent view
across a series of commits.
The core challenge this module addresses is the concurrent and distributed
nature of processing trace data. If multiple traces with the same identifier
(key) were processed by different workers simultaneously without coordination,
it could lead to race conditions and incorrect data. Similarly, simply locking
the entire `TraceSet` for each update would create a bottleneck.
The `tracesetbuilder` solves this by employing a worker pool (`mergeWorkers`).
The key design decision here is to distribute the work based on the trace key.
Each trace key is hashed (using `crc32.ChecksumIEEE`), and this hash determines
which `mergeWorker` is responsible for that specific trace. This ensures that
all data points for a single trace are always processed by the same worker,
thereby avoiding the need for explicit locking at the individual trace level
within the worker. Each `mergeWorker` maintains its own `types.TraceSet` and
`paramtools.ParamSet`.
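
A minimal sketch of that routing decision follows; the `numWorkers` value here is illustrative, not the package's actual constant.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const numWorkers = 32 // illustrative; the real constant is defined in the package

// workerIndexFor routes a trace key to a worker. Because the hash is a pure
// function of the key, all data points for one trace land on the same worker.
func workerIndexFor(traceKey string) uint32 {
	return crc32.ChecksumIEEE([]byte(traceKey)) % numWorkers
}

func main() {
	key := ",arch=x86,config=8888,test=draw_a_circle,units=ms,"
	fmt.Println(workerIndexFor(key)) // deterministic for a given key
}
```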
**Key Components and Workflow:**
1. **`TraceSetBuilder`:**
- **Responsibilities:**
- Manages a pool of `mergeWorker` instances.
- Provides the `Add` method to ingest new trace data.
- Provides the `Build` method to consolidate results from all workers
and return the final `TraceSet` and `ReadOnlyParamSet`.
- Provides the `Close` method to shut down the worker pool.
- **`New(size int)`:** Initializes the `TraceSetBuilder`. The `size`
parameter is crucial as it defines the expected length of each trace in
the final, consolidated `TraceSet`. This allows the builder to
pre-allocate trace slices of the correct length, filling in missing data
points as necessary. It creates `numWorkers` instances of `mergeWorker`.
   - **`Add(commitNumberToOutputIndex map[types.CommitNumber]int32, commits []provider.Commit, traces types.TraceSet)`:**
     This is the entry point for feeding data into the builder.
     - `traces`: A `types.TraceSet` representing a chunk of data (e.g., from a
       single tile).
     - `commits`: A slice of `provider.Commit` objects corresponding to the
       data points in the `traces`.
     - `commitNumberToOutputIndex`: A map that dictates where each data point
       from the input `traces` (identified by its `types.CommitNumber`) should
       be placed in the _final_ output trace. This mapping is essential for
       correctly aligning data points that might come from different sources
       or represent different commit ranges.
     - For each trace in the input `traces`:
       - It parses the trace key into `paramtools.Params`.
       - It creates a `request` struct containing the key, params, the trace
         data itself, the `commitNumberToOutputIndex` map, and the `commits`
         slice.
       - It calculates an index based on the CRC32 hash of the trace key
         modulo `numWorkers`.
       - It sends this `request` to the `ch` channel of the selected
         `mergeWorker`.
       - A `sync.WaitGroup` is incremented for each trace added, ensuring
         `Build` waits for all processing to complete.
   - **`Build(ctx context.Context)`:**
     - Waits for all `Add` operations to be processed by the workers (using
       `t.wg.Wait()`).
     - Iterates through all `mergeWorkers`.
     - Merges the `traceSet` and `paramSet` from each `mergeWorker` into a
       single, final `types.TraceSet` and `paramtools.ParamSet`.
     - Normalizes and freezes the final `paramSet` to create a
       `paramtools.ReadOnlyParamSet`.
     - Returns the consolidated `TraceSet` and `ReadOnlyParamSet`.
- **`Close()`:** Iterates through the `mergeWorkers` and closes their
respective input channels (`ch`). This signals the worker goroutines to
terminate once they have processed all pending requests.
2. **`mergeWorker`:**
- **Responsibilities:**
- Processes `request` objects sent to its channel.
- Maintains its own local `types.TraceSet` and `paramtools.ParamSet`.
- Updates its local `TraceSet` with new data points, placing them
correctly according to `request.commitNumberToOutputIndex`.
- Adds the parameters from each processed trace to its local
`ParamSet`.
- **`newMergeWorker(wg *sync.WaitGroup, size int)`:** Creates a
`mergeWorker` and starts its goroutine.
- It initializes an empty `types.TraceSet` and `paramtools.ParamSet`.
- The goroutine continuously reads `request` objects from its `ch`
channel.
- For each `request`:
- It retrieves or creates a trace in its `m.traceSet` for the given
`req.key`. If creating, it uses `types.NewTrace(size)` to ensure the
trace has the correct final length.
- It iterates through the `req.commits` and uses
`req.commitNumberToOutputIndex` to determine the correct destination
index in its local trace for each data point in `req.trace`.
- It updates the trace value at that destination index.
- It adds `req.params` to its `m.paramSet`.
- It decrements the shared `sync.WaitGroup` (`m.wg.Done()`) to signal
completion of this piece of work.
- **`Process(req *request)`:** Sends a request to the worker's channel.
- **`Close()`:** Closes the worker's input channel.
3. **`request` struct:**
- A simple data structure used to pass all necessary information for
processing a single trace segment through the pipeline to a
`mergeWorker`. It encapsulates the trace key, its parsed parameters, the
actual trace data segment, the mapping of commit numbers to output
indices, and the corresponding commit metadata.
**Workflow Diagram:**
```
TraceSetBuilder.New(outputTraceLength)
|
V
+-----------------------------------------------------------------------+
| TraceSetBuilder (manages WaitGroup and pool of mergeWorkers) |
+-----------------------------------------------------------------------+
| ^
| Add(commitMap1, commits1, traces1) | Build() waits for WaitGroup
| Add(commitMap2, commits2, traces2) |
V |
+-----------------------------------------------------------------------+
| For each trace in input: |
| 1. Parse key -> params |
| 2. Create 'request' struct |
| 3. Hash key -> workerIndex |
| 4. Send 'request' to mergeWorkers[workerIndex].ch |
| 5. Increment WaitGroup |
+-----------------------------------------------------------------------+
| | | ... (numWorkers times)
V V V
+--------+ +--------+ +--------+
| mergeW_0 | | mergeW_1 | | mergeW_N | (Each runs in its own goroutine)
| .ch | | .ch | | .ch |
| .traceSet| | .traceSet| | .traceSet|
| .paramSet| | .paramSet| | .paramSet|
+--------+ +--------+ +--------+
^ ^ ^
| Process request: |
| - Get/Create local trace for req.key (length: outputTraceLength) |
| - For each point in req.trace: |
| - Use req.commitNumberToOutputIndex[commitNum] to find dstIdx |
| - localTrace[dstIdx] = req.trace[srcIdx] |
| - Add req.params to local paramSet |
| - Decrement WaitGroup |
| | |
--------------------- (When TraceSetBuilder.Build() is called)
|
V
+-----------------------------------------------------------------------+
| TraceSetBuilder.Build(): |
| 1. Wait for all 'Add' operations (WaitGroup.Wait()) |
| 2. Create finalTraceSet, finalParamSet |
| 3. For each mergeWorker: |
| - Merge worker.traceSet into finalTraceSet |
| - Merge worker.paramSet into finalParamSet |
| 4. Normalize and Freeze finalParamSet |
| 5. Return finalTraceSet, finalParamSet (ReadOnly) |
+-----------------------------------------------------------------------+
|
V
+-----------------------------------------------------------------------+
| TraceSetBuilder.Close(): |
| - Close channels of all mergeWorkers (signals them to terminate) |
+-----------------------------------------------------------------------+
```
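
The sketch below condenses the per-worker processing from the diagram into runnable form. Types are deliberately simplified (plain `int` commit numbers and `float32` slices instead of `types.CommitNumber`, `provider.Commit`, and `types.Trace`), and the channel buffer size is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// request is a simplified version of the struct described above: one chunk of
// a single trace plus the mapping from commit number to output index.
type request struct {
	key       string
	trace     []float32
	commits   []int       // commit numbers, parallel to trace
	outputIdx map[int]int // commit number -> index in the final trace
}

// mergeWorker keeps its own traceSet, so no locking is needed per trace.
type mergeWorker struct {
	ch       chan *request
	traceSet map[string][]float32
}

func newMergeWorker(wg *sync.WaitGroup, size int) *mergeWorker {
	m := &mergeWorker{ch: make(chan *request, 16), traceSet: map[string][]float32{}}
	go func() {
		for req := range m.ch {
			trace, ok := m.traceSet[req.key]
			if !ok {
				trace = make([]float32, size) // final output length
				m.traceSet[req.key] = trace
			}
			for i, commit := range req.commits {
				if dst, ok := req.outputIdx[commit]; ok {
					trace[dst] = req.trace[i]
				}
			}
			wg.Done()
		}
	}()
	return m
}

func main() {
	var wg sync.WaitGroup
	w := newMergeWorker(&wg, 4)
	wg.Add(1)
	w.ch <- &request{
		key:       ",arch=x86,test=t1,",
		trace:     []float32{1.5, 2.5},
		commits:   []int{100, 101},
		outputIdx: map[int]int{100: 2, 101: 3},
	}
	wg.Wait()
	close(w.ch)
	fmt.Println(w.traceSet) // map[,arch=x86,test=t1,:[0 0 1.5 2.5]]
}
```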
`numWorkers` and `channelBufferSize` are constants that can be tuned
for performance based on the expected workload and system resources. The CRC32
hash provides a reasonably good distribution of keys across workers, minimizing
the chance of one worker becoming a bottleneck. The `sync.WaitGroup` is
essential for ensuring that the `Build` method doesn't prematurely try to
aggregate results before all input data has been processed by the workers.
The design allows for efficient, concurrent processing of large volumes of trace
data by partitioning the work based on trace identity and then merging the
results, making it suitable for building comprehensive views of performance
metrics over time.
# Module: /go/tracestore
The `tracestore` module defines interfaces and implementations for storing and
retrieving performance trace data. It's a core component of the Perf system,
enabling the analysis of performance metrics over time and across different
configurations.
## Design Philosophy
The primary goal of `tracestore` is to provide an efficient and scalable way to
manage large volumes of trace data. This involves:
- **Tiled Storage:** Data is organized into "tiles," which are fixed-size
blocks of commits. This approach simplifies data management and allows for
efficient querying of data within specific time ranges. Each tile has its
own inverted index and ParamSet, making searches within a tile fast.
- **Inverted Indexing:** To quickly find traces matching specific criteria
(e.g., "arch=x86" and "config=8888"), `tracestore` uses an inverted index.
This index maps key-value pairs to the trace IDs that contain them within
each tile.
- **Caching:** Various caching mechanisms are employed to improve performance,
including:
- In-memory LRU caches for frequently accessed data like ParamSets and
recently written Postings/ParamSet entries.
- An optional external cache (like Memcached via `go/cache/memcached`) for
broader caching strategies.
- A `tracecache` for caching the results of `QueryTracesIDOnly` to speed
up repeated queries.
- **Interface-Based Design:** The module defines interfaces (`TraceStore`,
`MetadataStore`, `TraceParamStore`) to allow for different backend
implementations. This promotes flexibility and testability. The primary
implementation provided is `sqltracestore`, which uses an SQL database.
- **Concurrency:** Operations like writing traces and querying are designed to
be concurrent, leveraging Go routines and parallel processing to handle
large datasets efficiently. For instance, writing large batches of traces or
postings is often chunked and processed in parallel.
- **Separation of Concerns:**
- `TraceStore` handles the core logic of reading and writing trace values
and their associated parameters.
- `MetadataStore` manages metadata associated with source files (e.g.,
links to dashboards or logs).
- `TraceParamStore` specifically handles the mapping between trace IDs
(MD5 hashes of trace names) and their full parameter sets. This
separation helps in optimizing storage and retrieval for these distinct
types of data.
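
As a concrete illustration of the tiled layout, the sketch below assumes tiles are contiguous, fixed-size blocks of commits and that the tile number is obtained by integer division; the function names echo the `TraceStore` methods but this is not the real implementation.

```go
package main

import "fmt"

const tileSize = 256 // configurable per instance; 256 is just an example

// tileNumber maps a commit number to the tile that holds it, assuming simple
// integer division over fixed-size tiles.
func tileNumber(commitNumber int) int { return commitNumber / tileSize }

// commitNumberOfTileStart returns the first commit number in that tile.
func commitNumberOfTileStart(commitNumber int) int {
	return tileNumber(commitNumber) * tileSize
}

func main() {
	fmt.Println(tileNumber(1000))              // 3
	fmt.Println(commitNumberOfTileStart(1000)) // 768
}
```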
## Key Components and Responsibilities
The `tracestore` module is primarily defined by a set of interfaces and their
SQL-based implementations.
### `tracestore.go`
This file defines the main `TraceStore` interface. It outlines the contract for
any system that wants to store and retrieve performance traces. Key
responsibilities include:
- **Writing Traces (`WriteTraces`, `WriteTraces2`):** Ingesting new
performance data points. Each data point is associated with a specific
commit, a set of parameters (defining the trace, e.g.,
`config=8888,arch=x86`), a value, the source file it came from, and a
timestamp.
- The `WriteTraces` method is designed to handle potentially large batches
of data efficiently. Implementations often involve chunking data and
performing parallel writes to the underlying storage.
- `WriteTraces2` is a newer variant, potentially for different storage
schemas or optimizations (e.g., denormalizing common params directly
into the trace values table as seen in `TraceValues2Schema`).
- **Reading Traces (`ReadTraces`, `ReadTracesForCommitRange`):** Retrieving
trace data for specific keys (trace names) within a given tile or commit
range.
- **Querying Traces (`QueryTraces`, `QueryTracesIDOnly`):**
- `QueryTraces` allows searching for traces based on a `query.Query`
object (which specifies parameter key-value pairs). It returns the
actual trace values and associated commit information.
- `QueryTracesIDOnly` is an optimization that returns only the
`paramtools.Params` (effectively the identifying parameters) of traces
matching a query. This is useful when only the list of matching traces
is needed, not their values.
- **Tile Management (`GetLatestTile`, `TileNumber`, `TileSize`,
`CommitNumberOfTileStart`):** Provides methods for interacting with the
tiled storage system.
- **ParamSet Management (`GetParamSet`):** Retrieving the
`paramtools.ReadOnlyParamSet` for a specific tile. A ParamSet represents all
unique key-value pairs present in the traces within that tile, which is
crucial for UI elements like query builders.
- **Source Information (`GetSource`, `GetLastNSources`,
`GetTraceIDsBySource`):** Retrieving information about the origin of trace
data, such as the ingested file name.
### `metadatastore.go`
This file defines the `MetadataStore` interface. Its responsibility is to manage
metadata associated with source files.
- **`InsertMetadata`:** Stores links or other metadata for a given source file
name.
- **`GetMetadata`:** Retrieves the stored metadata for a source file. This can
be used, for example, to link from a data point back to the original log
file or a specific dashboard view related to the data ingestion.
### `traceparamstore.go`
This file defines the `TraceParamStore` interface. This store is dedicated to
managing the relationship between a trace's unique identifier (typically an MD5
hash of its full parameter string) and the actual `paramtools.Params` object.
- **`WriteTraceParams`:** Stores the mapping from trace IDs to their parameter
sets. This is done to avoid repeatedly parsing or storing the full parameter
string for every data point of a trace.
- **`ReadParams`:** Retrieves the `paramtools.Params` for a given set of trace
IDs.
### Submodule: `sqltracestore`
This submodule provides the SQL-based implementation of the `TraceStore`,
`MetadataStore`, and `TraceParamStore` interfaces.
- **`sqltracestore.go`:** Implements the `TraceStore` interface.
- **Schema:** It relies on a specific SQL schema (defined conceptually in
the package documentation and concretely in
`sqltracestore/schema/schema.go`) involving tables like `TraceValues`
(for actual metric values), `Postings` (the inverted index), `ParamSets`
(per-tile parameter information), and `SourceFiles`.
- **Writing Data:** When `WriteTraces` is called, it performs several
actions:
* Updates the `SourceFiles` table with the new source filename if it's not
already present.
* Updates the `ParamSets` table for the current tile with any new
key-value pairs from the incoming traces. This uses a cache to avoid
redundant writes.
    * For each incoming trace:
      * Calculates its MD5 hash (trace ID).
      * Inserts the value into the `TraceValues` table (or `TraceValues2` for
        `WriteTraces2`).
      * If the trace ID and its key-value pairs are not already in the
        `Postings` table for the current tile (checked via cache), it inserts
        them.
      * Stores the mapping of the trace ID to its `paramtools.Params` in the
        `TraceParams` table via the `TraceParamStore`.
    * All these writes are typically batched and parallelized for efficiency.
- **Querying Data (`QueryTracesIDOnly`):**
* Retrieves the `ParamSet` for the target tile.
* Generates a query plan based on the input `query.Query` and the tile's
`ParamSet`.
* **Optimization (`restrictByCounting`):** It attempts to optimize the
query by first running `COUNT(*)` queries for each part of the query
plan. The part of the plan that matches the fewest traces (below a
threshold) is then used to fetch its corresponding trace IDs. These IDs
are then used to construct a `restrictClause` (e.g., `AND trace_id IN
(...)`) that is appended to the queries for the other parts of the plan.
This significantly speeds up queries where one filter is much more
selective than others.
* For each part of the query plan (each key and its OR'd values), it
executes an SQL query against the `Postings` table (using the
`restrictClause` if applicable) to get a stream of matching
`traceIDForSQL`.
* The streams of `traceIDForSQL` from each part of the plan are then
intersected (using `newIntersect`) to find the trace IDs that satisfy
all AND conditions of the query.
* These resulting trace IDs are then passed to the `TraceParamStore` to
fetch their full `paramtools.Params`.
- **Reading Data (`QueryTraces`, `ReadTraces`):** Once the trace IDs (and
thus their full names) are known (either from `QueryTracesIDOnly` or
directly provided), it queries the `TraceValues` table to fetch the
actual floating-point values for those traces within the specified
commit range or tile. It also fetches commit information from the
`Commits` table.
- **Follower Reads:** Supports `enableFollowerReads` configuration, which
adds `AS OF SYSTEM TIME '-5s'` to certain read queries, allowing them to
potentially hit read replicas and reduce load on the primary, at the
cost of slightly stale data.
- **Dialect Specificity:** It has distinct SQL templates and statement
strings for CockroachDB (default) and Spanner (`spanner.go`) to account
for syntax differences or performance characteristics (e.g., `UPSERT`
vs. `ON CONFLICT`).
- **`sqlmetadatastore.go`:** Implements the `MetadataStore` interface. It uses
a `Metadata` SQL table that links a `source_file_id` (from `SourceFiles`)
to a JSONB column storing the metadata map.
- **`sqltraceparamstore.go`:** Implements the `TraceParamStore` interface. It
uses a `TraceParams` SQL table that stores `trace_id` (bytes) and their
corresponding `params` (JSONB). Writes are chunked and can be parallelized.
- **`intersect.go`:** Provides helper functions (`newIntersect`,
`newIntersect2`) to compute the intersection of multiple sorted channels of
`traceIDForSQL`. This is crucial for implementing the AND logic in
`QueryTracesIDOnly`. It builds a binary tree of `newIntersect2` operations
for efficiency, avoiding slower reflection-based approaches.
- **`schema/schema.go`:** Defines Go structs that mirror the SQL table
schemas. This is used for documentation and potentially could be used with
ORM-like tools if needed, though the current implementation uses direct SQL
templating.
- `TraceValuesSchema`: Stores individual data points (value, commit,
source file) keyed by trace ID.
- `TraceValues2Schema`: An alternative/extended schema for trace values,
potentially denormalizing common parameters like `benchmark`, `bot`,
`test`, etc., for direct querying.
- `SourceFilesSchema`: Maps source file names to integer IDs.
- `ParamSetsSchema`: Stores the unique key-value pairs present in each
tile.
- `PostingsSchema`: The inverted index, mapping (tile, key-value) to trace
IDs.
- `MetadataSchema`: Stores JSON metadata for source files.
- `TraceParamsSchema`: Maps trace IDs (MD5 hashes) to their full
`paramtools.Params` (stored as JSON).
- **`spanner.go`:** Contains SQL templates and specific configurations (like
parallel pool sizes for writes) tailored for Google Cloud Spanner.
### Submodule: `mocks`
- **`TraceStore.go`:** Provides a mock implementation of the `TraceStore`
interface, generated by the `mockery` tool. This is essential for unit
testing components that depend on `TraceStore` without needing a full
database setup.
## Key Workflows
### Writing Traces
```
Caller (e.g., ingester) -> TraceStore.WriteTraces(ctx, commitNumber, params[], values[], paramset, sourceFile, timestamp)
|
`-> SQLTraceStore.WriteTraces
|
| 1. Tile Calculation: tileNumber = TileNumber(commitNumber)
|
| 2. Source File ID:
| `-> updateSourceFile(ctx, sourceFile) -> sourceFileID
| (Queries SourceFiles table, inserts if not exists)
|
| 3. ParamSet Update (for the tile):
| For each key, value in paramset:
| If not in cache(tileNumber, key, value):
| Add to batch for ParamSets table insertion
| Execute batch insert into ParamSets, update cache
|
| 4. For each trace (params[i], values[i]):
| | a. Trace ID Calculation: traceID_md5_hex = md5(query.MakeKey(params[i]))
| |
| | b. Store Trace Params:
| | `-> TraceParamStore.WriteTraceParams(ctx, {traceID_md5_hex: params[i]})
| | (Inserts into TraceParams table if not exists)
| |
| | c. Add to TraceValues Batch: (traceID_md5_hex, commitNumber, values[i], sourceFileID)
| |
| | d. Postings Update (for the tile):
| | If not in cache(tileNumber, traceID_md5_hex): // Marks this whole trace as processed for postings
| | For each key, value in params[i]:
| | Add to batch for Postings table: (tileNumber, "key=value", traceID_md5_hex)
|
| 5. Execute batch insert into TraceValues (or TraceValues2)
|
| 6. Execute batch insert into Postings, update postings cache
```
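
To make step 4a concrete, the sketch below shows how a trace ID could be derived as the hex MD5 of a structured trace key. `makeKey` only approximates `query.MakeKey` (sorted keys, `,k=v,` form) and skips the validation the real function performs.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// makeKey builds a structured trace key in the ",k=v,...," form used
// throughout Perf, with keys sorted for stability.
func makeKey(params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(",")
	for _, k := range keys {
		b.WriteString(k + "=" + params[k] + ",")
	}
	return b.String()
}

func main() {
	key := makeKey(map[string]string{"arch": "x86", "config": "8888", "test": "draw_a_circle"})
	traceID := fmt.Sprintf("%x", md5.Sum([]byte(key))) // hex trace ID, as in the workflow above
	fmt.Println(key)
	fmt.Println(traceID)
}
```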
### Querying for Trace IDs (`QueryTracesIDOnly`)
```
Caller -> TraceStore.QueryTracesIDOnly(ctx, tileNumber, query)
|
`-> SQLTraceStore.QueryTracesIDOnly
|
| 1. Get ParamSet for tile:
| `-> GetParamSet(ctx, tileNumber) -> tileParamSet
| (Checks OPS cache, falls back to querying ParamSets table)
|
| 2. Generate Query Plan: plan = query.QueryPlan(tileParamSet)
| (If plan is empty or invalid for tile, return empty channel)
|
| 3. Optimization (restrictByCounting):
| | For each part of 'plan' (key, or_values[]):
| | `-> DB: COUNT(*) FROM Postings WHERE tile_number=... AND key_value IN (...) LIMIT threshold
| | Find the plan part (minKey, minValues) with the smallest count (if count < threshold).
| | If any count is 0, plan is skippable.
| | If minKey found:
| | `-> DB: SELECT trace_id FROM Postings WHERE tile_number=... AND key_value IN (minValues)
| | `-> restrictClause = "AND trace_id IN (result_ids...)"
|
| 4. Execute Query for each plan part (concurrently):
| For each key, values[] in 'plan' (excluding minKey if restrictClause is used):
| `-> DB: SELECT trace_id FROM Postings
| WHERE tile_number=tileNumber AND key_value IN ("key=value1", "key=value2"...)
| [restrictClause]
| ORDER BY trace_id
| -> channel_for_key_N (stream of traceIDForSQL)
|
| 5. Intersect Results:
| `-> newIntersect(ctx, [channel_for_key_1, channel_for_key_2,...]) -> finalTraceIDsChannel (stream of unique traceIDForSQL)
|
| 6. Fetch Full Params (concurrently, in chunks):
| For each batch of unique traceIDForSQL from finalTraceIDsChannel:
| `-> TraceParamStore.ReadParams(ctx, batch_of_ids) -> map[traceID]Params
| For each Params in map:
| Send Params to output channel
|
`-> Returns output channel of paramtools.Params
```
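
The intersection in step 5 boils down to a streaming merge of sorted trace ID channels. Below is a simplified sketch of the pairwise operation (the real `newIntersect` builds a binary tree of such merges); the trace IDs here are short placeholder strings.

```go
package main

import "fmt"

// intersect2 merges two channels of sorted, distinct trace IDs and emits only
// the IDs present in both, implementing the AND semantics described above.
func intersect2(a, b <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		av, aok := <-a
		bv, bok := <-b
		for aok && bok {
			switch {
			case av < bv:
				av, aok = <-a
			case bv < av:
				bv, bok = <-b
			default: // equal: present in both streams
				out <- av
				av, aok = <-a
				bv, bok = <-b
			}
		}
	}()
	return out
}

// fromSlice turns a sorted slice into a channel, for demonstration only.
func fromSlice(ids []string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, id := range ids {
			ch <- id
		}
	}()
	return ch
}

func main() {
	// Trace IDs matching two different key=value filters, each sorted.
	a := fromSlice([]string{"01", "03", "07"})
	b := fromSlice([]string{"03", "05", "07"})
	for id := range intersect2(a, b) {
		fmt.Println(id) // 03, then 07
	}
}
```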
This structured approach, combining interfaces with a robust SQL implementation,
allows `tracestore` to serve as a reliable and performant foundation for Perf's
data storage needs.
# Module: /go/tracing
## Tracing Module Documentation
**High-Level Overview**
The `/go/tracing` module is responsible for initializing and configuring tracing
capabilities within the Perf application. It leverages the OpenCensus library to
provide distributed tracing, allowing developers to understand the flow of
requests across different services and components. This is crucial for debugging
performance issues, identifying bottlenecks, and gaining insights into the
application's behavior in a distributed environment.
**Design Decisions and Implementation Choices**
The core design principle behind this module is to centralize tracing
initialization. This ensures consistency in how tracing is set up across
different parts of the application.
- **Conditional Initialization:** The `Init` function provides different
initialization paths based on whether the application is running in a
`local` development environment or a deployed environment.
- **Local Environment:** In a local setup, `loggingtracer.Initialize()` is
called. This likely configures a simpler, console-based tracer. The
rationale is that in local development, detailed, distributed tracing
might be overkill, and logging traces to the console is often sufficient
for debugging.
- **Deployed Environment:** For deployed instances, the
`tracing.Initialize` function from the shared
`go.skia.org/infra/go/tracing` library is used. This enables more
sophisticated tracing, likely integrating with a backend tracing system
like Jaeger or Stackdriver Trace.
- **Configuration-Driven Sampling:** The `TraceSampleProportion` field of
  `config.InstanceConfig` (read via `cfg`) determines the sampling rate for traces. This
allows administrators to control the volume of trace data generated,
balancing the need for detailed information with the cost and overhead of
storing and processing traces. A value of `0.0` would likely disable
tracing, while `1.0` would trace every request.
- **Automatic Project ID Detection:** The `autoDetectProjectID` constant being
an empty string suggests that the underlying `tracing.Initialize` function
is capable of automatically determining the Google Cloud Project ID when
running in a GCP environment. This simplifies configuration as the project
ID doesn't need to be explicitly passed.
- **Metadata Enrichment:** The `map[string]interface{}` passed to
`tracing.Initialize` includes:
- `podName`: This value is retrieved from the `MY_POD_NAME` environment
variable. This is a common practice in Kubernetes environments to
identify the specific pod generating the trace, which is invaluable for
pinpointing issues.
- `instance`: This is derived from `cfg.InstanceName`. This helps
differentiate traces originating from different Perf instances (e.g.,
"perf-prod", "perf-staging").
**Responsibilities and Key Components/Files**
- **`tracing.go`:** This is the sole file in this module and contains the
`Init` function.
- **`Init(local bool, cfg *config.InstanceConfig) error` function:**
- **Responsibility:** To initialize the tracing system for the
application. It acts as the single entry point for tracing setup.
  - **How it works:**
    1. It takes a `local` boolean flag and an `InstanceConfig` pointer as
       input.
    2. If `local` is `true`, it calls `loggingtracer.Initialize()`. This
       indicates a preference for a simpler, possibly console-based, tracing
       mechanism for local development.
    3. If `local` is `false`, it initializes tracing for a deployed
       environment:
       - It retrieves the `TraceSampleProportion` from the `cfg`.
       - It retrieves the `InstanceName` from `cfg` to be used as an
         attribute.
       - It reads the `MY_POD_NAME` environment variable.
       - It calls `tracing.Initialize` from the shared
         `go.skia.org/infra/go/tracing` library, passing the sampling
         proportion, `autoDetectProjectID` (an empty string, relying on
         automatic detection), and a map of attributes (`podName` from the
         environment and `instance` from the config).
- **Why this approach:**
- Centralizes tracing setup, making it easier to manage and modify.
- Provides a clear distinction between local and deployed tracing
configurations, catering to different needs.
- Leverages shared tracing libraries (`go.skia.org/infra/go/tracing`)
for common functionality, promoting code reuse.
- **Dependencies:**
- `//go/tracing` (likely `go.skia.org/infra/go/tracing`): This is the core
shared tracing library providing the `Initialize` function for robust,
distributed tracing. It handles the actual setup of exporters (e.g., to
Stackdriver, Jaeger) and samplers.
- `//go/tracing/loggingtracer`: This dependency provides a simpler tracer
implementation, probably for logging traces to standard output, suitable
for local development environments where a full-fledged tracing backend
might not be available or necessary.
- `//perf/go/config`: This module provides the `InstanceConfig` struct,
which contains application-specific configuration, including the
`TraceSampleProportion` and `InstanceName` used by the tracing
initialization. This decouples tracing configuration from the tracing
logic itself.
**Key Workflows/Processes**
**Tracing Initialization Workflow:**
```
Application Startup
|
V
Call perf/go/tracing.Init(isLocal, instanceConfig)
|
+---- isLocal is true? ----> Call loggingtracer.Initialize() --> Tracing active (console/simple)
| |
| V
| Application proceeds
|
+---- isLocal is false? ---> Read TraceSampleProportion from instanceConfig
Read InstanceName from instanceConfig
Read MY_POD_NAME environment variable
|
V
Call shared go.skia.org/infra/go/tracing.Initialize(...)
with sampling rate and attributes (podName, instance)
|
V
Tracing active (distributed, e.g., Stackdriver)
|
V
Application proceeds
```
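
The sketch below mirrors this branching without importing the real libraries: the two initializer arguments stand in for `loggingtracer.Initialize` and the shared `tracing.Initialize`, whose exact signatures may differ from what is shown here.

```go
package main

import (
	"fmt"
	"os"
)

// InstanceConfig holds only the fields this sketch reads; the real struct
// lives in //perf/go/config.
type InstanceConfig struct {
	TraceSampleProportion float32
	InstanceName          string
}

// initTracing mirrors the branching described above. The initializer
// functions are injected stand-ins, not the real library calls.
func initTracing(local bool, cfg *InstanceConfig,
	initLogging func(),
	initDistributed func(sample float32, projectID string, attrs map[string]interface{}) error,
) error {
	if local {
		initLogging() // simple console tracing for local development
		return nil
	}
	attrs := map[string]interface{}{
		"podName":  os.Getenv("MY_POD_NAME"),
		"instance": cfg.InstanceName,
	}
	// An empty project ID relies on auto-detection in GCP environments.
	return initDistributed(cfg.TraceSampleProportion, "", attrs)
}

func main() {
	cfg := &InstanceConfig{TraceSampleProportion: 0.2, InstanceName: "perf-staging"}
	_ = initTracing(false, cfg,
		func() { fmt.Println("logging tracer initialized") },
		func(sample float32, projectID string, attrs map[string]interface{}) error {
			fmt.Println("distributed tracing:", sample, projectID, attrs)
			return nil
		})
}
```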
This workflow illustrates how the `Init` function adapts the tracing setup based
on the execution context (local vs. deployed) and external configuration. The
goal is to provide appropriate tracing capabilities with minimal boilerplate in
the rest of the application.
# Module: /go/trybot
The `/go/trybot` module is responsible for managing performance data generated
by trybots. Trybots are automated systems that run tests on code changes
(patches or changelists) before they are merged into the main codebase. This
module handles the ingestion, storage, and retrieval of these trybot results,
allowing developers and performance engineers to analyze the performance impact
of proposed code changes.
The core idea is to provide a way to compare the performance characteristics of
a pending change against the baseline performance of the current codebase. This
helps in identifying potential performance regressions or improvements early in
the development cycle.
## Key Components and Responsibilities
### `/go/trybot/trybot.go`
This file defines the central data structure `TryFile`.
- **`TryFile`**: This struct represents a single file containing trybot
results.
- `CL`: The identifier of the changelist (e.g., a Gerrit change ID). This
is crucial for associating results with a specific code change.
- `PatchNumber`: The specific patchset within the changelist. Code review
systems often allow multiple iterations (patchsets) for a single
changelist.
- `Filename`: The name of the file where the trybot results are stored,
often including a scheme like `gs://` indicating its location (e.g., in
Google Cloud Storage).
- `Timestamp`: When the result file was created. This is important for
tracking and ordering results.
### `/go/trybot/ingester`
This submodule is responsible for taking raw result files and transforming them
into the `TryFile` format that the rest of the system understands.
- **`/go/trybot/ingester/ingester.go`**: Defines the `Ingester` interface.
- **`Ingester` interface**: Specifies a contract for components that can
process incoming files (represented by `file.File`) and produce a stream
of `trybot.TryFile` objects. The `Start` method initiates this
processing, typically in a background goroutine. This design allows for
different sources or formats of trybot results to be plugged into the
system.
- **`/go/trybot/ingester/gerrit/gerrit.go`**: Provides a concrete
implementation of the `Ingester` interface, specifically for handling trybot
results originating from Gerrit code reviews.
- **`Gerrit` struct**: Implements `ingester.Ingester`. It uses a
`parser.Parser` (from `/perf/go/ingest/parser`) to understand the
content of the result files.
- **`New` function**: Constructor for the `Gerrit` ingester.
- **`Start` method**:
- It receives a channel of `file.File` objects.
- For each file, it attempts to parse it using `parser.ParseTryBot`. This
method extracts the changelist ID (`issue`) and patchset number.
- If parsing is successful, it converts the patchset string to an integer.
- A `trybot.TryFile` is created with the extracted CL, patch number,
filename, and creation timestamp.
- This `TryFile` is then sent to an output channel.
- It includes metrics (`parseCounter`, `parseFailCounter`) to track the
success and failure rates of parsing.
- The use of channels for input (`files`) and output (`ret`) facilitates
asynchronous processing, meaning the ingester can process files as they
become available without blocking other operations.
### `/go/trybot/store`
This submodule is responsible for persisting and retrieving `TryFile`
information and the associated performance measurements.
- **`/go/trybot/store/store.go`**: Defines the `TryBotStore` interface.
- **`TryBotStore` interface**: This interface outlines the contract for
storing and retrieving trybot data. This abstraction allows different
database backends (e.g., CockroachDB, in-memory stores for testing) to
be used.
- `Write(ctx context.Context, tryFile trybot.TryFile) error`: Persists a
`TryFile` and its associated data.
- `List(ctx context.Context, since time.Time) ([]ListResult, error)`:
Retrieves a list of unique changelist/patchset combinations that have
been processed since a given time. `ListResult` contains the `CL` (as a
string) and `Patch` number.
- `Get(ctx context.Context, cl types.CL, patch int) ([]GetResult, error)`:
Fetches all performance results for a specific changelist and patch
number. `GetResult` contains the `TraceName` (a unique identifier for a
specific metric and parameter combination) and its measured `Value`.
- **`/go/trybot/store/mocks/TryBotStore.go`**: Provides a mock implementation
of `TryBotStore`, generated by the `mockery` tool. This is essential for
unit testing components that depend on `TryBotStore` without needing a real
database.
### `/go/trybot/results`
This submodule focuses on loading and preparing trybot results for analysis and
presentation, often by comparing them to baseline data.
- **`/go/trybot/results/results.go`**: Defines the structures for requesting
and representing analyzed trybot results.
- **`Kind` type (`TryBot`, `Commit`)**: Distinguishes whether the analysis
request is for trybot data (pre-submit) or for data from an already
landed commit (post-submit). This allows the system to handle both
scenarios.
- **`TryBotRequest` struct**: Represents a request from a client (e.g., a
UI) to get analyzed performance data. It includes the `Kind`, `CL` and
`PatchNumber` (for `TryBot` kind), `CommitNumber` and `Query` (for
`Commit` kind). The `Query` is used to filter the traces to be analyzed
when looking at landed commits.
- **`TryBotResult` struct**: Contains the analysis results for a single
trace.
- `Params`: The key-value parameters that uniquely identify the trace.
- `Median`, `Lower`, `Upper`, `StdDevRatio`: Statistical measures derived
from the trace data. `StdDevRatio` is a key metric indicating how much a
new value deviates from the historical distribution, helping to flag
regressions or improvements.
- `Values`: A slice of recent historical values for the trace, with the
last value being either the trybot result or the value at the specified
commit.
- **`TryBotResponse` struct**: The overall response to a `TryBotRequest`.
- `Header`: Column headers for the data, typically representing commit
information.
- `Results`: A slice of `TryBotResult` for each analyzed trace.
- `ParamSet`: A collection of all unique parameter key-value pairs present
in the results, useful for filtering in a UI.
- **`Loader` interface**: Defines a contract for components that can take
a `TryBotRequest` and produce a `TryBotResponse`. This involves fetching
relevant data, performing statistical analysis, and formatting it.
- **`/go/trybot/results/dfloader/dfloader.go`**: Implements the
`results.Loader` interface using a `dataframe.DataFrameBuilder`. DataFrames
are a common way to represent tabular data for analysis.
- **`Loader` struct**: Holds references to a `dataframe.DataFrameBuilder`
(for constructing DataFrames from trace data), a `store.TryBotStore`
(for fetching trybot-specific measurements), and `perfgit.Git` (for
resolving commit information).
- **`TraceHistorySize` constant**: Defines how many historical data points
to load for each trace for comparison.
- **`New` function**: Constructor for the `Loader`.
- **`Load` method**: This is the core logic for generating the
`TryBotResponse`.
- **Workflow**:
1. `Determine Timestamp`: If the request is for a `Commit`, it fetches
the commit details (including its timestamp) using `perfgit.Git`.
Otherwise, it uses the current time.
2. `Parse Query`: If the request kind is `Commit`, the provided `Query`
string is parsed. An empty query for a `Commit` request is an error.
3. `Fetch Baseline Data (DataFrame)`:
- If `Kind` is `Commit`: It uses `dfb.NewNFromQuery` to load a
DataFrame containing the last `TraceHistorySize+1` data points
for traces matching the query, up to the commit's timestamp. The
"+1" is to hold the value at the commit itself or to be a
placeholder.
             - If `Kind` is `TryBot`:
               a. It first calls `store.Get` to retrieve the specific trybot
                  measurements for the given `CL` and `PatchNumber`.
               b. It then extracts the trace names from these trybot results.
               c. It calls `dfb.NewNFromKeys` to load a DataFrame with
                  `TraceHistorySize+1` historical data points for these
                  specific trace names.
               d. Crucially, it then _replaces_ the last value in each trace
                  within the DataFrame with the corresponding value obtained
                  from the `store.Get` call. This effectively injects the
                  trybot's measurement into the historical context for
                  comparison.
               e. If a trybot result exists for a trace that has no historical
                  data in the DataFrame, that trace is removed from the
                  analysis, and `rebuildParamSet` is flagged.
4. `Prepare Response Header`: The DataFrame's header (commit
information) is used for the response. If it's a `TryBot` request,
the last header entry (representing the trybot data point) has its
`Offset` set to `types.BadCommitNumber` to indicate it's not a
landed commit.
5. `Calculate Statistics`: For each trace in the DataFrame:
- The trace name (key) is parsed into `paramtools.Params`.
- `vec32.StdDevRatio` is called with the trace values (which now
includes the trybot value at the end if applicable). This
function calculates the median, lower/upper bounds, and the
standard deviation ratio.
- A `results.TryBotResult` is created.
- If `StdDevRatio` calculation fails (e.g., insufficient data),
the trace is skipped, and `rebuildParamSet` is flagged.
6. `Sort Results`: The `TryBotResult` slice is sorted by `StdDevRatio`
in descending order. This prioritizes potential regressions (high
positive ratio) and significant improvements (high negative ratio).
7. `Normalize ParamSet`: If `rebuildParamSet` is true (due to missing
traces or parsing errors), the `ParamSet` for the response is
regenerated from the final set of `TryBotResult`s.
8. The `results.TryBotResponse` is assembled and returned.
- This process allows a direct comparison of a tryjob's performance
numbers against the recent history of the same metrics on the main
branch.
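The comparison in step 5 hinges on the standard deviation ratio. The real computation lives in `vec32.StdDevRatio` and may differ in detail; the following is only a conceptual illustration of the idea, where the last value in the slice (the trybot measurement) is compared against the distribution of the preceding historical values.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// stdDevRatio is a simplified stand-in for vec32.StdDevRatio, not its actual
// implementation: it compares the last value against the median and standard
// deviation of the preceding history.
func stdDevRatio(values []float32) (median, ratio float64, err error) {
	if len(values) < 3 {
		return 0, 0, fmt.Errorf("not enough data: %d points", len(values))
	}
	history := values[:len(values)-1]
	latest := float64(values[len(values)-1])

	// Median of the historical values.
	sorted := make([]float64, len(history))
	for i, v := range history {
		sorted[i] = float64(v)
	}
	sort.Float64s(sorted)
	median = sorted[len(sorted)/2]

	// Standard deviation of the historical values.
	var sum, sumSq float64
	for _, v := range sorted {
		sum += v
		sumSq += v * v
	}
	n := float64(len(sorted))
	mean := sum / n
	stdDev := math.Sqrt(sumSq/n - mean*mean)
	if stdDev == 0 {
		return median, 0, fmt.Errorf("zero variance in history")
	}

	// A large positive ratio suggests a regression, a large negative one an
	// improvement, matching the sort order used in step 6.
	ratio = (latest - median) / stdDev
	return median, ratio, nil
}

func main() {
	// Six historical points followed by a hypothetical trybot result.
	values := []float32{10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 12.5}
	m, r, err := stdDevRatio(values)
	if err != nil {
		panic(err)
	}
	fmt.Printf("median=%.2f stdDevRatio=%.2f\n", m, r)
}
```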
### `/go/trybot/samplesloader`
This submodule deals with loading raw sample data from trybot result files.
Sometimes, instead of just a single aggregated value, trybots might output
multiple raw measurements (samples) for a metric.
- **`/go/trybot/samplesloader/samplesloader.go`**: Defines the `SamplesLoader`
interface.
- **`SamplesLoader` interface**: Specifies a method `Load(ctx
context.Context, filename string) (parser.SamplesSet, error)` that takes
a filename (URL to the result file) and returns a `parser.SamplesSet`. A
`SamplesSet` is a map where keys are trace identifiers and values are
`parser.Samples` (which include parameters and a slice of raw float64
sample values).
- **`/go/trybot/samplesloader/gcssamplesloader/gcssamplesloader.go`**:
Implements `SamplesLoader` for files stored in Google Cloud Storage (GCS).
- **`loader` struct**: Holds a `gcs.GCSClient` for interacting with GCS
and a `parser.Parser`.
- **`New` function**: Constructor for the GCS samples loader.
- **`Load` method**:
- Parses the input `filename` (which is a GCS URL like
`gs://bucket/path/file.json`) to extract the bucket and path.
- Uses the `storageClient` to read the content of the file from GCS.
- Parses the file content using `format.ParseLegacyFormat` (assuming a
specific JSON structure for these sample files).
- Converts the parsed data into a `parser.SamplesSet` using
`parser.GetSamplesFromLegacyFormat`.
- This component is essential when detailed analysis of raw samples is
needed, rather than just aggregated metrics.
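The bucket/path split performed by the `Load` method can be done with the standard library. The following is an illustrative sketch of that parsing step only (not the actual implementation); the bucket and object names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitGCSURL splits a GCS URL such as "gs://bucket/path/file.json" into its
// bucket and object path, mirroring the first step of the Load method.
func splitGCSURL(rawURL string) (bucket, path string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "gs" {
		return "", "", fmt.Errorf("not a GCS URL: %q", rawURL)
	}
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	bucket, path, err := splitGCSURL("gs://example-bucket/trybot/results.json")
	if err != nil {
		panic(err)
	}
	fmt.Println(bucket, path) // example-bucket trybot/results.json
}
```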
## Overall Workflow (Ingestion and Analysis)
A simplified workflow could look like this:
1. **File Arrival**: A new trybot result file appears (e.g., uploaded to GCS).
```
New File (e.g., in GCS)
```
2. **Ingestion**: An `ingester.Ingester` (like `ingester.gerrit.Gerrit`)
detects and processes this file.
```
File --> [Gerrit Ingester] --parses--> trybot.TryFile{CL, PatchNum, Filename, Timestamp}
```
3. **Storage**: The `TryFile` metadata and potentially the parsed values are
written to the `store.TryBotStore`.
```
trybot.TryFile --> [TryBotStore.Write] --> Database
```
(The actual performance values might be stored alongside the `TryFile`
metadata or linked via the `Filename` if they are in a separate detailed
file).
4. **Analysis Request**: A user or an automated system requests analysis for a
particular CL/Patch via a UI or API, sending a `results.TryBotRequest`.
```
UI/API --sends--> results.TryBotRequest{Kind=TryBot, CL="123", PatchNumber=1}
```
5. **Data Loading and Comparison**: The `results.dfloader.Loader` handles this
   request.
   ```
   results.TryBotRequest
     |
     v
   [dfloader.Loader.Load]
     |
     +--(A)--> [TryBotStore.Get(CL, PatchNum)]
     |           --> Trybot-specific values (Value_T) for traces T1, T2...
     |
     +--(B)--> [DataFrameBuilder.NewNFromKeys(traceNames=[T1,T2...])]
     |           --> Historical data for T1, T2...
     |               (e.g., [V1_hist1, V1_hist2, ..., V1_histN, _placeholder_])
     |
     +--(C)--> Combine: Replace _placeholder_ with Value_T
     |           (e.g., for T1: [V1_hist1, V1_hist2, ..., V1_histN, V1_T])
     |
     +--(D)--> Calculate StdDevRatio, Median, etc. for each trace
     |
     +--(E)--> Sort results
     |
     v
   results.TryBotResponse (sent back to UI/API)
   ```
This module is crucial for proactive performance monitoring, enabling teams to
catch performance regressions before they land in the main codebase, by
systematically ingesting, storing, and analyzing the performance data generated
during the pre-submit testing phase. The use of interfaces for storage
(`TryBotStore`), ingestion (`Ingester`), and results loading (`results.Loader`)
makes the system flexible and extensible.
# Module: /go/ts
The `go/ts` module serves as a utility to generate TypeScript definition files
from Go structs. This is crucial for maintaining type safety and consistency
between the Go backend and the TypeScript frontend, particularly when dealing
with JSON data structures that are exchanged between them. The core problem this
module solves is bridging the gap between Go's static typing and TypeScript's
type system for data interchange, ensuring that changes in Go struct definitions
are automatically reflected in the frontend's TypeScript types.
The primary component is the `main.go` file. Its responsibility is to:
1. **Parse command-line arguments**: It accepts an output path (`-o`) where the
generated TypeScript file will be written.
2. **Instantiate a `go2ts.Generator`**: This is the core engine from the
`go/go2ts` library responsible for the Go-to-TypeScript conversion.
3. **Configure the generator**:
- `GenerateNominalTypes = true`: This setting likely ensures that the
generated TypeScript types are nominal (i.e., types are distinct based
on their name, not just their structure), which can provide stronger
type checking.
- `AddIgnoreNil`: This is used for specific Go types like
`paramtools.Params`, `paramtools.ParamSet`,
`paramtools.ReadOnlyParamSet`, and `types.TraceSet`. This suggests that
`nil` values for these types in Go should likely be treated as optional
or nullable fields in TypeScript, or perhaps excluded from the generated
types if they are always expected to be non-nil when serialized.
4. **Register Go structs and unions for conversion**:
- The code extensively uses `generator.AddMultiple` to register a wide
array of Go structs from various `perf` submodules (e.g., `alerts`,
`chromeperf`, `clustering2`, `frontend/api`, `regression`). These are
the structs that are serialized to JSON and consumed by the frontend. By
registering them, the generator knows which Go types to convert into
corresponding TypeScript interfaces or types.
- The `addMultipleUnions` helper function and
`generator.AddUnionToNamespace` are used to register Go union types
(often represented as a collection of constants or an interface
implemented by several types). This ensures that TypeScript enums or
union types are generated, reflecting the possible values or types a Go
field can hold. The `typeName` argument in `unionAndName` and the
namespace argument in `AddUnionToNamespace` control how these unions are
named and organized in the generated TypeScript.
- `generator.AddToNamespace` is used to group related types under a
specific namespace in the generated TypeScript, improving organization
(e.g., `pivot.Request{}` is added to the `pivot` namespace).
5. **Render the TypeScript output**: Finally, `generator.Render(w)` writes the
generated TypeScript definitions to the specified output file.
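A condensed sketch of this flow is shown below. It assumes a `go2ts.New()` constructor and that the methods named above (`AddIgnoreNil`, `AddMultiple`, `Render`) accept instances of the types to register; the local `exampleResponse` struct stands in for the many perf structs and unions the real `main.go` registers.

```go
package main

import (
	"flag"
	"log"
	"os"

	"go.skia.org/infra/go/go2ts"
	"go.skia.org/infra/go/paramtools"
)

// exampleResponse is a local stand-in for the perf structs (alerts,
// regression, frontend/api, ...) registered by the real program.
type exampleResponse struct {
	Name  string            `json:"name"`
	Value float32           `json:"value"`
	Keys  paramtools.Params `json:"keys"`
}

func main() {
	outputPath := flag.String("o", "index.ts", "Path for the generated TypeScript definitions.")
	flag.Parse()

	generator := go2ts.New()
	// Emit nominal (named) types rather than purely structural ones.
	generator.GenerateNominalTypes = true
	// Treat nil-able container types specially when generating TS types.
	generator.AddIgnoreNil(paramtools.Params{})

	// Register the Go structs that cross the JSON boundary.
	generator.AddMultiple(exampleResponse{})

	w, err := os.Create(*outputPath)
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()
	if err := generator.Render(w); err != nil {
		log.Fatal(err)
	}
}
```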
The design decision to use a dedicated program for this generation task, rather
than manual synchronization or other methods, highlights the importance of
automation and reducing the likelihood of human error in keeping backend and
frontend types aligned. The reliance on the `go/go2ts` library centralizes the
core conversion logic, making this module a consumer and orchestrator of that
library for the specific needs of the Skia Perf application.
A key workflow is triggered by the `//go:generate` directive at the top of
`main.go`: `//go:generate bazelisk run --config=mayberemote //:go -- run . -o
../../modules/json/index.ts`
This command, when `go generate` is run (typically as part of a build process),
executes the compiled `go/ts` program.
Workflow:
1. Developer modifies a Go struct in a `perf` submodule that is serialized to
JSON for the UI.
2. Developer (or an automated build step) runs `go generate` within the `go/ts`
module's directory (or a higher-level directory that includes it).
3. The `go:generate` directive executes the `main` function in `go/ts/main.go`.
4. `main.go` -> Uses `go2ts.Generator` -> Registers relevant Go structs and
unions.
5. `go2ts.Generator` -> Analyzes registered Go types -> Generates corresponding
TypeScript definitions.
6. `main.go` -> Writes the TypeScript definitions to
`../../modules/json/index.ts`.
7. The frontend can now import and use these up-to-date TypeScript types,
ensuring type safety when interacting with JSON data from the backend.
The choice of specific structs and unions registered in `main.go` reflects the
data contracts between the Perf backend and its frontend UI. Any Go struct that
is part of an API response or request payload handled by the frontend needs to
be included here.
# Module: /go/types
## Go Types Module
This module defines core data types used throughout the Perf application. These
types provide a standardized way to represent fundamental concepts related to
commits, performance data (traces), and alert configurations. The design
prioritizes clarity, type safety, and consistency across different parts of the
system.
### Key Concepts and Components:
#### Commit and Tile Numbering:
- **`CommitNumber` (`types.go`)**: Represents a unique, sequential identifier
for a commit within a repository.
- **Why**: To provide a simple, linear way to reference commits. It
assumes a straightforward, non-branching history for easier indexing and
retrieval of performance data associated with specific code changes. The
first commit in a repository is assigned `CommitNumber(0)`.
- **How**: Implemented as an `int32`. It includes an `Add` method for safe
offsetting and a `BadCommitNumber` constant (`-1`) to represent invalid
or non-existent commit numbers.
- **`CommitNumberSlice` (`types.go`)**: A utility type to enable sorting
of `CommitNumber` slices, which is useful for various data processing
and display tasks.
- **`TileNumber` (`types.go`)**: Represents an index for a "tile" in the
`TraceStore`. Performance data (traces) are often stored in chunks or tiles
for efficient storage and retrieval.
- **Why**: Tiling allows for optimized access to performance data,
especially for large datasets. Instead of loading entire traces, only
relevant tiles need to be accessed.
- **How**: Implemented as an `int32`. Functions like
`TileNumberFromCommitNumber` and `TileCommitRangeForTileNumber` manage
the mapping between commit numbers and tile numbers based on a
configurable `tileSize`. The `Prev()` method allows navigation to the
preceding tile, and `BadTileNumber` (`-1`) indicates an invalid tile.
**Workflow: Commit to Tile Mapping**
```
CommitNumber ----(tileSize)----> TileNumberFromCommitNumber() ----> TileNumber
|
V
TileCommitRangeForTileNumber() ----> (StartCommit, EndCommit)
```
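At its core this mapping is integer arithmetic on the configured tile size. The sketch below illustrates the behavior described above; the library's handling of edge cases (and the exact `BadTileNumber` semantics) may differ.

```go
package main

import "fmt"

// CommitNumber and TileNumber mirror the int32-based types described above.
type CommitNumber int32
type TileNumber int32

// tileNumberFromCommitNumber: integer division by the configured tile size.
func tileNumberFromCommitNumber(c CommitNumber, tileSize int32) TileNumber {
	if c < 0 || tileSize <= 0 {
		return TileNumber(-1) // Mirrors BadTileNumber for invalid input.
	}
	return TileNumber(int32(c) / tileSize)
}

// tileCommitRange returns the first and last CommitNumber stored in a tile.
func tileCommitRange(t TileNumber, tileSize int32) (CommitNumber, CommitNumber) {
	start := int32(t) * tileSize
	return CommitNumber(start), CommitNumber(start + tileSize - 1)
}

func main() {
	const tileSize = 256
	c := CommitNumber(1000)
	t := tileNumberFromCommitNumber(c, tileSize)
	lo, hi := tileCommitRange(t, tileSize)
	fmt.Printf("commit %d -> tile %d (commits %d..%d)\n", c, t, lo, hi) // tile 3 (768..1023)
}
```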
#### Performance Data Representation:
- **`Trace` (`types.go`)**: Represents a sequence of performance measurements,
typically corresponding to a specific metric over a series of commits.
- **Why**: To provide a simple and efficient way to store and manipulate
time-series performance data.
- **How**: Implemented as a `[]float32`. The `NewTrace` function
initializes a trace of a given length with a special
`vec32.MISSING_DATA_SENTINEL` value, which is crucial for distinguishing
between actual zero values and missing data points. This leverages the
`go.skia.org/infra/go/vec32` package for optimized float32 vector
operations.
- **`TraceSet` (`types.go`)**: A collection of `Trace`s, keyed by a string
identifier (trace ID).
- **Why**: To group related traces, often corresponding to different
metrics measured for the same test or configuration.
- **How**: Implemented as a `map[string]Trace`.
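A small sketch of these two types is shown below. The sentinel constant actually lives in `go.skia.org/infra/go/vec32`; the value `1e32` used here is taken from the backend convention described elsewhere in this document and is otherwise an assumption.

```go
package main

import "fmt"

// missingDataSentinel stands in for vec32.MISSING_DATA_SENTINEL.
const missingDataSentinel = float32(1e32)

// Trace and TraceSet mirror the shapes described above.
type Trace []float32
type TraceSet map[string]Trace

// newTrace mirrors NewTrace: every point starts as "missing" so that real
// zero measurements can be distinguished from absent ones.
func newTrace(n int) Trace {
	t := make(Trace, n)
	for i := range t {
		t[i] = missingDataSentinel
	}
	return t
}

func main() {
	key := ",arch=x86,config=8888,test=draw_a_circle,"
	traces := TraceSet{key: newTrace(5)}
	traces[key][2] = 0 // A real zero measurement, distinct from missing data.
	fmt.Println(traces)
}
```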
#### Regression Detection and Alerting:
- **`RegressionDetectionGrouping` (`types.go`)**: An enumeration defining how
traces are grouped for regression detection.
- **Why**: Different grouping strategies can be more effective for
different types of performance data. This allows flexibility in the
regression detection process.
- **How**: Defined as a string type with constants like `KMeansGrouping`
(cluster traces by shape) and `StepFitGrouping` (analyze each trace
individually for steps). `ToClusterAlgo` provides a safe way to convert
strings to this type.
- **`StepDetection` (`types.go`)**: An enumeration defining the algorithms
used to detect significant steps (changes) in individual traces or cluster
centroids.
- **Why**: Various statistical methods can be employed to identify
meaningful performance regressions or improvements. This allows
selection of the most appropriate method for the data characteristics.
- **How**: Defined as a string type with constants representing different
detection methods, such as `OriginalStep`, `AbsoluteStep`,
`PercentStep`, `CohenStep`, and `MannWhitneyU`. `ToStepDetection`
ensures type-safe conversion from strings.
- **`AlertAction` (`types.go`)**: An enumeration defining the actions to be
taken when an anomaly (potential regression) is detected by an alert
configuration.
- **Why**: To allow configurable responses to detected anomalies, ranging
from no action to filing issues or triggering bisection jobs.
- **How**: Defined as a string type with constants like `NoAction`,
`FileIssue`, and `Bisection`.
- **`Domain` (`types.go`)**: Specifies the range of commits over which an
operation (like regression detection) should be performed.
- **Why**: To precisely define the scope of analysis.
- **How**: A struct containing either `N` (number of commits) and `End`
(timestamp for the end of the range) or an `Offset` (a specific commit
number).
- **`ProgressCallback` (`types.go`)**: A function type used to provide
feedback on the progress of long-running operations.
- **Why**: To enable user interfaces or logging systems to display the
status of tasks like regression detection.
- **How**: Defined as `func(message string)`.
- **`CL` (`types.go`)**: Represents a Change List identifier (e.g., a GitHub
Pull Request number).
- **Why**: To associate performance data or alerts with specific code
changes under review.
- **How**: Defined as a `string`.
- **`AnomalyDetectionNotifyType` (`types.go`)**: Defines the notification
mechanism for anomalies.
- **Why**: Allows flexibility in how users are informed about detected
performance issues.
- **How**: String type with constants `IssueNotify` (send to issue
tracker) and `NoneNotify` (no notification).
#### Miscellaneous:
- **`ProjectId` (`types.go`)**: Represents a project identifier.
- **Why**: Useful in multi-project environments to scope data or
configurations.
- **How**: Defined as a `string` with a predefined list `AllProjectIds`.
- **`AllMeasurementStats` (`types.go`)**: A list of valid statistical suffixes
that can be part of performance measurement keys (e.g., "avg", "max").
- **Why**: To ensure consistency and provide a reference for valid stat
types when parsing or generating metric keys.
- **How**: A `[]string` slice.
The unit tests in `types_test.go` focus on validating the logic of
`CommitNumber` arithmetic and the mapping between `CommitNumber` and
`TileNumber`, ensuring the core indexing mechanisms are correct.
# Module: /go/ui
The `/go/ui` module is responsible for handling frontend requests and preparing
data for display in the Perf UI. Its primary purpose is to bridge the gap
between user interactions on the frontend (e.g., selecting time ranges, defining
queries, or applying formulas) and the backend data sources and processing
logic.
This module is designed to be the central point for fetching and transforming
performance data into a format that can be readily consumed by the UI. It
orchestrates interactions with various other modules, such as those responsible
for accessing Git history (`/go/git`), building dataframes (`/go/dataframe`),
handling data shortcuts (`/go/shortcut`), and calculating derived metrics
(`/go/calc`).
The key rationale behind this module's existence is to encapsulate the
complexity of data retrieval and preparation, providing a clean and consistent
API for the frontend. This separation of concerns allows the frontend to focus
on presentation and user interaction, while the backend handles the intricacies
of data access and manipulation.
The main workflow involves receiving a `FrameRequest` from the frontend,
processing it to fetch and transform data, and then returning a `FrameResponse`
containing the prepared data and display instructions.
### Key Components and Files:
- **`/go/ui/frame/frame.go`**: This is the core file of the module.
- **Responsibilities**:
- Defines the structure of frontend requests (`FrameRequest`) and backend
responses (`FrameResponse`). `FrameRequest` captures user inputs like
time ranges, queries, formulas, and pivot table configurations.
`FrameResponse` packages the resulting data, along with display hints
and any relevant messages.
- Manages the processing of `FrameRequest` objects. This involves
dispatching tasks to other modules based on the request parameters. For
example, it uses the `dataframe.DataFrameBuilder` to fetch data based on
queries or trace keys, the `calc` module to evaluate formulas, and the
`pivot` module to restructure data for pivot tables.
- Handles different types of requests, such as those based on a specific
time range (`REQUEST_TIME_RANGE`) or a fixed number of recent commits
(`REQUEST_COMPACT`).
- Orchestrates the retrieval of anomalies from an `anomalies.Store` and
associates them with the relevant traces in the response. This can be
done based on time ranges or commit revision numbers.
- Includes logic to determine the appropriate display mode for the
frontend (e.g., plot, pivot table, or just a query input).
- Implements safeguards like truncating the number of traces in the
response if it exceeds a predefined limit, to prevent overwhelming the
frontend or the network.
- Provides functionality to identify "SKP changes" (significant file
changes in the Git repository, historically related to Skia Picture
files) within the requested commit range, which can be highlighted in
the UI.
- **Design Choices & Implementation Details**:
- The `ProcessFrameRequest` function is the main entry point for handling
a request. It creates a `frameRequestProcess` struct to manage the state
of the request processing.
- The processing is broken down into distinct steps: handling queries,
formulas, and keys (shortcuts). Each step typically involves fetching
data and then joining it into a single `DataFrame`.
- Error handling is centralized in `reportError` to ensure consistent
logging and error propagation.
- Progress tracking is integrated via the `progress.Progress` interface,
allowing the frontend to display updates during long-running requests.
- The decision to support both `REQUEST_TIME_RANGE` and `REQUEST_COMPACT`
request types caters to different user needs: exploring specific
historical periods versus viewing the latest trends.
- The inclusion of anomaly data directly in the `FrameResponse` aims to
provide users with immediate context about significant performance
changes alongside the raw data. The system supports fetching anomalies
based on either time or revision ranges, offering flexibility depending
on how anomalies are tracked and stored.
- The `ResponseFromDataFrame` function acts as a final assembly step,
taking a processed `DataFrame` and enriching it with SKP change
information, display mode, and handling potential truncation.
A typical request processing flow might look like this:
```
Frontend Request (FrameRequest)
|
V
ProcessFrameRequest() in frame.go
|
+------------------------------+-----------------------------+--------------------------+
| | | |
V V V V
(If Queries exist) (If Formulas exist) (If Keys exist) (If Pivot requested)
doSearch() doCalc() doKeys() pivot.Pivot()
| | | |
V V V V
dfBuilder.NewFromQuery...() calc.Eval() with dfBuilder.NewFromKeys...() Restructure DataFrame
rowsFromQuery/Shortcut()
| | | |
+------------------------------+-----------------------------+--------------------------+
|
V
DataFrame construction and merging
|
V
(If anomaly search enabled)
addTimeBasedAnomaliesToResponse() OR addRevisionBasedAnomaliesToResponse()
|
V
anomalyStore.GetAnomalies...()
|
V
ResponseFromDataFrame()
|
V
getSkps() (Find significant file changes)
|
V
Truncate response if too large
|
V
Set DisplayMode
|
V
Backend Response (FrameResponse)
|
V
Frontend UI
```
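The dispatch pictured above can be summarized as a skeleton like the one below. The types and helper bodies are hypothetical stand-ins and drastically simplified relative to the real `frame.go`; only the helper names (`doSearch`, `doCalc`, `doKeys`) come from the flow above.

```go
// Package frame -- a simplified skeleton of ProcessFrameRequest's dispatch.
package frame

import (
	"context"
	"errors"
)

// FrameRequest is a hypothetical, trimmed-down stand-in for the real request.
type FrameRequest struct {
	Queries  []string // Trace queries to search for.
	Formulas []string // calc formulas to evaluate.
	Keys     string   // A shortcut ID referencing a stored list of trace keys.
}

// FrameResponse stands in for the real response (DataFrame, display mode, ...).
type FrameResponse struct{}

type frameRequestProcess struct {
	request *FrameRequest
}

// processFrameRequest dispatches to the query/formula/keys handlers and then
// merges their results, mirroring the diagram above.
func processFrameRequest(ctx context.Context, req *FrameRequest) (*FrameResponse, error) {
	p := &frameRequestProcess{request: req}
	if len(req.Queries) == 0 && len(req.Formulas) == 0 && req.Keys == "" {
		return nil, errors.New("empty FrameRequest")
	}
	for _, q := range req.Queries {
		if err := p.doSearch(ctx, q); err != nil {
			return nil, err
		}
	}
	for _, f := range req.Formulas {
		if err := p.doCalc(ctx, f); err != nil {
			return nil, err
		}
	}
	if req.Keys != "" {
		if err := p.doKeys(ctx, req.Keys); err != nil {
			return nil, err
		}
	}
	// Anomaly lookup, SKP detection, truncation, and display-mode selection
	// would follow here before returning the response.
	return &FrameResponse{}, nil
}

// Stub handlers; the real versions build and merge DataFrames.
func (p *frameRequestProcess) doSearch(ctx context.Context, query string) error   { return nil }
func (p *frameRequestProcess) doCalc(ctx context.Context, formula string) error   { return nil }
func (p *frameRequestProcess) doKeys(ctx context.Context, shortcutID string) error { return nil }
```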
# Module: /go/urlprovider
## URL Provider Module
The `urlprovider` module is designed to generate URLs for various pages within
the Perf application. This centralized approach ensures consistency in URL
generation across different parts of the application and simplifies the process
of linking to specific views with pre-filled parameters. The key motivation is
to abstract away the complexities of URL query parameter construction and to
provide a simple interface for generating links to common Perf views like
"Explore", "MultiGraph", and "GroupReport".
The core component of this module is the `URLProvider` struct. An instance of
`URLProvider` is initialized with a `perfgit.Git` object. This dependency is
crucial because some URL generation, particularly for time-range-based views,
requires fetching commit information (specifically timestamps) from the Git
repository to define the "begin" and "end" parameters of the URL.
### Key Responsibilities and Components:
- **`urlprovider.go`**: This file contains the primary logic for the URL
provider.
- **`URLProvider` struct**: Holds a reference to a `perfgit.Git` instance.
This allows it to interact with the Git repository to fetch commit
details needed for constructing time-based query parameters.
- **`New(perfgit perfgit.Git) *URLProvider`**: This constructor function
creates and returns a new instance of `URLProvider`. It takes a
`perfgit.Git` object as an argument, which is stored within the struct.
This design choice makes the `URLProvider` stateful with respect to its
Git interaction capabilities.
- **`Explore(...) string`**: This method generates a URL for the "Explore"
page (`/e/`).
- **Why**: The "Explore" page is used for in-depth analysis of performance
data based on various parameters and a specific commit range.
- **How**:
1. It calls `getQueryParams` to construct the common query parameters
like `begin`, `end`, and `disable_filter_parent_traces`. The `begin`
and `end` timestamps are derived from the provided
`startCommitNumber` and `endCommitNumber` by querying the `perfGit`
instance. The `end` timestamp is intentionally shifted forward by
one day to ensure that anomalies at the very end of the selected
range are visible on the graph.
2. It then serializes the `parameters` map (which contains key-value
pairs for filtering traces) into a URL-encoded query string using
`GetQueryStringFromParameters`. This encoded string is assigned to
the `queries` parameter of the final URL.
3. Additional `queryParams` (passed as `url.Values`) can be merged into
the URL.
4. The final URL is constructed by appending the encoded query
parameters to the base path `/e/?`.
- **`MultiGraph(...) string`**: This method generates a URL for the
"MultiGraph" page (`/m/`).
- **Why**: The "MultiGraph" page allows users to view multiple graphs
simultaneously, often identified by a shortcut ID.
- **How**:
1. Similar to `Explore`, it uses `getQueryParams` to build the common
time-range and filtering parameters.
2. It specifically adds the `shortcut` parameter with the provided
`shortcutId`.
3. Additional `queryParams` can also be merged.
4. The final URL is constructed by appending the encoded query
parameters to the base path `/m/?`.
- **`GroupReport(param string, value string) string`**: This _static_
function generates a URL for the "Group Report" page (`/u/`).
- **Why**: The "Group Report" page displays information related to groups
of anomalies, specific anomalies, bugs, or revisions. Unlike `Explore`
and `MultiGraph`, it does not inherently depend on a time range derived
from commits, nor does it require complex parameter encoding.
- **How**:
1. It validates the input `param` against a predefined list of allowed
parameters (`anomalyGroupID`, `anomalyIDs`, `bugID`, `rev`, `sid`).
This is a security and correctness measure to prevent arbitrary
parameters from being injected.
2. If the `param` is valid, it constructs a simple URL with the
provided `param` and `value`.
3. It returns an empty string if the `param` is invalid.
4. This function is static (not a method on `URLProvider`) because it
doesn't need access to the `perfGit` instance or any other state
within `URLProvider`. This simplifies its usage for cases where only
a group report URL is needed without initializing a full
`URLProvider`.
- **`getQueryParams(...) url.Values`**: This private helper method is
responsible for creating the base set of query parameters common to
`Explore` and `MultiGraph`.
- **How**:
1. It calls `fillCommonParams` to set the `begin` and `end` parameters
based on commit numbers.
2. It conditionally adds `disable_filter_parent_traces=true` if
requested.
3. It merges any additional `queryParams` provided by the caller.
- **`fillCommonParams(...)`**: This private helper populates the `begin`
and `end` timestamp parameters in the provided `url.Values`.
- **How**: It uses the `perfGit` instance to look up the `Commit` objects
corresponding to the `startCommitNumber` and `endCommitNumber`. The
timestamps from these commits are then used. As mentioned earlier, the
`end` timestamp is adjusted by adding one day. This separation of
concerns keeps the main `Explore` and `MultiGraph` methods cleaner.
- **`GetQueryStringFromParameters(parameters map[string][]string)
string`**: This helper method converts a map of string slices
(representing query parameters where a single key can have multiple
values) into a URL-encoded query string.
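The essence of this helper is folding a multi-valued parameter map into `url.Values` and letting the standard library handle the encoding. The following is an illustrative sketch, not the exact implementation:

```go
package main

import (
	"fmt"
	"net/url"
)

// getQueryStringFromParameters converts a map of multi-valued parameters into
// a URL-encoded query string.
func getQueryStringFromParameters(parameters map[string][]string) string {
	values := url.Values{}
	for key, vals := range parameters {
		for _, v := range vals {
			values.Add(key, v)
		}
	}
	return values.Encode() // Keys are sorted by Encode.
}

func main() {
	q := getQueryStringFromParameters(map[string][]string{
		"config": {"8888", "gles"},
		"test":   {"draw_a_circle"},
	})
	fmt.Println(q) // config=8888&config=gles&test=draw_a_circle
}
```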
### Key Workflows:
1. **Generating an "Explore" Page URL:**
```
Caller provides: context, startCommitNum, endCommitNum, filterParams, disableFilterParent, otherQueryParams
|
v
URLProvider.Explore()
|
+-------------------------------------+
| |
v v
getQueryParams() GetQueryStringFromParameters(filterParams)
| |
+--> fillCommonParams() +--> Encode filterParams
| | |
| +--> perfGit.CommitFromCommitNumber() -> Get start timestamp
| | |
| +--> perfGit.CommitFromCommitNumber() -> Get end timestamp, add 1 day
| | |
| +----------------------------------------+
| |
| v
| Combine begin, end, disableFilterParent, otherQueryParams into url.Values
| |
+-------------------------------------+
|
v
Combine base URL ("/e/?"), common query params, and encoded filterParams string
|
v
Return final URL string
```
2. **Generating a "MultiGraph" Page URL:**
```
Caller provides: context, startCommitNum, endCommitNum, shortcutId, disableFilterParent, otherQueryParams
|
v
URLProvider.MultiGraph()
|
v
getQueryParams()
|
+--> fillCommonParams()
| |
| +--> perfGit.CommitFromCommitNumber() -> Get start timestamp
| |
| +--> perfGit.CommitFromCommitNumber() -> Get end timestamp, add 1 day
| |
| +----------------------------------------+
| |
| v
| Combine begin, end, disableFilterParent, otherQueryParams into url.Values
|
v
Add "shortcut=shortcutId" to url.Values
|
v
Combine base URL ("/m/?") and all query params
|
v
Return final URL string
```
3. **Generating a "Group Report" Page URL:**
```
Caller provides: paramName, paramValue
|
v
urlprovider.GroupReport()
|
v
Validate paramName against allowed list
|
+-- (Valid) --> Construct URL: "/u/?" + paramName + "=" + paramValue
| |
| v
| Return URL string
|
+-- (Invalid) --> Return "" (empty string)
```
The design emphasizes reusability of common parameter generation logic
(`getQueryParams`, `fillCommonParams`) and clear separation of concerns for
generating URLs for different Perf pages. The dependency on `perfgit.Git` is
explicitly managed through the `URLProvider` struct, making it clear when Git
interaction is necessary.
# Module: /go/userissue
The `userissue` module is responsible for managing the association between
specific data points in Perf (identified by a trace key and a commit position)
and Buganizer issues. This allows users to flag specific performance regressions
or anomalies and link them directly to a tracking issue.
The core of this module is the `Store` interface, which defines the contract for
persisting and retrieving these user-issue associations. The primary
implementation of this interface is `sqluserissuestore`, which leverages a SQL
database (specifically CockroachDB in this context) to store the data.
**Key Responsibilities and Components:**
- **`store.go`**: This file defines the central `UserIssue` struct and the
`Store` interface.
- **`UserIssue` struct**: Represents a single association. It contains:
- `UserId`: The email of the user who made the association.
- `TraceKey`: A string uniquely identifying a performance metric's trace
(e.g., ",arch=x86,config=Release,test=MyTest,").
- `CommitPosition`: An integer representing a specific point in the commit
history where the data point exists.
- `IssueId`: The numerical ID of the Buganizer issue.
- **`Store` interface**: This interface dictates the operations that any
backing store for user issues must support:
- `Save(ctx context.Context, req *UserIssue) error`: Persists a new
`UserIssue` association. The implementation must handle potential
conflicts, such as trying to save a duplicate entry (same trace key and
commit position).
- `Delete(ctx context.Context, traceKey string, commitPosition int64)
error`: Removes an existing user-issue association based on its unique
trace key and commit position. It should handle cases where the
specified association doesn't exist.
- `GetUserIssuesForTraceKeys(ctx context.Context, traceKeys []string,
startCommitPosition int64, endCommitPosition int64) ([]UserIssue,
error)`: Retrieves all `UserIssue` associations for a given set of trace
keys within a specified range of commit positions. This is crucial for
displaying these associations on performance graphs or reports.
- **`sqluserissuestore/sqluserissuestore.go`**: This is the SQL-backed
implementation of the `Store` interface.
- **Design Rationale**: Using a SQL database provides robust data
integrity, transactional guarantees, and the ability to perform complex
queries if needed in the future. CockroachDB is chosen for its
scalability and compatibility with PostgreSQL syntax.
- **Implementation Details**:
- It uses a `go.skia.org/infra/go/sql/pool` for managing database
connections.
- SQL statements are defined as constants and, in the case of
`listUserIssues`, use Go's `text/template` package to dynamically
construct the `IN` clause for multiple `traceKeys`. This is a common
pattern to avoid SQL injection vulnerabilities and handle variadic
inputs efficiently.
- `Save`: Inserts a new row into the `UserIssues` table. It includes a
`last_modified` timestamp.
- `Delete`: First, it attempts to retrieve the issue to ensure it exists
before attempting deletion. This provides a more informative error
message if the record is not found.
- `GetUserIssuesForTraceKeys`: Constructs a SQL query using a template to
select issues matching the provided trace keys and commit position
range. It then iterates over the query results and populates a slice of
`UserIssue` structs.
- **`sqluserissuestore/schema/schema.go`**: This file defines the Go struct
`UserIssueSchema` which directly maps to the SQL table schema for
`UserIssues`.
- **Purpose**: This provides a typed representation of the database table,
making it easier to reason about the data structure and to potentially
use with ORM-like tools or schema migration utilities.
- **Key Fields**:
- `user_id TEXT NOT NULL`
- `trace_key TEXT NOT NULL`
- `commit_position INT NOT NULL`
- `issue_id INT NOT NULL`
- `last_modified TIMESTAMPTZ DEFAULT now()`
- `PRIMARY KEY(trace_key, commit_position)`: The combination of
`trace_key` and `commit_position` uniquely identifies a user issue,
preventing multiple issues from being associated with the exact same
data point.
- **`mocks/Store.go`**: This contains a mock implementation of the `Store`
interface, generated using the `testify/mock` library.
- **Purpose**: This is essential for unit testing components that depend
on the `userissue.Store` without requiring a live database connection.
It allows developers to define expected calls and return values for the
store's methods.
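Pulling together the `store.go` and schema descriptions above, a minimal Go sketch of the data model and contract looks like this. The method signatures follow the prose; the exact integer width of `IssueId` is an assumption.

```go
// Package userissue -- an illustrative sketch of the Store contract.
package userissue

import "context"

// UserIssue associates one data point (trace key + commit position) with a
// Buganizer issue.
type UserIssue struct {
	UserId         string // Email of the user who created the association.
	TraceKey       string // e.g. ",arch=x86,config=Release,test=MyTest,".
	CommitPosition int64  // Commit position of the data point.
	IssueId        int64  // Buganizer issue ID.
}

// Store is the persistence contract implemented by sqluserissuestore.
type Store interface {
	// Save persists a new association; duplicates (same trace key and
	// commit position) must be rejected.
	Save(ctx context.Context, req *UserIssue) error

	// Delete removes an existing association.
	Delete(ctx context.Context, traceKey string, commitPosition int64) error

	// GetUserIssuesForTraceKeys returns all associations for the given trace
	// keys within [startCommitPosition, endCommitPosition].
	GetUserIssuesForTraceKeys(ctx context.Context, traceKeys []string,
		startCommitPosition int64, endCommitPosition int64) ([]UserIssue, error)
}
```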
**Workflow Example: Saving a User Issue**
1. **User Action**: A user on the Perf frontend identifies a data point (e.g.,
on a graph) and associates it with a Buganizer issue ID.
2. **API Request**: The frontend sends a request to a backend API endpoint.
3. **Backend Handler**: The API handler receives the request, which includes
the user's ID, the trace key, the commit position, and the issue ID.
4. **Store Interaction**: The handler creates a `userissue.UserIssue` struct
   and calls the `Save` method on an instance of `userissue.Store` (likely
   `sqluserissuestore.UserIssueStore`).
   ```
   User Request (UI)
     |
     v
   API Endpoint
     |
     v
   Backend Handler
     |   Creates userissue.UserIssue{UserId:"...", TraceKey:"...",
     |                               CommitPosition:123, IssueId:45678}
     v
   userissue.Store.Save(ctx, &issue)
     |
     v
   sqluserissuestore.UserIssueStore.Save()
     |   Constructs SQL: INSERT INTO UserIssues (...) VALUES ($1, $2, $3, $4, $5)
     v
   SQL Database (UserIssues Table) <-- Row inserted
   ```
**Workflow Example: Retrieving User Issues for a Chart**
1. **User Action**: A user views a performance chart displaying multiple traces
over a range of commits.
2. **Frontend Request**: The frontend needs to know if any data points on the
visible traces and commit range have associated issues. It requests this
information from a backend API.
3. **Backend Handler**: The API handler receives the list of trace keys visible
on the chart and the start/end commit positions.
4. **Store Interaction**: The handler calls `GetUserIssuesForTraceKeys` on the
   `userissue.Store`.
   ```
   Chart Display Request (UI)
     |   Provides: traceKeys=["trace1", "trace2"], startCommit=100, endCommit=200
     v
   API Endpoint
     |
     v
   Backend Handler
     |
     v
   userissue.Store.GetUserIssuesForTraceKeys(ctx, traceKeys, startCommit, endCommit)
     |
     v
   sqluserissuestore.UserIssueStore.GetUserIssuesForTraceKeys()
     |   Constructs SQL: SELECT ... FROM UserIssues
     |                   WHERE trace_key IN ('trace1', 'trace2')
     |                   AND commit_position>=100 AND commit_position<=200
     v
   SQL Database (UserIssues Table)
     |   Returns rows matching the query
     v
   Backend Handler
     |   Formats response
     v
   API Endpoint
     |
     v
   UI (displays issue markers on chart)
   ```
The design emphasizes a clear separation of concerns with the `Store` interface,
allowing for different storage backends if necessary (though SQL is the current
and likely long-term choice). The SQL implementation is straightforward, using
parameterized queries for security and templates for dynamic query construction
where appropriate.
# Module: /go/workflows
## Workflows Module
### Overview
This module defines and implements Temporal workflows for automating tasks
related to performance anomaly detection and analysis in Skia Perf. It
orchestrates interactions between various services like the AnomalyGroup
service, Culprit service, and Gerrit service to achieve end-to-end automation.
The primary goal is to streamline the process of identifying performance
regressions, finding their root causes (culprits), and notifying relevant
parties.
The workflows are designed to be resilient and fault-tolerant, leveraging
Temporal's capabilities for retries and state management. This ensures that even
if individual steps or external services encounter transient issues, the overall
process can continue and eventually complete.
### Responsibilities and Key Components
The module is structured into a public API (`workflows.go`) and an internal
implementation package (`internal/`).
**`workflows.go`**:
- **Purpose**: Defines the public interface for the workflows, including their
names and the data structures for their parameters and results.
- **Why**: This separation allows other modules (clients) to trigger these
workflows without needing to know the internal implementation details or
depend on the specific libraries used within the workflows. It acts as a
contract.
- **Key Contents**:
- **Workflow Name Constants (`ProcessCulprit`, `MaybeTriggerBisection`)**:
These string constants are the canonical names used to invoke the
respective workflows via the Temporal client. Using constants helps
avoid typos and ensures consistency.
- **Parameter and Result Structs (`ProcessCulpritParam`,
`ProcessCulpritResult`, `MaybeTriggerBisectionParam`,
`MaybeTriggerBisectionResult`)**: These structs define the data that
needs to be passed into a workflow and the data that a workflow is
expected to return upon completion. They ensure type safety and clarity
in communication.
**`internal/` package**: This package contains the actual implementation of the
workflows and their associated activities. Activities are the building blocks of
Temporal workflows, representing individual units of work that can be executed,
retried, and timed out independently.
- **`options.go`**:
- **Purpose**: Centralizes the configuration for Temporal activities and
child workflows.
- **Why**: Provides a consistent way to define timeouts and retry
policies. This makes it easier to manage and adjust these settings
globally or for specific categories of operations. For example,
short-lived activities interacting with external services have different
reliability characteristics than long-running child workflows.
- **Key Components**:
- `regularActivityOptions`: Defines default options (e.g., 1-minute
timeout, 10 retry attempts) for standard activities that are expected to
complete quickly, like API calls to other services.
- `childWorkflowOptions`: Defines options for child workflows (e.g.,
12-hour execution timeout, 4 retry attempts). This longer timeout
accommodates potentially resource-intensive tasks like bisections which
involve compilation and testing.
- **`maybe_trigger_bisection.go`**:
- **Purpose**: Implements the `MaybeTriggerBisectionWorkflow`, which is
the core logic for deciding whether to automatically find the cause of a
performance regression (bisection) or to simply report the anomaly.
- **Why**: This workflow automates a critical decision point in the
performance analysis pipeline. It aims to reduce manual intervention by
automatically initiating bisections for significant regressions while
still allowing for manual reporting of less critical issues.
  - **Key Workflow Steps**:
    * **Wait**: Pauses for a defined duration (`_WAIT_TIME_FOR_ANOMALIES`,
      e.g., 30 minutes). This allows time for related anomalies to be detected
      and grouped together, potentially providing a more comprehensive picture
      before taking action.
    * **Load Anomaly Group**: Retrieves details of the specific anomaly group
      using an activity that calls the AnomalyGroup service.
      `Load Anomaly Group (Activity) AnomalyGroup Service <---> Workflow`
    * **Decision (Bisect or Report)**: Based on the `GroupAction` field of the
      anomaly group:
      - **If `BISECT`**:
        a. **Load Top Anomaly**: Fetches the most significant anomaly within
           the group.
        b. **Resolve Commit Hashes**: Converts the start and end commit
           positions of the anomaly into Git commit hashes using an activity
           that interacts with a Gerrit/Crrev service.
           `Get Commit Hashes (Activity) Gerrit/Crrev Service <---> Workflow`
        c. **Launch Bisection (Child Workflow)**: Triggers a separate
           `CulpritFinderWorkflow` (defined in the `pinpoint/go/workflows`
           module) as a child workflow. This child workflow is responsible for
           performing the actual bisection.
           - A unique ID is generated for the Pinpoint job.
           - The child workflow is configured with `ParentClosePolicy:
             ABANDON`, meaning it will continue running even if this parent
             workflow terminates. This is crucial because bisections can be
             long-running.
           - Callback parameters are passed to the child workflow so it knows
             how to report its findings back (e.g., which Anomaly Group ID
             it's associated with, which Culprit service to use).
           `Launch Pinpoint Bisection Workflow --> Pinpoint.CulpritFinderWorkflow (Child)`
        d. **Update Anomaly Group**: Records the ID of the launched bisection
           job back into the AnomalyGroup.
           `Update Anomaly Group with Bisection ID (Activity) AnomalyGroup Service <---> Workflow`
      - **If `REPORT`**:
        a. **Load Top Anomalies**: Fetches a list of the top N anomalies in
           the group.
        b. **Notify User**: Calls an activity that uses the Culprit service to
           file a bug or send a notification about these anomalies.
           `Notify User of Anomalies (Activity) Culprit Service <--------> Workflow`
- **Helper Functions**:
- `parseStatisticNameFromChart`, `benchmarkStoriesNeedUpdate`,
`updateStoryDescriptorName`: These functions handle specific data
transformations needed to correctly format parameters for the Pinpoint
bisection request, often due to legacy conventions or differences in how
metrics are named.
- **`process_culprit.go`**:
- **Purpose**: Implements the `ProcessCulpritWorkflow`, which handles the
results of a completed bisection (i.e., when one or more culprits are
identified).
- **Why**: This workflow bridges the gap between a successful bisection
and making that information actionable. It ensures that found culprits
are stored and that users are notified appropriately.
- **Key Workflow Steps**:
* **Convert Commits**: Transforms the commit data from the Pinpoint format
to the format expected by the Culprit service. This involves parsing
repository URLs.
* **Persist Culprit**: Calls an activity to store the identified
culprit(s) in a persistent datastore via the Culprit service. `Persist
Culprit (Activity) Culprit Service <--------> Workflow`
* **Notify User of Culprit**: Calls an activity to notify users (e.g., by
filing or updating a bug) about the identified culprit(s) via the
Culprit service. `Notify User of Culprit (Activity) Culprit Service
<--------> Workflow`
- **Helper Function**:
- `ParsePinpointCommit`: Handles the parsing of repository URLs from the
Pinpoint commit format (e.g., `https://{host}/{project}.git`) into
separate host and project components required by the Culprit service.
- **`anomalygroup_service_activity.go`**:
- **Purpose**: Defines activities that interact with the AnomalyGroup gRPC
service.
- **Why**: Encapsulates the client-side logic for communicating with the
AnomalyGroup service. This makes the workflows themselves cleaner and
focuses them on orchestration rather than low-level RPC details.
- **Key Activities**:
- `LoadAnomalyGroupByID`: Fetches an anomaly group by its ID.
- `FindTopAnomalies`: Retrieves the most significant anomalies within a
group.
- `UpdateAnomalyGroup`: Updates an existing anomaly group (e.g., to add a
bisection ID).
- **`culprit_service_activity.go`**:
- **Purpose**: Defines activities that interact with the Culprit gRPC
service.
- **Why**: Similar to `anomalygroup_service_activity.go`, this
encapsulates communication with the Culprit service.
- **Key Activities**:
- `PeristCulprit`: Stores culprit information.
- `NotifyUserOfCulprit`: Notifies users about a found culprit (e.g., by
creating a bug).
- `NotifyUserOfAnomaly`: Notifies users about a set of anomalies (used
when the group action is `REPORT`).
- **`gerrit_service_activity.go`**:
- **Purpose**: Defines activities for interacting with Gerrit or a
Gerrit-like service (specifically Crrev in this case) to resolve commit
positions to commit hashes.
- **Why**: Bisection workflows often start with commit positions (which
are easier for humans or detection systems to reason about initially)
but need actual Git hashes to perform the bisection. This activity
provides that translation.
- **Key Activity**:
- `GetCommitRevision`: Takes a commit position (as an integer) and returns
its corresponding Git hash.
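Returning to `options.go` described earlier, a sketch of the kind of options it centralizes is shown below, using the Temporal Go SDK. The numeric values mirror the description above (1-minute activity timeout with 10 attempts; 12-hour child-workflow timeout with 4 attempts); the variable names are taken from the prose, and the exact set of fields configured in the real file may differ.

```go
// Package internal -- an illustrative sketch of centralized Temporal options.
package internal

import (
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// regularActivityOptions: short-lived activities such as gRPC calls to the
// AnomalyGroup, Culprit, or Gerrit services.
var regularActivityOptions = workflow.ActivityOptions{
	StartToCloseTimeout: time.Minute,
	RetryPolicy: &temporal.RetryPolicy{
		MaximumAttempts: 10,
	},
}

// childWorkflowOptions: long-running child workflows such as a Pinpoint
// bisection, which may involve compilation and benchmarking.
var childWorkflowOptions = workflow.ChildWorkflowOptions{
	WorkflowExecutionTimeout: 12 * time.Hour,
	RetryPolicy: &temporal.RetryPolicy{
		MaximumAttempts: 4,
	},
}
```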
**`worker/main.go`**:
- **Purpose**: This is the entry point for the Temporal worker process that
hosts and executes the workflows and activities defined in this module.
- **Why**: Temporal workers are the processes that actually run the workflow
and activity code. This `main` function sets up the worker, connects it to
the Temporal server, and registers the workflows and activities it's capable
of handling.
- **Key Operations**:
1. **Initialization**: Sets up logging and Prometheus metrics.
2. **Temporal Client Creation**: Establishes a connection to the Temporal
frontend service.
3. **Worker Creation**: Creates a new Temporal worker associated with a
specific task queue (e.g., `localhost.dev` or a production queue name).
Workflows and activities are dispatched to workers listening on the
correct task queue.
4. **Workflow Registration**: Registers `ProcessCulpritWorkflow` and
`MaybeTriggerBisectionWorkflow` with the worker, associating them with
their public names (e.g., `workflows.ProcessCulprit`).
5. **Activity Registration**: Registers instances of the activity structs
(e.g., `CulpritServiceActivity`, `AnomalyGroupServiceActivity`,
`GerritServiceActivity`) with the worker.
6. **Worker Start**: Starts the worker, which begins polling the specified
task queue for tasks to execute.
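A condensed sketch of such a worker entry point, using the Temporal Go SDK, is shown below. The workflow and activity bodies here are stubs standing in for the `internal/` package implementations, and the task-queue name, host/port, and registered workflow name are placeholders rather than the real deployment values.

```go
package main

import (
	"context"
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// Stub workflow standing in for internal.MaybeTriggerBisectionWorkflow.
func MaybeTriggerBisectionWorkflow(ctx workflow.Context, anomalyGroupID string) error {
	return nil
}

// Stub activity struct standing in for internal.GerritServiceActivity.
type GerritServiceActivity struct{}

func (g *GerritServiceActivity) GetCommitRevision(ctx context.Context, position int64) (string, error) {
	return "", nil
}

func main() {
	// 1. Connect to the Temporal frontend service.
	c, err := client.Dial(client.Options{HostPort: "localhost:7233"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// 2. Create a worker bound to a task queue.
	w := worker.New(c, "localhost.dev", worker.Options{})

	// 3. Register workflows under their public names
	//    (placeholder for workflows.MaybeTriggerBisection).
	w.RegisterWorkflowWithOptions(MaybeTriggerBisectionWorkflow,
		workflow.RegisterOptions{Name: "MaybeTriggerBisection"})

	// 4. Register activity implementations.
	w.RegisterActivity(&GerritServiceActivity{})

	// 5. Start polling the task queue.
	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatal(err)
	}
}
```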
### Key Workflows/Processes
**1. Anomaly Group Processing and Potential Bisection
(`MaybeTriggerBisectionWorkflow`)**
```
External Trigger (e.g., new AnomalyGroup created)
|
v
Start MaybeTriggerBisectionWorkflow(AG_ID)
|
+----------------------------------+
| Wait (e.g., 30 mins) |
+----------------------------------+
|
v
LoadAnomalyGroupByID(AG_ID) ----> AnomalyGroup Service
|
+-----------+
| GroupAction?|
+-----------+
/ \
/ \
BISECT REPORT
| |
v v
FindTopAnomalies(AG_ID, Limit=1) FindTopAnomalies(AG_ID, Limit=10)
| |
v v
GetCommitRevision(StartCommit) --> Gerrit Anomalies --> Convert to CulpritService format
| |
v v
GetCommitRevision(EndCommit) --> Gerrit NotifyUserOfAnomaly(AG_ID, Anomalies) --> Culprit Service
|
v
Execute Pinpoint.CulpritFinderWorkflow (Child)
| (Async, ParentClosePolicy=ABANDON)
| Params: {StartHash, EndHash, Config, Benchmark, Story, ...
| CallbackParams: {AG_ID, CulpritServiceURL, GroupingTaskQueue}}
|
v
UpdateAnomalyGroup(AG_ID, BisectionID) --> AnomalyGroup Service
|
v
End Workflow
```
**2. Processing Bisection Results (`ProcessCulpritWorkflow`)**
This workflow is typically triggered as a callback by the Pinpoint
`CulpritFinderWorkflow` when it successfully identifies a culprit.
```
Pinpoint.CulpritFinderWorkflow completes
| (Calls back to Temporal, invoking ProcessCulpritWorkflow)
v
Start ProcessCulpritWorkflow(Commits, AG_ID, CulpritServiceURL)
|
+----------------------------------+
| Convert Pinpoint Commits to |
| Culprit Service Format |
| (Parse Repository URLs) |
+----------------------------------+
|
v
PersistCulprit(Commits, AG_ID) --------> Culprit Service
| (Returns CulpritIDs)
v
NotifyUserOfCulprit(CulpritIDs, AG_ID) -> Culprit Service
| (Returns IssueIDs, e.g., bug numbers)
v
End Workflow
```
# Module: /integration
The `/integration` module provides a dataset and tools for conducting
integration tests on the Perf performance monitoring system. Its primary purpose
is to offer a controlled and reproducible environment for verifying the
ingestion and processing capabilities of Perf.
The core of this module is the `data` subdirectory. This directory houses a
collection of JSON files, each representing performance data associated with
specific commits from the `perf-demo-repo`
(https://github.com/skia-dev/perf-demo-repo.git). These files are structured
according to the `format.Format` schema defined in
`go.skia.org/infra/perf/go/ingest/format`. This standardized format is crucial
as it allows Perf's 'dir' type ingester to directly consume these files. The
dataset is intentionally designed to include a mix of valid data points and
specific error conditions:
- **Nine "good" files:** These represent typical, valid performance data that
Perf should successfully ingest and process. Each file corresponds to a
known commit in the `perf-demo-repo`.
- **One file with a "bad" commit:** This file (`demo_data_commit_10.json`)
contains a `git_hash` that does not correspond to an actual commit in the
`perf-demo-repo`. This allows testing how Perf handles data associated with
unknown or invalid commit identifiers.
- **One malformed JSON file:** `malformed.json` is intentionally not a valid
JSON file. This is used to test Perf's error handling capabilities when
encountering incorrectly formatted input data.
The generation of these data files is handled by `generate_data.go`. This Go
program is responsible for creating the JSON files in the `data` directory. It
uses a predefined list of commit hashes from the `perf-demo-repo` and generates
random but plausible performance metrics for each. The inclusion of this
generator script is important because it allows developers to easily modify,
expand, or regenerate the test dataset if the testing requirements change or if
new scenarios need to be covered. The script uses `math/rand` for generating
some variability in the measurement values, ensuring the data isn't entirely
static while still being predictable.
The key workflow for utilizing this module in an integration test scenario would
look something like this:
1. **Setup Perf:** Configure a local instance of Perf.
2. **Configure Ingester:** Point Perf's 'dir' type ingester to the
`/integration/data` directory. `Perf Instance --> Ingester (type: 'dir') -->
/integration/data/*.json`
3. **Run Ingestion:** Trigger the ingestion process in Perf.
4. **Verify:**
- Confirm that the data from the nine "good" files is correctly ingested
and displayed in Perf.
- Check that Perf appropriately handles the file with the "bad" commit
(e.g., logs an error, flags the data).
- Verify that Perf correctly identifies and reports the error with the
`malformed.json` file.
The `BUILD.bazel` file defines how the components of this module are built.
- The `data` `filegroup` makes the JSON test files available to other parts of
the system, specifically for use in performance testing
(`//perf:__subpackages__`).
- The `integration_lib` `go_library` encapsulates the logic from
`generate_data.go`.
- The `integration` `go_binary` provides an executable to run
`generate_data.go`, allowing for easy regeneration of the test data.
In essence, the `/integration` module provides a self-contained,
version-controlled set of test data and a mechanism to regenerate it. This is
crucial for ensuring the stability and correctness of Perf's data ingestion
pipeline by providing a consistent baseline for integration testing. The choice
to include both valid and intentionally erroneous data points allows for
comprehensive testing of Perf's data handling capabilities, including its
robustness in the face of invalid input.
# Module: /jupyter
The `/jupyter` module provides tools and examples for interacting with Skia's
performance data, specifically data from `perf.skia.org`. The primary goal is to
enable users to programmatically query, analyze, and visualize performance
metrics using the power of Python libraries like Pandas, NumPy, and Matplotlib
within a Jupyter Notebook environment.
The core functionality revolves around fetching and processing performance data.
This is achieved by providing Python functions that abstract the complexities of
interacting with the `perf.skia.org` API. This allows users to focus on the data
analysis itself rather than the underlying data retrieval mechanisms.
**Key Components/Files:**
- **`/jupyter/Perf+Query.ipynb`**: This is a Jupyter Notebook that serves as
both an example and a utility library.
- **Why**: It demonstrates how to use the provided Python functions to
query performance data. It also contains the definitions of these key
functions, making it a self-contained environment for performance
analysis. The notebook format is chosen for its interactive nature,
allowing users to execute code snippets, see results immediately, and
experiment with different queries and visualizations.
- **How**:
- **`perf_calc(formula)`**: This function is designed to evaluate a
specific formula against the performance data. It takes a string
`formula` (e.g., `'count(filter(\"\"))'`) as input. The formula is sent
to the `perf.skia.org` backend for processing. This function is useful
when you need to perform calculations or aggregations on the data
directly on the server side before retrieving it.
- **`perf_query(query)`**: This function allows for more direct querying
of performance data based on key-value pairs. It takes a query string
(e.g., `'source_type=skp&sub_result=min_ms'`) that specifies the
parameters for data retrieval. This is suitable when you want to fetch
raw or filtered trace data.
- **`perf_impl(body)`**: This is an internal helper function used by both
`perf_calc` and `perf_query`. It handles the actual HTTP communication
with `perf.skia.org`. It first determines the time range for the query
(typically the last 50 commits by default) by fetching initial page
data. Then, it sends the query or formula to the `/_/frame/start`
endpoint, polls the `/_/frame/status` endpoint until the request is
successful, and finally retrieves the results from `/_/frame/results`.
The results are then processed into a Pandas DataFrame, which is a
powerful data structure for analysis in Python. A special value `1e32`
from the backend (often representing missing or invalid data) is
converted to `np.nan` (Not a Number) for better handling in Pandas.
- **`paramset()`**: This utility function fetches the available parameter
set from `perf.skia.org`. This is useful for discovering the possible
values for different dimensions like 'model', 'test', 'cpu_or_gpu',
etc., which can then be used to construct more targeted queries.
- **Examples**: The notebook is rich with examples showcasing how to use
`perf_calc` and `perf_query`, plot the resulting DataFrames using
Pandas' built-in plotting capabilities or Matplotlib directly, normalize
data, calculate means, and perform more complex analyses like finding
the noisiest hardware models or comparing CPU vs. GPU performance for
specific tests. These examples serve as practical starting points for
users.
- **Workflow (Simplified `perf_impl`):**
* `Client (Jupyter Notebook)` -- `GET /_/initpage/` --> `perf.skia.org`
(Get time bounds)
* `perf.skia.org` -- `Initial Data (JSON)` --> `Client`
* `Client` -- `POST /_/frame/start (with query/formula & time bounds)` -->
`perf.skia.org`
* `perf.skia.org` -- `Request ID (JSON)` --> `Client`
* `Client` -- `GET /_/frame/status/{ID}` --> `perf.skia.org` (Loop until
'Success')
* `perf.skia.org` -- `Status (JSON)` --> `Client`
* `Client` -- `GET /_/frame/results/{ID}` --> `perf.skia.org`
* `perf.skia.org` -- `Performance Data (JSON)` --> `Client`
* `Client (Python)`: Parse JSON -> Create Pandas DataFrame -> Return
DataFrame to user.
- **`/jupyter/README.md`**: This file provides instructions on setting up the
necessary Python environment to run Jupyter Notebooks and the required
libraries (Pandas, SciPy, Matplotlib).
- **Why**: Python environment management can be tricky, especially with
system-wide installations. Using a virtual environment (`virtualenv`) is
recommended to isolate project dependencies and avoid conflicts.
- **How**: It guides the user through installing `pip`, `python-dev`, and
`python-virtualenv` using `apt-get` (assuming a Debian-based Linux
system). It then shows how to create a virtual environment, activate it,
upgrade `pip`, and install `jupyter`, `notebook`, `scipy`, `pandas`, and
`matplotlib` within that isolated environment. Finally, it explains how
to run the Jupyter Notebook server and deactivate the environment when
done. This ensures a reproducible and clean setup for users wanting to
utilize the `Perf+Query.ipynb` notebook.
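To make the `perf_impl` workflow above concrete, the frame-request protocol it
wraps might be sketched as follows in TypeScript (the notebook itself does this
in Python; the response field names below are assumptions, not the documented
API):

```ts
// Rough sketch of the /_/frame/start -> status -> results polling protocol.
// Endpoint paths come from the workflow above; payload field names ("id",
// "status") are assumptions.
async function frameRequest(body: unknown): Promise<unknown> {
  // 1. Kick off the request.
  const start = await fetch('/_/frame/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  const { id } = (await start.json()) as { id: string }; // assumed field name

  // 2. Poll until the server reports success.
  for (;;) {
    const statusResp = await fetch(`/_/frame/status/${id}`);
    const { status } = (await statusResp.json()) as { status: string }; // assumed
    if (status === 'Success') break;
    await new Promise((resolve) => setTimeout(resolve, 500)); // small poll delay
  }

  // 3. Fetch the finished results. perf_impl then builds a Pandas DataFrame
  //    from this JSON, converting the 1e32 sentinel to NaN.
  const results = await fetch(`/_/frame/results/${id}`);
  return results.json();
}
```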
The design emphasizes ease of use for data analysts and developers who need to
interact with Skia's performance data. By leveraging Jupyter Notebooks, it
provides an interactive and visual way to explore performance trends and issues.
The abstraction of API calls into simple Python functions (`perf_calc`,
`perf_query`) significantly lowers the barrier to entry for accessing this rich
dataset.
# Module: /lint
The `/lint` module is responsible for ensuring code quality and consistency
within the project by integrating and configuring JSHint, a popular JavaScript
linting tool.
The primary goal of this module is to provide a standardized way to identify and
report potential errors, stylistic issues, and anti-patterns in the JavaScript
codebase. This helps maintain code readability, reduces the likelihood of bugs,
and promotes adherence to established coding conventions.
The core component of this module is the `reporter.js` file. This file defines a
custom reporter function that JSHint will use to format and output the linting
results.
The decision to implement a custom reporter stems from the need to present
linting errors in a clear, concise, and actionable format. Instead of relying on
JSHint's default output, which might be too verbose or not ideally suited for
the project's workflow, `reporter.js` provides a tailored presentation.
The `reporter` function within `reporter.js` takes an array of error objects
(`res`) as input, where each object represents a single linting issue found by
JSHint. It then iterates through these error objects and constructs a formatted
string for each error. The format chosen is `filename:line:character message`,
which directly points developers to the exact location of the issue in the
source code.
For example: `src/myFile.js:10:5 Missing semicolon`
This specific format is chosen for its commonality in development tools and its
ease of integration with various editors and IDEs, allowing developers to
quickly navigate to the reported errors.
After processing all errors, if any were found, the `reporter` function
aggregates the formatted error strings and prints them to the standard output
(`process.stdout.write`). Additionally, it appends a summary line indicating the
total number of errors found, ensuring that developers have a quick overview of
the linting status. The pluralization of "error" vs. "errors" is also handled
for grammatical correctness.
The workflow can be visualized as:
```
JSHint analysis --[error objects]--> reporter.js --[formatted errors & summary]--> stdout
```
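A custom reporter with the behavior described above might look roughly like the
following sketch (the shape of the objects JSHint passes to a reporter is shown
here as understood and should be treated as an approximation of the real
`reporter.js`):

```ts
// Sketch of a JSHint custom reporter: print "filename:line:character message"
// for each issue, then a pluralized summary line, to stdout.
interface JSHintResult {
  file: string;
  error: { line: number; character: number; reason: string };
}

export function reporter(results: JSHintResult[]): void {
  if (results.length === 0) {
    return; // nothing to report
  }
  const lines = results.map(
    (r) => `${r.file}:${r.error.line}:${r.error.character} ${r.error.reason}`
  );
  const n = results.length;
  lines.push(`\n${n} error${n === 1 ? '' : 's'}`);
  process.stdout.write(lines.join('\n') + '\n');
}
```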
By controlling the output format, this module ensures that linting feedback is
consistently presented and easily digestible, contributing to a more efficient
development process. The design prioritizes providing actionable information to
developers, enabling them to address code quality issues promptly.
# Module: /migrations
This module is responsible for managing SQL database schema migrations for Perf.
Perf utilizes SQL backends to store various data, including trace data,
shortcuts, and alerts. As the application evolves, the database schema may need
to change. This module provides the mechanism to apply these changes and to
upgrade existing databases to the schema expected by the current Perf version.
The core of this system relies on the `github.com/golang-migrate/migrate/v4`
library. This library provides a robust framework for versioning database
schemas and applying migrations in a controlled manner.
The key design principle is to have a versioned set of SQL scripts for each
supported SQL dialect. This allows Perf to:
1. **Initialize a new database** with the correct schema.
2. **Upgrade an existing database** from an older schema version to the current
one.
3. **Rollback schema changes** if necessary, by providing "down" migrations.
Each SQL dialect (e.g., CockroachDB) has its own subdirectory within the
`/migrations` module. The naming convention for these directories is critical:
they must match the values defined in `sql.Dialect`.
Inside each dialect-specific directory, migration files are organized by
version.
- File names are prefixed with a 0-padded version number (e.g., `0001_`,
`0002_`).
- For each version, there are two files:
- An `.up.` file (e.g., `0001_create_initial_tables.up.sql`): Contains SQL
statements to apply the schema changes for that version.
- A `.down.` file (e.g., `0001_create_initial_tables.down.sql`): Contains
SQL statements to revert the schema changes introduced by the
corresponding `.up.` file.
This paired approach ensures that migrations can be applied and rolled back
smoothly.
**Key Files and Responsibilities:**
- `README.md`: Provides a high-level overview of the migration system,
explaining its purpose and the use of the `golang-migrate/migrate` library.
It also details the directory structure and file naming conventions for
migration scripts.
- `cockroachdb/`: This directory contains the migration scripts specifically
for the CockroachDB dialect.
- `cockroachdb/0001_create_initial_tables.up.sql`: This is the first
migration script for CockroachDB. It defines the initial schema for
Perf, creating tables such as `TraceValues`, `SourceFiles`, `ParamSets`,
`Postings`, `Shortcuts`, `Alerts`, `Regressions`, and `Commits`. The
table definitions include primary keys, indexes, and column types
tailored for efficient data storage and retrieval specific to Perf's
needs (e.g., storing trace data, associating traces with source files,
managing alert configurations, and tracking commit history). The schema
is designed to support the various functionalities of Perf, such as
querying traces by parameters, retrieving trace values over commit
ranges, and linking regressions to specific alerts and commits.
- `cockroachdb/0001_create_initial_tables.down.sql`: This file is intended
to contain SQL statements to drop the tables created by its
corresponding `.up.` script. However, as a safety precaution against
accidental data loss, it is currently empty. The design acknowledges the
potential danger of automated table drops in a production environment.
- `cdb.sql`: This is a utility SQL script designed for developers to interact
with and test queries against a CockroachDB instance populated with Perf
data. It includes sample `INSERT` statements to populate tables with test
data and various `SELECT` queries demonstrating common data retrieval
patterns used by Perf. This file is not part of the automated migration
process but serves as a helpful tool for development and debugging. It
showcases how to query for traces based on parameters, retrieve trace
values, find the most recent tile, and get source file information. It also
includes examples of more complex queries involving `INTERSECT` and `JOIN`
operations, reflecting the kinds of queries Perf might execute.
- `test.sql`: Similar to `cdb.sql`, this script is for testing and
experimentation, but it's tailored for a SQLite database. It creates a
schema similar to the CockroachDB one (though potentially simplified or with
slight variations due to dialect differences) and populates it with test
data. It contains a series of `CREATE TABLE`, `INSERT`, and `SELECT`
statements that developers can use to quickly set up a local test
environment and verify SQL logic.
- `batch-delete.sh` and `batch-delete.sql`: These files provide a mechanism
for performing batch deletions of specific parameter data from the
`ParamSets` table in a CockroachDB instance.
- `batch-delete.sql`: Contains the `DELETE` SQL statement. It is designed
to be edited directly to specify the deletion criteria (e.g.,
`tile_number`, `param_key`, `param_value` ranges) and the `LIMIT` for
the number of rows deleted in each batch. This batching approach is
crucial for deleting large amounts of data without overwhelming the
database or causing long-running transactions.
- `batch-delete.sh`: A shell script that repeatedly executes
`batch-delete.sql` using the `cockroach sql` command-line tool. It runs
in a loop with a short sleep interval, allowing for controlled,
iterative deletion. This script assumes that a port-forward to the
CockroachDB instance is already established. This utility is likely used
for data cleanup or maintenance tasks that require removing specific,
potentially large, datasets.
**Migration Workflow (Conceptual):**
When Perf starts or when a migration command is explicitly run:
1. **Determine Current Schema Version:** The `golang-migrate/migrate` library
connects to the database and checks the current schema version (often stored
in a dedicated migrations table managed by the library itself).
2. **Identify Target Schema Version:** This is typically the highest version
number found among the migration files for the configured SQL dialect.
3. **Apply Pending Migrations:**
- If the current schema version is lower than the target version, the
library iteratively executes the `.up.sql` files in ascending order of
their version numbers, starting from the version immediately following
the current one, up to the target version.
- Each successful `.up.` migration updates the schema version in the
database.
Example: Current Version = 0, Target Version = 2: `DB State (v0) --> Run
0001_*.up.sql --> DB State (v1) --> Run 0002_*.up.sql --> DB State (v2)`
4. **Rollback Migrations (if needed):**
- If a user needs to revert to an older schema version, the library can
execute the `.down.sql` files in descending order.
Example: Current Version = 2, Target Rollback Version = 0: `DB State (v2) -->
Run 0002_*.down.sql --> DB State (v1) --> Run 0001_*.down.sql --> DB State
(v0)`
The `BUILD.bazel` file defines a `filegroup` named `cockroachdb` which bundles
all files under the `cockroachdb/` subdirectory. This is likely used by other
parts of the Perf build system, perhaps to package these migration scripts or
make them accessible to the Perf application when it needs to perform
migrations.
# Module: /modules
## Modules Documentation
### Overview
The `modules` directory contains a collection of frontend TypeScript modules
that constitute the building blocks of the Perf web application's user
interface. These modules primarily define custom HTML elements (web components)
and utility functions for various UI functionalities, data processing, and
interaction with backend services. The architecture emphasizes modularity,
reusability, and a component-based approach, largely leveraging the Lit library
for creating custom elements and `elements-sk` for common UI widgets.
The design philosophy encourages separation of concerns:
- **UI Components:** Dedicated custom elements encapsulate specific UI
features like plotting, alert configuration, data tables, dialogs, and input
controls.
- **Data Handling:** Modules like `dataframe` and `progress` manage data
fetching, processing, and state.
- **Utilities:** Modules like `paramtools`, `pivotutil`, `cid`, and `trybot`
provide common functionalities for data manipulation, key parsing, and
specific calculations.
- **Styling and Theming:** A centralized `themes` module ensures a consistent
visual appearance, building upon `infra-sk`'s theming capabilities.
- **JSON Contracts:** The `json` module defines TypeScript interfaces that
mirror backend Go structures, ensuring type safety in client-server
communication.
This modular structure aims to create a maintainable and scalable frontend
codebase. Each module typically includes its core logic, associated styles, demo
pages for isolated development and testing, and unit/integration tests.
### Key Responsibilities and Components
A significant portion of the modules is dedicated to creating custom HTML
elements that serve as interactive UI components. These elements often
encapsulate complex behavior and interactions, simplifying their use in
higher-level page components.
**Data Visualization and Interaction:**
- `plot-simple-sk`: A custom-built canvas-based plotting element for rendering
interactive line graphs, optimized for performance with features like dual
canvases, Path2D objects, and k-d trees for point proximity.
- `plot-google-chart-sk`: An alternative plotting element that wraps the
Google Charts library, offering a rich set of features and interactivity
like panning, zooming, and trace visibility toggling.
- `plot-summary-sk`: Displays a summary plot (often using Google Charts) and
allows users to select a range, which is useful for overview and drill-down
scenarios.
- `chart-tooltip-sk`: Provides a detailed, interactive tooltip for data points
on charts, showing commit information, anomaly details, and actions like
bisection or requesting traces.
- `graph-title-sk`: Displays a structured title for graphs, showing key-value
parameter pairs associated with the plotted data.
- `word-cloud-sk`: Visualizes key-value pairs and their frequencies as a
textual list with proportional bars.
**Alert and Regression Management:**
- `alert-config-sk`: A UI for creating and editing alert configurations,
including query definition, detection algorithms, and notification settings.
- `alerts-page-sk`: A page for viewing, creating, and managing all alert
configurations.
- `cluster-summary2-sk`: Displays a detailed summary of a performance cluster,
including a plot, statistics, and triage controls.
- `anomalies-table-sk`: Renders a sortable and interactive table of detected
performance anomalies, allowing for grouping and bulk actions like triage
and graphing.
- `anomaly-sk`: Displays detailed information about a single performance
anomaly.
- `triage-status-sk`: A simple button-like element indicating the current
triage status of a cluster and allowing users to initiate the triage
process.
- `triage-menu-sk`: Provides a menu for bulk triage actions on selected
anomalies, including assigning bugs or marking them as ignored.
- `new-bug-dialog-sk`: A dialog for filing new bugs related to anomalies,
pre-filling details.
- `existing-bug-dialog-sk`: A dialog for associating anomalies with existing
bug reports.
- `user-issue-sk`: Manages the association of user-reported Buganizer issues
with specific data points.
- `bisect-dialog-sk`: A dialog for initiating a Pinpoint bisection process to
find the commit causing a regression.
- `pinpoint-try-job-dialog-sk`: A (legacy) dialog for initiating Pinpoint A/B
try jobs to request additional traces.
- `triage-page-sk`: A page dedicated to viewing and triaging regressions based
on time range and filters.
- `regressions-page-sk`: A page for viewing regressions associated with
specific "subscriptions" (e.g., sheriff configs).
- `subscription-table-sk`: Displays details of a subscription and its
associated alerts.
- `revision-info-sk`: Displays information about anomalies detected around a
specific revision.
**Data Input and Selection:**
- `query-sk`: A comprehensive UI for constructing complex queries by selecting
parameters and their values.
- `paramset-sk`: Displays a set of parameters and their values, often used to
summarize a query or data selection.
- `query-chooser-sk`: Combines `paramset-sk` (for summary) and `query-sk` (in
a dialog) for a compact query selection experience.
- `query-count-sk`: Shows the number of items matching a given query, fetching
this count from a backend endpoint.
- `commit-detail-picker-sk`: Allows users to select a specific commit from a
range, typically presented in a dialog with date range filtering.
- `commit-detail-panel-sk`: Displays a list of commit details, making them
selectable.
- `commit-detail-sk`: Displays information about a single commit with action
buttons.
- `calendar-input-sk`: A date input field combined with a calendar picker
dialog.
- `calendar-sk`: A standalone interactive calendar widget.
- `day-range-sk`: Allows selection of a "begin" and "end" date.
- `domain-picker-sk`: Allows selection of a data domain either by date range
or by a number of recent commits.
- `test-picker-sk`: A guided, multi-step picker for selecting tests or traces
by sequentially choosing parameter values.
- `picker-field-sk`: A text input field with a filterable dropdown menu of
predefined options, built using Vaadin ComboBox.
- `algo-select-sk`: A dropdown for selecting a clustering algorithm.
- `split-chart-menu-sk`: A menu for selecting an attribute by which to split a
chart.
- `pivot-query-sk`: A UI for configuring pivot table requests (group by,
operations, summaries).
- `triage2-sk`: A set of three buttons for selecting a triage status
(positive, negative, untriaged).
- `tricon2-sk`: An icon that visually represents one of the three triage
states.
**Data Display and Structure:**
- `pivot-table-sk`: Displays pivoted DataFrame data in a sortable table.
- `json-source-sk`: A dialog for viewing the raw JSON source data for a
specific trace point.
- `ingest-file-links-sk`: Displays relevant links (e.g., to Swarming,
Perfetto) associated with an ingested data point.
- `point-links-sk`: Displays links from ingestion files and generates commit
range links between data points.
- `commit-range-sk`: Dynamically generates a URL to a commit range viewer
based on begin and end commits.
**Scaffolding and Application Structure:**
- `perf-scaffold-sk`: Provides the consistent layout, header, and navigation
sidebar for all Perf application pages.
- `explore-simple-sk`: The core element for exploring and visualizing
performance data, including querying, plotting, and anomaly interaction.
- `explore-sk`: Wraps `explore-simple-sk`, adding features like user
authentication, default configurations, and optional integration with
`test-picker-sk`.
- `explore-multi-sk`: Allows displaying and managing multiple
`explore-simple-sk` graphs simultaneously, with shared controls and shortcut
management.
- `favorites-dialog-sk`: A dialog for adding or editing bookmarked "favorites"
(named URLs).
- `favorites-sk`: Displays and manages a user's list of favorites.
**Backend Interaction and Data Processing Utilities:**
- `cid/cid.ts`: Provides `lookupCids` to fetch detailed commit information
based on commit numbers.
- `common/plot-builder.ts` & `common/plot-util.ts`: Utilities for transforming
`DataFrame` and `TraceSet` data into formats suitable for plotting libraries
(especially Google Charts) and for creating consistent chart options.
- `common/test-util.ts`: Sets up mocked API responses (`fetch-mock`) for
various backend endpoints, facilitating isolated testing and demo page
development.
- `const/const.ts`: Defines shared constants, notably `MISSING_DATA_SENTINEL`
for representing missing data points, ensuring consistency with the backend.
- `csv/index.ts`: Converts `DataFrame` objects into CSV format for data
export.
- `dataframe/index.ts` & `dataframe/dataframe_context.ts`: Core logic for
managing and manipulating `DataFrame` objects. `DataFrameRepository` (a
LitElement context provider) handles fetching, caching, merging, and
providing `DataFrame` and `DataTable` objects to consuming components.
- `dataframe/traceset.ts`: Utilities for extracting and formatting information
from trace keys within DataFrames/DataTables, such as generating chart
titles and legends.
- `errorMessage/index.ts`: A wrapper around `elements-sk`'s `errorMessage` to
display persistent error messages by default.
- `json/index.ts`: Contains TypeScript interfaces and types that define the
structure of JSON data exchanged with the backend, crucial for type safety
and often auto-generated from Go structs.
- `paramtools/index.ts`: Client-side utilities for creating, parsing, and
manipulating `ParamSet` objects and structured trace keys (e.g., `makeKey`,
`fromKey`, `queryFromKey`).
- `pivotutil/index.ts`: Utilities for validating pivot table requests
(`pivot.Request`) and providing descriptions for pivot operations.
- `progress/progress.ts`: Implements `startRequest` for initiating and polling
the status of long-running server-side tasks, providing progress updates to
the UI.
- `trace-details-formatter/traceformatter.ts`: Provides `TraceFormatter`
implementations (default and Chrome-specific) for converting trace parameter
sets to display strings and vice-versa for querying.
- `trybot/calcs.ts`: Calculates and aggregates `stddevRatio` values from Perf
trybot results, grouping them by parameter to identify performance impacts.
- `trybot-page-sk`: A page for analyzing performance regressions based on
commit or trybot run, using `trybot/calcs` for analysis.
- `window/index.ts`: Utilities related to the browser `window` object,
including parsing build tag information from `window.perf.image_tag`.
**Core Architectural Patterns:**
- **Custom Elements (Web Components):** The UI is primarily built using custom
elements, promoting encapsulation, reusability, and interoperability. Most
elements extend `ElementSk` from `infra-sk`.
- **Lit Library:** Widely used for defining custom elements, providing
efficient templating (`lit-html`) and reactive updates.
- **State Management:**
- Local component state is managed within the elements themselves.
- `stateReflector` (from `infra-sk`) is frequently used to synchronize
component state with URL query parameters, enabling bookmarking and
shareable views (e.g., `alerts-page-sk`, `explore-simple-sk`,
`triage-page-sk`).
- Lit contexts (`@lit/context`) are used for providing shared data down
the component tree without prop drilling, notably in
`dataframe/dataframe_context.ts` for `DataFrame` objects.
- **Event-Driven Communication:** Components often communicate using custom
DOM events. Child components emit events, and parent components listen and
react to them (e.g., `query-sk` emits `query-change`, `triage-status-sk`
emits `start-triage`).
- **Asynchronous Operations:** `fetch` API is used for backend communication.
Promises and `async/await` are standard for handling these asynchronous
operations. Spinners (`spinner-sk`) provide user feedback during loading.
- **Modularity and Dependencies:** Modules are designed to be relatively
self-contained, with clear dependencies declared in `BUILD.bazel` files.
This allows for better organization and easier maintenance.
- **Testing:** Each module typically has associated demo pages (`*-demo.html`,
`*-demo.ts`) for isolated development and visual testing, Karma unit tests
(`*_test.ts`), and Puppeteer end-to-end/screenshot tests
(`*_puppeteer_test.ts`). `fetch-mock` is extensively used in demos and tests
to simulate backend responses.
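As a small illustration of the event-driven pattern described above (the
element name and event detail shape here are illustrative only, not any
specific module's API):

```ts
// A child element dispatches a bubbling CustomEvent; a parent listens and
// reacts. Names below are illustrative, not real Perf modules.
class QueryChildExample extends HTMLElement {
  setQuery(q: string): void {
    this.dispatchEvent(
      new CustomEvent<{ q: string }>('query-change', { detail: { q }, bubbles: true })
    );
  }
}
customElements.define('query-child-example', QueryChildExample);

const child = document.createElement('query-child-example') as QueryChildExample;
document.body.appendChild(child);

// The parent (here simply document.body) reacts to the event.
document.body.addEventListener('query-change', (e: Event) => {
  console.log('new query:', (e as CustomEvent<{ q: string }>).detail.q);
});

child.setQuery('config=8888,arch=x86');
```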
This comprehensive set of modules forms a rich ecosystem for building and
maintaining the Perf application's frontend, with a strong emphasis on modern
web development practices and reusability.
# Module: /modules/alert
## Alert Module Documentation
### Overview
The `alert` module is responsible for validating the configuration of alerts
within the Perf system. Its primary function is to ensure that alert definitions
adhere to a set of predefined rules, guaranteeing their proper functioning and
preventing errors. This module plays a crucial role in maintaining the
reliability of the alerting system by catching invalid configurations before
they are deployed.
### Design Decisions and Implementation Choices
The core design principle behind this module is simplicity and focused
responsibility. Instead of incorporating complex validation logic directly into
other parts of the system (like the UI or backend services that handle alert
creation/modification), this module provides a dedicated, reusable validation
function. This promotes modularity and makes the validation logic easier to
maintain and update.
The choice of using a simple function (`validate`) that returns a string (empty
for valid, error message for invalid) is intentional. This approach is
straightforward to understand and integrate into various parts of the
application. It avoids throwing exceptions for validation failures, which can
sometimes complicate control flow, and instead provides clear, human-readable
feedback.
The current validation is intentionally minimal, focusing on the essential
requirement of a non-empty query. This is a pragmatic approach, starting with
the most critical validation and allowing for the addition of more complex rules
as the system evolves. The dependency on `//perf/modules/json:index_ts_lib`
indicates that the structure of an `Alert` is defined externally, and this
module consumes that definition.
### Key Components and Responsibilities
- **`index.ts`**: This is the central file of the module.
- **Responsibility**: It houses the primary validation logic for `Alert`
configurations.
- **`validate(alert: Alert): string` function**:
- **Purpose**: This function is the public API of the module. It takes an
`Alert` object (as defined in the `../json` module) as input.
- **How it works**: It performs a series of checks on the properties of
the `alert` object. Currently, it verifies that the `query` property of
the `Alert` is present and not an empty string.
- **Output**: If all checks pass, it returns an empty string, signifying
that the `Alert` configuration is valid. If any check fails, it returns
a string containing a descriptive error message indicating why the
`Alert` is considered invalid. This message is intended to be
user-friendly and help in correcting the configuration.
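Given the behavior described above, a minimal sketch of such a `validate`
function (not the actual implementation in `index.ts`) could look like:

```ts
import { Alert } from '../json';

// Sketch of the validation described above: an empty string means the Alert
// is valid; otherwise the string is a human-readable error message.
export function validate(alert: Alert): string {
  if (!alert.query || alert.query.trim() === '') {
    return 'An alert must have a non-empty query.';
  }
  return '';
}
```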
### Key Workflows
**Alert Validation Workflow:**
```
External System (e.g., UI, API) -- Passes Alert object --> [alert/index.ts: validate()]
|
V
[ Is alert.query non-empty? ]
|
+--------------------------+--------------------------+
| (Yes) | (No)
V V
[ Returns "" (empty string) ] [ Returns "An alert must have a non-empty query." ]
| |
V V
External System <-- Receives validation result -- [ Interprets result (valid/invalid) ]
```
This workflow illustrates how an external system would interact with the
`validate` function. The external system provides an `Alert` object, and the
`validate` function returns a string. The external system then uses this string
to determine if the alert configuration is valid and can proceed accordingly
(e.g., save the alert, display an error to the user).
# Module: /modules/alert-config-sk
The `alert-config-sk` module provides a custom HTML element,
`<alert-config-sk>`, designed for creating and editing alert configurations
within the Perf application. This element serves as a user interface for
defining the conditions under which an alert should be triggered, how
regressions are detected, and where notifications should be sent.
**Core Functionality and Design:**
The primary goal of `alert-config-sk` is to offer a comprehensive yet
user-friendly way to manage alert settings. It encapsulates all the necessary
input fields and logic for defining an `Alert` object, which is a central data
structure in Perf for representing alert configurations.
Key design considerations include:
- **Modularity and Reusability:** By packaging the alert configuration UI as a
custom element, it can be easily integrated into various parts of the Perf
application where alert management is needed.
- **Dynamic UI based on Context:** The UI adapts based on global settings
(e.g., `window.perf.notifications`, `window.perf.display_group_by`,
`window.perf.need_alert_action`). This allows the same component to present
different options depending on the specific Perf instance's configuration or
the user's context. For example, the notification options (email vs. issue
tracker) and the visibility of "Group By" settings can change.
- **Data Binding and Reactivity:** The element uses Lit library for templating
and reactivity. Changes in the input fields directly update the internal
`_config` object, and changes to the element's properties (like `config`,
`paramset`) trigger re-renders.
- **Integration with other Perf modules:** It leverages other custom elements
like `query-chooser-sk` for selecting traces, `algo-select-sk` for choosing
clustering algorithms, and various `elements-sk` components (e.g.,
`select-sk`, `multi-select-sk`, `checkbox-sk`) for standard UI inputs. This
promotes consistency and reduces redundant code.
- **User Feedback and Validation:** The component provides immediate feedback,
such as displaying different threshold units based on the selected step
detection algorithm and validating input for fields like the Issue Tracker
Component ID. It also includes "Test" buttons to verify alert notification
and bug template configurations.
**Key Components and Files:**
- **`alert-config-sk.ts`:** This is the heart of the module, defining the
`AlertConfigSk` class which extends `ElementSk`.
- **Properties:**
- `config`: An `Alert` object representing the current alert configuration
being edited. This is the primary data model for the component.
- `paramset`: A `ParamSet` object providing the available parameters and
their values for constructing queries (used by `query-chooser-sk`).
- `key_order`: An array of strings dictating the preferred order of keys
in the `query-chooser-sk`.
- **Templating (`template` static method):** Uses `lit-html` to define the
structure and content of the element. It dynamically renders sections
based on the current configuration and global settings (e.g.,
`window.perf.notifications`).
- **Event Handling:** Listens to events from child components (e.g.,
`query-change` from `query-chooser-sk`, `selection-changed` from
`select-sk`) to update the `_config` object.
- **Logic for Dynamic UI:**
- The `thresholdDescriptors` object maps step detection algorithms to
their corresponding units and descriptive labels, ensuring the
"Threshold" input field is always relevant.
- Conditional rendering (e.g., using `?` operator in lit-html or `if`
statements in helper functions like `_groupBy`) is used to show/hide UI
elements based on `window.perf` flags.
- **API Interaction:**
- `testBugTemplate()`: Sends a `POST` request to `/_/alert/bug/try` to
test the configured bug URI template.
- `testAlert()`: Sends a `POST` request to `/_/alert/notify/try` to test
the alert notification setup.
- **Helper Functions:**
- `toDirection()`, `toConfigState()`: Convert string values from UI
selections to the appropriate enum types for the `Alert` object.
- `indexFromStep()`: Determines the correct selection index for the "Step
Detection" dropdown based on the current `_config.step` value.
- **`alert-config-sk.scss`:** Contains the SASS styles for the element,
ensuring a consistent look and feel within the Perf application. It imports
styles from `themes_sass_lib` and `buttons_sass_lib` for theming and button
styling.
- **`alert-config-sk-demo.html` and `alert-config-sk-demo.ts`:** Provide a
demonstration page for the `alert-config-sk` element.
- The HTML sets up a basic page structure with an instance of
`alert-config-sk` and buttons to manipulate global `window.perf`
settings, allowing developers to test different UI states of the
component.
- The TypeScript file initializes the demo, sets up mock `paramset` and
`config` data, and provides event listeners for the control buttons to
refresh the `alert-config-sk` component and display its current state.
This is crucial for development and testing.
- **`alert-config-sk_puppeteer_test.ts`:** Contains Puppeteer tests for the
component. These tests verify that the component renders correctly in
different states (e.g., with/without group_by, different notification
options) by interacting with the demo page and taking screenshots.
- **`index.ts`:** A simple entry point that imports and thereby registers the
`alert-config-sk` custom element, making it available for use in HTML.
**Workflow Example: Editing an Alert**
1. **Initialization:**
- An instance of `alert-config-sk` is added to the DOM.
- The `paramset` property is set, providing the available trace
parameters.
- The `config` property is set with the `Alert` object to be edited (or a
default new configuration).
- Global `window.perf` settings influence which UI sections are initially
visible.
2. **User Interaction:**
- The user modifies various fields: Display Name, Category, Query (via
`query-chooser-sk`), Grouping (via `algo-select-sk`), Step Detection,
Threshold, etc.
- As the user changes a field (e.g., selects a new "Step Detection"
algorithm from the `select-sk`):
- An event is dispatched by the child component (e.g.,
`selection-changed`).
- `alert-config-sk` listens for this event.
- The event handler in `alert-config-sk.ts` updates the corresponding
property in its internal `_config` object (e.g.,
`this._config.step = newStepValue`).
- The component re-renders (managed by Lit) to reflect the change. For
instance, if the "Step Detection" changes, the "Threshold" label and
units dynamically update.
```
User interacts with <select-sk id="step">
|
V
<select-sk> emits 'selection-changed' event
|
V
AlertConfigSk.stepSelectionChanged(event) is called
|
V
this._config.step is updated
|
V
this._render() is (indirectly) called by Lit
|
V
UI updates, e.g., label for "Threshold" input changes
```
3. **Testing Configuration (Optional):**
- User clicks "Test" for bug template:
- `AlertConfigSk.testBugTemplate()` is called.
- A POST request is made to `/_/alert/bug/try`.
- The response (a URL to the bug) is opened in a new tab, or an error
is shown.
- User clicks "Test" for alert notification:
- `AlertConfigSk.testAlert()` is called.
- A POST request is made to `/_/alert/notify/try`.
- A success/error message is displayed.
4. **Saving Changes:**
- The parent component or application logic that hosts `alert-config-sk`
is responsible for retrieving the updated `config` object from the
`alert-config-sk` element (e.g., `element.config`) and persisting it
(e.g., by sending it to a backend API). `alert-config-sk` itself does
not handle the saving of the configuration to a persistent store.
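The event handling described in step 2 above might be sketched roughly as
follows; the class, method, and option names here are illustrative assumptions
rather than the real `alert-config-sk` code:

```ts
// Sketch: a child <select-sk> fires 'selection-changed', the handler writes
// the choice into the in-memory config, and the element re-renders.
interface SelectionChangedDetail {
  selection: number; // index of the newly selected option inside <select-sk>
}

class AlertConfigSketch {
  // Hypothetical ordering of the step-detection choices in the dropdown.
  private readonly stepChoices = ['original', 'absolute', 'percent'];

  private config = { step: 'original' };

  constructor(host: HTMLElement) {
    host.addEventListener('selection-changed', (e: Event) => {
      const detail = (e as CustomEvent<SelectionChangedDetail>).detail;
      this.config.step = this.stepChoices[detail.selection];
      this.render();
    });
  }

  private render(): void {
    // The real element re-renders its Lit template here, which updates e.g.
    // the Threshold label/units for the chosen step detection.
    console.log('config is now', this.config);
  }
}
```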
This element aims to simplify the complex task of configuring alerts by
providing a structured and reactive interface, abstracting away the direct
manipulation of the underlying `Alert` JSON object for the end-user.
# Module: /modules/alerts-page-sk
## alerts-page-sk Module Documentation
### High-Level Overview
The `alerts-page-sk` module provides a user interface for managing and
configuring alerts within the Perf application. Users can view, create, edit,
and delete alert configurations. The page displays existing alerts in a table
and provides a dialog for detailed configuration of individual alerts. It
interacts with a backend API to fetch and persist alert data.
### Design Decisions and Implementation Choices
**Why a dedicated page for alerts?** Centralizing alert management provides a
clear and focused interface for users responsible for monitoring performance
metrics. This separation of concerns simplifies the overall application
structure and user experience.
**How are alerts displayed and managed?** Alerts are displayed in a tabular
format, offering a quick overview of key information like name, query, owner,
and status. Icons are used for common actions like editing and deleting,
enhancing usability. A modal dialog, utilizing the `<dialog>` HTML element and
the `alert-config-sk` component, is employed for focused editing of individual
alert configurations. This approach avoids cluttering the main page and provides
a dedicated space for detailed settings.
**Why use Lit for templating?** Lit is used for its efficient rendering and
component-based architecture. This allows for a declarative way to define the UI
and manage its state, making the code more maintainable and easier to
understand. The use of `html` tagged template literals provides a clean and
JavaScript-native way to write templates.
**How is user authorization handled?** The page checks if the logged-in user has
an 'editor' role. This is determined by fetching the user's status from
`/_/login/status`. Editing and creation functionalities are disabled if the user
lacks the necessary permissions, preventing unauthorized modifications. The
logged-in user's email is also pre-filled as the owner for new alerts.
**Why is `fetch-mock` used in the demo?** `fetch-mock` is utilized in the demo
(`alerts-page-sk-demo.ts`) to simulate backend API responses. This allows for
isolated testing and development of the frontend component without requiring a
running backend. It enables developers to define expected responses for various
API endpoints, facilitating a predictable environment for UI development and
testing.
**How are API interactions handled?** The component uses the `fetch` API to
communicate with the backend. Helper functions like `jsonOrThrow` and
`okOrThrow` are used to simplify response handling and error management.
Specific endpoints are used for listing (`/_/alert/list/...`), creating
(`/_/alert/new`), updating (`/_/alert/update`), and deleting
(`/_/alert/delete/...`) alerts.
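A sketch of this fetch pattern for updating an alert and then refreshing the
list (the helper import path and the exact response types are assumptions):

```ts
import { jsonOrThrow } from '../../../infra-sk/modules/jsonOrThrow'; // path is an assumption
import { Alert } from '../json';

// Persist an edited Alert, then re-fetch the list of configurations.
async function updateAlertAndRefresh(cfg: Alert, showDeleted: boolean): Promise<Alert[]> {
  const resp = await fetch('/_/alert/update', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(cfg),
  });
  if (!resp.ok) {
    throw new Error(`update failed: ${resp.status}`);
  }
  // The list endpoint takes a boolean path segment for including deleted configs.
  return fetch(`/_/alert/list/${showDeleted}`).then(jsonOrThrow);
}
```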
**Why distinguish between "Alert" and "Component" in the UI?** The UI adapts to
display either an "Alert" field or an "Issue Tracker Component" field based on
the `window.perf.notifications` global setting. This allows the application to
integrate with different notification systems. If `markdown_issuetracker` is
configured, it links directly to the relevant issue tracker component.
### Responsibilities and Key Components/Files
- **`alerts-page-sk.ts`**: This is the core TypeScript file defining the
`AlertsPageSk` custom element.
- **Responsibilities**:
- Fetching and displaying a list of alert configurations.
- Providing functionality to create new alerts.
- Enabling editing of existing alerts through a modal dialog.
- Allowing deletion of alerts.
- Handling user authorization for edit/create operations.
- Managing the state of the "show deleted alerts" checkbox.
- Interacting with the backend API for all alert-related operations.
- Rendering the UI using Lit templates.
- **Key Methods**:
- `connectedCallback()`: Initializes the component by fetching initial
data (paramset and alert list).
- `list()`: Fetches and re-renders the list of alerts.
- `add()`: Initiates the creation of a new alert by fetching a default
configuration from the server and opening the edit dialog.
- `edit()`: Opens the edit dialog for an existing alert.
- `accept()`: Handles the submission of changes from the edit dialog,
sending an update request to the server.
- `delete()`: Sends a request to the server to delete an alert.
- `openOnLoad()`: Checks the URL for an alert ID on page load and, if
present, opens the edit dialog for that specific alert. This allows for
direct linking to an alert's configuration.
- **Key Properties**:
- `alerts`: An array holding the currently displayed alert configurations.
- `_cfg`: The `Alert` object currently being edited in the dialog.
- `isEditor`: A boolean indicating if the current user has editing
privileges.
- `dialog`: A reference to the HTML `<dialog>` element used for editing.
- `alertconfig`: A reference to the `alert-config-sk` element within the
dialog.
- **`alerts-page-sk.scss`**: Contains the SASS/CSS styles for the
`alerts-page-sk` element.
- **Responsibilities**: Defines the visual appearance of the alerts table,
buttons, dialog, and other UI elements within the page. It ensures a
consistent look and feel, including theming (dark mode).
- **`alerts-page-sk-demo.ts`**: Provides a demonstration and development
environment for the `alerts-page-sk` component.
- **Responsibilities**:
- Sets up `fetch-mock` to simulate backend API responses for
`/_/login/status`, `/_/count/`, `/_/alert/update`, `/_/alert/list/...`,
`/_/initpage/`, and `/_/alert/new`. This allows the component to be
developed and tested in isolation.
- Initializes global `window.perf` properties that might affect the
component's behavior (e.g., `key_order`, `display_group_by`,
`notifications`).
- Dynamically inserts `alerts-page-sk` elements into the demo HTML page.
- **`alerts-page-sk-demo.html`**: The HTML structure for the demo page.
- **Responsibilities**: Provides the basic HTML layout where the
`alerts-page-sk` component is rendered for demonstration purposes.
Includes an `<error-toast-sk>` for displaying error messages.
- **`alerts-page-sk_puppeteer_test.ts`**: Contains Puppeteer tests for the
`alerts-page-sk` component.
- **Responsibilities**: Performs automated UI testing, ensuring the
component renders correctly and basic interactions function as expected.
It takes screenshots for visual regression testing.
- **`index.ts`**: A simple entry point that imports and thereby registers the
`alerts-page-sk` custom element.
### Key Workflows
**1. Viewing Alerts:**
```
User navigates to the alerts page
|
V
alerts-page-sk.connectedCallback()
|
+----------------------+
| |
V V
fetch('/_/initpage/') fetch('/_/alert/list/false') // Fetch paramset and initial alert list
| |
V V
Update `paramset` Update `alerts` array
| |
+----------------------+
|
V
_render() // Lit renders the table with alerts
```
**2. Creating a New Alert:**
```
User clicks "New" button (if isEditor === true)
|
V
alerts-page-sk.add()
|
V
fetch('/_/alert/new') // Get a template for a new alert
|
V
Update `cfg` with the new alert template (owner set to current user)
|
V
dialog.showModal() // Show the alert-config-sk dialog
|
V
User fills in alert details in alert-config-sk
|
V
User clicks "Accept"
|
V
alerts-page-sk.accept()
|
V
cfg = alertconfig.config // Get updated config from alert-config-sk
|
V
fetch('/_/alert/update', { method: 'POST', body: JSON.stringify(cfg) }) // Send new alert to backend
|
V
alerts-page-sk.list() // Refresh the alert list
```
**3. Editing an Existing Alert:**
```
User clicks "Edit" icon next to an alert (if isEditor === true)
|
V
alerts-page-sk.edit() with the selected alert's data
|
V
Set `origCfg` (deep copy of current `cfg`)
Set `cfg` to the selected alert's data
|
V
dialog.showModal() // Show the alert-config-sk dialog pre-filled with alert data
|
V
User modifies alert details in alert-config-sk
|
V
User clicks "Accept"
|
V
alerts-page-sk.accept()
|
V
cfg = alertconfig.config // Get updated config
|
V
IF JSON.stringify(cfg) !== JSON.stringify(origCfg) THEN
fetch('/_/alert/update', { method: 'POST', body: JSON.stringify(cfg) }) // Send updated alert
|
V
alerts-page-sk.list() // Refresh list
ENDIF
```
**4. Deleting an Alert:**
```
User clicks "Delete" icon next to an alert (if isEditor === true)
|
V
alerts-page-sk.delete() with the selected alert's ID
|
V
fetch('/_/alert/delete/{alert_id}', { method: 'POST' }) // Send delete request
|
V
alerts-page-sk.list() // Refresh the alert list
```
**5. Toggling "Show Deleted Configs":**
```
User clicks "Show deleted configs" checkbox
|
V
alerts-page-sk.showChanged()
|
V
Update `showDeleted` property based on checkbox state
|
V
alerts-page-sk.list() // Fetches alerts based on the new `showDeleted` state
```
# Module: /modules/algo-select-sk
## Algo Select SK Module
The `algo-select-sk` module provides a custom HTML element that allows users to
select a clustering algorithm. This component is crucial for applications where
different clustering approaches might yield better results depending on the data
or the analytical goal.
### High-Level Overview
The core purpose of this module is to present a user-friendly way to switch
between available clustering algorithms, specifically "k-means" and "stepfit".
It encapsulates the selection logic and emits an event when the chosen algorithm
changes, allowing other parts of the application to react accordingly.
### Design and Implementation
The "why" behind this module is the need for a standardized and reusable UI
component for algorithm selection. Instead of each part of an application
implementing its own dropdown or radio buttons for algorithm choice,
`algo-select-sk` provides a consistent look and feel.
The "how" involves leveraging the `select-sk` custom element from the
`elements-sk` library to provide the actual dropdown functionality.
`algo-select-sk` builds upon this by:
1. **Defining specific algorithm options:** It hardcodes "k-means" and
"stepfit" as the available choices, along with descriptive tooltips.
2. **Managing state:** It uses an `algo` attribute (and corresponding property)
to store and reflect the currently selected algorithm.
3. **Emitting a custom event:** When the selection changes, it dispatches an
`algo-change` event with the new algorithm in the `detail` object. This
decoupling allows other components to listen for changes without direct
dependencies on `algo-select-sk`.
The choice to use `select-sk` as a base provides a consistent styling and
behavior aligned with other elements in the Skia infrastructure.
### Responsibilities and Key Components
- **`algo-select-sk.ts`**: This is the heart of the module.
- **`AlgoSelectSk` class**: This `ElementSk` subclass defines the custom
element's behavior.
- **`template`**: Uses `lit-html` to render the underlying `select-sk`
element with predefined `div` elements representing the algorithm
options ("K-Means" and "Individual" which maps to "stepfit"). The
`selected` attribute on these divs is dynamically updated based on the
current `algo` property.
- **`connectedCallback` and `attributeChangedCallback`**: Ensure the
element renders correctly when added to the DOM or when its `algo`
attribute is changed programmatically.
- **`_selectionChanged` method**: This is the event handler for the
`selection-changed` event from the inner `select-sk` element. When
triggered, it updates the `algo` property of `algo-select-sk` and then
dispatches the `algo-change` custom event. This is the primary mechanism
for communicating the selected algorithm to the outside world. The flow
is: the user interacts with `<select-sk>` --> `<select-sk>` emits a
`selection-changed` event --> `AlgoSelectSk._selectionChanged()` is
called --> the internal `algo` property is updated --> an `algo-change`
event is dispatched with `{ algo: "new_value" }`.
- **`algo` getter/setter**: Provides a programmatic way to get and set the
selected algorithm. The setter ensures that only valid algorithm values
('kmeans' or 'stepfit') are set, defaulting to 'kmeans' for invalid
inputs. This adds a layer of robustness.
- **`toClusterAlgo` function**: A utility function to validate and
normalize the input string to one of the allowed `ClusterAlgo` types.
This prevents invalid algorithm names from being propagated.
- **`AlgoSelectAlgoChangeEventDetail` interface**: Defines the structure
of the `detail` object for the `algo-change` event, ensuring type safety
for event consumers.
- **`algo-select-sk.scss`**: Provides minimal styling, primarily ensuring that
the cursor is a pointer when hovering over the element, indicating
interactivity. It imports shared color and theme styles.
- **`index.ts`**: A simple entry point that imports `algo-select-sk.ts`,
ensuring the custom element is defined and available for use when the module
is imported.
- **`algo-select-sk-demo.html` and `algo-select-sk-demo.ts`**: These files
provide a demonstration page for the `algo-select-sk` element.
- The HTML sets up a few instances of `algo-select-sk`, including one with
a pre-selected algorithm and one in dark mode, to showcase its
appearance.
- The TypeScript for the demo listens to the `algo-change` event from one
of the instances and displays the event detail in a `<pre>` tag. This
serves as a live example of how to consume the event.
- **`algo-select-sk_puppeteer_test.ts`**: Contains Puppeteer tests that verify
the component renders correctly and that its basic functionality works. It
checks for the
presence of the elements on the demo page and takes a screenshot for visual
regression testing.
The component is designed to be self-contained and easy to integrate. By simply
including the element in HTML and listening for the `algo-change` event,
developers can incorporate algorithm selection functionality into their
applications.
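A minimal consumer of the element, based on the attribute and event contract
described above, might look like this:

```ts
import './index'; // registers <algo-select-sk>; assumes this file sits next to the module

const el = document.createElement('algo-select-sk');
el.setAttribute('algo', 'stepfit'); // pre-select an algorithm ('kmeans' is the default)
document.body.appendChild(el);

el.addEventListener('algo-change', (e: Event) => {
  const detail = (e as CustomEvent<{ algo: string }>).detail;
  console.log('Selected clustering algorithm:', detail.algo);
});
```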
# Module: /modules/anomalies-table-sk
## Anomalies Table (`anomalies-table-sk`)
The `anomalies-table-sk` module provides a custom HTML element for displaying a
sortable and interactive table of performance anomalies. Its primary purpose is
to present anomaly data in a clear, actionable format, allowing users to quickly
identify, group, triage, and investigate performance regressions or
improvements.
### Key Responsibilities:
- **Displaying Anomalies:** Renders a list of `Anomaly` objects in a tabular
format. Each row represents an anomaly and displays key information such as
bug ID, revision range, test path, and metrics like delta percentage and
absolute delta.
- **Grouping Anomalies:** Automatically groups anomalies that share
overlapping revision ranges. This helps users identify related issues or
multiple manifestations of the same underlying problem. Groups can be
expanded or collapsed for better readability.
- **User Interaction:**
- **Sorting:** Allows users to sort the table by various columns (e.g.,
Bug ID, Revisions, Test, Delta %).
- **Selection:** Users can select individual anomalies or entire groups of
anomalies using checkboxes.
- **Bulk Actions:** Provides "Triage" and "Graph" buttons that operate on
the currently selected anomalies.
- **Triage Integration:** Integrates with `triage-menu-sk` to allow users to
assign bug IDs, mark anomalies as invalid or ignored, or reset their triage
state.
- **Navigation and Investigation:**
- Provides links to individual anomaly reports (e.g.,
`/u/?anomalyIDs=...`).
- Generates links to view graphs of selected anomalies in the multi-graph
explorer (`/m/...`).
- Links bug IDs to the configured bug tracking system (e.g.,
`/u/?bugID=...`).
- Allows unassociating a bug ID from an anomaly.
### Design and Implementation Choices:
- **LitElement for Web Component:** The component is built using LitElement, a
lightweight library for creating Web Components. This promotes
encapsulation, reusability, and interoperability with other web
technologies.
- **Client-Side Grouping:** Anomaly grouping based on revision range
intersection is performed client-side. This simplifies the backend and
provides immediate feedback to the user as they interact with the table. The
`groupAnomalies` method iterates through the anomaly list, merging anomalies
into existing groups if their revision ranges intersect, or creating new
groups otherwise.
- **Client-Side Sorting:** Sorting is handled by the `sort-sk` element, which
observes changes to data attributes on the table rows. This avoids server
roundtrips for simple sorting operations.
- **Selective Rendering:** The table is re-rendered (using `this._render()`)
only when necessary, such as when data changes, groups are
expanded/collapsed, or selections are updated. This improves performance.
- **`AnomalyGroup` Class:** A simple `AnomalyGroup` class is used to manage
collections of related anomalies and their expanded state. This provides a
clear structure for handling grouped data.
- **Popup for Triage:** The triage menu is presented in a popup to save screen
real estate and provide a focused interface for triage actions. The popup's
visibility is controlled by the `showPopup` boolean property.
- **Event-Driven Communication:** The component emits a custom event
`anomalies_checked` when the selection state of an anomaly changes. This
allows parent components or other parts of the application to react to user
selections.
- **API Integration for Graphing and Reporting:**
- When graphing multiple anomalies, it first calls the
`/_/anomalies/group_report` backend API. This API is designed to provide
a consolidated view or a shared identifier (`sid`) for a group of
anomalies, which is then used to construct the graph URL. This is
preferred over constructing potentially very long URLs with many
individual anomaly IDs.
- For single anomaly graphing, it fetches additional time range
information via the same `group_report` API to provide context (one week
before and after the anomaly) in the graph.
- **Trace Formatting:** Uses `ChromeTraceFormatter` to correctly format trace
queries for linking to the graph explorer.
- **Styling:** SCSS is used for styling, importing shared styles from
`themes_sass_lib`, `buttons_sass_lib`, and `select_sass_lib` for a
consistent look and feel. Specific styles handle the appearance of
regression vs. improvement, expanded rows, and the triage popup.
### Key Files:
- **`anomalies-table-sk.ts`:** This is the core file containing the LitElement
class definition for `AnomaliesTableSk`. It implements all the logic for
rendering the table, handling user interactions, grouping anomalies, and
interacting with backend services for triage and graphing.
- `populateTable(anomalyList: Anomaly[])`: The primary method to load data
into the table. It triggers grouping and rendering.
- `generateTable()`, `generateGroups()`, `generateRows()`: Template
methods responsible for constructing the HTML structure of the table
using `lit-html`.
- `groupAnomalies()`: Implements the logic for grouping anomalies based on
overlapping revision ranges.
- `openReport()`: Handles the logic for generating a URL to graph the
selected anomalies, potentially calling the `/_/anomalies/group_report`
API.
- `togglePopup()`: Manages the visibility of the triage menu popup.
- `anomalyChecked()`: Handles checkbox state changes and updates the
`checkedAnomaliesSet`.
- `openMultiGraphUrl()`: Constructs the URL for viewing an anomaly's trend
in the multi-graph explorer, fetching time range context via an API
call.
- **`anomalies-table-sk.scss`:** Contains the SCSS styles specific to the
anomalies table, defining its layout, appearance, and the styling for
different states (e.g., improvement, regression, expanded rows).
- **`index.ts`:** A simple entry point that imports and registers the
`anomalies-table-sk` custom element.
- **`anomalies-table-sk-demo.ts` and `anomalies-table-sk-demo.html`:** Provide
a demonstration page for the component, showcasing its usage with sample
data and interactive buttons to populate the table and retrieve checked
anomalies. The demo also sets up a global `window.perf` object with
configuration typically provided by the Perf application environment.
### Workflows:
**1. Displaying and Grouping Anomalies:**
```
[User Action: Page Load with Anomaly Data]
|
v
AnomaliesTableSk.populateTable(anomalyList)
|
v
AnomaliesTableSk.groupAnomalies()
|-> For each Anomaly in anomalyList:
| |-> Try to merge with existing AnomalyGroup (if revision ranges intersect)
| |-> Else, create new AnomalyGroup
|
v
AnomaliesTableSk._render()
|
v
[DOM Update: Table is rendered with grouped anomalies, groups initially collapsed]
```
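A simplified sketch of the range-intersection grouping shown above (the
`start_revision`/`end_revision` field names are assumed from the `Anomaly`
JSON):

```ts
// Group anomalies whose revision ranges overlap; otherwise start a new group.
interface AnomalyLike {
  start_revision: number;
  end_revision: number;
}

class Group<T extends AnomalyLike> {
  anomalies: T[] = [];
  expanded = false;

  intersects(a: T): boolean {
    // Two ranges overlap if neither ends before the other starts.
    return this.anomalies.some(
      (b) => a.start_revision <= b.end_revision && b.start_revision <= a.end_revision
    );
  }
}

function groupAnomalies<T extends AnomalyLike>(anomalies: T[]): Group<T>[] {
  const groups: Group<T>[] = [];
  for (const a of anomalies) {
    const target = groups.find((g) => g.intersects(a));
    if (target) {
      target.anomalies.push(a);
    } else {
      const g = new Group<T>();
      g.anomalies.push(a);
      groups.push(g);
    }
  }
  return groups;
}
```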
**2. Selecting and Triaging Anomalies:**
```
[User Action: Clicks checkbox for an anomaly or group]
|
v
AnomaliesTableSk.anomalyChecked() or AnomalySk.toggleChildrenCheckboxes()
|-> Updates `checkedAnomaliesSet`
|-> Updates header checkbox state if needed
|-> Emits 'anomalies_checked' event
|-> Enables/Disables "Triage" and "Graph" buttons based on selection
|
v
[User Action: Clicks "Triage" button (if enabled)]
|
v
AnomaliesTableSk.togglePopup()
|-> Shows TriageMenuSk popup
|-> TriageMenuSk.setAnomalies(checkedAnomalies)
|
v
[User interacts with TriageMenuSk (e.g., assigns bug, marks invalid)]
|
v
TriageMenuSk makes API request (e.g., to /_/triage)
|
v
[Application reloads data or updates table based on triage result]
```
**3. Graphing Selected Anomalies:**
```
[User Action: Selects one or more anomalies]
|
v
[User Action: Clicks "Graph" button (if enabled)]
|
v
AnomaliesTableSk.openReport()
|
|--> If single anomaly selected:
| |-> window.open(`/u/?anomalyIDs={id}`, '_blank')
|
|--> If multiple anomalies selected:
|-> Call fetchGroupReportApi(idString)
| |-> POST to /_/anomalies/group_report with anomaly IDs
| |-> Receives response with `sid` (shared ID)
|
|-> window.open(`/u/?sid={sid}`, '_blank')
```
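The multi-anomaly branch of this flow might be sketched as follows; the request
body and response field names (`anomalyIDs`, `sid`) follow the workflow
description above and should be treated as assumptions about the real API:

```ts
// POST the selected anomaly IDs to the group-report endpoint, then open the
// explorer with the returned shared ID.
async function openGroupReport(anomalyIds: number[]): Promise<void> {
  const resp = await fetch('/_/anomalies/group_report', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ anomalyIDs: anomalyIds.join(',') }), // assumed body shape
  });
  if (!resp.ok) {
    throw new Error(`group_report failed: ${resp.status}`);
  }
  const json = (await resp.json()) as { sid: string };
  window.open(`/u/?sid=${json.sid}`, '_blank');
}
```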
**4. Expanding/Collapsing an Anomaly Group:**
```
[User Action: Clicks expand/collapse button on a group row]
|
v
AnomaliesTableSk.expandGroup(anomalyGroup)
|-> Toggles `anomalyGroup.expanded` boolean
|
v
AnomaliesTableSk._render()
|
v
[DOM Update: Rows within the group are shown or hidden]
```
# Module: /modules/anomaly-sk
The `anomaly-sk` module provides a custom HTML element `<anomaly-sk>` and
related functionalities for displaying details about performance anomalies. It's
designed to present information about a specific anomaly, including its
severity, the affected revision range, and a link to the associated bug report.
A key utility function, `getAnomalyDataMap`, is also provided to process raw
anomaly data into a format suitable for plotting.
**Key Responsibilities and Components:**
- **`anomaly-sk.ts`**: This is the core file defining the `<anomaly-sk>`
custom element.
- **Why**: To encapsulate the logic and presentation of individual anomaly
details in a reusable web component. This promotes modularity and makes
it easy to integrate anomaly information into various parts of the Perf
application.
- **How**: It extends `ElementSk` and uses the `lit-html` library for
templating. It accepts an `Anomaly` object as a property and dynamically
renders a table displaying information like the score before and after
the anomaly, percentage change, revision range, improvement status, and
bug ID.
- It fetches commit details (hashes) using the `lookupCids` function from
the `cid` module to construct a clickable link to the commit range.
- It formats numbers and percentages for better readability.
- It handles different bug ID states (e.g., 0 for no bug, -1 for invalid
alert, -2 for ignored alert) by displaying appropriate text or a link to
the bug tracking system. The `bug_host_url` property allows
customization of the bug tracker URL.
- The `formatRevisionRange` method asynchronously fetches commit hashes
for the start and end revisions of the anomaly to create a link to the
commit range view. If `window.perf.commit_range_url` is not defined, it
simply displays the revision numbers.
- **`getAnomalyDataMap` (function in `anomaly-sk.ts`)**:
- **Why**: To transform raw trace data and anomaly information into a
structured format that can be easily consumed by plotting components
like `plot-simple-sk`. This function bridges the gap between the raw
data representation and the visual representation of anomalies on a
graph.
- **How**: It takes a `TraceSet` (a collection of traces),
`ColumnHeader[]` (representing commit points on the x-axis), an
`AnomalyMap` (mapping trace IDs and commit IDs to `Anomaly` objects),
and a list of `highlight_anomalies` IDs.
- It iterates through each trace in the `TraceSet`. If a trace has
anomalies listed in the `AnomalyMap`, it then iterates through those
anomalies.
- For each anomaly, it finds the corresponding x-coordinate by matching
the anomaly's commit ID (`cid`) with the `offset` in the `ColumnHeader`.
A crucial detail is that if an exact commit ID match isn't found in the
header (e.g., due to a data upload failure for that specific commit), it
will associate the anomaly with the _next available_ commit point. This
ensures that anomalies are still visualized even if their precise commit
data point is missing, rather than being omitted entirely.
- The y-coordinate is taken directly from the trace data at that
x-coordinate.
- It determines if an anomaly should be highlighted based on the
`highlight_anomalies` input.
- The output is an object where keys are trace IDs and values are arrays
of `AnomalyData` objects, each containing the `x`, `y` coordinates, the
`Anomaly` object itself, and a `highlight` flag.
```
Input:
TraceSet: { "traceA": [10, 12, 15*], ... } (*value at commit 101)
Header: [ {offset: 99}, {offset: 100}, {offset: 101} ]
AnomalyMap: { "traceA": { "101": AnomalyObjectA } }
HighlightList: []
getAnomalyDataMap
|
V
Output:
{
"traceA": [
{ x: 2, y: 15, anomaly: AnomalyObjectA, highlight: false }
],
...
}
```
- **`anomaly-sk.scss`**: This file contains the SCSS styles for the
`<anomaly-sk>` element.
- **Why**: To provide a consistent visual appearance for the anomaly
details table, aligning with the overall theme of the application
(`themes_sass_lib`).
- **How**: It defines basic table styling, such as text alignment and
padding for `th` and `td` elements within the `anomaly-sk` component.
- **`anomaly-sk-demo.html` and `anomaly-sk-demo.ts`**: These files set up a
demonstration page for the `<anomaly-sk>` element.
- **Why**: To provide a sandbox environment for developers to see the
component in action with various inputs and to facilitate isolated
testing and development.
- **How**: `anomaly-sk-demo.html` includes instances of `<anomaly-sk>`
with different IDs. `anomaly-sk-demo.ts` initializes these components
with sample `Anomaly` data. It also mocks the `/_/cid/` API endpoint
using `fetch-mock` to simulate responses for commit detail lookups,
which is crucial for the `formatRevisionRange` functionality to work in
the demo. Global `window.perf` configurations are also set up, as the
component relies on them (e.g., `commit_range_url`).
- **Test Files (`anomaly-sk_test.ts`, `anomaly-sk_puppeteer_test.ts`)**:
- **Why**: To ensure the correctness and reliability of the component's
logic and rendering.
- **`anomaly-sk_test.ts`**: Contains unit tests for the
`getAnomalyDataMap` function (verifying its mapping logic, especially
the handling of missing commit points) and for static utility methods
within `AnomalySk` like `formatPercentage` and the asynchronous
`formatRevisionRange`. It uses `fetch-mock` to control API responses for
CID lookups.
- **`anomaly-sk_puppeteer_test.ts`**: Contains browser-based integration
tests using Puppeteer. It verifies that the demo page renders correctly
and takes screenshots for visual regression testing.
**Workflow for Displaying an Anomaly:**
1. An `Anomaly` object is passed to the `anomaly` property of the
   `<anomaly-sk>` element, e.g.
   `<anomaly-sk .anomaly=${someAnomalyObject}></anomaly-sk>`.
2. The `set anomaly()` setter in `AnomalySk` is triggered.
3. It calls `this.formatRevisionRange()` to asynchronously prepare the revision
range display.
- `formatRevisionRange` extracts `start_revision` and `end_revision`.
- It calls `lookupCids([start_rev_num, end_rev_num])` which makes a POST
request to `/_/cid/`.
- The response provides commit hashes.
- If `window.perf.commit_range_url` is set, it constructs an `<a>` tag
with the URL populated with the fetched hashes. Otherwise, it just
formats the revision numbers as text.
- The resulting `TemplateResult` is stored in `this._revision`.
4. `this._render()` is called, which re-renders the component's template.
5. The template (`AnomalySk.template`) displays the table:
- Score, Prior Score, Percent Change (calculated using
`getPercentChange`).
- Revision Range (using the `this.revision` template generated in step 3).
- Improvement status.
- Bug ID (formatted using `AnomalySk.formatBug`, potentially linking to
`this.bugHostUrl`).
This module effectively isolates the presentation and data transformation logic
related to individual anomalies, making it a maintainable and reusable piece of
the Perf frontend. The handling of potentially missing data points in
`getAnomalyDataMap` shows a robust design choice for dealing with real-world
data imperfections.
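That fallback behavior can be reduced to a small sketch. The `offset` field and
the example values mirror the Input/Output block earlier in this section;
everything else (names, simplified types) is illustrative rather than the
module's actual code.
```
// Map an anomaly's commit id (cid) to an x-coordinate in the header, falling
// forward to the next available commit when the exact cid has no data point.
interface ColumnHeaderLite {
  offset: number; // commit number for this x position
}

function xForCommit(header: ColumnHeaderLite[], cid: number): number {
  const exact = header.findIndex((h) => h.offset === cid);
  if (exact !== -1) return exact;
  // No exact match (e.g. a failed data upload): use the next commit point so
  // the anomaly is still drawn rather than omitted. Returns -1 if none follows.
  return header.findIndex((h) => h.offset > cid);
}

// Example: commit 100 has no data point in this header.
const header = [{ offset: 99 }, { offset: 101 }];
xForCommit(header, 101); // 1 (exact match)
xForCommit(header, 100); // 1 (falls forward to offset 101)
```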
# Module: /modules/bisect-dialog-sk
## Bisect Dialog (`bisect-dialog-sk`)
The `bisect-dialog-sk` module provides a user interface element for initiating a
bisection process within the Perf application. This is specifically designed to
help pinpoint the commit that introduced a performance regression or
improvement, primarily for Chrome.
### Core Responsibility
The primary responsibility of this module is to present a dialog to the user,
pre-filled with relevant information extracted from a chart tooltip (e.g., when
a user identifies an anomaly in a performance graph). It allows the user to
confirm or modify these parameters and then submit a request to the backend to
start a bisection task.
### Why a Dedicated Dialog?
Performance analysis often involves identifying the exact change that caused a
shift in metrics. A manual bisection process can be tedious and error-prone.
This dialog streamlines this by:
1. **Pre-filling Data:** It leverages context from the chart (like the test
path and commit range) to pre-populate the necessary fields, reducing manual
data entry and potential mistakes.
2. **Structured Input:** It provides a clear form for all required parameters
for a bisection request, ensuring that the backend receives all necessary
information.
3. **User Authentication Awareness:** It integrates with the `alogin-sk` module
to fetch the logged-in user's email, which is a required parameter for the
bisect request.
4. **Feedback Mechanism:** It provides visual feedback to the user during the
submission process (e.g., a spinner) and communicates success or failure via
toast messages.
### How it Works
1. **Initialization and Pre-filling:**
- The dialog is typically instantiated and hidden until needed.
- When a user triggers a bisection (e.g., from a chart tooltip), the
`setBisectInputParams` method is called with details like the
`testPath`, `startCommit`, `endCommit`, `bugId`, `story`, and
`anomalyId`.
- These parameters are used to populate the input fields within the
dialog's form.
2. **User Interaction and Submission:**
- The `open()` method displays the modal dialog.
- The user can review and, if necessary, modify the pre-filled values
(e.g., adjust the commit range or add a bug ID). They can also provide
an optional patch to be applied during the bisection.
- Upon clicking the "Bisect" button, the `postBisect` method is invoked.
3. **Request Construction and API Call:**
- `postBisect` gathers the current values from the form fields.
- It parses the `testPath` to extract components like the `benchmark`,
`chart`, and `statistic`. The logic for deriving `chart` and `statistic`
involves checking the last part of the test name against a predefined
list of `STATISTIC_VALUES` (e.g., "avg", "count").
- A `CreateBisectRequest` object is constructed with all the necessary
parameters.
- A `fetch` call is made to the `/_/bisect/create` endpoint with the JSON
payload.
4. **Response Handling:**
- If the request is successful, a success message is typically displayed
(often as a toast by the calling context, as this dialog focuses on the
submission itself), and the dialog closes.
- If the request fails, an error message is displayed using
`errorMessage`, and the dialog remains open, allowing the user to
correct any issues or retry.
**Simplified Bisect Request Workflow:**
```
User Clicks Bisect Trigger (e.g., on chart)
|
V
Calling Code prepares `BisectPreloadParams`
|
V
`bisect-dialog-sk.setBisectInputParams(params)`
|
V
`bisect-dialog-sk.open()`
|
V
Dialog is Displayed (pre-filled)
|
V
User reviews/modifies data & Clicks "Bisect"
|
V
`bisect-dialog-sk.postBisect()`
|
V
`testPath` is parsed (extract benchmark, chart, statistic)
|
V
`CreateBisectRequest` object is built
|
V
`fetch POST /_/bisect/create` with request data
|
V
Handle API Response:
- Success -> Close dialog, Show success notification (external)
- Error -> Show error message, Keep dialog open
```
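A rough sketch of the test-path parsing step is shown below. It assumes a
'/'-separated path whose last segment may carry an '_'-separated statistic
suffix, and a `STATISTIC_VALUES` list like the one described above; the real
`bisect-dialog-sk` parsing may differ in detail.
```
// Derive chart and statistic from the last segment of a test path.
const STATISTIC_VALUES = ['avg', 'count', 'max', 'min', 'std', 'sum'];

function splitChartAndStatistic(testPath: string): { chart: string; statistic: string } {
  const segments = testPath.split('/');
  const testName = segments[segments.length - 1];
  const parts = testName.split('_');
  const last = parts[parts.length - 1];
  if (parts.length > 1 && STATISTIC_VALUES.includes(last)) {
    // Strip the statistic suffix off the chart name.
    return { chart: parts.slice(0, -1).join('_'), statistic: last };
  }
  return { chart: testName, statistic: '' };
}

// e.g. splitChartAndStatistic('master/bot/speedometer2/RunsPerMinute_avg')
//   -> { chart: 'RunsPerMinute', statistic: 'avg' }
```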
### Key Components and Files
- **`bisect-dialog-sk.ts`**: This is the core TypeScript file defining the
`BisectDialogSk` custom element.
- **`BisectDialogSk` class**: Extends `ElementSk` and manages the dialog's
state, rendering, and interaction logic.
- `BisectPreloadParams` interface: Defines the structure of the initial
data passed to the dialog.
- `template`: A lit-html template defining the dialog's HTML structure,
including input fields for test path, bug ID, start/end commits, story,
and an optional patch. It also includes a close icon, a spinner for
loading states, and submit/close buttons.
- `connectedCallback()`: Initializes the element, sets up property
upgrades, queries for DOM elements (dialog, form, spinner, button), and
attaches an event listener to the form's submit event. It also fetches
the logged-in user's status.
- `setBisectInputParams()`: Populates the internal state and input fields
with data provided externally.
- `open()`: Shows the modal dialog and ensures the submit button is
enabled.
- `closeBisectDialog()`: Closes the dialog.
- `postBisect()`: This is the heart of the submission logic. It:
- Activates the spinner and disables the submit button.
- Parses the `testPath` to extract various components required for the
bisect request (like `benchmark`, `chart`, `story`, `statistic`).
The logic for `chart` and `statistic` derivation is particularly
important here.
- Constructs the `CreateBisectRequest` payload.
- Makes a `POST` request to the `/_/bisect/create` endpoint.
- Handles the response, either closing the dialog on success or
displaying an error message on failure.
- `STATISTIC_VALUES`: A constant array used to determine if the last part
of a test name is a statistic (e.g., `avg`, `min`, `max`).
- **`bisect-dialog-sk.scss`**: Contains the SASS styles for the dialog,
ensuring it aligns with the application's theme. It styles the dialog
itself, input fields, and the footer elements.
- **`index.ts`**: A simple entry point that imports and thus registers the
`bisect-dialog-sk` custom element.
- **`BUILD.bazel`**: Defines the build rules for this module, specifying its
dependencies (SASS, TypeScript, other SK elements like `alogin-sk`,
`select-sk`, `spinner-sk`, `close-icon-sk`) and sources. The dependencies
highlight its reliance on common UI components and infrastructure modules
for features like login status and error messaging.
### Design Choices
- **Custom Element (`ElementSk`)**: Encapsulating the dialog as a custom
element promotes reusability and modularity. It can be easily integrated
into different parts of the Perf application where bisection capabilities
are needed.
- **`lit-html` for Templating**: Provides an efficient and declarative way to
define the dialog's HTML structure and update it based on its state.
- **Pre-computation of Request Parameters**: The dialog takes a "test path"
and derives several other parameters (benchmark, chart, statistic) from it.
This simplifies the input required from the user or the calling component,
as they only need to provide the full test identifier.
- **Specific to Chrome**: The comment "The bisect logic is only specific to
Chrome" indicates that the backend service this dialog interacts with
(`/_/bisect/create`) is tailored for Chrome's bisection infrastructure. The
`project: 'chromium'` in the request payload confirms this.
- **Error Handling**: The use of `jsonOrThrow` and `errorMessage` provides a
standard way to handle API errors and inform the user.
- **Spinner for Feedback**: The `spinner-sk` element gives visual feedback
during the asynchronous `fetch` operation, improving user experience.
# Module: /modules/calendar-input-sk
## Calendar Input Element (`calendar-input-sk`)
The `calendar-input-sk` module provides a user-friendly way to select dates. It
combines a standard text input field for manual date entry with a button that
reveals a `calendar-sk` element within a dialog for visual date picking. This
approach offers flexibility for users who prefer typing dates directly and those
who prefer a visual calendar interface.
### Responsibilities and Key Components
- **`calendar-input-sk.ts`**: This is the core file defining the
`CalendarInputSk` custom element.
- **Why**: It orchestrates the interaction between the text input, the
calendar button, and the pop-up calendar dialog. The goal is to provide
a seamless date selection experience.
- **How**:
- It uses a standard HTML `<input type="text">` element for direct date
input. A `pattern` attribute (`[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}`) and a
`title` are used to guide the user on the expected `YYYY-MM-DD` format.
An error indicator (`&cross;`) is shown if the input doesn't match the
pattern.
- A `<button>` element, styled with a `date-range-icon-sk`, triggers the
display of the calendar.
- A standard HTML `<dialog>` element is used to present the `calendar-sk`
element. This choice simplifies the implementation of modal behavior.
- The `openHandler` method is responsible for showing the dialog. It uses
a `Promise` to manage the asynchronous nature of user interaction with
the dialog (either selecting a date or canceling). This makes the event
handling logic cleaner and easier to follow.
- The `inputChangeHandler` is triggered when the user types into the text
field. It validates the input against the defined pattern. If valid, it
parses the date string and updates the `displayDate` property.
- The `calendarChangeHandler` is invoked when a date is selected from the
`calendar-sk` component within the dialog. It resolves the
aforementioned `Promise` with the selected date.
- The `dialogCancelHandler` is called when the dialog is closed without a
date selection (e.g., by pressing the "Cancel" button or the Escape
key). It rejects the `Promise`.
- An `input` custom event (of type `CustomEvent<Date>`) is dispatched
whenever the selected date changes, whether through the text input or
the calendar dialog. This allows parent components to react to date
selections.
- The `displayDate` property acts as the single source of truth for the
currently selected date. Setting this property will update both the text
input and the date displayed in the `calendar-sk` when it's opened.
- It leverages the `lit-html` library for templating, providing a
declarative way to define the element's structure and efficiently update
the DOM.
- The element extends `ElementSk`, inheriting common functionalities for
Skia custom elements.
- **`calendar-input-sk.scss`**: This file contains the styling for the
`calendar-input-sk` element.
- **Why**: To provide a consistent visual appearance that integrates well
with the Skia design system (themes).
- **How**: It uses SASS to define styles for the input field, the calendar
button, the error indicator, and the dialog. It leverages CSS variables
(e.g., `--error`, `--on-surface`, `--surface-1dp`) for theming, allowing
the component's appearance to adapt to different contexts (like dark
mode). The `.invalid` class is conditionally displayed based on the
input field's validity state using the `:invalid` pseudo-class.
- **`index.ts`**: This file simply imports and thereby registers the
`calendar-input-sk` custom element.
- **Why**: This is a common pattern for making custom elements available
for use in an application. It acts as the entry point for the component.
- **`calendar-input-sk-demo.html` / `calendar-input-sk-demo.ts`**: These files
constitute a demonstration page for the `calendar-input-sk` element.
- **Why**: To showcase the element's functionality, different states
(including invalid input and dark mode), and provide a simple way for
developers to interact with and understand the component. It also serves
as a testbed during development.
- **How**: The HTML file includes multiple instances of
`<calendar-input-sk>` in various configurations. The TypeScript file
initializes these instances, sets initial `displayDate` values, and
demonstrates how to listen for the `input` event. It also shows an
example of programmatically setting an invalid value in one of the input
fields.
### Key Workflows
**1. Selecting a Date via Text Input:**
```
User types "2023-10-26" into text input
|
V
inputChangeHandler in calendar-input-sk.ts
|
+-- (Input is valid: matches pattern "YYYY-MM-DD") --> Parse "2023-10-26" into a Date object
| |
| V
| Update _displayDate property
| |
| V
| Render component (updates input field's .value)
| |
| V
| Dispatch "input" CustomEvent<Date>
|
+-- (Input is invalid: e.g., "2023-") --> Do nothing (CSS shows error indicator)
```
**2. Selecting a Date via Calendar Dialog:**
```
User clicks calendar button
|
V
openHandler in calendar-input-sk.ts
|
V
dialog.showModal() is called
|
V
<dialog> with <calendar-sk> is displayed
|
+-- User selects a date in <calendar-sk> --> <calendar-sk> dispatches "change" event
| |
| V
| calendarChangeHandler in calendar-input-sk.ts
| |
| V
| dialog.close()
| |
| V
| Promise resolves with the selected Date
|
+-- User clicks "Cancel" button or presses Esc --> dialog dispatches "cancel" event
|
V
dialogCancelHandler in calendar-input-sk.ts
|
V
dialog.close()
|
V
Promise rejects
```
**If Promise resolves (date selected):**
```
openHandler continues after await
|
V
Update _displayDate property with the resolved Date
|
V
Render component (updates input field's .value)
|
V
Dispatch "input" CustomEvent<Date>
|
V
Focus on the text input field
```
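The Promise-based flow above can be illustrated with a stripped-down sketch.
The method names mirror the prose, but the bodies are simplified and are not
the element's actual implementation.
```
// Minimal sketch of the dialog interaction: openHandler awaits a Promise that
// the calendar's 'change' handler resolves and the dialog's 'cancel' handler rejects.
class CalendarInputFlow {
  private resolve: ((d: Date) => void) | null = null;

  private reject: ((reason?: unknown) => void) | null = null;

  constructor(
    private host: HTMLElement, // the <calendar-input-sk> element
    private dialog: HTMLDialogElement // the pop-up containing <calendar-sk>
  ) {}

  async openHandler(): Promise<void> {
    try {
      const date = await new Promise<Date>((resolve, reject) => {
        this.resolve = resolve;
        this.reject = reject;
        this.dialog.showModal();
      });
      // Date selected: notify listeners just like the text-input path does.
      this.host.dispatchEvent(
        new CustomEvent<Date>('input', { detail: date, bubbles: true })
      );
    } catch {
      // Dialog was cancelled (Esc or "Cancel" button); nothing to do.
    }
  }

  // Wired to calendar-sk's 'change' event.
  calendarChangeHandler(e: CustomEvent<Date>): void {
    this.dialog.close();
    this.resolve?.(e.detail);
  }

  // Wired to the dialog's 'cancel' event and the "Cancel" button.
  dialogCancelHandler(): void {
    this.dialog.close();
    this.reject?.();
  }
}
```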
The design emphasizes a clear separation of concerns: the `calendar-sk` handles
the visual calendar logic, while `calendar-input-sk` manages the integration of
text input and the dialog presentation. The use of a `Promise` in `openHandler`
simplifies the handling of the asynchronous dialog interaction, leading to more
readable and maintainable code.
# Module: /modules/calendar-sk
The `calendar-sk` module provides a custom HTML element `<calendar-sk>` that
displays an interactive monthly calendar. This element was created to address
limitations with the native HTML `<input type="date">` element, specifically its
lack of Safari support and the inability to style the pop-up calendar.
Furthermore, it aims to be more themeable and accessible than other existing web
component solutions like Elix.
The core philosophy behind `calendar-sk` is to provide a user-friendly,
accessible, and customizable date selection experience. Accessibility is a key
consideration, with design choices informed by WAI-ARIA practices for date
pickers. This includes keyboard navigation and appropriate ARIA attributes.
**Key Responsibilities and Components:**
- **`calendar-sk.ts`**: This is the heart of the module, defining the
`CalendarSk` custom element which extends `ElementSk`.
- **Rendering the Calendar:** It uses the `lit-html` library for
templating, dynamically generating the HTML for the calendar grid. The
calendar displays one month at a time.
- The main template (`CalendarSk.template`) constructs the overall table
structure, including navigation buttons for changing the year and month,
and headers for the year and month.
- `CalendarSk.rowTemplate` is responsible for rendering each week (row) of
the calendar.
- `CalendarSk.buttonForDateTemplate` creates the individual day buttons.
It handles logic for disabling buttons for dates outside the current
month and highlighting the selected date and today's date.
- **Date Management:**
- It internally manages a `_displayDate` (a JavaScript `Date` object)
which represents the currently selected or focused date.
- The `CalendarDate` class is a helper to simplify comparisons of year,
month, and date, as JavaScript `Date` objects can be tricky with
timezones and direct comparisons.
- Helper functions like `getNumberOfDaysInMonth` and
`firstDayIndexOfMonth` are used to correctly layout the days within the
grid.
- **Navigation:**
- Provides UI buttons (using `navigate-before-icon-sk` and
`navigate-next-icon-sk`) for incrementing/decrementing the month and
year. Methods like `incYear`, `decYear`, `incMonth`, and `decMonth`
handle the logic for updating `_displayDate` and re-rendering. A crucial
detail in month/year navigation is handling cases where the current day
(e.g., 31st) doesn't exist in the target month (e.g., February). In such
scenarios, the date is adjusted to the last valid day of the target
month.
- **Keyboard Navigation:**
- The `keyboardHandler` method implements navigation using arrow keys
(day/week changes) and PageUp/PageDown keys (month changes). This
handler is designed to be attached to a parent element (like a dialog or
the document) to allow for controlled event handling, especially when
multiple keyboard-interactive elements are on a page. When a key is
handled, it prevents further event propagation and focuses the newly
selected date button.
- **Internationalization (i18n):**
- Leverages `Intl.DateTimeFormat` to display month names and weekday
headers according to the specified `locale` property or the browser's
default locale. The `buildWeekDayHeader` method dynamically generates
these headers.
- **Events:**
- Dispatches a `change` custom event ( `CustomEvent<Date>`) whenever a new
date is selected by clicking on a day. The event detail contains the
selected `Date` object.
- **Theming:**
- The component is themeable through CSS custom properties, as defined in
`calendar-sk.scss`. It imports styles from
`//perf/modules/themes:themes_sass_lib` and
`//elements-sk/modules/styles:buttons_sass_lib`.
- **`calendar-sk.scss`**: This file contains the SASS/CSS styles for the
`<calendar-sk>` element. It defines the visual appearance of the calendar
grid, buttons, headers, and how selected or "today" dates are highlighted.
It relies on CSS variables (e.g., `--background`, `--secondary`,
`--surface-1dp`) for theming, allowing the look and feel to be customized by
the consuming application.
- **`calendar-sk-demo.html` and `calendar-sk-demo.ts`**: These files set up a
demonstration page for the `calendar-sk` element.
- `calendar-sk-demo.html` includes instances of the calendar, some in dark
mode and one configured for a different locale (`zh-Hans-CN`), to
showcase its versatility.
- `calendar-sk-demo.ts` initializes these calendar instances, sets their
initial `displayDate` and `locale`, and attaches event listeners to log
the `change` event. It also demonstrates how to hook up the
`keyboardHandler`.
- **`index.ts`**: A simple entry point that imports and thus registers the
`calendar-sk` custom element, making it available for use in HTML.
**Key Workflows:**
1. **Initialization and Rendering:** `ElementSk constructor` ->
`connectedCallback` -> `buildWeekDayHeader` -> `_render` (calls
`CalendarSk.template`)
- When the `<calendar-sk>` element is added to the DOM, its
`connectedCallback` is invoked.
- This triggers the initial rendering process, including building the
weekday headers based on the current locale.
- The main template then renders the calendar grid for the month of the
initial `_displayDate`.
2. **Date Selection (Click):** User clicks on a date button -> `dateClick`
method -> Updates `_displayDate` -> Dispatches `change` event with the new
`Date` -> `_render` (to update UI, e.g., highlight new selection)
   User clicks a date button:
   ```
   [date button] --click--> dateClick(event)
     |
     +--> new Date(this._displayDate)              (create copy)
     |
     +--> d.setDate(event.target.dataset.date)     (update day)
     |
     +--> dispatchEvent(new CustomEvent<Date>('change', { detail: d }))
     |
     +--> this._displayDate = d
     |
     +--> this._render()
   ```
3. **Month/Year Navigation (Click):** User clicks "Previous Month" button ->
`decMonth` method -> Calculates new year, monthIndex, and date (adjusting
for days in month) -> Updates `_displayDate` with the new `Date` ->
`_render` (to display the new month/year)
User clicks "Previous Month" button. `[Previous Month button]` --click-->
`decMonth()` | +--> Calculate new year, month, date (adjusting for month
boundaries and days in month) | +--> `this._displayDate = new Date(newYear,
newMonthIndex, newDate)`| +-->`this.\_render()`
4. **Keyboard Navigation:** User presses "ArrowRight" while calendar (or its
container) has focus -> `keyboardHandler(event)` -> `case 'ArrowRight':
this.incDay();` -> `incDay` method updates `_displayDate` (e.g., from May 21
to May 22) -> `this._render()` -> `e.stopPropagation(); e.preventDefault();`
->
`this.querySelector<HTMLButtonElement>('button[aria-selected="true"]')!.focus();`
   User presses the ArrowRight key:
   ```
   keydown event (ArrowRight) ---> keyboardHandler(event)
     |
     +--> (matches case 'ArrowRight')
     |
     +--> this.incDay()
     |      |
     |      +--> this._displayDate = new Date(year, monthIndex, date + 1)
     |      |
     |      +--> this._render()
     |
     +--> event.stopPropagation()
     |
     +--> event.preventDefault()
     |
     +--> Focus the newly selected day button.
   ```
The use of zero-indexed months (`monthIndex`) internally, as is common with the
JavaScript `Date` object, is a deliberate choice for consistency with the
underlying API, though it requires careful handling to avoid off-by-one errors,
especially when calculating things like the number of days in a month.
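A minimal sketch of the month-decrement-with-clamping behavior described above
(helper names chosen for illustration; this is not the element's actual code):
```
// Going back from e.g. March 31 must land on the last valid day of February.
function daysInMonth(year: number, monthIndex: number): number {
  // Day 0 of the next month is the last day of `monthIndex`.
  return new Date(year, monthIndex + 1, 0).getDate();
}

function decMonth(displayDate: Date): Date {
  let year = displayDate.getFullYear();
  let monthIndex = displayDate.getMonth() - 1; // zero-indexed months
  if (monthIndex < 0) {
    monthIndex = 11;
    year -= 1;
  }
  // Clamp the day so a date like the 31st maps to the target month's last day.
  const date = Math.min(displayDate.getDate(), daysInMonth(year, monthIndex));
  return new Date(year, monthIndex, date);
}

// decMonth(new Date(2024, 2, 31)) -> Feb 29, 2024 (leap year)
// decMonth(new Date(2024, 0, 15)) -> Dec 15, 2023
```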
# Module: /modules/chart-tooltip-sk
## chart-tooltip-sk Module Documentation
### Overview
The `chart-tooltip-sk` module provides a custom HTML element,
`<chart-tooltip-sk>`, designed to display detailed information about a specific
data point on a chart. This tooltip is intended to be interactive, offering
context-sensitive actions and information relevant to performance monitoring and
analysis. It can be triggered by hovering over or clicking on a chart point.
The design philosophy behind this module is to centralize the presentation of
complex data point information and related actions. Instead of scattering this
logic across various chart implementations, `chart-tooltip-sk` encapsulates it,
promoting reusability and maintainability. It aims to provide a rich user
experience by surfacing relevant details like commit information, anomaly
status, bug tracking, and actions like bisection or requesting further traces.
### Key Responsibilities and Components
The primary responsibility of `chart-tooltip-sk` is to render a tooltip with
relevant information and interactive elements based on the data point it's
associated with.
**Core Functionality & Design Choices:**
- **Data Loading and Display:**
- The `load()` method is the main entry point for populating the tooltip
with data. It accepts various parameters like the trace index, test
name, y-value, date, commit position, anomaly details, and bug
information. This comprehensive loading mechanism allows the parent
charting component (e.g., `explore-simple-sk`) to provide all necessary
context.
- It displays fundamental information such as the test name, data value,
units, and the date of the data point.
- **Why:** Consolidating data loading into a single method simplifies the
interface for parent components.
- **Commit Information:**
- The tooltip can display details about the commit associated with the
data point, including the author, message, and a link to the commit in
the version control system.
- The `fetch_details()` method is responsible for asynchronously
retrieving commit details using the `/_/cid/` endpoint. This is done to
avoid loading all commit details upfront for every point on a chart,
which could be performance-intensive.
- The `_always_show_commit_info` and `_skip_commit_detail_display` flags
(sourced from `window.perf`) allow for configurable display of commit
details, catering to different instance needs.
- **Why:** On-demand fetching of commit details optimizes initial load
times. Configuration flags provide flexibility for different deployment
scenarios.
- **Anomaly Detection and Triage:**
- If a data point is identified as an anomaly, the tooltip will highlight
this and display relevant anomaly metrics (e.g., median before/after,
percentage change).
- It integrates with `anomaly-sk` for consistent formatting of anomaly
data.
- It incorporates `triage-menu-sk` to allow users to triage new anomalies
(e.g., create bugs, mark as not a bug).
- If a bug is already associated with an anomaly, it displays the bug ID
and provides an option to unassociate it.
- **Why:** Centralizing anomaly display and triage actions within the
tooltip provides a focused user workflow.
- **Bug Association:**
- Integrates with `user-issue-sk` to display and manage Buganizer issues
linked to a data point (even if it's not a formal anomaly). Users can
associate existing bugs or create new ones.
- The `bug_host_url` (from `window.perf`) is used to construct links to
the bug tracking system.
- **Why:** Direct integration with bug tracking streamlines the process of
linking performance data to actionable issues.
- **Interactive Actions:**
- **Bisect:** Provides a "Bisect" button (if `_show_pinpoint_buttons` is
true, typically for Chromium instances) that opens `bisect-dialog-sk`.
This allows users to initiate a bisection to find the exact commit that
caused a regression.
- **Request Trace:** Offers a "Request Trace" button (also gated by
`_show_pinpoint_buttons`) that opens `pinpoint-try-job-dialog-sk`. This
is used to request more detailed trace data for a specific commit.
- **Point Links:** Integrates `point-links-sk` to show relevant links for
a data point based on instance configuration (e.g., links to V8 or
WebRTC specific commit ranges). This is configured via
`keys_for_commit_range` and `keys_for_useful_links` in `window.perf`.
- **JSON Source:** If enabled (`show_json_file_display` in `window.perf`),
it provides a way to view the raw JSON data for the point via
`json-source-sk`.
- **Why:** Placing these actions directly in the tooltip makes them easily
discoverable and accessible in the context of the selected data point.
- **Positioning and Visibility:**
- The `moveTo()` method handles the dynamic positioning of the tooltip
relative to the mouse cursor or the selected chart point. It
intelligently adjusts its position to stay within the viewport and avoid
overlapping critical chart elements.
- The tooltip can be "fixed" (typically on click) or transient (on hover).
A fixed tooltip remains visible and offers more interactive elements.
- **Why:** Smart positioning ensures the tooltip is always usable and
doesn't obstruct the underlying chart. The fixed/transient behavior
balances information density with unobtrusiveness.
- **Styling:**
- Uses SCSS for styling (`chart-tooltip-sk.scss`), including themes
imported from `//perf/modules/themes:themes_sass_lib`.
- Employs `md-elevation` for a Material Design-inspired shadow effect.
- **Why:** SCSS allows for organized and maintainable styles. Material
Design elements provide a consistent look and feel.
**Key Files:**
- **`chart-tooltip-sk.ts`:** The core TypeScript file defining the
`ChartTooltipSk` class, its properties, methods, and HTML template (using
`lit-html`). This is where the primary logic for data display, interaction
handling, and integration with sub-components resides.
- **`chart-tooltip-sk.scss`:** The SASS file containing the styles for the
tooltip element.
- **`index.ts`:** A simple entry point that imports and registers the
`chart-tooltip-sk` custom element.
- **`chart-tooltip-sk-demo.html` & `chart-tooltip-sk-demo.ts`:** Files for
demonstrating the tooltip's functionality. The demo sets up mock data and
`fetchMock` to simulate API responses, allowing isolated testing and
visualization of the component.
- **`BUILD.bazel`:** Defines how the element and its demo page are built,
including dependencies on other Skia Elements and Perf modules like
`anomaly-sk`, `commit-range-sk`, `triage-menu-sk`, etc.
**Workflow Example: Displaying Tooltip on Chart Point Click (Fixed Tooltip)**
```
User clicks a point on a chart
|
V
Parent Chart Component (e.g., explore-simple-sk)
1. Determines data for the clicked point (coordinates, commit, trace info).
2. Optionally fetches commit details if not already available.
3. Optionally checks its anomaly map for anomaly data.
4. Calls `chartTooltipSk.load(...)` with all relevant data,
setting `tooltipFixed = true` and providing a close button action.
5. Calls `chartTooltipSk.moveTo({x, y})` to position the tooltip.
|
V
chart-tooltip-sk
1. `load()` method populates internal properties (_test_name, _y_value, _commit_info, _anomaly, etc.).
2. `_render()` is triggered (implicitly or explicitly).
3. The lit-html template in `static template` is evaluated:
- Basic info (test name, value, date) is displayed.
- If `commit_info` is present, commit details (author, message, hash) are shown.
- If `_anomaly` is present:
- Anomaly metrics are displayed.
- If `anomaly.bug_id === 0`, `triage-menu-sk` is shown.
- If `anomaly.bug_id > 0`, bug ID is shown with an unassociate button.
- Pinpoint job links are shown if available.
- If `tooltip_fixed` is true:
- "Bisect" and "Request Trace" buttons are shown (if configured).
- `user-issue-sk` is shown (if not an anomaly).
- `json-source-sk` button/link is shown (if configured).
- The close icon is visible.
4. Child components like `commit-range-sk`, `point-links-sk`, `user-issue-sk`, `triage-menu-sk`
are updated with their respective data.
5. `moveTo()` positions the rendered `div.container` on the screen.
|
V
User interacts with buttons (e.g., "Bisect", "Triage", "Close")
|
V
chart-tooltip-sk or its child components handle the interaction
- e.g., clicking "Bisect" calls `openBisectDialog()`, which shows `bisect-dialog-sk`.
- e.g., clicking "Close" executes the `_close_button_action` passed during `load()`.
```
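As an illustration of the viewport-aware positioning that `moveTo()` performs in
step 5, a simple clamp might look like the sketch below; the function name, the
margin value, and the omission of chart-geometry handling are all assumptions.
```
// Keep a tooltip near the requested point without letting it leave the viewport.
function clampTooltipPosition(
  x: number,
  y: number,
  tooltip: HTMLElement,
  margin = 8
): { left: number; top: number } {
  const rect = tooltip.getBoundingClientRect();
  const maxLeft = window.innerWidth - rect.width - margin;
  const maxTop = window.innerHeight - rect.height - margin;
  return {
    left: Math.max(margin, Math.min(x, maxLeft)),
    top: Math.max(margin, Math.min(y, maxTop)),
  };
}

// Usage: position the tooltip near the clicked point.
// const { left, top } = clampTooltipPosition(evt.clientX, evt.clientY, tooltipDiv);
// tooltipDiv.style.left = `${left}px`;
// tooltipDiv.style.top = `${top}px`;
```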
This modular approach ensures that `chart-tooltip-sk` is a self-contained,
feature-rich component for displaying detailed contextual information and
actions related to data points in performance charts.
# Module: /modules/cid
## CID Module Documentation
This module, `/modules/cid`, provides functionality for interacting with Commit
IDs (CIDs), which are also referred to as CommitNumbers. The primary purpose of
this module is to facilitate the retrieval of detailed commit information based
on a set of commit numbers and their corresponding sources.
### Design and Implementation
The core functionality revolves around the `lookupCids` function. This function
is designed to be a simple and efficient way to fetch commit details from a
backend endpoint.
**Why Asynchronous Operations?**
The lookup of commit information involves a network request to a backend service
(`/_/cid/`). Network requests are inherently asynchronous. Therefore,
`lookupCids` returns a `Promise`. This allows the calling code to continue
execution while the commit information is being fetched and to handle the
response (or any potential errors) when it becomes available. This non-blocking
approach is crucial for maintaining a responsive user interface or efficient
server-side processing.
**Why JSON for Data Exchange?**
JSON (JavaScript Object Notation) is used as the data format for both the
request and the response.
- **Request:** The input `cids` (an array of `CommitNumber` objects) is
serialized into a JSON string and sent in the body of the HTTP POST request.
JSON is a lightweight and widely supported format, making it ideal for
client-server communication.
- **Response:** The backend endpoint is expected to return a JSON response
conforming to the `CIDHandlerResponse` type. The `jsonOrThrow` utility
(imported from `../../../infra-sk/modules/jsonOrThrow`) is used to parse
this JSON response. This utility simplifies error handling by automatically
throwing an error if the response is not valid JSON or if the HTTP request
itself fails.
**Why POST Request?**
A POST request is used instead of a GET request for sending the `cids`. While
GET requests are often used for retrieving data, they are typically limited in
the amount of data that can be sent in the URL (e.g., through query parameters).
Since the number of `cids` to look up could be large, sending them in the
request body via a POST request is a more robust and scalable approach. The
`Content-Type: application/json` header informs the server that the request body
contains JSON data.
### Key Components and Files
- **`cid.ts`**: This is the sole TypeScript file in the module and contains
the implementation of the `lookupCids` function.
- **`lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponse>`**:
- **Responsibility**: Takes an array of `CommitNumber` objects and
asynchronously fetches detailed commit information for each from the
`/_/cid/` backend endpoint.
- **How it works**:
1. It constructs an HTTP POST request to the `/_/cid/` endpoint.
2. The `cids` array is converted into a JSON string and included as the
request body.
3. Appropriate headers (`Content-Type: application/json`) are set.
4. The `fetch` API is used to make the network request.
5. The response from the server is then processed by `jsonOrThrow`. If
the request is successful and the response is valid JSON, it
resolves the promise with the parsed `CIDHandlerResponse`.
Otherwise, it rejects the promise with an error.
- **Dependencies**:
- `jsonOrThrow` (from `../../../infra-sk/modules/jsonOrThrow`): For
robust JSON parsing and error handling.
- `CommitNumber`, `CIDHandlerResponse` (from `../json`): These are
type definitions that define the structure of the input commit
identifiers and the expected response from the backend.
### Workflow: Looking up Commit IDs
The typical workflow for using this module is as follows:
```
Caller | /modules/cid/cid.ts (lookupCids) | Backend Server (/_/cid/)
---------------------------|----------------------------------|-------------------------
1. Has array of CommitNumber objects.
| |
2. Calls `lookupCids(cids)`| |
`---------------------->`| |
| 3. Serializes `cids` to JSON. |
| 4. Creates POST request with JSON body.
| `--------------------------->`| 5. Receives POST request.
| | 6. Processes `cids`.
| | 7. Generates `CIDHandlerResponse`.
| `<---------------------------`| 8. Sends JSON response.
| 9. Receives response. |
| 10. `jsonOrThrow` parses response.|
| (Throws error on failure) |
| |
11. Receives Promise that | |
resolves with | |
`CIDHandlerResponse` | |
(or rejects with error).
`<----------------------`| |
```
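Putting the pieces above together, `lookupCids` can be sketched roughly as
follows; `jsonOrThrow` is replaced by an inline check so the example stands
alone, and the response type is left as a placeholder for the real
`CIDHandlerResponse`.
```
// POST the commit numbers as a JSON array to /_/cid/ and parse the response.
type CommitNumber = number;

// The real CIDHandlerResponse type is defined in ../json; a placeholder is used here.
type CIDHandlerResponseLite = unknown;

async function lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponseLite> {
  const resp = await fetch('/_/cid/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(cids),
  });
  // Roughly what jsonOrThrow does: reject on HTTP errors, otherwise parse JSON.
  if (!resp.ok) {
    throw new Error(`Failed to look up CIDs: ${resp.status} ${resp.statusText}`);
  }
  return resp.json();
}

// Usage:
// lookupCids([64809, 64811])
//   .then((details) => console.log(details))
//   .catch((err) => console.error(err));
```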
# Module: /modules/cluster-lastn-page-sk
The `cluster-lastn-page-sk` module provides a user interface for testing and
configuring alert configurations by running them against a recent range of
commits. This allows users to "dry run" an alert to see what regressions it
would detect before saving it to run periodically.
**Core Functionality:**
The primary purpose of this module is to facilitate the iterative process of
defining effective alert configurations. Instead of deploying an alert and
waiting for it to trigger (potentially with undesirable results), users can
simulate its behavior on historical data. This helps in fine-tuning parameters
like the detection algorithm, radius, sparsity, and interestingness threshold.
**Key Components and Files:**
- **`cluster-lastn-page-sk.ts`**: This is the heart of the module, defining
the `ClusterLastNPageSk` custom element.
- **State Management**: It manages the current alert configuration
(`this.state`), the commit range (`this.domain`), and the results of the
dry run (`this.regressions`). It utilizes `stateReflector` to
potentially persist and restore parts of this state in the URL, allowing
users to share specific configurations or test setups.
- **User Interaction**: It handles user actions such as:
- Editing the alert configuration via a dialog (`alert-config-dialog`
which hosts an `alert-config-sk` element).
- Modifying the commit range using a `domain-picker-sk` element.
- Initiating the dry run (`run()` method).
- Saving the configured alert (`writeAlert()` method).
- Viewing details of detected regressions in a dialog
(`triage-cluster-dialog` which hosts a `cluster-summary2-sk` element).
- **API Communication**:
- Fetches initial data (paramset for alert configuration, default new
alert template) from `/_/initpage/` and `/_/alert/new` respectively.
- Sends the alert configuration and commit range to the `/_/dryrun/start`
endpoint to initiate the clustering and regression detection process. It
uses the `startRequest` utility from `../progress/progress` to handle
the asynchronous request and display progress.
- Sends the finalized alert configuration to `/_/alert/update` to save or
update it in the backend.
- **Rendering**: It uses `lit-html` for templating and dynamically renders
the UI based on the current state, including the controls, the progress
of a running dry run, and a table of detected regressions. The table
displays commit details (`commit-detail-sk`) and triage status
(`triage-status-sk`) for each detected regression.
- **Error Handling**: It displays error messages if the dry run or alert
saving fails.
- **`cluster-lastn-page-sk.html` (Demo Page)**: A simple HTML file that
includes the `cluster-lastn-page-sk` element and an `error-toast-sk` for
displaying global error messages. This is primarily used for demonstration
and testing purposes.
- **`cluster-lastn-page-sk-demo.ts`**: Sets up mock HTTP responses using
`fetch-mock` for the demo page. This allows the `cluster-lastn-page-sk`
element to function in isolation without needing a live backend. It mocks
endpoints like `/_/initpage/`, `/_/alert/new`, `/_/count/`, and
`/_/loginstatus/`.
- **`cluster-lastn-page-sk.scss`**: Provides the styling for the
`cluster-lastn-page-sk` element and its dialogs, ensuring a consistent look
and feel with the rest of the Perf application. It uses shared SASS
libraries for buttons and themes.
**Workflow for Testing an Alert Configuration:**
1. **Load Page**: User navigates to the page.
- `cluster-lastn-page-sk` fetches initial paramset and a default new alert
configuration.
```
User -> cluster-lastn-page-sk
cluster-lastn-page-sk -> GET /_/initpage/ (fetches paramset)
cluster-lastn-page-sk -> GET /_/alert/new (fetches default alert)
```
2. **Configure Alert**: User clicks the "Configure Alert" button.
- A dialog (`alert-config-dialog`) opens, showing `alert-config-sk`.
- User modifies alert parameters (algorithm, radius, query, etc.).
- User clicks "Accept".
- The `state` in `cluster-lastn-page-sk` is updated with the new
configuration.
```
User --clicks--> "Configure Alert" button
cluster-lastn-page-sk --shows--> alert-config-dialog
User --interacts with--> alert-config-sk
User --clicks--> "Accept"
alert-config-sk --updates--> cluster-lastn-page-sk.state
```
3. **(Optional) Adjust Commit Range**: User interacts with `domain-picker-sk`
to define the number of recent commits or a specific date range for the dry
run.
- `cluster-lastn-page-sk.domain` is updated.
4. **Run Dry Run**: User clicks the "Run" button.
- `cluster-lastn-page-sk` constructs a `RegressionDetectionRequest` using
the current alert `state` and `domain`.
- It sends this request to `/_/dryrun/start`.
- The UI shows a spinner and progress messages.
- As results (regressions) become available, they are displayed in a
table.
```
User --clicks--> "Run" button
cluster-lastn-page-sk --creates--> RegressionDetectionRequest
cluster-lastn-page-sk --POSTs to--> /_/dryrun/start (with request body)
(progress updates via startRequest callback)
Backend --processes & clusters-->
Backend --sends progress/results--> cluster-lastn-page-sk
cluster-lastn-page-sk --updates--> UI (regressions table, status messages)
```
5. **Review Results**: User examines the table of regressions.
- Each row shows a commit and the regressions (low/high) found at that
commit.
- User can click on a regression to open a `triage-cluster-dialog`
(showing `cluster-summary2-sk`) for more details.
- From the summary dialog, user can open related traces in the explorer
view.
6. **Iterate or Save**:
- If results are not satisfactory, user goes back to step 2 to adjust the
alert configuration and re-runs.
- If results are satisfactory, user clicks "Create Alert" (or "Update
Alert" if modifying an existing one).
   - `cluster-lastn-page-sk` sends the current alert `state` to
     `/_/alert/update`.
   ```
   User --clicks--> "Create Alert" / "Update Alert" button
   cluster-lastn-page-sk --POSTs to--> /_/alert/update (with alert config)
   Backend --saves/updates alert-->
   Backend --responds with ID--> cluster-lastn-page-sk
   cluster-lastn-page-sk --updates--> UI (button text might change to "Update Alert")
   ```
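For orientation, the dry-run kickoff in step 4 can be sketched with plain
`fetch`. The real page uses the `startRequest` helper from
`../progress/progress` (whose signature is not shown here), and the request
body shape below is a simplification, not the actual
`RegressionDetectionRequest` definition.
```
// Start a dry run of the current alert configuration over the chosen domain.
async function startDryRun(alertConfig: unknown, domain: unknown): Promise<Response> {
  // Assumed body shape: the alert config plus the commit/date domain.
  const body = JSON.stringify({ alert: alertConfig, domain: domain });
  const resp = await fetch('/_/dryrun/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body,
  });
  if (!resp.ok) {
    throw new Error(`Dry run failed to start: ${resp.status}`);
  }
  // The response identifies the long-running job that startRequest would poll
  // until the clustering finishes and regressions are returned.
  return resp;
}
```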
**Design Decisions:**
- **Client-Side Dry Run Initiation**: The "dry run" is initiated from the
client, sending the full alert configuration. This allows immediate feedback
and iteration without needing to first save an incomplete or experimental
alert to the backend.
- **Component-Based UI**: The UI is built using custom elements (e.g.,
`alert-config-sk`, `domain-picker-sk`, `cluster-summary2-sk`). This promotes
modularity, reusability, and separation of concerns.
- **Asynchronous Operations with Progress**: Long-running operations like the
dry run are handled asynchronously with visual feedback (spinners, status
messages) provided by the `../progress/progress` utility, enhancing user
experience.
- **State Reflection**: Using `stateReflector` allows parts of the page's
state (like the alert configuration) to be encoded in the URL. This is
useful for sharing specific test scenarios or bookmarking them.
- **Dialogs for Focused Interaction**: Modal dialogs are used for alert
configuration and viewing regression summaries, preventing users from
interacting with the main page content while these tasks are in progress,
thus guiding their focus.
- **Mocking for Demo/Testing**: The demo page
(`cluster-lastn-page-sk-demo.ts`) heavily relies on `fetch-mock`. This
enables isolated development and testing of the UI component without a
backend dependency, which is crucial for frontend unit/integration tests and
local development.
# Module: /modules/cluster-page-sk
The `cluster-page-sk` module provides the user interface for Perf's trace
clustering functionality. This allows users to identify groups of traces that
exhibit similar behavior, which is crucial for understanding performance
regressions or improvements across different configurations and tests.
**Core Functionality and Design:**
The primary goal of this page is to allow users to define a set of traces and
then apply a clustering algorithm to them. The "why" behind this is to simplify
the analysis of large datasets by grouping related performance changes. Instead
of manually inspecting hundreds or thousands of individual traces, users can
focus on a smaller number of clusters, each representing a distinct performance
pattern.
The "how" involves several key components:
1. **Defining the Scope of Analysis:**
- **Commit Selection:** Users first select a central commit around which
the analysis will be performed. This is handled by
`commit-detail-picker-sk`. The clustering will typically look at commits
before and after this selected point. The `state.offset` property stores
the selected commit's offset.
- **Query:** Users define the set of traces to consider using a query
string. This is managed by `query-sk` and `paramset-sk`. The
`state.query` holds this query. The `query-count-sk` element provides
feedback on how many traces match the current query.
- **Time Range/Commit Radius:** Users can specify a "radius" (in terms of
number of commits) around the selected commit to include in the
analysis. This is stored in `state.radius`.
2. **Clustering Algorithm and Parameters:**
- **Algorithm Selection:** Users can choose the clustering algorithm
(e.g., k-means). This is facilitated by `algo-select-sk` and stored in
`state.algo`. The choice of algorithm impacts how clusters are formed
and what "similarity" means.
- **Number of Clusters (K):** For algorithms like k-means, the user can
suggest the number of clusters to find. A value of 0 typically means the
server will try to determine an optimal K. This is stored in `state.k`.
- **Interestingness Threshold:** Users can define a threshold for what
constitutes an "interesting" cluster, often based on the magnitude of
regression or step size. This is `state.interesting`.
- **Sparse Data Handling:** An option (`state.sparse`) allows users to
indicate if the data is sparse, meaning not all traces have data points
for all commits. This affects how the clustering algorithm processes
missing data.
3. **Executing the Clustering and Displaying Results:**
- **Initiating the Request:** The "Run" button triggers the clustering
process. The `start()` method constructs a `RegressionDetectionRequest`
object containing all the user-defined parameters. This request is sent
to the `/_/cluster/start` endpoint.
- **Background Processing and Progress:** Clustering can be a long-running
operation. The module uses the `progress` utility to manage the
asynchronous request. It displays a spinner (`spinner-sk`) and status
messages (`ele.status`, `ele.runningStatus`) to keep the user informed.
The `requestId` property tracks the active request.
- **Displaying Clusters:** Once the server responds, the
`RegressionDetectionResponse` contains a list of `FullSummary` objects.
Each `FullSummary` represents a discovered cluster. These are rendered
using multiple `cluster-summary2-sk` elements. This component is
responsible for visualizing the details of each cluster, including its
member traces and regression information.
- **Sorting Results:** Users can sort the resulting clusters by various
metrics (size, regression score, etc.) using `sort-sk`.
**State Management:**
The `cluster-page-sk` component maintains its internal state in a `State`
object. This includes user selections like the query, commit offset, algorithm,
and various parameters. Crucially, this state is reflected in the URL using the
`stateReflector` utility. This design decision ensures that:
- The page is bookmarkable: Users can save and share URLs that directly lead
to a specific clustering configuration and its results.
- Browser history (back/forward buttons) works as expected.
- The application state is serializable and easily reproducible.
The `stateHasChanged()` method is called whenever a piece of the state is
modified, triggering the `stateReflector` to update the URL and potentially
re-render the component.
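A condensed sketch of that wiring is shown below, assuming the usual
`(getState, setState) => stateHasChanged` shape of the infra-sk helper; the
import paths and the subset of `State` fields are illustrative.
```
import { stateReflector } from '../../../infra-sk/modules/stateReflector';
import { HintableObject } from '../../../infra-sk/modules/hintable';

// A subset of the page state that gets reflected into the URL.
let state = {
  query: '',
  offset: -1,
  algo: 'kmeans',
  radius: 2,
};

// stateReflector serializes `state` into the URL and calls the second callback
// when the URL changes (initial load, back/forward navigation). It returns the
// function to invoke whenever the in-memory state changes.
const stateHasChanged = stateReflector(
  () => state as unknown as HintableObject,
  (newState) => {
    state = newState as unknown as typeof state;
    // Re-render the page from the restored state here.
  }
);

// Example: after the user edits the query in query-sk.
state.query = 'config=gpu';
stateHasChanged();
```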
**Key Files and Their Roles:**
- **`cluster-page-sk.ts`:** This is the main TypeScript file defining the
`ClusterPageSk` custom element. It orchestrates all the sub-components,
manages the application state, handles user interactions (e.g., button
clicks, input changes), makes API calls for clustering, and renders the
results. It defines the overall layout and logic of the clustering page.
- **`cluster-page-sk.html` (inferred, as it's a LitElement):** The HTML
template is defined within `cluster-page-sk.ts` using `lit-html`. This
template structures the page, embedding various custom elements for commit
selection, query building, algorithm choice, and result display.
- **`cluster-page-sk.scss`:** Provides the specific styling for the
`cluster-page-sk` element and its layout, ensuring a consistent look and
feel.
- **`index.ts`:** A simple entry point that imports and registers the
`cluster-page-sk` custom element, making it available for use in HTML.
- **`cluster-page-sk-demo.ts` & `cluster-page-sk-demo.html`:** These files set
up a demonstration page for the `cluster-page-sk` element.
`cluster-page-sk-demo.ts` uses `fetch-mock` to simulate API responses,
allowing the component to be developed and tested in isolation without
needing a live backend. This is crucial for rapid development and ensuring
the UI behaves correctly under various backend scenarios.
- **`State` class (within `cluster-page-sk.ts`):** Defines the structure of
the data that is persisted in the URL and drives the component's behavior.
It encapsulates all user-configurable options for the clustering process.
**Workflow Example: Performing a Cluster Analysis**
```
User Interaction | Component/State Change | Backend Interaction
-----------------------------------------|-------------------------------|---------------------
1. User navigates to the cluster page. | `ClusterPageSk` initializes. | Fetches initial paramset (`/_/initpage/`)
| `stateReflector` initializes |
| from URL or defaults. |
| |
2. User selects a commit. | `commit-detail-picker-sk` | (Potentially fetches commit details if not cached)
| emits `commit-selected`. |
| `state.offset` updates. |
| `stateHasChanged()` called. |
| |
3. User types a query (e.g., "config=gpu").| `query-sk` emits | (Potentially `/_/count/` to update trace count)
| `query-change`. |
| `state.query` updates. |
| `stateHasChanged()` called. |
| |
4. User selects an algorithm (e.g., kmeans).| `algo-select-sk` emits |
| `algo-change`. |
| `state.algo` updates. |
| `stateHasChanged()` called. |
| |
5. User adjusts advanced parameters | Input elements update |
(K, radius, interestingness). | corresponding `state` props. |
| `stateHasChanged()` called. |
| |
6. User clicks "Run". | `start()` method is called. | POST to `/_/cluster/start` with `RegressionDetectionRequest`
| `requestId` is set. | (This is a long-running request)
| Spinner becomes active. |
| |
7. Page periodically updates status. | `progress` utility polls for | GET requests to check progress.
| updates. |
| `ele.runningStatus` updates. |
| |
8. Clustering completes. | `progress` utility resolves. | Final response from `/_/cluster/start` (or progress endpoint)
| `summaries` array is populated| containing `RegressionDetectionResponse`.
| with cluster data. |
| `requestId` is cleared. |
| Spinner stops. |
| |
9. Results are displayed. | `ClusterPageSk` re-renders, |
| showing `cluster-summary2-sk` |
| elements for each cluster. |
```
This workflow highlights how user inputs are translated into state changes,
which then drive API requests and ultimately update the UI to present the
clustering results. The separation of concerns among various sub-components (for
query, commit selection, etc.) makes the main `cluster-page-sk` element more
manageable.
# Module: /modules/cluster-summary2-sk
The `cluster-summary2-sk` module provides a custom HTML element for displaying
detailed information about a cluster of performance test results. This includes
visualizing the trace data, showing regression statistics, and allowing users to
triage the cluster.
**Core Functionality and Design:**
The primary purpose of this element is to present a comprehensive summary of a
performance cluster. It aims to provide all necessary information for a user to
understand the nature of a performance change (regression or improvement) and
take appropriate action (e.g., filing a bug, marking it as expected).
Key design considerations include:
- **Data Visualization:** A `plot-simple-sk` element is used to display the
centroid trace of the cluster over time. This visual representation helps
users quickly grasp the trend and identify the point of change. An "x-bar"
can be displayed on the plot to highlight the specific commit where a step
change is detected.
- **Statistical Summary:** The element displays key statistics about the
cluster, such as its size, the regression factor, step size, and least
squares error. The labels and formatting of these statistics dynamically
adapt based on the `StepDetection` algorithm used (e.g., 'absolute',
'percent', 'mannwhitneyu'). This ensures that the presented information is
relevant and interpretable for the specific detection method.
- **Commit Details:** Integration with `commit-detail-panel-sk` allows users
to view details of the commit associated with the detected step point or any
selected point on the trace plot. This is crucial for correlating
performance changes with specific code modifications.
- **Triaging:** If not disabled via the `notriage` attribute, the element
includes a `triage2-sk` component. This allows authenticated users with
"editor" privileges to set the triage status (e.g., "positive", "negative",
"untriaged") and add a message. This functionality is essential for tracking
the investigation and resolution of performance issues.
- **Contextual Actions:** Buttons are provided to:
- "View on dashboard": Opens the current cluster view in a broader
explorer context, pre-filling relevant parameters like shortcut ID and
time range.
- "Word Cloud": Toggles the visibility of a `word-cloud-sk` element, which
displays a summary of the parameters that make up the traces in the
cluster. This helps in understanding the common characteristics of the
affected tests.
- A permalink is generated to directly link to the triage page for the
specific step point.
- **Interactive Exploration:** The `commit-range-sk` component allows users to
define a range around the detected step or a selected commit, facilitating
further investigation within the Perf application.
**Key Components and Their Roles:**
- **`cluster-summary2-sk.ts`**: This is the main TypeScript file defining the
`ClusterSummary2Sk` custom element.
- **`ClusterSummary2Sk` class:** Extends `ElementSk` and manages the
element's state, rendering, and event handling.
- **Data Properties (`full_summary`, `triage`, `alert`):** These
properties receive the core data for the cluster. When `full_summary` is
set, it triggers the rendering of the plot, statistics, and commit
details. The `alert` property determines the labels and formatting for
regression statistics. The `triage` property reflects the current triage
state.
- **Template (`template` static method):** Uses `lit-html` to define the
element's structure, binding data to various sub-components and display
areas.
- **Event Handling:**
- `open-keys`: Fired when the "View on dashboard" button is clicked,
providing details for opening the explorer.
- `triaged`: Fired when the triage status is updated, containing the
new status and the relevant commit information.
- `trace_selected`: Handles events from `plot-simple-sk` when a point
on the graph is clicked, triggering a lookup for the corresponding
commit details.
- **Helper Methods:**
- `statusClass()`: Determines the CSS class for the regression display
based on the severity (e.g., "high", "low").
- `permaLink()`: Generates a URL to the triage page focused on the
step point.
- `lookupCids()` (static): A static method (delegating to
`../cid/cid.ts`) used to fetch commit details based on commit
numbers.
  - **`labelsForStepDetection`:** A crucial constant object that maps
    different `StepDetection` algorithm names (e.g., 'percent',
    'mannwhitneyu', 'absolute') to specific labels and number formatting
    functions for the regression statistics. This ensures that the displayed
    information is meaningful and correctly interpreted for the algorithm
    used to detect the cluster. A sketch of its shape appears after this list.
- **`cluster-summary2-sk.html` (template, rendered by
`cluster-summary2-sk.ts`):** Defines the visual layout using HTML and
embedded custom elements. It uses a CSS grid for positioning the main
sections: regression summary, statistics, plot, triage status, commit
details, actions, and word cloud.
- **`cluster-summary2-sk.scss`**: Provides the styling for the element. It
defines how different sections are displayed, including styles for
regression severity (e.g., red for "high" regressions, green for "low"),
button appearances, and responsive behavior (hiding the plot on smaller
screens).
- **`cluster-summary2-sk-demo.html` and `cluster-summary2-sk-demo.ts`**: These
files set up a demonstration page for the `cluster-summary2-sk` element. The
`.ts` file provides mock data for `FullSummary`, `Alert`, and `TriageStatus`
to populate the demo instances of the element. It also demonstrates how to
listen for the `triaged` and `open-keys` custom events.
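The following is a hedged sketch of the shape of `labelsForStepDetection` referenced earlier in this list; the keys match algorithm names mentioned in this document, while the label text and formatters are purely illustrative:

```
// Illustrative only: the real labels and formatting live in
// cluster-summary2-sk.ts.
type StepLabels = {
  stepSize: string; // Label for the step size statistic.
  lse: string; // Label for the least squares error.
  format: (n: number) => string; // Number formatter for these statistics.
};

const labelsForStepDetection: Record<string, StepLabels> = {
  absolute: {
    stepSize: 'Absolute change',
    lse: 'Least Squares Error',
    format: (n) => n.toPrecision(3),
  },
  percent: {
    stepSize: 'Percent change',
    lse: 'Least Squares Error',
    format: (n) => `${(100 * n).toFixed(1)}%`,
  },
};
```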
**Workflows:**
1. **Initialization and Data Display:**
- The host application provides `full_summary` (containing cluster data
and trace frame), `alert` (details of the alert that triggered this
cluster), and optionally `triage` (current triage status) properties to
the `cluster-summary2-sk` element.
- `set full_summary()`:
- Updates internal `summary` and `frame` data.
- Populates dataset attributes for sorting (e.g., `data-clustersize`).
- Clears and redraws the `plot-simple-sk` with the centroid trace from
`summary.centroid` and time labels from `frame.dataframe.header`.
- If a step point is identified and the status is not "Uninteresting",
an x-bar is placed on the plot at the corresponding commit.
- `lookupCids` is called to fetch and display details for the commit
at the step point in `commit-detail-panel-sk`.
- `set alert()`:
- Updates the `labels` used for displaying regression statistics based
on `alert.step` and `labelsForStepDetection`.
- `set triage()`:
- Updates the `triageStatus` and re-renders the triage controls.
- The element renders based on the provided data, displaying statistics,
plot, commit details, and triage controls.
```
Host Application cluster-summary2-sk
---------------- -------------------
[Set full_summary data] --> Process data
|
+-> plot-simple-sk (Draws trace)
|
+-> commit-detail-panel-sk (Shows step commit)
|
+-> Display stats (regression, size, etc.)
[Set alert data] ---------> Update regression labels/formatters
[Set triage data] --------> Update triage2-sk state
```
2. **User Triage:**
- User interacts with `triage2-sk` (selects status) and the message input
field.
- User clicks the "Update" button.
- `update()` method is called:
      - A `ClusterSummary2SkTriagedEventDetail` object is created
containing the `step_point` (as `columnHeader`) and the current
`triageStatus`.
- A `triaged` custom event is dispatched with this detail.
- The host application listens for the `triaged` event to persist the
triage status.
```
User cluster-summary2-sk Host Application
---- ------------------- ----------------
Selects status ----> [triage2-sk updates value]
Types message ----> [Input updates value]
Clicks "Update" ---> update()
|
+-> Creates TriagedEventDetail
|
+-> Dispatches "triaged" event --> Listens and handles event
(e.g., saves to backend)
```
3. **Viewing on Dashboard:**
- User clicks the "View on dashboard" button.
- `openShortcut()` method is called:
      - A `ClusterSummary2SkOpenKeysEventDetail` object is created with the
`shortcut` ID, `begin` and `end` timestamps from the frame, and the
`step_point` as `xbar`.
- An `open-keys` custom event is dispatched.
- The host application listens for `open-keys` and navigates the user to
the explorer view with the provided parameters.
```
User cluster-summary2-sk Host Application
---- ------------------- ----------------
Clicks "View on dash" --> openShortcut()
|
+-> Creates OpenKeysEventDetail
|
+-> Dispatches "open-keys" event --> Listens and handles event
(e.g., navigates to explorer)
```
The `cluster-summary2-sk` element plays a vital role in the Perf frontend by
providing a focused and interactive view for analyzing individual performance
regressions or improvements identified through clustering. Its integration with
plotting, commit details, and triaging makes it a key tool for performance
analysis workflows.
# Module: /modules/commit-detail-panel-sk
## Commit Detail Panel SK
**High-level Overview:**
The `commit-detail-panel-sk` module provides a custom HTML element
`<commit-detail-panel-sk>` designed to display a list of commit details. It
offers functionality to make these commit entries selectable and emits an event
when a commit is selected. This component is primarily used in user interfaces
where users need to browse and interact with a sequence of commits.
**Why and How:**
The core purpose of this module is to present commit information in a structured
and interactive way. Instead of simply displaying raw commit data, it leverages
the `commit-detail-sk` element (an external dependency) to render each commit
with relevant information like author, message, and a link to the commit.
The design decision to make commits selectable (via the `selectable` attribute)
enhances user interaction. When a commit is clicked in "selectable" mode, it
triggers a `commit-selected` custom event. This event carries detailed
information about the selected commit, including its index in the list, a
concise description, and the full commit object. This allows parent components
or applications to react to user selections and perform actions based on the
chosen commit (e.g., loading further details, navigating to a specific state).
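For example, a parent component might react to the selection like this (a minimal sketch; the event name and detail fields follow the description in this document):

```
// Minimal sketch: reacting to a selection made inside <commit-detail-panel-sk>.
const panel = document.querySelector('commit-detail-panel-sk')!;
panel.addEventListener('commit-selected', (e: Event) => {
  const detail = (e as CustomEvent).detail;
  // detail.selected: index of the chosen commit in the panel's list.
  // detail.description: a concise human-readable summary of the commit.
  // detail.commit: the full Commit object.
  console.log(`Commit #${detail.selected} selected:`, detail.description);
});
```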
The implementation uses the Lit library for templating and rendering. The commit
data is provided via the `details` property, which expects an array of `Commit`
objects (defined in `perf/modules/json`). The component dynamically generates
table rows for each commit.
The visual appearance is controlled by `commit-detail-panel-sk.scss`. It defines
styles for the panel, including highlighting the selected row and adjusting
opacity based on the `selectable` state. The styling aims for a clean and
readable presentation of commit information.
A `hide` property is also available to conditionally show or hide the entire
commit list. This is useful for scenarios where the panel's visibility needs to
be controlled dynamically by the parent application.
**Key Components/Files:**
- **`commit-detail-panel-sk.ts`**: This is the heart of the module. It defines
the `CommitDetailPanelSk` class, which extends `ElementSk`.
- **Responsibilities**:
- Manages the list of `Commit` objects (`_details` property).
- Renders the list of commits as an HTML table using Lit templates
(`template` and `rows` static methods).
- Handles user clicks on table rows (`_click` method).
- When a commit is selected (and the `selectable` attribute is present),
it dispatches the `commit-selected` custom event with relevant commit
data.
- Manages the `selectable`, `selected`, and `hide` attributes and their
corresponding properties, re-rendering the component when these change.
- Integrates the `commit-detail-sk` element to display individual commit
details within each row.
- **`commit-detail-panel-sk.scss`**: This file contains the SASS styles for
the component.
- **Responsibilities**:
- Defines the visual appearance of the commit panel, including link
colors, table cell padding, and selected row highlighting.
- Adjusts the opacity and cursor style based on whether the panel is
`selectable`.
- Leverages theme variables (e.g., `--primary`, `--surface-1dp`) from
`//perf/modules/themes:themes_sass_lib` for consistent theming.
- **`commit-detail-panel-sk-demo.ts` and `commit-detail-panel-sk-demo.html`**:
These files provide a demonstration page for the component.
- **Responsibilities**:
- Illustrate how to use the `<commit-detail-panel-sk>` element in an HTML
page.
- Show examples of the component in both selectable and non-selectable
states, and in light/dark themes.
- Demonstrate how to provide commit data to the `details` property and how
to listen for the `commit-selected` event.
- **`index.ts`**: A simple entry point that imports and registers the
`commit-detail-panel-sk` custom element, making it available for use.
- **`BUILD.bazel`**: Defines how the module is built and its dependencies. For
instance, it declares `commit-detail-sk` as a runtime dependency and Lit as
a TypeScript dependency.
- **`commit-detail-panel-sk_puppeteer_test.ts`**: Contains Puppeteer tests to
verify the component's rendering and basic functionality.
**Key Workflows:**
1. **Initialization and Rendering:**
```
Parent Application --> Sets 'details' property of <commit-detail-panel-sk> with Commit[]
|
V
commit-detail-panel-sk.ts --> _render() is called
|
V
Lit template generates <table>
|
V
For each Commit in 'details':
Generates <tr> containing <commit-detail-sk .cid=Commit>
```
2. **Commit Selection (when `selectable` is true):**
   - User clicks on a `<tr>` in the `<commit-detail-panel-sk>`.
   - `commit-detail-panel-sk.ts`: the `_click(event)` handler is invoked.
   - It determines the clicked commit's index and data.
   - It sets the `selected` attribute/property to the index of the clicked
     commit.
   - It dispatches a `commit-selected` CustomEvent with
     `{ selected: index, description: string, commit: Commit }`.
   - The parent application listens for the `commit-selected` event and
     processes `event.detail`.
The design favors declarative attribute-based configuration (e.g., `selectable`,
`selected`) and event-driven communication for user interactions, which are
common patterns in web component development.
# Module: /modules/commit-detail-picker-sk
The `commit-detail-picker-sk` module provides a user interface element for
selecting a specific commit from a range of commits. It's designed to be a
reusable component that simplifies the process of commit selection within
applications that need to interact with commit histories.
**Core Functionality and Design:**
The primary purpose of `commit-detail-picker-sk` is to allow users to browse and
select a commit. This is achieved by presenting a button that, when clicked,
opens a dialog.
- **Button as Entry Point:** The button displays a summary of the currently
selected commit (author and message) or a default message like "Choose a
commit." This provides immediate context to the user. Clicking this button
triggers the opening of the selection dialog. `[Button: "Author - Commit
Message"] --- (click) ---> [Dialog Opens]`
- **Dialog for Selection:** The dialog is the main interaction point for
choosing a commit. It contains:
- `commit-detail-panel-sk`: This submodule is responsible for displaying
the list of commits fetched from the backend. Users can click on a
commit in this panel to select it.
- **Date Range Selection:** A `day-range-sk` component allows users to
specify a time window for fetching commits. This is crucial for
performance and usability, as it prevents loading an overwhelming number
    of commits at once. When the date range changes, the component
    automatically fetches the relevant commits: `day-range-sk` (date range
    change) --> fetch commits for the new range --> `commit-detail-panel-sk`
    updates.
- **Spinner:** A `spinner-sk` element provides visual feedback to the user
while commits are being fetched, indicating that an operation is in
progress.
- **Close Button:** Allows the user to dismiss the dialog without making a
selection or after a selection is made.
**Data Flow and State Management:**
1. **Initialization:** When the component is first loaded, it initializes with
a default date range (typically the last 24 hours). It then fetches the
commits within this initial range.
2. **Fetching Commits:** The component makes a POST request to the
`/_/cidRange/` endpoint. The request body includes the `begin` and `end`
timestamps of the desired range and optionally the `offset` of a currently
selected commit (to ensure it's included in the results if it falls outside
   the new range). The flow is: user action (e.g., a date range change) -->
   `commit-detail-picker-sk` constructs a `RangeRequest` (`{begin, end,
   offset}`) and POSTs it to `/_/cidRange/` --> receives a `Commit[]` array -->
   updates the internal `details` array --> `commit-detail-panel-sk` re-renders
   with the new commit list. (A fetch sketch follows this list.)
3. **Commit Selection:**
   - When a user selects a commit in the `commit-detail-panel-sk`, the panel
     emits a `commit-selected` event.
   - `commit-detail-picker-sk` listens for this event and updates its internal
     `selected` index.
   - The dialog is then closed, and the main button's text updates to reflect
     the new selection.
   - Crucially, `commit-detail-picker-sk` itself emits a `commit-selected`
     event. This allows parent components to react to the user's choice. The
     detail of this event is of type
     `CommitDetailPanelSkCommitSelectedDetails`, containing information about
     the selected commit.
   - In short: `commit-detail-panel-sk` handles the internal click and emits an
     internal `commit-selected` event --> `commit-detail-picker-sk` handles it,
     updates the `selected` index, updates the button text, closes the dialog,
     and emits the external `commit-selected` event.
4. **External Selection (`selection` property):** The component exposes a
`selection` property (of type `CommitNumber`). If this property is set
externally, the component will attempt to fetch commits around that
`CommitNumber` and pre-select it in the panel.
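A hedged sketch of the fetch described in step 2 above; the endpoint and the `begin`/`end`/`offset` fields come from this document, while the function shape and TypeScript types are assumptions:

```
// Hedged sketch: fetch commits for a time window, keeping the currently
// selected commit (by offset) in the results.
async function fetchCommitRange(
  beginSec: number,
  endSec: number,
  selectedOffset: number
): Promise<unknown[]> {
  const resp = await fetch('/_/cidRange/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ begin: beginSec, end: endSec, offset: selectedOffset }),
  });
  return resp.json(); // Expected to be a Commit[] array.
}
```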
**Key Files and Responsibilities:**
- **`commit-detail-picker-sk.ts`:** This is the core TypeScript file defining
the `CommitDetailPickerSk` custom element.
- **Why:** It orchestrates the interactions between the button, dialog,
`commit-detail-panel-sk`, and `day-range-sk`. It handles fetching commit
data, managing the selection state, and emitting the final
`commit-selected` event.
- **How:** It uses the Lit library for templating and rendering. It
defines methods for opening/closing the dialog (`open()`, `close()`),
handling range changes (`rangeChange()`), updating the commit list
(`updateCommitSelections()`), and processing selections from the panel
(`panelSelect()`). The `selection` getter/setter allows for programmatic
control of the selected commit.
- **`commit-detail-picker-sk.scss`:** Contains the SASS/CSS styles for the
component.
- **Why:** To provide a consistent visual appearance and layout for the
button and the dialog, ensuring it integrates well with the overall
application theme (e.g., light and dark modes via CSS variables like
`--on-background`, `--background`).
- **How:** It styles the `dialog` element, the buttons within it, and
ensures proper display and spacing of child components like
`day-range-sk`.
- **`commit-detail-picker-sk-demo.html` & `commit-detail-picker-sk-demo.ts`:**
These files provide a demonstration page for the component.
- **Why:** To showcase the component's functionality in isolation, making
it easier to test and understand its usage. The demo also includes
examples for light and dark themes.
- **How:** The HTML sets up basic page structure and placeholders for the
component. The TypeScript file initializes instances of
`commit-detail-picker-sk`, mocks the backend API call (`/_/cidRange/`)
using `fetch-mock` to provide sample commit data, and sets up an event
listener to display the `commit-selected` event details.
- **Dependencies:**
- `commit-detail-panel-sk`: Used within the dialog to list and allow
selection of individual commits. `commit-detail-picker-sk` passes the
fetched `details` (array of `Commit` objects) to this panel.
- `day-range-sk`: Used to allow the user to define the time window for
which commits should be fetched. Its `day-range-change` event triggers a
refetch in the picker.
- `spinner-sk`: Provides visual feedback during data loading.
- `ElementSk`: Base class from `infra-sk` providing common custom element
functionality.
- `jsonOrThrow`: Utility for parsing JSON responses and throwing an error
if parsing fails or the response is not OK.
- `errorMessage`: Utility for displaying error messages to the user.
The design focuses on encapsulation: the `commit-detail-picker-sk` component
manages its internal state (current range, fetched commits, selected index) and
exposes a clear interface for interaction (a button to open, a `selection`
property, and a `commit-selected` event). This makes it easy to integrate into
larger applications that require users to pick a commit from a potentially large
history.
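A minimal integration sketch under those assumptions (property and event names as described above; the commit number is arbitrary):

```
// Minimal sketch: integrate the picker, pre-select a commit, and react to the
// user's choice.
const picker = document.querySelector('commit-detail-picker-sk') as HTMLElement & {
  selection: number;
};
picker.selection = 12345; // An arbitrary CommitNumber; the picker fetches commits around it.
picker.addEventListener('commit-selected', (e: Event) => {
  const detail = (e as CustomEvent).detail; // CommitDetailPanelSkCommitSelectedDetails
  console.log('User picked commit:', detail.commit);
});
```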
# Module: /modules/commit-detail-sk
## commit-detail-sk
The `commit-detail-sk` module provides a custom HTML element
`<commit-detail-sk>` designed to display concise information about a single
commit. This element is crucial for user interfaces where presenting commit
details in a structured and interactive manner is necessary.
### Why
In applications dealing with version control systems, there's often a need to
display details of individual commits. This could be for reviewing changes,
navigating commit history, or linking to related actions like exploring code
changes, viewing clustered data, or triaging issues associated with a commit.
The `commit-detail-sk` element encapsulates this functionality, offering a
reusable and consistent way to present commit information.
### How
The core of the module is the `CommitDetailSk` class, which extends `ElementSk`.
This class defines the structure and behavior of the `<commit-detail-sk>`
element.
**Key Responsibilities and Components:**
- **`commit-detail-sk.ts`**: This is the heart of the module.
- It defines the `CommitDetailSk` custom element.
- The element takes a `Commit` object (defined in `perf/modules/json`) as
input via the `cid` property. This object contains details like the
commit hash, author, message, timestamp, and URL.
- The `template` function, using `lit-html`, defines the HTML structure of
the element. It displays:
- A truncated commit hash.
- The commit author.
- The time elapsed since the commit (human-readable, via `diffDate`).
- The commit message.
- It also renders a set of Material Design outlined buttons: "Explore",
"Cluster", "Triage", and "Commit". These buttons are intended to
navigate the user to different views or actions related to the specific
commit. The links for these buttons are dynamically generated based on
the commit hash and the `cid.url`.
- The `openLink` method handles the click events on these buttons, opening
the respective links in a new browser window/tab.
- `upgradeProperty` is used to ensure that the `cid` property is correctly
initialized if it's set before the element is fully connected to the
DOM.
- **`commit-detail-sk.scss`**: This file contains the styling for the
`<commit-detail-sk>` element.
- It defines styles for the layout, typography, and appearance of the
commit information and the action buttons.
- It utilizes CSS variables for theming (e.g., `--blue`, `--primary`),
allowing the component to adapt to different visual themes (light and
dark mode, as demonstrated in the demo).
- It includes styles from `//perf/modules/themes:themes_sass_lib` and
`//elements-sk/modules:colors_sass_lib` to ensure consistency with the
broader application's design system.
- **`commit-detail-sk-demo.html` and `commit-detail-sk-demo.ts`**: These files
provide a demonstration page for the `<commit-detail-sk>` element.
- The HTML sets up basic page structure and includes instances of
`<commit-detail-sk>` in both light and dark mode contexts.
- The TypeScript file initializes these demo elements with sample `Commit`
data. It also simulates a click on the element to potentially reveal
more details or actions if such functionality were implemented (though
in the current version, the "tip" div with buttons is always visible).
The `Date.now` function is mocked to ensure consistent output for the
`diffDate` calculation in the demo and tests.
**Workflow Example: Displaying Commit Information and Actions**
```
1. Application provides a `Commit` object.
e.g., { hash: "abc123...", author: "user@example.com", ... }
2. The `Commit` object is assigned to the `cid` property of a `<commit-detail-sk>` element.
<commit-detail-sk .cid=${commitData}></commit-detail-sk>
3. `CommitDetailSk` element renders:
[abc123...] - [user@example.com] - [2 days ago] - [Commit message]
+----------------------------------------------------------------+
| [Explore] [Cluster] [Triage] [Commit (link to commit source)] | <- Action buttons
+----------------------------------------------------------------+
4. User clicks an action button (e.g., "Explore").
5. `openLink` method is called with a generated URL (e.g., "/g/e/abc123...").
6. A new browser tab opens to the specified URL.
```
This design promotes reusability and separation of concerns. The element focuses
solely on presenting commit information and providing relevant action links,
making it easy to integrate into various parts of an application that need to
display commit details. The use of `lit-html` for templating allows for
efficient rendering and updates.
# Module: /modules/commit-range-sk
The `commit-range-sk` module provides a custom HTML element,
`<commit-range-sk>`, designed to display a link representing a range of commits
within a Git repository. This functionality is particularly useful in
performance analysis tools where identifying the specific commits that
introduced a performance regression or improvement is crucial.
**Core Functionality and Design:**
The primary purpose of `commit-range-sk` is to dynamically generate a URL that
points to a commit range viewer (e.g., a Git web interface like Gerrit or
GitHub). This URL is constructed based on a "begin" and an "end" commit.
- **Identifying the Commit Range:**
- The element takes a `trace` (an array of numerical data points, where
each point corresponds to a commit), a `commitIndex` (the index within
the `trace` array that represents the "end" commit of interest), and
`header` information (which maps trace indices to commit metadata like
`offset` or commit number).
- The "end" commit is directly determined by the `commitIndex` and the
`header`.
- The "begin" commit is found by iterating backward from the
`commitIndex - 1` in the `trace`. It skips over any entries marked with
`MISSING_DATA_SENTINEL` (indicating commits for which there's no data
point) until it finds a valid previous commit.
- This logic ensures that the range always spans from a commit with actual
data to the target commit, even if there are intermediate commits with
missing data.
- **Converting Commit Numbers to Hashes:**
- The commit range URL template, configured globally via
`window.perf.commit_range_url`, typically requires Git commit hashes
(SHAs) rather than internal commit numbers or offsets.
- The `commit-range-sk` element uses a `commitNumberToHashes` function to
perform this conversion.
- The default implementation, `defaultcommitNumberToHashes`, makes an
      asynchronous call to a backend service (likely `/_/cid/`) by
      invoking `lookupCids` from the `//perf/modules/cid:cid_ts_lib` module. This
service is expected to return the commit hashes corresponding to the
provided commit numbers.
- This design allows for testability by enabling the replacement of
`commitNumberToHashes` with a mock function during testing (as seen in
`commit-range-sk_test.ts`).
- **URL Construction and Display:**
- Once the "begin" and "end" commit numbers are identified and their
corresponding hashes are retrieved, the element populates the
`window.perf.commit_range_url` template. This template usually contains
placeholders like `{begin}` and `{end}` which are replaced with the
actual commit hashes.
- The displayed text for the link is also dynamically generated. If the
"begin" and "end" commits are not consecutive (i.e., there's at least
one commit between them, or the "begin" commit had to skip missing data
points), the text will show a range like "`<begin_offset + 1> -
<end_offset>`". Otherwise, it will just show the "`<end_offset>`". The
`+1` for the begin offset in a range is to ensure the displayed range
starts _after_ the last known good commit.
- The element supports two display modes controlled by the `showLinks`
property:
- If `showLinks` is `false` (default, or when the element is merely
hovered over in some UIs), only the text representing the commit(s) is
displayed.
- If `showLinks` is `true`, a fully formed hyperlink (`<a>` tag) is
rendered.
**Key Components/Files:**
- **`commit-range-sk.ts`**: This is the core file defining the `CommitRangeSk`
custom element.
- It extends `ElementSk`, a base class for custom elements in the Skia
infrastructure.
- It manages the state of the component through properties like `_trace`,
`_commitIndex`, `_header`, `_url`, `_text`, and `_commitIds`.
- The `recalcLink()` method is central to its operation. It's triggered
whenever relevant input properties (`trace`, `commitIndex`, `header`)
change. This method orchestrates the process of finding commit IDs,
converting them to hashes, and generating the URL and display text.
- `setCommitIds()` implements the logic for determining the start and end
commit numbers based on the input trace and header, handling missing
data points.
- It uses the `lit/html` library for templating, allowing for efficient
rendering and updates to the DOM.
- **`commit-range-sk-demo.ts` and `commit-range-sk-demo.html`**: These files
provide a demonstration page for the `commit-range-sk` element.
- `commit-range-sk-demo.ts` sets up a mock environment, including mocking
    the `fetch` call to `/_/cid/` using `fetch-mock`. This is crucial for
demonstrating the element's behavior without needing a live backend.
- It also initializes the global `window.perf` object with necessary
configuration, such as the `commit_range_url` template.
- It then instantiates the `<commit-range-sk>` element and populates its
properties to showcase its functionality.
- **`commit-range-sk_test.ts`**: This file contains unit tests for the
`CommitRangeSk` element.
- It utilizes `chai` for assertions and `setUpElementUnderTest` for easy
instantiation of the element in a test environment.
  - A key testing strategy involves overriding the `commitNumberToHashes`
    method on the element instance to provide controlled hash values and
    assert the correctness of the generated URL and text, especially in
    scenarios involving `MISSING_DATA_SENTINEL` (a minimal sketch of this
    override follows this list).
- **`BUILD.bazel`**: Defines how the module is built, its dependencies (e.g.,
`//infra-sk/modules/ElementSk`, `//perf/modules/json`, `lit`), and how the
demo page and tests are structured.
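A minimal sketch of that override strategy (names follow this document; the property shapes in the type cast are assumptions):

```
// Hedged test sketch: stub out the hash lookup so no backend is needed, then
// assert on the link the element produces.
const ele = document.createElement('commit-range-sk') as HTMLElement & {
  commitNumberToHashes: (cids: number[]) => Promise<string[]>;
  trace: number[];
  commitIndex: number;
  header: unknown[];
};
ele.commitNumberToHashes = async () => ['hash_for_C3', 'hash_for_C4'];
// ...set trace/commitIndex/header as in the workflow below, then assert that
// the rendered link contains both stubbed hashes.
```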
**Workflow Example: Generating a Commit Range Link**
1. **Initialization:**
- The application using `<commit-range-sk>` sets the global
`window.perf.commit_range_url` (e.g.,
`"http://example.com/range/{begin}/{end}"`).
- The `<commit-range-sk>` element is added to the DOM.
2. **Property Setting:**
- The application provides data to the element:
- `element.trace = [10, MISSING_DATA_SENTINEL, 12, 15];`
     - `element.header = [{offset: C1}, {offset: C2}, {offset: C3}, {offset: C4}];`
       (where C1-C4 are commit numbers)
     - `element.commitIndex = 3;` (points to the data `15` and commit `C4`)
- `element.showLinks = true;`
3. **`recalcLink()` Triggered:**
- Changing any of the above properties automatically calls `recalcLink()`.
4. **Determine Commit IDs (`setCommitIds()`):**
- End commit: `header[commitIndex].offset` => `C4`.
- Previous commit search:
- Start at `commitIndex - 1 = 2`. `trace[2]` is `12` (not missing).
So, `header[2].offset` => `C3`.
- `_commitIds` becomes `[C3, C4]`.
5. **Check if Range (`isRange()`):**
- Is `C3 + 1 === C4`? Let's assume `C3` and `C4` are not consecutive
(e.g., `C3=100`, `C4=102`). `isRange()` returns `true`.
- Text becomes: `"${C3 + 1} - ${C4}"` (e.g., `"101 - 102"`).
6. **Convert Commit IDs to Hashes (`commitNumberToHashes`):**
- `commitNumberToHashes([C3, C4])` is called.
   - Internally, this likely makes a POST request to `/_/cid/` with `[C3, C4]`.
- Backend returns: `{ commitSlice: [{hash: "hash_for_C3"}, {hash:
"hash_for_C4"}] }`.
- The function resolves with `["hash_for_C3", "hash_for_C4"]`.
7. **Construct URL:**
- `url = window.perf.commit_range_url` (e.g.,
`"http://example.com/range/{begin}/{end}"`)
- `url = url.replace('{begin}', "hash_for_C3")`
- `url = url.replace('{end}', "hash_for_C4")`
- `_url` becomes `"http://example.com/range/hash_for_C3/hash_for_C4"`.
8. **Render:**
   - Since `showLinks` is true, the template becomes:
     `<a href="http://example.com/range/hash_for_C3/hash_for_C4" target="_blank">101 - 102</a>`
   - The element updates its content with this HTML.
This workflow demonstrates how `commit-range-sk` encapsulates the logic for
finding relevant commits, converting their identifiers, and presenting a
user-friendly link to explore changes between them, abstracting away the
complexities of interacting with commit data and URL templates.
# Module: /modules/common
## Common Module
The `common` module houses utility functions and data structures that are shared
across various parts of the Perf application, particularly those related to data
visualization and testing. Its primary purpose is to promote code reuse and
maintain consistency in how data is processed and displayed.
### Responsibilities and Key Components
The module's responsibilities can be broken down into the following areas:
1. **Plot Data Construction and Formatting**:
- **Why**: Visualizing performance data often involves transforming raw
data into formats suitable for charting libraries (like Google Charts).
This process needs to be standardized to ensure plots are consistent and
correctly represent the underlying information.
- **How**:
- `plot-builder.ts`: This file is central to preparing data for
plotting.
- `convertFromDataframe`: This function is crucial for adapting data
organized in a `DataFrame` structure (where traces are rows) into a
format suitable for Google Charts, which typically expects data in
columns. It essentially transposes the `TraceSet`. The `domain`
parameter allows specifying whether the x-axis should represent
commit positions, dates, or both, providing flexibility in how
time-series data is visualized.
```
Input DataFrame (TraceSet):
TraceA: [val1, val2, val3]
TraceB: [valA, valB, valC]
Header: [commit1, commit2, commit3]
convertFromDataframe (domain='commit') ->
Output for Google Chart:
["Commit Position", "TraceA", "TraceB"]
[commit1_offset, val1, valA ]
[commit2_offset, val2, valB ]
[commit3_offset, val3, valC ]
```
- `ConvertData`: This function takes a `ChartData` object, which is a
more abstract representation of plot data (lines with x, y
coordinates and labels), and transforms it into the specific
array-of-arrays format required by Google Charts. This abstraction
allows other parts of the application to work with `ChartData`
without needing to know the exact details of the charting library's
input format.
```
Input ChartData:
xLabel: "Time"
lines: {
"Line1": [{x: t1, y: v1}, {x: t2, y: v2}],
"Line2": [{x: t1, y: vA}, {x: t2, y: vB}]
}
ConvertData ->
Output for Google Chart:
["Time", "Line1", "Line2"]
[t1, v1, vA ]
[t2, v2, vB ]
```
- `mainChartOptions` and `SummaryChartOptions`: These functions
provide pre-configured option objects for Google Line Charts. They
encapsulate common styling and behavior (like colors, axis
formatting, tooltip behavior, and null interpolation) to ensure a
consistent look and feel for different types of charts (main detail
charts vs. summary overview charts). This avoids repetitive
configuration and makes it easier to maintain visual consistency.
The options are also designed to adapt to the current theme
(light/dark mode) by using CSS custom properties.
- `defaultColors`: A predefined array of colors used for chart series,
ensuring a consistent and visually distinct palette.
2. **Plotting Utilities**:
- **Why**: Beyond basic data transformation, there are common tasks
related to preparing data specifically for plotting, such as associating
anomalies with data points and handling missing values.
- **How**:
- `plot-util.ts`: This file contains helper functions that build upon
`plot-builder.ts`.
- `CreateChartDataFromTraceSet`: This function serves as a
higher-level constructor for `ChartData`. It takes a raw `TraceSet`
(a dictionary where keys are trace identifiers and values are arrays
of numbers), corresponding x-axis labels (commit numbers or dates),
the desired x-axis format, and anomaly information. It then iterates
through the traces, constructs `DataPoint` objects (which include x,
y, and any associated anomaly), and organizes them into the
`ChartData` structure. A key aspect is its handling of
`MISSING_DATA_SENTINEL` to exclude missing points from the chart
data, relying on the charting library's interpolation. It also uses
`findMatchingAnomaly` to link anomalies to their respective data
points.
```
Input TraceSet:
"trace_foo": [10, 12, MISSING_DATA_SENTINEL, 15]
xLabels: [c1, c2, c3, c4]
Anomalies: { "trace_foo": [{x: c2, y: 12, anomaly: {...}}] }
CreateChartDataFromTraceSet ->
Output ChartData:
lines: {
"trace_foo": [
{x: c1, y: 10, anomaly: null},
{x: c2, y: 12, anomaly: {...}},
// Point for c3 is skipped due to MISSING_DATA_SENTINEL
{x: c4, y: 15, anomaly: null}
]
}
...
```
- `findMatchingAnomaly`: A utility to efficiently check if a given
data point (identified by its trace key, x-coordinate, and
y-coordinate) corresponds to a known anomaly. This is used by
`CreateChartDataFromTraceSet` to enrich data points with anomaly
details.
3. **Test Utilities**:
- **Why**: Writing effective unit and integration tests, as well as
creating demo pages, often requires mock data and simulated API
responses. Centralizing these test utilities avoids duplication and
makes tests easier to write and maintain.
- **How**:
- `test-util.ts`: This file provides functions to set up a common
testing and demo environment.
- `setUpExploreDemoEnv`: This is a comprehensive function that uses
`fetch-mock` to intercept various API calls that are typically made
by Perf frontend components (e.g., explore page, alert details). It
returns predefined, static responses for endpoints like
`/_/login/status`, `/_/initpage/...`, `/_/count/`, `/_/frame/start`,
`/_/defaults/`, `/_/status/...`, `/_/cid/`, `/_/details/`,
`/_/shortcut/get`, `/_/nextParamList/`, and `/_/shortcut/update`.
- The purpose of mocking these endpoints is to allow frontend
components to be tested or demonstrated in isolation, without
requiring a live backend. The mocked data is designed to be
representative of real API responses, enabling realistic testing
scenarios. For example, it provides sample `paramSet` data,
`DataFrame` structures, commit information, and default
configurations. This ensures that components relying on these API
calls behave predictably in a test or demo environment. The function
also checks for a `proxy_endpoint` cookie to avoid mocking if a real
backend is being proxied for development or demo purposes.
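A hedged demo-page sketch; the import path, and the assumption that `setUpExploreDemoEnv` takes no arguments, are not confirmed by this document:

```
// Install the mocked Perf endpoints before the component under demo/test is
// created, so its initial fetches hit fetch-mock instead of a live backend.
import { setUpExploreDemoEnv } from '../common/test-util';

setUpExploreDemoEnv();
document.body.appendChild(document.createElement('explore-simple-sk'));
```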
# Module: /modules/const
The `/modules/const` module serves as a centralized repository for constants
utilized throughout the Perf UI. Its primary purpose is to ensure consistency
and maintainability by providing a single source of truth for values that are
shared across different parts of the user interface.
A key design decision behind this module is to manage values that might also be
defined in the backend. This avoids potential discrepancies and ensures that
frontend and backend systems operate with the same understanding of specific
sentinel values or configurations.
The core responsibility of this module is to define and export these shared
constants.
One of the key components is the `const.ts` file. This file contains the actual
definitions of the constants. A notable constant defined here is
`MISSING_DATA_SENTINEL`.
The `MISSING_DATA_SENTINEL` constant (value: `1e32`) is critical for
representing missing data points within traces. The backend uses this specific
floating-point value to indicate that a sample is absent. The choice of `1e32`
is deliberate. JSON, the data interchange format used, does not natively support
`NaN` (Not a Number) or infinity values (`+/- Inf`). Therefore, a valid
`float32` that has a compact JSON representation and is unlikely to clash with
actual data values was chosen. It is imperative that this frontend constant
remains synchronized with the `MissingDataSentinel` constant defined in the
backend Go package `//go/vec32/vec`. This synchronization ensures that both the
UI and the backend correctly interpret missing data.
Any part of the Perf UI that needs to interpret or display trace data,
especially when dealing with potentially incomplete datasets, will rely on this
`MISSING_DATA_SENTINEL`. For instance, charting libraries or data table
components might use this constant to visually differentiate missing points or
to exclude them from calculations.
Workflow involving `MISSING_DATA_SENTINEL`:

1. Backend data generation: the data contains `MissingDataSentinel` from
   `//go/vec32/vec`.
2. Data serialization (JSON): `1e32` is used for missing data.
3. Frontend data fetching.
4. Frontend UI component (e.g., a chart): the UI uses `MISSING_DATA_SENTINEL`
   from `/modules/const/const.ts` to identify missing points.
5. Appropriate rendering (e.g., a gap in a line chart, a specific placeholder
   in a table).
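For example, a component might skip missing points before computing a statistic (a minimal sketch; the import path follows the file layout described above):

```
import { MISSING_DATA_SENTINEL } from '../const/const';

// An example trace where the third sample is missing.
const trace = [10, 12, MISSING_DATA_SENTINEL, 15];

// Drop missing samples before computing a summary statistic for display.
const present = trace.filter((y) => y !== MISSING_DATA_SENTINEL);
const mean = present.reduce((a, b) => a + b, 0) / (present.length || 1);
console.log(`mean of present samples: ${mean}`);
```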
# Module: /modules/csv
The `/modules/csv` module provides functionality to convert `DataFrame` objects,
a core data structure representing performance or experimental data, into the
Comma Separated Values (CSV) format. This conversion is essential for users who
wish to export data for analysis in external tools, spreadsheets, or for
archival purposes.
The primary challenge in converting a `DataFrame` to CSV lies in representing
the potentially sparse and varied parameter sets associated with each trace
(data series) in a flat, tabular format. The `DataFrame` stores traces indexed
by a "trace ID," which is a string encoding of key-value pairs representing the
parameters that uniquely identify that trace.
The conversion process addresses this challenge through a multi-step approach:
1. **Parameter Key Consolidation**:
- The `parseIdsIntoParams` function takes an array of trace IDs and
transforms each ID string back into its constituent key-value parameter
pairs. This is achieved by leveraging the `fromKey` function from the
`//perf/modules/paramtools` module.
- The `allParamKeysSorted` function then iterates through all these parsed
parameter sets to identify the complete, unique set of all parameter
keys present across all traces. These keys are then sorted
alphabetically. This sorted list of unique parameter keys will form the
initial set of columns in the CSV, ensuring a consistent order and
comprehensive representation of all parameters.
_Pseudocode for parameter key consolidation:_
```
traceIDs = ["key1=valueA,key2=valueB", "key1=valueC,key3=valueD"]
parsedParams = {}
for each id in traceIDs:
parsedParams[id] = fromKey(id) // e.g., {"key1=valueA,key2=valueB": {key1:"valueA", key2:"valueB"}}
allKeys = new Set()
for each params in parsedParams.values():
for each key in params.keys():
allKeys.add(key)
sortedColumnNames = sorted(Array.from(allKeys)) // e.g., ["key1", "key2", "key3"]
```
2. **Header Row Generation**:
- The `dataFrameToCSV` function begins by constructing the header row of
the CSV.
- This row starts with the `sortedColumnNames` derived in the previous
step.
- It then appends column headers derived from the `DataFrame`'s `header`
property. Each element in `df.header` typically represents a point in
time (or a commit, build, etc.), and its `timestamp` field is converted
into an ISO 8601 formatted date string.
_Pseudocode for header row generation:_
```
csvHeader = sortedColumnNames
for each columnHeader in df.header:
csvHeader.push(new Date(columnHeader.timestamp * 1000).toISOString())
csvLines.push(csvHeader.join(','))
```
3. **Data Row Generation**:
- For each trace in the `df.traceset` (excluding "special\_" traces, which
are likely internal or metadata traces not intended for direct CSV
export):
- The corresponding parameter values for the `sortedColumnNames` are
retrieved. If a trace does not have a value for a particular
parameter key, an empty string is used, ensuring that each row has
the same number of columns corresponding to the parameter keys.
- The actual data points for the trace are then appended. The
`MISSING_DATA_SENTINEL` (defined in `//perf/modules/const`) is a
special value indicating missing data; this is converted to an empty
string in the CSV to represent a null or missing value. Other
numerical values are appended directly.
- Each fully constructed row is then joined by commas.
_Pseudocode for data row generation:_
```
for each traceId, traceData in df.traceset:
if traceId starts with "special_":
continue
traceParams = parsedParams[traceId]
rowData = []
for each columnName in sortedColumnNames:
rowData.push(traceParams[columnName] or "") // Add parameter value or empty string
for each value in traceData:
if value is MISSING_DATA_SENTINEL:
rowData.push("")
else:
rowData.push(value)
csvLines.push(rowData.join(','))
```
4. **Final CSV String Assembly**:
- Finally, all the generated lines (header and data rows) are joined
together with newline characters (`\n`) to produce the complete CSV
string.
The design prioritizes creating a CSV that is both human-readable and easily
parsable by other tools. By dynamically determining the parameter columns based
on the input `DataFrame` and sorting them, it ensures that all relevant trace
metadata is included in a consistent manner. The explicit handling of
`MISSING_DATA_SENTINEL` ensures that missing data is represented clearly as
empty fields.
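A minimal usage sketch; the signature (taking a `DataFrame` and returning the CSV string) is inferred from this description, and the download handling is illustrative:

```
import { dataFrameToCSV } from '../csv/index';
import { DataFrame } from '../json';

declare const df: DataFrame; // Assumed: the DataFrame currently displayed.

// Convert the DataFrame and offer the result as a file download.
const csv = dataFrameToCSV(df);
const url = URL.createObjectURL(new Blob([csv], { type: 'text/csv' }));
const a = document.createElement('a');
a.href = url;
a.download = 'traces.csv';
a.click();
URL.revokeObjectURL(url);
```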
The key files in this module are:
- `index.ts`: This file contains the core logic for the CSV conversion. It
houses the `parseIdsIntoParams`, `allParamKeysSorted`, and the main
`dataFrameToCSV` functions. It leverages helper functions from
`//perf/modules/paramtools` for parsing trace ID strings and relies on
constants from `//perf/modules/const` for identifying missing data.
- `index_test.ts`: This file provides unit tests for the `dataFrameToCSV`
function. It defines a sample `DataFrame` with various scenarios, including
different parameter sets per trace and missing data points, and asserts that
the generated CSV matches the expected output. This is crucial for ensuring
the correctness and robustness of the CSV generation logic.
The dependencies on `//perf/modules/const` (for `MISSING_DATA_SENTINEL`) and
`//perf/modules/json` (for `DataFrame`, `ColumnHeader`, `Params` types) indicate
that this module is tightly integrated with the broader data representation and
handling mechanisms of the Perf system. The dependency on
`//perf/modules/paramtools` (for `fromKey`) highlights its role in interpreting
the structured information encoded within trace IDs.
# Module: /modules/dataframe
The `dataframe` module is designed to manage and manipulate time-series data,
specifically performance testing traces, within the Perf application. It
provides a centralized way to fetch, store, and process trace data, enabling
functionalities like visualizing performance trends, identifying anomalies, and
managing user-reported issues.
The core idea is to have a reactive data repository that components can consume.
This allows for efficient data loading and updates, especially when dealing with
large datasets and dynamic time ranges. Instead of each component fetching and
managing its own data, they can rely on a shared `DataFrameRepository` to handle
these tasks. This promotes consistency and reduces redundant data fetching.
## Key Components and Responsibilities
### `dataframe_context.ts`
This file defines the `DataFrameRepository` class, which acts as the central
data store and manager. It's implemented as a LitElement
(`<dataframe-repository-sk>`) that doesn't render any UI itself but provides
data and loading states through Lit contexts.
**Why a LitElement with Contexts?** Using a LitElement allows easy integration
into the existing component-based architecture. Lit contexts (`@lit/context`)
provide a clean and reactive way for child components to consume the `DataFrame`
and related information without prop drilling or complex event bus
implementations.
**Core Functionalities:**
- **Data Fetching:**
- `resetTraces(range, paramset)`: Fetches an initial set of traces based
on a time range and a `ParamSet` (a set of key-value pairs defining the
traces to query). This is typically called when the user defines a new
query. `User defines query -> explore-simple-sk calls resetTraces() | V
DataFrameRepository -> Fetches data from /_/frame/start | V Updates
internal _header, _traceset, anomaly, userIssues | V Provides DataFrame,
DataTable, AnomalyMap, UserIssueMap via context`
- `extendRange(offsetInSeconds)`: Fetches additional data to extend the
current time range, either forwards or backwards. This is used for
infinite scrolling or when the user wants to see more data. To improve
performance for large range extensions, it slices the requested range
into smaller chunks (`chunkSize`) and fetches them concurrently. `User
scrolls/requests more data -> UI calls extendRange() | V
DataFrameRepository -> Slices range into chunks if needed | V Fetches
data for each chunk from /_/frame/start concurrently | V Merges new data
with existing _header, _traceset, anomaly | V Provides updated
DataFrame, DataTable, AnomalyMap via context`
- The fetching mechanism uses the `/_/frame/start` endpoint, sending a
`FrameRequest` which includes the time range, query (derived from
`ParamSet`), and timezone.
- It handles responses, including potential errors or "Finished" status
with no data (e.g., no commits in the requested range).
- **Data Caching and Merging:**
- Maintains an internal representation of the data: `_header` (array of
`ColumnHeader` objects, representing commit points/timestamps) and
`_traceset` (a `TraceSet` object mapping trace keys to their data
arrays).
- When new data is fetched (either initial load or extension), it's merged
with the existing cached data. The merging logic ensures that headers
are correctly ordered and trace data is appropriately prepended or
appended. If a trace being extended isn't present in a new data chunk,
it's padded with `MISSING_DATA_SENTINEL` to maintain alignment with the
header.
- **Anomaly Management:**
- Fetches anomaly data (`AnomalyMap`) along with the trace data.
- `updateAnomalies(anomalies, id)`: Allows merging new anomalies and
removing specific anomalies (e.g., when an anomaly is nudged or
re-triaged). This uses `mergeAnomaly` and `removeAnomaly` from
`index.ts`.
- **User-Reported Issue Management:**
- `getUserIssues(traceKeys, begin, end)`: Fetches user-reported issues
(e.g., Buganizer bugs linked to specific data points) from the
`/_/user_issues/` endpoint for a given set of traces and commit range.
- `updateUserIssue(traceKey, commitPosition, bugId)`: Updates the local
cache of user issues, typically after a new issue is filed or an
existing one is modified.
- Trace keys are normalized by removing special functions (e.g., `norm()`)
before querying for user issues to ensure issues are found even if the
displayed trace is a transformed version of the original.
- **Google DataTable Conversion:**
- Converts the internal `DataFrame` into a
`google.visualization.DataTable` format using `convertFromDataframe`
(from `perf/modules/common:plot-builder_ts_lib`). This `DataTable` is
then provided via `dataTableContext` and is typically consumed by
charting components like `<plot-google-chart-sk>`.
- The Google Chart library is loaded asynchronously
(`DataFrameRepository.loadPromise`).
- **State Management:**
- `loading`: A boolean provided via `dataframeLoadingContext` to indicate
if a data request is in flight.
- `_requestComplete`: A Promise that resolves when the current data
fetching operation completes. This can be used to coordinate actions
that depend on data being available.
**Contexts Provided:**
- `dataframeContext`: Provides the current `DataFrame` object.
- `dataTableContext`: Provides the `google.visualization.DataTable` derived
from the `DataFrame`.
- `dataframeAnomalyContext`: Provides the `AnomalyMap` for the current data.
- `dataframeUserIssueContext`: Provides the `UserIssueMap` for the current
data.
- `dataframeLoadingContext`: Provides a boolean indicating if data is
currently being loaded.
- `dataframeRepoContext`: Provides the `DataFrameRepository` instance itself,
allowing consumers to call its methods (e.g., `extendRange`).
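A hedged consumer sketch using the contexts listed above; the import paths and decorator usage are assumptions rather than details taken from the source:

```
import { LitElement, html } from 'lit';
import { consume } from '@lit/context';
import { dataframeContext, dataframeLoadingContext } from '../dataframe/dataframe_context';
import { DataFrame } from '../json';

// Displays how many traces the shared repository currently holds.
export class TraceCountSk extends LitElement {
  @consume({ context: dataframeContext, subscribe: true })
  private dataframe?: DataFrame;

  @consume({ context: dataframeLoadingContext, subscribe: true })
  private loading = false;

  render() {
    const n = Object.keys(this.dataframe?.traceset ?? {}).length;
    return html`<span>${this.loading ? 'loading...' : `${n} traces`}</span>`;
  }
}
customElements.define('trace-count-sk', TraceCountSk);
```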
### `index.ts`
This file contains utility functions for manipulating `DataFrame` structures,
similar to its Go counterpart (`//perf/go/dataframe/dataframe.go`). These
functions are crucial for merging, slicing, and analyzing the data.
**Key Functions:**
- `findSubDataframe(header, range, domain)`: Given a `DataFrame` header and a
time/offset range, this function finds the start and end indices within the
header that correspond to the given range. This is essential for slicing
data.
- `generateSubDataframe(dataframe, range)`: Creates a new `DataFrame`
  containing only the data within the specified index range of the original
  `DataFrame`. A combined usage sketch appears after this list.
- `mergeAnomaly(anomaly1, ...anomalies)`: Merges multiple `AnomalyMap` objects
into a single one. If anomalies exist for the same trace and commit, the
later ones in the arguments list will overwrite earlier ones. It always
returns a non-null `AnomalyMap`.
- `removeAnomaly(anomalies, id)`: Creates a new `AnomalyMap` excluding any
anomalies with the specified `id`. This is used when an anomaly is moved or
re-triaged on the backend, and the old entry needs to be cleared.
- `findAnomalyInRange(allAnomaly, range)`: Filters an `AnomalyMap` to include
only anomalies whose commit positions fall within the given commit range.
- `mergeColumnHeaders(a, b)`: Merges two arrays of `ColumnHeader` objects,
producing a new sorted array of unique headers. It also returns mapping
objects (`aMap`, `bMap`) that indicate the new index of each header from the
original arrays. This is fundamental for the `join` operation.
- **Why map objects?** When merging traces from two DataFrames, the data
points need to be placed at the correct positions in the newly merged
header. The maps provide this correspondence.
- `join(a, b)`: Combines two `DataFrame` objects into a new one.
1. It first merges their headers using `mergeColumnHeaders`.
2. Then, it creates a new `traceset`. For each trace in the original
DataFrames, it uses the `aMap` and `bMap` to place the trace data points
into the correct slots in the new, longer trace arrays, filling gaps
with `MISSING_DATA_SENTINEL`.
3. It also merges the `paramset` from both DataFrames.
4. **Purpose:** This is useful when combining data from different sources
or different time periods that might not perfectly align.
- `buildParamSet(d)`: Reconstructs the `paramset` of a `DataFrame` based on
the keys present in its `traceset`. This ensures the `paramset` accurately
reflects the data.
- `timestampBounds(df)`: Returns the earliest and latest timestamps present in
the `DataFrame`'s header.
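A hedged sketch combining `findSubDataframe` and `generateSubDataframe` as described above; the exact shapes of the range argument, the domain value, and the return type are assumptions:

```
import { findSubDataframe, generateSubDataframe } from '../dataframe/index';
import { DataFrame } from '../json';

declare const df: DataFrame; // Assumed: the DataFrame held by the repository.

// Find the header indices covering a commit-offset window, then slice the
// DataFrame down to that window before rendering.
const indexRange = findSubDataframe(df.header!, { begin: 1100, end: 1200 }, 'commit');
const zoomed = generateSubDataframe(df, indexRange);
```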
### `traceset.ts`
This file provides utility functions for extracting and formatting information
from the trace keys within a `DataFrame` or `DataTable`. Trace keys are strings
that encode various parameters (e.g.,
`",benchmark=Speedometer,test=MotionMark,"`).
**Key Functions:**
- `getAttributes(df)`: Extracts all unique attribute keys (e.g., "benchmark",
"test") present across all trace keys in a `DataFrame`.
- `getTitle(dt)`: Identifies the common key-value pairs across all trace
labels in a `DataTable`. These common pairs form the "title" of the chart,
representing what all displayed traces have in common.
- **Why `DataTable` input?** This function is often used directly with the
`DataTable` that feeds a chart, as column labels in the `DataTable` are
typically the trace keys.
- `getLegend(dt)`: Identifies the key-value pairs that are _not_ common across
all trace labels in a `DataTable`. These differing parts form the "legend"
for each trace, distinguishing them from one another.
- It ensures that all legend objects have the same set of keys (sorted
alphabetically), filling in missing values with `"untitled_key"` for
consistency in display.
- `titleFormatter(title)`: Formats the output of `getTitle` (an object) into a
human-readable string, typically by joining values with '/'.
- `legendFormatter(legend)`: Formats the output of `getLegend` (an array of
objects) into an array of human-readable strings.
- `getLegendKeysTitle(label)`: Takes a legend object (for a single trace) and
creates a string by joining its keys, often used as a title for the legend
section.
- `isSingleTrace(dt)`: Checks if a `DataTable` contains data for only a single
trace (i.e., has 3 columns: domain, commit position/date, and one trace).
- `findTraceByLabel(dt, legendTraceId)`: Finds the column label (trace key) in
a `DataTable` that matches the given `legendTraceId`.
- `findTracesForParam(dt, paramKey, paramValue)`: Finds all trace labels in a
`DataTable` that contain a specific key-value pair.
- `removeSpecialFunctions(key)`: A helper used internally to strip function
wrappers (like `norm(...)`) from trace keys before processing, ensuring that
the underlying parameters are correctly parsed.
**Design Rationale for Title/Legend Generation:** When multiple traces are
plotted, the title should reflect what's common among them (e.g.,
"benchmark=Speedometer"), and the legend should highlight what's different
(e.g., "test=Run1" vs. "test=Run2"). These functions automate this process by
analyzing the trace keys.
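The following self-contained sketch re-implements that common/differing split
on plain trace-key strings, purely for illustration. It is not the
`traceset.ts` code and omits details such as the `untitled_key` fill-in and
`removeSpecialFunctions` handling.
```
// Parse a structured trace key like ",benchmark=Speedometer,test=Run1," into
// a parameter object.
function parseKey(key: string): Record<string, string> {
  const params: Record<string, string> = {};
  key
    .split(',')
    .filter((s) => s !== '')
    .forEach((pair) => {
      const [k, v] = pair.split('=');
      params[k] = v;
    });
  return params;
}

// Split the parameters into the part common to every trace (the title) and
// the per-trace remainder (the legend).
function splitTitleAndLegend(keys: string[]) {
  const parsed = keys.map(parseKey);
  const title: Record<string, string> = {};
  Object.entries(parsed[0]).forEach(([k, v]) => {
    if (parsed.every((p) => p[k] === v)) {
      title[k] = v;
    }
  });
  const legend = parsed.map((p) =>
    Object.fromEntries(Object.entries(p).filter(([k]) => !(k in title)))
  );
  return { title, legend };
}

// splitTitleAndLegend([
//   ',benchmark=Speedometer,test=Run1,',
//   ',benchmark=Speedometer,test=Run2,',
// ]) => { title: { benchmark: 'Speedometer' },
//         legend: [{ test: 'Run1' }, { test: 'Run2' }] }
```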
## Workflows
### Initial Data Load and Display
```
1. User navigates to a page or submits a query.
|
V
2. <explore-simple-sk> (or similar component) determines initial time range and ParamSet.
|
V
3. Calls `dataframeRepository.resetTraces(initialRange, initialParamSet)`.
|
V
4. DataFrameRepository:
a. Sets `loading = true`.
b. Constructs `FrameRequest`.
c. POSTs to `/_/frame/start`.
d. Receives `FrameResponse` (containing DataFrame and AnomalyMap).
e. Updates its internal `_header`, `_traceset`, `anomaly`.
f. Calls `setDataFrame()`:
i. Updates `this.dataframe` (triggers `dataframeContext`).
ii. Converts DataFrame to `google.visualization.DataTable`.
iii. Updates `this.data` (triggers `dataTableContext`).
g. Updates `this.anomaly` (triggers `dataframeAnomalyContext`).
h. Sets `loading = false`.
|
V
5. Charting components (consuming `dataTableContext`) re-render with the new data.
|
V
6. Other UI elements (consuming `dataframeContext`, `dataframeAnomalyContext`) update.
```
### Extending Time Range (e.g., Scrolling)
```
1. User action triggers a request to load more data (e.g., scrolls near edge of chart).
|
V
2. UI component calls `dataframeRepository.extendRange(offsetInSeconds)`.
|
V
3. DataFrameRepository:
a. Sets `loading = true`.
b. Calculates the new time range (`deltaRange`).
c. Slices the new range into chunks if `offsetInSeconds` is large (`sliceRange`).
d. For each chunk:
i. Constructs `FrameRequest`.
ii. POSTs to `/_/frame/start`.
e. `Promise.all` awaits all chunk responses.
f. Filters out empty/error responses and sorts responses by timestamp.
g. Merges `header` and `traceset` from sorted responses into existing `_header` and `_traceset`.
- For traceset: pads with `MISSING_DATA_SENTINEL` if a trace is missing in a new chunk.
h. Merges `anomalymap` from sorted responses into existing `anomaly`.
i. Calls `setDataFrame()` (as in initial load).
j. Sets `loading = false`.
|
V
4. Charting components and other UI elements update.
```
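A minimal sketch of the padding performed in step 3g above, assuming a
simplified `TraceSet` type and sentinel value; the real merge in
`DataFrameRepository` also reconciles headers and anomaly maps.
```
const MISSING_DATA_SENTINEL = 1e32; // assumed value

type TraceSet = { [key: string]: number[] };

// Append a newly fetched chunk to the existing traceset. Traces missing from
// either side are padded with MISSING_DATA_SENTINEL so every trace stays the
// same length as the merged header.
function appendChunk(
  existing: TraceSet,
  existingLength: number,
  chunk: TraceSet,
  chunkLength: number
): TraceSet {
  const out: TraceSet = {};
  const keys = new Set([...Object.keys(existing), ...Object.keys(chunk)]);
  keys.forEach((key) => {
    const head =
      existing[key] ?? new Array<number>(existingLength).fill(MISSING_DATA_SENTINEL);
    const tail =
      chunk[key] ?? new Array<number>(chunkLength).fill(MISSING_DATA_SENTINEL);
    out[key] = head.concat(tail);
  });
  return out;
}
```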
### Displaying Chart Title and Legend
```
1. Charting component (e.g., <perf-explore-sk>) has access to the `DataTable` via `dataTableContext`.
|
V
2. It calls `getTitle(dataTable)` and `getLegend(dataTable)` from `traceset.ts`.
|
V
3. It then uses `titleFormatter` and `legendFormatter` to get displayable strings.
|
V
4. Renders these strings as the chart title and legend series.
```
## Testing
- `dataframe_context_test.ts`: Tests the `DataFrameRepository` class. It uses
`fetch-mock` to simulate API responses from `/_/frame/start` and
`/_/user_issues/`. Tests cover initialization, data loading (`resetTraces`),
range extension (`extendRange`) with and without chunking, anomaly merging,
and user issue fetching/updating.
- `index_test.ts`: Tests the utility functions in `index.ts`, such as
`mergeColumnHeaders`, `join`, `findSubDataframe`, `mergeAnomaly`, etc. It
uses manually constructed `DataFrame` objects to verify the logic of these
data manipulation functions.
- `traceset_test.ts`: Tests the functions in `traceset.ts` for extracting
titles and legends from trace keys. It generates `DataFrame` objects with
various key combinations, converts them to `DataTable` (requiring Google
Chart API to be loaded), and then asserts the output of `getTitle`,
`getLegend`, etc.
- `test_utils.ts`: Provides helper functions for tests, notably:
- `generateFullDataFrame`: Creates mock `DataFrame` objects with specified
structures, which is invaluable for setting up consistent test
scenarios.
- `generateAnomalyMap`: Creates mock `AnomalyMap` objects linked to a
`DataFrame`.
- `mockFrameStart`: A utility to easily mock the `/_/frame/start` endpoint
with `fetch-mock`, returning parts of a provided full `DataFrame` based
on the request's time range.
- `mockUserIssues`: Mocks the `/_/user_issues/` endpoint.
The testing strategy relies heavily on creating controlled mock data and API
responses to ensure that the data processing and fetching logic behaves as
expected under various conditions.
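As an illustration of that strategy, the sketch below mocks `/_/frame/start`
with `fetch-mock` in the style of the existing tests. Mocha globals and `chai`
are assumed to be available, and the response body is heavily abbreviated
compared to what `generateFullDataFrame`/`mockFrameStart` produce.
```
import fetchMock from 'fetch-mock';
import { assert } from 'chai';

describe('frame/start mocking (sketch)', () => {
  afterEach(() => fetchMock.restore());

  it('returns the mocked FrameResponse', async () => {
    // Abbreviated FrameResponse: just enough shape for the sketch.
    fetchMock.post('/_/frame/start', {
      dataframe: { header: [], traceset: {}, paramset: {} },
      anomalymap: {},
    });

    const resp = await fetch('/_/frame/start', { method: 'POST', body: '{}' });
    const json = await resp.json();
    assert.deepEqual(json.anomalymap, {});
  });
});
```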
# Module: /modules/day-range-sk
The `day-range-sk` module provides a custom HTML element for selecting a date
range. It allows users to pick a "begin" and "end" date, which is a common
requirement in applications that deal with time-series data or event logging.
The primary goal of this module is to offer a user-friendly way to define a time
interval. It achieves this by composing two `calendar-input-sk` elements, one
for the start date and one for the end date. This design choice leverages an
existing, well-tested component for date selection, promoting code reuse and
consistency.
**Key Components and Responsibilities:**
- **`day-range-sk.ts`**: This is the core file defining the `DayRangeSk`
custom element.
- **Why**: It encapsulates the logic for managing the begin and end dates,
handling user interactions, and emitting an event when the range
changes.
- **How**:
- It extends `ElementSk`, a base class for custom elements, providing
lifecycle callbacks and rendering capabilities.
- It uses the `lit-html` library for templating, rendering two
`calendar-input-sk` elements labeled "Begin" and "End".
- The `begin` and `end` dates are stored as attributes (and corresponding
properties) representing Unix timestamps in seconds. This is a common
and unambiguous way to represent points in time.
- When either `calendar-input-sk` element fires an `input` event
(signifying a date change), the `DayRangeSk` element updates its
corresponding `begin` or `end` attribute and then dispatches a custom
event named `day-range-change`.
- The `day-range-change` event's `detail` object contains the `begin` and
`end` timestamps, allowing parent components to easily consume the
selected range.
- Default values for `begin` and `end` are set if not provided: `begin`
defaults to 24 hours before the current time, and `end` defaults to the
current time. This provides a sensible initial state.
- The `connectedCallback` and `attributeChangedCallback` are used to
ensure the element renders correctly when added to the DOM or when its
attributes are modified.
- **`day-range-sk.scss`**: This file contains the styling for the
`day-range-sk` element.
- **Why**: To provide a consistent visual appearance and integrate with
the application's theming.
- **How**: It imports common theme variables (`themes.scss`) and defines
specific styles for the labels and input fields within the
`day-range-sk` component, ensuring they adapt to light and dark modes.
- **`day-range-sk-demo.html` and `day-range-sk-demo.ts`**: These files provide
a demonstration page for the `day-range-sk` element.
- **Why**: To showcase the element's functionality, allow for interactive
testing, and serve as an example of how to use it.
- **How**:
- The HTML file includes instances of `day-range-sk` with different
initial `begin` and `end` attributes.
- The TypeScript file listens for the `day-range-change` event from these
instances and displays the event details in a `<pre>` tag, demonstrating
how to retrieve the selected date range.
- **`day-range-sk_puppeteer_test.ts`**: This file contains Puppeteer tests for
the `day-range-sk` element.
- **Why**: To ensure the element renders correctly and behaves as expected
in a browser environment.
- **How**: It uses the `loadCachedTestBed` utility to set up a testing
environment, navigates to the demo page, and takes screenshots for
visual regression testing. It also performs a basic smoke test to
confirm the element is present on the page.
**Key Workflows:**
1. **Initialization:** `User HTML` -> `day-range-sk (attributes: begin, end)`
   -> `day-range-sk.connectedCallback()` -> `IF begin/end not set` -> `Set
   default begin (now - 24h), end (now)` -> `_render()` -> `Create two
   <calendar-input-sk> elements with initial dates`
2. **User Selects a New "Begin" Date:** `User interacts with "Begin"
   <calendar-input-sk>` -> `<calendar-input-sk> fires "input" event (with new
   Date)` -> `day-range-sk._beginChanged(event)` -> `Update this.begin (convert
   Date to timestamp)` -> `this._sendEvent()` -> `Dispatch "day-range-change"
   event with { begin: new_begin_timestamp, end: current_end_timestamp }`
3. **User Selects a New "End" Date:** `User interacts with "End"
   <calendar-input-sk>` -> `<calendar-input-sk> fires "input" event (with new
   Date)` -> `day-range-sk._endChanged(event)` -> `Update this.end (convert
   Date to timestamp)` -> `this._sendEvent()` -> `Dispatch "day-range-change"
   event with { begin: current_begin_timestamp, end: new_end_timestamp }`
4. **Parent Component Consumes Date Range:** `Parent Component` -> `Listen for
   "day-range-change" on <day-range-sk>` -> `On event: access
   event.detail.begin and event.detail.end` -> `Perform actions with the new
   date range`
The conversion between `Date` objects (used by `calendar-input-sk`) and numeric
timestamps (used by `day-range-sk`'s attributes and events) is handled
internally by the `dateFromTimestamp` utility function and by using
`Date.prototype.valueOf() / 1000`. This design ensures that the `day-range-sk`
element exposes a simple, numeric API for its date range while leveraging a more
complex date object-based component for the UI.
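For example, a parent component might consume the element as in the sketch
below; the element lookup and the logging are illustrative only.
```
import './day-range-sk'; // registers the custom element

const picker = document.querySelector('day-range-sk')!;
picker.addEventListener('day-range-change', (e: Event) => {
  // begin/end are Unix timestamps in seconds, per the event contract above.
  const { begin, end } = (e as CustomEvent<{ begin: number; end: number }>).detail;
  console.log('Selected range:', new Date(begin * 1000), 'to', new Date(end * 1000));
});
```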
# Module: /modules/domain-picker-sk
The `domain-picker-sk` module provides a custom HTML element
`<domain-picker-sk>` that allows users to select a data domain. This domain can
be defined in two ways: either as a specific date range or as a number of data
points (commits) preceding a chosen end date. This flexibility is crucial for
applications that need to visualize or analyze time-series data where users
might want to focus on a specific period or view the most recent N data points.
The core design choice is to offer these two distinct modes of domain selection,
catering to different user needs. The "Date Range" mode is useful when users
know the specific start and end dates they are interested in. The "Dense" mode
is more suitable when users want to see a fixed amount of recent data,
regardless of the specific start date.
The component's state is managed internally and can also be set externally via
the `state` property. This `state` object, defined by the `DomainPickerState`
interface, holds the `begin` and `end` timestamps (in Unix seconds), the
`num_commits` (for "Dense" mode), and the `request_type` which indicates the
current selection mode (0 for "Date Range" - `RANGE`, 1 for "Dense" - `DENSE`).
**Key Files and Their Responsibilities:**
- **`domain-picker-sk.ts`**: This is the heart of the module. It defines the
`DomainPickerSk` class, which extends `ElementSk`.
- **Why**: It encapsulates all the logic for rendering the UI, handling
user interactions, and managing the component's state.
- **How**:
- It uses the `lit-html` library for templating, allowing for efficient
updates to the DOM when the state changes. The `template` static method
defines the basic structure, and `_showRadio` and `_requestType` static
methods conditionally render different parts of the UI based on the
current `request_type` and the `force_request_type` attribute.
- It manages the `_state` object. Initial default values are set in the
constructor (e.g., end date is now, begin date is 24 hours ago, default
`num_commits` is 50).
- Event handlers like `typeRange`, `typeDense`, `beginChange`,
`endChange`, and `numChanged` update the internal `_state` and then call
`render()` to reflect these changes in the UI.
- The `force_request_type` attribute (`'range'` or `'dense'`) allows the
consuming application to lock the picker into a specific mode, hiding
the radio buttons that would normally allow the user to switch. This is
useful when the application context dictates a specific type of domain
selection. The `attributeChangedCallback` and the getter/setter for
`force_request_type` handle this.
- It leverages other custom elements: `radio-sk` for mode selection and
`calendar-input-sk` for date picking, promoting modularity and reuse.
- **`domain-picker-sk.scss`**: This file contains the SASS styles for the
component.
- **Why**: It separates the presentation from the logic, making the
component easier to style and maintain.
- **How**: It defines styles for the layout of controls (e.g., using
flexbox to align items), descriptive text, input fields, and the
calendar input. It also imports shared styles from
`elements-sk/modules/styles` for consistency (e.g., buttons, colors).
- **`index.ts`**: A simple entry point that imports and registers the
`domain-picker-sk` custom element.
- **Why**: This is a common pattern for web components, making it easy for
other parts of the application to import and use the component.
- **How**: It executes `import './domain-picker-sk';` which ensures the
`DomainPickerSk` class is defined and registered with the browser's
`CustomElementRegistry` via the `define` function call within
`domain-picker-sk.ts`.
- **`domain-picker-sk-demo.html` and `domain-picker-sk-demo.ts`**: These files
provide a demonstration page for the component.
- **Why**: They allow developers to see the component in action, test its
different states and attributes, and serve as a basic example of how to
use it.
- **How**: `domain-picker-sk-demo.html` includes instances of
`<domain-picker-sk>`, some with the `force_request_type` attribute set.
`domain-picker-sk-demo.ts` initializes the `state` of these demo
instances with sample data.
- **`domain-picker-sk_puppeteer_test.ts`**: Contains Puppeteer tests for the
component.
- **Why**: To ensure the component renders correctly and behaves as
expected in a browser environment.
- **How**: It uses the `puppeteer-tests/util` library to load the demo
page and take screenshots, verifying the visual appearance of the
component in its default state.
**Key Workflows/Processes:**
1. **Initialization and Rendering:**
- `<domain-picker-sk>` element is added to the DOM.
- `connectedCallback` is invoked.
- Properties like `state` and `force_request_type` are upgraded (if set as
attributes before the element was defined).
- Default `_state` is established (e.g., end = now, begin = 24h ago,
mode = RANGE).
- `render()` is called:
- It checks `force_request_type`. If set, it overrides
`_state.request_type`.
- The main template is rendered.
- `_showRadio` decides whether to show mode selection radio buttons.
- `_requestType` renders either the "Begin" date input (for RANGE
mode) or the "Points" number input (for DENSE mode).
```
[DOM Insertion] -> connectedCallback() -> _upgradeProperty('state')
-> _upgradeProperty('force_request_type')
-> render()
|
V
[UI Displayed]
```
2. **User Changes Mode (if `force_request_type` is not set):**
- User clicks on "Date Range" or "Dense" radio button.
- `@change` event triggers `typeRange()` or `typeDense()`.
- `_state.request_type` is updated.
- `render()` is called.
- The UI updates to show the relevant inputs (Begin date vs. Points).
```
[User clicks radio] -> typeRange()/typeDense() -> _state.request_type updated
-> render()
|
V
[UI Updates]
```
3. **User Changes Date/Number of Commits:**
- User interacts with `<calendar-input-sk>` (for Begin/End dates) or the
`<input type="number">` (for Points).
- `@input` (for calendar) or `@change` (for number input) event triggers
`beginChange()`, `endChange()`, or `numChanged()`.
- The corresponding part of `_state` (e.g., `_state.begin`, `_state.end`,
`_state.num_commits`) is updated.
   - `render()` is called. For date changes this is largely a formality: the
     `<calendar-input-sk>` already handles its own visual update of the date
     display, so the parent re-render only matters if other parts of this
     component's template depend on the new values; in the current
     implementation it may be redundant for date changes alone.
```
[User changes input] -> beginChange()/endChange()/numChanged()
|
V
_state updated
|
V
render() // Potentially re-renders the component
|
V
[UI reflects new value]
```
The component emits no custom events itself but relies on the events from its
child components (`radio-sk`, `calendar-input-sk`) to trigger internal state
updates and re-renders. Consumers of `domain-picker-sk` would typically read the
`state` property to get the user's selection.
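The sketch below shows how a consumer might drive the element. The local
`DomainPickerState` interface mirrors the fields described above; the element
lookup and the seeded values are illustrative.
```
import './domain-picker-sk'; // registers the custom element

interface DomainPickerState {
  begin: number; // Unix seconds
  end: number; // Unix seconds
  num_commits: number; // used in "Dense" mode
  request_type: number; // 0 = RANGE, 1 = DENSE
}

const picker = document.querySelector('domain-picker-sk') as HTMLElement & {
  state: DomainPickerState;
};

// Lock the picker into date-range mode and seed a one-week window.
picker.setAttribute('force_request_type', 'range');
const now = Math.floor(Date.now() / 1000);
picker.state = {
  begin: now - 7 * 24 * 60 * 60,
  end: now,
  num_commits: 50,
  request_type: 0,
};

// Later, read back the user's selection.
const { begin, end, request_type } = picker.state;
console.log(begin, end, request_type);
```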
# Module: /modules/errorMessage
The `errorMessage` module provides a wrapper around the `errorMessage` function
from the `elements-sk` library. Its primary purpose is to offer a more
convenient way to display persistent error messages to the user.
**Core Functionality and Design Rationale:**
The key differentiation of this module lies in its default behavior for message
display duration. While the `elements-sk` `errorMessage` function requires a
duration to be specified for how long a message (often referred to as a "toast")
remains visible, this module defaults the duration to `0` seconds.
This design choice is intentional: a duration of `0` typically signifies that
the error message will _not_ automatically close. This is particularly useful in
scenarios where an error is critical or requires user acknowledgment, and an
auto-dismissing message might be missed. By defaulting to a persistent display,
the module prioritizes ensuring the user is aware of the error.
**Responsibilities and Key Components:**
The module exposes a single function: `errorMessage`.
- **`errorMessage(message: string | { message: string } | { resp: Response } |
object, duration: number = 0): void`**:
- This function is responsible for displaying an error message to the
user.
- It accepts the same flexible `message` parameter as the underlying
`elements-sk` function. This means it can handle plain strings, objects
with a `message` property, objects containing a `Response` object (from
which an error message can often be extracted), or generic objects.
- The crucial aspect is the `duration` parameter. If not explicitly
provided by the caller, it defaults to `0`. This default triggers the
persistent display behavior mentioned above.
- Internally, this function simply calls `elementsErrorMessage` from the
`elements-sk` library, passing along the provided `message` and the
(potentially defaulted) `duration`.
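A minimal sketch of the wrapper, assuming the `elements-sk` import path shown;
the defaulted `duration` is the only behavioral difference from the underlying
function.
```
import { errorMessage as elementsErrorMessage } from 'elements-sk/modules/errorMessage';

export function errorMessage(
  message: string | { message: string } | { resp: Response } | object,
  duration: number = 0 // 0 => the message stays until dismissed
): void {
  elementsErrorMessage(message, duration);
}
```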
**Workflow:**
The typical workflow for using this module is straightforward:
1. **Import:** The `errorMessage` function is imported from this module.
2. **Invocation:** When an error condition occurs that needs to be communicated
to the user persistently, the `errorMessage` function is called with the
error details.
- `errorMessage("A critical error occurred.")` -> Displays "A critical
error occurred." indefinitely.
- `errorMessage("Something went wrong.", 5000)` -> Displays "Something
went wrong." for 5 seconds (overriding the default).
Essentially, this module acts as a thin convenience layer, promoting a specific
error display pattern (persistent messages) by changing the default behavior of
a more general utility. This reduces boilerplate for common use cases where
persistent error notification is desired.
# Module: /modules/existing-bug-dialog-sk
The `existing-bug-dialog-sk` module provides a user interface element for
associating performance anomalies with existing bug reports in a bug tracking
system (like Monorail). It's designed to be used within a larger performance
monitoring application where users need to triage and manage alerts generated by
performance regressions.
The core purpose of this module is to simplify the workflow of linking one or
more detected anomalies to a pre-existing bug. Instead of manually navigating to
the bug tracker and updating the bug, users can do this directly from the
performance monitoring interface. This reduces context switching and streamlines
the bug management process.
**Key Components and Responsibilities:**
- **`existing-bug-dialog-sk.ts`**: This is the heart of the module, defining
the custom HTML element `existing-bug-dialog-sk`.
- **Why**: It encapsulates the entire UI and logic for the dialog. This
includes displaying a form for entering a bug ID, a dropdown for
selecting the bug tracking project (though currently hardcoded to
'chromium'), and a list of already associated bugs for the selected
anomalies.
- **How**:
- It uses Lit for templating and rendering the dialog's HTML structure.
- It manages the dialog's visibility (`open()`, `closeDialog()`).
- It handles form submission:
- Takes the entered bug ID and the list of selected anomalies
(`_anomalies`).
- Makes an HTTP POST request to a backend endpoint
(`/_/triage/associate_alerts`) to create the association.
- Upon success, it opens the bug page in a new tab and dispatches a
custom event `anomaly-changed`. This event signals other parts of
the application (e.g., charts or lists displaying anomalies) that
the anomaly data has been updated (specifically, the `bug_id` field)
and they might need to re-render.
- Handles potential errors by displaying an error message toast.
- It fetches and displays a list of bugs already associated with the
anomalies in the current group. This involves:
- Making a POST request to `/_/anomalies/group_report` to get details
of anomalies in the same group, including their associated
`bug_id`s. This endpoint might return a `sid` (state ID) if the
report generation is asynchronous, requiring a follow-up request.
- Once the list of associated bug IDs is retrieved, it makes another
POST request to `/_/triage/list_issues` to fetch the titles of these
bugs. This provides more context to the user than just showing bug
IDs.
- The `setAnomalies()` method is crucial for initializing the dialog with
the relevant anomaly data when it's about to be shown.
- It relies on `window.perf.bug_host_url` to construct links to the bug
tracker.
- **`existing-bug-dialog-sk.scss`**: This file contains the SASS/CSS styles
for the dialog.
- **Why**: It ensures the dialog has a consistent look and feel with the
rest of the application, using shared theme variables
(`--on-background`, `--background`, etc.).
- **How**: It defines styles for the dialog container, input fields,
buttons, close icon, and the list of associated bugs. It also includes
specific styling for the loading spinner and selected items.
- **`index.ts`**: This is a simple entry point that imports and registers the
`existing-bug-dialog-sk` custom element, making it available for use in
HTML.
**Workflow for Associating Anomalies with an Existing Bug:**
1. **User Action**: The user selects one or more anomalies in the main
application interface and chooses an option to associate them with an
existing bug.
2. **Dialog Initialization**: The application calls `setAnomalies()` on an
`existing-bug-dialog-sk` instance, passing the selected anomalies.
3. **Dialog Display**: The application calls `open()` on the dialog instance.

   ```
   Application                          existing-bug-dialog-sk
       |                                        |
       | ----- setAnomalies(anomalies) ------> |
       | ------------- open() ---------------> |
       |                                        | -- fetch_associated_bugs() --> Backend API (/anomalies/group_report)
       |                                        | <---- (Associated Bug IDs) ---
       |                                        | -- fetch_bug_titles() -------> Backend API (/triage/list_issues)
       |                                        | <------- (Bug Titles) --------
       |                                        | -- Renders dialog with form & associated bugs list
   ```
4. **User Interaction**:
- The user sees the dialog.
- If there are other anomalies in the same group already linked to bugs,
these bugs (ID and title) are listed.
- The user enters a Bug ID into the input field.
- The user clicks the "Submit" button.
5. **Form Submission and Backend Communication**:

   ```
   existing-bug-dialog-sk
       | -- (User Submits Form)
       | -- _spinner.active = true ----------------------> (UI Update: Show spinner)
       | -- fetch('/_/triage/associate_alerts', POST,
       |          {bug_id, keys}) -----------------------> Backend API
       | <---------------- (Success/Failure) -------------
       | -- _spinner.active = false ---------------------> (UI Update: Hide spinner)
       |
       | -- IF Success:
       |      -- closeDialog() --------------------------> (UI Update: Hide dialog)
       |      -- window.open(bug_url) -------------------> (Opens bug in new tab)
       |      -- dispatchEvent('anomaly-changed') -------> Application (Notifies other components)
       |
       | -- IF Failure:
       |      -- errorMessage(msg) ----------------------> (UI Update: Show error toast)
   ```
6. **Outcome**:
- **Success**: The anomalies are linked to the specified bug in the
backend. The dialog closes, the bug page opens in a new tab, and other
parts of the UI (listening for `anomaly-changed`) update to reflect the
new association.
- **Failure**: An error message is shown, and the dialog remains open,
allowing the user to try again or correct the input.
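The sketch below outlines the association request and the follow-up
notification, based on the endpoint, payload fields, and event name shown
above. The payload types, the bug URL format, and the helper function itself
are assumptions for illustration.
```
async function associateWithBug(
  bugId: number,
  anomalyKeys: string[],
  host: HTMLElement
): Promise<void> {
  const resp = await fetch('/_/triage/associate_alerts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ bug_id: bugId, keys: anomalyKeys }),
  });
  if (!resp.ok) {
    throw new Error(`Association failed: ${resp.status}`);
  }
  // Open the bug (URL format assumed) and tell the rest of the UI that the
  // anomalies' bug_id changed.
  window.open(`${(window as any).perf.bug_host_url}/${bugId}`, '_blank');
  host.dispatchEvent(new CustomEvent('anomaly-changed', { bubbles: true }));
}
```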
The design prioritizes a clear and focused user experience for a common task in
performance alert triaging. By integrating directly with the backend API for bug
association and fetching related bug information, it aims to be an efficient
tool for developers and SREs. The use of custom events allows for loose coupling
with other components in the larger application.
# Module: /modules/explore-multi-sk
## explore-multi-sk Module
### Overview
The `explore-multi-sk` module provides a user interface for displaying and
interacting with multiple performance data graphs simultaneously. This is
particularly useful when users need to compare different metrics,
configurations, or time ranges side-by-side. The core idea is to leverage the
functionality of individual `explore-simple-sk` elements, which represent single
graphs, and manage their states and interactions within a unified multi-graph
view.
### Key Design Decisions and Implementation Choices
**State Management:** A central `State` object within `explore-multi-sk` manages
properties that are common across all displayed graphs. These include the time
range (`begin`, `end`), display options (`showZero`, `dots`), and pagination
settings (`pageSize`, `pageOffset`). This approach simplifies the overall state
management and keeps the URL from becoming overly complex, as only a limited set
of shared parameters need to be reflected in the URL.
Each individual graph (`explore-simple-sk` instance) maintains its own specific
state related to the data it displays (formulas, queries, selected keys).
`explore-multi-sk` stores an array of `GraphConfig` objects, where each object
corresponds to an `explore-simple-sk` instance and holds its unique
configuration.
The `stateReflector` utility is used to synchronize the shared `State` with the
URL, allowing for bookmarking and sharing of multi-graph views.
**Dynamic Graph Addition and Removal:** Users can dynamically add new graphs to
the view. When a new graph is added, an empty `explore-simple-sk` instance is
created and the user can then configure its data source (query or formula).
If the `useTestPicker` option is enabled (often determined by backend defaults),
instead of a simple "Add Graph" button, a `test-picker-sk` element is displayed.
This component provides a more structured way to select tests and parameters,
and upon selection, a new graph is automatically generated and populated.
Graphs can also be removed. Event listeners are in place to handle
`remove-explore` custom events, which are typically dispatched by the individual
`explore-simple-sk` elements when a user closes them in a "Multiview" context
(where `useTestPicker` is active).
**Pagination:** To handle potentially large numbers of graphs, pagination is
implemented using the `pagination-sk` element. This allows users to view a
subset of the total graphs at a time, improving performance and usability. The
`pageSize` and `pageOffset` are part of the shared state.
**Graph Manipulation (Split and Merge):**
- **Split Graph:** If a single graph displaying multiple traces is present,
the "Split Graph" functionality allows the user to create separate graphs
for each of those traces. This is useful for focusing on individual trends
that were previously combined.
- **Merge Graphs:** Conversely, the "Merge Graphs" functionality takes all
currently displayed graphs and combines their traces into a single graph.
This can be helpful for seeing an aggregated view.
These operations primarily involve manipulating the `graphConfigs` array and
then re-rendering the graphs.
**Shortcuts:** The module supports saving and loading multi-graph configurations
using shortcuts. When the configuration of graphs changes (traces added/removed,
graphs split/merged), `updateShortcutMultiview` is called. This function
communicates with a backend service (`/_/shortcut/get` and a corresponding save
endpoint invoked by `updateShortcut` from `explore-simple-sk`) to store or
retrieve the `graphConfigs` associated with a unique shortcut ID. This ID is
then reflected in the URL, allowing users to share specific multi-graph setups.
**Synchronization of Interactions:**
- **X-Axis Label:** When the x-axis label (e.g., switching between commit
number and date) is toggled on one graph, a custom event `x-axis-toggled` is
dispatched. `explore-multi-sk` listens for this and updates the x-axis on
all other visible graphs to maintain consistency.
- **Chart Selection (Plot Summary):** The selection mechanics themselves live
  in `explore-simple-sk` rather than in `explore-multi-sk.ts`. When the
  `plotSummary` feature is active, a selection made on one graph can be
  propagated to the others via `syncChartSelection`, which
  `explore-multi-sk` provides for this cross-graph synchronization.
**Defaults and Configuration:** The component fetches default configurations
from a `/_/defaults/` endpoint. These defaults allow for instance-specific
customization of the Perf UI and can influence:
- Whether to use `test-picker-sk` (`useTestPicker`).
- The default parameters and their order for `test-picker-sk`
  (`include_params`, `default_param_selections`).
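A sketch of the defaults fetch; only the fields named above are assumed to be
present in the response.
```
interface Defaults {
  useTestPicker?: boolean;
  include_params?: string[];
  default_param_selections?: { [key: string]: string[] };
}

async function loadDefaults(): Promise<Defaults> {
  const resp = await fetch('/_/defaults/');
  if (!resp.ok) {
    throw new Error(`Failed to load defaults: ${resp.status}`);
  }
  return (await resp.json()) as Defaults;
}

// const defaults = await loadDefaults();
// if (defaults.useTestPicker) { /* show <test-picker-sk> instead of "Add Graph" */ }
```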
### Responsibilities and Key Components
- **`explore-multi-sk.ts`**:
- **Responsibilities**: This is the main TypeScript file defining the
`ExploreMultiSk` custom element. It is responsible for:
- Managing the overall state of the multi-graph view (shared properties
like time range, pagination).
- Handling the addition, removal, and configuration of individual
`explore-simple-sk` graph elements.
- Interacting with the `stateReflector` to update the URL based on the
shared state.
- Implementing the "Split Graph" and "Merge Graphs" functionalities.
- Managing pagination for the displayed graphs.
- Fetching and applying default configurations.
- Coordinating interactions between graphs (e.g., synchronizing x-axis
labels).
- Interacting with the `test-picker-sk` if enabled.
- Handling user authentication status for features like "Add to
Favorites".
- Managing shortcuts for saving and loading multi-graph configurations.
- **Key Interactions**:
- Creates and manages instances of `explore-simple-sk`.
- Uses `pagination-sk` for displaying graphs in pages.
- Uses `test-picker-sk` for adding graphs when `useTestPicker` is true.
- Uses `favorites-dialog-sk` to allow users to save graph configurations.
- Communicates with backend services for shortcuts and default
configurations.
- **`explore-multi-sk.html` (Inferred from the Lit `html` template in
`explore-multi-sk.ts`)**:
- **Responsibilities**: Defines the structure of the `explore-multi-sk`
element. This includes:
- A menu section with buttons for "Add Graph", "Split Graph", "Merge
Graphs", and "Add to Favorites".
- The `test-picker-sk` element (conditionally visible).
- `pagination-sk` elements for navigating through graph pages.
- A container (`#graphContainer`) where the individual `explore-simple-sk`
elements are dynamically rendered.
- **Key Components**:
- `<button>` elements for user actions.
- `<test-picker-sk>` for test selection.
- `<pagination-sk>` for graph pagination.
- `<favorites-dialog-sk>` for saving favorites.
- A `div` (`#graphContainer`) to hold the `explore-simple-sk` instances.
- **`explore-multi-sk.scss`**:
- **Responsibilities**: Provides the styling for the `explore-multi-sk`
element and its children. It ensures that the layout is appropriate for
displaying multiple graphs and their controls.
- **Key Aspects**:
- Styles the `#menu` and `#pagination` areas.
- Defines the height of embedded `explore-simple-sk` plots.
- Handles the conditional visibility of elements like `#test-picker` and
`#add-graph-button`.
### Key Workflows
**1. Initial Load and State Restoration:**
```
User navigates to URL with explore-multi-sk
|
V
explore-multi-sk.connectedCallback()
|
V
Fetch defaults from /_/defaults/
|
V
stateReflector() is initialized
|
V
State is read from URL (or defaults if URL is empty)
|
V
IF state.shortcut is present:
Fetch graphConfigs from /_/shortcut/get using the shortcut ID
|
V
ELSE (or after fetching):
For each graphConfig (or if starting fresh, often one empty graph is implied or added):
Create/configure explore-simple-sk instance
Set its state based on graphConfig and shared state
|
V
Add graphs to the current page based on pagination settings
|
V
Render the component
```
**2. Adding a Graph (without Test Picker):**
```
User clicks "Add Graph" button
|
V
explore-multi-sk.addEmptyGraph() is called
|
V
A new ExploreSimpleSk instance is created
A new empty GraphConfig is added to this.graphConfigs
|
V
explore-multi-sk.updatePageForNewExplore()
|
V
IF current page is full:
Increment pageOffset (triggering pageChanged)
ELSE:
Add new graph to current page
|
V
The new explore-simple-sk element might open its query dialog for the user
```
**3. Adding a Graph (with Test Picker):**
```
TestPickerSk is visible (due to defaults or state)
|
V
User interacts with TestPickerSk, selects tests/parameters
|
V
User clicks "Plot" button in TestPickerSk
|
V
TestPickerSk dispatches 'plot-button-clicked' event
|
V
explore-multi-sk listens for 'plot-button-clicked'
|
V
explore-multi-sk.addEmptyGraph(unshift=true) is called (new graph at the top)
|
V
explore-multi-sk.addGraphsToCurrentPage() updates the view
|
V
TestPickerSk.createQueryFromFieldData() gets the query
|
V
The new ExploreSimpleSk instance has its query set
```
**4. Splitting a Graph:**
```
User has one graph with multiple traces and clicks "Split Graph"
|
V
explore-multi-sk.splitGraph()
|
V
this.getTracesets() retrieves traces from the first (and only) graph
|
V
this.clearGraphs() removes the existing graph configuration
|
V
FOR EACH trace in the retrieved traceset:
this.addEmptyGraph()
A new GraphConfig is created for this trace (e.g., config.queries = [queryFromKey(trace)])
|
V
this.updateShortcutMultiview() (new shortcut reflecting multiple graphs)
|
V
this.state.pageOffset is reset to 0
|
V
this.addGraphsToCurrentPage() renders the new set of individual graphs
```
**5. Saving/Updating a Shortcut:**
```
Graph configuration changes (e.g., trace added/removed, graph split/merged, new graph added)
|
V
explore-multi-sk.updateShortcutMultiview() is called
|
V
Calls exploreSimpleSk.updateShortcut(this.graphConfigs)
|
V
(Inside updateShortcut)
IF graphConfigs is not empty:
POST this.graphConfigs to backend (e.g., /_/shortcut/new or /_/shortcut/update)
Backend returns a new or existing shortcut ID
|
V
explore-multi-sk.state.shortcut is updated with the new ID
|
V
this.stateHasChanged() is called, triggering stateReflector to update the URL
```
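A sketch of the shortcut-saving step in workflow 5. The save endpoint name,
the request and response shapes, and the `GraphConfig` field types are
assumptions; the document only states that the graph configurations are POSTed
and a shortcut ID is returned.
```
interface GraphConfig {
  queries: string[];
  formulas: string[];
  keys: string;
}

async function updateShortcut(graphConfigs: GraphConfig[]): Promise<string> {
  if (graphConfigs.length === 0) {
    return '';
  }
  const resp = await fetch('/_/shortcut/update', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ graphs: graphConfigs }),
  });
  const json = await resp.json();
  return json.id as string; // stored in state.shortcut; stateReflector then updates the URL
}
```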
# Module: /modules/explore-simple-sk
The `explore-simple-sk` module provides a custom HTML element for exploring and
visualizing performance data. It allows users to query, plot, and analyze
traces, identify anomalies, and interact with commit details. This element is a
core component of the Perf application's data exploration interface.
**Core Functionality:**
The element's primary responsibility is to provide a user interface for:
1. **Querying Data:** Users can construct queries to select specific traces
based on various parameters.
2. **Plotting Traces:** Selected traces are rendered on a graph, allowing for
visual inspection of performance trends.
3. **Analyzing Data:** Users can interact with the plot to zoom, pan, and
select individual data points for detailed inspection.
4. **Anomaly Detection:** The element integrates with anomaly detection
services to highlight and manage performance regressions or improvements.
5. **Commit Details:** Information about the commits associated with data
points can be displayed, linking performance changes to specific code
modifications.
**Key Design Decisions and Implementation Choices:**
- **State Management:** The element's state (e.g., current query, time range,
plot settings) is managed internally and reflected in the URL. This allows
users to share specific views of the data and enables bookmarking. The
`State` class in `explore-simple-sk.ts` defines the structure of this state.
- **Data Fetching:** Data is fetched asynchronously from the backend using the
`/frame/start` endpoint. The `requestFrame` method handles initiating these
requests and processing the responses. The `FrameRequest` and
`FrameResponse` types define the communication contract with the server.
- **Plotting Library:** The module supports two plotting libraries:
`plot-simple-sk` (a custom canvas-based plotter) and `plot-google-chart-sk`
(which wraps Google Charts). The choice of plotter can be configured.
- **Component-Based Architecture:** The UI is built using a collection of
smaller, specialized custom elements (e.g., `query-sk` for query input,
`paramset-sk` for displaying parameters, `commit-detail-panel-sk` for commit
information). This promotes modularity and reusability.
- **Event-Driven Communication:** Components communicate with each other and
with the main `explore-simple-sk` element through custom events. For
example, when a query changes in `query-sk`, it emits a `query-change` event
that `explore-simple-sk` listens to.
- **Caching and Optimization:** To improve performance, the element employs
strategies like incremental data loading when panning and caching commit
details.
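The sketch below illustrates that event-driven chain for a query change,
written against a stand-in element. Apart from the `query-change` event and
the `/_/frame/start` endpoint, the detail field name and the request body are
assumptions for illustration.
```
const explore = document.querySelector('explore-simple-sk')!;
const querySk = explore.querySelector('query-sk')!;

querySk.addEventListener('query-change', async (e: Event) => {
  const query = (e as CustomEvent<{ q: string }>).detail.q;

  // In the real element this updates State.queries, reflects the state into
  // the URL, and calls requestFrame(); here we only show the request shape.
  const resp = await fetch('/_/frame/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      queries: [query],
      begin: Math.floor(Date.now() / 1000) - 24 * 60 * 60,
      end: Math.floor(Date.now() / 1000),
    }),
  });
  const frameResponse = await resp.json();
  console.log(frameResponse.dataframe, frameResponse.anomalymap);
});
```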
**Key Files and Components:**
- **`explore-simple-sk.ts`:** This is the main TypeScript file that defines
the `ExploreSimpleSk` custom element. It handles:
- State management and URL reflection.
- Data fetching and processing.
- Rendering the UI template.
- Event handling and coordination between child components.
- Interaction logic for plotting, zooming, selecting points, etc.
- **`explore-simple-sk.html` (embedded in `explore-simple-sk.ts`):** This
Lit-html template defines the structure of the element's UI. It includes
placeholders for various child components and dynamic content.
- **`explore-simple-sk.scss`:** This SCSS file provides the styling for the
element and its components.
- **Child Components (imported in `explore-simple-sk.ts`):**
- `query-sk`: For constructing and managing queries.
- `paramset-sk`: For displaying and interacting with parameter sets.
- `plot-simple-sk` / `plot-google-chart-sk`: For rendering the plots.
- `commit-detail-panel-sk`: For displaying commit information.
- `anomaly-sk`: For displaying and managing anomalies.
- Many other components for specific UI elements like dialogs, buttons,
and icons.
**Workflow Example: Plotting a Query**
1. **User Interaction:** The user interacts with the `query-sk` element to
define a query.
2. **Event Emission:** `query-sk` emits a `query-change` event with the new
query.
3. **State Update:** `explore-simple-sk` listens for this event, updates its
internal state (specifically the `queries` array in the `State` object), and
triggers a re-render.
4. **Data Request:** `explore-simple-sk` constructs a `FrameRequest` based on
the updated state and calls `requestFrame` to fetch data from the server.
`User Input (query-sk) -> Event (query-change) -> State Update
(ExploreSimpleSk) -> Data Request (requestFrame)`
5. **Data Processing:** Upon receiving the `FrameResponse`, `explore-simple-sk`
processes the data, updates its internal `_dataframe` object, and prepares
the data for plotting.
6. **Plot Rendering:** `explore-simple-sk` passes the processed data to the
`plot-simple-sk` or `plot-google-chart-sk` element, which then renders the
traces on the graph. `Server Response (FrameResponse) -> Data Processing
(ExploreSimpleSk) -> Plot Update (plot-simple-sk/plot-google-chart-sk) ->
Visual Output`
7. **URL Update:** The state change is reflected in the URL, allowing the user
to bookmark or share the current view.
This workflow illustrates the reactive nature of the element, where user
interactions trigger state changes, which in turn lead to data fetching and UI
updates.
# Module: /modules/explore-sk
The `explore-sk` module serves as the primary user interface for exploring and
analyzing performance data within the Perf application. It provides a
comprehensive view for users to query, visualize, and interact with performance
traces.
The core functionality of `explore-sk` is built upon the `explore-simple-sk`
element. `explore-sk` acts as a wrapper, enhancing `explore-simple-sk` with
additional features like user authentication integration, default configuration
loading, and the optional `test-picker-sk` for more guided query construction.
**Key Responsibilities and Components:**
- **`explore-sk.ts`**: This is the main TypeScript file defining the
`ExploreSk` custom element.
- **Why**: It orchestrates the interaction between various sub-components
and manages the overall state of the exploration page.
- **How**:
- It initializes by fetching default configurations (e.g., query
parameters, display settings) from a backend endpoint (`/_/defaults/`).
This ensures that the exploration view is pre-configured with sensible
starting points.
- It integrates with `alogin-sk` to determine the logged-in user's status.
This information is used to enable features like "favorites" if a user
is logged in.
- It utilizes `stateReflector` to persist and restore the state of the
underlying `explore-simple-sk` element in the URL. This allows users to
share specific views or bookmark their current exploration state.
- It conditionally initializes and displays `test-picker-sk`. If the
`use_test_picker_query` flag is set in the state (often via URL
parameters or defaults), the `test-picker-sk` component is shown,
providing a structured way to build queries based on available parameter
keys and values.
- It listens for events from `test-picker-sk` (e.g.,
`plot-button-clicked`, `remove-all`, `populate-query`) and translates
these into actions on the `explore-simple-sk` element, such as adding
new traces based on the selected test parameters or clearing the view.
- It provides buttons like "View in multi-graph" and "Toggle Chart Style"
which directly interact with methods exposed by `explore-simple-sk`.
- **`explore-simple-sk` (imported module)**: This is a fundamental building
block that handles the core trace visualization, querying logic, and
interaction with the graph.
- **Why**: Encapsulates the complex logic of fetching trace data,
rendering graphs, and handling user interactions like zooming, panning,
and selecting traces.
- **How**: `explore-sk` delegates most of the heavy lifting related to
data exploration to this component. It passes down the initial state,
default configurations, and user-specific settings.
- **`test-picker-sk` (imported module)**: A component that allows users to
build queries by selecting from available test parameters and their values.
- **Why**: Simplifies the query construction process, especially when
dealing with a large number of possible parameters. It provides a more
user-friendly alternative to manually typing complex query strings.
- **How**: When active, it presents a UI for selecting dimensions and
values. Upon user action (e.g., clicking a "plot" button), it emits an
event with the constructed query, which `explore-sk` then uses to fetch
and display the corresponding traces via `explore-simple-sk`. It can
also be populated based on a highlighted trace, allowing users to
quickly refine queries based on existing data.
- **`favorites-dialog-sk` (imported module)**: Enables users to save and
manage their favorite query configurations.
- **Why**: Provides a convenient way for users to quickly return to
frequently used or important exploration views.
- **How**: Integrated into `explore-simple-sk` and its functionality is
enabled by `explore-sk` based on the user's login status.
- **State Management (`stateReflector`)**:
- **Why**: To make the exploration state shareable and bookmarkable.
Changes in the exploration view (queries, zoom levels, etc.) are
reflected in the URL.
- **How**: `explore-sk` uses `stateReflector` to listen for state changes
in `explore-simple-sk`. When the state changes, `stateReflector` updates
the URL. Conversely, when the page loads or the URL changes,
`stateReflector` parses the URL and applies the state to
`explore-simple-sk`.
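A sketch of that wiring, assuming the `infra-sk` import path and a
`stateReflector(getState, setState)` signature that returns a callback to
invoke when local state changes; the document only states that state is
mirrored to and from the URL.
```
import { stateReflector } from '../../../infra-sk/modules/stateReflector';

const exploreSimpleSk = document.querySelector('explore-simple-sk') as HTMLElement & {
  state: Record<string, any>;
};

const stateHasChanged = stateReflector(
  () => ({ ...exploreSimpleSk.state }), // serialize the current state into the URL
  (newState) => {
    exploreSimpleSk.state = newState as Record<string, any>; // apply URL state on load/navigation
  }
);

// Keep the URL in sync whenever the underlying element reports a change.
exploreSimpleSk.addEventListener('state_changed', () => stateHasChanged());
```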
**Workflow Example: Initial Page Load with Test Picker**
1. `explore-sk` element is connected to the DOM.
2. `connectedCallback` is invoked:
- Renders its initial template.
- Fetches default configurations from `/_/defaults/`.
- `stateReflector` is initialized. If the URL contains state for
`explore-simple-sk`, it's applied.
- The state might indicate `use_test_picker_query = true`.
3. If `use_test_picker_query` is true:
- `initializeTestPicker()` is called.
- `test-picker-sk` element is made visible.
- `test-picker-sk` is initialized with parameters from the defaults (e.g.,
`include_params`, `default_param_selections`) or from existing queries
in the state.
4. User interacts with `test-picker-sk` to select desired test parameters.
5. User clicks the "Plot" button within `test-picker-sk`.
6. `test-picker-sk` emits a `plot-button-clicked` event.
7. `explore-sk` listens for this event:
- It retrieves the query constructed by `test-picker-sk`.
- It calls `exploreSimpleSk.addFromQueryOrFormula()` to add the new traces
to the graph.
8. `explore-simple-sk` fetches the data, renders the traces, and emits a
`state_changed` event.
9. `stateReflector` captures this `state_changed` event and updates the URL to
reflect the new query.
This workflow illustrates how `explore-sk` acts as a central coordinator,
integrating various specialized components to provide a cohesive data
exploration experience. The design emphasizes modularity, with
`explore-simple-sk` handling the core plotting and `test-picker-sk` offering an
alternative query input mechanism, all managed and presented by `explore-sk`.
# Module: /modules/favorites-dialog-sk
The `favorites-dialog-sk` module provides a custom HTML element that displays a
modal dialog for users to add or edit "favorites." Favorites, in this context,
are likely user-defined shortcuts or bookmarks to specific views or states
within the application, identified by a name, description, and a URL.
**Core Functionality and Design:**
The primary purpose of this module is to present a user-friendly interface for
managing these favorites. It's designed as a modal dialog to ensure that the
user's focus is on the task of adding or editing a favorite without distractions
from the underlying page content.
**Key Components:**
- **`favorites-dialog-sk.ts`**: This is the heart of the module, defining the
`FavoritesDialogSk` custom element.
- **Why**: It encapsulates the logic for displaying the dialog, handling
user input, and interacting with a backend service to persist favorite
data.
- **How**:
- It extends `ElementSk`, a base class for custom elements in the Skia
infrastructure, providing a common foundation.
- It uses the Lit library (`lit/html.js`) for templating, allowing for
declarative and efficient rendering of the dialog's UI.
- The `open()` method is the public API for triggering the dialog. It
accepts optional parameters for pre-filling the form when editing an
existing favorite. Crucially, it returns a `Promise`. This promise-based
approach is a key design choice. It resolves when the favorite is
successfully saved and rejects if the user cancels the dialog. This
allows the calling code (likely a parent component managing the list of
favorites) to react appropriately, for instance, by re-fetching the
updated list of favorites only when a change has actually occurred.
- Input fields for "Name," "Description," and "URL" capture the necessary
information. The "Name" and "URL" fields are mandatory.
- The `confirm()` method handles the submission logic. It performs basic
validation (checking for empty name and URL) and then makes an HTTP POST
request to either `/_/favorites/new` or `/_/favorites/edit` depending on
whether a new favorite is being created or an existing one is being
modified.
- A `spinner-sk` element is used to provide visual feedback to the user
during the asynchronous operation of saving the favorite.
- Error handling is implemented using `errorMessage` to display issues to
the user, such as network errors or validation failures from the
backend.
- The `dismiss()` method handles the cancellation of the dialog, rejecting
the promise returned by `open()`.
- Input event handlers (`filterName`, `filterDescription`, `filterUrl`)
update the component's internal state as the user types, and trigger
re-renders via `this._render()`.
- **`favorites-dialog-sk.scss`**: This file contains the SASS styles for the
dialog.
- **Why**: It separates the presentation concerns from the JavaScript
logic, making the component more maintainable.
- **How**: It defines styles for the `<dialog>` element, input fields,
labels, and buttons, ensuring a consistent look and feel within the
application's theme (as indicated by `@import
'../themes/themes.scss';`).
- **`favorites-dialog-sk-demo.html` / `favorites-dialog-sk-demo.ts`**: These
files provide a demonstration page for the `favorites-dialog-sk` element.
- **Why**: This allows developers to see the component in isolation, test
its functionality, and understand how to integrate it.
- **How**: The HTML sets up a basic page with buttons to trigger the
dialog in "new favorite" and "edit favorite" modes. The TypeScript file
wires up event listeners on these buttons to call the `open()` method of
the `favorites-dialog-sk` element with appropriate parameters.
**Workflow: Adding/Editing a Favorite**
A typical workflow involving this dialog would be:
1. **User Action**: The user clicks a button (e.g., "Add Favorite" or an "Edit"
icon next to an existing favorite) in the main application UI.
2. **Dialog Invocation**: The event handler for this action calls the `open()`
method of an instance of `favorites-dialog-sk`.
- If adding a new favorite, `open()` might be called with minimal or no
arguments, defaulting the URL to the current page.
- If editing, `open()` is called with the `favId`, `name`, `description`,
and `url` of the favorite to be edited.
```
User clicks "Add New" --> favoritesDialog.open('', '', '', 'current.page.url')
|
V
Dialog Appears
|
V
User fills form, clicks "Save" --> confirm() is called
|
V
POST /_/favorites/new
|
V (Success)
Dialog closes, open() Promise resolves
|
V
Calling component re-fetches favorites
-------------------------------- OR ---------------------------------
User clicks "Edit Favorite" --> favoritesDialog.open('id123', 'My Fav', 'Desc', 'fav.url.com')
|
V
Dialog Appears (pre-filled)
|
V
User modifies form, clicks "Save" --> confirm() is called
|
V
POST /_/favorites/edit (with 'id123')
|
V (Success)
Dialog closes, open() Promise resolves
|
V
Calling component re-fetches favorites
-------------------------------- OR ---------------------------------
User clicks "Cancel" or Close Icon --> dismiss() is called
|
V
Dialog closes, open() Promise rejects
|
V
Calling component does nothing (no re-fetch)
```
3. **User Interaction**: The user fills in or modifies the "Name,"
"Description," and "URL" fields in the dialog.
4. **Submission/Cancellation**:
- **Save**: The user clicks the "Save" button.
- The `confirm()` method is invoked.
- Input validation (name and URL not empty) is performed.
- A `fetch` request is made to the backend API (`/_/favorites/new` or
`/_/favorites/edit`).
- A spinner is shown during the API call.
- Upon successful completion, the dialog closes, and the `Promise`
returned by `open()` resolves.
- If the API call fails, an error message is displayed, and the dialog
remains open (or the promise might reject depending on specific
error handling in `confirm`).
- **Cancel**: The user clicks the "Cancel" button or the close icon.
- The `dismiss()` method is invoked.
- The dialog closes.
- The `Promise` returned by `open()` rejects.
5. **Post-Dialog Action**: The component that initiated the dialog (e.g., a
`favorites-sk` list component) uses the resolved/rejected state of the
`Promise` to decide whether to refresh its list of favorites. This is a key
aspect of the design – it avoids unnecessary re-fetches if the user simply
cancels the dialog.
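For instance, a hosting list component might use the returned promise as in
the sketch below; the element lookup, the method typing, and the refresh
helper are illustrative.
```
import './favorites-dialog-sk'; // registers the custom element

const dialog = document.querySelector('favorites-dialog-sk') as HTMLElement & {
  open: (favId: string, name: string, description: string, url: string) => Promise<void>;
};

async function editFavorite(favId: string, name: string, description: string, url: string) {
  try {
    await dialog.open(favId, name, description, url);
    await refreshFavoritesList(); // only re-fetch when the dialog resolved (saved)
  } catch {
    // The dialog was cancelled; nothing to refresh.
  }
}

async function refreshFavoritesList(): Promise<void> {
  const resp = await fetch('/_/favorites/');
  console.log(await resp.json());
}
```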
The design prioritizes a clear separation of concerns, using custom elements for
UI encapsulation, SASS for styling, and a promise-based API for asynchronous
operations and communication with parent components. This makes the
`favorites-dialog-sk` a reusable and well-defined piece of UI for managing user
favorites.
# Module: /modules/favorites-sk
The `favorites-sk` module provides a user interface element for displaying and
managing a user's "favorites". Favorites are essentially bookmarked URLs,
categorized into sections. This module allows users to view their favorited
links, edit their details (name, description, URL), and delete them.
**Core Functionality & Design:**
The primary responsibility of `favorites-sk` is to fetch favorite data from a
backend endpoint (`/_/favorites/`) and render it in a user-friendly way. It also
handles interactions for modifying these favorites, such as editing and
deleting.
- **Data Fetching and Rendering:**
- Upon connection to the DOM (`connectedCallback`), the element attempts
to fetch the favorites configuration from the backend.
- The fetched data, expected to be in a `Favorites` JSON format (defined
in `perf/modules/json`), is stored in the `favoritesConfig` property.
- The `_render()` method is called to update the display.
- The rendering logic iterates through sections and then links within each
section, generating an HTML table for display.
- A key design choice is to distinguish "My Favorites" from other
sections. "My Favorites" are displayed with "Edit" and "Delete" buttons,
implying user ownership and modifiability. Other sections are presented
as read-only.
- **Favorite Management:**
- **Deletion:**
- When a user clicks the "Delete" button for a favorite in the "My
Favorites" section, the `deleteFavoriteConfirm` method is invoked.
- This method displays a standard browser confirmation dialog
(`window.confirm`) to prevent accidental deletions.
- If confirmed, `deleteFavorite` sends a POST request to
`/_/favorites/delete` with the ID of the favorite to be removed.
- After a successful deletion, the favorites list is re-fetched to reflect
the change.
- **Editing:**
- Clicking the "Edit" button calls the `editFavorite` method.
- This method interacts with a `favorites-dialog-sk` element (defined in
`perf/modules/favorites-dialog-sk`).
- The `favorites-dialog-sk` is responsible for presenting a modal dialog
where the user can modify the favorite's name, description, and URL.
- Upon successful editing (dialog submission), the favorites list is
re-fetched.
- **Error Handling:**
- Network errors or non-OK responses during fetch operations (fetching
favorites, deleting favorites) are caught.
- An error message is displayed to the user via the `errorMessage` utility
(from `elements-sk/modules/errorMessage`).
**Key Components/Files:**
- **`favorites-sk.ts`:** This is the heart of the module. It defines the
`FavoritesSk` custom element, extending `ElementSk`. It contains the logic
for fetching, rendering, deleting, and initiating the editing of favorites.
- `constructor()`: Initializes the element with its Lit-html template.
- `deleteFavorite()`: Handles the asynchronous request to the backend for
deleting a favorite.
- `deleteFavoriteConfirm()`: Provides a confirmation step before actual
deletion.
- `editFavorite()`: Manages the interaction with the `favorites-dialog-sk`
for editing.
- `template()`: The static Lit-html template function that defines the
overall structure of the element.
- `getSectionsTemplate()`: A helper function that dynamically generates
the HTML for displaying sections and their links based on
`favoritesConfig`. It specifically adds edit/delete controls for the "My
Favorites" section.
- `fetchFavorites()`: Fetches the favorites data from the backend and
triggers a re-render.
- `connectedCallback()`: A lifecycle method that ensures favorites are
fetched when the element is added to the page.
- **`favorites-sk.scss`:** Provides the styling for the `favorites-sk`
element, defining its layout, padding, colors for links, and table
appearance.
- **`index.ts`:** A simple entry point that imports and registers the
`favorites-sk` custom element, making it available for use in HTML.
- **`favorites-sk-demo.html` & `favorites-sk-demo.ts`:** These files provide a
demonstration page for the `favorites-sk` element. The HTML includes an
instance of `<favorites-sk>` and a `<pre>` tag to display events. The
TypeScript file simply imports the element and sets up an event listener
(though no custom events are explicitly dispatched by `favorites-sk` in the
provided code).
**Workflow: Deleting a Favorite**
```
User Clicks "Delete" Button (for a link in "My Favorites")
|
V
favorites-sk.ts: deleteFavoriteConfirm(id, name)
|
V
window.confirm("Deleting favorite: [name]. Are you sure?")
|
+-- User clicks "Cancel" --> Workflow ends
|
V User clicks "OK"
favorites-sk.ts: deleteFavorite(id)
|
V
fetch('/_/favorites/delete', { method: 'POST', body: {id: favId} })
|
+-- Network Error/Non-OK Response --> errorMessage() is called, display error
|
V Successful Deletion
favorites-sk.ts: fetchFavorites()
|
V
fetch('/_/favorites/')
|
V
Parse JSON response, update this.favoritesConfig
|
V
this._render() // Re-renders the component with the updated list
```
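
The following TypeScript sketch illustrates this delete-then-refresh flow. It follows the description above, but the function shapes, payload field name, and error handling are assumptions for illustration rather than a copy of `favorites-sk.ts`.

```typescript
// Sketch only: the endpoint paths come from the description above; everything
// else (names, payload shape, error handling) is illustrative.
async function fetchFavorites(): Promise<void> {
  const resp = await fetch('/_/favorites/');
  if (!resp.ok) {
    throw new Error(`Failed to fetch favorites: ${resp.status}`);
  }
  const favoritesConfig = await resp.json();
  // In the element this is stored on this.favoritesConfig, followed by a
  // call to this._render().
  console.log(favoritesConfig);
}

async function deleteFavorite(id: string): Promise<void> {
  const resp = await fetch('/_/favorites/delete', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ id: id }),
  });
  if (!resp.ok) {
    // The element surfaces this via the errorMessage utility.
    throw new Error(`Failed to delete favorite: ${resp.status}`);
  }
  // Re-fetch so the rendered list reflects the deletion.
  await fetchFavorites();
}
```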
**Workflow: Editing a Favorite**
```
User Clicks "Edit" Button (for a link in "My Favorites")
|
V
favorites-sk.ts: editFavorite(id, name, desc, url)
|
V
Get reference to <favorites-dialog-sk id="fav-dialog">
|
V
favorites-dialog-sk.open(id, name, desc, url) // Opens the edit dialog
|
+-- User cancels dialog --> Promise rejects (potentially with undefined, handled)
|
V User submits changes in dialog
Promise resolves
|
V
favorites-sk.ts: fetchFavorites() // Re-fetches and re-renders the list
|
V
fetch('/_/favorites/')
|
V
Parse JSON response, update this.favoritesConfig
|
V
this._render()
```
The design relies on Lit for templating and rendering, which provides efficient
updates to the DOM when the `favoritesConfig` data changes. The separation of
concerns is evident: `favorites-sk` handles the list display and top-level
actions, while `favorites-dialog-sk` manages the intricacies of the editing
form.
# Module: /modules/graph-title-sk
## Graph Title (`graph-title-sk`)
The `graph-title-sk` module provides a custom HTML element designed to display
titles for individual graphs in a structured and informative way. Its primary
goal is to present key-value pairs of metadata associated with a graph in a
visually clear and space-efficient manner.
### Responsibilities and Key Components
The core of this module is the `GraphTitleSk` custom element
(`graph-title-sk.ts`). Its main responsibilities are:
1. **Data Reception and Storage:** It receives a `Map<string, string>` where
keys represent parameter names (e.g., "bot", "benchmark") and values
represent their corresponding values (e.g., "linux-perf", "Speedometer2").
This map, along with the number of traces in the graph, is provided via the
`set()` method.
2. **Dynamic Rendering:** Based on the provided data, the element dynamically
generates HTML to display the title. It iterates through the key-value pairs
and renders them in a columnar layout. Each pair is displayed with the key
(parameter name) in a smaller font above its corresponding value.
3. **Handling Empty or Generic Titles:**
- If a key or its corresponding value is an empty string, that particular
entry is omitted from the displayed title. This ensures that the title
remains concise and only shows relevant information.
- If the input `titleEntries` map is empty but `numTraces` is greater than
zero, it displays a generic title like "Multi-trace Graph (X traces)" to
indicate a graph with multiple data series without specific shared
parameters.
4. **Space Management and Truncation:**
- The title entries are arranged in a flexible, wrapping layout (`display:
flex; flex-wrap: wrap;`) using CSS (`graph-title-sk.scss`). This allows
the title to adapt to different screen widths.
- To prevent overcrowding, especially when there are many parameters, the
component implements a "show more" functionality. If the number of title
entries exceeds a predefined limit (`MAX_PARAMS`, currently 8), it
initially displays only the first `MAX_PARAMS` entries. A "Show Full
Title" button (`<md-text-button class="showMore">`) is then provided,
allowing the user to expand the view and see all title entries.
Conversely, a "Show Short Title" mechanism is implied (though not
explicitly shown as a button in the current code, `showShortTitles()` method exists) to revert to the truncated view.
- Individual values that are very long are visually truncated in the
display, but the full value is available as a tooltip when the user
hovers over the text. This is achieved by setting the `title` attribute
of the `div` containing the value.
### Design Decisions and Implementation Choices
- **Custom Element (`ElementSk`):** The component is built as a custom element
extending `ElementSk`. This aligns with the Skia infrastructure's approach
to building reusable UI components and allows for easy integration into Skia
applications.
- **Lit Library for Templating:** The HTML structure is generated using the
`lit` library's `html` template literal tag. This provides a declarative and
efficient way to define the component's view and update it when data
changes. The `_render()` method, inherited from `ElementSk`, is called to
trigger re-rendering when the internal state (`_titleEntries`, `numTraces`,
`showShortTitle`) changes.
- **CSS for Styling:** Styling is handled through a dedicated SCSS file
(`graph-title-sk.scss`). This separates presentation concerns from the
component's logic. CSS variables (e.g., `var(--primary)`) are used for
theming, allowing the component's appearance to be consistent with the
overall application theme.
- **`set()` Method for Data Input:** Instead of relying solely on HTML
attributes for complex data like a map, a public `set()` method is provided.
This is a common pattern for custom elements when dealing with non-string
data or when updates need to trigger specific internal logic beyond simple
attribute reflection.
- **Conditional Rendering for Title Brevity:** The decision to truncate the
number of displayed parameters by default (when exceeding `MAX_PARAMS`) and
provide a "Show Full Title" option is a user experience choice. It
prioritizes a clean initial view for complex graphs while still allowing
users to access all details if needed.
### Key Workflows
**1. Initial Rendering with Data:**
```
User/Application Code GraphTitleSk Element
--------------------- --------------------
calls set(titleData, numTraces) -->
stores titleData & numTraces
calls _render()
|
V
getTitleHtml() is invoked
|
V
Iterates titleData:
- Skips empty keys/values
- If entries > MAX_PARAMS & showShortTitle is true:
- Renders first MAX_PARAMS entries
- Renders "Show Full Title" button
- Else:
- Renders all entries
|
V
HTML template is updated with generated content
Browser renders the title
```
**2. Toggling Full/Short Title Display (when applicable):**
```
User Interaction GraphTitleSk Element
---------------- --------------------
Clicks "Show Full Title" button -->
onClick handler (showFullTitle) executes
|
V
this.showShortTitle = false
calls _render()
|
V
getTitleHtml() is invoked
|
V
Now renders ALL title entries because showShortTitle is false
|
V
HTML template is updated
Browser re-renders the title to show all entries
```
A similar flow occurs if a mechanism to call `showShortTitles()` is implemented
and triggered.
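
A hedged usage sketch of the `set()` API described above; the import path and the element's exact TypeScript typings are assumptions for illustration.

```typescript
import './graph-title-sk'; // Registers <graph-title-sk>; path is illustrative.

// Minimal typing for the documented public API.
const title = document.createElement('graph-title-sk') as HTMLElement & {
  set(entries: Map<string, string>, numTraces: number): void;
};
document.body.appendChild(title);

const entries = new Map<string, string>([
  ['benchmark', 'Speedometer2'],
  ['bot', 'linux-perf'],
  ['test', 'RunsPerMinute'],
]);

// Renders one column per non-empty key/value pair. With an empty map and a
// non-zero trace count it falls back to "Multi-trace Graph (N traces)".
title.set(entries, 3);
```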
The demo page (`graph-title-sk-demo.html` and `graph-title-sk-demo.ts`)
showcases various states of the `graph-title-sk` element, including:
- A "good" example with several valid entries.
- A "partial" example where some entries have empty keys or values.
- A "generic" example where an empty map is provided, resulting in the
"Multi-trace Graph" title.
- An "empty" example (though the demo code doesn't explicitly create a state
where `numTraces` is 0 and the map is also empty, which would result in no
title being displayed).
# Module: /modules/ingest-file-links-sk
## Module: ingest-file-links-sk
**Overview:**
The `ingest-file-links-sk` module provides a custom HTML element,
`<ingest-file-links-sk>`, designed to display a list of relevant links
associated with a specific data point in the Perf performance monitoring system.
These links are retrieved from the `ingest.Format` data structure, which can be
generated by various ingestion processes. The primary purpose is to offer users
quick access to related resources, such as Swarming task runs, Perfetto traces,
or bot information, directly from the Perf UI.
**Why:**
Performance analysis often requires context beyond the raw data. Understanding
the environment in which a test ran (e.g., specific bot configuration), or
having direct access to detailed trace files, can be crucial for debugging
performance regressions or understanding improvements. This module centralizes
these relevant links in a consistent and easily accessible manner, improving the
efficiency of performance investigations.
**How:**
The `<ingest-file-links-sk>` element fetches link data asynchronously. When its
`load()` method is called with a `CommitNumber` (representing a specific point
in time or version) and a `traceID` (identifying the specific data series), it
makes a POST request to the `/_/details/?results=false` endpoint. This endpoint
is expected to return a JSON object conforming to the `ingest.Format` structure.
The element then parses this JSON response. It specifically looks for the
`links` field within the `ingest.Format`. If `links` exist and the `version`
field in the `ingest.Format` is present (indicating a modern format), the
element dynamically renders a list of these links.
Key design considerations and implementation details:
- **Asynchronous Loading:** Link fetching is an asynchronous operation to
avoid blocking the UI. A `spinner-sk` element is displayed while data is
being loaded.
- **URL vs. Text:** The module intelligently differentiates between actual
URLs and plain text values within the `links` object. If a value is a valid
URL, it's rendered as an `<a>` tag. Otherwise, it's displayed as "Key:
Value".
- **Markdown Link Handling:** The element includes logic to parse and convert
Markdown-style links (e.g., `[Link Text](url)`) into standard HTML anchor
tags. This allows ingestion processes to provide links in a more
human-readable format if desired.
- **Sorted Display:** Links are displayed in alphabetical order by their keys
for consistent presentation.
- **Error Handling:** If the fetch request fails or the response is not in the
expected format, an error message is displayed, and the spinner is hidden.
- **Legacy Format Compatibility:** The element checks for the `version` field
in the response. If it's missing, it assumes a legacy data format that
doesn't support these links and gracefully avoids displaying anything.
**Responsibilities and Key Components:**
- **`ingest-file-links-sk.ts`:** This is the core file defining the
`IngestFileLinksSk` custom element.
- It handles the fetching of link data from the backend API.
- It manages the rendering of the link list based on the fetched data.
- It includes the logic for differentiating between URLs and plain text,
and for parsing Markdown links.
- It manages the display of a loading spinner and error messages.
- The `load(cid: CommitNumber, traceid: string)` method is the public API
for triggering the data fetching and rendering process.
- The `displayLinks` static method is responsible for generating the
`TemplateResult` array for rendering the list items.
- The `isUrl` and `removeMarkdown` helper functions provide utility for
link processing.
- **`ingest-file-links-sk.scss`:** This file contains the SASS styles for the
custom element, defining its appearance, including list styling and spinner
positioning.
- **`ingest-file-links-sk-demo.html` and `ingest-file-links-sk-demo.ts`:**
These files provide a demonstration page for the element. The demo page uses
`fetch-mock` to simulate the backend API response, allowing developers to
see the element in action and test its functionality in isolation.
- **`ingest-file-links-sk_test.ts`:** This file contains unit tests for the
`IngestFileLinksSk` element. It uses `fetch-mock` to simulate various API
responses and asserts the element's behavior, such as correct link
rendering, spinner state, and error handling.
- **`ingest-file-links-sk_puppeteer_test.ts`:** This file contains
Puppeteer-based end-to-end tests. These tests load the demo page in a
headless browser and verify the element's visual rendering and basic
functionality.
**Key Workflow: Loading and Displaying Links**
```
User Action/Page Load -> Calls ingest-file-links-sk.load(commit, traceID)
|
V
ingest-file-links-sk: Show spinner-sk
|
V
Make POST request to /_/details/?results=false
(with commit and traceID in request body)
|
V
Backend API: Processes request, retrieves links for the
given commit and trace
|
V
ingest-file-links-sk: Receives JSON response (ingest.Format)
|
+----------------------+
| |
V V
Response OK? Response Error?
| |
V V
Parse links Display error message
Hide spinner Hide spinner
Render link list
```
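
The helper below is a simplified re-implementation of the link-rendering rules described above (Markdown links, URL detection, and the "Key: Value" fallback); it is for illustration only and does not mirror the actual `isUrl`/`removeMarkdown` code in `ingest-file-links-sk.ts`.

```typescript
// Illustration only: returns an HTML fragment for one links entry
// (no escaping is performed here).
function renderLink(key: string, value: string): string {
  // Markdown-style links, e.g. "[Task page](https://example.com/task/123)".
  const md = value.match(/^\[(.*)\]\((.*)\)$/);
  if (md) {
    return `<a href="${md[2]}">${md[1]}</a>`;
  }
  // Plain URLs become anchors; anything else renders as "Key: Value".
  try {
    new URL(value);
    return `<a href="${value}">${key}</a>`;
  } catch {
    return `${key}: ${value}`;
  }
}

// Keys are sorted alphabetically before display, as described above.
const links: Record<string, string> = {
  'Swarming Run': 'https://chromium-swarm.appspot.com/task?id=abc123',
  OS: 'Ubuntu-18.04',
};
Object.keys(links)
  .sort()
  .forEach((k) => console.log(renderLink(k, links[k])));
```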
# Module: /modules/json
## JSON Module Documentation
This module defines TypeScript interfaces and types that represent the structure
of JSON data used throughout the Perf application. It essentially acts as a
contract between the Go backend and the TypeScript frontend, ensuring data
consistency and type safety.
**Why:**
The primary motivation for this module is to leverage TypeScript's strong typing
capabilities. By defining these interfaces, we can catch potential data
inconsistencies and errors at compile time rather than runtime. This is
particularly crucial for a data-intensive application like Perf, where the
frontend relies heavily on JSON responses from the backend.
Furthermore, these definitions are **automatically generated** from Go struct
definitions. This ensures that the frontend and backend data models remain
synchronized. Any changes to the Go structs will trigger an update to these
TypeScript interfaces, reducing the likelihood of manual errors and
inconsistencies.
**How:**
The `index.ts` file contains all the interface and type definitions. These are
organized into a flat structure for simplicity, with some nested namespaces
(e.g., `pivot`, `progress`, `ingest`) where logical grouping is beneficial.
A key design choice is the use of **nominal typing** for certain primitive types
(e.g., `CommitNumber`, `TimestampSeconds`, `Trace`). This is achieved by
creating type aliases that are branded with a unique string literal type. For
example:
```typescript
export type CommitNumber = number & {
_commitNumberBrand: 'type alias for number';
};
export function CommitNumber(v: number): CommitNumber {
return v as CommitNumber;
}
```
This prevents accidental assignment of a generic `number` to a `CommitNumber`
variable, even though they are structurally identical at runtime. This adds an
extra layer of type safety, ensuring that, for example, a timestamp is not
inadvertently used where a commit number is expected. Helper functions (e.g.,
`CommitNumber(v: number)`) are provided for convenient type assertion.
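
For example, with the definitions above in scope, the brand turns an accidental mix-up of plain numbers and commit numbers into a compile-time error while remaining a plain `number` at runtime:

```typescript
// CommitNumber as defined above in index.ts.
type CommitNumber = number & { _commitNumberBrand: 'type alias for number' };
function CommitNumber(v: number): CommitNumber {
  return v as CommitNumber;
}

function commitUrl(c: CommitNumber): string {
  return `/commit/${c}`;
}

const n = 12345;
// commitUrl(n);            // Compile error: a plain number is not a CommitNumber.
commitUrl(CommitNumber(n)); // OK after an explicit conversion.

// The brand is erased at runtime, so arithmetic still works as usual.
const next: CommitNumber = CommitNumber(CommitNumber(n) + 1);
```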
**Key Components/Files/Submodules:**
- **`index.ts`**: This is the sole file in this module and contains all the
TypeScript interface and type definitions. It serves as the single source of
truth for JSON data structures used in the frontend.
- **Interfaces (e.g., `Alert`, `DataFrame`, `FrameRequest`,
`Regression`)**: These define the shape of complex JSON objects. For
instance, the `Alert` interface describes the structure of an alert
configuration, including its query, owner, and various detection
parameters. The `DataFrame` interface represents the core data structure
for displaying traces, including the actual trace data (`traceset`),
column headers (`header`), and associated parameter sets (`paramset`).
- **Type Aliases (e.g., `ClusterAlgo`, `StepDetection`, `Status`)**: These
define specific allowed string values for certain properties, acting
like enums. For example, `ClusterAlgo` can only be `'kmeans'` or
`'stepfit'`, ensuring that only valid clustering algorithms are
specified.
- **Nominally Typed Aliases (e.g., `CommitNumber`, `TimestampSeconds`,
`Trace`, `ParamSet`)**: As explained above, these provide stronger type
checking for primitive types that have specific semantic meaning within
the application. `TraceSet`, for example, is a map where keys are trace
identifiers (strings) and values are `Trace` arrays (nominally typed
`number[]`).
- **Namespaced Interfaces (e.g., `pivot.Request`, `ingest.Format`)**: Some
interfaces are grouped under namespaces to organize related data
structures. For example, `pivot.Request` defines the structure for
requesting pivot table operations, including grouping criteria and
aggregation operations. The `ingest.Format` interface defines the
structure of data being ingested into Perf, including metadata like Git
hash and the actual performance results.
- **Utility/Generic Types (e.g., `ReadOnlyParamSet`, `AnomalyMap`)**:
These represent common data patterns. `ReadOnlyParamSet` is a map of
parameter names to arrays of their possible string values, marked as
read-only to reflect its typical usage. `AnomalyMap` is a nested map
structure used to associate anomalies with specific commits and traces.
**Workflow Example: Requesting and Displaying Trace Data**
A common workflow involves the frontend requesting trace data from the backend
and then displaying it.
1. **Frontend (Client) prepares a `FrameRequest`:**
```
Client Code --> Creates `FrameRequest` object:
{
begin: 1678886400, // Start timestamp
end: 1678972800, // End timestamp
queries: ["config=gpu&name=my_test_trace"],
// ... other properties
}
```
2. **Frontend sends the `FrameRequest` to the Backend (Server).**
3. **Backend processes the request and generates a `FrameResponse`:**
```
Server Logic --> Processes `FrameRequest`
--> Fetches data from database/cache
--> Constructs `FrameResponse` object:
{
dataframe: {
traceset: { "config=gpu&name=my_test_trace": [10.1, 10.5, 9.8, ...Trace] },
header: [ { offset: 12345, timestamp: 1678886400 }, ...ColumnHeader[] ],
paramset: { "config": ["gpu", "cpu"], "name": ["my_test_trace"] }
},
skps: [0, 5, 10], // Indices of significant points
// ... other properties like msg, display_mode, anomalymap
}
```
4. **Backend sends the `FrameResponse` (as JSON) back to the Frontend.**
5. **Frontend receives the JSON and parses it, expecting it to conform to the
   `FrameResponse` interface:** the client code receives the JSON, parses it
   into a `FrameResponse` typed object, then uses
   `frameResponse.dataframe.traceset` to render charts and
   `frameResponse.dataframe.header` to display commit information.
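
A minimal client-side sketch of this round trip, assuming the generated interfaces are imported from this module; the endpoint path below is a placeholder, not a real Perf URL.

```typescript
// Placeholder endpoint and import path; only the FrameRequest/FrameResponse
// interfaces and the dataframe fields come from this module.
import { FrameRequest, FrameResponse } from '../json';

async function loadFrame(req: FrameRequest): Promise<FrameResponse> {
  const resp = await fetch('/placeholder/frame-endpoint', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!resp.ok) {
    throw new Error(`Frame request failed: ${resp.status}`);
  }
  // The cast is reasonable only because these interfaces are generated from
  // the same Go structs the backend serializes.
  return (await resp.json()) as FrameResponse;
}

// Later, typed access guides rendering:
//   const frame = await loadFrame(request);
//   const traceIds = Object.keys(frame.dataframe!.traceset);
```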
This typed interaction ensures that if the backend, for example, renamed
`traceset` to `trace_data` in its Go struct, the automatic generation would
update the `DataFrame` interface. The TypeScript compiler would then flag an
error in the frontend code trying to access `frameResponse.dataframe.traceset`,
preventing a runtime error and guiding the developer to update the frontend code
accordingly.
# Module: /modules/json-source-sk
The `json-source-sk` module provides a custom HTML element, `<json-source-sk>`,
designed to display the raw JSON data associated with a specific data point in a
trace. This is particularly useful in performance analysis and debugging
scenarios where understanding the exact input data ingested by the system is
crucial.
The core responsibility of this module is to fetch and present JSON data in a
user-friendly dialog. It aims to simplify the process of inspecting the source
data for a given commit and trace identifier.
The key component is the `JSONSourceSk` class, defined in `json-source-sk.ts`.
This class extends `ElementSk`, a base class for custom elements in the Skia
infrastructure.
**How it Works:**
1. **Initialization and Properties:**
- The element requires two primary properties to be set:
- `cid`: The Commit ID (represented as `CommitNumber`), which
identifies a specific version or point in time.
- `traceid`: A string identifier for the specific trace being
examined.
- When these properties are set, the element renders itself. If `traceid`
is not a valid key (checked by `validKey` from
`perf/modules/paramtools`), the control buttons are hidden.
2. **User Interaction and Data Fetching:**
- The element displays two buttons: "View Json File" and "View Short Json
File".
- Clicking either button triggers the `_loadSource` or `_loadSourceSmall`
methods, respectively.
- These methods internally call `_loadSourceImpl`. This implementation
detail allows for sharing the core fetching logic while differentiating
the request URL.
- `_loadSourceImpl` constructs a `CommitDetailsRequest` object containing
the `cid` and `traceid`.
- It then makes a POST request to the `/_/details/` endpoint.
- If "View Short Json File" was clicked (`isSmall` is true), the URL
includes `?results=false`, indicating to the backend that a
potentially truncated or summarized version of the JSON is
requested.
- A `spinner-sk` element is activated to provide visual feedback
during the fetch operation.
- The response from the server is parsed as JSON using `jsonOrThrow`. If
the request is successful, the JSON data is formatted with indentation
and stored in the `_json` private property.
- The element is then re-rendered to display the fetched JSON.
- If an error occurs during fetching or parsing, `errorMessage` (from
`perf/modules/errorMessage`) is used to display an error notification to
the user.
3. **Displaying the JSON:**
- The fetched JSON data is displayed within a `<dialog>` element
(`#json-dialog`).
- The `jsonFile()` method in the template is responsible for rendering the
`<pre>` tag containing the formatted JSON string, but only if `_json` is
not empty.
- The dialog is shown using `showModal()`, providing a modal interface for
viewing the JSON.
- A close button (`#closeIcon` with a `close-icon-sk`) allows the user to
dismiss the dialog. Closing the dialog also clears the `_json` property.
**Design Rationale:**
- **Dedicated Element:** Creating a dedicated custom element encapsulates the
functionality of fetching and displaying JSON, making it reusable across
different parts of the application where such inspection is needed.
- **Asynchronous Fetching:** The use of `async/await` and `fetch` allows for
non-blocking data retrieval, ensuring the UI remains responsive while
waiting for the server.
- **Error Handling:** Incorporating error handling via `jsonOrThrow` and
`errorMessage` provides a better user experience by informing users about
issues during data retrieval.
- **Clear Visual Feedback:** The `spinner-sk` element clearly indicates when
data is being loaded.
- **Modal Dialog:** Using a modal dialog (`<dialog>`) for displaying the JSON
helps focus the user's attention on the data without cluttering the main
interface.
- **Option for Short JSON:** The "View Short Json File" option caters to
scenarios where the full JSON might be excessively large, providing a way to
quickly inspect a summary or a smaller subset of the data. This can improve
performance and readability for very large JSON files.
- **Styling and Theming:** The SCSS file (`json-source-sk.scss`) provides
basic styling and leverages existing button styles
(`//elements-sk/modules/styles:buttons_sass_lib`). It also includes
considerations for dark mode by using CSS variables like `--on-background`
and `--background`.
**Workflow Example: Viewing JSON Source**
```
User Sets Properties Element Renders User Clicks Button Fetches Data Displays JSON
-------------------- --------------- ------------------ ------------ -------------
[json-source-sk -> [Buttons visible] -> ["View Json File"] -> POST /_/details/ -> <dialog>
.cid = 123 {cid, traceid} <pre>{json}</pre>
.traceid = ",foo=bar,"] </dialog>
(spinner active)
|
V
Response Received
(spinner inactive)
```
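
A short usage sketch, following the property names described above; the import path and the exact property typings are illustrative.

```typescript
import './json-source-sk'; // Registers the custom element; path is illustrative.

const source = document.createElement('json-source-sk') as HTMLElement & {
  cid: number; // A CommitNumber in the real element.
  traceid: string;
};
source.cid = 123;
source.traceid = ',foo=bar,';
document.body.appendChild(source);

// Clicking "View Json File" now POSTs {cid, traceid} to /_/details/ and shows
// the formatted JSON in the element's modal <dialog>; "View Short Json File"
// appends ?results=false to request a trimmed response.
```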
The demo page (`json-source-sk-demo.html` and `json-source-sk-demo.ts`)
illustrates how to use the `<json-source-sk>` element. It sets up mock data
using `fetchMock` to simulate the backend endpoint and programmatically clicks
the button to demonstrate the JSON loading functionality.
The Puppeteer test (`json-source-sk_puppeteer_test.ts`) ensures the element
renders correctly and performs basic visual regression testing.
# Module: /modules/new-bug-dialog-sk
The `new-bug-dialog-sk` module provides a user interface element for filing new
bugs related to performance anomalies. It aims to streamline the bug reporting
process by pre-filling relevant information and integrating with the Buganizer
issue tracker.
**Core Functionality:**
The primary responsibility of this module is to display a dialog that allows
users to input details for a new bug. This dialog is populated with information
derived from one or more selected `Anomaly` objects. The user can then review
and modify this information before submitting the bug.
**Key Design Decisions and Implementation Choices:**
- **Pre-population of Bug Details:** To reduce manual effort and ensure
consistency, the dialog attempts to intelligently pre-fill fields like the
bug title, labels, and components.
- The bug title is generated based on the nature (regression/improvement),
magnitude (percentage change), and affected revision range of the
anomalies. This logic, found in `getBugTitle()`, mimics the behavior of
the legacy Chromeperf UI to maintain familiarity for users.
- Labels and components are aggregated from all selected anomalies. Unique
labels are presented as checkboxes (defaulting to checked), and unique
components are presented as radio buttons (with the first one selected
by default). This is handled by `getLabelCheckboxes()` and
`getComponentRadios()`.
- **Dynamic UI Generation:** The dialog's content, specifically the label
checkboxes and component radio buttons, is dynamically generated based on
the provided `Anomaly` data. This ensures that only relevant options are
presented to the user. Lit-html's templating capabilities are used for this
dynamic rendering.
- **User Contextualization:** The dialog attempts to automatically CC the
logged-in user on the new bug. This is achieved by fetching the user's login
status via `/alogin-sk`.
- **Asynchronous Bug Filing:** The actual bug filing process is asynchronous.
When the user submits the form, a POST request is made to the
`/_/triage/file_bug` endpoint.
- A spinner (`spinner-sk`) is displayed during this operation to provide
visual feedback.
- Upon successful bug creation, the user is redirected to the newly
created bug page in a new tab, and an `anomaly-changed` event is
dispatched to notify other components (like `explore-simple-sk` or
`chart-tooltip-sk`) that the anomalies have been updated with the new
bug ID.
- If an error occurs, an error message is displayed using
`error-toast-sk`, and the dialog remains open, allowing the user to
retry or correct information.
- **Standard HTML Dialog Element:** The core dialog functionality leverages
the native `<dialog>` HTML element, which provides built-in accessibility
and modal behavior.
**Workflow: Filing a New Bug**
1. **Initialization:** An external component (e.g., a chart displaying
anomalies) invokes the `setAnomalies()` method on `new-bug-dialog-sk`,
passing the relevant `Anomaly` objects and associated trace names.
2. **Opening the Dialog:** The external component calls the `open()` method.
   ```
   User Action (e.g., click "File Bug" button)
        |
        V
   External Component --[setAnomalies(anomalies, traceNames)]--> new-bug-dialog-sk
        |
        V
   External Component --[open()]--> new-bug-dialog-sk
   ```
3. **Dialog Population:**
   - `new-bug-dialog-sk` fetches the current user's login status to pre-fill
     the CC field.
   - The `_render()` method is called, which uses the Lit-html template.
   - `getBugTitle()` generates a suggested title.
   - `getLabelCheckboxes()` and `getComponentRadios()` create the UI for
     selecting labels and components based on the input anomalies.
   - The dialog (`<dialog id="new-bug-dialog">`) is displayed modally.
   ```
   new-bug-dialog-sk.open()
        |
        V
   [Fetch Login Status] --> Updates `_user`
        |
        V
   _render()
        |--> getBugTitle()        --> Populates Title Input
        |--> getLabelCheckboxes() --> Creates Label Checkboxes
        |--> getComponentRadios() --> Creates Component Radios
        |
        V
   Dialog is displayed to the user
   ```
4. **User Interaction:** The user reviews and potentially modifies the
pre-filled information (title, description, labels, component, assignee,
CCs).
5. **Submission:** The user clicks the "Submit" button.
   ```
   User clicks "Submit"
        |
        V
   Form Submit Event
        |
        V
   new-bug-dialog-sk.fileNewBug()
   ```
6. **Bug Filing Process:**
   - The `fileNewBug()` method is invoked.
   - The spinner is activated, and form buttons are disabled.
   - Form data (title, description, selected labels, selected component,
     assignee, CCs, anomaly keys, trace names) is collected.
   - A POST request is sent to `/_/triage/file_bug` with the collected data.
   ```
   fileNewBug()
        |
        V
   [Activate Spinner, Disable Buttons]
        |
        V
   [Extract Form Data]
        |
        V
   fetch('/_/triage/file_bug', {POST, body: jsonData})
   ```
7. **Response Handling:**
   - **Success:**
     - The server responds with a JSON object containing the `bug_id`.
     - The spinner is deactivated, and buttons are re-enabled.
     - The dialog is closed.
     - A new browser tab is opened to the URL of the created bug (e.g.,
       `https://issues.chromium.org/issues/BUG_ID`).
     - The `bug_id` is updated in the local `_anomalies` array.
     - An `anomaly-changed` custom event is dispatched with the updated
       anomalies and bug ID.
   - **Failure:**
     - The server responds with an error.
     - The spinner is deactivated, and buttons are re-enabled.
     - An error message is displayed to the user via `errorMessage()`. The
       dialog remains open.
   ```
   fetch Response
        |
        +-- Success (HTTP 200, valid JSON with bug_id)
        |        |
        |        V
        |   [Deactivate Spinner, Enable Buttons]
        |        |
        |        V
        |   closeDialog()
        |        |
        |        V
        |   window.open(bugUrl, '_blank')
        |        |
        |        V
        |   Update local _anomalies with bug_id
        |        |
        |        V
        |   dispatchEvent('anomaly-changed', {anomalies, bugId})
        |
        +-- Failure (HTTP error or invalid JSON)
                 |
                 V
            [Deactivate Spinner, Enable Buttons]
                 |
                 V
            errorMessage(errorMsg) --> Displays error toast
   ```
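
A hedged sketch of this submission step: the `/_/triage/file_bug` endpoint and the `bug_id` response field come from the description above, while the request field names are placeholders rather than the element's real payload shape.

```typescript
// Illustrative only; the real fileNewBug() collects these values from the
// dialog's form and also manages the spinner and button state.
interface FileBugResponse {
  bug_id: number;
}

async function fileNewBug(
  title: string,
  description: string,
  component: string,
  anomalyKeys: string[],
  traceNames: string[]
): Promise<number> {
  const resp = await fetch('/_/triage/file_bug', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      title,
      description,
      component,
      keys: anomalyKeys,
      trace_names: traceNames,
    }),
  });
  if (!resp.ok) {
    // Surfaced to the user via errorMessage(); the dialog stays open.
    throw new Error(await resp.text());
  }
  const json = (await resp.json()) as FileBugResponse;
  // On success the dialog opens the new issue in a separate tab and
  // dispatches 'anomaly-changed' so other components can pick up the bug id.
  window.open(`https://issues.chromium.org/issues/${json.bug_id}`, '_blank');
  return json.bug_id;
}
```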
**Key Files:**
- **`new-bug-dialog-sk.ts`**: This is the core file containing the
`NewBugDialogSk` class definition, which extends `ElementSk`. It includes
the Lit-html template for the dialog, the logic for populating form fields
based on `Anomaly` data, handling form submission, interacting with the
backend API to file the bug, and managing the dialog's visibility and state.
- **`new-bug-dialog-sk.scss`**: This file defines the styles for the dialog,
ensuring it integrates visually with the rest of the application and themes.
It styles the dialog container, input fields, buttons, and the close icon.
- **`new-bug-dialog-sk-demo.ts` and `new-bug-dialog-sk-demo.html`**: These
files provide a demonstration page for the `new-bug-dialog-sk` element. The
`.ts` file sets up mock data (`Anomaly` objects) and mock fetch responses to
simulate the bug filing process, allowing for isolated testing and
development of the dialog. The `.html` file includes the `new-bug-dialog-sk`
element and a button to trigger its opening.
- **`index.ts`**: This file simply imports `new-bug-dialog-sk.ts` to ensure
the custom element is defined and available for use.
The module relies on several other elements and libraries:
- `alogin-sk`: To determine the logged-in user for CC'ing.
- `close-icon-sk`: For the dialog's close button.
- `spinner-sk`: To indicate activity during bug filing.
- `error-toast-sk` (via `errorMessage` utility): To display error messages.
- `lit`: For templating and component rendering.
- `jsonOrThrow`: A utility for parsing JSON responses and throwing errors on
failure.
# Module: /modules/paramtools
The `paramtools` module provides a TypeScript implementation of utility
functions for manipulating parameter sets and structured keys. It mirrors the
functionality found in the Go module `/infra/go/paramtools`, which is the
primary source of truth for these operations. The decision to replicate this
logic in TypeScript is to enable client-side applications to perform these
common tasks without needing to make server requests for simple transformations
or validations. This approach improves performance and reduces server load for
UI-driven interactions.
The core responsibility of this module is to provide robust and consistent ways
to:
1. **Create and parse structured keys:** Structured keys are a fundamental
concept for identifying specific data points (e.g., traces in performance
data).
2. **Manipulate `ParamSet` objects:** `ParamSet`s are used to represent
collections of possible parameter values, often used for filtering or
querying data.
Key functionalities and their "why" and "how":
- **`makeKey(params: Params | { [key: string]: string }): string`**:
- **Why**: To create a canonical string representation of a set of
key-value parameters. This canonical form is essential for consistent
identification and comparison of data points. The keys within the
structured key are sorted alphabetically to ensure that the same set of
parameters always produces the same key, regardless of the order in
which they were provided.
- **How**: It takes a `Params` object (a dictionary of string key-value
pairs). It first checks if the `params` object is empty, throwing an
error if it is, as a key must represent at least one parameter. Then, it
sorts the keys of the `params` object alphabetically. Finally, it
constructs the string by joining each key-value pair with `=` and then
joining these pairs with `,`, prefixing and suffixing the entire string
with a comma. For example, `{ "b": "2", "a": "1", "c": "3" }` has its keys
sorted to `["a", "b", "c"]`, the pairs become `a=1`, `b=2`, and `c=3`, and
the joined, wrapped result is `",a=1,b=2,c=3,"`. (A simplified sketch of
these helpers appears after this list.)
- **`fromKey(structuredKey: string, attribute?: string): Params`**:
- **Why**: To convert a structured key string back into a `Params` object,
making it easier to work with the individual parameters
programmatically. It also handles the removal of special functions that
might be embedded in the key (e.g., `norm(...)` for normalization).
- **How**: It first calls `removeSpecialFunctions` to strip any function
wrappers from the key. Then, it splits the key string by the comma
delimiter. Each resulting segment (if not empty) is then split by the
equals sign to separate the key and value. These key-value pairs are
collected into a new `Params` object. An optional `attribute` parameter
allows excluding a specific key from the resulting `Params` object,
which can be useful in scenarios where certain attributes are metadata
and not part of the core parameters.
- **`removeSpecialFunctions(key: string): string`**:
- **Why**: Structured keys can sometimes include functional wrappers
(e.g., `norm(...)`, `avg(...)`) or special markers (e.g.,
`special_zero`). This function is designed to strip these away,
returning the "raw" underlying key. This is important when you need to
work with the base parameters without the context of the applied
function or special condition.
- **How**: It uses regular expressions to detect if the key matches a
pattern like `function_name(,param1=value1,...)`. If a match is found,
it extracts the content within the parentheses. The extracted string (or
the original key if no function was found) is then processed by
`extractNonKeyValuePairsInKey`.
- **`extractNonKeyValuePairsInKey(key: string): string`**: This helper
function further refines the key string. It splits the string by commas
and filters out any segments that do not represent a valid `key=value`
pair. This helps to remove extraneous parts like `special_zero` that
might be comma-separated but aren't true parameters. The valid pairs are
then re-joined and wrapped with commas.
- **`validKey(key: string): boolean`**:
- **Why**: To provide a simple client-side check to determine if a string
is a "valid" basic structured key, meaning it's not a key representing a
calculation (like `avg(...)`) or other special trace types. This is a
lightweight validation, as the server performs more comprehensive
checks.
- **How**: It checks if the key string starts and ends with a comma. This
is a convention for simple, non-functional structured keys.
- **`addParamsToParamSet(ps: ParamSet, p: Params): void`**:
- **Why**: To add a new set of parameters (from a `Params` object) to an
existing `ParamSet`. `ParamSet`s store unique values for each parameter
key. This function ensures that when new parameters are added, only new
values are appended to the existing lists for each key, maintaining
uniqueness.
- **How**: It iterates through the key-value pairs of the input `Params`
object (`p`). For each key, it retrieves the corresponding array of
values from the `ParamSet` (`ps`). If the key doesn't exist in `ps`, a
new array is created. If the value from `p` is not already present in
the array, it's added.
- **`paramsToParamSet(p: Params): ParamSet`**:
- **Why**: To convert a single `Params` object (representing one specific
combination of parameters) into a `ParamSet`. In a `ParamSet`, each key
maps to an array of values, even if there's only one value.
- **How**: It creates a new, empty `ParamSet`. Then, for each key-value
pair in the input `Params` object, it creates a new entry in the
`ParamSet` where the key maps to an array containing just that single
value.
- **`addParamSet(p: ParamSet, ps: ParamSet | ReadOnlyParamSet): void`**:
- **Why**: To merge one `ParamSet` (or `ReadOnlyParamSet`) into another.
This is useful for combining sets of available parameter options, for
example, when aggregating data from multiple sources.
- **How**: It iterates through the keys and their associated value arrays
in the source `ParamSet` (`ps`). If a key from `ps` is not present in
the target `ParamSet` (`p`), the entire key and its value array (cloned)
are added to `p`. If the key already exists in `p`, it iterates through
the values in the source array and adds any values that are not already
present in the target array for that key.
- **`toReadOnlyParamSet(ps: ParamSet): ReadOnlyParamSet`**:
- **Why**: To provide a type assertion that casts a mutable `ParamSet` to
an immutable `ReadOnlyParamSet`. This is useful for signaling that a
`ParamSet` should not be modified further, typically when passing it to
components or functions that expect read-only data.
- **How**: It performs a type assertion. No actual data transformation
occurs; it's a compile-time type hint.
- **`queryFromKey(key: string): string`**:
- **Why**: To convert a structured key into a URL query string format
(e.g., `a=1&b=2&c=3`). This is specifically useful for frontend
applications, like `explore-simple-sk`, where state or filters are often
represented in the URL.
- **How**: It first uses `fromKey` to parse the structured key into a
`Params` object. Then, it leverages the `URLSearchParams` browser API to
construct a query string from these parameters. This ensures proper URL
encoding of keys and values. For example, the key `",a=1,b=2,c=3,"` is parsed
by `fromKey` into `{ "a": "1", "b": "2", "c": "3" }`, and `URLSearchParams`
turns that into the query string `a=1&b=2&c=3`.
The design choice to have these functions operate with less stringent validation
than their server-side Go counterparts is deliberate. The server remains the
ultimate authority on data validity. These client-side functions prioritize ease
of use and performance for UI interactions, assuming that the data they operate
on has either originated from or will eventually be validated by the server.
The `index_test.ts` file provides comprehensive unit tests for these functions,
ensuring their correctness and robustness across various scenarios, including
handling empty inputs, duplicate values, and special key formats. This focus on
testing is crucial for maintaining the reliability of these foundational utility
functions.
# Module: /modules/perf-scaffold-sk
The `perf-scaffold-sk` module provides a consistent layout and navigation
structure for all pages within the Perf application. It acts as a wrapper,
ensuring that common elements like the title bar, navigation sidebar, and error
notifications are present and behave uniformly across different sections of
Perf.
**Core Responsibilities:**
- **Layout Management:** Establishes the primary visual structure, dividing
the page into a header, a collapsible sidebar for navigation, and a main
content area.
- **Navigation:** Provides a standardized set of navigation links in the
sidebar, allowing users to easily access different Perf features (e.g., New
Query, Favorites, Alerts).
- **Global Elements:** Hosts globally relevant components like the login
status (`alogin-sk`), theme chooser (`theme-chooser-sk`), and error/toast
notifications (`error-toast-sk`).
- **Dynamic Content Injection:** Handles the placement of page-specific
content into the `main` content area and allows for specific content (like
help text) to be injected into the sidebar.
**Key Components and Design Decisions:**
- **`perf-scaffold-sk.ts`:** This is the heart of the module, defining the
`PerfScaffoldSk` custom element.
- **Why:** Encapsulating the scaffold logic within a custom element
promotes reusability and modularity. It allows any Perf page to adopt
the standard layout simply by including this element.
- **How:** It uses Lit for templating and rendering the structure
(`<app-sk>`, `header`, `aside#sidebar`, `main`, `footer`).
- **Content Redistribution:** A crucial design choice is how it handles
child elements. Since it doesn't use Shadow DOM for the main content
area (to allow global styles to apply easily to the page content), it
programmatically moves children of `<perf-scaffold-sk>` into the
`<main>` section.
- **Process:**
1. When `connectedCallback` is invoked, existing children of
`<perf-scaffold-sk>` are temporarily moved out.
2. The scaffold's own template (header, sidebar, etc.) is rendered.
3. The temporarily moved children are then appended to the newly
rendered `<main>` element.
4. A `MutationObserver` is set up to watch for any new children added
to `<perf-scaffold-sk>` and similarly move them to `<main>`.
- **Sidebar Content:** An exception is made for elements with the specific
ID `SIDEBAR_HELP_ID`. These are moved into the `#help` div within the
sidebar. This allows pages to provide context-specific help information
directly within the scaffold.
```
<perf-scaffold-sk>
<!-- This will go into <main> -->
<div>Page specific content</div>
<!-- This will go into <aside>#help -->
<div id="sidebar_help">Contextual help</div>
</perf-scaffold-sk>
```
- **Configuration via `window.perf`:** The scaffold reads various
configuration options from the global `window.perf` object. This allows
instances of Perf to customize links (help, feedback, chat), behavior
(e.g., `show_triage_link`), and display information (e.g., instance URL,
build tag). This makes the scaffold adaptable to different Perf
deployments.
- For example, the `_helpUrl` and `_reportBugUrl` are initialized with
defaults but can be overridden by `window.perf.help_url_override` and
`window.perf.feedback_url` respectively.
- The visibility of the "Triage" link is controlled by
`window.perf.show_triage_link`.
- **Build Information:** It displays the current application build tag,
fetching it via `getBuildTag()` from
`//perf/modules/window:window_ts_lib` and linking it to the
corresponding commit in the buildbot git repository.
- **Instance Title:** It can display the name of the Perf instance,
extracted from `window.perf.instance_url`.
- **`perf-scaffold-sk.scss`:** Defines the styles for the scaffold.
- **Why:** Separates styling concerns from the element's logic.
- **How:** It uses SASS and imports common themes from
`//perf/modules/themes:themes_sass_lib`. It defines the layout,
including the sidebar width and the main content area's width (using
`calc(99vw - var(--sidebar-width))` to avoid horizontal scrollbars
caused by `100vw` including the scrollbar width). It also styles the
navigation links and other elements within the scaffold.
- **`perf-scaffold-sk-demo.html` & `perf-scaffold-sk-demo.ts`:** Provide a
demonstration page for the scaffold.
- **Why:** Allows developers to see the scaffold in action and test its
appearance and behavior in isolation.
- **How:** `perf-scaffold-sk-demo.ts` initializes a mock `window.perf`
object with various settings and then injects an instance of
`<perf-scaffold-sk>` with some placeholder content (including a `div`
with `id="sidebar_help"`) into the `perf-scaffold-sk-demo.html` page.
**Workflow: Initializing and Rendering a Page with the Scaffold**
1. A Perf page (e.g., the "New Query" page) includes `<perf-scaffold-sk>` as
   its top-level layout element.
   ```html
   <!-- new_query_page.html -->
   <body>
     <perf-scaffold-sk>
       <!-- Content specific to the New Query page -->
       <query-composer-sk></query-composer-sk>
       <div id="sidebar_help">
         <p>Tips for creating new queries...</p>
       </div>
     </perf-scaffold-sk>
   </body>
   ```
2. The `PerfScaffoldSk` element's `connectedCallback` fires.
3. `perf-scaffold-sk.ts`:
   - Temporarily moves `<query-composer-sk>` and `<div
     id="sidebar_help">...</div>` out of `perf-scaffold-sk`.
   - Renders its own internal template (header with title, login, theme
     chooser; sidebar with nav links; empty main area; footer with error
     toast):
     ```
     <app-sk>
       <header>...</header>
       <aside id=sidebar>
         <div id=links>...</div>
         <div id=help></div>   <-- Placeholder for sidebar help
         ...
       </aside>
       <main></main>           <-- Placeholder for main content
       <footer>...</footer>
     </app-sk>
     ```
   - The `redistributeAddedNodes` function is called:
     - `<query-composer-sk>` (since it doesn't have `id="sidebar_help"`) is
       appended to the `<main>` element.
     - `<div id="sidebar_help">...</div>` is appended to the `<div
       id="help">` element within the `<aside>` sidebar.
   - A `MutationObserver` starts listening for any further children added
     directly to `<perf-scaffold-sk>`.
The final rendered structure (simplified) would look something like:
```
perf-scaffold-sk
└── app-sk
├── header
│ ├── h1.name (Instance Title)
│ ├── div.spacer
│ ├── alogin-sk
│ └── theme-chooser-sk
├── aside#sidebar
│ ├── div#links
│ │ ├── a (New Query)
│ │ ├── a (Favorites)
│ │ └── ... (other nav links)
│ ├── div#help
│ │ └── div#sidebar_help (Content from original page)
│ │ └── <p>Tips for creating new queries...</p>
│ └── div#chat
├── main
│ └── query-composer-sk (Content from original page)
└── footer
└── error-toast-sk
```
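
The following sketch shows the general shape of this content redistribution and the `MutationObserver` wiring. The function names and structure are illustrative; they do not reproduce the element's actual code.

```typescript
const SIDEBAR_HELP_ID = 'sidebar_help'; // Matches the id described above.

// Move nodes into <main>, except the sidebar-help element, which goes into
// the sidebar's #help container.
function redistribute(nodes: Node[], main: HTMLElement, help: HTMLElement): void {
  nodes.forEach((node) => {
    if (node instanceof HTMLElement && node.id === SIDEBAR_HELP_ID) {
      help.appendChild(node);
    } else {
      main.appendChild(node);
    }
  });
}

// Watch for children added directly to <perf-scaffold-sk> after the first
// render and move them into place as well.
function observeScaffold(
  scaffold: HTMLElement,
  main: HTMLElement,
  help: HTMLElement
): MutationObserver {
  const observer = new MutationObserver((mutations) => {
    mutations.forEach((m) => redistribute(Array.from(m.addedNodes), main, help));
  });
  observer.observe(scaffold, { childList: true });
  return observer;
}
```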
# Module: /modules/picker-field-sk
The `picker-field-sk` module provides a custom HTML element that serves as a
stylized text input field with an associated dropdown menu for selecting from a
predefined list of options. This component is designed to offer a user-friendly
way to pick a single value from potentially many choices, enhancing the user
experience in forms or selection-heavy interfaces.
**Core Functionality and Design:**
The primary goal of `picker-field-sk` is to present a familiar text input that,
upon interaction (focus or click), reveals a filterable list of valid options.
This addresses the need for a compact and efficient way to select an item,
especially when the number of options is large.
The implementation leverages the Vaadin ComboBox component (`@vaadin/combo-box`)
for its underlying dropdown and filtering capabilities. This choice was made to
utilize a well-tested and feature-rich component, avoiding the need to
reimplement complex dropdown logic, keyboard navigation, and accessibility
features. `picker-field-sk` then wraps this Vaadin component, applying custom
styling and providing a simplified API tailored to its specific use case.
**Key Responsibilities and Components:**
- **`picker-field-sk.ts`**: This is the heart of the module, defining the
`PickerFieldSk` custom element which extends `ElementSk`.
- **Properties:**
- `label`: A string that serves as both the visual label above the input
field and the placeholder text within it when empty. This provides
context to the user about the expected input.
- `options`: An array of strings representing the valid choices the user
can select from. The component dynamically adjusts the width of the
dropdown overlay to accommodate the longest option, ensuring
readability.
- `helperText`: An optional string displayed below the input field,
typically used for providing additional guidance or information to the
user.
- **Events:**
- `value-changed`: This custom event is dispatched whenever the selected
value in the combo box changes. This includes selecting an item from the
dropdown, typing a value that matches an option (due to `autoselect`),
or clearing the input. The new value is available in
`event.detail.value`. This event is crucial for parent components to
react to user selections.
- **Methods:**
- `focus()`: Programmatically sets focus to the input field.
- `openOverlay()`: Programmatically opens the dropdown list of options.
This is useful for guiding the user or for integrating with other UI
elements.
- `disable()`: Makes the input field read-only, preventing user
interaction.
- `enable()`: Removes the read-only state, allowing user interaction.
- `clear()`: Clears the current value in the input field.
- `setValue(val: string)`: Programmatically sets the value of the input
field.
- `getValue()`: Retrieves the current value of the input field.
- **Rendering:** Uses `lit-html` for templating. The template renders a
`<vaadin-combo-box>` element and binds its properties and events to the
`PickerFieldSk` element's state.
- **Overlay Width Calculation:** The `calculateOverlayWidth()` private
method dynamically adjusts the `--vaadin-combo-box-overlay-width` CSS
custom property. It iterates through the `options` to find the longest
string and sets the overlay width to be slightly larger than this
string, ensuring all options are fully visible without truncation. This
is a key usability enhancement. In short: setting `PickerFieldSk.options`
triggers `calculateOverlayWidth()`, which finds the longest option and sets
the `--vaadin-combo-box-overlay-width` CSS custom property accordingly.
- **`picker-field-sk.scss`**: Contains the SASS styles for the component.
- It primarily targets the underlying `vaadin-combo-box` and its shadow
parts (e.g., `::part(label)`, `::part(input-field)`, `::part(items)`) to
customize its appearance to match the application's theme (including
dark mode support).
- CSS custom properties like `--vaadin-field-default-width`,
`--vaadin-combo-box-overlay-width`, and `--lumo-text-field-size` are
used to control the dimensions and sizing of the Vaadin component.
- Dark mode styles are applied by targeting `.darkmode picker-field-sk`,
adjusting colors for labels, helper text, and input fields to ensure
proper contrast and visual integration.
- **`index.ts`**: A simple entry point that imports and thereby registers the
`picker-field-sk` custom element, making it available for use in HTML.
- **`picker-field-sk-demo.html` & `picker-field-sk-demo.ts`**: These files
create a demonstration page for the `picker-field-sk` component.
- `picker-field-sk-demo.html` includes instances of the `picker-field-sk`
element and buttons to trigger its various functionalities (focus, fill,
open overlay, disable/enable).
- `picker-field-sk-demo.ts` contains JavaScript to initialize the demo
elements with sample data (a large list of "speedometer" options to
showcase performance with many items) and to wire up the buttons to the
corresponding methods of the `PickerFieldSk` instances. This allows
developers to visually inspect and interact with the component.
**Workflow Example: User Selects an Option**
1. **Initialization**: A parent component instantiates `<picker-field-sk>` and
   sets its `label` and `options` properties, e.g.
   `<picker-field-sk .label="Fruit" .options=${['Apple', 'Banana', 'Cherry']}></picker-field-sk>`.
2. **User Interaction**: The user clicks on or focuses the `picker-field-sk`
   input. The underlying `vaadin-combo-box` handles the focus/click and
   displays the dropdown with the available options.
3. **Filtering (Optional)**: The user types into the input field. The
   `vaadin-combo-box` filters the displayed options based on the typed text.
4. **Selection**: The user clicks an option from the dropdown or presses Enter
   when an option is highlighted (e.g., "Banana"). The `vaadin-combo-box`
   updates its internal value and emits its native `value-changed` event.
5. **Event Propagation**:
   - The `vaadin-combo-box` within `picker-field-sk` emits its native
     `value-changed` event.
   - The `onValueChanged` method in `PickerFieldSk` catches this event and
     dispatches the element's own `value-changed` custom event, with the
     selected value in `event.detail.value`:
     `new CustomEvent('value-changed', { detail: { value: vaadinEvent.detail.value } })`.
6. **Parent Component Reaction**: The parent component, listening for the
   `value-changed` event on the `<picker-field-sk>` element, reads
   `event.detail.value` and updates application state accordingly.
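
A short usage sketch for wiring a parent component to the `value-changed` event; the event detail shape follows the description above, while the import path and typings are illustrative.

```typescript
import './picker-field-sk'; // Registers <picker-field-sk>; path is illustrative.

const picker = document.createElement('picker-field-sk') as HTMLElement & {
  label: string;
  options: string[];
};
picker.label = 'Fruit';
picker.options = ['Apple', 'Banana', 'Cherry'];
document.body.appendChild(picker);

picker.addEventListener('value-changed', (e: Event) => {
  const value = (e as CustomEvent<{ value: string }>).detail.value;
  // React to the selection, e.g. update a query or re-plot a chart.
  console.log('selected:', value);
});
```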
This layered approach, building upon the Vaadin ComboBox, provides a robust and
themeable selection component while abstracting away the complexities of the
underlying library for the consumers of `picker-field-sk`.
# Module: /modules/pinpoint-try-job-dialog-sk
## Pinpoint Try Job Dialog (`pinpoint-try-job-dialog-sk`)
The `pinpoint-try-job-dialog-sk` module provides a user interface element for
initiating Pinpoint A/B try jobs.
**Purpose:**
The primary reason for this module's existence within the Perf application is to
allow users to request additional trace data for specific benchmark runs. While
Pinpoint itself supports a wider range of try job use cases, this dialog is
specifically tailored for this trace generation scenario. It's important to note
that this component is considered a legacy feature, and future development
should favor the newer Pinpoint frontend.
**How it Works:**
The dialog is designed to gather the necessary parameters from the user to
construct and submit a Pinpoint A/B try job request. This process involves:
1. **Initialization:** The dialog can be pre-populated with initial values such
as the test path, base commit, and end commit. This often happens when a
user interacts with a chart tooltip and wants to investigate a specific data
point further.
2. **User Input:** The user can modify the pre-filled values or enter new ones.
Key inputs include:
- **Base Commit:** The starting commit hash for the A/B comparison.
- **Experiment Commit:** The ending commit hash for the A/B comparison.
- **Tracing Arguments:** A string specifying the categories and options
for the trace generation. A default value is provided, and a link to
Chromium source documentation offers more details on available options.
3. **Authentication:** The dialog uses `alogin-sk` to determine the logged-in
user. The user's email is included in the try job request.
4. **Submission:** Upon submission, the dialog constructs a
`CreateLegacyTryRequest` object. This object encapsulates all the necessary
information for the Pinpoint backend.
- The `testPath` (e.g., `master/benchmark_name/story_name`) is parsed to
extract the configuration (e.g., `benchmark_name`) and the benchmark
(e.g., `story_name`).
- The `story` is typically the last segment of the `testPath`.
- The `extra_test_args` field is formatted to include the user-provided
tracing arguments.
5. **API Interaction:** The dialog sends a POST request to the `/_/try/`
endpoint with the JSON payload.
6. **Response Handling:**
- **Success:** If the request is successful, the Pinpoint backend responds
with a JSON object containing the `jobUrl` for the newly created
Pinpoint job. This URL is then displayed to the user, allowing them to
navigate to the Pinpoint UI to monitor the job's progress.
- **Error:** If an error occurs during the request or processing, an error
message is displayed to the user.
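Steps 4 through 6 amount to building a JSON payload and POSTing it to
`/_/try/`. A rough sketch, using illustrative field names rather than the exact
`CreateLegacyTryRequest` shape:

```ts
// Illustrative sketch of the submission flow described above; the actual
// CreateLegacyTryRequest fields in pinpoint-try-job-dialog-sk.ts may differ.
async function sendTryJob(
  baseCommit: string,
  expCommit: string,
  traceArgs: string,
  user: string
): Promise<string> {
  const body = {
    base_git_hash: baseCommit, // assumed field name
    end_git_hash: expCommit, // assumed field name
    user: user,
    extra_test_args: traceArgs, // user-provided tracing arguments
  };
  const resp = await fetch('/_/try/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!resp.ok) {
    throw new Error(`Pinpoint try job request failed: ${resp.status}`);
  }
  const json = await resp.json();
  return json.jobUrl as string; // link to the newly created Pinpoint job
}
```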
**Workflow:**
```
User Interaction (e.g., click on chart tooltip)
|
V
Dialog Pre-populated with context (testPath, commits)
|
V
pinpoint-try-job-dialog-sk.open()
|
V
User reviews/modifies input fields (Base Commit, Exp. Commit, Trace Args)
|
V
User clicks "Send to Pinpoint"
|
V
[pinpoint-try-job-dialog-sk]
- Gathers input values
- Retrieves logged-in user via alogin-sk
- Constructs `CreateLegacyTryRequest` JSON
- Sends POST request to /_/try/
|
V
[Backend Pinpoint Service]
- Processes the request
- Creates A/B try job
- Returns jobUrl (success) or error
|
V
[pinpoint-try-job-dialog-sk]
- Displays spinner during request
- On Success:
- Displays link to the created Pinpoint job (jobUrl)
- Hides spinner
- On Error:
- Displays error message
- Hides spinner
```
**Key Components/Files:**
- **`pinpoint-try-job-dialog-sk.ts`:** This is the core TypeScript file that
defines the custom element's logic.
- **`PinpointTryJobDialogSk` class:** Extends `ElementSk` and manages the
dialog's state, user input, and interaction with the Pinpoint API.
- **`template`:** Defines the HTML structure of the dialog using
`lit-html`. This includes input fields for commits and tracing
arguments, a submit button, a spinner for loading states, and a link to
the created Pinpoint job.
- **`connectedCallback()`:** Initializes the dialog, sets up event
listeners (e.g., for form submission, closing the dialog on outside
click), and fetches the logged-in user's information.
- **`setTryJobInputParams(params: TryJobPreloadParams)`:** Allows external
components to pre-fill the dialog's input fields. This is crucial for
integrating the dialog with other parts of the Perf UI, like chart
tooltips.
- **`open()`:** Displays the modal dialog.
- **`closeDialog()`:** Closes the modal dialog.
- **`postTryJob()`:** This is the central method for handling the job
submission. It reads values from the input fields, constructs the
`CreateLegacyTryRequest` payload, and makes the `fetch` call to the
Pinpoint API. It also handles the UI updates based on the API response
(showing the job URL or an error message).
- **`TryJobPreloadParams` interface:** Defines the structure for the
parameters used to pre-populate the dialog.
- **`pinpoint-try-job-dialog-sk.scss`:** Contains the SASS/CSS styles for the
dialog, ensuring it aligns with the application's visual theme. It styles
the input fields, buttons, and the overall layout of the dialog.
- **`index.ts`:** A simple entry point that imports and registers the
`pinpoint-try-job-dialog-sk` custom element.
- **`BUILD.bazel`:** Defines the build rules for the module, specifying its
dependencies (e.g., `elements-sk` components like `select-sk`, `spinner-sk`,
`alogin-sk`, and Material Web components) and how it should be compiled.
**Design Decisions:**
- **Based on `bisect-dialog-sk`:** The dialog's structure and initial
functionality were adapted from an existing bisect dialog. This likely
accelerated development by reusing common patterns for dialog interactions
and API calls.
- **Legacy Component:** The explicit note to avoid building on top of this
dialog indicates a strategic decision to migrate towards a newer Pinpoint
frontend. This suggests that this component is maintained for existing
functionality but is not the target for future enhancements related to
Pinpoint interactions.
- **Specific Use Case:** The dialog is narrowly focused on requesting
additional traces. This simplifies the UI and the request payload, making it
easier for users to achieve this specific task.
- **Client-Side Request Construction:** The `CreateLegacyTryRequest` object is
fully constructed on the client-side before being sent to the backend. This
gives the frontend more control over the request parameters.
- **Standard HTML Dialog:** The use of the `<dialog>` HTML element provides
built-in modal behavior, simplifying the implementation of showing and
hiding the dialog.
- **Error Handling:** The dialog includes basic error handling by displaying
messages returned from the API, improving the user experience when things go
wrong.
- **Spinner for Feedback:** The `spinner-sk` component provides visual
feedback to the user while the API request is in progress.
This component serves as a bridge for users of the Perf application to leverage
Pinpoint's capabilities for generating detailed trace information, even as the
broader Pinpoint tooling evolves.
# Module: /modules/pivot-query-sk
The `pivot-query-sk` module provides a custom HTML element for users to
configure and interact with pivot table requests. Pivot tables are a powerful
data summarization tool, and this element allows users to define how data should
be grouped, what aggregate operations should be performed, and what summary
statistics should be displayed.
The core of the module is the `PivotQuerySk` class, which extends `ElementSk`.
This class manages the state of the pivot request and renders the UI for user
interaction. It leverages other custom elements like `multi-select-sk` and
`select-sk` to provide intuitive input controls.
**Key Design Choices and Implementation Details:**
- **Event-Driven Updates:** The element emits a custom event, `pivot-changed`,
whenever the user modifies any part of the pivot request. This allows
consuming applications to react to changes in real-time. The event detail
(`PivotQueryChangedEventDetail`) contains the updated `pivot.Request` object
or `null` if the current configuration is invalid. This decouples the UI
component from the application logic that processes the pivot request.
- **Data Binding and Rendering:** The `PivotQuerySk` element uses Lit's `html`
templating for rendering. It maintains internal state for the
`_pivotRequest` (the current pivot configuration) and `_paramset` (the
available options for grouping). When these properties are set or updated,
the `_render()` method is called to re-render the component, ensuring the UI
reflects the current state.
- **Handling Null Pivot Requests:** The `createDefaultPivotRequestIfNull()`
method ensures that if `_pivotRequest` is initially `null`, it's initialized
with a default valid structure before any user interaction attempts to
modify it. This prevents errors and provides a sensible starting point.
- **Dynamic Option Generation:** The options for "group by" and "summary" are
dynamically generated based on the provided `_paramset` and the existing
`_pivotRequest`. The `allGroupByOptions()` method is particularly noteworthy
as it ensures that even if the `_paramset` changes, any currently selected
`group_by` keys in the `_pivotRequest` are still displayed as options. This
prevents accidental data loss during `_paramset` updates. It achieves this
by concatenating keys from both sources, sorting, and then filtering out
duplicates.
- **Input Validation:** The `pivotRequest` getter includes a call to
`validatePivotRequest` (from `pivotutil`). This ensures that the component
only returns a valid `pivot.Request` object. If the current configuration is
invalid, it returns `null`. This promotes data integrity.
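A minimal consumer sketch of the `pivot-changed` contract described above,
assuming the module's exports and a sample `ParamSet` (both illustrative):

```ts
import { PivotQuerySk, PivotQueryChangedEventDetail } from './pivot-query-sk';

// Hypothetical consumer: provide the available keys, then react to user edits
// via the 'pivot-changed' event.
const el = document.querySelector<PivotQuerySk>('pivot-query-sk')!;

el.paramset = {
  config: ['8888', '565'],
  arch: ['x86', 'arm'],
};

el.addEventListener('pivot-changed', (e: Event) => {
  const req = (e as CustomEvent<PivotQueryChangedEventDetail>).detail;
  if (req === null) {
    // The current configuration is invalid; nothing to act on yet.
    return;
  }
  // e.g., kick off a new pivoted data fetch using req.group_by,
  // req.operation, and req.summary.
});
```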
**Responsibilities and Key Components:**
- **`pivot-query-sk.ts`**: This is the main file defining the `PivotQuerySk`
custom element.
- **`PivotQuerySk` class**:
- Manages the `pivot.Request` object, which defines the grouping,
operation, and summary statistics for a pivot table.
- Takes a `ParamSet` as input, which provides the available keys for the
"group by" selection. This `ParamSet` likely originates from the dataset
being analyzed.
- Renders UI controls (multi-selects and a select dropdown) for users to
specify:
- **Group By Keys**: Which parameters to use for grouping data rows
(e.g., 'config', 'os'). This uses `multi-select-sk`.
- **Operation**: The primary aggregate function to apply (e.g., 'avg',
'sum', 'count'). This uses a standard `select` element.
- **Summary Statistics**: Optional additional aggregate functions to
calculate for each group (e.g., 'stddev', 'percentile'). This also
uses `multi-select-sk`.
- Emits a `pivot-changed` custom event when the user modifies the pivot
request.
- **`PivotQueryChangedEventDetail` type**: Defines the structure of the
data passed in the `pivot-changed` event.
- **`PivotQueryChangedEventName` constant**: The string name of the custom
event.
- **Event Handlers (`groupByChanged`, `operationChanged`,
`summaryChanged`)**: These methods are triggered by user interactions
with the respective UI elements. They update the internal
`_pivotRequest` and then call `emitChangeEvent`.
- **`emitChangeEvent()`**: Constructs and dispatches the `pivot-changed`
event.
- **Property Getters/Setters (`pivotRequest`, `paramset`)**: Provide
controlled access to the element's core data, triggering re-renders when
set.
- **`pivot-query-sk.scss`**: Contains the styling for the `pivot-query-sk`
element. It ensures a consistent look and feel, leveraging styles from
`themes_sass_lib` and `select_sass_lib`. The layout is primarily flex-based
to arrange the different selection components.
- **`pivot-query-sk-demo.html` and `pivot-query-sk-demo.ts`**: These files
provide a demonstration page for the `pivot-query-sk` element.
- The HTML sets up a basic page structure and includes an instance of
`pivot-query-sk`.
- The TypeScript initializes the demo element with sample `pivot.Request`
data and a `ParamSet`. It also includes an event listener for
`pivot-changed` to display the selected pivot configuration as JSON,
illustrating how to consume the element's output.
**Workflow for User Interaction and Event Emission:**
1. **Initialization:**
- The `pivot-query-sk` element is created.
- The consuming application sets the `paramset` (available grouping keys)
and optionally an initial `pivotRequest`.
- The element renders its initial state based on these inputs.
2. **User Modifies a Selection (e.g., changes a "group by" option):**
- `multi-select-sk` (for "group by") emits a `selection-changed` event.
- `PivotQuerySk.groupByChanged()` is called.
- `createDefaultPivotRequestIfNull()` ensures `_pivotRequest` is not null.
- `_pivotRequest.group_by` is updated based on the new selection.
- `emitChangeEvent()` is called.
3. **Event Emission:**
- `emitChangeEvent()`:
- Retrieves the current `pivotRequest` (which might be `null` if
invalid).
- Creates a new `CustomEvent` named `pivot-changed`.
- The `detail` of the event is the current (potentially validated)
`pivotRequest`.
- The event is dispatched, bubbling up the DOM.
4. **Application Responds:**
- The consuming application, listening for `pivot-changed` events on the
`pivot-query-sk` element or one of its ancestors, receives the event.
- The application can then use the `event.detail` (the `pivot.Request`) to
update its data display, fetch new data, or perform other actions.
This flow can be visualized as:
```
User Interaction (e.g., click on multi-select)
|
v
Internal element event (e.g., @selection-changed from multi-select-sk)
|
v
PivotQuerySk Event Handler (e.g., groupByChanged)
|
v
Update internal _pivotRequest state
|
v
PivotQuerySk.emitChangeEvent()
|
v
Dispatch "pivot-changed" CustomEvent (with pivot.Request as detail)
|
v
Consuming Application's Event Listener
|
v
Application processes the new pivot.Request
```
# Module: /modules/pivot-table-sk
The `pivot-table-sk` module provides a custom HTML element, `<pivot-table-sk>`,
designed to display pivoted data in a tabular format. This element is
specifically for DataFrames that have been pivoted and contain summary values,
as opposed to summary traces (which would be displayed in a plot).
**Core Functionality and Design**
The primary purpose of `pivot-table-sk` is to present complex, multi-dimensional
data in an understandable and interactive table. The "why" behind its design is
to offer a user-friendly way to explore summarized data that arises from
pivoting operations.
Key design considerations include:
- **Data Input:** It takes a `DataFrame` (from
`//perf/modules/json:index_ts_lib`) and a `pivot.Request` (also from
`//perf/modules/json:index_ts_lib`) as input. The `pivot.Request` is crucial
as it dictates how the `DataFrame` was originally pivoted, including the
`group_by` keys, the main `operation`, and the `summary` operations.
- **Display:** The data is rendered as an HTML table. The table headers are
derived from the `group_by` keys and the `summary` operations.
- **Interactivity (Sorting):** Users can sort the table by clicking on column
headers. The sorting mechanism is designed to be intuitive, mimicking
spreadsheet behavior where subsequent sorts on different columns break ties
from previous sorts.
- **Query Context:** The element also displays the query parameters, the
"group by" keys, the primary operation, and the summary operations that led
to the current view of the data. This provides context to the user.
- **Validation:** It includes a mechanism to validate if the provided
`pivot.Request` is suitable for display as a pivot table (using
`validateAsPivotTable` from `//perf/modules/pivotutil:index_ts_lib`). This
prevents rendering errors or confusing displays if the input data structure
isn't appropriate.
**Key Components and Files**
- **`pivot-table-sk.ts`**: This is the heart of the module, defining the
`PivotTableSk` custom element.
- **`PivotTableSk` class:**
- Extends `ElementSk` (from `//infra-sk/modules/ElementSk:index_ts_lib`).
- Manages the input `DataFrame` (`df`), `pivot.Request` (`req`), and the
original `query` string.
- **`KeyValues` type and `keyValuesFromTraceSet` function:** This is a
critical internal data structure. `KeyValues` is an object where keys
are trace keys (e.g., `',arch=x86,config=8888,'`) and values are arrays
of strings. These string arrays represent the values of the parameters
specified in `req.group_by`, in the same order. For example, if
`req.group_by` is `['config', 'arch']`, then for the trace
`',arch=arm,config=8888,'`, the corresponding `KeyValues` entry would be
`['8888', 'arm']`. This transformation is performed by
`keyValuesFromTraceSet` and is essential for rendering the "key" columns
of the table and for sorting by these keys.
- **`SortSelection` class:** Represents the sorting state of a single
column. It stores:
- `column`: The index of the column.
- `kind`: Whether the column represents 'keyValues' (from `group_by`)
or 'summaryValues' (from `summary` operations).
- `dir`: The sort direction ('up' or 'down').
- It provides methods to `toggleDirection`, `buildCompare` (to create
a JavaScript sort comparison function based on its state), and
`encode`/`decode` for serialization.
- **`SortHistory` class:** Manages the overall sorting state of the table.
- It holds an array (`history`) of `SortSelection` objects.
- The "spreadsheet-like" multi-column sorting is achieved here. When a
user clicks a column to sort, that column's `SortSelection` is moved
to the _front_ of the `history` array, and its direction is toggled.
- `buildCompare` in `SortHistory` creates a composite comparison
function that iterates through the `SortSelection` objects in
`history`. The first `SortSelection` determines the primary sort
order. If it results in a tie, the second `SortSelection` is used to
break the tie, and so on. This creates the effect of a stable sort
across multiple user interactions without needing a true stable sort
algorithm for each click.
- It also provides `encode`/`decode` methods to serialize the entire
sort history (e.g., for persisting sort state in a URL).
- **`set()` method:** The primary way to provide data to the component. It
initializes `keyValues`, `sortHistory`, and the main `compare` function.
It can also accept an `encodedHistory` string to restore a previous sort
state.
- **Rendering Logic (Templates):** Uses `lit-html` for templating.
- `queryDefinition()`: Renders the contextual information about the
query and pivot operations.
- `tableHeader()`, `keyColumnHeaders()`, `summaryColumnHeaders()`:
Generate the table header row, including sort icons.
- `sortArrow()`: Dynamically displays the correct sort icon (up arrow,
down arrow, or neutral sort icon) based on the current
`SortHistory`.
- `tableRows()`, `keyRowValues()`, `summaryRowValues()`: Generate the
data rows of the table, applying the current sort order.
- `displayValue()`: Formats numerical values for display, converting a
special sentinel value (`MISSING_DATA_SENTINEL` from
`//perf/modules/const:const_ts_lib`) to '-'.
- **Event Emission:** Emits a `change` event when the user sorts the
table. The event detail (`PivotTableSkChangeEventDetail`) is the encoded
`SortHistory` string. This allows parent components to react to sort
changes and potentially persist the state.
- **Dependencies:**
- Relies on `paramset-sk` to display the query parameters.
- Uses various icon elements (`arrow-drop-down-icon-sk`,
`arrow-drop-up-icon-sk`, `sort-icon-sk`) for the sort indicators.
- `//perf/modules/json:index_ts_lib` for `DataFrame`, `TraceSet`,
`pivot.Request` types.
- `//perf/modules/pivotutil:index_ts_lib` for `operationDescriptions` and
`validateAsPivotTable`.
- `//perf/modules/paramtools:index_ts_lib` for `fromKey` (to parse trace
keys into parameter sets).
- `//infra-sk/modules:query_ts_lib` for `toParamSet` (to convert a query
string into a `ParamSet`).
- **`pivot-table-sk.scss`**: Provides the styling for the `pivot-table-sk`
element, including table borders, padding, text alignment, and cursor styles
for interactive elements. It leverages themes from
`//perf/modules/themes:themes_sass_lib`.
- **`index.ts`**: A simple entry point that imports and thereby registers the
`pivot-table-sk` custom element.
- **`pivot-table-sk-demo.html` & `pivot-table-sk-demo.ts`**:
- These files set up a demonstration page for the `pivot-table-sk`
element.
- `pivot-table-sk-demo.ts` creates sample `DataFrame` and `pivot.Request`
objects and uses them to populate instances of `pivot-table-sk` on the
demo page. This is crucial for development and visual testing. It
demonstrates valid use cases, cases with invalid pivot requests, and
cases with null DataFrames to ensure the component handles these
scenarios gracefully.
- **Test Files (`pivot-table-sk_test.ts`,
`pivot-table-sk_puppeteer_test.ts`)**:
- **`pivot-table-sk_test.ts` (Karma test):** Contains unit tests for the
`PivotTableSk` element and its internal logic, particularly the
`SortSelection` and `SortHistory` classes. It verifies:
- Correct initialization and rendering.
- The sorting behavior when column headers are clicked (e.g., sort
direction changes, correct sort icons appear, `change` event is emitted
with the correct encoded history).
- The `buildCompare` functions in `SortSelection` and `SortHistory`
produce the correct sorting results for various data types and sort
directions.
- The `encode` and `decode` methods for `SortSelection` and `SortHistory`
work correctly, allowing for round-tripping of sort state.
- The `keyValuesFromTraceSet` function correctly transforms `TraceSet`
data based on the `pivot.Request`.
- **`pivot-table-sk_puppeteer_test.ts` (Puppeteer test):** Performs
end-to-end tests by loading the demo page in a headless browser.
- It checks if the elements render correctly on the page (smoke test).
- It takes screenshots of the rendered component for visual regression
testing.
**Workflow Example: User Sorting the Table**
1. **Initial State:**
- The `pivot-table-sk` element is initialized with a `DataFrame`, a
`pivot.Request`, and an optional initial `encodedHistory` string.
- `pivot-table-sk` creates a `SortHistory` object. If `encodedHistory` is
provided, `SortHistory.decode()` is called. Otherwise, a default sort
order is established (usually based on the order of summary columns,
then key columns, all initially 'up').
- `SortHistory.buildCompare()` generates the initial comparison function.
- The table is rendered, sorted according to this initial comparison
function. Each column header shows a default `sort-icon-sk`.
2. **User Clicks a Column Header (e.g., "config" key column):**
- `changeSort(columnIndex, 'keyValues')` is called within
`pivot-table-sk`.
- `this.sortHistory.selectColumnToSortOn(columnIndex, 'keyValues')` is
invoked:
- The `SortSelection` for the "config" column is found in
`this.sortHistory.history`.
- It's removed from its current position.
- Its `direction` is toggled (e.g., from 'up' to 'down').
- This updated `SortSelection` is prepended to
      `this.sortHistory.history`. Before:
      `[SummaryCol0(up), SummaryCol1(up), KeyCol0(config, up), KeyCol1(arch, up)]`;
      after clicking on KeyCol0 (config):
      `[KeyCol0(config, down), SummaryCol0(up), SummaryCol1(up), KeyCol1(arch, up)]`.
    - `this.compare = this.sortHistory.buildCompare(...)` is called. A new
      composite comparison function is generated. Now, rows will primarily be
      sorted by "config" (descending). Ties will be broken by "SummaryCol0"
      (ascending), then "SummaryCol1" (ascending), and finally "KeyCol1"
      (ascending).
    - A `CustomEvent('change')` is dispatched. The `event.detail` contains
      `this.sortHistory.encode()`, which is a string representation of the new
      sort order (e.g., "dk0-su0-su1-ku1").
    - `this._render()` is called, re-rendering the table with the new sort
      order. The "config" column header now shows an
      `arrow-drop-down-icon-sk`.
3. **User Clicks Another Column Header (e.g., "avg" summary column):**
- The process repeats. The `SortSelection` for the "avg" column is moved
to the front of `this.sortHistory.history` and its direction is toggled.
      Before:
      `[KeyCol0(config, down), SummaryCol0(avg, up), SummaryCol1(sum, up), KeyCol1(arch, up)]`;
      after clicking on SummaryCol0 (avg):
      `[SummaryCol0(avg, down), KeyCol0(config, down), SummaryCol1(sum, up), KeyCol1(arch, up)]`.
    - The table is re-rendered, now primarily sorted by "avg" (descending),
with ties broken by "config" (descending), then "sum" (ascending), then
"arch" (ascending).
This multi-level sorting, driven by the `SortHistory` maintaining the sequence
of user sort actions, is a key aspect of the "how" behind the `pivot-table-sk`'s
user experience. It aims to provide a powerful yet familiar way to analyze
pivoted data.
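The tie-breaking behavior of `SortHistory.buildCompare` follows a generic
composite-comparator pattern. A small sketch of that pattern (illustrative
only, not the module's actual code):

```ts
// Generic illustration of the composite comparator pattern: the first
// comparator decides the order, later ones only break ties.
type Compare<T> = (a: T, b: T) => number;

function buildCompositeCompare<T>(history: Compare<T>[]): Compare<T> {
  return (a: T, b: T): number => {
    for (const cmp of history) {
      const result = cmp(a, b);
      if (result !== 0) {
        return result; // This comparator decided the order.
      }
    }
    return 0; // Equal under every comparator in the history.
  };
}

// Usage: the most recently clicked column's comparator goes to the front.
interface Row {
  config: string;
  avg: number;
}
const byConfigDesc: Compare<Row> = (a, b) => b.config.localeCompare(a.config);
const byAvgAsc: Compare<Row> = (a, b) => a.avg - b.avg;

const rows: Row[] = [
  { config: '8888', avg: 2 },
  { config: '565', avg: 1 },
  { config: '8888', avg: 1 },
];
rows.sort(buildCompositeCompare([byConfigDesc, byAvgAsc]));
// Sorted by config (descending); ties broken by avg (ascending).
```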
# Module: /modules/pivotutil
The `pivotutil` module provides utility functions and constants for working with
pivot table requests. Its primary purpose is to ensure the validity and
integrity of pivot requests before they are processed, and to offer
human-readable descriptions for pivot operations. This centralization of
pivot-related logic helps maintain consistency and simplifies the handling of
pivot table configurations across different parts of the application.
### Key Components and Responsibilities
**`index.ts`**: This is the core file of the module and contains all the
exported functionalities.
- **`operationDescriptions`**:
- **Why**: Pivot operations are often represented by short, cryptic
identifiers (e.g., `avg`, `std`). To improve user experience and make
UIs more understandable, a mapping to human-readable names is necessary.
- **How**: This is a simple JavaScript object (dictionary) where keys are
the `pivot.Operation` enum values (imported from `../json`) and values
are their corresponding descriptive strings (e.g., "Mean", "Standard
Deviation"). This allows for easy lookup and display of operation names.
- **`validatePivotRequest(req: pivot.Request | null): string`**:
- **Why**: Before attempting to process a pivot request, it's crucial to
ensure that the request is structurally sound and contains the minimally
required information. This prevents runtime errors and provides early
feedback to the user or calling code if the request is malformed.
- **How**: This function performs basic validation checks on a
`pivot.Request` object.
* It first checks if the request itself is `null`. If so, it returns an
error message.
* It then verifies that the `group_by` property is present and is an array
with at least one element. A pivot table fundamentally relies on
grouping data, so this is a mandatory field.
* If all checks pass, it returns an empty string, indicating a valid
request. Otherwise, it returns a string describing the specific
validation error.
  - **Workflow**:
    - Input: `pivot.Request | null`.
    - Is the request null? --(Yes)--> Return "Pivot request is null."
    - (No) Is `req.group_by` null or empty? --(Yes)--> Return "Pivot must
      have at least one GroupBy."
    - (No) Return `""` (valid).
- **`validateAsPivotTable(req: pivot.Request | null): string`**:
- **Why**: Some contexts specifically require a pivot _table_ that
displays summary values, not just a pivot _plot_ which might only group
traces without performing summary calculations. This function enforces
the presence of summary operations.
- **How**:
* It first calls `validatePivotRequest` to ensure the basic structure of
the request is valid. If `validatePivotRequest` returns an error, that
error is immediately returned.
* If the basic validation passes, it then checks if the `summary` property
of the request is present and is an array with at least one element.
Summary operations (like sum, average, etc.) are essential for
generating the aggregated values displayed in a pivot table. Without
them, the request might be valid for plotting individual traces grouped
by some criteria, but not for a typical pivot table with summarized
data.
* If the `summary` array is missing or empty, an error message is
returned. Otherwise, an empty string is returned.
  - **Workflow**:
    - Input: `pivot.Request | null`.
    - Call `validatePivotRequest(req)` --> `invalidMsg`.
    - Is `invalidMsg` non-empty? --(Yes)--> Return `invalidMsg`.
    - (No) Is `req.summary` null or empty? --(Yes)--> Return "Must have at
      least one Summary operation."
    - (No) Return `""` (valid for a pivot table).
**`index_test.ts`**: This file contains unit tests for the functions in
`index.ts`.
- **Why**: To ensure the validation logic correctly identifies valid and
invalid pivot requests under various conditions. This maintains the
reliability of the `pivotutil` module.
- **How**: It uses the `chai` assertion library to define test cases.
- For `validatePivotRequest`, it tests scenarios like:
- `null` request.
- `group_by` being `null`.
- `group_by` being an empty array.
- A completely valid request.
  - For `validateAsPivotTable`, it builds upon the `validatePivotRequest`
    checks and adds tests for:
    - `summary` being `null`.
    - `summary` being an empty array.
    - A valid request with at least one summary operation.
  - Each test asserts whether the validation functions return an empty
    string (for valid inputs) or a non-empty error message string (for
    invalid inputs) as expected.
The design decision to separate `validatePivotRequest` and
`validateAsPivotTable` allows for more granular validation. Some parts of an
application might only need the basic validation (e.g., ensuring data can be
grouped), while others specifically require summary operations for display in a
tabular format. This separation provides flexibility. The use of descriptive
error messages aids in debugging and user feedback.
# Module: /modules/plot-google-chart-sk
The `plot-google-chart-sk` module provides a custom element for rendering
interactive time-series charts using Google Charts. It is designed to display
performance data, including anomalies and user-reported issues, and allows users
to interact with the chart through panning, zooming, and selecting data points.
**Key Responsibilities:**
- **Data Visualization:** Renders line charts based on `DataTable` objects,
which are consumed from a Lit context (`dataTableContext`). This `DataTable`
typically contains time-series data where the first column is a commit
identifier (e.g., revision number or timestamp), the second is a date
object, and subsequent columns represent different data traces.
- **Interactivity:**
- **Panning:** Allows users to pan the chart horizontally by clicking and
dragging.
- **Zooming:** Supports both horizontal and vertical zooming. Users can
Ctrl-click and drag to select a region to zoom into. A reset button
allows returning to the original view.
- **Delta Calculation:** Enables users to Shift-click and drag vertically
to measure the difference (both raw and percentage) between two Y-axis
values.
- **Tooltip Display:** Shows detailed information about a data point when
the user hovers over it.
- **Data Point Selection:** Allows users to click on a data point to
select it, which can trigger other actions in the application.
- **Anomaly and Issue Display:** Overlays icons on the chart to indicate
anomalies (regressions, improvements, untriaged, ignored) and user-reported
issues at specific data points. These are also consumed from Lit contexts
(`dataframeAnomalyContext` and `dataframeUserIssueContext`).
- **Legend and Trace Management:** Includes a side panel (`side-panel-sk`)
that displays a legend for the plotted traces. Users can toggle the
visibility of individual traces using checkboxes in the side panel.
- **Dynamic Updates:** Responds to changes in data, selected ranges, and other
properties by redrawing or updating the chart view.
**Design Decisions and Implementation Choices:**
- **Google Charts Integration:** Leverages the
`@google-web-components/google-chart` library for the core charting
functionality. This provides a robust and feature-rich charting engine.
- **LitElement and Context API:** Built as a LitElement custom element, making
it easy to integrate into modern web applications. It utilizes Lit's Context
API to consume shared data like the `DataTable`, anomaly information, and
loading states from parent components or a centralized data store. This
promotes a decoupled architecture.
- **Modular Sub-components:**
- `v-resizable-box-sk`: A dedicated component for the vertical selection
box used in the "deltaY" mode. It calculates and displays the difference
between the start and end points of the drag.
- `drag-to-zoom-box-sk`: Handles the visual representation of the
selection box during the drag-to-zoom interaction. It manages the
display and dimensions of the box as the user drags.
- `side-panel-sk`: Encapsulates the legend and trace visibility controls.
This separation of concerns keeps the main chart component focused on
plotting.
- **Event-Driven Communication:** Emits custom events (e.g.,
`selection-changed`, `plot-data-mouseover`, `plot-data-select`) to notify
parent components of user interactions and chart state changes. This allows
for integration with other parts of an application.
- **Overlay for Anomalies and Issues:** Anomalies and user issues are rendered
as absolutely positioned `md-icon` elements on top of the chart. Their
positions are calculated based on the chart's layout and the data point
coordinates. This approach avoids modifying the Google Chart's internal
rendering and allows for more flexible styling and interaction with these
markers.
- **Caching and Performance:**
- Caches the Google Chart object (`this.chart`) and chart layout
information (`this.cachedChartArea`) to avoid redundant lookups.
- Maintains a `removedLabelsCache` to efficiently hide and show traces
without reconstructing the entire `DataView` each time.
- **Separate Interaction Modes:** The `navigationMode` property (`pan`,
`deltaY`, `dragToZoom`) manages the current mouse interaction state. This
simplifies event handling by directing mouse events to the appropriate logic
based on the active mode.
- **Dynamic Y-Axis Title:** The `determineYAxisTitle` method attempts to
create a meaningful Y-axis title by examining the `unit` and
`improvement_direction` parameters from the trace names. It displays these
only if they are consistent across all visible traces.
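A hypothetical parent component wiring for the custom events mentioned under
"Event-Driven Communication" above (event detail shapes are assumptions here;
consult the element's TypeScript definitions for the real payloads):

```ts
// Hypothetical parent code: react to the custom events listed above.
const chart = document.querySelector('plot-google-chart-sk')!;

chart.addEventListener('selection-changed', (e: Event) => {
  const range = (e as CustomEvent).detail;
  // e.g., persist the visible commit range in the URL.
  console.log('visible range changed', range);
});

chart.addEventListener('plot-data-select', (e: Event) => {
  const point = (e as CustomEvent).detail;
  // e.g., open a details panel for the selected data point.
  console.log('data point selected', point);
});
```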
**Key Components/Files:**
- **`plot-google-chart-sk.ts`:** The core component that orchestrates the
chart display and interactions.
- Manages the Google Chart instance.
- Handles mouse events for panning, zooming, delta calculation, and data
point interactions.
- Consumes data (`DataTable`, `AnomalyMap`, `UserIssueMap`) via Lit
context.
- Renders anomaly and user issue icons as overlays.
- Communicates with `side-panel-sk` to manage trace visibility.
- Dispatches custom events for user interactions.
- **`side-panel-sk.ts`:** Implements the side panel containing the legend and
checkboxes for toggling trace visibility.
- Generates legend entries based on the `DataTable`.
- Manages the checked state of traces and communicates changes to
`plot-google-chart-sk`.
- Can display the calculated delta values from the `v-resizable-box-sk`.
- **`v-resizable-box-sk.ts`:** A custom element for the vertical resizable
selection box used during the delta calculation (Shift-click + drag).
- Displays the selection box and calculates the raw and percentage
difference between the Y-values at the start and end of the drag.
- **`drag-to-zoom-box-sk.ts`:** A custom element for the selection box used
during the drag-to-zoom interaction (Ctrl-click + drag).
- Draws a semi-transparent rectangle indicating the area to be zoomed.
- **`plot-google-chart-sk-demo.ts` and `plot-google-chart-sk-demo.html`:**
Provide a demonstration page showcasing the `plot-google-chart-sk` element
with sample data. This is crucial for development and testing.
- **`index.ts`:** Serves as the entry point for the module, importing and
registering all the custom elements defined within.
**Key Workflows:**
1. **Initial Chart Rendering:**
   - `DataTable` (from context) -> `plot-google-chart-sk` -> `updateDataView()`.
   - `updateDataView()` creates a `google.visualization.DataView` and sets
     columns based on `domain` (commit/date) and the visible traces.
   - `updateOptions()` configures chart appearance (colors, axes, view window).
   - `plotElement.value.view = view` and `plotElement.value.options = options`
     -> Google Chart renders.
   - `onChartReady()`: caches the chart object and calls `drawAnomaly()`,
     `drawUserIssues()`, `drawXbar()`.
2. **Panning:**
   - User mousedown (not Shift or Ctrl) -> `onChartMouseDown()`:
     `navigationMode = 'pan'`.
   - User mousemove -> `onWindowMouseMove()`: calculates deltaX based on mouse
     movement and the current domain, updates `this.selectedRange`, calls
     `updateOptions()` to update the chart's horizontal view window, and
     dispatches the `selection-changing` event.
   - User mouseup -> `onWindowMouseUp()`: dispatches the `selection-changed`
     event and sets `navigationMode = null`.
3. **Drag-to-Zoom:**
   - User Ctrl + mousedown -> `onChartMouseDown()`:
     `navigationMode = 'dragToZoom'` -> `zoomRangeBox.value.initializeShow()`
     displays the drag box.
   - User mousemove -> `onWindowMouseMove()`: `zoomRangeBox.value.handleDrag()`
     updates the drag box dimensions.
   - User mouseup -> `onChartMouseUp()`: calculates zoom boundaries based on
     the drag box and `isHorizontalZoom`, calls `zoomRangeBox.value.hide()`,
     sets `showResetButton = true`, and `updateBounds()` updates the chart's
     `hAxis.viewWindow` or `vAxis.viewWindow`. Finally, `navigationMode = null`.
4. **Delta Calculation (Shift-Click):**
   - User Shift + mousedown -> `onChartMouseDown()`:
     `navigationMode = 'deltaY'` -> `deltaRangeBox.value.show()` displays the
     vertical resizable box.
   - User mousemove -> `onWindowMouseMove()`:
     `deltaRangeBox.value.updateSelection()` updates the box height and
     calculates the delta, then updates `sidePanel.value` with the delta
     values.
   - User Shift + mousedown (again) or regular mousedown ->
     `onChartMouseDown()`: toggles `deltaRangeOn`; if finishing,
     `sidePanel.value.showDelta = true`.
   - User mouseup (after dragging) -> `onChartMouseUp()`: updates
     `sidePanel.value` with the final delta values and sets
     `navigationMode = null`.
5. **Toggling Trace Visibility:**
   - User clicks a checkbox in `side-panel-sk` -> `side-panel-sk` dispatches
     `side-panel-selected-trace-change`.
   - `plot-google-chart-sk` listens (`sidePanelCheckboxUpdate()`): updates
     `this.removedLabelsCache` and calls `updateDataView()`, which recreates
     the `DataView`, hiding/showing columns based on `removedLabelsCache`, and
     updates the chart.
6. **Anomaly/Issue Display:**
   - `anomalyMap` or `userIssues` (from context) changes ->
     `plot-google-chart-sk.willUpdate()` -> `plotElement.value.redraw()` (if
     the chart is already rendered).
   - The chart redraw triggers `onChartReady()` -> `drawAnomaly()` /
     `drawUserIssues()`: iterates through anomalies/issues for visible traces,
     calculates screen coordinates (x, y) using
     `chart.getChartLayoutInterface().getXLocation()` and `getYLocation()`,
     clones template `md-icon` elements from slots, and positions the icons
     absolutely within `anomalyDiv` or `userIssueDiv`.
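The overlay positioning in workflow 6 relies on the Google Charts layout
interface. A simplified sketch of the idea (the real `drawAnomaly()` clones
slotted `md-icon` templates and handles many more cases):

```ts
// Simplified sketch: convert a data point into pixel coordinates via the
// chart layout interface, then absolutely position an icon in the overlay
// container. `chart` is a rendered google.visualization corechart instance,
// typed loosely here for illustration.
function placeOverlayIcon(
  chart: any,
  container: HTMLElement, // e.g. the anomalyDiv overlay container
  xValue: number, // value on the horizontal axis (commit position)
  yValue: number, // value of the trace at that point
  icon: HTMLElement
): void {
  const layout = chart.getChartLayoutInterface();
  icon.style.position = 'absolute';
  icon.style.left = `${layout.getXLocation(xValue)}px`;
  icon.style.top = `${layout.getYLocation(yValue)}px`;
  container.appendChild(icon);
}
```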
This detailed explanation should provide a solid understanding of the
`plot-google-chart-sk` module's purpose, architecture, and key functionalities.
# Module: /modules/plot-simple-sk
The `plot-simple-sk` module provides a custom HTML element for rendering 2D line
graphs. It's designed to be interactive, allowing users to zoom, inspect
individual data points, and highlight specific traces.
**Core Functionality and Design:**
The primary goal of `plot-simple-sk` is to display time-series data or any data
that can be represented as a set of (x, y) coordinates. Key design
considerations include:
1. **Performance:** To handle potentially large datasets and maintain a smooth
user experience, the element employs several optimization techniques:
- **Dual Canvases:** It uses two `<canvas>` elements stacked on top of
each other.
- The bottom canvas (`traces`) is for drawing the static parts of the
plot: the lines, axes, and dots representing data points. These are
pre-rendered into `Path2D` objects for efficient redrawing.
- The top canvas (`overlay`) is for dynamic elements that change
frequently, such as crosshairs, zoom selection rectangles, and hover
highlights. This separation prevents unnecessary redrawing of the
entire plot.
- **`Path2D` Objects:** Trace lines and data point dots are converted into
`Path2D` objects. This allows the browser to optimize their rendering,
leading to faster redraws compared to repeatedly issuing drawing
commands.
- **k-d Tree for Point Proximity:** For features like displaying
information on mouse hover or selecting the nearest data point on click,
a k-d tree (`kd.ts`) is used. This data structure allows for efficient
searching of the closest point in a 2D space, crucial for interactivity
with potentially many data points.
- **Debounced Redraws and Calculations:** Operations like rebuilding the
k-d tree (`recalcSearchTask`) or redrawing after a zoom (`zoomTask`) are
often scheduled using `window.setTimeout`. This prevents these
potentially expensive operations from blocking the main thread and
ensures they only happen when necessary, improving responsiveness.
`requestAnimationFrame` is used for mouse movement updates to
synchronize with the browser's repaint cycle.
2. **Interactivity:**
- **Zooming:**
- **Summary and Detail Views:** The plot can optionally display a
"summary" area above the main "detail" area. The summary shows an
overview of all data, and users can drag a region on the summary to
zoom the detail view to that specific x-axis range.
- **Detail View Zoom:** Users can also drag a rectangle directly on
the detail view to zoom into a specific x and y range.
- **Zoom Stack:** The element maintains a stack of zoom levels
(`detailsZoomRangesStack`), allowing users to progressively zoom in
and potentially (though not explicitly stated as a current feature
for _out_) navigate back through zoom levels.
- **Hover and Selection:**
- Moving the mouse near a trace highlights the closest data point and
emits a `trace_focused` event.
- Clicking on a trace selects the closest data point and emits a
`trace_selected` event.
- **Crosshairs:** When the shift key is held, crosshairs are displayed,
indicating the mouse's current x and y position on the plot.
- **Highlighting Traces:** Specific traces can be programmatically
highlighted, making them stand out.
- **X-Bar and Bands:** Vertical lines (`xbar`) or regions (`bands`) can be
drawn on the plot to mark specific x-axis values or ranges.
- **Anomalies and User Issues:** The plot can display markers for
anomalies (regressions, improvements) and user-reported issues at
specific data points.
3. **Appearance and Theming:**
- **Responsive Sizing:** The plot adapts to the `width` and `height`
attributes of the custom element and uses `ResizeObserver` to redraw
when its dimensions change.
- **Device Pixel Ratio:** It accounts for `window.devicePixelRatio` to
render crisply on high-DPI displays by drawing to a larger canvas and
then scaling it down with CSS transforms.
- **CSS Variables for Theming:** The element is designed to integrate with
`elements-sk/themes` and uses CSS variables for colors (e.g.,
`--on-background`, `--success`, `--failure`), allowing its appearance to
be customized by the surrounding application's theme. It listens for
`theme-chooser-toggle` events to redraw when the theme changes.
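The dual-canvas / `Path2D` technique from point 1 can be summarized with a
short sketch (canvas ids and data are illustrative, not the element's
internals):

```ts
// Illustrative sketch of the dual-canvas approach: traces are pre-rendered
// into Path2D objects and drawn on the bottom canvas; the overlay canvas is
// cleared and redrawn cheaply on every mouse move.
const tracesCtx = (document.querySelector('#traces') as HTMLCanvasElement).getContext('2d')!;
const overlayCtx = (document.querySelector('#overlay') as HTMLCanvasElement).getContext('2d')!;

// Build a Path2D once per trace; the browser can replay it efficiently.
function buildLinePath(points: { x: number; y: number }[]): Path2D {
  const path = new Path2D();
  points.forEach((p, i) => (i === 0 ? path.moveTo(p.x, p.y) : path.lineTo(p.x, p.y)));
  return path;
}

const linePath = buildLinePath([
  { x: 0, y: 10 },
  { x: 50, y: 40 },
  { x: 100, y: 20 },
]);
tracesCtx.stroke(linePath); // Static layer: only redrawn when data or zoom changes.

function drawCrosshair(x: number, y: number): void {
  overlayCtx.clearRect(0, 0, overlayCtx.canvas.width, overlayCtx.canvas.height);
  overlayCtx.beginPath();
  overlayCtx.moveTo(x, 0);
  overlayCtx.lineTo(x, overlayCtx.canvas.height);
  overlayCtx.moveTo(0, y);
  overlayCtx.lineTo(overlayCtx.canvas.width, y);
  overlayCtx.stroke(); // Dynamic layer: redrawn on each mouse move.
}
```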
**Key Files and Responsibilities:**
- **`plot-simple-sk.ts`:** This is the heart of the module, defining the
`PlotSimpleSk` custom element.
- **Rendering Logic:** Contains all the drawing code for the traces, axes,
labels, summary view, detail view, crosshairs, zoom indicators,
anomalies, etc. It manages the two canvas contexts (`ctx` for traces,
`overlayCtx` for overlays).
- **State Management:** Manages the internal state, including the
`lineData` (traces and their pre-rendered paths), `labels` (x-axis tick
information), current `_zoom` state, `detailsZoomRangesStack` for detail
view zooms, `hoverPt`, `crosshair`, `highlighted` traces, `_xbar`,
`_bands`, and `_anomalyDataMap`.
- **Event Handling:** Sets up event listeners for mouse interactions
(move, down, up, leave, click) to handle zooming, hovering, and
selection. It also listens for `theme-chooser-toggle` and
`ResizeObserver` events.
- **API Methods:** Exposes methods like `addLines`, `deleteLines`,
`removeAll`, and properties like `highlight`, `xbar`, `bands`, `zoom`,
`anomalyDataMap`, `userIssueMap`, and `dots` to control the plot's
content and appearance.
- **Coordinate Transformations:** Uses `d3-scale` (specifically
`scaleLinear`) to map data coordinates (domain) to canvas pixel
coordinates (range) and vice-versa. Functions like `rectFromRange` and
`rectFromRangeInvert` handle these transformations for rectangular
regions.
- **Path and Search Builders:**
- `PathBuilder`: A helper class to construct `Path2D` objects for trace
lines and dots based on the current scales and data.
- `SearchBuilder`: A helper class to prepare the data points for the
`KDTree` by converting source coordinates to canvas coordinates.
- **Drawing Areas:** Defines `SummaryArea` and `DetailArea` interfaces and
manages their respective rectangles, axes, and scaling ranges.
- **`kd.ts`:** Implements a k-d tree.
- **Purpose:** Provides an efficient way (`O(log n)` on average for
search) to find the nearest data point to a given mouse coordinate on
the canvas. This is crucial for interactivity like mouse hovering and
clicking to identify specific points on traces.
- **Implementation:** It's a trimmed-down version of an existing k-d tree
library, specifically tailored for finding the single closest 2D point.
It takes an array of points (each with `x` and `y` properties), a
distance metric function, and the dimensions to consider (`['x', 'y']`).
The `nearest()` method is the primary interface used by
`plot-simple-sk.ts`.
- **`ticks.ts`:** Responsible for generating appropriate tick marks and labels
for the time-based x-axis.
- **Purpose:** Given an array of `Date` objects representing the x-axis
values, it determines a sensible set of tick positions and their
corresponding formatted string labels (e.g., "Jul", "Mon, 8 AM", "10:30
AM").
- **Logic:** It considers the total duration spanned by the dates and
selects an appropriate time granularity (e.g., months, days, hours,
minutes) for the labels using `Intl.DateTimeFormat`. It aims for a
reasonable number of ticks (`MIN_TICKS` to `MAX_TICKS`) and uses a
`fixTicksLength` function to thin out the ticks if too many are
generated.
- **Output:** The `ticks()` function returns an array of objects, each
with an `x` (index in the original data) and a `text` (formatted label).
- **`plot-simple-sk.scss`:** Contains the SASS/CSS styles for the
`plot-simple-sk` element.
- **Layout:** Defines the positioning of the canvas elements (absolute
positioning for the overlay on top of the trace canvas).
- **Theming Integration:** Imports `themes.scss` and uses CSS variables
(e.g., `var(--on-background)`, `var(--background)`) to ensure the plot's
colors match the application's theme.
- **`index.ts`:** A simple entry point that imports `plot-simple-sk.ts` to
ensure the custom element is defined and registered with the browser.
- **Demo Files (`plot-simple-sk-demo.html`, `plot-simple-sk-demo.ts`,
`plot-simple-sk-demo.scss`):**
- Provide a live demonstration of the `plot-simple-sk` element's
capabilities.
- The HTML sets up the plot elements and buttons to trigger various
actions.
- The TypeScript file (`plot-simple-sk-demo.ts`) contains the logic to
interact with the plot, such as adding random trace data, highlighting
traces, zooming, clearing the plot, and displaying anomaly markers. It
also logs events emitted by the plot.
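To illustrate the kind of label formatting `ticks.ts` performs, here is a
simplified sketch: it picks a formatter from the date span and thins the
ticks, whereas the real implementation supports more granularities and the
`MIN_TICKS`/`MAX_TICKS` limits.

```ts
// Simplified illustration of time-axis tick generation: choose a formatter
// based on the span of the dates, then label every Nth point.
function simpleTicks(dates: Date[]): { x: number; text: string }[] {
  const spanMs = dates[dates.length - 1].getTime() - dates[0].getTime();
  const oneDayMs = 24 * 60 * 60 * 1000;
  const fmt =
    spanMs > 30 * oneDayMs
      ? new Intl.DateTimeFormat('en-US', { month: 'short' }) // e.g. "Jul"
      : spanMs > oneDayMs
        ? new Intl.DateTimeFormat('en-US', { weekday: 'short', hour: 'numeric' }) // e.g. "Mon, 8 AM"
        : new Intl.DateTimeFormat('en-US', { hour: 'numeric', minute: 'numeric' }); // e.g. "10:30 AM"

  const step = Math.max(1, Math.floor(dates.length / 10)); // aim for ~10 ticks
  return dates
    .map((d, x) => ({ x, text: fmt.format(d) }))
    .filter((_, i) => i % step === 0);
}
```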
**Key Workflows:**
1. **Initialization and Rendering:**
   - `ElementSk constructor` -> `connectedCallback` -> `render`.
   - `render` -> `_render` (lit-html template instantiation) ->
     `canvas.getContext` -> `updateScaledMeasurements` -> `updateScaleRanges`
     -> `recalcDetailPaths` -> `recalcSummaryPaths` -> `drawTracesCanvas`.
2. **Adding Data (`addLines`):**
   - `addLines` -> Convert `MISSING_DATA_SENTINEL` to `NaN` -> Store in
     `this.lineData` -> `updateScaleDomains` -> `recalcSummaryPaths` ->
     `recalcDetailPaths` -> `drawTracesCanvas`.
   - `recalcDetailPaths` / `recalcSummaryPaths` -> For each line:
     `PathBuilder` creates `linePath` and `dotsPath`.
   - `recalcDetailPaths` -> `recalcSearch` (schedules `recalcSearchImpl`).
   - `recalcSearchImpl` -> `SearchBuilder` populates points -> `new KDTree`.
3. **Mouse Hover and Focus:**
   - `mousemove` event -> `this.mouseMoveRaw` updated.
   - `raf` loop -> checks `this.mouseMoveRaw` -> `eventToCanvasPt` -> If
     `this.pointSearch`: `this.pointSearch.nearest(pt)` -> updates
     `this.hoverPt` -> dispatches `trace_focused` event -> updates
     `this.crosshair` (based on shift key and `hoverPt`) ->
     `drawOverlayCanvas`.
4. **Zooming via Summary Drag:**
   - `mousedown` on summary -> `this.inZoomDrag = 'summary'` ->
     `this.zoomBegin` set.
   - `mousemove` (while dragging) -> `raf` loop: `eventToCanvasPt` ->
     `clampToRect` (summary area) -> `this.summaryArea.range.x.invert(pt.x)`
     to get source x -> `this.zoom = [min_x, max_x]` (triggers `_zoomImpl` via
     setter task).
   - `_zoomImpl` (after timeout) -> `updateScaleDomains` ->
     `recalcDetailPaths` -> `drawTracesCanvas`.
   - `mouseup` / `mouseleave` -> dispatches `zoom` event ->
     `this.inZoomDrag = 'no-zoom'`.
5. **Zooming via Detail Area Drag:**
   - `mousedown` on detail -> `this.inZoomDrag = 'details'` ->
     `this.zoomRect` initialized.
   - `mousemove` (while dragging) -> `raf` loop: `eventToCanvasPt` ->
     `clampToRect` (detail area) -> updates `this.zoomRect.width/height` ->
     `drawOverlayCanvas` (to show the dragging rectangle).
   - `mouseup` / `mouseleave` -> `dispatchZoomEvent` -> `doDetailsZoom`.
   - `doDetailsZoom` -> If the zoom box is large enough:
     `this.detailsZoomRangesStack.push(rectFromRangeInvert(...))` ->
     `_zoomImpl`.
6. **Drawing Process:**
- `drawTracesCanvas()`:
1. Clears the appropriate part of the main trace canvas (`this.ctx`).
2. Draws detail area:
- Saves context, clips to detail rect.
- Calls `drawXAxis` (for detail).
- Iterates `this.lineData`: draws `line.detail.linePath` and
`line.detail.dotsPath` if `this.dots` is true.
- Restores context.
- Calls `drawXAxis` again (to draw labels outside the clipped
region).
3. If `this.summary` and not dragging zoom:
- Draws summary area similarly.
4. Calls `drawYAxis` (for detail).
5. Calls `drawOverlayCanvas()`.
- `drawOverlayCanvas()`:
1. Clears the entire overlay canvas (`this.overlayCtx`).
2. If `this.summary`:
- Saves context, clips to summary rect.
- Calls `drawXBar`, `drawBands`.
- Draws detail zoom indicator box if `detailsZoomRangesStack` is
not empty.
- Draws summary zoom bars and shaded regions based on
`this._zoom`.
- Restores context.
3. Clips to detail rect:
- Calls `drawXBar`, `drawBands`.
- Draws highlighted lines.
- Draws hovered line/dots.
- Calls `drawUserIssues`, `drawAnomalies`.
- If dragging zoom in detail: draws `this.zoomRect` (dashed).
- If not dragging: draws crosshairs and hover label.
- Restores context.
This structured approach allows `plot-simple-sk` to be both feature-rich and
performant for visualizing and interacting with 2D data plots.
# Module: /modules/plot-summary-sk
The `plot-summary-sk` module provides a custom HTML element,
`<plot-summary-sk>`, designed to display a summary plot of performance data and
allow users to select a range within that plot. This is particularly useful for
visualizing trends over time or commit ranges and enabling interactive
exploration of the data.
At its core, `plot-summary-sk` leverages the Google Charts library to render an
area chart. It's designed to work with a `DataFrame`, a data structure commonly
used in Perf for holding timeseries data. The element can display data based on
either commit offsets or timestamps (`domain` attribute).
Key Responsibilities:
- **Data Visualization**: Renders an area chart representing performance data
over a specified domain (commit or date).
- **Range Selection**: Allows users to interactively select a range on the
plot. This selection can be initiated by dragging on the chart or by
programmatically setting the selection.
- **Event Emission**: Emits a `summary_selected` custom event when the user
makes or changes a selection. This event carries details about the selected
range (start, end, value, and domain).
- **Dynamic Data Loading**: Optionally, it can display controls to load more
data in either direction (earlier or later), integrating with a
`DataFrameRepository` to fetch and append new data.
- **Theming**: Adapts to theme changes (e.g., dark mode) by redrawing the
chart with appropriate styles.
- **Responsiveness**: The chart redraws itself when its container is resized,
ensuring it remains visually correct.
Key Components/Files:
- **`plot-summary-sk.ts`**: This is the main file defining the `PlotSummarySk`
LitElement.
- **Why**: It encapsulates the logic for chart rendering, user
interaction, data handling, and event emission.
- **How**:
- It consumes `DataFrame` data (from `dataTableContext`) and renders it
using `<google-chart>`.
- It manages the display of single or all traces based on the
`selectedTrace` property.
- It uses an internal `h-resizable-box-sk` element to provide the visual
selection rectangle and handles the mouse events for drawing and
resizing this selection.
- It translates between the visual coordinates of the selection box and
the data values (commit offsets or timestamps) of the underlying chart.
- It listens for `google-chart-ready` events to ensure operations like
setting a selection programmatically happen after the chart is fully
initialized.
- It provides `controlTemplate` for optional "load more data" buttons,
which interact with a `DataFrameRepository` (consumed via
`dataframeRepoContext`).
- It uses a `ResizeObserver` to detect when the element is resized and
triggers a chart redraw.
- It manages colors for different traces to ensure consistent
visualization.
- **`h-resizable-box-sk.ts`**: This file defines the `HResizableBoxSk`
LitElement, a reusable component for creating a horizontally resizable and
draggable selection box.
- **Why**: To decouple the complex UI interaction logic of drawing,
moving, and resizing a selection rectangle from the main
`plot-summary-sk` component. This promotes reusability and simplifies
the main component's logic.
- **How**:
- It renders a `div` (`.surface`) that represents the selection.
- It listens for `mousedown` events on its container to initiate an
action: 'draw' (if clicking outside the existing selection), 'drag' (if
clicking inside the selection), 'left' (if clicking on the left edge),
or 'right' (if clicking on the right edge).
- It listens for `mousemove` events on the `window` to update the
selection's position and size during an action. This ensures interaction
continues even if the mouse moves outside the element's bounds.
- It listens for `mouseup` events on the `window` to finalize the action
and emits a `selection-changed` event with the new range.
- It uses CSS to style the selection box and provide visual cues for
dragging and resizing (e.g., `cursor: move`, `cursor: ew-resize`).
- The `selectionRange` property (getter and setter) allows programmatic
control and retrieval of the selection, defined by `begin` and `end`
pixel offsets relative to the component.
- **`plot-summary-sk.css.ts`**: Contains the CSS styles for the
`plot-summary-sk` element, defined as a Lit `css` tagged template literal.
- **Why**: To encapsulate the visual styling, ensuring the plot and its
controls are laid out correctly and are visually consistent with the
application's theme.
- **How**: It uses flexbox for layout, positions the selection box
(`h-resizable-box-sk`) absolutely over the chart, and styles the
optional loading buttons and loading indicator.
- **`plot-summary-sk-demo.ts` and `plot-summary-sk-demo.html`**: Provide a
demonstration page for the `plot-summary-sk` element.
- **Why**: To allow developers to see the component in action, test its
features, and understand how to integrate it.
- **How**: The HTML sets up multiple instances of `plot-summary-sk` with
different configurations (e.g., `domain`, `selectionType`). The
TypeScript file generates sample `DataFrame` objects, converts them to
Google DataTable format, and populates the plot elements. It also
listens for `summary_selected` events and displays their details.
- **Test Files (`*.test.ts`, `*_puppeteer_test.ts`)**:
- **Why**: To ensure the component functions as expected and to prevent
regressions.
- **How**:
- Unit tests (`plot-summary-sk_test.ts`, `h_resizable_box_sk_test.ts`)
verify individual component logic, such as programmatic selection and
state changes. They often mock dependencies like the Google Chart
library or use test utilities to generate data.
- Puppeteer tests (`plot-summary-sk_puppeteer_test.ts`) perform end-to-end
testing by interacting with the component in a real browser environment.
They simulate user actions like mouse drags and verify the emitted event
details and visual output (via screenshots).
Key Workflows:
1. **Initialization and Data Display**:
```
[DataFrame via context or property]
|
v
plot-summary-sk
|
v
[willUpdate/updateDataView] --> Converts DataFrame to Google DataTable
|
v
<google-chart> --> Renders area chart
|
v
[google-chart-ready event] --> plot-summary-sk may apply cached selection
```
2. **User Selecting a Range by Drawing**:
```
User mousedowns on <plot-summary-sk> (outside existing selection in h-resizable-box-sk)
|
v
h-resizable-box-sk (action = 'draw')
|
v
User moves mouse (mousemove on window)
|
v
h-resizable-box-sk --> Updates selection box dimensions
|
v
User mouseups (mouseup on window)
|
v
h-resizable-box-sk --> Emits 'selection-changed' (with pixel coordinates)
|
v
plot-summary-sk (onSelectionChanged)
|
v
Converts pixel coordinates to data values (commit/timestamp)
|
v
Emits 'summary_selected' (with data values)
```
3. **User Resizing/Moving an Existing Selection**:
```
User mousedowns on <h-resizable-box-sk> (on edge for resize, or middle for drag)
|
v
h-resizable-box-sk (action = 'left'/'right'/'drag')
|
v
User moves mouse (mousemove on window)
|
v
h-resizable-box-sk --> Updates selection box position/dimensions
|
v
User mouseups (mouseup on window)
|
v
h-resizable-box-sk --> Emits 'selection-changed'
|
v
plot-summary-sk (onSelectionChanged) --> Converts & Emits 'summary_selected'
```
4. **Programmatic Selection**:
   - The application calls `plotSummarySkElement.Select(beginHeader, endHeader)`
     OR sets `plotSummarySkElement.selectedValueRange = { begin: val1, end: val2 }`.
   - `plot-summary-sk` caches `selectedValueRange` (important if the chart is
     not ready).
   - If the chart is ready, it converts the data values to pixel coordinates
     and sets `selectionRange` on `<h-resizable-box-sk>`.
   - If the chart is not ready when `selectedValueRange` is set, the
     conversion and setting of the `h-resizable-box-sk` selection are deferred
     until the `google-chart-ready` event fires.
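From a consumer's point of view, the main integration point is the
`summary_selected` event. A minimal sketch (the detail field names follow the
description above but should be verified against the element's exported
types):

```ts
// Hypothetical consumer: react to range selections made on the summary plot.
const summary = document.querySelector('plot-summary-sk')!;

summary.addEventListener('summary_selected', (e: Event) => {
  const detail = (e as CustomEvent).detail;
  // Per the description above, the event carries the selected range
  // (start, end, value, and domain); exact property names are assumptions.
  console.log('selected range', detail);
  // e.g., re-query the detail plot for the selected commit/date range.
});
```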
The design separates the concerns of data plotting (Google Charts), interactive
range selection UI (`h-resizable-box-sk`), and the overall orchestration and
data conversion logic (`plot-summary-sk`). This makes the system more modular
and easier to maintain. The use of LitElement and contexts allows for a reactive
programming model and clean integration with other parts of the Perf
application.
# Module: /modules/point-links-sk
The `point-links-sk` module is a custom HTML element designed to display links
associated with specific data points in a performance analysis context. These
links often originate from ingestion files and can include commit details, build
logs, or other relevant resources.
The primary purpose of this module is to provide users with quick access to
contextual information related to a data point. It achieves this by:
1. **Fetching and Displaying Links:** The module fetches link data from a
backend API based on a commit ID and a trace ID. It then renders these links
as clickable anchor elements.
2. **Generating Commit Range Links:** A key feature is its ability to generate
links representing the range of commits between two data points. This is
particularly useful for understanding changes that might have occurred
between two performance measurements.
- If the commit hashes for a given key (e.g., "V8 Git Hash") are different
between the current and previous data points, it constructs a URL that
shows the log of commits between those two specific commit hashes.
- If the commit hashes are the same, it simply links to the individual
commit, indicating no change in that specific dependency.
3. **Caching:** To optimize performance and avoid redundant API calls, the
module can utilize a provided cache of previously loaded commit links. If
the links for a specific commit and trace ID are already in the cache, it
will use those instead of re-fetching.
4. **User-Friendly Presentation:** Links are presented in a list format, with a
"copy to clipboard" button for each link, enhancing usability.
**Key Responsibilities and Components:**
- **`point-links-sk.ts`**: This is the core file defining the `PointLinksSk`
custom element.
- It extends `ElementSk` from `infra-sk`.
- **`load()` method**: This is the main public method responsible for
initiating the process of fetching and displaying links. It takes the
current commit ID, the previous commit ID, a trace ID, and arrays of
keys to identify which links should be treated as commit ranges and
which are general "useful links". It handles the logic for checking the
cache, fetching data from the API, processing commit ranges, and
updating the display.
- **`getLinksForPoint()` and `invokeLinksForPointApi()` methods**: These
private methods handle the actual API interaction to retrieve link data.
`getLinksForPoint` attempts to fetch from `/_/links/` first and falls
back to `/_/details/?results=false` if the initial attempt fails. It
also includes workarounds for specific data inconsistencies (e.g., V8
and WebRTC URLs).
- **`renderPointLinks()` and `renderRevisionLink()` methods**: These
methods, along with the static `template`, use `lit-html` to generate
the HTML structure for displaying the links.
- **Helper methods (`getCommitIdFromCommitUrl`, `getRepoUrlFromCommitUrl`,
`getFormattedCommitRangeText`, `extractUrlFromStringForFuchsia`)**:
These provide utility functions for parsing URLs and formatting text.
- **Data properties (`commitPosition`, `displayUrls`, `displayTexts`)**:
These store the state of the component, such as the current commit and
the links to be displayed.
- **`point-links-sk.scss`**: Provides the styling for the `point-links-sk`
element, ensuring a consistent look and feel, including styling for Material
Design icons and buttons.
- **`index.ts`**: A simple entry point that imports and thereby registers the
`point-links-sk` custom element.
- **`point-links-sk-demo.html` & `point-links-sk-demo.ts`**: These files set
up a demonstration page for the `point-links-sk` element. The
`point-links-sk-demo.ts` file uses `fetch-mock` to simulate the backend API,
allowing developers to test the component's behavior in isolation. It
demonstrates how to instantiate and use the `point-links-sk` element with
different configurations.
**Workflow for Loading and Displaying Links:**
The typical workflow when the `load()` method is called can be visualized as:
```
Caller invokes pointLinksSk.load(currentCID, prevCID, traceID, rangeKeys, usefulKeys, cachedLinks)
|
V
Check if links for (currentCID, traceID) exist in `cachedLinks`
|
+-- YES --> Use cached links
| |
| V
| Render links
|
+-- NO ---> Fetch links for `currentCID` from API (`getLinksForPoint`)
|
V
If `rangeKeys` are provided:
| Fetch links for `prevCID` from API (`getLinksForPoint`)
| For each key in `rangeKeys`:
| Extract current commit hash from `currentCID` links
| Extract previous commit hash from `prevCID` links
| If hashes are different:
| Generate "commit range" URL (e.g., .../+log/prevHash..currentHash)
| Else (hashes are same):
| Use current commit URL
| Add to `displayUrls` and `displayTexts`
|
V
If `usefulKeys` are provided:
| For each key in `usefulKeys`:
| Add corresponding link from `currentCID` links to `displayUrls`
|
V
Update cache with newly fetched/generated links for (currentCID, traceID)
|
V
Render links
```
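
The commit-range rule above can be sketched as follows; the function name and the single-commit URL shape are illustrative assumptions, not the module's actual private API:

```ts
// Illustrative sketch of the commit-range link rule; the function name and
// the single-commit URL shape (gitiles-style /+/<hash>) are assumptions.
function commitRangeUrl(repoUrl: string, prevHash: string, currentHash: string): string {
  if (prevHash === currentHash) {
    // No change in this dependency between the two points:
    // link to the individual commit.
    return `${repoUrl}/+/${currentHash}`;
  }
  // Hashes differ: link to the log of commits between the two hashes.
  return `${repoUrl}/+log/${prevHash}..${currentHash}`;
}
```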
This module is designed to be flexible, allowing the consuming application to
specify which types of links should be processed for commit ranges and which
should be displayed as direct links. The inclusion of error handling (via
`errorMessage`) and the fallback mechanism in API calls (`/_/links/` then
`/_/details/`) make it more robust.
# Module: /modules/progress
The `progress` module provides a mechanism for initiating and monitoring the
status of long-running tasks on the server. This is crucial for user experience,
as it allows the client to display progress information and avoid appearing
unresponsive during lengthy operations.
The core of this module is the `startRequest` function. This function is
designed to handle asynchronous server-side processes that might take a
significant amount of time to complete.
**How `startRequest` Works:**
1. **Initiation:**
- It begins by sending an initial POST request to a specified
`startingURL` with a given `body`. This request typically triggers the
long-running task on the server.
- If a `spinner-sk` element is provided, it's activated to visually
indicate that a process is underway.
2. **Polling:**
- The server's response to the initial request (and subsequent polling
requests) is expected to be a JSON object of type
`progress.SerializedProgress`. This object contains:
- `status`: Indicates whether the task is "Running" or "Finished" (or
potentially other states like "Error").
- `messages`: An array of key-value pairs providing more detailed
information about the current state of the task (e.g., current step,
progress percentage).
- `url`: If the `status` is "Running", this URL is used for the next
polling request to get updated progress.
- `results`: If the `status` is "Finished", this field contains the
final output of the long-running process.
- If the `status` is "Running", `startRequest` will schedule a
`setTimeout` to make a GET request to the `url` provided in the response
after a specified `period`. This creates a polling loop.
3. **Callback and Completion:**
- An optional `callback` function can be provided. This function is
invoked after each successful fetch (both the initial request and every
polling update), receiving the `progress.SerializedProgress` object.
This allows the UI to update with the latest progress information.
- The polling continues until the server responds with a `status` that is
not "Running" (e.g., "Finished").
- Once the task is complete, the Promise returned by `startRequest`
resolves with the final `progress.SerializedProgress` object.
- If a `spinner-sk` was provided, it is deactivated.
4. **Error Handling:**
- If any network request fails (e.g., non-2xx HTTP status), the Promise
returned by `startRequest` is rejected with an error.
- The spinner (if provided) is also deactivated in case of an error.
**Workflow Diagram:**
```
Client UI startRequest Function Server
---------- --------------------- ------
| |
| -- Call startRequest --> |
| | -- POST to startingURL (body) --> |
| | |
| | <-- Response (SerializedProgress) -- |
| |
| -- (Optional) Activate -- |
| Spinner |
| |
| | -- If status is "Running": --------> Schedule setTimeout(period)
| | |
| | V
| | -- GET to progress.url -----------> |
| | |
| | <-- Response (SerializedProgress) -- |
| | |
| | --- (Invoke callback) ---------> Client UI (Update progress)
| | |
| | --- Loop back to "If status is 'Running'"
| |
| | -- If status is "Finished": -------> Resolve Promise
| | |
| -- (Optional) Deactivate | <-----------------------------------
| Spinner |
| |
| <-- Promise Resolves ---- |
| (SerializedProgress) |
```
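
A simplified sketch of the polling pattern `startRequest` implements, assuming the `SerializedProgress` field names shown below (the real type lives in `perf/modules/json`, and the real function also drives an optional `spinner-sk`):

```ts
// Abbreviated stand-in for progress.SerializedProgress; field names here are
// paraphrased from the description above.
interface SerializedProgress {
  status: string; // 'Running', 'Finished', or an error state
  messages: { key: string; value: string }[];
  url: string; // next polling URL while status is 'Running'
  results?: unknown; // final output once the task is finished
}

async function pollProgress(
  startingURL: string,
  body: unknown,
  periodMs: number,
  callback?: (p: SerializedProgress) => void
): Promise<SerializedProgress> {
  // Initial POST kicks off the long-running task on the server.
  let resp = await fetch(startingURL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!resp.ok) throw new Error(`request failed: ${resp.status}`);
  let prog: SerializedProgress = await resp.json();
  callback?.(prog);

  // Keep polling the URL returned by the server until the task leaves the
  // 'Running' state.
  while (prog.status === 'Running') {
    await new Promise((r) => setTimeout(r, periodMs));
    resp = await fetch(prog.url);
    if (!resp.ok) throw new Error(`polling failed: ${resp.status}`);
    prog = await resp.json();
    callback?.(prog);
  }
  return prog;
}
```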
**Key Files:**
- **`progress.ts`**:
- **Responsibilities**: Implements the core logic for initiating requests,
polling for status updates, handling responses, and managing callbacks
and promises. It also provides utility functions for formatting progress
messages.
- **Key Components**:
- `startRequest`: The primary function that orchestrates the entire
progress monitoring flow. It encapsulates the logic for making the
initial POST request and subsequent GET requests for polling. The use of
a single `processFetch` internal function is a design choice to reduce
code duplication, as the response handling logic is identical for both
the initial and polling fetches.
- `messagesToErrorString`: A utility function designed to extract a
user-friendly error message from the `messages` array within
`SerializedProgress`. It prioritizes messages with the key "Error" but
falls back to concatenating all messages if no specific error message is
found. This ensures that some form of feedback is available even if the
server doesn't explicitly flag an error.
- `messagesToPreString`: Formats messages for display, typically within a
`<pre>` tag, by putting each key-value pair on a new line. This is
useful for presenting detailed progress logs.
- `messageByName`: Allows retrieval of a specific message's value by its
key from the `messages` array, with a fallback if the key is not found.
This is useful for extracting specific pieces of information from the
progress updates (e.g., the current step number).
- **Dependencies**:
- `elements-sk/modules/spinner-sk`: Used to visually indicate that a
background task is in progress.
- `perf/modules/json`: Provides the `progress.SerializedProgress` type
definition, ensuring consistency in how progress information is
structured between the client and server.
- **`progress_test.ts`**:
- **Responsibilities**: Contains unit tests for the `progress.ts` module.
- **Key Focus**:
- Verifies that `startRequest` correctly handles different server response
scenarios: immediate completion, one or more polling steps, and network
errors.
- Ensures that the optional callback is invoked correctly during the
polling process.
- Tests the behavior of the message formatting utility functions
(`messagesToErrorString`, `messageByName`) with various inputs.
- **Methodology**: Uses `fetch-mock` to simulate server responses,
allowing for controlled testing of the asynchronous network interactions
without relying on an actual backend. This is crucial for creating
reliable and fast unit tests.
The design of this module prioritizes a clear separation of concerns.
`startRequest` focuses on the communication and polling logic, while the utility
functions provide convenient ways to interpret and display the progress
information received from the server. The use of Promises simplifies handling
asynchronous operations, and the optional callback provides flexibility for
updating the UI in real-time.
# Module: /modules/query-chooser-sk
## Query Chooser Element (`query-chooser-sk`)
The `query-chooser-sk` module provides a user interface element for selecting
and modifying query parameters. It's designed to offer a compact way to display
the currently active query and provide a mechanism to change it through a
dialog.
### Core Functionality and Design
The primary goal of `query-chooser-sk` is to present a summarized view of the
current query and allow users to edit it in a more detailed interface. This is
achieved by:
1. **Displaying a summary:** The current query is displayed in a concise format
using the `paramset-sk` element. This gives users a quick overview of the
active filters.
2. **Providing an "Edit" button:** This button triggers the display of a
dialog.
3. **Embedding `query-sk` in a dialog:** The dialog contains a `query-sk`
element. This is where the user can interactively build or modify their
query by selecting values for different parameters.
4. **Showing query match count:** Alongside the `query-sk` element,
`query-count-sk` is used to display how many items match the currently
constructed query. This provides immediate feedback to the user as they
refine their selection.
5. **Event propagation:** `query-chooser-sk` listens for `query-change` events
from the embedded `query-sk` element. When a change occurs,
`query-chooser-sk` updates its own `current_query` and re-renders,
effectively propagating the change. It also emits its own `query-change`
event, allowing parent components to react to query modifications.
This design separates the concerns of displaying the current state from the more
complex interaction of query building. The dialog provides a focused environment
for query modification without cluttering the main UI.
### Key Components and Files
- **`query-chooser-sk.ts`**: This is the core TypeScript file defining the
`QueryChooserSk` custom element.
- It manages the visibility of the editing dialog.
- It orchestrates the interaction between the summary display
(`paramset-sk`), the query editing interface (`query-sk`), and the match
count display (`query-count-sk`).
- It defines properties like `current_query`, `paramset`, `key_order`, and
`count_url` which are essential for its operation and for configuring
its child elements.
- The `_editClick` and `_closeClick` methods handle the opening and
closing of the dialog.
- The `_queryChange` method is crucial for reacting to changes in the
embedded `query-sk` element and updating the `current_query`.
- **`query-chooser-sk.html` (template within `query-chooser-sk.ts`)**: This
Lit HTML template defines the structure of the element.
- It includes a `div` with class `row` to display the "Edit" button and
the `paramset-sk` summary.
- Another `div` with id `dialog` acts as the container for `query-sk`,
`query-count-sk`, and the "Close" button. The visibility of this dialog
is controlled by adding/removing the `display` class.
- **`query-chooser-sk.scss`**: This file provides the styling for the element.
It ensures proper layout of the button, summary, and the dialog content. It
also includes theming support.
- **`index.ts`**: A simple entry point that imports and registers the
`query-chooser-sk` custom element.
- **`query-chooser-sk-demo.html` / `query-chooser-sk-demo.ts`**: These files
provide a demonstration page for the element, showcasing its usage with
sample data and event handling. `fetchMock` is used in the demo to simulate
the `count_url` endpoint.
- **`query-chooser-sk_puppeteer_test.ts`**: Contains Puppeteer tests to verify
the rendering and basic functionality of the element.
### Workflow: Editing a Query
The typical workflow for a user interacting with `query-chooser-sk` is as
follows:
```
User sees current query summary & "Edit" button
|
| (User clicks "Edit")
V
Dialog appears, showing:
- `query-sk` (for selecting parameters/values)
- `query-count-sk` (displaying number of matches)
- "Close" button
|
| (User interacts with `query-sk`, changing selections)
V
`query-sk` emits "query-change" event
|
V
`query-chooser-sk` (_queryChange method):
- Updates its `current_query` property/attribute
- Re-renders to reflect new `current_query` in summary & `query-count-sk`
- Emits its own "query-change" event (for parent components)
|
| (User is satisfied with the new query)
V
User clicks "Close"
|
V
Dialog is hidden
|
V
`query-chooser-sk` displays the updated query summary.
```
The `paramset` attribute is crucial as it provides the available keys and values
that `query-sk` will use to render its selection interface. The `key_order`
attribute influences the order in which parameters are displayed within
`query-sk`. The `count_url` is passed directly to `query-count-sk` to fetch the
number of matching items for the current query.
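
A minimal usage sketch that wires up these properties; the registration import path and the `count_url` value are illustrative, and the element type is narrowed by hand rather than imported:

```ts
// Usage sketch; paths and endpoint values are assumptions.
import 'perf/modules/query-chooser-sk'; // assumed registration path

const chooser = document.createElement('query-chooser-sk') as HTMLElement & {
  paramset: { [key: string]: string[] };
  key_order: string[];
  count_url: string;
  current_query: string;
};
chooser.paramset = { arch: ['x86', 'arm'], config: ['8888', '565'] };
chooser.key_order = ['config'];
chooser.count_url = '/_/count/'; // illustrative endpoint passed to query-count-sk
chooser.current_query = 'config=8888';
document.body.appendChild(chooser);

// The element re-emits query-change as the user edits in the dialog; the
// updated query is reflected in current_query.
chooser.addEventListener('query-change', () => {
  console.log('query is now', chooser.current_query);
});
```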
# Module: /modules/query-count-sk
The `query-count-sk` module provides a custom HTML element designed to display
the number of results matching a given query. Its primary purpose is to offer a
dynamic and responsive way to inform users about the scope of their queries in
real-time, without requiring a full page reload or complex UI updates. This is
particularly useful in applications where users frequently refine search
criteria and need immediate feedback on the impact of those changes.
The core functionality revolves around the `QueryCountSk` class, which extends
`ElementSk`. This class manages the state of the displayed count, handles
asynchronous data fetching, and updates the UI accordingly.
**Key Components and Design Decisions:**
- **`query-count-sk.ts`:** This is the heart of the module.
- **Asynchronous Data Fetching:** When the `current_query` or `url`
attributes change, the element initiates a POST request to the specified
`url`.
- The request body includes the `current_query`, and a default time window
of the last 24 hours (`begin` and `end` timestamps). This design choice
implies that the element is typically used for querying recent data.
- To prevent race conditions and unnecessary network requests, any ongoing
fetch operation is aborted if a new query is initiated. This is achieved
using an `AbortController`. This is a crucial design decision for
performance and responsiveness, especially when users rapidly change
query parameters.
- The component expects a JSON response with a `count` (number of matches)
and a `paramset` (a read-only representation of parameters related to
the query).
- **State Management:** The `_count` property stores the fetched count as
a string, and `_requestInProgress` is a boolean flag indicating whether
a fetch operation is currently active. This flag is used to show/hide a
loading spinner (`spinner-sk`).
- **Rendering:** The component uses `lit-html` for efficient template
rendering. The template displays the `_count` and the `spinner-sk`
conditionally.
- **Event Emission:** Upon successful data retrieval, a `paramset-changed`
custom event is dispatched. This event carries the `paramset` received
from the server. This allows other components on the page to react to
changes in the available parameters based on the current query results.
This decoupling is a key design aspect for building modular UIs.
- **Error Handling:** Network errors or non-OK HTTP responses are caught,
and an error message is displayed to the user via the `errorMessage`
utility (likely from `perf/modules/errorMessage`). AbortErrors are
handled gracefully by simply stopping the current operation without
displaying an error, as this usually means the user initiated a new
action.
- **`query-count-sk.scss`:** Provides styling for the element, ensuring the
count and spinner are displayed appropriately. The `display: inline-block`
and flexbox layout for the internal `div` are chosen for simple alignment of
the count and spinner.
- **`query-count-sk-demo.html` and `query-count-sk-demo.ts`:** These files
provide a demonstration and testing environment for the `query-count-sk`
element.
- The demo sets up a `fetch-mock` to simulate server responses, allowing
for isolated testing of the component's behavior.
- It showcases how to instantiate the element and interact with its
attributes (`url`, `current_query`).
- The presence of `<error-toast-sk>` in the demo suggests that this is the
intended mechanism for displaying errors surfaced by `errorMessage`.
- **`index.ts`:** A simple entry point that imports and registers the
`query-count-sk` custom element, making it available for use in an HTML
page.
**Workflow for Displaying Query Count:**
1. **Initialization:**
- The `query-count-sk` element is added to the DOM.
- The `url` attribute (pointing to the backend endpoint) is set.
```
Page query-count-sk
| |
|--(Set url)---->|
```
2. **Query Update:**
- The `current_query` attribute is set or updated (e.g., by user input in
another part of the application).
```
Page query-count-sk
| |
|--(Set current_query)-->|
```
3. **Data Fetching:**
- The `attributeChangedCallback` (or `connectedCallback` on initial load)
triggers the `_fetch()` method.
- If a previous fetch is in progress, it's aborted.
- `_requestInProgress` is set to `true`, and the spinner becomes visible.
- A POST request is made to `this.url` with the `current_query` and time
range.
```
query-count-sk Server
| |
|--(Set _requestInProgress=true)------>| (Spinner shows)
| |
|----(POST / {q: current_query, ...})-->|
```
4. **Response Handling:**
- **Success:**
- The server responds with JSON: `{ count: N, paramset: {...} }`.
- `_count` is updated with `N`.
- `_requestInProgress` is set to `false` (spinner hides).
- The component re-renders to display the new count.
- A `paramset-changed` event is dispatched with the `paramset`.
```
query-count-sk Server
| |
|<----(HTTP 200, {count, paramset})----|
| |
|--(Update _count, _requestInProgress=false)-->| (Spinner hides, count updates)
| |
|--(Dispatch 'paramset-changed')------>| (Other components may react)
```
- **Error (e.g., network issue, server error):**
- `_requestInProgress` is set to `false` (spinner hides).
- An error message is displayed (e.g., via `error-toast-sk`).
```
query-count-sk Server
| |
|<----(HTTP Error or Network Error)----|
| |
|--(Set _requestInProgress=false)------>| (Spinner hides)
| |
|--(Display error message)------------>|
```
- **Abort:**
- If the fetch was aborted (e.g., new query initiated before
completion), the `catch` block for `AbortError` is entered.
- No UI update for count or error display happens; the new fetch
operation takes precedence.
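
The abort-and-refetch pattern can be sketched as follows; the class name, endpoint, and request-body field names are illustrative assumptions rather than the component's actual internals:

```ts
// Simplified sketch of aborting stale requests before issuing a new one.
class CountFetcher {
  private controller: AbortController | null = null;

  async fetchCount(url: string, currentQuery: string): Promise<number> {
    // Abort any in-flight request so that only the latest query wins.
    this.controller?.abort();
    this.controller = new AbortController();

    const now = Math.floor(Date.now() / 1000);
    const resp = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Default time window of the last 24 hours, as described above; the
      // body field names are assumptions.
      body: JSON.stringify({ q: currentQuery, begin: now - 24 * 3600, end: now }),
      signal: this.controller.signal,
    });
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
    // Expected response shape: { count: number, paramset: {...} }.
    const { count } = await resp.json();
    return count as number;
  }
}
```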
The design emphasizes responsiveness by aborting stale requests and provides a
clear visual indication of ongoing activity (the spinner). The
`paramset-changed` event promotes loose coupling between components, allowing
other parts of the application to adapt based on the query results without
direct dependencies on `query-count-sk`'s internal implementation.
# Module: /modules/regressions-page-sk
The `regressions-page-sk` module provides a user interface for viewing and
managing performance regressions. It allows users to select a "subscription"
(often representing a team or area of ownership, like "Sheriff Config 1") and
then displays a list of detected performance anomalies (regressions or
improvements) associated with that subscription.
The core functionality revolves around fetching and displaying this data in a
user-friendly way.
**Key Responsibilities and Components:**
- **`regressions-page-sk.ts`**: This is the main TypeScript file that defines
the `RegressionsPageSk` custom HTML element.
- **State Management (`State` interface, `stateReflector`)**: The
component maintains its UI state (selected subscription, whether to show
triaged items or improvements, and a flag for using a Skia-specific
backend) in the `state` object. The `stateReflector` utility is crucial
here. It synchronizes this internal state with the URL query parameters.
This means a user can bookmark a specific view (e.g., a particular
subscription with improvements shown) and share it, or refresh the page
and return to the same state.
- _Why `stateReflector`?_ It provides a clean way to manage application
state that needs to be persistent across page loads and shareable via
URLs, without manually parsing and updating the URL.
- **Data Fetching (`fetchRegressions`, `init`)**:
- The `init` method is called during component initialization and whenever
the state changes significantly (like selecting a new subscription). It
fetches the list of available subscriptions (sheriff lists) from either
a legacy endpoint (`/_/anomalies/sheriff_list`) or a Skia-specific one
(`/_/anomalies/sheriff_list_skia`) based on the `state.useSkia` flag.
The fetched subscriptions are then sorted alphabetically for display in
a dropdown.
- The `fetchRegressions` method is responsible for fetching the actual
anomaly data. It constructs a query based on the current `state`
(selected subscription, filters for triaged/improvements, and a cursor
for pagination). It also chooses between legacy and Skia-specific
anomaly list endpoints. The fetched anomalies are then appended to the
`cpAnomalies` array, and if a cursor is returned, a "Show More" button
is made visible.
- _Why two sets of endpoints (legacy vs. Skia)?_ This suggests a migration
path or different data sources/backends being supported, allowing the
component to adapt based on configuration.
- **Rendering (`template`, `_render`)**: The component uses `lit-html` for
templating. The `template` static method defines the HTML structure,
which includes:
- A dropdown (`<select id="filter">`) to choose a subscription.
- Buttons to toggle the display of triaged items and improvements.
- A `<subscription-table-sk>` to display details about the selected
subscription and its associated alerts.
- An `<anomalies-table-sk>` to display the list of anomalies/regressions.
- Spinners (`spinner-sk`) to indicate loading states.
- A "Show More" button for paginating through anomalies.
- The `_render()` method (implicitly called by `ElementSk` when properties
change) re-renders the component with the latest data.
- **Event Handling (`filterChange`, `triagedChange`,
`improvementChange`)**: These methods handle user interactions like
selecting a subscription or toggling filters. They update the
component's `state`, trigger `stateHasChanged` (which in turn updates
the URL and can re-fetch data), and then explicitly call
`fetchRegressions` and `_render` to reflect the changes.
- **Legacy Regression Display (`getRegTemplate`, `regRowTemplate`)**:
There's also code related to displaying `regressions` directly in a
table within this component (the `regressions` property and
`getRegTemplate`). However, the primary display of anomalies seems to be
delegated to `anomalies-table-sk`. This older regression display logic
might be for a previous version or a specific use case not currently
active in the demo. The `isRegressionImprovement` static method
determines if a given regression object represents an improvement based
on direction and cluster type.
- **`anomalies-table-sk` (external dependency)**: This component is
responsible for rendering the detailed table of anomalies.
`regressions-page-sk` fetches the anomaly data and then passes it to
`anomalies-table-sk` for display. This promotes modularity, separating data
fetching/management from presentation.
- **`subscription-table-sk` (external dependency)**: This component displays
information about the currently selected subscription, including any
configured alerts. Similar to `anomalies-table-sk`, it receives data from
`regressions-page-sk`.
- **`regressions-page-sk.scss`**: Provides styling for the
`regressions-page-sk` component, including colors for positive/negative
changes and styles for spinners and buttons.
- **`regressions-page-sk-demo.html` and `regressions-page-sk-demo.ts`**: These
files set up a demonstration page for the `regressions-page-sk` component.
- `regressions-page-sk-demo.ts` is particularly important for
understanding how the component is intended to be used and tested. It
initializes a global `window.perf` object with configuration settings
that the main component might rely on (though direct usage isn't evident
in `regressions-page-sk.ts` itself, it's a common pattern in Perf).
- It uses `fetchMock` to simulate API responses for `/users/login/status`,
`/_/subscriptions`, and `/_/regressions` (which seems to be an older
endpoint pattern compared to what `regressions-page-sk.ts` uses). This
mocking is crucial for creating a standalone demo environment.
- _Why `fetchMock`?_ It allows developers to work on and test the UI
component without needing a live backend, ensuring predictable data and
behavior for demos and tests.
**Workflow for Displaying Regressions:**
1. **Initialization (`connectedCallback`, `init`)**:
- `regressions-page-sk` element is added to the DOM.
- `stateReflector` is set up to read initial state from URL or use
defaults.
- `init()` is called:
- Fetches the list of available subscriptions (e.g., "Sheriff Config
1", "Sheriff Config 2").
- Populates the subscription dropdown (`<select id="filter">`).
2. **User Selects a Subscription (`filterChange`)**:
- User selects "Sheriff Config 2" from the dropdown.
- `filterChange("Sheriff Config 2")` is triggered.
- `state.selectedSubscription` is updated to "Sheriff Config 2".
- `cpAnomalies` is cleared, `anomalyCursor` is reset.
- `stateHasChanged()` is called, updating the URL (e.g.,
`?selectedSubscription=Sheriff%20Config%202`).
- `fetchRegressions()` is called.
3. **Fetching Anomalies (`fetchRegressions`)**:
- An API request is made to
`/_/anomalies/anomaly_list?sheriff=Sheriff%20Config%202` (or the Skia
equivalent).
- A loading spinner is shown.
- The server responds with a list of anomalies and potentially a cursor
for pagination.
4. **Displaying Anomalies**:
- The fetched anomalies are appended to `this.cpAnomalies`.
- The `subscriptionTable` is updated with subscription details and alerts
from the response.
- The `anomaliesTable` (the `anomalies-table-sk` instance) is populated
with `this.cpAnomalies`.
- If a cursor was returned, the "Show More" button becomes visible.
- Loading spinner is hidden.
- The component re-renders.
```
User Action Component State API Interaction UI Update
----------- --------------- --------------- ---------
Page Load
|
V
regressions-page-sk.init()
| state = {selectedSubscription:''}
V
fetch('/_/anomalies/sheriff_list') -> ["Sheriff1", "Sheriff2"]
| subscriptionList = ["Sheriff1", "Sheriff2"]
V
Populate dropdown
Disable filter buttons
Selects "Sheriff1"
|
V
regressions-page-sk.filterChange("Sheriff1")
| state = {selectedSubscription:'Sheriff1', ...}
| (URL updates via stateReflector)
V
regressions-page-sk.fetchRegressions()
| anomaliesLoadingSpinner = true
V
fetch('/_/anomalies/anomaly_list?sheriff=Sheriff1') -> {anomaly_list: [...], anomaly_cursor: 'cursor123'}
| cpAnomalies = [...], anomalyCursor = 'cursor123', showMoreAnomalies = true
| anomaliesLoadingSpinner = false
V
Update anomaliesTable
Update subscriptionTable
Show "Show More" button
Enable filter buttons
Clicks "Show More"
|
V
regressions-page-sk.fetchRegressions() (called by button click)
| showMoreLoadingSpinner = true
V
fetch('/_/anomalies/anomaly_list?sheriff=Sheriff1&anomaly_cursor=cursor123') -> {anomaly_list: [more...], anomaly_cursor: null}
| cpAnomalies = [all...], anomalyCursor = null, showMoreAnomalies = false
| showMoreLoadingSpinner = false
V
Update anomaliesTable (append)
Hide "Show More" button
```
5. **Toggling Filters (e.g., "Show Triaged", `triagedChange`)**:
- User clicks "Show Triaged".
- `triagedChange()` is triggered.
- `state.showTriaged` is toggled.
- Button text updates (e.g., to "Hide Triaged").
- `stateHasChanged()` updates the URL (e.g.,
`?selectedSubscription=Sheriff%20Config%202&showTriaged=true`).
- `fetchRegressions()` is called again, this time with `triaged=true` in
the query.
- The UI updates with the newly filtered list of anomalies.
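
A sketch of how the anomaly-list request might be assembled from the component state; the query-parameter names follow the prose above, while the Skia-specific endpoint path and the `improvements` parameter name are assumptions:

```ts
// Illustrative only; not the component's actual fetch code.
interface RegressionsPageState {
  selectedSubscription: string;
  showTriaged: boolean;
  showImprovements: boolean;
  useSkia: boolean;
}

function anomalyListUrl(state: RegressionsPageState, anomalyCursor?: string): string {
  const base = state.useSkia
    ? '/_/anomalies/anomaly_list_skia' // assumed Skia-specific path
    : '/_/anomalies/anomaly_list';
  const params = new URLSearchParams({ sheriff: state.selectedSubscription });
  if (state.showTriaged) params.set('triaged', 'true');
  if (state.showImprovements) params.set('improvements', 'true'); // name assumed
  if (anomalyCursor) params.set('anomaly_cursor', anomalyCursor);
  return `${base}?${params.toString()}`;
  // e.g. fetch(anomalyListUrl(state, 'cursor123'))
}
```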
The design separates concerns: `regressions-page-sk` handles overall page logic,
state, and orchestration of data fetching, while specialized components like
`anomalies-table-sk` and `subscription-table-sk` handle the rendering of
specific data views. The use of `stateReflector` ensures the UI state is
bookmarkable and shareable. The demo files with `fetchMock` are critical for
isolated development and testing of the UI component.
# Module: /modules/report-page-sk
The `report-page-sk` module is designed to display a detailed report page for
performance anomalies. Its primary purpose is to provide users with a
comprehensive view of selected anomalies, including their associated graphs and
commit information, facilitating the analysis and understanding of performance
regressions or improvements.
At its core, the `report-page-sk` element orchestrates the display of several
key pieces of information. It fetches anomaly data from a backend endpoint
(`/_/anomalies/group_report`) based on URL parameters (like revision, anomaly
IDs, bug ID, etc.). This data is then used to populate an `anomalies-table-sk`
element, which presents a tabular view of the anomalies.
A crucial design decision is the use of an `AnomalyTracker` class. This class is
responsible for managing the state of each anomaly, including whether it's
selected (checked) by the user, its associated graph, and the relevant time
range for graphing. This separation of concerns keeps the main `ReportPageSk`
class cleaner and focuses its responsibilities on rendering and user
interaction.
When an anomaly is selected in the table, `report-page-sk` dynamically generates
and displays an `explore-simple-sk` graph for that anomaly. The
`explore-simple-sk` element is configured to show data around the anomaly's
occurrence, typically a week before and after, to provide context. If multiple
anomalies are selected, their graphs are displayed, and their heights are
adjusted to fit the available space. A key feature is the synchronized X-axis
across all displayed graphs, ensuring a consistent time scale for comparison.
The page also attempts to identify and display common commits related to the
selected anomalies. It fetches commit details using the `lookupCids` function
and highlights commits that appear to be "roll" commits (e.g., "Roll repo from
hash to hash"). For these roll commits, it provides a link to the underlying
commit or the parent commit if the roll pattern is not directly parseable from
the commit message, which can be helpful for developers to trace the source of a
change.
**Key Components and Responsibilities:**
- **`report-page-sk.ts`**: This is the main TypeScript file defining the
`ReportPageSk` custom element.
- **`ReportPageSk` class**:
- **Initialization**: Fetches default configurations (`/_/defaults/`) and
then anomaly data based on URL parameters.
- **Anomaly Management**: Uses an `AnomalyTracker` instance to manage the
state of individual anomalies (selected, graphed, time range).
- **Rendering**: Dynamically renders the `anomalies-table-sk` and
`explore-simple-sk` graphs based on user interactions and fetched data.
It uses the `lit-html` library for templating.
- **Event Handling**: Listens for `anomalies_checked` events from the
`anomalies-table-sk` to update the displayed graphs. It also handles
`x-axis-toggled` events from `explore-simple-sk` to synchronize the
x-axis across multiple graphs.
- **Graph Generation**: When an anomaly is selected, it creates an
`explore-simple-sk` instance, configures its query based on the
anomaly's test path, and sets the appropriate time range.
- **Commit Information**: Fetches commit details relevant to the anomalies
and displays a list of common commits, with special handling for "roll"
commits.
- **Spinner**: Shows a loading spinner (`spinner-sk`) during data fetching
operations.
- **`AnomalyTracker` class**:
- **State Management**: Stores `AnomalyDataPoint` objects, each containing
an `Anomaly`, its checked status, its associated `ExploreSimpleSk` graph
instance (if any), and its `Timerange`.
- **Loading Data**: Populates its internal tracker from a list of
anomalies and their corresponding time ranges.
- **Accessors**: Provides methods to get individual anomaly data,
set/unset graphs, and retrieve lists of all or selected anomalies. This
abstraction is key to decoupling the graph display logic from the raw
anomaly data.
- **`AnomalyDataPoint` interface**: Defines the structure for storing
information about a single anomaly within the `AnomalyTracker`.
- **`report-page-sk.scss`**: Contains the SASS/CSS styles for the
`report-page-sk` element, including styling for the common commits section
and the dialog for displaying all commits (though the dialog itself is not
fully implemented in the provided `showAllCommitsTemplate`).
- **Data Fetching Workflow**:
1. `ReportPageSk` element is connected to the DOM.
2. URL parameters (e.g., `rev`, `anomalyIDs`, `bugID`) are read.
3. `fetchAnomalies()` is called.
- POST request to `/_/anomalies/group_report` with URL parameters in
the body.
- Backend responds with `anomaly_list`, `timerange_map`, and
`selected_keys`.
4. `AnomalyTracker` is loaded with this data.
5. `anomalies-table-sk` is populated.
6. Graphs for initially selected anomalies are rendered.
- **User Interaction Workflow (Selecting an Anomaly)**:
1. User checks/unchecks an anomaly in `anomalies-table-sk`.
2. `anomalies-table-sk` fires an `anomalies_checked` custom event with the
anomaly and its checked state.
3. `ReportPageSk` listens for this event.
4. `updateGraphs()` is called:
- If checked and no graph exists:
- `addGraph()` is called.
- A new `explore-simple-sk` instance is created and configured.
- The graph is added to the DOM.
- The `AnomalyTracker` is updated with the new graph instance.
- If unchecked and a graph exists:
- The graph is removed from the DOM.
- The `AnomalyTracker` is updated to remove the graph reference.
5. `updateChartHeights()` is called to adjust the height of all visible
graphs.
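
A rough sketch of the tracker shape implied by the description above; field and method names are paraphrased from the prose and will differ from the real `AnomalyTracker`:

```ts
// Abbreviated stand-ins for the real Anomaly and ExploreSimpleSk types.
interface Timerange { begin: number; end: number; }

interface AnomalyDataPoint<Anomaly, Graph> {
  anomaly: Anomaly;     // the anomaly returned by the backend
  selected: boolean;    // checkbox state from anomalies-table-sk
  graph: Graph | null;  // the explore-simple-sk instance, if graphed
  timerange: Timerange; // time range used when plotting this anomaly
}

class AnomalyTracker<Anomaly extends { id: string }, Graph> {
  private tracker = new Map<string, AnomalyDataPoint<Anomaly, Graph>>();

  load(anomalies: Anomaly[], timeranges: Map<string, Timerange>): void {
    for (const a of anomalies) {
      this.tracker.set(a.id, {
        anomaly: a,
        selected: false,
        graph: null,
        timerange: timeranges.get(a.id) ?? { begin: 0, end: 0 },
      });
    }
  }

  setGraph(id: string, graph: Graph | null): void {
    const dp = this.tracker.get(id);
    if (dp) dp.graph = graph;
  }

  selected(): Anomaly[] {
    return [...this.tracker.values()]
      .filter((dp) => dp.selected)
      .map((dp) => dp.anomaly);
  }
}
```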
The design emphasizes dynamic content loading and interactive exploration. By
using separate custom elements for the table (`anomalies-table-sk`) and graphs
(`explore-simple-sk`), the module maintains a good separation of concerns and
leverages reusable components. The `AnomalyTracker` further enhances this by
encapsulating the state and logic related to individual anomalies.
# Module: /modules/revision-info-sk
The `revision-info-sk` custom HTML element is designed to display information
about anomalies detected around a specific revision. This is particularly useful
for understanding the impact of a code change on performance metrics.
The core functionality revolves around fetching and presenting `RevisionInfo`
objects. A `RevisionInfo` object contains details like the benchmark, bot, bug
ID, start and end revisions of an anomaly, the associated test, and links to
explore the anomaly further.
**Key Components and Workflow:**
1. **`revision-info-sk.ts`**: This is the main TypeScript file defining the
`RevisionInfoSk` element.
- **State Management**: The element maintains its state in a `State`
object, primarily storing the `revisionId`. It utilizes `stateReflector`
from `infra-sk/modules/statereflector` to keep the URL in sync with the
element's state. This allows users to share links that directly open to
a specific revision's information.
- `URL change` -> `stateReflector updates State.revisionId` ->
`getRevisionInfo() is called`
- `User types revision ID and clicks "Get Revision Information"` ->
`State.revisionId updated` -> `stateReflector updates URL` ->
`getRevisionInfo() is called`
- **Data Fetching (`getRevisionInfo`)**: When a revision ID is provided
(either via URL or user input), this method is triggered.
- It displays a spinner (`spinner-sk`) to indicate loading.
- It makes a `fetch` request to the `/_/revision/?rev=<revisionId>`
endpoint.
- The JSON response, an array of `RevisionInfo` objects, is parsed
using `jsonOrThrow`.
- The fetched `revisionInfos` are stored, and the UI is re-rendered to
display the information.
- **Rendering (`template`, `getRevInfosTemplate`, `revInfoRowTemplate`)**:
Lit-html is used for templating.
- The main template (`template`) includes an input field for the
revision ID, a button to trigger fetching, a spinner, and a
container for the revision information.
- `getRevInfosTemplate` generates an HTML table if `revisionInfos` is
populated. This table includes a header row with a "select all"
checkbox and columns for bug ID, revision range, master, bot,
benchmark, and test.
- `revInfoRowTemplate` renders each individual `RevisionInfo` as a row
in the table. Each row has a checkbox for selection, a link to the
bug (if any), a link to explore the anomaly, and the other relevant
details.
- **Multi-Graph Functionality**: The element allows users to select
multiple detected anomaly ranges and view them together on a multi-graph
page.
- **Selection**: Checkboxes (`checkbox-sk`) are provided for each
revision info row and a "select all" checkbox. The `toggleSelectAll`
method handles the logic for the master checkbox.
- **`updateMultiGraphStatus`**: This method is called whenever a
checkbox state changes. It checks if any revisions are selected and
enables/disables the "View Selected Graph(s)" button accordingly. It
also updates the `selectAll` state if no individual revisions are
checked.
- **`getGraphConfigs`**: This helper function takes an array of
selected `RevisionInfo` objects and transforms them into an array of
`GraphConfig` objects. Each `GraphConfig` contains the query string
associated with the anomaly.
- **`getMultiGraphUrl`**: This asynchronous method constructs the URL
for the multi-graph view.
* It calls `getGraphConfigs` to get the configurations for the
selected revisions.
* It calls `updateShortcut` (from `explore-simple-sk`) to generate a
shortcut ID for the combined graph configurations. This typically
involves a POST request to `/_/shortcut/update`.
* It determines the overall time range (`begin` and `end` timestamps)
encompassing all selected anomalies.
* It gathers all unique `anomaly_ids` from the selected revisions to
highlight them on the multi-graph page.
* It constructs the final URL, including the `begin`, `end`
timestamps, the `shortcut` ID, the `totalGraphs`, and
`highlight_anomalies` parameters.
- **`viewMultiGraph`**: This method is called when the "View Selected
Graph(s)" button is clicked.
* It gathers all checked `RevisionInfo` objects.
* It calls `getMultiGraphUrl` to generate the redirect URL.
* If a URL is successfully generated, it navigates the current window
(`window.open(url, '_self')`) to the multi-graph page. If not, it
displays an error message.
- **Styling (`revision-info-sk.scss`)**: Provides basic styling for the
element, such as left-aligning table headers and styling the spinner.
2. **`index.ts`**: Simply imports and thereby registers the `revision-info-sk`
custom element.
3. **Demo Page (`revision-info-sk-demo.html`, `revision-info-sk-demo.ts`,
`revision-info-sk-demo.scss`)**:
- Provides a simple HTML page to showcase the `revision-info-sk` element.
- The `revision-info-sk-demo.ts` file uses `fetch-mock` to mock the
`/_/revision/` API endpoint. This is crucial for demonstrating the
element's functionality without needing a live backend. When the demo
page loads and the user interacts with the element (e.g., enters a
revision ID '12345'), the mocked response is returned.
**Design Decisions and Rationale:**
- **Custom Element**: Encapsulating this functionality as a custom element
(`<revision-info-sk>`) promotes reusability across different parts of the
Perf application or potentially other Skia web applications.
- **State Reflection**: Using `stateReflector` enhances user experience by
allowing direct navigation to a revision's details via URL and updating the
URL as the user interacts with the element. This makes sharing and
bookmarking specific views straightforward.
- **Lit-html for Templating**: Lit-html is chosen for its efficiency and
declarative approach to building UIs, making the rendering logic concise and
maintainable.
- **Asynchronous Operations**: Data fetching and shortcut generation are
asynchronous operations. The use of `async/await` makes the code easier to
read and manage compared to traditional Promise chaining.
- **Dedicated Multi-Graph URL Generation**: The logic for constructing the
multi-graph URL is encapsulated in `getMultiGraphUrl`. This separates
concerns and makes the process of generating the complex URL clearer. It
relies on the `explore-simple-sk` module's `updateShortcut` function,
promoting reuse of existing shortcut generation logic.
- **Error Handling**: `jsonOrThrow` is used to simplify error handling for
fetch requests. The `viewMultiGraph` method also includes basic error
handling if the URL generation fails.
- **Clear Separation of Concerns**: The element focuses on displaying revision
information and providing navigation to related views (bug tracker, explore
page, multi-graph view). It doesn't concern itself with the details of how
anomalies are detected or how the multi-graph page itself functions.
**Workflow for Displaying Revision Information:**
```
User Interaction / URL Change
|
v
[revision-info-sk] stateReflector updates internal 'state.revisionId'
|
v
[revision-info-sk] getRevisionInfo() called
|
+--------------------------------+
| |
v v
[revision-info-sk] shows spinner [revision-info-sk] makes fetch request to `/_/revision/?rev=<ID>`
| |
| v
| [Backend] processes request, returns RevisionInfo[]
| |
| v
+------------------> [revision-info-sk] receives JSON response, parses with jsonOrThrow
|
v
[revision-info-sk] stores 'revisionInfos', hides spinner
|
v
[revision-info-sk] re-renders using Lit-html templates to display table
```
**Workflow for Viewing Multi-Graph:**
```
User selects one or more revision info rows (checkboxes)
|
v
[revision-info-sk] updateMultiGraphStatus() enables "View Selected Graph(s)" button
|
v
User clicks "View Selected Graph(s)" button
|
v
[revision-info-sk] viewMultiGraph() called
|
v
[revision-info-sk] collects selected RevisionInfo objects
|
v
[revision-info-sk] calls getMultiGraphUrl(selectedRevisions)
|
+------------------------------------------------------+
| |
v v
[getMultiGraphUrl] calls getGraphConfigs() to create GraphConfig[] [getMultiGraphUrl] calls updateShortcut(GraphConfig[])
| | (makes POST to /_/shortcut/update)
| v
| [Backend] returns shortcut ID
| |
+-------------------------------------> [getMultiGraphUrl] constructs final URL (with begin, end, shortcut, anomaly IDs)
|
v
[viewMultiGraph] receives the multi-graph URL
|
v
[Browser] navigates to the generated multi-graph URL
```
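
An illustrative sketch of combining the selected revisions into the multi-graph URL parameters described above; the `RevisionInfoLike` field names and the destination path are assumptions, and the shortcut ID is presumed to come from the `updateShortcut` call:

```ts
// Illustrative only; field names and the destination path are assumptions.
interface RevisionInfoLike {
  start_time: number; // assumed timestamp fields
  end_time: number;
  anomaly_ids: string[];
}

function multiGraphUrl(shortcutId: string, selected: RevisionInfoLike[]): string {
  // Overall time range spanning all selected anomalies.
  const begin = Math.min(...selected.map((r) => r.start_time));
  const end = Math.max(...selected.map((r) => r.end_time));
  // Unique anomaly IDs to highlight on the multi-graph page.
  const highlight = [...new Set(selected.flatMap((r) => r.anomaly_ids))];
  const params = new URLSearchParams({
    begin: String(begin),
    end: String(end),
    shortcut: shortcutId, // ID returned by the /_/shortcut/update call
    totalGraphs: String(selected.length),
    highlight_anomalies: highlight.join(','),
  });
  return `/m/?${params.toString()}`; // destination path is an assumption
}
```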
# Module: /modules/split-chart-menu-sk
The `split-chart-menu-sk` module provides a user interface element for selecting
an attribute by which to split a chart. This is particularly useful in data
visualization scenarios where users need to break down aggregated data into
smaller, more specific views. For example, in a performance monitoring
dashboard, a user might want to see performance metrics split by benchmark,
specific test case (story), or sub-component (subtest).
The core functionality revolves around presenting a list of available attributes
to the user in a dropdown menu. These attributes are dynamically derived from
the underlying data. When an attribute is selected, the component emits an
event, allowing other parts of the application to react and update the chart
display accordingly.
**Key Components and Design:**
- **`split-chart-menu-sk.ts`**: This is the main TypeScript file that defines
the `SplitChartMenuSk` LitElement.
- **Data Consumption:** The component utilizes the Lit `context` API
(`@consume`) to access data from two sources: `dataframeContext` and
`dataTableContext`.
- `dataframeContext` provides the `DataFrame` (from
`//perf/modules/json:index_ts_lib` and
`//perf/modules/dataframe:dataframe_context_ts_lib`). The `DataFrame` is
the source from which the list of available attributes for splitting is
derived. This design decouples the menu from the specifics of data
fetching and management, allowing it to focus solely on the UI aspect of
attribute selection. The `getAttributes` function (from
`//perf/modules/dataframe:traceset_ts_lib`) is used to extract these
attributes.
- `dataTableContext` provides `DataTable` (also from
`//perf/modules/dataframe:dataframe_context_ts_lib`). While consumed,
its direct usage within this specific component's rendering logic isn't
immediately apparent in the provided `render` method, but it might be
used by other parts of the application or for future enhancements.
- **User Interaction:**
- A Material Design outlined button (`<md-outlined-button>`) labeled
"Split By" serves as the trigger to open the menu.
- The menu itself is a Material Design menu (`<md-menu>`), which is
populated with `<md-menu-item>` elements, one for each attribute
retrieved from the `DataFrame`.
- The `menuOpen` state property controls the visibility of the menu.
Clicking the button toggles this state. The menu also closes itself via
the `@closed` event.
- **Event Emission:** When a user clicks on a menu item, the
`bubbleAttribute` method is called. This method dispatches a custom
event named `split-chart-selection`.
- The event detail (`SplitChartSelectionEventDetails`) contains the
selected `attribute` (a string).
- The event is configured to bubble (`bubbles: true`) and pass through
shadow DOM boundaries (`composed: true`), making it easy for ancestor
elements to listen and react to the selection. This event-driven
approach is crucial for decoupling the menu from the chart component or
any other component that needs to know about the selected split
attribute.
- **Styling:** Styles are imported from `split-chart-menu-sk.css.ts`
(`style`). This keeps the component's presentation concerns separate
from its logic. The styles ensure the component is displayed as an
inline block and sets a default background color, also styling the
Material button.
- **`split-chart-menu-sk.css.ts`**: This file defines the CSS styles for the
component using Lit's `css` tagged template literal. The primary styling
focuses on the host element's positioning and background, and customizing
the Material Design button's border radius.
- **`index.ts`**: This file simply imports and registers the
`split-chart-menu-sk` custom element, making it available for use in HTML.
**Workflow: Selecting a Split Attribute**
1. **Initialization:**
- The `split-chart-menu-sk` component is rendered.
- It consumes the `DataFrame` from the `dataframeContext`.
- The `getAttributes()` method is called (implicitly via the render
method's map function) to populate the list of attributes for the menu.
2. **User Interaction:**
- User clicks the "Split By" button.
- `menuClicked` handler is invoked -> `this.menuOpen` becomes `true`.
- The `<md-menu>` component becomes visible, displaying the list of
attributes.
```
User split-chart-menu-sk DataFrame
| | |
|---Clicks "Split By"->| |
| |---Toggles menuOpen=true-->|
| | |
| |<--Displays Menu-------|
| | |
```
3. **Attribute Selection:**
- User clicks on an attribute in the menu (e.g., "benchmark").
- The `click` handler on `<md-menu-item>` calls
`this.bubbleAttribute("benchmark")`.
   - `bubbleAttribute` creates a
     `CustomEvent('split-chart-selection', { detail: { attribute: "benchmark" } })`.
   - The event is dispatched.
```
User split-chart-menu-sk (Parent Component)
| | |
|---Clicks "benchmark"->| |
| |---Calls bubbleAttribute("benchmark")-->|
| | |
| |---Dispatches "split-chart-selection" event--> (Listens for event)
| | | |
| | | |---Handles event, updates chart
```
4. **Menu Closes:**
- The `<md-menu>` component emits a `closed` event.
- The `menuClosed` handler is invoked -> `this.menuOpen` becomes `false`.
This design ensures that `split-chart-menu-sk` is a self-contained, reusable UI
component whose sole responsibility is to provide a way to select a splitting
attribute and communicate that selection to the rest of the application via a
well-defined event. The use of context for data consumption and custom events
for output makes it highly decoupled and easy to integrate.
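
For example, a parent component might listen for the selection like this; the event name and detail shape follow the description above:

```ts
// The detail interface mirrors SplitChartSelectionEventDetails as described.
interface SplitChartSelectionEventDetails {
  attribute: string;
}

document
  .querySelector('split-chart-menu-sk')
  ?.addEventListener('split-chart-selection', (e: Event) => {
    const { attribute } =
      (e as CustomEvent<SplitChartSelectionEventDetails>).detail;
    // e.g. re-render the chart grouped by the chosen attribute.
    console.log('split chart by', attribute);
  });
```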
The demo page (`split-chart-menu-sk-demo.html` and
`split-chart-menu-sk-demo.ts`) demonstrates how to use the component and listen
for the `split-chart-selection` event. The Puppeteer test
(`split-chart-menu-sk_puppeteer_test.ts`) provides a basic smoke test and a
visual regression test by taking a screenshot.
# Module: /modules/subscription-table-sk
The `subscription-table-sk` module provides a custom HTML element designed to
display information about a "subscription" and its associated "alerts". This is
particularly useful in contexts where users need to understand the configuration
of automated monitoring or alerting systems.
The core functionality is encapsulated within the `subscription-table-sk.ts`
file, which defines the `SubscriptionTableSk` custom element. This element is
built using Lit, a library for creating fast, lightweight web components.
**Why and How:**
The primary goal is to present complex subscription and alert data in a
user-friendly and interactive manner. Instead of a static display, this
component allows for toggling the visibility of the detailed alert
configurations. This design choice avoids overwhelming the user with too much
information upfront, providing a cleaner initial view focused on the
subscription summary.
The `SubscriptionTableSk` element takes `Subscription` and `Alert[]` objects as
input. The `Subscription` object contains general information like name, contact
email, revision, bug tracking details (component, hotlists, priority, severity,
CC emails). The `Alert[]` array holds detailed configurations for individual
alerts, including their query parameters, step algorithm, radius, and other
specific settings.
**Key Responsibilities and Components:**
- **`subscription-table-sk.ts`**:
- **`SubscriptionTableSk` class**: This is the heart of the module. It
extends `ElementSk`, a base class for Skia custom elements.
- **Data Handling**: It stores the `subscription` and `alerts` data
internally.
- **Rendering Logic (`template` static method)**: It uses Lit's `html`
tagged template literal to define the structure and content of the
element. It conditionally renders the subscription details and the
alerts table based on the available data and the `showAlerts` state.
- Subscription details are always visible if a subscription is loaded.
- The alerts table is only rendered if `showAlerts` is true. This
state is toggled by a button.
- **`load(subscription: Subscription, alerts: Alert[])` method**: This
public method is the primary way to feed data into the component. It
updates the internal state and triggers a re-render.
- **`toggleAlerts()` method**: This method flips the `showAlerts` boolean
flag and triggers a re-render, effectively showing or hiding the alerts
table.
- **`formatRevision(revision: string)` method**: A helper function to
display the revision string as a clickable link, pointing to a specific
configuration file URL. This improves usability by allowing users to
quickly navigate to the source of the configuration.
- **`paramset-sk` integration**: For displaying the alert `query`, it
utilizes the `paramset-sk` element. The `toParamSet` utility function
(from `infra-sk/modules/query`) is used to convert the query string into
a format suitable for `paramset-sk`, which then renders it as a
structured set of key-value pairs. This enhances readability of complex
query strings.
- **Styling (`subscription-table-sk.scss`)**: This file defines the visual
appearance of the element. It uses SCSS and imports styles from shared
libraries (`themes_sass_lib`, `buttons_sass_lib`, `select_sass_lib`) to
maintain a consistent look and feel with other Skia elements. The styles
focus on clear presentation of information, with distinct sections for
subscription details and the alerts table.
**Workflow: Displaying Subscription and Alerts**
1. **Initialization**: An instance of `subscription-table-sk` is added to the
DOM. `<subscription-table-sk></subscription-table-sk>`
2. **Data Loading**: External code (e.g., in a demo page or a larger
application) calls the `load()` method on the element instance, passing in
the `Subscription` object and an array of `Alert` objects.
`element.load(mySubscriptionData, myAlertsData);`
3. **Initial Render**:
- The `SubscriptionTableSk` element updates its internal `subscription`
and `alerts` properties.
- `showAlerts` is set to `false` by default upon loading new data.
- The `_render()` method is called (implicitly by Lit or explicitly).
- The `template` function generates the HTML:
- Subscription details (name, email, revision, etc.) are displayed.
- A button labeled "Show [N] Alert Configurations" is displayed.
- The alerts table is _not_ rendered yet.
4. **User Interaction (Toggling Alerts)**:
   - The user clicks the "Show [N] Alert Configurations" button.
   - The `click` event triggers the `toggleAlerts()` method.
   - `showAlerts` becomes `true`.
   - `_render()` is called again.
   - The `template` function now also renders the `<table id="alerts-table">`:
     - The table header is displayed.
     - For each `Alert` object in `ele.alerts`:
       - A table row (`<tr>`) is created.
       - Cells (`<td>`) display various alert properties (step algorithm,
         radius, k, etc.).
       - The alert `query` is passed to a `<paramset-sk>` element for
         structured display.
   - The button label changes to "Hide Alert Configurations".
5. **Further Toggling**: Clicking the button again will hide the table, and the
label will revert.
**Diagram: Data Flow and Rendering**
```
External Code ---> subscriptionTableSkElement.load(subscription, alerts)
|
V
SubscriptionTableSk Internal State:
- this.subscription = subscription
- this.alerts = alerts
- this.showAlerts = false (initially or after load)
|
V
_render() ------> Lit Template Evaluation
|
-------------------------------------
| |
V (if this.subscription is not null) V (if this.showAlerts is true)
Render Subscription Details Render Alerts Table
- Name, Email, Revision (formatted link) - Iterate through this.alerts
- Bug info, Hotlists, CCs - For each alert:
- "Show/Hide Alerts" Button - Display properties in <td>
- Use <paramset-sk> for query
```
**Demo Page (`subscription-table-sk-demo.html`,
`subscription-table-sk-demo.ts`)**
The demo page serves as an example and testing ground.
- `subscription-table-sk-demo.html`: Sets up the basic HTML structure,
including instances of `subscription-table-sk` (one for light mode, one for
dark mode to test theming) and buttons to interact with them. It also
includes an `error-toast-sk` for displaying potential errors.
- `subscription-table-sk-demo.ts`: Contains JavaScript to:
- Import and register the `subscription-table-sk` element.
- Define sample `Subscription` and `Alert` data.
- Add event listeners to the "Populate Tables" button, which calls the
`load()` method on the `subscription-table-sk` instances with the sample
data.
- Add event listeners to the "Toggle Alerts Table" button, which calls the
`toggleAlerts()` method on the instances.
This setup allows developers to see the component in action and verify its
functionality with predefined data.
# Module: /modules/test-picker-sk
The `test-picker-sk` module provides a custom HTML element, `<test-picker-sk>`,
designed to guide users in selecting a valid trace or test for plotting. It
achieves this by presenting a series of dependent input fields, where the
options available in each field are dynamically updated based on selections made
in previous fields. This ensures that users can only construct valid
combinations of parameters.
**Core Functionality and Design:**
The primary goal of `test-picker-sk` is to simplify the process of selecting a
specific data series (a "trace" or "test") from a potentially large and complex
dataset. This is often necessary in performance analysis tools where data is
categorized by multiple parameters (e.g., benchmark, bot, specific test,
sub-test variations).
The design enforces a specific order for filling out these parameters. This
hierarchical approach is crucial because the valid options for a parameter often
depend on the values chosen for its preceding parameters.
**Key Components and Responsibilities:**
- **`test-picker-sk.ts`**: This is the heart of the module, defining the
`TestPickerSk` custom element.
- **`FieldInfo` class**: This internal class is a simple data structure
used to manage the state of each individual input field within the
picker. It stores a reference to the `PickerFieldSk` element, the
parameter name (e.g., "benchmark", "bot"), and the currently selected
value.
- **Dynamic Field Generation (`addChildField`)**: When a value is selected
in a field, and if there are more parameters in the hierarchy, a new
`PickerFieldSk` input is dynamically added to the UI. The options for
this new field are fetched from the backend. This progressive disclosure
prevents overwhelming the user with too many options at once.
- **Backend Communication (`callNextParamList`)**: The element interacts
with a backend endpoint (`/_/nextParamList/`). This endpoint is
responsible for:
* Providing the list of valid options for the _next_ input field based on
the current selections.
* Returning a count of how many unique traces/tests match the current
partial or complete selection.
- **State Management (`_fieldData`, `_currentIndex`)**: The `_fieldData`
array holds `FieldInfo` objects for each parameter field.
`_currentIndex` tracks which field is currently active or the next to be
added.
- **Event Handling (`value-changed`, `plot-button-clicked`)**:
- It listens for `value-changed` events from its child `picker-field-sk`
elements. When a value changes, it triggers logic to update subsequent
fields and the match count.
- It emits a `plot-button-clicked` custom event when the user clicks the
  "Add Graph" button. This event includes the fully constructed query
  string representing the selected trace (see the wiring sketch after this
  list).
- **Query Population (`populateFieldDataFromQuery`)**: This method allows
the picker to be initialized with a pre-existing query string. It will
populate the fields sequentially based on the query parameters. If a
parameter in the hierarchy is missing or empty in the query, the
population stops at that point.
- **Plotting Logic (`onPlotButtonClick`, `PLOT_MAXIMUM`)**: The "Add
Graph" button is enabled only when the number of matching traces is
within a manageable range (greater than 0 and less than or equal to
`PLOT_MAXIMUM`). This prevents users from attempting to plot an
overwhelming number of traces.
- **Rendering and UI Updates**: The component uses the Lit library for
templating and re-renders itself when its internal state changes (e.g.,
new fields added, count updated, request in progress). It also manages
the enabled/disabled state of input fields during backend requests.
- **`picker-field-sk` (Dependency)**: While not part of this module,
`test-picker-sk` heavily relies on the `picker-field-sk` element. Each
parameter in the test picker is represented by an instance of
`picker-field-sk`. This child component is responsible for displaying a
label, an input field, and a dropdown menu of selectable options.
- **`test-picker-sk.scss`**: Defines the visual styling for the
`test-picker-sk` element and its internal components, ensuring a consistent
look and feel. It styles the layout of the fields, the match count display,
and the plot button.
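The wiring from a host page looks roughly like the following sketch. The
element is assumed to be registered already; the `initializeTestPicker`
signature and the `query` field on the event detail follow the descriptions
above and should be treated as illustrative rather than authoritative.

```ts
// Sketch: configuring test-picker-sk and reacting to its plot request.
interface TestPickerLike extends HTMLElement {
  initializeTestPicker(params: string[], defaults: { [key: string]: string }): void;
}

const picker = document.querySelector<TestPickerLike>('test-picker-sk')!;

// Define the parameter hierarchy; each selection unlocks the next field via
// POST requests to /_/nextParamList/.
picker.initializeTestPicker(['benchmark', 'bot', 'test'], {});

// When the user clicks "Add Graph", the element emits the final query string.
picker.addEventListener('plot-button-clicked', (e: Event) => {
  const query = (e as CustomEvent<{ query: string }>).detail.query;
  // e.g. "benchmark=b1&bot=botX&test=testZ" - hand off to the plotting code.
  console.log('plot requested for', query);
});
```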
**Workflow: User Selecting a Test**
1. **Initialization (`initializeTestPicker`)**:
- `test-picker-sk` is given an ordered list of parameter names (e.g.,
`['benchmark', 'bot', 'test']`) and optional default parameters.
- `test-picker-sk` -> Backend (`/_/nextParamList/`): Requests options for
the _first_ parameter (e.g., "benchmark") with an empty query.
```
User Interface: Backend:
[test-picker-sk]
|
initializeTestPicker(['benchmark', 'bot', 'test'], {})
|
---> POST /_/nextParamList/ (q="")
|
(Processes request, queries data source)
|
<--- {paramset: {benchmark: ["b1", "b2"]}, count: 100}
|
(Renders first PickerFieldSk for "benchmark" with options "b1", "b2")
[Benchmark: [select ▼]] [Matches: 100] [Add Graph (disabled)]
```
2. **User Selects a Value**:
- The user selects "b1" for "benchmark".
- The `picker-field-sk` for "benchmark" emits a `value-changed` event.
- `test-picker-sk` -> Backend: Requests options for the _next_ parameter
("bot"), now including the selection `benchmark=b1` in the query.
```
User Interface:
[Benchmark: [b1 ▼]]
| (value-changed: {value: "b1"})
[test-picker-sk]
|
---> POST /_/nextParamList/ (q="benchmark=b1")
|
(Processes request, filters based on benchmark=b1)
|
<--- {paramset: {bot: ["botX", "botY"]}, count: 20}
|
(Renders PickerFieldSk for "bot" with options "botX", "botY")
[Benchmark: [b1 ▼]] [Bot: [select ▼]] [Matches: 20] [Add Graph (disabled)]
```
3. **Process Repeats**: This continues for each parameter in the hierarchy.
4. **Final Selection and Plotting**:
- Once all necessary parameters are selected (or the user chooses to
stop), the match count reflects the number of specific traces.
- If the count is within the `PLOT_MAXIMUM`, the "Add Graph" button
enables.
- User clicks "Add Graph".
- `test-picker-sk` emits `plot-button-clicked` with the final query (e.g.,
`benchmark=b1&bot=botX&test=testZ`).
```
User Interface:
[Benchmark: [b1 ▼]] [Bot: [botX ▼]] [Test: [testZ ▼]] [Matches: 5] [Add Graph (enabled)]
| (User clicks "Add Graph")
[test-picker-sk]
|
emits 'plot-button-clicked' (detail: {query: "benchmark=b1&bot=botX&test=testZ"})
```
**Why this Approach?**
- **Guided Selection**: Prevents users from creating invalid or non-existent
trace combinations.
- **Performance**: By fetching options incrementally, the backend doesn't need
to return massive lists of all possible values for all parameters at once.
Queries to the backend are progressively filtered.
- **User Experience**: The interface is less cluttered as fields appear only
when needed. The match count provides immediate feedback on the specificity
of the selection.
The `test-picker-sk-demo.html` and `test-picker-sk-demo.ts` files provide a
runnable example of the component, mocking the backend `/_/nextParamList/`
endpoint to showcase its functionality without needing a live backend. This is
essential for development and testing. The Puppeteer and Karma tests
(`test-picker-sk_puppeteer_test.ts`, `test-picker-sk_test.ts`) ensure the
component behaves as expected under various conditions.
# Module: /modules/themes
The `/modules/themes` module is responsible for defining the visual styling and
theming for the application. It builds upon the base theming provided by
`infra-sk` and introduces application-specific overrides and additions.
**Why and How:**
The primary goal of this module is to establish a consistent and branded look
and feel across the application. Instead of defining all styles from scratch, it
leverages the `infra-sk` theming library as a foundation. This promotes code
reuse and ensures that common UI elements have a familiar appearance.
The approach taken is to:
1. **Import Base Styles:** The `themes.scss` file begins by importing the core
styles from `../../../infra-sk/themes`. This brings in the foundational
design system, including color palettes, typography, spacing, and component
styles.
2. **Import External Resources:** It also imports the Material Icons font
library directly from Google Fonts
(`https://fonts.googleapis.com/icon?family=Material+Icons`). This makes a
wide range of standard icons readily available for use within the
application's UI.
3. **Define Application-Specific Overrides and Additions:** The core principle
   is to define only the _deltas_ from the base `infra-sk` theme, plus any
   global overrides of `elements-sk` components. This means that `themes.scss`
   focuses on styling aspects that are unique to this specific application or
   that require modifications to the default `infra-sk` appearance.
**Key Components and Files:**
- **`themes.scss`**: This is the central SCSS (Sassy CSS) file for the module.
- **Responsibility:** It orchestrates the application's theme by importing
base styles, external resources, and defining application-specific
styling rules.
- **Implementation Details:**
- `@import '../../../infra-sk/themes';`: This line incorporates the
foundational theme from the `infra-sk` library. The relative path
indicates that `infra-sk` is expected to be a sibling or ancestor
directory in the project structure.
- `@import
url('https://fonts.googleapis.com/icon?family=Material+Icons');`: This
directive pulls in the Material Icons font stylesheet, enabling the use
of standard Google Material Design icons throughout the application.
- `body { margin: 0; padding: 0; }`: This is an example of a global
override. It resets the default browser margins and padding on the
`<body>` element, providing a cleaner baseline for layout. This is a
common practice to ensure consistent spacing across different browsers.
Other application-specific styles would follow this pattern, targeting
specific elements or defining new CSS classes.
- **`BUILD.bazel`**: This file defines how the `themes.scss` file is processed
and made available to the rest of the application.
- **Responsibility:** It uses the `sass_library` rule (defined in
`//infra-sk:index.bzl`) to compile the SCSS into CSS and declare it as a
reusable library.
- **Implementation Details:**
- `load("//infra-sk:index.bzl", "sass_library")`: Imports the necessary
Bazel rule for handling SASS compilation.
- `sass_library(name = "themes_sass_lib", ...)`: Defines a SASS library
target named `themes_sass_lib`.
- `srcs = ["themes.scss"]`: Specifies that `themes.scss` is the source
file for this library.
- `visibility = ["//visibility:public"]`: Makes this compiled CSS
library accessible to any other part of the project.
- `deps = ["//infra-sk:themes_sass_lib"]`: Declares a dependency on
the `infra-sk` SASS library. This is crucial because `themes.scss`
imports styles from `infra-sk`. The build system needs to know about
this dependency to ensure `infra-sk` styles are available during the
compilation of `themes.scss`.
**Workflow (Styling Application):**
```
Browser Request --> HTML Document
|
v
Link to Compiled CSS (from themes_sass_lib)
|
v
Application of Styles:
1. Base browser styles
2. infra-sk/themes.scss styles (imported)
3. Material Icons styles (imported)
4. modules/themes/themes.scss overrides & additions (applied last, taking precedence)
|
v
Rendered Page with Application-Specific Theme
```
In essence, this module provides a layered approach to theming. It starts with a
robust base, incorporates external resources like icon fonts, and then applies
specific customizations to achieve the desired visual identity for the
application. The `BUILD.bazel` file ensures that these SASS files are correctly
processed and made available as CSS to the application during the build process.
# Module: /modules/trace-details-formatter
This module provides a mechanism for formatting trace details and converting
trace strings into query strings. The core idea is to offer a flexible way to
represent and interpret trace information, accommodating different formatting
conventions, particularly for Chrome-specific trace structures.
The "why" behind this module stems from the need to handle various trace
formats. Different systems or parts of the application might represent trace
identifiers (which are essentially a collection of parameters) in distinct ways.
This module centralizes the logic for translating between these representations.
For example, a compact string representation of a trace might be used in URLs or
displays, while a more structured `ParamSet` is needed for querying data.
The "how" is achieved through an interface `TraceFormatter` and concrete
implementations. This allows for different formatting strategies to be plugged
in as needed. The `GetTraceFormatter()` function acts as a factory, returning
the appropriate formatter based on the application's configuration
(`window.perf.trace_format`).
**Key Components/Files:**
- **`traceformatter.ts`**: This is the central file containing the core logic.
- **`TraceFormatter` interface**: Defines the contract for all trace
formatters. It mandates two primary methods:
- `formatTrace(params: Params): string`: Takes a `Params` object (a
key-value map representing trace parameters) and returns a string
representation of the trace. This is useful for displaying trace
identifiers in a user-friendly or system-specific format.
- `formatQuery(trace: string): string`: Takes a string representation of a
trace and converts it into a query string (e.g.,
"key1=value1&key2=value2"). This is crucial for constructing API
requests to fetch data related to a specific trace.
- **`DefaultTraceFormatter` class**: Provides a basic implementation of
`TraceFormatter`.
- Its `formatTrace` method generates a string like "Trace ID:
,key1=value1,key2=value2,...". This is a generic way to represent the
trace parameters.
- Its `formatQuery` method currently returns an empty string, indicating
that this default formatter doesn't have a specific logic for converting
its trace string representation back into a query.
- **`ChromeTraceFormatter` class**: Implements `TraceFormatter`
specifically for traces originating from Chrome's performance
infrastructure.
- **Why `ChromeTraceFormatter`?** Chrome's performance data often uses a
hierarchical, slash-separated string to identify traces (e.g.,
`master/bot/benchmark/test/subtest_1`). This formatter handles this
specific convention.
- **`keys` array**: This private property (`['master', 'bot', 'benchmark',
'test', 'subtest_1', 'subtest_2', 'subtest_3']`) defines the expected
order of parameters in the Chrome-style trace string. This order is
significant for both formatting and parsing.
- **`formatTrace(params: Params): string`**: It iterates through the
  predefined `keys` and constructs a slash-separated string from the
  corresponding values in the input `params`. For example, the input params
  `{ master: "m", bot: "b", benchmark: "bm", test: "t" }` combined with the
  keys `["master", "bot", "benchmark", "test", ...]` produce the output
  string `"m/b/bm/t"`.
- **`formatQuery(trace: string): string`**: This is the inverse operation.
  It takes a slash-separated trace string, splits it, and maps the parts
  back to the predefined `keys` to build a `ParamSet`. It then converts
  this `ParamSet` into a standard URL query string.
  - **Handling Statistics (ad-hoc logic for the Chromeperf/Skia bridge)**:
    A special piece of logic exists within `formatQuery` related to
    `window.perf.enable_skia_bridge_aggregation`. If a trace's 'test'
    value ends with a known statistic suffix (e.g., `_avg`, `_count`),
    that suffix determines the `stat` parameter in the output query, and
    the suffix is removed from the 'test' parameter. If no such suffix is
    found, a default `stat` value of 'value' is added. This logic is a
    temporary measure to bridge formatting differences between Chromeperf
    and Skia and is intended to be removed once Chromeperf is deprecated.
    For example, with `enable_skia_bridge_aggregation = true`, the input
    trace string `"master/bot/benchmark/test_name_max/subtest"` splits into
    `["master", "bot", "benchmark", "test_name_max", "subtest"]` and is
    processed into the ParamSet `{ master: ["master"], bot: ["bot"],
    benchmark: ["benchmark"], test: ["test_name"], stat: ["max"],
    subtest_1: ["subtest"] }`, yielding the output query
    `"master=master&bot=bot&benchmark=benchmark&test=test_name&stat=max&subtest_1=subtest"`.
- **`STATISTIC_SUFFIX_TO_VALUE_MAP`**: A map used by
`ChromeTraceFormatter` to translate common statistic suffixes (like
"avg", "count") found in test names to their corresponding "stat"
parameter values (like "value", "count").
- **`traceFormatterRecords`**: A record (object map) that associates
`TraceFormat` enum values (like `''` for default, `'chrome'` for
Chrome-specific) with their corresponding `TraceFormatter` instances.
This acts as a registry for available formatters.
- **`GetTraceFormatter()` function**: This is the public entry point for
obtaining a trace formatter. It reads `window.perf.trace_format` (a
global configuration setting) and returns the appropriate formatter
instance from `traceFormatterRecords`. If the format is not found, it
defaults to `DefaultTraceFormatter`.
```
Global Config: window.perf.trace_format = "chrome"
|
v
GetTraceFormatter()
|
v
traceFormatterRecords["chrome"]
|
v
Returns new ChromeTraceFormatter() instance
```
- **`traceformatter_test.ts`**: Contains unit tests for the
`ChromeTraceFormatter`, specifically focusing on the `formatQuery` method
and its logic for handling statistic suffixes under different configurations
of `window.perf.enable_skia_bridge_aggregation`.
This module depends on:
- `infra-sk/modules:query_ts_lib`: For the `fromParamSet` function, used to
convert a `ParamSet` object into a URL query string.
- `perf/modules/json:index_ts_lib`: For type definitions like `Params`,
`ParamSet`, and `TraceFormat`.
- `perf/modules/paramtools:index_ts_lib`: For the `makeKey` function, used by
`DefaultTraceFormatter` to create a string representation of a `Params`
object.
- `perf/modules/window:window_ts_lib`: To access global configuration values
like `window.perf.trace_format` and
`window.perf.enable_skia_bridge_aggregation`.
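To make the two directions of conversion concrete, here is a small
self-contained sketch of a Chrome-style formatter. It mirrors the behavior
described above (without the statistic-suffix handling) but is not the module's
actual implementation; the names and the simplified query building are
assumptions.

```ts
// Sketch of Chrome-style trace formatting: Params -> "m/b/bm/t" and back to a
// URL query string. Statistic-suffix handling is omitted for brevity.
type Params = { [key: string]: string };

const KEYS = ['master', 'bot', 'benchmark', 'test', 'subtest_1', 'subtest_2', 'subtest_3'];

function formatTrace(params: Params): string {
  // Walk the fixed key order and keep only the keys present in params.
  return KEYS.filter((k) => params[k] !== undefined)
    .map((k) => params[k])
    .join('/');
}

function formatQuery(trace: string): string {
  // Map the slash-separated parts back onto the fixed key order.
  const query = new URLSearchParams();
  trace.split('/').forEach((value, i) => {
    if (i < KEYS.length && value !== '') {
      query.set(KEYS[i], value);
    }
  });
  return query.toString();
}

// formatTrace({ master: 'm', bot: 'b', benchmark: 'bm', test: 't' }) === 'm/b/bm/t'
// formatQuery('m/b/bm/t') === 'master=m&bot=b&benchmark=bm&test=t'
```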
# Module: /modules/triage-menu-sk
The `triage-menu-sk` module provides a user interface element for managing and
triaging anomalies in bulk. It's designed to streamline the process of handling
multiple performance regressions or improvements detected in data.
The core purpose of this module is to allow users to efficiently take action on
a set of selected anomalies. Instead of interacting with each anomaly
individually, this menu provides centralized controls for common triage
operations. This is crucial for workflows where many anomalies might be
identified simultaneously, requiring a quick and consistent way to categorize or
address them.
Key responsibilities and components:
- **`triage-menu-sk.ts`**: This is the heart of the module, defining the
`TriageMenuSk` custom element.
- **Anomaly Aggregation**: It receives a list of `Anomaly` objects and
associated `trace_names`. This allows it to operate on multiple
anomalies at once.
- **Action Buttons**: It renders buttons for common triage actions:
- **"New Bug"**: Triggers the `new-bug-dialog-sk` element, allowing the
user to create a new bug report associated with the selected anomalies.
- **"Existing Bug"**: Triggers the `existing-bug-dialog-sk` element,
enabling the user to link the selected anomalies to an already existing
bug.
- **"Ignore"**: Marks the selected anomalies as "Ignored". This is useful
for anomalies that are deemed not actionable or are false positives.
- **Nudging Functionality**:
- The `NudgeEntry` class and related logic (`generateNudgeButtons`,
`nudgeAnomaly`, `makeNudgeRequest`) allow users to adjust the perceived
start and end points of an anomaly. This is a subtle but important
feature for refining the automated anomaly detection. The UI presents a
set of buttons (e.g., -2, -1, 0, +1, +2) that shift the anomaly's
boundaries.
- The `_allowNudge` flag controls whether the nudge buttons are visible,
allowing for contexts where nudging might not be appropriate (e.g., when
multiple, disparate anomalies are selected).
- **State Management**: It maintains the state of the selected anomalies
(`_anomalies`, `_trace_names`) and the nudge options (`_nudgeList`).
- **Communication with Backend**: The `makeEditAnomalyRequest` and
`makeNudgeRequest` methods handle sending HTTP POST requests to the
`/_/triage/edit_anomalies` endpoint. This endpoint is responsible for
persisting the triage decisions (bug associations, ignore status, nudge
adjustments) in the backend database.
- The `editAction` parameter in `makeEditAnomalyRequest` can take values
like `IGNORE`, `RESET` (to de-associate bugs), or implicitly associate
with a bug ID when called from the bug dialogs.
- **Event Emission**: It emits an `anomaly-changed` custom event. This
event signals to parent components (likely a component displaying a list
or plot of anomalies) that one or more anomalies have been modified and
their representation needs to be updated. The event detail includes the
affected `traceNames`, the `editAction` performed, and the updated
`anomalies`.
- **Integration with Dialogs**:
- It directly embeds and interacts with `new-bug-dialog-sk` and
`existing-bug-dialog-sk`. When the user clicks "New Bug" or "Existing
Bug", this element calls the respective `open()` methods on these dialog
components.
- It passes the currently selected anomalies and trace names to these
dialogs using their `setAnomalies` methods, so the dialogs know which
anomalies the bug report will be associated with.
- **`triage-menu-sk.html` (Implicit via Lit template in `.ts`)**: Defines the
visual structure of the menu, including the layout of the action buttons and
the nudge buttons. The rendering is dynamic based on the number of selected
anomalies and whether nudging is allowed.
- **`triage-menu-sk.scss`**: Provides the styling for the menu, ensuring it
integrates visually with the surrounding application.
**Key Workflow Example (Ignoring Anomalies):**
1. **User Selects Anomalies**: In a parent component (e.g., a plot or a list),
the user selects one or more anomalies.
2. **`triage-menu-sk` Receives Data**: The parent component calls
`triageMenuSkElement.setAnomalies(selectedAnomalies,
correspondingTraceNames, nudgeOptions)`.
3. **Menu Updates**: `triage-menu-sk` re-renders, enabling the "Ignore" button
   (and potentially others). The flow is: user selects anomalies -> parent
   component -> `triage-menu-sk.setAnomalies()` -> the UI re-renders with the
   buttons enabled.
4. **User Clicks "Ignore"**: The click invokes `ignoreAnomaly()`, which calls
   `makeEditAnomalyRequest(anomalies, traces, "IGNORE")` and sends a POST to
   `/_/triage/edit_anomalies`. When the backend responds with HTTP 200 OK,
   the element dispatches the `anomaly-changed` event.
5. **Backend Interaction**: `makeEditAnomalyRequest` is called. It constructs a
JSON payload with the anomaly keys, trace names, and the action "IGNORE".
This payload is sent to `/_/triage/edit_anomalies`.
6. **Event Notification**: Upon a successful response from the backend,
`triage-menu-sk` updates the local state of the anomalies (setting `bug_id`
to -2 for ignored anomalies) and dispatches the `anomaly-changed` event.
7. **Parent Component Reacts**: The parent component listens for
`anomaly-changed` and updates its display to reflect that the anomalies are
now ignored (e.g., by changing their color, removing them from an active
list).
The design decision to have `triage-menu-sk` orchestrate calls to the backend
and then emit a generic `anomaly-changed` event decouples it from the specifics
of how anomalies are displayed. Parent components only need to know that
anomalies have changed and can react accordingly. The use of dedicated dialog
components (`new-bug-dialog-sk`, `existing-bug-dialog-sk`) encapsulates the
complexity of bug reporting, keeping the triage menu itself focused on
initiating these actions.
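A minimal sketch of how a parent component might drive `triage-menu-sk`,
assuming the element is registered and using simplified stand-ins for the
anomaly objects; the `setAnomalies` argument order and the `anomaly-changed`
detail fields follow the descriptions above.

```ts
// Sketch: a parent hands selected anomalies to triage-menu-sk and refreshes
// its own display when the menu reports a change.
interface TriageMenuLike extends HTMLElement {
  setAnomalies(anomalies: unknown[], traceNames: string[], nudgeOptions: unknown[]): void;
}

const menu = document.querySelector<TriageMenuLike>('triage-menu-sk')!;

// Selected by the user in a plot or list component (illustrative shapes only).
const selectedAnomalies = [{ id: 123, bug_id: 0 }];
const traceNames = [',benchmark=Speedometer2,bot=linux-perf,test=Total,'];

menu.setAnomalies(selectedAnomalies, traceNames, []);

// After the POST to /_/triage/edit_anomalies succeeds, the menu dispatches
// 'anomaly-changed' so the parent can update its rendering.
menu.addEventListener('anomaly-changed', (e: Event) => {
  const detail = (e as CustomEvent<{ traceNames: string[]; editAction: string }>).detail;
  console.log('anomalies updated', detail.traceNames, detail.editAction);
});
```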
# Module: /modules/triage-page-sk
## Triage Page (`triage-page-sk`)
The `triage-page-sk` module provides the user interface for viewing and triaging
regressions in performance data. It allows users to filter regressions based on
time range, commit status (all, regressions, untriaged), and alert
configurations. The primary goal is to present a clear overview of regressions
and facilitate the process of identifying their cause and impact.
### Responsibilities and Key Components
The module is responsible for:
- **Fetching and Displaying Regression Data:** It communicates with a backend
endpoint (`/_/reg/`) to retrieve regression information for a specified time
range and filter criteria. This data is then rendered in a tabular format,
showing commits along with any associated regressions.
- **State Management:** The component's state (selected time range, filters)
is reflected in the URL. This allows users to bookmark specific views or
share links to particular triage scenarios. The `stateReflector` utility
from `infra-sk/modules/statereflector` is used for this purpose.
- **User Interaction for Filtering:** It provides UI elements (select
dropdowns, date range pickers) for users to define what data they want to
see. Changes to these filters trigger new data fetches.
- **Triage Workflow:** When a user initiates a triage action on a specific
regression, a dialog (`<dialog>`) containing the `cluster-summary2-sk`
element is displayed. This dialog allows the user to view details of the
regression and assign a triage status (e.g., "positive", "negative",
"acknowledged").
- **Communicating Triage Decisions:** Once a triage status is submitted, the
module sends this information to a backend endpoint (`/_/triage/`) to
persist the decision.
- **Displaying Triage Status:** Each regression in the table is visually
represented by a `triage-status-sk` element, which shows its current triage
state.
### Key Files
- **`triage-page-sk.ts`**: This is the core TypeScript file defining the
`TriagePageSk` custom element.
- **Why**: It encapsulates all the logic for data fetching, rendering,
state management, and handling user interactions. It leverages Lit for
templating and rendering the UI.
- **How**:
- It defines a `State` interface to manage the component's configuration
(begin/end timestamps, subset filter, alert filter).
- The `connectedCallback` initializes the `stateReflector` to synchronize
the component's state with the URL.
- `updateRange()` is a crucial method that fetches regression data from
the `/_/reg/` endpoint whenever the state changes (e.g., date range or
filter selection). It uses the `fetch` API for network requests.
- The `template` function (using `lit/html`) defines the HTML structure of
the component, including the filter controls, the main table displaying
regressions, and the triage dialog.
- Event handlers like `commitsChange`, `filterChange`, `rangeChange`,
`triage_start`, and `triaged` manage user input and interactions with
child components.
- The `triage_start` method is triggered when a user wants to triage a
specific regression. It prepares the data for the `cluster-summary2-sk`
element and displays the triage dialog.
- The `triaged` method is called when the user submits a triage decision
from the `cluster-summary2-sk` dialog. It sends a POST request to
`/_/triage/` with the triage information.
- Helper methods like `stepUpAt`, `stepDownAt`, `alertAt`, etc., are used
to determine how to render cells in the regression table based on the
data received.
- `calc_all_filter_options` dynamically generates the list of available
alert filters based on categories returned from the backend.
- **`triage-page-sk.scss`**: Contains the SASS/CSS styles for the
`triage-page-sk` element.
- **Why**: To ensure the component has a consistent and appropriate visual
appearance within the application.
- **How**: It defines styles for the layout of the header, filter
sections, the regression table, and the triage dialog. It imports shared
styles for buttons, selects, and theming.
- **`triage-page-sk-demo.html` / `triage-page-sk-demo.ts`**: Provide a
demonstration page for the `triage-page-sk` element.
- **Why**: To allow developers to see the component in action and test its
basic functionality in isolation.
- **How**: The HTML file includes an instance of `<triage-page-sk>`. The
TypeScript file simply imports the main component to register it.
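The URL-state synchronization described for `triage-page-sk.ts` follows the
usual `stateReflector` pattern. The sketch below assumes the `stateReflector`
and `HintableObject` helpers from `infra-sk` with their common shapes and
relative import paths; treat it as an illustration of the pattern rather than
the component's literal code.

```ts
// Sketch of URL <-> state synchronization, as used by triage-page-sk.
import { stateReflector } from '../../../infra-sk/modules/statereflector';
import { HintableObject } from '../../../infra-sk/modules/hintable';

interface State {
  begin: number; // start of the selected time range (Unix seconds)
  end: number; // end of the selected time range
  subset: string; // 'all' | 'regressions' | 'untriaged'
  alert_filter: string; // which alert configurations to display
}

let state: State = { begin: 0, end: 0, subset: 'untriaged', alert_filter: 'ALL' };

// stateReflector(getState, setState) keeps `state` and the URL in sync and
// returns a callback to invoke whenever `state` changes locally.
const stateHasChanged = stateReflector(
  () => state as unknown as HintableObject,
  (newState) => {
    state = newState as unknown as State;
    // updateRange() would re-fetch /_/reg/ with the new state here.
  }
);

// After the user picks a different filter:
state.subset = 'regressions';
stateHasChanged(); // pushes the new state into the URL
```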
### Key Workflows
**1. Initial Page Load and Data Fetch:**
```
User navigates to page / URL with state parameters
|
V
triage-page-sk.connectedCallback()
|
V
stateReflector initializes state from URL (or defaults)
|
V
triage-page-sk.updateRange()
|
V
FETCH /_/reg/ with current state (begin, end, subset, alert_filter)
|
V
Backend responds with RegressionRangeResponse (header, table, categories)
|
V
triage-page-sk.reg is updated
|
V
triage-page-sk.calc_all_filter_options() (if categories present)
|
V
triage-page-sk._render() displays the regression table
```
**2. User Changes Filter or Date Range:**
```
User interacts with <select> (commits/filter) or <day-range-sk>
|
V
Event handler (e.g., commitsChange, filterChange, rangeChange) updates this.state
|
V
this.stateHasChanged() (triggers stateReflector to update URL)
|
V
triage-page-sk.updateRange()
|
V
FETCH /_/reg/ with new state
|
V
Backend responds with updated RegressionRangeResponse
|
V
triage-page-sk.reg is updated
|
V
triage-page-sk._render() re-renders the regression table with new data
```
**3. User Initiates Triage:**
```
User clicks on a regression in the table (within a <triage-status-sk> element)
|
V
<triage-status-sk> emits 'start-triage' event with details (alert, full_summary, cluster_type)
|
V
triage-page-sk.triage_start(event)
|
V
this.dialogState is populated with event.detail
|
V
this._render() (updates the <cluster-summary2-sk> properties within the dialog)
|
V
this.dialog.showModal() (displays the triage dialog)
```
**4. User Submits Triage:**
```
User interacts with <cluster-summary2-sk> in the dialog and clicks "Save" (or similar)
|
V
<cluster-summary2-sk> emits 'triaged' event with details (columnHeader, triage status)
|
V
triage-page-sk.triaged(event)
|
V
Constructs TriageRequest body (cid, triage, alert, cluster_type)
|
V
this.dialog.close()
|
V
this.triageInProgress = true; this._render() (shows spinner)
|
V
FETCH POST /_/triage/ with TriageRequest
|
V
Backend responds (e.g., with a bug link if applicable)
|
V
this.triageInProgress = false; this._render() (hides spinner)
|
V
(Optional) If json.bug exists, window.open(json.bug)
|
V
(Implicit) The <triage-status-sk> for the triaged item may update its display, or a full data refresh might be triggered if necessary to show the updated status.
```
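The triage submission in workflow 4 reduces to a single POST. Below is a
minimal sketch assuming a `TriageRequest`-shaped body with the fields listed
above (`cid`, `triage`, `alert`, `cluster_type`); the exact types are
simplified and the error handling is illustrative.

```ts
// Sketch of the POST issued when a triage decision is submitted. The real
// request/response types live in perf/modules/json.
async function submitTriage(body: {
  cid: unknown; // the commit id of the regression being triaged
  triage: { status: string; message: string };
  alert: unknown; // the Alert configuration that produced the regression
  cluster_type: 'high' | 'low';
}): Promise<void> {
  const resp = await fetch('/_/triage/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!resp.ok) {
    throw new Error(`triage failed: ${resp.status}`);
  }
  const json = await resp.json();
  // If the backend filed or linked a bug, it returns a link to open.
  if (json.bug) {
    window.open(json.bug);
  }
}
```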
### Design Decisions
- **State Reflection in URL:** The decision to reflect the component's state
(date range, filters) in the URL is crucial for shareability and
bookmarking. It allows users to return to a specific view of regressions or
share it with colleagues.
- **Component-Based Architecture:** The page is built using custom elements
(`triage-page-sk`, `commit-detail-sk`, `day-range-sk`, `triage-status-sk`,
`cluster-summary2-sk`). This promotes modularity, reusability, and
separation of concerns. Each component handles a specific piece of
functionality.
- **Asynchronous Operations:** Data fetching and triage submissions are
asynchronous operations handled using `fetch` and Promises. Spinners
(`spinner-sk`) are used to provide visual feedback to the user during these
operations.
- **Dedicated Triage Dialog:** Instead of inline editing, a modal dialog
(`<dialog>`) is used for the triage process. This provides a focused
interface for the user to review cluster details and make a triage decision
without cluttering the main regression table.
- **Dynamic Filter Options:** The "Which alerts to display" filter options are
dynamically populated based on the categories returned from the backend.
This ensures that the filter options are relevant to the current dataset.
- **Use of Lit for Templating:** Lit is used for its efficient rendering and
declarative templating, making it easier to manage the UI structure and
updates.
The `triage-page-sk` serves as the central hub for users to actively engage with
and manage performance regressions, making it a critical component in the
performance monitoring workflow.
# Module: /modules/triage-status-sk
The `triage-status-sk` module provides a custom HTML element designed to
visually represent and interact with the triage status of a "cluster" within the
Perf application. A cluster, in this context, likely refers to a group of
related performance measurements or anomalies that require user attention and
classification (triaging).
**Core Functionality & Design:**
The primary purpose of this element is to offer a concise and interactive way
for users to understand the current triage state of a cluster and to initiate
the triaging process.
1. **Visual Indication:** The element displays a button. The appearance of this
button (specifically, an icon within it) changes based on the cluster's
triage status: "positive," "negative," or "untriaged." This provides an
immediate visual cue to the user.
- **Why:** Direct visual feedback is crucial for quickly assessing the
state of many items in a list or dashboard. Instead of reading text,
users can rely on familiar icons.
- **How:** It leverages the `tricon2-sk` element to display the
appropriate icon based on the `triage.status` property. The styling for
these states is defined in `triage-status-sk.scss`, ensuring visual
consistency with the application's theme (including dark mode).
2. **Initiating Triage:** Clicking the button does not directly change the
triage status within this element. Instead, it emits a custom event named
`start-triage`.
- **Why:** This follows a common pattern in web components where
individual components are responsible for a specific piece of UI and
interaction, but delegate more complex actions or state management to
parent components or application-level logic. This keeps the
`triage-status-sk` element focused and reusable. The actual triaging
process likely involves a dialog or a more complex UI, which is beyond
the scope of this simple button.
- **How:** The `_start_triage` method is invoked on button click. This
method constructs a `detail` object containing all relevant information
about the cluster (`full_summary`, current `triage` status, `alert`
configuration, `cluster_type`, and a reference to the element itself)
and dispatches the `start-triage` `CustomEvent`.
**Key Components & Files:**
- **`triage-status-sk.ts`:** This is the heart of the module, defining the
`TriageStatusSk` custom element class which extends `ElementSk`.
- **Properties:** It manages several key pieces of data as properties:
- `triage`: An object of type `TriageStatus` (defined in
`perf/modules/json`) holding the `status` ('positive', 'negative',
'untriaged') and a `message` string. This is the primary driver for the
element's appearance.
- `full_summary`: Potentially detailed information about the cluster, of
type `FullSummary`.
- `alert`: Information about any alert configuration associated with the
cluster, of type `Alert`.
- `cluster_type`: A string ('high' or 'low'), likely indicating the
priority or type of the cluster.
- **Rendering:** It uses `lit-html` for templating
(`TriageStatusSk.template`). The template renders a `<button>`
containing a `tricon2-sk` element. The `class` of the button and the
`value` of the `tricon2-sk` are bound to `ele.triage.status`,
dynamically changing the appearance.
- **Event Dispatch:** The `_start_triage` method is responsible for
creating and dispatching the `start-triage` event.
- **`triage-status-sk.scss`:** Defines the visual styling for the
`triage-status-sk` element. It includes specific styles for the different
triage states (`.positive`, `.negative`, `.untriaged`) and their hover
states, ensuring they integrate with the application's themes (including
dark mode variables like `--positive`, `--negative`, `--surface`).
- **`index.ts`:** A simple entry point that imports and thereby registers the
`triage-status-sk` custom element, making it available for use in HTML.
- **`triage-status-sk-demo.html` & `triage-status-sk-demo.ts`:** These files
provide a demonstration page for the `triage-status-sk` element.
- The HTML sets up instances of the element in different theme contexts
(default and dark mode).
- The TypeScript file demonstrates how to listen for the `start-triage`
event and how to programmatically set the `triage` property of the
element. This is crucial for developers to understand how to integrate
and use the component.
- **`BUILD.bazel`:** Defines how the module is built and its dependencies. It
specifies `tricon2-sk` as a UI dependency and includes necessary SASS and
TypeScript libraries.
- **`triage-status-sk_puppeteer_test.ts`:** Contains Puppeteer-based tests to
ensure the element renders correctly and behaves as expected in a browser
environment. This is important for maintaining code quality and preventing
regressions.
**Workflow Example: User Initiates Triage**
```
User sees a triage-status-sk button (e.g., showing an 'untriaged' icon)
|
V
User clicks the button
|
V
[triage-status-sk.ts] _start_triage() method is called
|
V
[triage-status-sk.ts] Creates a 'detail' object with:
- triage: { status: 'untriaged', message: '...' }
- full_summary: { ... }
- alert: { ... }
- cluster_type: 'low' | 'high'
- element: (reference to itself)
|
V
[triage-status-sk.ts] Dispatches a 'start-triage' CustomEvent with the 'detail' object
|
V
[Parent Component/Application Logic] Listens for 'start-triage' event
|
V
[Parent Component/Application Logic] Receives event.detail
|
V
[Parent Component/Application Logic] Uses the received data to:
- Open a triage dialog
- Populate the dialog with cluster details
- Allow user to select a new triage status
- (Potentially) update the original triage-status-sk element's
'triage' property after the dialog interaction is complete.
```
This design allows `triage-status-sk` to be a focused, presentational component,
while the more complex logic of handling the triage process itself is managed
elsewhere in the application. This promotes separation of concerns and
reusability.
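A minimal sketch of the parent side of this workflow, assuming the
`start-triage` detail fields described above (`triage`, `full_summary`,
`alert`, `cluster_type`, `element`); the dialog handling and the chosen status
are illustrative.

```ts
// Sketch: a parent listens for 'start-triage', runs its own triage UI, and
// writes the user's decision back onto the originating element.
interface StartTriageDetail {
  triage: { status: 'positive' | 'negative' | 'untriaged'; message: string };
  full_summary: unknown;
  alert: unknown;
  cluster_type: 'high' | 'low';
  element: HTMLElement & { triage: StartTriageDetail['triage'] };
}

const statusEl = document.querySelector('triage-status-sk')!;

statusEl.addEventListener('start-triage', (e: Event) => {
  const detail = (e as CustomEvent<StartTriageDetail>).detail;

  // Open a triage dialog elsewhere in the app (not shown), then apply the
  // chosen status back to the button that started the flow.
  const chosen = { status: 'positive' as const, message: 'Expected improvement.' };
  detail.element.triage = chosen;
});
```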
# Module: /modules/triage2-sk
The `triage2-sk` module provides a custom HTML element for selecting a triage
status. This element is designed to be a simple, reusable UI component for
indicating whether a particular item is "positive", "negative", or "untriaged".
Its primary purpose is to offer a standardized way to represent and interact
with triage states across different parts of the Perf application.
The core of the module is the `triage2-sk` custom element, defined in
`triage2-sk.ts`. This element leverages the Lit library for templating and
rendering. It presents three buttons, each representing one of the triage
states:
- **Positive:** Indicated by a check circle icon (`<check-circle-icon-sk>`).
- **Negative:** Indicated by a cancel icon (`<cancel-icon-sk>`).
- **Untriaged:** Indicated by a help icon (`<help-icon-sk>`).
The "why" behind this design is to provide a clear visual representation of the
current triage status and an intuitive way for users to change it. By using
distinct icons and styling for each state, the element aims to reduce ambiguity.
**Key Implementation Details:**
- **`triage2-sk.ts`:** This is the main TypeScript file defining the
`TriageSk` class, which extends `ElementSk`.
- **State Management:** The current triage state is managed by the `value`
attribute (and corresponding property). It can be one of "positive",
"negative", or "untriaged". If no value is provided, it defaults to
"untriaged".
- **Event Emission:** When the user clicks a button to change the triage
state, the element dispatches a custom event named `change`. The
`detail` property of this event contains the new triage status as a
string (e.g., "positive"). This allows parent components to react to
changes in the triage status. For example, when the user clicks the
"Positive" button, `triage2-sk` sets its `value` attribute to "positive"
and then dispatches a `change` event with detail "positive".
- **Rendering:** The `template` static method uses Lit's `html` tagged
template literal to define the structure of the element. It dynamically
sets the `selected` attribute on the appropriate button based on the
current `value`.
- **Attribute Observation:** The element observes the `value` attribute.
When this attribute changes (either programmatically or through user
interaction), the `attributeChangedCallback` is triggered, which
re-renders the component and dispatches the `change` event.
- **Type Safety:** The `isStatus` function ensures that the `value`
property is always one of the allowed `Status` types, defaulting to
"untriaged" if an invalid value is encountered. This contributes to the
robustness of the component.
- **`triage2-sk.scss`:** This file contains the SASS styles for the
`triage2-sk` element.
- **Theming:** It defines styles for both a legacy color scheme and a
theme-based color scheme (including dark mode). This ensures the
component integrates visually with the rest of the application,
regardless of the active theme. The styling differentiates the selected
button and provides hover effects for better user feedback. The fill
colors of the icons change based on the triage state (e.g., green for
positive, red for negative).
- **`index.ts`:** This file serves as the entry point for the module,
exporting the `TriageSk` class and ensuring the custom element is defined.
- **Demo and Testing:**
- `triage2-sk-demo.html` and `triage2-sk-demo.ts`: Provide a simple
demonstration page showcasing the element in various states and how to
listen for the `change` event. This is useful for manual testing and
visual inspection.
- `triage2-sk_test.ts`: Contains Karma unit tests that verify the event
emission and value changes of the component.
- `triage2-sk_puppeteer_test.ts`: Includes Puppeteer-based end-to-end
tests that check the rendering of the component in a browser environment
and capture screenshots for visual regression testing.
The design choice of using custom elements and Lit allows for a modular and
maintainable component that can be easily integrated into larger applications.
The clear separation of concerns (logic in TypeScript, styling in SASS, and
structure in the template) follows common best practices for web component
development.
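A small usage sketch, assuming the element is already registered; the `value`
attribute and the string-valued `change` event detail are as described above.

```ts
// Sketch: driving triage2-sk and reacting to user selections.
const triage = document.querySelector('triage2-sk')!;

// Setting the attribute re-renders the buttons with the new selection.
triage.setAttribute('value', 'untriaged');

// Each button click updates `value` and fires a 'change' event whose detail
// is the new status string ('positive' | 'negative' | 'untriaged').
triage.addEventListener('change', (e: Event) => {
  const status = (e as CustomEvent<string>).detail;
  console.log('triage status is now', status);
});
```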
# Module: /modules/tricon2-sk
The `tricon2-sk` module provides a custom HTML element `<tricon2-sk>` designed
to visually represent triage states. This component is crucial for user
interfaces where quick identification of an item's status (e.g., in a bug
tracker, code review system, or monitoring dashboard) is necessary.
The core idea is to offer a standardized, reusable icon that clearly
communicates whether an item is "positive," "negative," or "untriaged." This
avoids inconsistencies and reduces cognitive load for users who frequently
interact with such systems.
**Key Components and Responsibilities:**
- **`tricon2-sk.ts`**: This is the heart of the module. It defines the
`TriconSk` class, which extends `ElementSk` (a base class for custom
elements in the Skia infrastructure).
- **Purpose:** To render one of three specific icons based on its `value`
attribute.
- **Implementation:**
- It utilizes the `lit-html` library for templating, allowing for
efficient rendering and updates.
- A `static template` function determines which icon to display
(`check-circle-icon-sk` for "positive", `cancel-icon-sk` for "negative",
and `help-icon-sk` for "untriaged" or any other value). This design
centralizes the icon selection logic.
- The `value` attribute is the primary interface for controlling the
displayed icon. Changes to this attribute trigger a re-render via
`attributeChangedCallback` and `_render()`.
- The `connectedCallback` ensures that the `value` property is properly
initialized if set before the element is attached to the DOM.
- **Dependencies:** It imports specific icon components
(`check-circle-icon-sk`, `cancel-icon-sk`, `help-icon-sk`) from the
`elements-sk` module, promoting modularity and reuse of existing icon
assets.
- **`tricon2-sk.scss`**: This file handles the styling of the `tricon2-sk`
element and its internal icons.
- **Purpose:** To define the colors of the icons based on their state and
to ensure they adapt correctly to different themes (e.g., light and dark
mode).
- **Implementation:**
- It uses SASS for more organized and maintainable styles.
- Crucially, it defines CSS variables (e.g., `--green`, `--red`,
`--brown`) for the icon fill colors. This allows themes (defined in
`themes.scss`) to override these colors easily.
- Specific styles are also provided for when the element is within a
`.body-sk` context and when `.darkmode` is applied to `.body-sk`. This
ensures the icons maintain appropriate contrast and visibility across
different UI themes. The fallback hardcoded colors (`#388e3c`, etc.)
provide a default styling if CSS variables are not defined by a theme.
- **`index.ts`**: This file serves as the main entry point for the module when
it's imported. Its sole responsibility is to import `tricon2-sk.ts`, which
in turn registers the `<tricon2-sk>` custom element. This is a common
pattern for organizing custom element definitions.
- **`tricon2-sk-demo.html` and `tricon2-sk-demo.ts`**: These files create a
demonstration page for the `<tricon2-sk>` element.
- **Purpose:** To showcase the different states of the `tricon2-sk`
element and how it appears in various theming contexts (default, with
`colors.css` theming, and with `themes.css` in both light and dark
modes). This is invaluable for development, testing, and documentation.
- **How it works:** The HTML file directly uses the `<tricon2-sk>` element
with different `value` attributes. The accompanying TypeScript file
simply imports the `index.ts` of the `tricon2-sk` module to ensure the
custom element is defined before the browser tries to render it.
- **`tricon2-sk_puppeteer_test.ts`**: This file contains automated UI tests
for the `tricon2-sk` element using Puppeteer.
- **Purpose:** To verify that the element renders correctly in different
states and to capture screenshots for visual regression testing.
- **How it works:** It loads the demo page (`tricon2-sk-demo.html`) in a
headless browser, checks if the expected number of `tricon2-sk` elements
are present (a basic smoke test), and then takes a screenshot of the
page. This ensures that changes to the component's appearance are caught
early.
**Workflow: Displaying a Triage Icon**
1. **Usage:** An application includes the `<tricon2-sk>` element in its HTML,
setting the `value` attribute:
```html
<tricon2-sk value="positive"></tricon2-sk>
```
2. **Element Initialization (`tricon2-sk.ts`):**
- The `TriconSk` class is instantiated.
- `connectedCallback` is called, ensuring the `value` property is
synchronized with the attribute.
- `_render()` is called.
3. **Template Selection (`tricon2-sk.ts`):**
- The `static template` function is invoked.
- Based on `this.value` (e.g., "positive"), it returns the corresponding Lit
  template, e.g. `` html`<check-circle-icon-sk></check-circle-icon-sk>` ``.
4. **Icon Rendering:**
- The selected icon component (e.g., `<check-circle-icon-sk>`) renders
itself.
5. **Styling (`tricon2-sk.scss`):**
- CSS rules are applied. For example, if the value is "positive", the rule
  `tricon2-sk { check-circle-icon-sk { fill: var(--green); } }` initially
  attempts to use the CSS variable.
- If themes are active (e.g., `.body-sk.darkmode`), more specific rules
  such as `.body-sk.darkmode tricon2-sk { check-circle-icon-sk { fill:
  #4caf50; } }` may override the fill color with a dark-mode-specific value.
**Diagram: Attribute Change leading to Icon Update**
```
[User/Application sets/changes 'value' attribute on <tricon2-sk>]
|
v
[<tricon2-sk> element]
|
+---------------------+
| attributeChangedCallback() is triggered |
+---------------------+
|
v
[this._render()]
|
v
[TriconSk.template(this)] <-- Reads current 'this.value'
|
+-------------+-------------+
| (value is | (value is | (value is other)
| "positive") | "negative") |
v v v
[Returns [Returns [Returns
<check-...>] <cancel-...>] <help-...>]
|
v
[lit-html updates the DOM with the new icon template]
|
v
[Browser renders the new icon with appropriate CSS styles]
```
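The icon-selection step in the diagram is essentially a three-way switch on
`value`. A minimal sketch of that mapping (not the module's exact code) might
look like this, assuming the `lit` package's `html` tag:

```ts
// Sketch of the value -> icon-template mapping used by tricon2-sk.
import { html, TemplateResult } from 'lit';

function iconTemplate(value: string): TemplateResult {
  switch (value) {
    case 'positive':
      return html`<check-circle-icon-sk></check-circle-icon-sk>`;
    case 'negative':
      return html`<cancel-icon-sk></cancel-icon-sk>`;
    default:
      // 'untriaged' and any unrecognized value fall back to the help icon.
      return html`<help-icon-sk></help-icon-sk>`;
  }
}
```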
The design decision to use distinct, imported icon components
(`check-circle-icon-sk`, etc.) rather than, for example, a single SVG sprite or
dynamically generating SVG paths, promotes better separation of concerns. Each
icon can be managed and updated independently. The use of CSS variables for
theming is a standard and flexible approach, allowing consuming applications to
easily adapt the icon colors to their specific look and feel without modifying
the component's core logic or styles directly.
# Module: /modules/trybot
The `trybot` module provides utilities for processing and analyzing results from
Perf trybots. Trybots are automated systems that run performance tests on code
changes before they are submitted. This module focuses on calculating and
presenting metrics that help developers understand the performance impact of
their changes.
The core functionality revolves around aggregating and averaging `stddevRatio`
values across different parameter combinations. The `stddevRatio` is a key
metric representing the change in performance relative to the standard deviation
of the baseline. A positive `stddevRatio` generally indicates a performance
regression, while a negative value suggests an improvement.
The primary goal is to help developers quickly identify which aspects of their
change (represented by key-value parameters like `model=GCE` or
`test=MyBenchmark`) are contributing most significantly to performance changes,
both positive and negative. By grouping results by these parameters and
calculating average `stddevRatio`, the module provides a summarized view that
highlights potential problem areas or confirms expected improvements.
### Key Components and Files:
- **`calcs.ts`**: This file contains the logic for performing calculations on
trybot results.
- **`byParams(res: TryBotResponse): AveForParam[]`**: This is the central
function of the module.
- **Why**: Developers need a way to understand the overall performance
impact of their changes across various configurations (e.g., different
devices, tests, or operating systems). Simply looking at individual
trace results can be overwhelming. This function provides a summarized
view by grouping results by their parameters.
- **How**:
1. It takes a `TryBotResponse` object, which contains a list of
individual test results (`res.results`). Each result includes a
`stddevRatio` and a set of `params` (key-value pairs describing the
test configuration).
2. It iterates through each result and then through each key-value pair
within that result's `params`.
3. For each unique `key=value` string (e.g., "model=GCE"), it maintains
a running total of `stddevRatio` values, the count of traces
contributing to this total (`n`), and counts of traces with positive
(`high`) or negative (`low`) `stddevRatio`. This aggregation happens
in the `runningTotals` object.
```
Input TryBotResponse.results:
[
{ params: {arch: "arm", os: "android"}, stddevRatio: 1.5 },
{ params: {arch: "x86", os: "linux"}, stddevRatio: -0.5 },
{ params: {arch: "arm", os: "ios"}, stddevRatio: 2.0 }
]
-> runningTotals intermediate state (simplified):
"arch=arm": { totalStdDevRatio: 3.5, n: 2, high: 2, low: 0 }
"os=android": { totalStdDevRatio: 1.5, n: 1, high: 1, low: 0 }
"arch=x86": { totalStdDevRatio: -0.5, n: 1, high: 0, low: 1 }
"os=linux": { totalStdDevRatio: -0.5, n: 1, high: 0, low: 1 }
"os=ios": { totalStdDevRatio: 2.0, n: 1, high: 1, low: 0 }
```
4. After processing all results, it calculates the average
`stddevRatio` for each `key=value` pair by dividing
`totalStdDevRatio` by `n`.
5. It constructs an array of `AveForParam` objects. Each object
represents a `key=value` parameter and includes its calculated
average `stddevRatio`, the total number of traces (`n`) that matched
this parameter, and the counts of high and low `stddevRatio` traces.
6. Finally, it sorts this array in descending order based on the
   `aveStdDevRatio`. This crucial step brings the parameters with the
   largest positive average `stddevRatio` (the most likely regressions)
   to the top, making them easy to identify.
- **`AveForParam` interface**: Defines the structure for the output of
`byParams`. It holds the aggregated average `stddevRatio` for a specific
`keyValue` pair, along with counts of traces.
- **`runningTotal` interface**: An internal helper interface used during
the aggregation process within `byParams` to keep track of sums and
counts before the final average is computed.
- **`calcs_test.ts`**: This file contains unit tests for the functions in
`calcs.ts`.
- **Why**: To ensure the correctness of the calculation logic, especially
for edge cases (e.g., empty input) and the core averaging and sorting
functionality.
- **How**: It uses `chai` for assertions. Tests cover scenarios like:
- Empty input to `byParams` should return an empty list.
- Correct calculation of average `stddevRatio` for multiple traces sharing
common parameters. For example, if two traces have `test=1`, their
`stddevRatio` values should be averaged for the `test=1` entry in the
output.
- Ensuring the output is correctly sorted by `aveStdDevRatio` in
descending order.
### Key Workflows/Processes:
**Calculating Average StdDevRatio by Parameter:**
```
TryBotResponse
|
v
byParams(response)
|
| 1. Initialize `runningTotals` (empty map)
|
| 2. For each `result` in `response.results`:
| |
| |-> For each `param` (key-value pair) in `result.params`:
| |
| |--> Generate `runningTotalsKey` (e.g., "model=GCE")
| |--> Retrieve or create `runningTotal` entry for `runningTotalsKey`
| |--> Update `totalStdDevRatio`, `n`, `high`, `low` in the entry
|
| 3. Initialize `ret` (empty array of AveForParam)
|
| 4. For each `runningTotalKey` in `runningTotals`:
| |
| |-> Calculate `aveStdDevRatio` = `runningTotal.totalStdDevRatio` / `runningTotal.n`
| |-> Create `AveForParam` object
| |-> Push to `ret`
|
| 5. Sort `ret` by `aveStdDevRatio` (descending)
|
v
Array of AveForParam
```
This workflow allows users to quickly pinpoint which configuration parameters
(like specific device models, operating systems, or test names) are associated
with the most significant average performance changes in a given trybot run. The
sorting ensures that the most impactful parameters are immediately visible.
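The aggregation performed by `byParams` can be sketched as follows. This is a
simplified re-implementation for illustration, assuming result objects with
`params` and `stddevRatio` fields as described above; it is not the module's
actual code.

```ts
// Sketch of byParams-style aggregation: average stddevRatio per "key=value".
interface ResultLike {
  params: { [key: string]: string };
  stddevRatio: number;
}

interface AveForParamLike {
  keyValue: string;
  aveStdDevRatio: number;
  n: number;
  high: number;
  low: number;
}

function byParamsSketch(results: ResultLike[]): AveForParamLike[] {
  const totals = new Map<string, { total: number; n: number; high: number; low: number }>();

  for (const r of results) {
    for (const [key, value] of Object.entries(r.params)) {
      const k = `${key}=${value}`;
      const t = totals.get(k) ?? { total: 0, n: 0, high: 0, low: 0 };
      t.total += r.stddevRatio;
      t.n += 1;
      if (r.stddevRatio > 0) t.high += 1;
      if (r.stddevRatio < 0) t.low += 1;
      totals.set(k, t);
    }
  }

  const ret: AveForParamLike[] = [];
  totals.forEach((t, keyValue) => {
    ret.push({ keyValue, aveStdDevRatio: t.total / t.n, n: t.n, high: t.high, low: t.low });
  });

  // Descending order puts the largest average regressions first.
  return ret.sort((a, b) => b.aveStdDevRatio - a.aveStdDevRatio);
}
```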
# Module: /modules/trybot-page-sk
The `trybot-page-sk` module provides a user interface for analyzing performance
regressions. It allows users to select either a specific commit from the
repository or a trybot run (representing a potential code change) and then
analyze performance metrics associated with that selection. The core purpose is
to help developers identify and understand performance impacts before or after
code submission.
**Key Responsibilities and Components:**
- **User Input and Selection:**
- The page is organized into two main tabs: "Commit" and "TryBot". This
separation allows users to focus on either analyzing historical
performance data or evaluating the impact of pending changes.
- **Commit Analysis:** Users can select a specific commit using the
`commit-detail-picker-sk` element. This allows them to investigate
performance regressions that might have been introduced by a particular
code change.
- **TryBot Analysis:** (The "TryBot" tab is present in the UI template but
its functionality for selecting trybot runs, CLs, and patch numbers is
not fully detailed in the provided `trybot-page-sk.ts`. It appears to be
a planned feature or a more complex interaction than commit selection.)
The underlying `TryBotRequest` interface includes fields like `cl` and
`patch_number`, indicating the intent to support this.
- Once a commit (or eventually a trybot run) is selected, users define the
scope of the analysis by specifying a query using `query-sk`. This query
filters the performance traces to be considered (e.g., focusing on
specific benchmarks, configurations, or architectures).
- The `paramset-sk` and `query-count-sk` elements provide feedback on the
current query, showing the matching parameters and the number of traces
that fit the criteria. This helps users refine their query to target the
relevant data.
- **Data Fetching and Processing:**
- When the user clicks the "Run" button, the `run` method is invoked. This
method constructs a `TryBotRequest` object based on the user's
selections (commit number, query, or eventually CL/patch details).
- It sends this request to the `/_/trybot/load/` backend endpoint. This
endpoint is responsible for fetching the relevant performance data
(trace values, headers, parameter sets) for the specified commit/trybot
and query. The `startRequest` utility handles the asynchronous request
and displays progress using a `spinner-sk`.
- The response (`TryBotResponse`) contains the performance data,
including:
- `results`: An array of individual trace results, each containing
parameter values (`params`), actual metric values (`values`), and a
`stddevRatio` (how many standard deviations the trace's value is from
the median of its historical data).
- `paramset`: The complete set of parameters found across all returned
traces.
- `header`: Information about the data points in each trace, likely
including timestamps.
- The received data is then processed. Notably, the `byParams` function
(from `../trybot/calcs`) is used to aggregate results by parameter
key-value pairs, calculating average standard deviation ratios, counts,
and high/low values for each group. This helps identify which parameters
are most strongly correlated with performance changes.
- **Results Display and Visualization:**
- The results are presented in two tabs: "Individual" and "By Params".
- **Individual Tab:**
- Lists individual traces that match the query, showing their
parameters, standard deviation ratio, and an option to plot them.
- To avoid overwhelming the user, only the head and tail of long lists
are displayed.
- Clicking the plot icon (`timeline-icon-sk`) for a trace renders its
values over time on a `plot-simple-sk` element. Users can CTRL-click
to plot multiple traces on the same graph for comparison.
- The table intelligently displays parameter values, showing "〃" if a
value is the same as the row above it and "∅" if a parameter doesn't
exist for a trace.
- **By Params Tab:**
- Displays the aggregated results from the `byParams` calculation. For
each parameter key-value pair (e.g., "config=gles"), it shows the
average standard deviation ratio, the number of traces (N) in that
group, and the highest/lowest individual trace values.
- This view helps quickly identify which specific parameter values are
associated with significant performance deviations.
- Similar to the individual tab, users can click a plot icon to
visualize a group of traces. Up to `maxByParamsPlot` traces from the
selected group (sorted by `stddevRatio`) are plotted on a separate
`plot-simple-sk`.
- When a trace is focused on the "By Params" plot (e.g., by hovering),
its full trace ID and its parameter set are displayed below the plot
using `by-params-traceid` and `by-params-paramset` respectively.
`paramset-sk` is used to display the parameters, highlighting the
ones belonging to the focused trace.
- **State Management:**
- The component uses `stateReflector` to synchronize its internal state
(`this.state`, which is a `TryBotRequest` object) with the URL. This
means that the selected commit, query, and analysis type ("commit" or
"trybot") are reflected in the URL query parameters. This allows users
to bookmark or share specific analysis views.
- Changes to the commit selection, query, or tab selection trigger
`stateHasChanged()`, which updates the URL via `stateReflector` and
re-renders the component.
- **Styling and Structure:**
- The `trybot-page-sk.scss` file defines the visual appearance and layout
of the component, including styles for the query section, results
tables, and plot areas.
- The component is built using Lit templates, enabling reactive updates to
the DOM when the underlying state changes.
**Workflow Example (Commit Analysis):**
1. **User Selects Tab:** User ensures the "Commit" tab is selected. `[tabs-sk]
--selects index 0--> [TrybotPageSk.tabSelected] --> state.kind = "commit"
--> stateHasChanged()`
2. **User Selects Commit:** User interacts with `commit-detail-picker-sk`.
`[commit-detail-picker-sk] --commit-selected event-->
[TrybotPageSk.commitSelected] --> state.commit_number =
selected_commit_offset --> stateHasChanged() --> _render() (UI updates to
show query section)`
3. **User Enters Query:** User types into `query-sk`.
```
[query-sk] --query-change event--> [TrybotPageSk.queryChange]
--> state.query = new_query_string
--> stateHasChanged()
--> _render() (paramset-sk summary updates)
[query-sk] --query-change-delayed event--> [TrybotPageSk.queryChangeDelayed]
--> [query-count-sk].current_query = new_query_string (triggers count update)
```
4. **User Clicks "Run":** `[Run Button] --click--> [TrybotPageSk.run] -->
spinner-sk.active = true --> startRequest('/_/trybot/load/', state, ...) -->
HTTP POST to backend with { kind: "commit", commit_number: X, query: "Y" }
<-- Backend responds with TryBotResponse (trace data, paramset, header) -->
results = TryBotResponse --> byParams = byParams(results) -->
spinner-sk.active = false --> _render() (results tables and plot areas
become visible and populated)`
5. **User Interacts with Results:**
   - **Plotting Individual Trace:** `[Timeline Icon in Individual Table]
     --click--> [TrybotPageSk.plotIndividualTrace(event, index)] -->
     individualPlot.addLines(...) --> displayedTrace = true --> _render()
     (individual plot becomes visible)`
   - **Plotting By Params Group:** `[Timeline Icon in By Params Table]
     --click--> [TrybotPageSk.plotByParamsTraces(event, index)] --> Filters
     results.results for matching key=value --> byParamsPlot.addLines(...) -->
     byParamsParamSet.paramsets = [ParamSet of plotted traces] -->
     displayedByParamsTrace = true --> _render() (by params plot and its
     paramset become visible)`
   - **Focusing Trace on By Params Plot:** `[by-params-plot] --trace_focused
     event--> [TrybotPageSk.byParamsTraceFocused] -->
     byParamsTraceID.innerText = focused_trace_name -->
     byParamsParamSet.highlight = fromKey(focused_trace_name) --> _render()
     (updates highlighted params in by-params-paramset)`
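Step 4 above boils down to a single POST against the `/_/trybot/load/` endpoint. A hedged sketch using plain `fetch` and assumed field names, rather than the generated `TryBotRequest`/`TryBotResponse` types and the `startRequest` helper the page actually uses:

```
// Sketch only: the real page uses startRequest() and the generated JSON types.
interface TryBotRequestSketch {
  kind: 'commit' | 'trybot';
  commit_number?: number;
  cl?: string;
  patch_number?: number;
  query: string;
}

async function loadTryBotResults(state: TryBotRequestSketch): Promise<unknown> {
  const resp = await fetch('/_/trybot/load/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(state),
  });
  if (!resp.ok) {
    throw new Error(`trybot load failed: ${resp.status}`);
  }
  // Resolves to a TryBotResponse: { results, paramset, header }.
  return resp.json();
}
```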
The design emphasizes providing both a high-level overview of potential
regression areas (via "By Params") and the ability to drill down into individual
trace performance. The use of `stddevRatio` as a primary metric helps quantify
the significance of observed changes.
# Module: /modules/user-issue-sk
## User Issue Management Element (`user-issue-sk`)
The `user-issue-sk` module provides a custom HTML element for associating and
managing Buganizer issues with specific data points in the Perf application.
This allows users to directly link performance regressions or anomalies to their
corresponding bug reports, enhancing traceability and collaboration.
**Why:** Tracking issues related to performance data is crucial for effective
debugging and resolution. This element centralizes the issue linking process
within the Perf UI, providing a seamless experience for users to add, view, and
remove bug associations.
**How:**
The core functionality revolves around the `UserIssueSk` LitElement class. This
class manages the display and interaction logic for associating a Buganizer
issue with a data point identified by its `trace_key` and `commit_position`.
**Key Responsibilities and Components:**
- **User Authentication:** The element first checks if a user is logged in
using `alogin-sk`. This is essential because only logged-in users can add or
remove issue associations. If a user is not logged in, they can only view
existing issue links.
- **State Management:**
- `bug_id`: This property determines the element's display.
- `bug_id === 0`: Indicates no Buganizer issue is associated with the data
point. The element will display an "Add Bug" button (if the user is
logged in).
- `bug_id > 0`: An existing Buganizer issue is linked. The element will
display a link to the bug and, if the user is logged in, a "close" icon
to remove the association.
- `bug_id === -1`: This is a special state where the element renders
nothing, effectively hiding itself. This might be used in scenarios
where issue linking is not applicable.
- `_text_input_active`: A boolean flag that controls the visibility of the
input field for entering a new bug ID.
- **Rendering Logic:** The `render()` method dynamically chooses between two
main templates based on the `bug_id` and login status:
- `addIssueTemplate()`: Shown when `bug_id === 0` and the user is logged
in. It initially displays an "Add Bug" button. Clicking this button
reveals an input field for the bug ID and confirm/cancel icons.
- `showLinkTemplate()`: Shown when `bug_id > 0`. It displays a formatted
link to the Buganizer issue (using `AnomalySk.formatBug`). If the user
is logged in, a "close" icon is also displayed to allow removal of the
issue link.
- **API Interaction:**
- `addIssue()`: Triggered when a user submits a new bug ID. It makes a
POST request to the `/_/user_issue/save` endpoint with the `trace_key`,
`commit_position`, and the new `issue_id`.
- `removeIssue()`: Triggered when a logged-in user clicks the "close" icon
next to an existing bug link. It makes a POST request to the
`/_/user_issue/delete` endpoint with the `trace_key` and
`commit_position`.
- **Event Dispatching:** After successfully adding or removing an issue, the
element dispatches a custom event named `user-issue-changed`. This event
bubbles up and carries a `detail` object containing the `trace_key`,
`commit_position`, and the new `bug_id`. This allows parent components or
other parts of the application to react to changes in issue associations
(e.g., by refreshing a list of user-reported issues).
- **Error Handling:** Uses the `errorMessage` utility from
`perf/modules/errorMessage` to display feedback to the user in case of API
errors or invalid input.
**Key Files:**
- **`user-issue-sk.ts`**: This is the heart of the module. It defines the
`UserIssueSk` LitElement, including its properties, styles, templates, and
logic for interacting with the backend API and handling user input. The
design focuses on conditional rendering based on the `bug_id` and user login
status. The API calls are standard `fetch` requests.
- **`index.ts`**: A simple entry point that imports and registers the
`user-issue-sk` custom element, making it available for use in HTML.
- **`BUILD.bazel`**: Defines the build dependencies for the element, including
`alogin-sk` for authentication, `anomaly-sk` for bug link formatting, icon
elements for the UI, and Lit libraries for web component development.
**Workflows:**
1. **Adding a New Issue:**
   - User (logged in) sees the "Add Bug" button.
   - User clicks "Add Bug" -> `activateTextInput()` is called ->
     `_text_input_active` becomes `true` -> element re-renders to show the
     input field, check icon, and close icon.
   - User types a bug ID into the input field -> `changeHandler()` updates
     `_input_val`.
   - User clicks the check icon -> `addIssue()` is called -> input validation
     (is `_input_val` > 0?) -> POST request to `/_/user_issue/save` with
     trace_key, commit_position, input_val.
   - On success: `bug_id` is updated with `_input_val` -> `_input_val` reset
     to 0 -> `_text_input_active` set to `false` -> `user-issue-changed` event
     is dispatched -> element re-renders to show the new bug link and remove
     icon.
   - On failure: `errorMessage` is displayed -> `hideTextInput()` is called
     (resets state).
2. **Viewing an Existing Issue:**
   - Element is initialized with `bug_id > 0` -> `render()` calls
     `showLinkTemplate()` -> a link to `perf.bug_host_url + bug_id` is
     displayed.
   - If the user is logged in, a "close" icon is also displayed.
3. **Removing an Existing Issue:**
   - User (logged in) sees the bug link and "close" icon.
   - User clicks the "close" icon -> `removeIssue()` is called -> POST request
     to `/_/user_issue/delete` with trace_key, commit_position.
   - On success: `bug_id` is set to 0 -> `_input_val` reset to 0 ->
     `_text_input_active` set to `false` -> `user-issue-changed` event is
     dispatched -> element re-renders to show the "Add Bug" button.
   - On failure: `errorMessage` is displayed.
The design prioritizes a clear separation of concerns: display logic is handled
by LitElement's templating system, state is managed through properties, and
backend interactions are encapsulated in dedicated asynchronous methods. The use
of custom events allows for loose coupling with other components that might need
to react to changes in issue associations.
# Module: /modules/window
The `window` module is designed to provide utility functions related to the
browser's `window` object, specifically focusing on parsing and interpreting
configuration data embedded within it. This approach centralizes the logic for
accessing and processing global configurations, making it easier to manage and
test.
A key responsibility of this module is to extract and process build tag
information. This information is often embedded in the `window.perf.image_tag`
global variable, which is expected to be an `SkPerfConfig` object (defined in
`//perf/modules/json:index_ts_lib`). The `getBuildTag` function is the primary
component for this task.
The `getBuildTag` function takes an image tag string as input (or defaults to
`window.perf?.image_tag`). Its core purpose is to parse this string and
categorize the build tag. The function employs a specific parsing logic based on
the structure of the image tag:
1. **Initial Validation**:
- The function first splits the input tag string by the `@` character.
- If there are fewer than two parts (i.e., no `@` or `@` is the first/last
character), it's considered an invalid tag.
- It then checks if the second part (after `@`) starts with `tag:`. If
not, it's also an invalid tag.
```
Input Tag String
|
V
Split by '@'
|
V
Check for at least 2 parts AND second part starts with "tag:"
|
+-- No --> Invalid Tag
|
V
Proceed to type determination
```
2. **Tag Type Determination**: Based on the prefix of the raw tag (the second
   part of the split, which begins with `tag:`):
   - **Git Tag**: If the raw tag starts with `tag:git-`, it's classified as a
     'git' type. The function extracts the first 7 characters of the Git hash.
     `rawTag starts with "tag:git-" -> Type: 'git', Tag: first 7 chars of the Git hash`
   - **Louhi Build Tag**: If the raw tag is long enough (>= 38 characters) and
     contains `louhi` at a particular position (substring from index 25 to
     30), it's classified as a 'louhi' type. The function extracts a
     7-character identifier (substring from index 31 to 38), which typically
     represents a hash or version.
     `rawTag length >= 38 AND rawTag[25:30] == "louhi" -> Type: 'louhi', Tag: rawTag[31:38]`
   - **Regular Tag**: If neither of the above conditions is met, it's
     considered a generic 'tag' type. The function returns the portion of the
     string after `tag:`.
     `Neither Git nor Louhi -> Type: 'tag', Tag: rawTag after "tag:"`
This structured approach ensures that different build tag formats can be
reliably identified and their relevant parts extracted. The decision to
differentiate between 'git', 'louhi', and generic 'tag' types allows downstream
consumers of this information to handle them appropriately. For instance, a
'git' tag might be used to link to a specific commit, while a 'louhi' tag might
indicate a specific build from an internal CI system.
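The rules above can be condensed into a short sketch. The return shape and the name `getBuildTagSketch` are placeholders, and the index arithmetic is taken directly from the description; the real `getBuildTag` in `window.ts` may differ in detail.

```
// Sketch of the parsing rules described above; not the exact source.
interface BuildTag {
  type: 'git' | 'louhi' | 'tag' | 'invalid';
  tag: string;
}

function getBuildTagSketch(imageTag: string): BuildTag {
  const parts = imageTag.split('@');
  if (parts.length < 2 || !parts[1].startsWith('tag:')) {
    return { type: 'invalid', tag: '' };
  }
  const rawTag = parts[1];
  if (rawTag.startsWith('tag:git-')) {
    // "...@tag:git-<full hash>" -> first 7 characters of the hash.
    return { type: 'git', tag: rawTag.slice('tag:git-'.length, 'tag:git-'.length + 7) };
  }
  if (rawTag.length >= 38 && rawTag.slice(25, 30) === 'louhi') {
    // Louhi builds embed a 7-character identifier at a fixed offset.
    return { type: 'louhi', tag: rawTag.slice(31, 38) };
  }
  return { type: 'tag', tag: rawTag.slice('tag:'.length) };
}
```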
The module also extends the global `Window` interface to declare the `perf:
SkPerfConfig` property. This is a TypeScript feature that provides type safety
when accessing `window.perf`, ensuring that developers are aware of its expected
structure.
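A sketch of that declaration (the import path for `SkPerfConfig` is illustrative):

```
// Augments the global Window type so that window.perf is type-checked.
import { SkPerfConfig } from '../json';

declare global {
  interface Window {
    perf: SkPerfConfig;
  }
}
```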
The `window_test.ts` file provides unit tests for the `getBuildTag` function,
covering various scenarios including valid git tags, Louhi build tags, arbitrary
tags, and different forms of invalid tags. These tests are crucial for verifying
the correctness of the parsing logic and ensuring that changes to the function
do not introduce regressions. The use of `chai` for assertions is a standard
practice for testing in this environment.
# Module: /modules/word-cloud-sk
The `word-cloud-sk` module provides a custom HTML element designed to visualize
key-value pairs and their relative frequencies. This is particularly useful for
displaying data from clusters or other datasets where understanding the
distribution of different attributes is important.
The core idea is to present this frequency information in an easily digestible
format, combining textual representation with a simple bar graph for each item.
This allows users to quickly grasp the prevalence of certain key-value pairs
within a dataset.
**Key Components and Responsibilities:**
- **`word-cloud-sk.ts`**: This is the heart of the module, defining the
`WordCloudSk` custom element which extends `ElementSk`.
- **Why**: It encapsulates the logic for rendering the word cloud. By
extending `ElementSk`, it leverages common functionalities provided by
the `infra-sk` library for custom elements.
- **How**: It uses the `lit-html` library for templating. The `items`
property, an array of `ValuePercent` objects (defined in
`//perf/modules/json:index_ts_lib`), is the primary input. Each
`ValuePercent` object contains a `value` (the key-value string) and a
`percent` (its frequency).
- The rendering logic iterates through the `items` and creates a table row
for each. Each row displays the key-value string, its percentage as
text, and a horizontal bar whose width is proportional to the
percentage.
- The `connectedCallback` ensures that if the `items` property is set
before the element is fully connected to the DOM, it's properly upgraded
and the element is rendered.
- The `_render()` method is called whenever the `items` property changes,
ensuring the display is updated.
- **`word-cloud-sk.scss`**: This file contains the SASS styles for the
`word-cloud-sk` element.
- **Why**: It provides the visual appearance of the word cloud, ensuring
it's readable and visually distinct.
- **How**: It defines styles for the table, table cells, and the
percentage bar. It uses CSS variables for theming (e.g., `--light-gray`,
`--on-surface`, `--primary`), allowing the component to adapt to
different themes (like light and dark mode) defined in
`//perf/modules/themes:themes_sass_lib` and
`//elements-sk/modules:colors_sass_lib`.
- Specific styles are applied for font family, size, padding, borders, and
the background color and height of the percentage bar.
- **`word-cloud-sk-demo.html` and `word-cloud-sk-demo.ts`**: These files
provide a demonstration page for the `word-cloud-sk` element.
- **Why**: They serve as a live example of how to use the component and
allow for easy visual testing and development.
- **How**: `word-cloud-sk-demo.html` includes multiple instances of the
`<word-cloud-sk>` tag, some within sections with different theming
(e.g., dark mode). `word-cloud-sk-demo.ts` then selects these instances
and populates their `items` property with sample data. This demonstrates
how the component can be instantiated and how data is passed to it.
- **`index.ts`**: This file simply imports and thereby registers the
`word-cloud-sk` custom element.
- **Why**: It acts as the entry point for the element, ensuring it's
defined when the module is imported.
**Workflow: Data Display**
The primary workflow involves providing data to the `word-cloud-sk` element and
its subsequent rendering:
1. **Instantiation**: An instance of `<word-cloud-sk>` is created in HTML.
```
<word-cloud-sk></word-cloud-sk>
```
2. **Data Provision**: The `items` property of the element is set with an array
of `ValuePercent` objects.
```
// In JavaScript/TypeScript:
const wordCloudElement = document.querySelector('word-cloud-sk');
wordCloudElement.items = [
{ value: 'arch=x86', percent: 100 },
{ value: 'config=565', percent: 60 },
// ... more items
];
```
3. **Rendering (`_render()` called in `word-cloud-sk.ts`)**:
- The `WordCloudSk` element iterates through the `_items` array.
- For each item:
- A table row (`<tr>`) is generated.
- The `item.value` is displayed in the first cell (`<td>`).
- The `item.percent` is displayed as text (e.g., "60%") in the second
cell.
- A `<div>` element is created in the third cell. Its `width` style is
set to `item.percent` pixels, creating a visual bar representation
of the percentage.
The overall structure rendered looks like this (simplified):
```
<table>
<tr> <!-- For item 1 -->
<td class="value">[item1.value]</td>
<td class="textpercent">[item1.percent]%</td>
<td class="percent">
<div style="width: [item1.percent]px"></div>
</td>
</tr>
<tr> <!-- For item 2 -->
<td class="value">[item2.value]</td>
<td class="textpercent">[item2.percent]%</td>
<td class="percent">
<div style="width: [item2.percent]px"></div>
</td>
</tr>
<!-- ... more rows -->
</table>
```
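A sketch of how such rows could be produced with lit-html (not the exact source; the element also renders the surrounding `<table>`):

```
import { html, TemplateResult } from 'lit-html';

interface ValuePercent {
  value: string;
  percent: number;
}

// One <tr> per item: the key=value string, the textual percentage, and a bar
// whose width in pixels equals the percentage.
const rows = (items: ValuePercent[]): TemplateResult[] =>
  items.map(
    (item) => html`
      <tr>
        <td class="value">${item.value}</td>
        <td class="textpercent">${item.percent}%</td>
        <td class="percent"><div style="width: ${item.percent}px"></div></td>
      </tr>
    `
  );
```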
This process ensures that whenever the input data changes, the visual
representation of the word cloud is automatically updated. The use of CSS
variables for styling allows the component to seamlessly integrate into
applications with different visual themes.
# Module: /nanostat
## Nanostat
`nanostat` is a command-line tool designed to compare and analyze the results of
Skia's nanobench benchmark. It takes two JSON files generated by nanobench as
input, representing "old" and "new" benchmark runs, and provides a statistical
summary of the performance changes between them. This is particularly useful for
developers to understand the performance impact of their code changes.
### Why it exists
When making changes to a codebase, especially one as performance-sensitive as a
graphics library like Skia, it's crucial to measure the impact on performance.
Nanobench produces detailed raw data, but interpreting this data directly can be
cumbersome. `nanostat` was created to:
1. **Automate Statistical Analysis:** Apply statistical tests (Mann-Whitney U
test or Welch's T-test) to determine if observed differences in benchmark
results are statistically significant or likely due to random variation.
2. **Summarize Changes:** Present a concise, human-readable summary of
performance changes, highlighting significant regressions or improvements.
3. **Facilitate Quick Comparisons:** Enable developers to quickly compare
benchmark runs before and after a code change, streamlining the performance
analysis workflow.
4. **Provide Filtering and Sorting:** Offer options to filter out insignificant
changes, remove outliers, and sort results based on various criteria (e.g.,
by the magnitude of change or by test name).
### How it works
The core workflow of `nanostat` involves several steps:
1. **Input:** It accepts two file paths as command-line arguments, pointing to
the "old" and "new" nanobench JSON output files.
```
nanostat [options] old.json new.json
```
2. **Parsing:** The `loadFileByName` function in `main.go` is responsible for
opening and parsing these JSON files. It uses the
`perf/go/ingest/format.ParseLegacyFormat` function to interpret the
nanobench output structure and then
`perf/go/ingest/parser.GetSamplesFromLegacyFormat` to extract the raw sample
values for each benchmark test. Each file's data is converted into a
`parser.SamplesSet`, which is a map where keys are test identifiers and
values are slices of performance measurements (samples).
3. **Statistical Analysis:** The `samplestats.Analyze` function (from the
`perf/go/samplestats` module) is the heart of the comparison. It takes the
two `parser.SamplesSet` (before and after samples) and a
`samplestats.Config` object as input. The configuration includes:
- `Alpha`: The significance level (default 0.05). A p-value below alpha
indicates a significant difference.
- `IQRR`: A boolean indicating whether to apply the Interquartile Range
Rule to remove outliers from the sample data before analysis.
- `All`: A boolean determining if all results (significant or not) should
be displayed.
- `Test`: The type of statistical test to perform (Mann-Whitney U test or
Welch's T-test).
- `Order`: The function used to sort the output rows.
For each common benchmark test found in both input files,
`samplestats.Analyze` calculates statistics for both sets of samples (mean,
percentage deviation) and then performs the chosen statistical test to
compare the two distributions. This yields a p-value.
4. **Filtering and Sorting:** Based on the `config`, `samplestats.Analyze`
filters out rows where the change is not statistically significant (if
`config.All` is false). The remaining rows are then sorted according to
`config.Order`.
5. **Output Formatting:** The `formatRows` function in `main.go` takes the
analyzed and sorted `samplestats.Row` data and prepares it for display.
- It identifies "important keys" from the benchmark parameters (e.g.,
`config`, `name`, `test`). These are keys whose values differ across the
benchmark results, helping to distinguish them.
- It constructs a header line for the output table.
- For each row of results, it formats the old and new means, standard
deviations, the percentage delta, the p-value, sample sizes, and the
important key values.
- If a change is not significant (p-value > alpha), the delta is shown as
"~" unless the `--all` flag is used.
- The formatted strings are then printed to `stdout` using
`text/tabwriter` to create a well-aligned table.
Example output line:
```
old new delta stats name
2.15 ± 5% 2.00 ± 2% -7% (p=0.001, n=10+ 8) tabl_digg.skp
```
### Key Components and Files
- **`main.go`**: This is the entry point of the application.
- **Responsibilities**:
- Parses command-line arguments and flags (`-alpha`, `-sort`, `-iqrr`,
`-all`, `-test`).
- Validates user input and displays usage information if necessary.
- Calls `loadFileByName` to load and parse the input JSON files.
- Constructs the `samplestats.Config` based on the provided flags.
- Invokes `samplestats.Analyze` to perform the statistical comparison.
- Calls `formatRows` to format the results for display.
- Uses `text/tabwriter` to print the formatted output to the console.
- **Key functions**:
- `actualMain(stdout io.Writer)`: Contains the main logic, allowing
`stdout` to be replaced for testing.
- `loadFileByName(filename string) parser.SamplesSet`: Reads a nanobench
JSON file, parses it, and extracts the performance samples. It leverages
`perf/go/ingest/format` and `perf/go/ingest/parser`.
- `formatRows(config samplestats.Config, rows []samplestats.Row)
[]string`: Takes the analysis results and formats them into a slice of
strings, ready for tabular display. It intelligently includes relevant
parameter keys in the output.
- **`main_test.go`**: Contains unit tests for `nanostat`.
- **Responsibilities**:
- Ensures that `nanostat` produces the expected output for various
command-line flag combinations and input files.
- Uses golden files (`testdata/*.golden`) to compare actual output against
expected output.
- **Key functions**:
- `TestMain_DifferentFlags_ChangeOutput(t *testing.T)`: The main test
function that sets up different test cases.
- `check(t *testing.T, name string, args ...string)`: A helper function
that runs `nanostat` with specified arguments, captures its output, and
compares it against a corresponding golden file.
- **`README.md`**: Provides user-facing documentation on how to install and
use `nanostat`, including examples and descriptions of command-line options.
- **`Makefile`**: Contains targets for building, testing, and regenerating
test data (golden files). The `regenerate-testdata` target is crucial for
updating the golden files when the tool's output format or logic changes.
- **`BUILD.bazel`**: Defines how to build and test the `nanostat` binary and
its library using the Bazel build system. It lists dependencies on other
Skia modules, such as:
- `//go/paramtools`: Used in `formatRows` to work with parameter sets from
benchmark results.
- `//perf/go/ingest/format`: Used for parsing the legacy nanobench JSON
format.
- `//perf/go/ingest/parser`: Used to extract sample data from the parsed
format.
- `//perf/go/samplestats`: Provides the core statistical analysis
functions (`samplestats.Analyze`, `samplestats.Order`,
`samplestats.Test`).
### Dependencies and Design Choices
- **`perf/go/samplestats`**: `nanostat` heavily relies on this module for the
actual statistical computations. This promotes code reuse and separation of
concerns, keeping `nanostat` focused on command-line parsing, file I/O, and
output formatting.
- **`perf/go/ingest/format` and `perf/go/ingest/parser`**: These modules
handle the complexities of interpreting the nanobench JSON structure,
abstracting this detail away from `nanostat`'s main logic.
- **Command-line Flags**: The tool offers a range of flags to customize its
behavior (`-alpha`, `-iqrr`, `-all`, `-sort`, `-test`). This flexibility
allows users to tailor the analysis to their specific needs. For example,
the `-iqrr` flag allows for more robust analysis by removing potential
outlier data points that could skew results. The `-test` flag allows users
to choose between parametric (T-test) and non-parametric (U-test)
statistical tests, depending on the assumptions they are willing to make
about their data's distribution.
- **Tabular Output**: Using `text/tabwriter` provides a clean, aligned, and
easy-to-read output format, which is essential for quickly scanning and
understanding the performance changes.
- **Golden File Testing**: The use of golden files in `main_test.go` is a good
practice for testing command-line tools. It makes it easy to verify that
changes to the code don't unintentionally alter the output format or the
results of the analysis. The `Makefile` target `regenerate-testdata`
simplifies updating these files when intended changes occur.
# Module: /pages
The `/pages` module is responsible for defining the HTML structure and initial
JavaScript and CSS for all the user-facing pages of the Skia Performance
application. Each page represents a distinct view or functionality within the
application, such as viewing alerts, exploring performance data, or managing
regressions.
The core design philosophy is to keep the HTML files minimal and delegate the
rendering and complex logic to custom HTML elements (Skia Elements). This
promotes modularity and reusability of UI components.
**Key Components and Responsibilities:**
- **HTML Files (e.g., `alerts.html`, `newindex.html`):**
- These files serve as the entry point for each page.
- They define the basic HTML structure (`<head>`, `<body>`).
- Crucially, they include a `perf-scaffold-sk` custom element. This
element acts as a common layout wrapper for all pages, providing
consistent navigation, header, footer, and potentially other shared UI
elements.
- Inside the `perf-scaffold-sk`, they embed the primary custom element
specific to that page's functionality (e.g., `<alerts-page-sk>`,
`<explore-sk>`).
- They include Go template placeholders like `{%- template
"googleanalytics" . -%}` and `{% .Nonce %}` for server-side rendering of
common snippets and security nonces.
- A `window.perf = {%.context %};` script tag is used to pass initial data
or configuration from the server (Go backend) to the client-side
JavaScript. This context likely contains information needed by the
page-specific custom element to initialize itself.
- **TypeScript Files (e.g., `alerts.ts`, `newindex.ts`):**
- These files are the JavaScript entry points for each page.
- Their primary responsibility is to import the necessary custom elements.
This ensures that the browser knows how to render elements like
`<perf-scaffold-sk>` and the page-specific custom element (e.g.,
`../modules/alerts-page-sk`).
- By importing these elements, their associated JavaScript logic is
executed, making them functional.
- **SCSS Files (e.g., `alerts.scss`, `newindex.scss`):**
- These files provide page-specific styling.
- Currently, they all primarily `@import 'body';`, which means they
inherit base body styles from `body.scss`.
- If a page required unique styling beyond what the custom elements or
`body.scss` provide, those styles would be defined here.
- **`body.scss`:**
- This file defines global, minimal styles for the `<body>` element, such
as removing default margins and padding. This ensures a consistent
baseline across all pages.
- **`BUILD.bazel`:**
- This file defines how each page is built using the `sk_page` rule from
`//infra-sk:index.bzl`.
- For each page, it specifies:
- `html_file`: The entry HTML file.
- `ts_entry_point`: The entry TypeScript file.
- `scss_entry_point`: The entry SCSS file.
- `sk_element_deps`: A list of dependencies on other modules that provide
the custom HTML elements used by the page. This is crucial for ensuring
that elements like `perf-scaffold-sk` and page-specific elements (e.g.,
`alerts-page-sk`) are compiled and available.
- `sass_deps`: Dependencies for SCSS, typically including `:body_sass_lib`
which refers to the `body.scss` file.
- Other build-related configurations like `assets_serving_path`, `nonce`,
and `production_sourcemap`.
**Workflow for a Page Request:**
1. User navigates to a URL (e.g., `/alerts`).
2. The server (Go backend) maps this URL to the corresponding HTML file (e.g.,
`alerts.html`).
3. The Go backend processes the HTML template, injecting data for `{% .context
%}`, the `{% .Nonce %}`, and other templates like "googleanalytics" and
"cookieconsent".
4. The processed HTML is sent to the browser.
   `User Request --(URL routing)--> Go Backend ----> Template Processing
   (alerts.html + context; injects window.perf data, nonce) ----> HTML Response`
5. The browser parses the HTML.
6. When the browser encounters `<script src="alerts.js"></script>` (or the
equivalent generated by the build system), it fetches and executes
`alerts.ts`.
7. `alerts.ts` imports `../modules/perf-scaffold-sk` and
   `../modules/alerts-page-sk`. This registers these custom elements with the
   browser (a sketch of such an entry point follows this list).
   `Browser receives HTML -> parses HTML -> encounters <script> for alerts.ts
   -> fetches and executes alerts.ts -> import '../modules/perf-scaffold-sk';
   import '../modules/alerts-page-sk'; (custom elements are now defined)`
8. The browser then renders the custom elements (`<perf-scaffold-sk>` and
   `<alerts-page-sk>`). The JavaScript logic within these custom elements takes
   over, potentially fetching more data via AJAX using the initial
   `window.perf` context if needed, and populating the page content.
   `Custom elements registered -> browser renders <perf-scaffold-sk> and
   <alerts-page-sk> -> JavaScript within these elements executes (e.g., reads
   window.perf, makes AJAX calls, builds UI)`
9. The SCSS file (`alerts.scss`) is also linked in the HTML (via the build
system), and its styles (including those from `body.scss`) are applied.
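As step 7 shows, a page's TypeScript entry point is usually nothing more than a couple of imports; for example, `alerts.ts` reduces to something like the following (a sketch based on the description above; the real file may pull in additional modules):

```
// alerts.ts (sketch): importing the elements is enough to register them with
// the browser; no further page-level logic is required here.
import '../modules/perf-scaffold-sk';
import '../modules/alerts-page-sk';
```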
This structure allows for a clean separation of concerns:
- HTML provides the basic skeleton and server-side data injection points.
- TypeScript/JavaScript (via custom elements) handles all dynamic behavior, UI
rendering, and interaction logic.
- SCSS handles the styling.
The `help.html` page is slightly different as it directly embeds more static
content (help text and examples) within its HTML structure using Go templating
(`{% range ... %}`). However, it still utilizes the `perf-scaffold-sk` for
consistent page layout and imports its JavaScript for any scaffold-related
functionalities.
The `newindex.html` and `multiexplore.html` pages additionally include a `div`
with `id="sidebar_help"` within the `perf-scaffold-sk`. This suggests that the
`perf-scaffold-sk` might have a designated area or slot where page-specific help
content can be injected, or that the page-specific JavaScript (`explore-sk.ts`
or `explore-multi-sk.ts`) might dynamically populate or interact with this
sidebar content.
# Module: /res
## Resource Module (`/res`)
### High-Level Overview
The `/res` module serves as a centralized repository for static assets required
by the application. Its primary purpose is to provide a consistent and organized
location for resources such as images, icons, and potentially other static files
that are part of the user interface or overall application branding. By
co-locating these assets, the module simplifies resource management, facilitates
easier updates, and ensures that all parts of the application can reliably
access necessary visual or static elements.
### Design Decisions and Implementation Choices
The decision to have a dedicated `/res` module stems from the need to separate
static content from dynamic code. This separation offers several benefits:
1. **Organization:** Grouping all static assets in one place makes the project
structure cleaner and easier to navigate. Developers know exactly where to
look for or add new resources.
2. **Maintainability:** When assets need to be updated (e.g., a new logo, a
changed icon), modifications are localized to this module, reducing the risk
of inadvertently affecting other parts of the codebase.
3. **Build Process Optimization:** Build tools can often be configured to
handle static assets differently (e.g., copying them directly to the output
directory, optimizing images). Having a dedicated module simplifies the
configuration of such processes.
4. **Caching and Delivery:** Web servers and content delivery networks (CDNs)
can be more effectively configured to cache and serve static assets when
they are located in a well-defined directory.
The internal structure of `/res` is designed to categorize different types of
assets. For instance, images are placed within a dedicated `img` subdirectory.
This categorization aids in discoverability and allows for type-specific
processing or handling if needed in the future.
### Key Components/Files/Submodules
- **`/res/img` (Submodule/Directory):**
- **Responsibility:** This submodule is dedicated to storing all image
assets used by the application. This includes logos, icons, background
images, and any other visual elements that are not dynamically
generated.
- **Why:** Separating images into their own directory within `/res` keeps
the root of the resource module clean and allows for specific
image-related build optimizations or management strategies. For example,
image compression tools or sprite generation scripts could target this
directory specifically.
- **Key Files:**
- **`/res/img/favicon.ico`:**
- **Responsibility:** This specific file provides the "favorite icon"
or "favicon" for the application. Web browsers display this icon in
various places, such as the browser tab, bookmarks bar, and address
bar history. It's a small but important branding element that helps
users quickly identify the application among many open tabs or saved
links.
- **Why:** The `.ico` format is the traditional and most widely
supported format for favicons, ensuring compatibility across
different browsers and platforms. Placing it directly in the `img`
directory makes it easily discoverable by build tools and web
servers, which often look for `favicon.ico` in standard locations.
Its presence here ensures that the application has a visual
identifier in browser contexts.
### Workflows and Processes
A typical workflow involving the `/res` module might look like this:
1. **Asset Creation/Acquisition:** A designer creates a new icon or a new
version of the application logo.
```
Designer Developer
| |
[New Image Asset] --> [Receives Asset]
```
2. **Asset Placement:** The developer places the new image file (e.g.,
`new_icon.png`) into the appropriate subdirectory within `/res`, likely
`/res/img/`.
```
Developer
|
[Places new_icon.png into /res/img/]
```
3. **Referencing the Asset:** Application code (e.g., HTML, CSS, JavaScript)
that needs to display this icon will reference it using a path relative to
how the assets are served.
```
Application Code (e.g., HTML)
|
<img src="/path/to/res/img/new_icon.png">
```
_(Note: The exact `/path/to/` depends on how the web server or build system
exposes the `/res` directory.)_
4. **Build Process:** During the application build, files from the `/res`
module are typically copied to a public-facing directory in the build
output.
```
Build System
|
[Reads /res/img/new_icon.png] --> [Copies to /public_output/img/new_icon.png]
```
5. **Client Request:** When a user accesses the application, their browser
   requests the asset.
   `User's Browser --[requests /public_output/img/new_icon.png]--> Web Server
   --[serves new_icon.png]--> User's Browser displays new_icon.png`
This workflow highlights how the `/res` module acts as the source of truth for
static assets, which are then processed and served to the end-user. The
`favicon.ico` follows a similar, often more implicit, path as browsers
automatically request it from standard locations.
# Module: /samplevariance
The `samplevariance` module is a command-line tool designed to analyze the
variance of benchmark samples, specifically those generated by nanobench and
stored in Google Cloud Storage (GCS). Nanobench typically produces multiple
samples (e.g., 10) for each benchmark execution. This tool facilitates the
examination of these samples across a large corpus of historical benchmark runs.
The primary motivation for this tool is to identify benchmarks exhibiting high
variance in their results. High variance can indicate instability in the
benchmark itself, the underlying system, or the measurement process. By
calculating statistics like the ratio of the median to the minimum value for
each set of samples, `samplevariance` helps pinpoint traces that warrant further
investigation.
The core workflow involves:
1. **Initialization**: Parsing command-line flags to determine the GCS location
of benchmark data, output destination (stdout or a file), filtering criteria
for traces, and the number of top results to display.
2. **File Discovery**: Listing all relevant JSON files from the specified GCS
bucket and prefix.
3. **Data Processing (Concurrent)**: Distributing the discovered filenames to a
pool of worker goroutines. Each worker:
- Downloads a JSON file from GCS.
- Parses the legacy nanobench format to extract benchmark results.
- Filters traces based on the user-provided criteria.
- For each matching trace, calculates the median and minimum of its
samples.
- Computes the ratio of median to minimum.
- Stores this information as a `sampleInfo` struct.
4. **Aggregation and Sorting**: Collecting all `sampleInfo` structs from the
workers and sorting them in descending order based on the calculated
median/min ratio. This brings the traces with the highest variance to the
top.
5. **Output**: Writing the sorted results to a CSV file (or stdout), including
the trace identifier, minimum value, median value, and the median/min ratio.
```
[Flags] -> initialize() -> (ctx, bucket, objectPrefix, traceFilter, outputWriter)
|
v
filenamesFromBucketAndObjectPrefix(ctx, bucket, objectPrefix) -> [filenames]
|
v
samplesFromFilenames(ctx, bucket, traceFilter, [filenames])
|
|--> [gcsFilenameChannel] -> Worker Goroutine 1 -> traceInfoFromFilename() -> [sampleInfo] --\
| |
|--> [gcsFilenameChannel] -> Worker Goroutine 2 -> traceInfoFromFilename() -> [sampleInfo] ----> [aggregatedSamples] (mutex protected)
| |
|--> ... (up to workerPoolSize) |
| |
|--> [gcsFilenameChannel] -> Worker Goroutine N -> traceInfoFromFilename() -> [sampleInfo] --/
|
v
Sort([aggregatedSamples])
|
v
writeCSV([sortedSamples], topN, outputWriter) -> CSV Output
```
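For illustration, the per-trace statistic computed in step 3 reduces to the following (shown here in TypeScript purely for illustration; the tool itself is written in Go and uses `go-moremath/stats` for the quantile calculation):

```
// Given the raw samples for one trace, compute min, median, and median/min.
function medianOverMin(samples: number[]): { min: number; median: number; ratio: number } {
  if (samples.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const min = sorted[0];
  const mid = sorted.length / 2;
  const median =
    sorted.length % 2 === 1
      ? sorted[Math.floor(mid)]
      : (sorted[mid - 1] + sorted[mid]) / 2;
  // A ratio well above 1.0 indicates high variance for this trace.
  return { min, median, ratio: median / min };
}
```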
Key components and their responsibilities:
- **`main.go`**: This is the entry point of the application and orchestrates
the entire process.
- `main()`: Drives the overall workflow: initialization, fetching
filenames, processing samples, sorting, and writing the output.
- `initialize()`: Handles command-line argument parsing. It sets up the
GCS client, determines the input GCS path (defaulting to yesterday's
data if not specified), parses the trace filter query, and configures
the output writer (stdout or a specified file). The choice to default to
yesterday's data provides a convenient way to monitor recent benchmark
stability without requiring explicit date specification.
- `filenamesFromBucketAndObjectPrefix()`: Interacts with GCS to list all
object names (filenames) under the specified bucket and prefix. It uses
GCS client library features to efficiently retrieve only the names,
minimizing data transfer.
- `samplesFromFilenames()`: Manages the concurrent processing of benchmark
files. It creates a channel (`gcsFilenameChannel`) to distribute
filenames to a pool of worker goroutines (`workerPoolSize`). An
`errgroup` is used to manage these goroutines and propagate any errors.
A mutex protects the shared `samples` slice where results from workers
are aggregated. This concurrent design is crucial for performance when
dealing with a large number of benchmark files.
- `traceInfoFromFilename()`: This function is executed by each worker
goroutine. It takes a single GCS filename, reads the corresponding
object from the bucket, parses the JSON content using
`format.ParseLegacyFormat` (from `perf/go/ingest/format`) and
`parser.GetSamplesFromLegacyFormat` (from `perf/go/ingest/parser`). For
each trace that matches the `traceFilter` (a `query.Query` object from
`go/query`), it sorts the sample values, calculates the median (using
`stats.Sample.Quantile` from `go-moremath/stats`) and minimum, and then
computes their ratio. The use of established libraries for parsing and
statistical calculation ensures correctness and leverages existing,
tested code.
- `writeCSV()`: Formats the processed `sampleInfo` data into CSV format
and writes it to the designated output writer. It includes a header row
and then iterates through the `sampleInfo` slice, writing each entry. It
also handles the `--top` flag to limit the number of output rows.
- `sampleInfo`: A simple struct to hold the calculated statistics (trace
ID, median, min, ratio) for a single benchmark trace's samples.
- `sampleInfoSlice`: A helper type that implements `sort.Interface` to
allow sorting `sampleInfo` slices by the `ratio` field in descending
order. This is key to presenting the most variant traces first.
- **`main_test.go`**: Contains unit tests for the `writeCSV` function. These
tests verify that the CSV output is correctly formatted under different
conditions, such as when writing all samples, a limited number of top
samples, or when the number of samples is less than the requested top N.
This ensures the output formatting logic is robust.
The design decision to use a worker pool (`workerPoolSize`) for processing files
in parallel significantly speeds up the analysis, especially when dealing with
numerous benchmark result files often found in GCS. The use of
`golang.org/x/sync/errgroup` simplifies error handling in concurrent operations.
Filtering capabilities (via the `--filter` flag and `go/query`) allow users to
narrow down the analysis to specific subsets of benchmarks, making the tool more
flexible and targeted. The output as a CSV file makes it easy to import the
results into spreadsheets or other data analysis tools for further examination.
# Module: /scripts
The `/scripts` module provides tooling to support the data ingestion pipeline
for Skia Perf. The primary focus is on automating the process of transferring
processed data to the designated cloud storage location for further analysis and
visualization within the Skia performance monitoring system.
The key responsibility of this module is to ensure reliable and timely delivery
of performance data. This is achieved by interacting with Google Cloud Storage
(GCS) using the `gsutil` command-line tool.
The main component within this module is the `upload_extracted_json_files.sh`
script.
**`upload_extracted_json_files.sh`**
This shell script is responsible for uploading JSON files, which are assumed to
be the output of a preceding data extraction or processing phase, to a specific
Google Cloud Storage bucket (`gs://skia-perf/nano-json-v1/`).
**Design Rationale and Implementation Details:**
- **Why a shell script?** Shell scripting is a straightforward and widely
available tool for automating command-line operations, making it suitable
for tasks like file transfers to cloud storage. It avoids the need for more
complex programming language environments for this specific, relatively
simple task.
- **Why `gsutil`?** `gsutil` is the standard command-line tool for interacting
with Google Cloud Storage. It provides robust features for uploading,
downloading, and managing data in GCS buckets.
- **Why `-m` (parallel uploads)?** The `-m` flag in `gsutil cp` enables
parallel uploads. This is a crucial performance optimization, especially
when dealing with a potentially large number of JSON files. By uploading
multiple files concurrently, the overall time taken for the transfer is
significantly reduced.
- **Why `cp -r` (recursive copy)?** The `-r` flag ensures that the entire
directory structure under `downloads/` is replicated in the destination GCS
path. This is important for maintaining the organization of the data and
potentially for downstream processing that might rely on the file paths.
- **Why the specific GCS path structure (`gs://skia-perf/nano-json-v1/$(date
-u --date +1hour +%Y/%m/%d/%H)`)?**
- `gs://skia-perf/nano-json-v1/`: This is the base path in the GCS bucket
designated for "nano" format JSON files, version 1. This structured
naming helps in organizing different types and versions of data within
the bucket.
- `$(date -u --date +1hour +%Y/%m/%d/%H)`: This part dynamically generates
a timestamped subdirectory structure.
- `date -u`: Ensures the date is in UTC, providing a consistent timezone
regardless of where the script is run.
- `--date +1hour`: This is a deliberate choice to place the data into the
_next_ hour's ingestion slot. This likely provides a buffer, ensuring
that all data generated within a given hour is reliably captured and
processed for that hour, even if the script runs slightly before or
after the hour boundary. It helps prevent data from being missed or
attributed to the wrong time window due to minor timing discrepancies in
script execution.
- `+%Y/%m/%d/%H`: Formats the date and time into a hierarchical path
(e.g., `2023/10/27/15`). This organization is beneficial for:
- **Data partitioning:** Makes it easy to query or process data for
specific time ranges.
- **Data lifecycle management:** Facilitates policies for archiving or
deleting older data based on these time-based folders.
- **Browseability:** Improves human readability and navigation within
the GCS bucket.
**Workflow:**
The script executes a simple, linear workflow:
1. **Source:** Identifies the `downloads/` directory in the current working
   directory as the source of JSON files.
   `[Local Filesystem] ./downloads/ (contains *.json files)`
2. **Destination Path Generation:** Dynamically constructs the target GCS path
   using the current UTC time, advanced by one hour, and formatted as
   `YYYY/MM/DD/HH`.
   `date command ---> YYYY/MM/DD/HH (e.g., 2023/10/27/15) ---> target GCS path:
   gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH/`
3. **Upload:** Uses `gsutil` to recursively copy all contents from `downloads/`
   to the generated GCS path, utilizing parallel uploads for efficiency.
   `./downloads/* ---(gsutil -m cp -r)---> gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH/`
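The path computation in step 2 is equivalent to the following (illustrated in TypeScript; the script itself uses the `date` command):

```
// Destination prefix: current UTC time plus one hour, formatted YYYY/MM/DD/HH.
function destinationPrefix(now: Date = new Date()): string {
  const t = new Date(now.getTime() + 60 * 60 * 1000); // shift into the next hour's slot
  const pad = (n: number): string => String(n).padStart(2, '0');
  return (
    'gs://skia-perf/nano-json-v1/' +
    `${t.getUTCFullYear()}/${pad(t.getUTCMonth() + 1)}/${pad(t.getUTCDate())}/${pad(t.getUTCHours())}/`
  );
}
```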
This script assumes that the `downloads/` directory exists in the location where
the script is executed and contains the JSON files ready for upload. It also
presumes that the user running the script has the necessary `gsutil` tool
installed and configured with appropriate permissions to write to the specified
GCS bucket.
# Module: /secrets
The `/secrets` module is responsible for managing the creation and configuration
of secrets required for various Skia Perf services to operate. These secrets
primarily involve Google Cloud service accounts and OAuth credentials for email
sending. The scripts in this module automate the setup of these credentials,
ensuring that services have the necessary permissions to interact with Google
Cloud APIs and other resources.
The design philosophy emphasizes secure and automated credential management.
Instead of manual creation and configuration of secrets, these scripts provide a
repeatable and version-controlled way to provision them. This reduces the risk
of human error and ensures that services are configured with the principle of
least privilege. For instance, service accounts are granted only the specific
roles they need to perform their tasks.
### Key Components and Scripts:
**1. Service Account Creation Scripts:**
- **`create-flutter-perf-service-account.sh`**: This script provisions a
Google Cloud service account specifically for the Flutter Perf instance. It
leverages a common script (`../../kube/secrets/add-service-account.sh`) to
handle the underlying `gcloud` commands.
- **Why**: Flutter Perf needs its own identity to interact with Google
Cloud services like Pub/Sub (for message queuing) and Cloud Trace (for
application performance monitoring). Separating this into its own
service account adheres to the principle of least privilege and allows
for more granular permission management.
- **How**: It calls the `add-service-account.sh` script, passing in
parameters like the project ID, the desired service account name
("flutter-perf-service-account"), a descriptive display name, and the
necessary IAM roles (`roles/pubsub.editor`, `roles/cloudtrace.agent`).
- **`create-perf-cockroachdb-backup-service-account.sh`**: This script creates
a dedicated service account for the Perf CockroachDB backup cronjob.
- **Why**: The backup process requires permissions to write data to Google
Cloud Storage. A dedicated service account ensures that only the backup
job has these specific permissions, enhancing security. If the backup
job's credentials were compromised, the blast radius would be limited to
storage object administration.
- **How**: Similar to the Flutter Perf service account, it utilizes
`../../kube/secrets/add-service-account.sh`. It specifies the service
account name ("perf-cockroachdb-backup") and the
`roles/storage.objectAdmin` role, which grants permissions to manage
objects in Cloud Storage buckets.
- **`create-perf-ingest-sa.sh`**: This script is responsible for creating the
`perf-ingest` service account. This account is used by the Perf ingestion
service, which processes and stores performance data.
- **Why**: The ingestion service needs to publish messages to Pub/Sub
topics, send trace data to Cloud Trace, and read data from specific
Google Cloud Storage buckets (`gs://skia-perf`,
`gs://cluster-telemetry-perf`). A dedicated service account with these
precise permissions is crucial for security and operational clarity. It
also leverages Workload Identity, a more secure way for Kubernetes
workloads to access Google Cloud services.
- **How**:
* It sources configuration (`../kube/config.sh`) and utility functions
(`../bash/ramdisk.sh`) for environment setup.
* Creates the service account (`perf-ingest`) using `gcloud iam
service-accounts create`.
* Assigns necessary IAM roles:
- `roles/pubsub.editor`: To publish messages to Pub/Sub.
- `roles/cloudtrace.agent`: To send trace data.
    * Configures Workload Identity by binding the Kubernetes service account
      (`default/perf-ingest` in the `skia-public` namespace) to the Google
      Cloud service account. This allows pods running as `perf-ingest` in
      Kubernetes to impersonate the `perf-ingest` Google Cloud service account
      without needing to mount service account key files directly.
      `Kubernetes Pod (default/perf-ingest) ----> impersonates ----> Google
      Cloud SA (perf-ingest@skia-public.iam.gserviceaccount.com) ----> accesses
      GCP resources (Pub/Sub, Cloud Trace, GCS)`
* Grants `objectViewer` permissions on specific GCS buckets using `gsutil
iam ch`.
* Creates a JSON key file for the service account (`perf-ingest.json`).
* Creates a Kubernetes secret named `perf-ingest` from this key file using
`kubectl create secret generic`. This secret can then be used by
deployments that might not be able to use Workload Identity directly or
for other specific use cases.
* Operations are performed in a temporary ramdisk (`/tmp/ramdisk`) to
avoid leaving sensitive key files on persistent storage.
- **`create-perf-sa.sh`**: This script creates the primary `skia-perf` service
account. This is a general-purpose service account for the main Perf
application.
- **Why**: The main Perf application requires permissions for Pub/Sub,
Cloud Trace, and reading from the `gs://skia-perf` bucket. Similar to
`perf-ingest`, this service account uses Workload Identity for enhanced
security when running within Kubernetes.
- **How**: The process is very similar to `create-perf-ingest-sa.sh`:
* Sources configuration and sets up a ramdisk.
* Creates the `skia-perf` service account.
* Assigns `roles/cloudtrace.agent` and `roles/pubsub.editor`.
* Configures Workload Identity, binding the Kubernetes service account
(`default/skia-perf`) to the `skia-perf` Google Cloud service account.
* Grants `objectViewer` on the `gs://skia-perf` GCS bucket.
* Creates a JSON key and stores it as a Kubernetes secret named
`skia-perf`.
**2. Email Secrets Creation:**
- **`create-email-secrets.sh`**: This script facilitates the creation of
Kubernetes secrets necessary for Perf to send emails via Gmail. This
typically involves an OAuth 2.0 flow.
- **Why**: Perf needs to send email notifications (e.g., for alerts).
Using Gmail programmatically requires proper authentication, which is
achieved through OAuth 2.0. Storing these credentials as Kubernetes
secrets makes them securely available to the Perf application pods.
- **How**: This script guides the user through a semi-automated process:
* It takes the email address to be authenticated as an argument (e.g.,
`alertserver@skia.org`).
* It converts the email address into a Kubernetes-friendly secret name
format (e.g., `alertserver-skia-org`).
* It prompts the user to download the `client_secret.json` file (obtained
from the Google Cloud Console after enabling the Gmail API and creating
OAuth 2.0 client credentials) to `/tmp/ramdisk`.
    * It then instructs the user to run the `three_legged_flow` Go program
      (which must be built and installed separately from
      `../go/email/three_legged_flow`). This program initiates the OAuth 2.0
      three-legged authentication flow.
      `User runs three_legged_flow --> browser opens for Google auth --> user
      authenticates as the specified email --> three_legged_flow generates
      client_token.json`
* Once `client_token.json` (containing the authorization token and refresh
token) is generated in `/tmp/ramdisk`, the script uses `kubectl create
secret generic` to create a Kubernetes secret named
`perf-${EMAIL}-secrets`. This secret contains both `client_secret.json`
and `client_token.json`.
* Crucially, it then removes the `client_token.json` file from the local
filesystem because it contains a sensitive refresh token. The source of
truth for this token becomes the Kubernetes secret.
* The use of `/tmp/ramdisk` ensures that sensitive downloaded and
generated files are stored in memory and are less likely to be
inadvertently persisted.
The common pattern across these scripts is the use of `gcloud` for Google Cloud
resource management and `kubectl` for interacting with Kubernetes to store the
secrets. The use of a ramdisk for temporary storage of sensitive files like
service account keys and OAuth tokens is a security best practice. Workload
Identity is preferred for service accounts running in GKE, reducing the need to
manage and distribute service account key files.