Project Objectives: Skia Perf is a performance monitoring system designed to ingest, store, analyze, and visualize performance data for various projects, with a primary focus on Skia and related systems (e.g., Flutter, Android, Chrome). Its core objectives are:
Functionality: Perf consists of several key components that work together:
- Command-line entry points: perfserver (to run the different services) and perf-tool (for administrative tasks, data inspection, and database backups/restores).
- Traces: each trace is the series of measurements for one test in one configuration (e.g., draw_a_circle on arch=x86,config=8888). Trace IDs are structured key-value strings like ,arch=x86,config=8888,test=draw_a_circle,units=ms,.
- Tiles: trace data is sharded into tiles of commits; tile_size (the number of commits per tile) is configurable and affects how data is sharded and queried.
- Ingestion format: a defined JSON input format (documented in FORMAT.md) that Perf expects for input data files.

Perf follows a services-oriented architecture, where the main perfserver executable can run in different modes (frontend, ingest, cluster, maintenance). Data flows from external benchmark systems into Perf, where it's processed, stored, analyzed, and finally presented to users.
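As a rough illustration of the trace ID convention above (this is not Perf's actual parsing code, only a sketch of the format), a structured key like ,arch=x86,config=8888,test=draw_a_circle,units=ms, can be split back into its key/value parameters with the standard library:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTraceID is a simplified sketch: it splits a Perf-style trace ID of the
// form ",key=value,key=value," into a map of parameters. The real code in the
// Perf repository has stricter validation; this only illustrates the format.
func parseTraceID(id string) map[string]string {
	params := map[string]string{}
	for _, pair := range strings.Split(strings.Trim(id, ","), ",") {
		if kv := strings.SplitN(pair, "=", 2); len(kv) == 2 {
			params[kv[0]] = kv[1]
		}
	}
	return params
}

func main() {
	p := parseTraceID(",arch=x86,config=8888,test=draw_a_circle,units=ms,")
	fmt.Println(p["test"], p["arch"], p["config"], p["units"])
	// Output: draw_a_circle x86 8888 ms
}
```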
Data Flow and Main Components:
```
External Benchmark Systems
  |
  V
[Data Files (JSON) in Perf Ingestion Format]
  | (Uploaded to Google Cloud Storage - GCS)
  V
GCS Bucket (e.g., gs://skia-perf/nano-json-v1/)
  | (Pub/Sub event on new file arrival)
  V
Perf Ingest Service(s) (`perfserver ingest` mode)
  | - Parses JSON files (see /go/ingest/parser)
  | - Validates data (see /go/ingest/format)
  | - Associates data with Git commits (see /go/git)
  | - Writes trace data to TraceStore (SQL, tiled) (see /go/tracestore)
  | - Updates ParamSets (for UI query builders)
  | - (Optionally) Emits Pub/Sub events for "Event Driven Alerting"
  V
SQL Database (CockroachDB / Spanner)
  | - Trace Data (values, parameters, indexed by commit/tile)
  | - Commit Information (hashes, timestamps, messages)
  | - Alert Configurations
  | - Regression Records (details of detected regressions, triage status)
  | - Shortcuts, User Favorites, etc.
  |
  +<--> Perf Cluster Service(s) (`perfserver cluster` or `perfserver frontend --do_clustering` mode)
  |       - Loads Alert configurations
  |       - Queries TraceStore for relevant data
  |       - Performs clustering (k-means) (see /go/clustering2, /go/ctrace2)
  |       - Fits step functions to cluster centroids (see /go/stepfit)
  |       - Calculates Regression statistic
  |       - Stores "Interesting" clusters/regressions in the database
  |       - Sends notifications (email, issue tracker) (see /go/notify)
  |
  +<--> Perf Frontend Service (`perfserver frontend` mode)
  |       - Serves HTML, CSS, JS (see /pages, /modules)
  |       - Handles API requests from the UI (see /go/frontend, /API.md)
  |       - Queries database for trace data, alert configs, regressions
  |       - Formats data for UI display (often as DataFrames)
  |       - Manages user authentication (via X-WEBAUTH-USER header)
  |
  +<--> Perf Maintenance Service (`perfserver maintenance` mode)
          - Git repository synchronization
          - Database schema migrations (see /migrations)
          - Old data cleanup
          - Cache refreshing (e.g., ParamSet cache)
```
Rationale for Key Architectural Choices:
- SQL storage: the standard Go database/sql package is used, with schema defined and managed by /go/sql and migration scripts in /migrations.
- Tiled trace storage: the TraceStore (/go/tracestore) implementation uses SQL tables but structures them to represent tiles of commits. ParamSets and Postings tables act as inverted indexes for fast lookup of traces matching specific key-value parameters.
- Single executable (perfserver) with modes: perfserver uses command-line flags and subcommands to determine its operational mode. Configuration files (/configs/*.json) further dictate behavior within each mode.
- K-means clustering: implemented in /go/clustering2 and /go/kmeans. ctrace2 handles trace normalization.

This section focuses on significant modules beyond simple file/directory descriptions.
/go/config:
- Defines the instance configuration structure (InstanceConfig). This is the central place where all settings for a Perf deployment (database, ingestion sources, Git repo, UI features, notification settings) are specified.
- InstanceConfig is a Go struct with fields for various aspects of the system. JSON files in /configs are unmarshaled into this struct. The module provides functions to load and validate these configurations.

/go/ingest:
- Ingests data files that conform to the format.Format specification, extracting performance metrics and metadata, associating them with Git commits, and writing the data to the TraceStore.
- ingest/format: Defines the expected structure of input JSON files (the format.Format Go struct) and provides validation. This ensures data consistency.
- ingest/parser: Contains logic to parse the format.Format structure and extract individual trace measurements and their associated parameters.
- ingest/process: Coordinates the steps: reading from a source (e.g., GCS via /go/file), parsing, resolving commit information (via /go/git), and writing to the TraceStore.
- Typical flow: a Source (e.g., GCSSource via PubSub) indicates a new file. process reads the file. parser and format validate and extract Results. For each Result, its git_hash is resolved to a CommitNumber using /go/git. The extracted values are then written via /go/tracestore.

/go/tracestore:
- The TraceStore is designed to efficiently retrieve trace values for specific parameter combinations over ranges of commits.
- TraceValues table: Stores the actual metric values, often sharded by tile.
- ParamSets table: Stores unique key-value pairs found in trace identifiers within each tile.
- Postings table: An inverted index mapping (tile, param_key, param_value) to a list of trace IDs that contain that key-value pair within that tile. This structure allows queries like "get all traces where config=8888 and arch=x86" to be resolved efficiently by intersecting posting lists. SQLTraceStore is the primary implementation using the SQL database.

/go/git:
- Maintains the mapping between git_hash values in ingested data and Perf's internal CommitNumber sequence.
- It can work from a local repository clone (via the git CLI) or a Gitiles service API. It maintains a Commits table in the SQL database, mapping commit hashes to CommitNumbers and storing other metadata. It periodically updates its local Git repository clone or queries Gitiles for new commits.

/go/regression:
- Combines clustering (from /go/clustering2) and step-fit analysis (from /go/stepfit) to identify "Interesting" clusters.
- Store interface (implemented by sqlregression2store): Persists information about detected regressions, including the cluster summary, owning alert, commit hash, regression statistic, and triage status (New, Ignore, Bug).
- The approach described in DESIGN.md (comparing new interesting clusters with existing ones based on trace fingerprints) is implemented here to manage the lifecycle of a regression.

```
Run clustering (e.g., hourly or event-driven)
  |
  V
Identify "Interesting" new clusters (high |Regression| score)
  |
  V
For each new Interesting Cluster:
  Compare fingerprint (top N traces) with existing relevant Clusters in DB
  |
  +-- No match? -----> New Regression: Store in DB with status "New".
  |
  +-- Match found? --> Update existing Regression if new one has better |Regression| score.
                       Keep triage status of existing.
```
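One way to picture the "Regression" score used to rank clusters above: score a candidate step by how far the means before and after the step are separated relative to the trace's noise. The sketch below is only a toy version of that idea; the real statistic is computed by /go/stepfit and its parameters come from the Alert configuration.

```go
package regressionsketch

import "math"

// stepScore returns a toy "step fit" score for splitting trace at index i
// (expects 0 < i < len(trace)): the difference between the mean after and
// before the split, divided by the pooled standard deviation. A larger
// absolute score suggests a more interesting step. This is an illustration
// only, not the statistic implemented in /go/stepfit.
func stepScore(trace []float32, i int) float64 {
	mean := func(s []float32) float64 {
		total := 0.0
		for _, v := range s {
			total += float64(v)
		}
		return total / float64(len(s))
	}
	variance := func(s []float32, m float64) float64 {
		total := 0.0
		for _, v := range s {
			d := float64(v) - m
			total += d * d
		}
		return total / float64(len(s))
	}
	before, after := trace[:i], trace[i:]
	mb, ma := mean(before), mean(after)
	sd := math.Sqrt((variance(before, mb) + variance(after, ma)) / 2)
	if sd == 0 {
		sd = 1e-9 // avoid dividing by zero on perfectly flat traces
	}
	return (ma - mb) / sd
}
```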
/go/frontend:
- Uses the Go net/http package to define HTTP handlers for various API endpoints (e.g., fetching data for plots, listing alerts, updating triage statuses). It authenticates users based on the X-WEBAUTH-USER header. It often fetches data, converts it into DataFrame structures, and then serializes these to JSON for the frontend.

/modules (Frontend TypeScript):
- Custom elements (e.g., plot-simple-sk, alert-config-sk, query-sk). These elements handle rendering, user interaction, and making API calls to the Go backend.
- perf-scaffold-sk: Provides the main page layout (header, sidebar, content area).
- explore-simple-sk / explore-sk: Core components for querying data and displaying plots.
- json/index.ts: Contains TypeScript interfaces mirroring Go backend structs for type-safe API communication. This is crucial for ensuring frontend and backend data structures are compatible. It's often generated from Go source using /go/ts/ts.go.

/pages:
- Thin HTML entry points that include perf-scaffold-sk and the main page-specific custom element. Each page's HTML file (e.g., alerts.html) includes the perf-scaffold-sk and the relevant page element (e.g., <alerts-page-sk>). An associated TypeScript file (e.g., alerts.ts) imports the necessary custom element definitions. Server-side Go templates inject initial context data (window.perf = {%.context %};) into the HTML.

DESIGN.md:
Describes the original design, including the clusters table (though the actual schema is in /go/sql and may have evolved into the Regressions table).

FORMAT.md:
Documents the ingestion format: each input file carries a git_hash, a key (for global parameters), and results (an array of measurements). Each result can have its own key (for test-specific parameters like test name and units) and either a single measurement or a more complex measurements object for statistics (min, max, median). This document is crucial for data producers who need to integrate with Perf.

BUILD.bazel (Root):
- Defines build targets (e.g., perfserver, backendserver) that package the Go executables and necessary static resources (configs, frontend assets).
- Uses skia_app_container rules to assemble Docker images. It copies the perfserver and perf-tool binaries, configuration files from /configs, and compiled frontend assets (HTML, JS, CSS from the /pages built output) into the image. The entrypoint for the perfserver image is the perfserver executable itself.

A. New Alert Creation via UI and API:
```
User (in Perf UI, e.g., on /alerts page)
  |
  | Fills out Alert configuration form (<alert-config-sk> element)
  | Clicks "Save"
  V
Frontend JS (<alert-config-sk>)
  |
  | 1. If new alert, GET /_/alert/new
  |    (Server responds with a pre-populated Alert JSON with id: -1)
  |
  | 2. Modifies this Alert JSON based on form input
  |
  | 3. POST modified Alert JSON to /_/alert/update
  |    (Authorization: Bearer token if auth is enabled)
  V
Perf Backend (`/go/frontend/service.go` - UpdateAlertHandler)
  |
  | Receives Alert JSON
  | If alert.ID == -1, it's a new alert.
  | Validates Alert configuration
  | Persists Alert to SQL Database (via `alerts.Store`)
  | Responds 200 OK
  V
SQL Database (Alerts Table)
  |
  | New Alert record is created or existing one updated.
```
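The backend step of this flow (treat id == -1 as create, otherwise update) can be sketched roughly as below. The Alert and alertStore types are simplified stand-ins, not the real types from /go/alerts or /go/frontend, and the real handler performs more validation, authorization, and error handling.

```go
// Sketch of an "update or create" alert handler in the style described above.
package frontendsketch

import (
	"encoding/json"
	"net/http"
)

// Alert is a stand-in with only the fields needed for the sketch.
type Alert struct {
	ID    int64  `json:"id"`
	Query string `json:"query"`
}

// alertStore is a stand-in for the persistence layer (alerts.Store).
type alertStore interface {
	Insert(a *Alert) error // assigns a new ID
	Update(a *Alert) error
}

func updateAlertHandler(store alertStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// The UI identifies the caller via the X-WEBAUTH-USER header.
		if r.Header.Get("X-WEBAUTH-USER") == "" {
			http.Error(w, "not logged in", http.StatusUnauthorized)
			return
		}
		var a Alert
		if err := json.NewDecoder(r.Body).Decode(&a); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// id == -1 signals "create"; anything else is an update.
		var err error
		if a.ID == -1 {
			err = store.Insert(&a)
		} else {
			err = store.Update(&a)
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```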
Rationale:
- The GET /_/alert/new step is a convenience. It provides the frontend with a valid Alert structure, including any instance-default values, simplifying new alert creation logic on the client.
- Using id: -1 to signify a new alert during the POST to /_/alert/update is a common pattern to allow a single endpoint to handle both creation and updates. The backend inspects the ID to determine the correct action.
- The endpoints involved are documented in API.md.

B. Data Ingestion and Event-Driven Regression Detection:
```
Benchmark System
  |
  | Produces performance_data.json (Perf Ingestion Format)
  | Uploads to GCS: gs://[bucket]/[path]/YYYY/MM/DD/HH/performance_data.json
  V
Google Cloud Storage
  |
  | File "OBJECT_FINALIZE" event
  | Publishes message to PubSub Topic (e.g., "perf-ingestion-topic")
  V
Perf Ingest Service(s) (Subscribed to "perf-ingestion-topic")
  |
  | 1. Receives PubSub message (contains GCS file path)
  | 2. Downloads performance_data.json from GCS
  | 3. Parses JSON, validates data (see /go/ingest/format, /go/ingest/parser)
  | 4. Looks up git_hash in /go/git to get CommitNumber
  | 5. Writes trace data to TraceStore (SQL tables)
  | 6. If Event Driven Alerting enabled for this instance:
  |      Constructs a list of Trace IDs updated by this file
  |      Publishes message (containing gzipped Trace IDs) to another PubSub Topic (e.g., "trace-update-topic")
  V
Perf Cluster Service(s) (Subscribed to "trace-update-topic")
  |
  | 1. Receives PubSub message (with updated Trace IDs)
  | 2. For each Alert Configuration (/go/alerts):
  |      If Alert's query matches any of the updated Trace IDs:
  |        Run clustering & regression detection for THIS Alert,
  |        focusing on the commit range and data relevant to the updated traces.
  |        (Reduces scope compared to full continuous clustering)
  | 3. If regressions found:
  |      Store in SQL Database (Regressions table)
  |      Send notifications (email, issue tracker)
```
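Step 6 above mentions publishing gzipped Trace IDs. A small sketch of that packaging step, using only the standard library; the 10MB ceiling comes from the rationale below, and the real ingester's exact encoding and topic wiring may differ.

```go
package ingestsketch

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

// packTraceIDs gzips a newline-separated list of trace IDs so it could be
// carried in a single Pub/Sub message body, and rejects payloads that would
// exceed the roughly 10MB Pub/Sub message limit mentioned in the design notes.
func packTraceIDs(traceIDs []string) ([]byte, error) {
	const pubSubLimit = 10 * 1000 * 1000 // ~10MB
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(strings.Join(traceIDs, "\n"))); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	if buf.Len() > pubSubLimit {
		return nil, fmt.Errorf("gzipped trace ID list is %d bytes, over the Pub/Sub limit", buf.Len())
	}
	return buf.Bytes(), nil
}
```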
Rationale:
- As documented in FORMAT.md and DESIGN.md, GCS is the standard way data enters Perf. The YYYY/MM/DD/HH path structure is a convention.
- DESIGN.md explicitly states that event-driven alerting is for large/sparse datasets. Sending only updated Trace IDs significantly narrows the scope of clustering for each event, making it faster and less resource-intensive than re-clustering everything. PubSub's 10MB message limit is considered for gzipped trace ID lists.

This documentation provides a comprehensive starting point for a software engineer to understand the Skia Perf project. It covers its purpose, architecture, core concepts, and the rationale behind key design and implementation choices, referencing existing documentation and source code structure where appropriate.
The /cockroachdb
module provides a set of shell scripts designed to facilitate interaction with a CockroachDB instance, specifically one named perf-cockroachdb
, which is presumed to be running within a Kubernetes cluster. These scripts abstract away some of the complexities of kubectl
commands, offering streamlined access for common database operations.
The primary motivation behind these scripts is to simplify development and administrative workflows. Instead of requiring users to remember and type lengthy kubectl
commands with specific flags and resource names, these scripts provide convenient, single-command access points.
Key Components and Responsibilities:
admin.sh: This script focuses on providing access to the CockroachDB administrative web interface. Setting up kubectl port-forward by hand can be cumbersome to repeat, so the script runs kubectl port-forward to map the local port 8080 to port 8080 of the perf-cockroachdb-0 pod. Crucially, it then immediately attempts to open this local address in Google Chrome, providing an instant user experience. This assumes Google Chrome is installed and available in the system's PATH.

```
User runs admin.sh
  |
  V
Script executes: kubectl port-forward perf-cockroachdb-0 8080
  |
  V
Local port 8080 now forwards to CockroachDB pod's port 8080
  |
  V
Script executes: google-chrome http://localhost:8080
  |
  V
CockroachDB Admin UI opens in Chrome
```
connect.sh: This script is designed to provide a SQL shell connection to the CockroachDB instance. Typing the full kubectl run command with the correct image and arguments can be error-prone, so the script uses kubectl run to create a temporary, interactive pod named androidx-cockroachdb. This pod uses the cockroachdb/cockroach:v19.2.5 Docker image. The --rm flag ensures the pod is deleted after the session ends, and --restart=Never prevents it from being restarted. The crucial part is the command passed to the pod: sql --insecure --host=perf-cockroachdb-public. This starts the CockroachDB SQL client, connecting insecurely to the database service exposed at perf-cockroachdb-public.

```
User runs connect.sh
  |
  V
Script executes: kubectl run androidx-cockroachdb -it --image=... --rm --restart=Never -- sql --insecure --host=perf-cockroachdb-public
  |
  V
Temporary pod 'androidx-cockroachdb' is created
  |
  V
CockroachDB SQL client starts inside the pod, connecting to 'perf-cockroachdb-public'
  |
  V
User has an interactive SQL shell
  |
  V
User exits shell -> Pod 'androidx-cockroachdb' is deleted
```
skia-infra-public-port-forward.sh: This script sets up a port forward for direct database connections, typically for use with a local CockroachDB SQL client or other database tools. While connect.sh provides an in-cluster SQL shell, sometimes a direct connection from the local machine is preferred, for instance, to use graphical SQL clients or specific client libraries that are not available within the temporary pod created by connect.sh. The perf-cockroachdb instance is likely within a private network in the Kubernetes cluster (namespace perf), and this script makes it accessible locally. It calls the helper script ../../kube/attach.sh skia-infra-public (the details of which are outside this module's scope but presumably handles Kubernetes context or authentication for the skia-infra-public cluster). This helper script is then used to execute kubectl port-forward specifically for the perf-cockroachdb-0 pod within the perf namespace. It maps local port 25000 to the pod's CockroachDB port 26257. The script also helpfully prints instructions for the user on how to connect using the cockroach sql command once the port forward is active. The set -e command ensures the script exits immediately if any command fails, and set -x enables command tracing for debugging.

```
User runs skia-infra-public-port-forward.sh
  |
  V
Script prints connection instructions
  |
  V
Script executes: ../../kube/attach.sh skia-infra-public kubectl port-forward -n perf perf-cockroachdb-0 25000:26257
  |
  V
Port forward is established: local:25000 -> perf-cockroachdb-0:26257 (in 'perf' namespace)
  |
  V
User can now run 'cockroach sql --insecure --host=127.0.0.1:25000' in another terminal
```
These scripts collectively aim to make interacting with the perf-cockroachdb
instance as straightforward as possible by encapsulating the necessary kubectl
commands and providing context-specific instructions or actions. They rely on the Kubernetes cluster being correctly configured and accessible, and on kubectl
and potentially google-chrome
being available on the user's system.
The /configs
directory houses JSON configuration files for various instances of the Perf performance monitoring system. Each file defines the specific behavior and data sources for a particular Perf deployment. These configurations are crucial for tailoring Perf to different projects and environments, enabling developers and performance engineers to monitor and analyze performance data effectively.
The core idea is to provide a declarative way to set up a Perf instance. Instead of hardcoding settings, these JSON files act as blueprints. Each file serializes to and from a Go struct named config.InstanceConfig
. This struct serves as the canonical schema for all instance configurations, and its Go documentation provides detailed explanations of each field. This approach ensures consistency and makes it easier to manage and evolve the configuration options.
Key Components and Responsibilities:
The primary responsibility of this module is to define and store these instance configurations. Each JSON file represents a distinct Perf instance, often corresponding to a specific project or a particular version of a project (e.g., a public vs. internal build, or a stable vs. experimental branch).
Instance-Specific Configuration Files (e.g., android2.json, chrome-public.json):
Each file maps to the config.InstanceConfig Go struct. Key settings include:
- URL: The public-facing URL of the Perf instance.
- data_store_config: Defines the backend database (e.g., CockroachDB, Spanner), connection strings, and parameters like tile_size which can impact query performance and data retrieval efficiency. The choice between CockroachDB and Spanner often depends on scalability needs and existing infrastructure.
- ingestion_config: Specifies how performance data is brought into Perf. This includes the source_type (e.g., gcs for Google Cloud Storage, dir for local directories), the specific sources (e.g., GCS bucket paths or local file paths), and Pub/Sub topics for real-time ingestion. This section is vital for connecting Perf to the data producers.
- git_repo_config: Links Perf to the source code repository. This allows Perf to correlate performance data with specific code changes (commits). It includes the repository url, the provider (e.g., gitiles, git), and sometimes a commit_number_regex to extract meaningful commit identifiers from commit messages.
- notify_config: Configures how alerts and notifications are sent when regressions are detected. This can range from none to html_email, markdown_issuetracker, or anomalygroup. It often includes templates for notification subjects and bodies, leveraging placeholders like {{ .Alert.DisplayName }} to include dynamic information.
- auth_config: Defines the authentication mechanism, commonly using a header like X-WEBAUTH-USER for integration with existing authentication systems.
- query_config: Customizes how users can query and view data, including which parameters are available for filtering (include_params), default selections, and URL value defaults to tailor the user experience. It can also include caching configurations (e.g., using Redis) to improve query performance by specifying cache_config with level1_cache_key and level2_cache_key.
- anomaly_config: Contains settings related to anomaly detection, such as settling_time, which defines how long Perf waits before considering new data for anomaly detection, helping to avoid flagging transient issues.
- Other fields such as contact, ga_measurement_id (for Google Analytics), feedback_url, trace_sample_proportion (to control the volume of detailed trace data collected), and favorites (for pre-defined links on the Perf UI) further customize the instance.

A typical workflow (e.g., for android2.json):
- Benchmark data is uploaded to the location listed in ingestion_config.source_config.sources (e.g., gs://android-perf-2/android2).
- Ingestion is driven by the Pub/Sub topic perf-ingestion-android2-production.
- Perf associates the ingested data with commits from the repository configured in git_repo_config (e.g., https://android.googlesource.com/platform/superproject), and stores it in the CockroachDB instance defined in data_store_config.
- Regression detection runs according to anomaly_config.
- When a regression is detected, notifications are sent according to notify_config. For android2.json, this means an issue is filed in an issue tracker ("notifications": "markdown_issuetracker") with a subject and body formatted using the provided templates, including details like affected tests and devices.

local.json:
- Intended for running Perf locally. It points ingestion_config to a local directory (integration/data) that contains sample data. This data is often the same data used for unit tests, ensuring consistency between testing environments. The database connection will also point to a local instance.

demo.json and demo_spanner.json:
- Like local.json, demo.json uses a local directory for data ingestion ("./demo/data/") and a local CockroachDB instance. demo_spanner.json is analogous but configured to use Spanner as the backend, demonstrating flexibility in data store choices. They often include simpler git_repo_config pointing to public demo repositories (e.g., https://github.com/skia-dev/perf-demo-repo.git). The favorites section in demo.json shows how to add curated links to the Perf UI.

/spanner subdirectory:
- Configurations here (e.g., spanner/chrome-public.json, spanner/skia-public.json) will have their data_store_config.datastore_type set to "spanner". They often include Spanner-specific settings or optimizations. For example, enable_follower_reads might be set to true in data_store_config for Spanner instances to distribute read load. Many of these configurations also define redis_config within their query_config.cache_config to further enhance query performance for frequently accessed data.
- The optimize_sqltracestore flag, often set to true in Spanner configurations, indicates that specific optimizations for the SQL-based trace store are enabled, likely tailored to Spanner's characteristics.
- chrome-internal.json and chrome-public.json demonstrate sophisticated setups, including:
  - commit_number_regex in git_repo_config to extract structured commit positions.
  - temporal_config for integrating with Temporal workflows for tasks like regression grouping and bisection.
  - enable_sheriff_config to integrate with sheriffing systems for managing alerts.
  - trace_format: "chrome", which indicates that the performance data adheres to the Chrome trace event format.

The choice of fields and their values within each JSON file reflects a series of design decisions aimed at balancing flexibility, performance, and operational manageability for each specific Perf instance. For instance, the tile_size in data_store_config is adjusted based on expected data characteristics and query patterns. Similarly, trace_sample_proportion is set to manage storage costs and processing load while still capturing enough data for meaningful analysis. The notify_config templates are crafted to provide actionable information to developers when regressions occur.
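To make the "blueprint" idea concrete, here is a minimal sketch of how such a file could be unmarshaled. The struct below contains only a few illustrative fields with key names taken from the descriptions above; it is not the full config.InstanceConfig definition, whose Go documentation is the canonical schema.

```go
package configsketch

import (
	"encoding/json"
	"os"
)

// instanceConfig is a deliberately tiny stand-in for config.InstanceConfig,
// holding just a few of the fields discussed above.
type instanceConfig struct {
	URL             string `json:"URL"`
	DataStoreConfig struct {
		DataStoreType string `json:"datastore_type"`
		TileSize      int32  `json:"tile_size"`
	} `json:"data_store_config"`
	IngestionConfig struct {
		SourceConfig struct {
			SourceType string   `json:"source_type"`
			Sources    []string `json:"sources"`
		} `json:"source_config"`
	} `json:"ingestion_config"`
}

// loadInstanceConfig reads a /configs/*.json file into the sketch struct.
func loadInstanceConfig(filename string) (*instanceConfig, error) {
	b, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}
	var cfg instanceConfig
	if err := json.Unmarshal(b, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```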
The csv2days
module is a command-line utility designed to process CSV files downloaded from the Perf performance monitoring system. Its primary purpose is to simplify time-series data by consolidating multiple data points from the same calendar day into a single representative value. This is particularly useful when analyzing performance trends over longer periods, where daily granularity is sufficient and finer-grained timestamps can introduce noise or unnecessary complexity.
The core problem this module solves is the overabundance of data points when Perf exports data at a high temporal resolution (e.g., multiple commits per day). For certain types of analysis, this level of detail is not required and can make it harder to discern broader trends. csv2days
transforms such CSVs by keeping only the first encountered data column for each unique day and aggregating subsequent values from the same day into that single column using a “max” aggregation strategy.
The module operates as a streaming processor. It reads the input CSV file row by row, processes the header to determine which columns to modify or drop, and then transforms each subsequent data row accordingly before writing it to standard output.
Key Design Choices:
--in
flag and outputs the transformed CSV to stdout
. This follows common Unix philosophies for tool interoperability.csv2days
processes the file line by line. This makes the tool memory-efficient.datetime
) to match RFC3339 formatted dates in the header row. The date part (YYYY-MM-DD) of these timestamps is used for grouping.csv2days
tool currently implements a “max” aggregation strategy: for the set of values corresponding to a single day, the maximum numerical value is chosen. If non-numerical values are encountered, the first value in the sequence is typically used.skipCols
) are sorted in reverse order. This is crucial because removing an element from a slice shifts the indices of subsequent elements. Processing removals from right-to-left (largest index to smallest) ensures that the indices remain valid throughout the removal process.Workflow:
The main workflow within transformCSV
can be visualized as follows:
```
Read Input CSV File (--in flag)
  |
  v
Parse Header Row
  |
  +----------------------------------------------------------------------+
  | Identify Timestamp Columns (using RFC3339 regex)                      |
  | For each timestamp:                                                   |
  |   Extract Date (YYYY-MM-DD)                                           |
  |   If new date:                                                        |
  |     Add Date to Output Header                                         |
  |     Record current column as start of a new "run" for this day        |
  |   Else (same date as previous timestamp):                             |
  |     Mark current column for skipping (`skipCols`)                     |
  |     Increment length of current day's "run" (`runLengths`)            |
  |                                                                       |
  | Non-timestamp columns are added to Output Header as-is                |
  +----------------------------------------------------------------------+
  |
  v
Write Transformed Header to Output
  |
  v
Sort `skipCols` in Reverse Order
  |
  v
For each Data Row in Input CSV:
  |
  +----------------------------------------------------------------------+
  | Apply "Max" Aggregation:                                              |
  |   For each "run" of columns belonging to the same day (from header):  |
  |     Find the maximum numerical value in the corresponding cells       |
  |     Replace the first cell of the run with this max value             |
  +----------------------------------------------------------------------+
  |
  v
Remove Skipped Columns (based on `skipCols` from header processing)
  |
  v
Write Transformed Data Row to Output
  |
  v
Flush Output Buffer
```
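The "max" step of this workflow can be sketched as follows. This mirrors the described behavior (take the maximum parseable float in each same-day run, fall back to the first cell otherwise) but is a simplified stand-in, not the module's exact code.

```go
package csvsketch

import "strconv"

// maxOfRun returns the string holding the largest float value in cells,
// falling back to the first cell when nothing parses as a number. This
// mirrors the "max" aggregation described for csv2days.
func maxOfRun(cells []string) string {
	best := cells[0]
	found := false
	var bestVal float64
	for _, c := range cells {
		v, err := strconv.ParseFloat(c, 64)
		if err != nil {
			continue
		}
		if !found || v > bestVal {
			bestVal = v
			best = c
			found = true
		}
	}
	return best
}

// applyMaxToRuns replaces the first cell of each same-day run with the run's
// maximum. runLengths maps the starting column index of a run to its length,
// matching the header analysis described above.
func applyMaxToRuns(row []string, runLengths map[int]int) []string {
	for start, length := range runLengths {
		if start+length <= len(row) {
			row[start] = maxOfRun(row[start : start+length])
		}
	}
	return row
}
```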
main.go
: This is the heart of the module.main()
function: Handles command-line flag parsing (--in
for the input CSV file). It orchestrates the reading of the input file and calls transformCSV
to perform the core logic. Error handling and logging are also managed here.transformCSV(input io.Reader, output io.Writer) error
: This is the core function responsible for the CSV transformation.csv.Reader
for input and csv.Writer
for output.datetime = regexp.MustCompile(...)
) is used to identify columns containing RFC3339 timestamps.lastDate
to detect when a new day starts in the header sequence.skipCols
(a slice of integers) stores the indices of columns that represent subsequent entries for an already seen day and should thus be removed from the data rows.runLengths
(a map of int
to int
) stores, for each column that starts a sequence of same-day entries, how many columns belong to that day. This is used later for aggregation. For example, if columns 5, 6, and 7 are all for “2023-01-15”, runLengths[5]
would be 3
.outHeader
) is constructed by keeping the date part (YYYY-MM-DD) for the first occurrence of each day and omitting subsequent columns for the same day. Non-date columns are passed through unchanged.applyMaxToRuns(s []string, runLengths map[int]int) []string
: For each “run” of columns identified in the header as belonging to the same day, this function takes the corresponding values from the current data row and replaces the value in the first column of that run with the maximum of those values. The max(s []string) string
helper function is used here to find the maximum float value, falling back to the first string if parsing fails.removeAllIndexesFromSlices(s []string, skipCols []int) []string
: After aggregation, this function removes the data cells corresponding to the skipCols
identified during header processing. It uses removeValueFromSliceAtIndex
repeatedly. It's crucial that skipCols
is sorted in reverse order for this to work correctly.removeValueFromSliceAtIndex(s []string, index int) []string
: A utility to remove an element at a specific index from a string slice.max(s []string) string
: Iterates through a slice of strings, attempts to parse them as floats, and returns the string representation of the maximum float found. If no floats are found or parsing errors occur, it defaults to returning the first string in the input slice. This function underpins the aggregation logic.main_test.go
: Contains unit tests for the transformCSV
function.TestTransformCSV_HappyPath
: Provides a simple input CSV string and the expected output string. It then calls transformCSV
with these and asserts that the actual output matches the expected output. This serves as a concrete example of the module's behavior.BUILD.bazel
: Defines how the csv2days
Go binary and its associated library and tests are built using Bazel. It specifies source files, dependencies (like skerr
, sklog
, util
), and visibility.The design decision to use strconv.ParseFloat
and handle potential errors by continuing or defaulting implies that the tool is somewhat lenient with non-numeric data in columns expected to be numeric. The “max” operation will effectively ignore non-convertible strings unless all strings in a run are non-convertible, in which case the first string is chosen.
The demo
module provides the necessary data and tools to showcase the capabilities of the Perf performance monitoring system. Its primary purpose is to offer a tangible and reproducible example of how Perf ingests and processes performance data. This allows users and developers to understand Perf's functionality without needing to set up a complex real-world data pipeline.
The core of this module revolves around a set of pre-generated data files and a Go program to create them.
Key Components:
/demo/data/
(Directory): This directory houses the actual demo data files in JSON format. Each file represents performance measurements associated with a specific commit hash.
format.Format
specification (defined in perf/go/ingest/format
), which Perf understands. This allows for a simple and direct way to feed data into Perf for demonstration purposes.demo_data_commit_1.json
) contains a git_hash
, key
(identifying the test environment like architecture and configuration), and results
. The results
section includes measurements for various tests (like “encode” and “decode”) across different units (like “ms” and “kb”). Some files also include links
which can point to external resources relevant to the data point or the overall commit. The data in these files is designed to show some variation over commits to demonstrate Perf's ability to track changes and detect regressions/improvements. For instance, the decode
measurement and encodeMemory
show a deliberate shift in values starting from demo_data_commit_6.json
.generate_data.go
: This Go program is responsible for creating the JSON data files located in the /demo/data/
directory.
format.Format
evolves. It ensures the demo data remains relevant and can be adapted.skia-dev/perf-demo-repo
repository, establishing a direct link between the performance data and a version control history, a common scenario in real-world Perf usage.encode
, decode
, encodeMemory
). The generation includes some randomness (rand.Float32()
) to make the data appear more realistic. _ A deliberate change in the data generation logic is introduced for commits at index 5 and onwards (e.g., multiplier = 1.2
), which leads to a noticeable shift in decode
and encodeMemory
values in the corresponding JSON files. This is done to demonstrate how Perf can track and visualize such changes. _ It populates a format.Format
struct (from go.skia.org/infra/perf/go/ingest/format
) with the generated data, including the Git hash, environment keys, and the measurement results. _ The format.Format
struct is then marshaled into JSON with indentation for readability. * Finally, the JSON data is written to a file named according to the commit sequence (e.g., demo_data_commit_1.json
) within the data
subdirectory. The program uses the runtime.Caller(0)
function to determine its own location, ensuring that the data
directory is created relative to the Go file itself, making the script more portable.Workflow for Demo Data Usage:
```
generate_data.go --(generates)--> /demo/data/*.json files
  |
  V
Perf Ingester (type 'dir', configured to read from /demo/data/)
  |
  V
Perf System (stores, analyzes, and visualizes the data)
```
The demo data is specifically designed to be used in conjunction with the perf/configs/demo.json
configuration file and the https://github.com/skia-dev/perf-demo-repo.git
repository. This linkage provides a complete, albeit simplified, end-to-end scenario for demonstrating Perf.
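To relate this back to the ingestion format described earlier, the sketch below writes a file shaped like the demo data (git_hash, key, results). The structs are stand-ins for illustration only, not the real format.Format type from perf/go/ingest/format.

```go
package demosketch

import (
	"encoding/json"
	"os"
)

// demoResult is a stand-in for one measurement entry in a demo file.
type demoResult struct {
	Key         map[string]string `json:"key"`
	Measurement float64           `json:"measurement"`
}

// demoFile mirrors the fields described for the ingestion format; it is not
// the real format.Format struct.
type demoFile struct {
	GitHash string            `json:"git_hash"`
	Key     map[string]string `json:"key"`
	Results []demoResult      `json:"results"`
}

// writeDemoFile marshals one commit's worth of demo measurements with
// indentation, similar in spirit to what generate_data.go produces.
func writeDemoFile(filename string, f demoFile) error {
	b, err := json.MarshalIndent(f, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(filename, b, 0644)
}
```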
This main module, located at /go
, serves as the root for all Go language components of the Perf performance monitoring system. It encompasses a wide array of functionalities, from data ingestion and storage to analysis, alerting, and user interface backend logic. The design promotes modularity, with specific responsibilities delegated to sub-modules.
The system is designed to handle large volumes of performance data, track it against code revisions, detect regressions automatically, and provide tools for developers and performance engineers to investigate and manage performance.
/go/alerts
, /go/ingest
, /go/regression
, /go/frontend
), each with a well-defined responsibility. This promotes separation of concerns, making the system easier to develop, test, and maintain.tracestore.Store
, alerts.Store
, regression.Store
). This allows for different implementations to be swapped in (e.g., SQL-based stores vs. in-memory mocks for testing) and promotes loose coupling./go/config
module defines a comprehensive InstanceConfig
structure, which is loaded from a JSON file. This configuration dictates many aspects of an instance's behavior, including database connections, data sources, alert settings, and UI features. This allows for flexible deployment and customization of Perf instances./go/progress
module provides a mechanism for tracking and reporting the status of such tasks to the UI./go/workflows
module utilizes Temporal to orchestrate complex, multi-step processes like triggering bisections and processing their results. Temporal provides resilience and fault tolerance for these critical operations./go/alerts
), regression details (/go/regression
), commit information (/go/git
), user favorites (/go/favorites
), subscriptions (/go/subscription
), and more. The /go/sql
module manages the database schema./go/tracestore
): Performance trace data is stored in a tiled fashion, with inverted indexes to allow for efficient querying. This specialized storage approach is optimized for time-series performance metrics./go/file
and /go/filestore
modules provide abstractions for interacting with these files./go/git
, /go/progress
)./go/tracecache
for trace IDs./go/psrefresh
module manages caching of ParamSet
s (used for UI query builders), potentially using Redis (/go/redis
)./go/graphsshortcut
offers an in-memory cache for graph shortcuts, especially for development./go/git
module interacts with Git repositories (via local CLI or Gitiles API) to fetch commit information./go/issuetracker
and /go/culprit
integrate with issue tracking systems (e.g., Buganizer) for automated bug filing./go/chromeperf
module allows communication with the Chrome Performance Dashboard for reporting regressions or fetching anomaly data./go/pinpoint
module provides a client for the Pinpoint bisection service./go/sheriffconfig
module integrates with LUCI Config for managing alert configurations./go/perfserver
: The main executable for running different Perf services (frontend, ingestion, clustering, maintenance)./go/perf-tool
: A CLI for various administrative and data inspection tasks./go/initdemo
: A tool to initialize a database for demo or development./go/ts
: A utility to generate TypeScript definitions from Go structs for frontend type safety.Data Ingestion:
```
External Data Source (e.g., GCS event)
  |
  V
/go/file (Source Interface: DirSource, GCSSource) --> Raw File Data
  |
  V
/go/ingest/process (Orchestrator)
  |
  +--> /go/ingest/parser (Parses file based on /go/ingest/format) --> Extracted Traces & Metadata
  |
  +--> /go/git (Resolves Git hash to CommitNumber)
  |
  V
/go/tracestore (Writes traces, updates inverted index & ParamSets)
  |
  V
/go/ingestevents (Publishes event: "File Ingested")
```
Regression Detection (Event-Driven Example):
```
/go/ingestevents (Receives "File Ingested" event)
  |
  V
/go/regression/continuous (Controller)
  |
  +--> /go/alerts (Loads matching Alert configurations)
  |
  +--> /go/dfiter & /go/dataframe & /go/dfbuilder (Prepare DataFrames for analysis)
  |
  V
/go/regression/detector (Core detection logic)
  |
  +--> /go/clustering2 (KMeans clustering)
  |
  +--> /go/stepfit (Individual trace step detection)
  |
  V
Detected Regressions
  |
  +--> /go/regression (Store results using Store interface, e.g., sqlregression2store)
  |
  +--> /go/notify (Format & send notifications via Email, IssueTracker, Chromeperf)
  |
  +--> /go/workflows (MaybeTriggerBisectionWorkflow for potential bisection)
```
User Interaction (Frontend Request for Graph):
```
User in Browser (Requests graph)
  |
  V
/go/frontend (HTTP Handlers, e.g., graphApi)
  |
  +--> /go/ui/frame (ProcessFrameRequest)
  |      |
  |      +--> /go/dataframe/dfbuilder (Builds DataFrame based on query)
  |      |      |
  |      |      +--> /go/tracestore (Fetch trace data)
  |      |      +--> /go/git (Fetch commit data)
  |      |
  |      +--> /go/calc (If formulas are used)
  |      |
  |      +--> /go/pivot (If pivot table requested)
  |      |
  |      +--> /go/anomalies (Fetch anomaly data to overlay)
  |
  V
FrameResponse (JSON data for UI) --> User in Browser
```
Automated Bisection via Temporal Workflow:

```
/go/workflows.MaybeTriggerBisectionWorkflow (Triggered by significant regression)
  |
  +--> Waits for related anomalies to group
  |
  +--> /go/anomalygroup (Loads anomaly group details)
  |
  +--> If GroupAction == BISECT:
  |      |
  |      +--> /go/gerrit (Activity: Get commit hashes from positions)
  |      |
  |      +--> Executes Pinpoint.CulpritFinderWorkflow (Child Workflow)
  |           (Pinpoint performs bisection)
  |             |
  |             V
  |           Pinpoint calls back to /go/workflows.ProcessCulpritWorkflow
  |             |
  |             +--> /go/culprit (Activity: Persist culprit & Notify user)
  |
  +--> If GroupAction == REPORT:
         |
         +--> /go/culprit (Activity: Notify user of anomaly group)
```
ALL
, OWNER
). Ensures consistent filter definitions.Alert
configurations, their storage (sqlalertstore
), and efficient retrieval (ConfigProvider
with caching). Defines how performance regressions are detected.InstanceConfig
structure (loaded from JSON) that governs a Perf instance.DataFrame
structure for handling performance data in a tabular, commit-centric way, inspired by R's dataframes.DataFrame
objects from TraceStore
, handling query logic and data aggregation.DataFrame
s, typically by slicing a larger fetched frame. Used in regression detection.File
and Source
interfaces for abstracting file access from different origins (local, GCS via Pub/Sub).fs.FS
for local and GCS file access, providing a unified way to read files.TraceStore
.DataFrame
based on specified grouping criteria (like pivot tables).paramtools.ParamSet
instances (used for UI query builders) to improve performance.TraceSet
and ReadOnlyParamSet
objects from multiple, potentially disparate chunks of trace data using a worker pool.CommitNumber
, TileNumber
, Trace
).This comprehensive suite of modules works together to provide the Skia Perf performance monitoring system.
This module, go/alertfilter
, provides constants that define different filtering modes for alerts. These constants are used throughout the Perf application to control which alerts are displayed or processed.
The primary motivation behind this module is to centralize the definition of alert filtering options. By having these constants in a dedicated module, we avoid scattering magic strings like “ALL” or “OWNER” throughout the codebase. This improves maintainability, reduces the risk of typos, and makes it easier to understand and modify the filtering logic. If new filtering modes are needed in the future, they can be added here, providing a single source of truth.
Key Components/Files:
alertfilter.go: This is the sole file in this module. It defines the string constants used for alert filtering.
- ALL: This constant represents a filter that includes all alerts, irrespective of their owner or other properties. It is used when a user or a system process needs to view or operate on the entire set of active alerts.
- OWNER: This constant represents a filter that includes only alerts assigned to a specific owner. This is crucial for user-specific views where individuals only want to see alerts relevant to their responsibilities.

Workflow/Usage Example:
Imagine a user interface for viewing alerts. The user might have a dropdown to select how they want to filter the alerts.
User Interface:

```
[Alert List]
Filter: [Dropdown: "ALL", "OWNER"]
```

Backend Logic:

```go
func GetAlerts(filterMode string, userID string) []Alert {
	if filterMode == alertfilter.ALL {
		// Fetch all alerts from the database.
		return database.GetAllAlerts()
	} else if filterMode == alertfilter.OWNER {
		// Fetch alerts owned by the current user.
		return database.GetAlertsByOwner(userID)
	}
	// ... other filter modes or error handling
}
```
In this scenario, the backend uses the constants from the alertfilter
module to determine the correct query to execute against the database. This ensures consistency and clarity in how filtering is applied.
The /go/alerts
module is responsible for managing alert configurations within the Perf application. These configurations define the conditions under which users or systems should be notified about performance regressions. The module handles the definition, storage, retrieval, and caching of these alert configurations.
A core design principle is the separation of concerns between defining an alert's structure (config.go
), providing access to these configurations (configprovider.go
), and persisting them (store.go
and its SQL implementation in sqlalertstore
). This modularity allows for flexibility in how alerts are stored (e.g., potentially different database backends) and accessed.
Key Components and Responsibilities:
config.go
: This file defines the Alert
struct, which is the central data structure representing a single alert configuration.
Alert
struct includes fields for:IDAsString
: A string representation of the alert's unique identifier. This is used for JSON serialization to avoid potential issues with large integer handling in JavaScript. The BadAlertID
and BadAlertIDAsAsString
constants represent an invalid/uninitialized ID.Query
: A URL-encoded string that defines the criteria for selecting traces from the performance data.GroupBy
: A comma-separated list of parameter keys. If specified, the Query
is expanded into multiple sub-queries, one for each unique combination of values for the GroupBy
keys found in the data. This allows for more granular alerting. The GroupCombinations
and QueriesFromParamset
methods handle this expansion.Alert
: The email address for notifications.IssueTrackerComponent
: The ID of the issue tracker component to file bugs against. A custom SerializesToString
type is used for this field to handle JSON serialization of the int64 component ID as a string, with 0
serializing to ""
.DirectionAsString
: Specifies whether to alert on upward (UP
), downward (DOWN
), or both (BOTH
) changes in performance. This replaces the deprecated StepUpOnly
boolean.StateAsString
: Indicates if the alert is ACTIVE
or DELETED
. This is managed internally and affects whether an alert is processed.Action
: Defines what action to take when an anomaly is detected (e.g., types.AlertActionReport
, types.AlertActionBisect
).Interesting
, Algo
, Step
, Radius
, K
, Sparse
, MinimumNum
, Category
control the specifics of regression detection and reporting.Direction
and ConfigState
and helper functions for ID conversion and validation (Validate
). The Validate
function ensures consistency, for example, that GroupBy
keys do not also appear in the main Query
.store.go
: This file defines the Store
interface, which abstracts the persistence mechanism for Alert
configurations.
Store
interface specifies methods for:Save
: Saving a new or updating an existing alert. It takes a SaveRequest
which includes the Alert
configuration and an optional SubKey
(linking the alert to a subscription).ReplaceAll
: Atomically replacing all existing alerts with a new set. This is useful for bulk updates, often tied to configuration subscriptions. It requires a pgx.Tx
to ensure transactional integrity.Delete
: Marking an alert as deleted.List
: Retrieving alerts, with an option to include deleted ones. Alerts are typically sorted by DisplayName
.ListForSubscription
: Retrieving all active alerts associated with a specific subscription name.configprovider.go
: This file implements a ConfigProvider
that serves Alert
configurations, incorporating a caching layer.
Store
for every request would be inefficient.configProviderImpl
implements the ConfigProvider
interface.cache_active
for active alerts and cache_all
for all alerts including deleted ones) using the configCache
struct.NewConfigProvider
), it performs an initial refresh and starts a background goroutine that periodically calls Refresh
to update the caches from the Store
.GetAllAlertConfigs
and GetAlertConfig
serve data from these caches.sync.RWMutex
is used to protect concurrent access to the caches.Refresh
method explicitly fetches data from the alertStore
and updates both caches.Submodule sqlalertstore
: This submodule provides a SQL-based implementation of the alerts.Store
interface.
sqlalertstore.go
:SQLAlertStore
struct holds a database connection pool (pool.Pool
) and a map of SQL statements.Alerts
table (schema defined in sqlalertstore/schema/schema.go
). This simplifies schema evolution of the Alert
struct itself, as changes to the struct don't always require immediate SQL schema migrations, though it makes querying based on specific alert fields harder directly in SQL.Save
: For new alerts (ID is BadAlertIDAsAsString
), it performs an INSERT
and retrieves the generated ID. For existing alerts, it performs an UPSERT
(or an INSERT ... ON CONFLICT DO UPDATE
for Spanner).Delete
: Marks an alert as deleted by setting its config_state
to 1
(representing alerts.DELETED
) and updates last_modified
. It doesn't physically remove the row.ReplaceAll
: Within a transaction, it first marks all existing active alerts as deleted, then inserts the new set of alerts.List
and ListForSubscription
: Query the Alerts
table, deserialize the JSON alert
column into alerts.Alert
structs, and sort them by DisplayName
.spanner.go
: Contains Spanner-specific SQL statements. This is necessary because CockroachDB and Spanner have slightly different SQL syntax for certain operations like UPSERTs and RETURNING clauses. The correct set of statements is chosen in sqlalertstore.New
based on the dbType
.sqlalertstore/schema/schema.go
: Defines the Go struct AlertSchema
representing the Alerts
table in the SQL database. Key fields include id
, alert
(TEXT, storing the JSON serialized alerts.Alert
), config_state
(INT), last_modified
(INT, Unix timestamp), sub_name
, and sub_revision
.Key Workflows:
Creating/Updating an Alert:
alerts.Alert
struct.alerts.Store.Save()
is called.sqlalertstore.Save()
serializes the Alert
to JSON.IDAsString
is BadAlertIDAsAsString
, an INSERT
statement is executed, and the new ID is populated back into the Alert
struct.UPSERT
or INSERT ... ON CONFLICT DO UPDATE
statement is executed.ConfigProvider
's cache will eventually be updated during its next refresh cycle.[Client/Service] -- Alert Data --> [alerts.Store.Save()] | v [sqlalertstore.Save()] -- Serializes Alert to JSON --> [Database] | (If new, DB returns ID) <--------------------------------------- | (Updates Alert struct with ID) v [ConfigProvider.Refresh() periodically] --> [alerts.Store.List()] | v [sqlalertstore.List()] --> [Database] | (Reads & deserializes) v [ConfigProvider Cache Update]
Retrieving All Active Alerts:
alerts.ConfigProvider.GetAllAlertConfigs(ctx, false)
.configProviderImpl.GetAllAlertConfigs()
checks its cache_active
.[]*Alert
.Refresh
call) would have populated it by:alerts.Store.List(ctx, false)
.sqlalertstore.List(ctx, false)
.sqlalertstore
queries the database for alerts where config_state = 0
(ACTIVE), deserializes them, and returns the list.[Service] -- Request All Active Alerts --> [ConfigProvider.GetAllAlertConfigs(includeDeleted=false)] | (Checks cache_active) | +-- [Cache Hit] ----> Returns cached []*Alert | +-- [Cache Miss/Stale (via periodic Refresh)] | v [alerts.Store.List(includeDeleted=false)] | v [sqlalertstore.List(includeDeleted=false)] -- SQL Query (WHERE config_state=0) --> [Database] | (Reads & deserializes) v [Updates & Returns from Cache]
Expanding GroupBy
Queries:
GroupBy
clause is processed (e.g., by the regression detection system), Alert.QueriesFromParamset(paramset)
is called.Alert.GroupCombinations(paramset)
is invoked to find all unique combinations of values for the keys specified in GroupBy
from the provided paramtools.ReadOnlyParamSet
.Alert.Query
and appending the key-value pairs from the combination.[Alert Processing System] -- Has Alert with GroupBy="config,arch", Query="metric=latency" & ParamSet --> [Alert.QueriesFromParamset()] | v [Alert.GroupCombinations()] | (e.g., finds {config:A, arch:X}, {config:B, arch:X}) v [Generates specific queries:] - "metric=latency&config=A&arch=X" - "metric=latency&config=B&arch=X" | <-- Returns []string (list of queries)
The use of SerializesToString
for IssueTrackerComponent
highlights a common challenge when interfacing Go backend systems with JavaScript frontends: JavaScript's limitations with handling large integer IDs. Serializing them as strings is a robust workaround.
The existence of a mock
subdirectory with generated mocks for Store
and ConfigProvider
(using stretchr/testify/mock
) is standard Go practice, facilitating unit testing of components that depend on these interfaces without needing a real database or complex setup.
The /go/anomalies
module is responsible for retrieving anomaly data. Anomalies represent significant deviations in performance metrics. This module acts as an intermediary between the application and the chromeperf
service, which is the source of truth for anomaly data. It provides an abstraction layer, potentially including caching, to optimize anomaly retrieval.
1. anomalies.go
:
Store
interface. This interface dictates the contract for any component that aims to provide anomaly data. It ensures that different implementations (e.g., a cached store or a direct passthrough store) can be used interchangeably.GetAnomalies
: Retrieves anomalies for a list of trace names within a specific commit position range. This is useful for analyzing performance regressions or improvements tied to code changes.GetAnomaliesInTimeRange
: Fetches anomalies within a given time window. This is helpful for time-based analysis, independent of specific commit versions.GetAnomaliesAroundRevision
: Finds anomalies that occurred near a particular revision (commit). This helps pinpoint performance changes related to a specific code submission.2. impl.go
:
Store
interface. It directly forwards requests to the chromeperf.AnomalyApiClient
.chromeperf
service. It can be used when caching is not desired or not yet implemented.store
struct (the implementation of Store
) makes a corresponding call to the ChromePerf
client. For example, GetAnomalies
calls ChromePerf.GetAnomalies
. Error handling is included to log failures from the chromeperf
service. Trace names are sorted before being passed to chromeperf
which might be a requirement or an optimization for the chromeperf
API.3. /go/anomalies/cache/cache.go
:
Store
interface. This is designed to improve performance by reducing the number of direct calls to the chromeperf
service, which can be network-intensive.chromeperf
service.testsCache
for anomalies queried by trace names and commit ranges, and revisionCache
for anomalies queried around a specific revision. LRU ensures that the least accessed items are evicted when the cache reaches its cacheSize
limit.cacheItemTTL
. A periodic cleanupCache
goroutine removes entries older than this TTL. This ensures that stale data doesn't persist indefinitely.invalidationMap
: This map tracks trace names for which anomalies have been modified (e.g., an alert was updated). If a trace name is in this map, any cached anomalies for that trace are considered invalid and will be re-fetched from chromeperf
.invalidationMap
itself is cleared periodically (invalidationCleanupPeriod
) to prevent it from growing too large. This is a trade-off: it's simpler and has lower memory overhead but can lead to inaccuracies if a trace is invalidated and then the map is cleared before the next fetch for that trace.numEntriesInCache
to monitor cache utilization.store
struct in cache.go
):GetAnomalies
:testsCache
.invalidationMap
. If a trace is marked invalid, it's treated as a cache miss.as.ChromePerf.GetAnomalies
.testsCache
with newly fetched data. Client Request (traceNames, startCommit, endCommit) | v [Cache Store] -- GetAnomalies() | +---------------------------------+ | For each traceName: | | 1. Check testsCache | ----> Cache Hit? -----> Add to Result | (Key: trace:start:end) | | | 2. Check invalidationMap | No (Cache Miss or Invalidated) +---------------------------------+ | | (traceNamesMissingFromCache) | v | [ChromePerf Client] -- GetAnomalies() -----------+ | v [Cache Store] -- Add new data to testsCache | v Return Combined Result
GetAnomaliesInTimeRange
: This method currently bypasses the cache and directly calls as.ChromePerf.GetAnomaliesTimeBased
. The decision to not cache time-based queries might be due to the potentially large and less frequently reused nature of such requests, or it might be a feature planned for later.GetAnomaliesAroundRevision
: Similar to GetAnomalies
, it first checks revisionCache
. If it's a miss, it fetches from as.ChromePerf.GetAnomaliesAroundRevision
and updates the cache.InvalidateTestsCacheForTraceName
: Adds a traceName
to the invalidationMap
. This is likely called when an external event (e.g., user updating an anomaly in Chrome Perf) indicates that the cached data for this trace is no longer accurate.4. /go/anomalies/mock/Store.go
:
Store
interface, generated using the testify/mock
library.anomalies.Store
to be tested in isolation, without needing a real chromeperf
instance or a fully functional cache. Developers can define expected calls and return values for the mock store.mock.Mock
struct from stretchr/testify
is embedded, providing methods like On()
, Return()
, and AssertExpectations()
to control and verify the mock’s behavior during tests.anomalies.Store
): This is a common and robust pattern in Go. It allows for flexibility in how anomalies are fetched and managed. For example, a new caching strategy or a different backend data source could be implemented without affecting code that consumes anomalies, as long as the new implementation adheres to the Store
interface.cache.go
):invalidationMap
: A pragmatic approach to handling external data modifications. While not perfectly accurate (invalidates all anomalies for a trace even if only one changed, and susceptible to the invalidationCleanupPeriod
timing), it's simpler and less memory-intensive than more granular invalidation schemes. This suggests a balance was struck between accuracy, complexity, and resource usage.testsCache
, revisionCache
): Likely done because the query patterns and cache keys for these two types of requests are different. testsCache
uses a composite key (traceName:startCommit:endCommit
), while revisionCache
uses the revision
number as the key.chromeperf
but often return an empty AnomalyMap
or nil
slice to the caller in case of an error from the underlying service. This design choice means that callers might receive no data instead of an error, simplifying the caller's error handling logic but potentially obscuring issues if not monitored through logs.chromeperf.GetAnomalies
or chromeperf.GetAnomaliesTimeBased
, the list of traceNames
is sorted. This could be a requirement of the chromeperf
API for deterministic behavior, or an optimization to improve chromeperf
's internal processing or caching.go.opencensus.io/trace
): Spans are added to some methods (GetAnomaliesInTimeRange
, GetAnomaliesAroundRevision
). This is crucial for observability, allowing developers to track the performance and flow of requests through the system, especially in a distributed environment.
Typical Anomaly Retrieval (with Cache):
GetAnomalies*
methods on an anomalies.Store
instance (which is likely the cached store
from cache.go
).store
first checks its internal LRU cache(s) (testsCache
or revisionCache
) for the requested data.GetAnomalies
, it also consults the invalidationMap
to see if any relevant traces have been marked as stale.Caller -> anomalies.Store.GetAnomalies(traces, range) | v Cache.GetAnomalies() | +--> Check testsCache (e.g., trace1:100:200) -> Found & Valid | +--> Check testsCache (e.g., trace2:100:200) -> Not Found or Invalid | Return cached data for trace1
store
makes a network request to the chromeperf.AnomalyApiClient
. - The response from chromeperf
is received. - This new data is added to the LRU cache for future requests. - The data is returned to the caller. Caller -> anomalies.Store.GetAnomalies(traces, range) | v Cache.GetAnomalies() | +--> Check testsCache (e.g., trace1:100:200) -> Found & Valid | +--> Check testsCache (e.g., trace2:100:200) -> Not Found or Invalid | | | (Data for trace1) v +---------------------------> [ ChromePerf API ] -- GetAnomalies(trace2, range) | v Cache.Add(trace2_data) | v Combine trace1_data & trace2_data | v Return to Caller
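To make the retrieval flow above concrete, here is a minimal sketch of a cache lookup with an invalidation check. The composite key format, the field names, and the use of plain maps instead of LRU caches are assumptions for illustration, not the actual cache.go implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Anomaly is a placeholder for the anomaly data returned by chromeperf.
type Anomaly struct{ ID string }

// cachedStore is a simplified stand-in for the caching store described above.
type cachedStore struct {
	mutex           sync.Mutex
	testsCache      map[string][]Anomaly // keyed by "traceName:startCommit:endCommit" (assumed format)
	invalidationMap map[string]bool      // trace names whose cached entries are stale
}

// cacheKey builds the composite key used for testsCache lookups.
func cacheKey(traceName string, start, end int) string {
	return fmt.Sprintf("%s:%d:%d", traceName, start, end)
}

// getCached returns cached anomalies for a trace, treating invalidated traces as misses.
func (s *cachedStore) getCached(traceName string, start, end int) ([]Anomaly, bool) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	if s.invalidationMap[traceName] {
		return nil, false // trace was modified externally; force a re-fetch from chromeperf
	}
	anomalies, ok := s.testsCache[cacheKey(traceName, start, end)]
	return anomalies, ok
}

func main() {
	s := &cachedStore{
		testsCache:      map[string][]Anomaly{cacheKey(",arch=x86,test=t1,", 100, 200): {{ID: "a1"}}},
		invalidationMap: map[string]bool{},
	}
	if got, ok := s.getCached(",arch=x86,test=t1,", 100, 200); ok {
		fmt.Println("cache hit:", got)
	}
	s.invalidationMap[",arch=x86,test=t1,"] = true
	if _, ok := s.getCached(",arch=x86,test=t1,", 100, 200); !ok {
		fmt.Println("cache miss after invalidation")
	}
}
```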
Cache Invalidation Workflow:
cache.store.InvalidateTestsCacheForTraceName(ctx, "affected_trace_name")
.affected_trace_name
is added to the invalidationMap
in the cache.store
.GetAnomalies
call for affected_trace_name
:testsCache
contains an entry for this trace and range, the presence of affected_trace_name
in invalidationMap
will cause a cache miss.chromeperf
.invalidationMap
entry for affected_trace_name
typically remains until the invalidationMap
is periodically cleared.This module effectively decouples the rest of the Perf application from the direct complexities of interacting with chromeperf
for anomaly data, offering performance benefits through caching and a consistent interface for data retrieval.
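The periodic clearing of the invalidationMap described in the workflow above can be sketched roughly as follows; the ticker-based loop, the field names, and the cleanup period are assumptions for illustration rather than the actual implementation.

```go
package main

import (
	"sync"
	"time"
)

// invalidationCleaner sketches the invalidationCleanupPeriod behavior: trace
// names accumulate in the map as anomalies are modified, and the whole map is
// dropped on a fixed period so it cannot grow without bound.
type invalidationCleaner struct {
	mutex           sync.Mutex
	invalidationMap map[string]bool
}

// invalidate marks a trace name so its cached anomalies are re-fetched.
func (c *invalidationCleaner) invalidate(traceName string) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.invalidationMap[traceName] = true
}

// startCleanup clears the map every period until done is closed.
func (c *invalidationCleaner) startCleanup(period time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(period)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				c.mutex.Lock()
				c.invalidationMap = map[string]bool{} // drop all pending invalidations
				c.mutex.Unlock()
			case <-done:
				return
			}
		}
	}()
}

func main() {
	c := &invalidationCleaner{invalidationMap: map[string]bool{}}
	done := make(chan struct{})
	c.startCleanup(time.Hour, done)
	c.invalidate(",arch=x86,test=t1,")
	close(done)
}
```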
The anomalygroup
module is designed to group related anomalies (regressions in performance metrics) together. This grouping allows for consolidated actions like filing a single bug report for multiple related regressions or triggering a single bisection job to find the common culprit for a set of anomalies. This approach aims to reduce noise and improve the efficiency of triaging performance regressions.
The core idea is to identify anomalies that share common characteristics, such as the subscription (alert configuration), benchmark, and commit range. When a new anomaly is detected, the system attempts to find an existing group that matches these criteria. If a suitable group is found, the new anomaly is added to it. Otherwise, a new group is created.
The module defines a gRPC service for managing anomaly groups, a storage interface for persisting group data, and utilities for processing regressions and interacting with the grouping logic.
store.go
: Anomaly Group Storage Interface
The store.go
file defines the Store
interface, which outlines the contract for persisting and retrieving anomaly group data. This abstraction allows for different storage backends (e.g., SQL databases) to be used.
Key Responsibilities:
The Store
interface ensures that the core logic for anomaly grouping is decoupled from the specific implementation of data persistence.
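A hedged sketch of what this persistence contract might look like in Go; the method names follow the operations described below, but the exact signatures and types in store.go may differ.

```go
package anomalygroup

import "context"

// AnomalyGroup mirrors the group record described in this section; fields are illustrative.
type AnomalyGroup struct {
	GroupID         string
	AnomalyIDs      []string
	CulpritIDs      []string
	ReportedIssueID string
	Action          string // e.g. "NOACTION", "REPORT", "BISECT"
}

// Store sketches the storage contract for anomaly groups. The real interface
// may expose different methods or richer parameter sets.
type Store interface {
	Create(ctx context.Context, subscription, benchmark string, startCommit, endCommit int64, action string) (string, error)
	LoadById(ctx context.Context, groupID string) (*AnomalyGroup, error)
	AddAnomalyID(ctx context.Context, groupID, anomalyID string) error
	AddCulpritIDs(ctx context.Context, groupID string, culpritIDs []string) error
	UpdateBisectID(ctx context.Context, groupID, bisectionID string) error
	UpdateReportedIssueID(ctx context.Context, groupID, issueID string) error
	FindExistingGroup(ctx context.Context, subscription, benchmark string, startCommit, endCommit int64, action string) ([]*AnomalyGroup, error)
}
```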
sqlanomalygroupstore/sqlanomalygroupstore.go
: SQL-backed Anomaly Group Store
This file provides a concrete implementation of the Store
interface using a SQL database (specifically designed with CockroachDB and Spanner in mind).
Implementation Details:
sqlanomalygroupstore/schema/schema.go
. It includes fields for the group ID, creation time, list of anomaly IDs, metadata (stored as JSONB), common commit range, action type, and associated IDs for bisections, issues, and culprits.Create
: Inserts a new row into the AnomalyGroups
table. It takes parameters like subscription details, benchmark, commit range, and action, and stores them. The group metadata (subscription name, revision, domain, benchmark) is marshaled into a JSON string before insertion.LoadById
: Selects an anomaly group from the database based on its ID. It retrieves core attributes of the group.UpdateBisectID
, UpdateReportedIssueID
, AddAnomalyID
, AddCulpritIDs
: These methods execute SQL UPDATE statements to modify specific fields of an existing anomaly group record. They handle array appends for lists like anomaly_ids
and culprit_ids
, with specific syntax considerations for different SQL databases (e.g., Spanner's COALESCE
for array concatenation).FindExistingGroup
: Constructs a SQL SELECT query with WHERE clauses to match the provided criteria (subscription, revision, domain, benchmark, commit range overlap, and action). This allows finding groups that a new anomaly might belong to.
Design Choices:
group_meta_data
as JSONB provides flexibility in the metadata stored without requiring schema changes for minor additions.anomaly_ids
and culprit_ids
as array types in the database is a natural way to represent lists of associated entities.dbType
checks.service/service.go
: gRPC Service Implementation
This file implements the AnomalyGroupServiceServer
interface defined by the protobuf definitions in proto/v1/anomalygroup_service.proto
. It acts as the entry point for external systems to interact with the anomaly grouping functionality.
Responsibilities:
anomalygroup.Store
interface. For example, CreateNewAnomalyGroup
calls anomalygroupStore.Create
.FindTopAnomalies
Logic: This method involves more than a simple store passthrough.regression.Store
.median_before
to median_after
).ag.Anomaly
protobuf message format, extracting relevant paramset values (bot, benchmark, story, measurement, stat).FindIssuesFromCulprits
Logic:culprit.Store
to get the details of these culprits.GroupIssueMap
to find any issue IDs that are specifically associated with the given anomaly group ID. This allows correlation between a group (potentially containing multiple anomalies that led to a bisection) and the issues filed for the culprits found by that bisection.
Design Choices:
anomalygroup.Store
, culprit.Store
, and regression.Store
as dependencies, promoting testability and decoupling.newGroupCounter
) whenever a new group is created, allowing for monitoring of the system's behavior.proto/v1/anomalygroup_service.proto
: Protocol Buffer Definitions
This file defines the gRPC service AnomalyGroupService
and the message types used for requests and responses. This is the contract for how clients interact with the anomaly grouping system.
Key Messages:
AnomalyGroup
: Represents a group of anomalies, including its ID, the action to take, lists of associated anomaly and culprit IDs, reported issue ID, and metadata like subscription and benchmark names.Anomaly
: Represents a single regression, including its start and end commit positions, a paramset
(key-value pairs describing the test), improvement direction, and median values before and after the regression.GroupActionType
: An enum defining the possible actions for a group (NOACTION, REPORT, BISECT).CreateNewAnomalyGroupRequest
, FindExistingGroupsResponse
).Purpose:
notifier/anomalygroupnotifier.go
: Anomaly Group Notifier
This component implements the notify.Notifier
interface. It's invoked when a new regression is detected by the alerting system. Its primary role is to integrate the regression detection with the anomaly grouping logic.
Workflow when RegressionFound
is called:
paramset
from the trace data.paramset
to ensure it contains required keys (e.g., master, bot, benchmark, test, subtest_1). This is important because the grouping and subsequent actions (like bisection) rely on these parameters.testPath
from the paramset
. This path is used in finding or creating anomaly groups.grouper.ProcessRegressionInGroup
(which eventually calls utils.ProcessRegression
) to handle the grouping logic for this new regression.
Design Choices:
notify.Notifier
interface, allowing it to be plugged into the existing notification pipeline of the performance monitoring system.AnomalyGrouper
: It delegates the core grouping logic to an AnomalyGrouper
instance (typically utils.AnomalyGrouperImpl
). This keeps the notifier focused on the integration aspect.utils/anomalygrouputils.go
: Anomaly Grouping Utilities
This file contains the core logic for processing a new regression and integrating it into an anomaly group.
ProcessRegression
Function - Key Steps:
sync.Mutex
(groupingMutex
). This is a critical point: it aims to prevent race conditions when multiple regressions are processed concurrently, especially around creating new groups. However, the comment notes that with multiple containers, this mutex might not be sufficient and needs review.AnomalyGroupServiceClient
to communicate with the gRPC service.FindExistingGroups
gRPC method to see if the new anomaly fits into any current groups based on subscription, revision, action type, commit range overlap, and test path.CreateNewAnomalyGroup
to create a new group. - Calls UpdateAnomalyGroup
to add the current anomalyID
to this newly created group. - Triggers a Temporal Workflow: Initiates a MaybeTriggerBisection
workflow. This workflow is responsible for deciding whether to start a bisection or file a bug based on the group‘s action type and other conditions. Regression Detected --> FindExistingGroups | +-- No Group Found --> CreateNewAnomalyGroup --> UpdateAnomalyGroup (add anomaly) --> Start Temporal Workflow (MaybeTriggerBisection)
- If existing group(s) are found: - For each matching group: - Calls UpdateAnomalyGroup
to add the current anomalyID
to that group. - Calls FindIssuesToUpdate
to determine if any existing bug reports (either the group’s own ReportedIssueId
or issues linked via culprits) should be updated with information about this new anomaly. - If issues are found, it uses the issuetracker
to add a comment to each relevant issue. Regression Detected --> FindExistingGroups | +-- Group(s) Found --> For each group: | +-- UpdateAnomalyGroup (add anomaly) +-- FindIssuesToUpdate --> If issues exist --> Add Comment to Issue(s)
FindIssuesToUpdate
Function:
This helper determines which existing issue tracker IDs should be updated with information about a new anomaly being added to a group.
group_action
is REPORT
and reported_issue_id
is set on the group, that issue ID is returned.group_action
is BISECT
, it calls the FindIssuesFromCulprits
gRPC method. This method looks up culprits associated with the group and then checks if those culprits have specific issues filed for them in the context of this particular group. This is important because a single culprit (commit) might be associated with multiple anomaly groups, and each might have its own context or bug report.Design Choices:
The module extensively uses mocks for testing:
mocks/Store.go
: A mock implementation of the anomalygroup.Store
interface, generated by testify/mock
. Used in service/service_test.go
.proto/v1/mocks/AnomalyGroupServiceServer.go
: A mock for the gRPC server interface AnomalyGroupServiceServer
, generated by testify/mock
(with manual adjustments noted in the file). Used by clients or other services that might call this gRPC service.utils/mocks/AnomalyGrouper.go
: A mock for the AnomalyGrouper
interface, used in notifier/anomalygroupnotifier_test.go
.This approach allows for unit testing components in isolation by providing controlled behavior for their dependencies.
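As a concrete illustration of this mock-driven testing style, the sketch below shows how a testify mock for an AnomalyGrouper-like interface can be set up and verified. The interface's method name and signature are assumptions for illustration, not the generated mock's exact API.

```go
package grouping

import (
	"context"
	"testing"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
)

// AnomalyGrouper is a stand-in for the interface the notifier depends on; the
// real method set in the utils package may differ.
type AnomalyGrouper interface {
	ProcessRegressionInGroup(ctx context.Context, anomalyID string) error
}

// MockAnomalyGrouper is a hand-written equivalent of a testify-generated mock.
type MockAnomalyGrouper struct {
	mock.Mock
}

func (m *MockAnomalyGrouper) ProcessRegressionInGroup(ctx context.Context, anomalyID string) error {
	args := m.Called(ctx, anomalyID)
	return args.Error(0)
}

func TestGrouperMock(t *testing.T) {
	grouper := &MockAnomalyGrouper{}
	// On/Return set the expectation; AssertExpectations verifies it was met.
	grouper.On("ProcessRegressionInGroup", mock.Anything, "anomaly-123").Return(nil)

	err := grouper.ProcessRegressionInGroup(context.Background(), "anomaly-123")
	require.NoError(t, err)
	grouper.AssertExpectations(t)
}
```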
AnomalyGroupNotifier.RegressionFound
is called.paramset
, validates it, and derives testPath
.utils.ProcessRegression
):AnomalyGroupService.FindExistingGroups
using the anomaly's properties (subscription, commit range, test path, action type).AnomalyGroupService.CreateNewAnomalyGroup
is called.AnomalyGroupService.UpdateAnomalyGroup
.MaybeTriggerBisection
) is started for this new group.AnomalyGroupService.UpdateAnomalyGroup
.utils.FindIssuesToUpdate
is called for each group.REPORT
and it has a ReportedIssueId
, that issue is updated.BISECT
, AnomalyGroupService.FindIssuesFromCulprits
is called. If it returns issue IDs associated with this group’s culprits, those issues are updated.MaybeTriggerBisection
- not detailed here but implied):GroupActionType
:BISECT
: It might check conditions (e.g., number of anomalies in the group) and then trigger a bisection job (e.g., Pinpoint) using AnomalyGroupService.FindTopAnomalies
to pick the most significant anomaly. The bisection ID is then saved to the group.REPORT
: It might check conditions and then file a bug using AnomalyGroupService.FindTopAnomalies
to gather details. The issue ID is saved to the group.This system aims to automate and streamline the handling of performance regressions by intelligently grouping them and initiating appropriate follow-up actions.
The /go/backend
module implements a gRPC-based backend service for Perf. This service is designed to host API endpoints that are not directly user-facing, promoting a separation of concerns and enabling better scalability and maintainability.
Core Purpose and Design Philosophy:
The primary motivation for this backend service is to create a stable, internal API layer. This decouples user-facing components (like the frontend) from the direct implementation details of various backend tasks. For instance, if Perf needs to trigger a Pinpoint job, the frontend doesn't interact with Pinpoint or a workflow engine like Temporal directly. Instead, it makes a gRPC call to an endpoint on this backend service. The backend service then handles the interaction with the underlying system (e.g., Temporal).
This design offers several advantages, chief among them the separation of concerns, scalability, and maintainability noted above.
Key Components and Responsibilities:
backend.go
: This is the heart of the backend service.
Backend
struct: Encapsulates the state and configuration of the backend application, including gRPC server settings, ports, and loaded configuration.BackendService
interface: Defines a contract for any service that wishes to be hosted by this backend. Each such service must provide its gRPC service descriptor, registration logic, and an authorization policy. This interface-based approach allows for modular addition of new functionalities.GetAuthorizationPolicy()
method returns a shared.AuthorizationPolicy
which specifies whether unauthenticated access is allowed and which user roles are authorized to call the service or specific methods within it.RegisterGrpc()
is responsible for registering the specific gRPC service implementation with the main gRPC server.GetServiceDescriptor()
provides metadata about the gRPC service.initialize()
function: This is a crucial setup function. It:demo.json
).NotifyConfig.Notifications
is set to AnomalyGrouper
, as this indicates that anomaly grouping workflows managed by Temporal are in use.BackendService
implementations. This involves setting up authorization rules based on the policy defined by each service and then registering their gRPC handlers.configureServices()
and registerServices()
: These helper functions iterate over the list of BackendService
implementations to set up authorization and register them with the main gRPC server.configureAuthorizationForService()
: This function applies the authorization policies defined by each individual service to the gRPC server's authorization policy. It uses grpcsp.ServerPolicy
to define which roles can access the service or specific methods.New()
constructor: Creates and initializes a new Backend
instance. It takes various store implementations and a notifier as arguments, allowing for dependency injection, particularly useful for testing. If these are nil
, they are typically created within initialize()
based on the configuration.ServeGRPC()
and Serve()
: These methods start the gRPC server and block until it's shut down.Cleanup()
: Handles graceful shutdown of the gRPC server.
pinpoint.go
: This file defines a wrapper for the actual Pinpoint service implementation (which resides in pinpoint/go/service
).
pinpointService
struct: Implements the BackendService
interface.NewPinpointService()
: Creates a new instance, taking a Temporal provider and a rate limiter as arguments. This indicates that Pinpoint operations might be rate-limited and potentially involve Temporal workflows.roles.Editor
to access Pinpoint functionalities. This is a good example of how specific services define their own access control rules.
shared/authorization.go
:
AuthorizationPolicy
struct: A simple struct used by BackendService
implementations to declare their authorization requirements. This includes whether unauthenticated access is permitted, a list of roles authorized for the entire service, and a map for method-specific role authorizations. This promotes a consistent way for services to define their security posture.
client/backendclientutil.go
: This utility file provides helper functions for creating gRPC clients to connect to the backend service itself (or specific services hosted by it).
getGrpcConnection()
: Abstracts the logic for establishing a gRPC connection. It handles both insecure (typically for local development/testing) and secure connections. For secure connections, it uses TLS (with InsecureSkipVerify: true
as it's intended for internal GKE cluster communication) and OAuth2 for authentication, obtaining tokens for the service account running the client process.NewPinpointClient()
, NewAnomalyGroupServiceClient()
, NewCulpritServiceClient()
: These are factory functions that simplify the creation of typed gRPC clients for the specific services hosted on the backend. They first check if the backend service is configured/enabled before attempting to create a connection. This pattern makes it easy for other internal services to consume the APIs provided by this backend.
backendserver/main.go
: This is the entry point for the backend server executable.
urfave/cli
library to define a command-line interface.run
command initializes and starts the Backend
service using the backend.New()
constructor and then calls b.Serve()
.config.BackendFlags
) and passes them to the backend
package. It doesn't instantiate stores or notifiers directly, relying on the backend.New
(and subsequently initialize
) to create them based on the loaded configuration if nil
is passed.
Workflow Example: Handling a gRPC Request
client/backendclientutil.go
) to make a call to a specific method on a service hosted by the backend (e.g., Pinpoint.ScheduleJob
).Backend
server's listener (b.lisGRPC
).grpc.Server
routes the request to the appropriate service implementation (e.g., pinpointService
).grpcsp.ServerPolicy
): Before the service method is executed, the UnaryInterceptor
configured in backend.go
(which uses b.serverAuthPolicy
) intercepts the call. Incoming gRPC Request --> UnaryInterceptor (grpcsp) | V Check Auth Policy for Service/Method (defined by pinpointService.GetAuthorizationPolicy()) | V Allow/Deny ----> Yes: Proceed to service method No: Return error
pinpointService
(which delegates to the actual pinpoint_service.PinpointServer
implementation) is invoked.
Configuration and Initialization:
The system relies heavily on a configuration file (specified by flags.ConfigFilename
, often demo.json
for local development as seen in backend_test.go
and testdata/demo.json
). This file dictates:
data_store_config
).notify_config
).backend_host_url
), which it might use if it needs to call itself or if other components need to discover it.temporal_config
- though not explicitly in demo.json
, it's checked in backend.go
).The initialize
function in backend.go
is responsible for parsing this configuration and setting up all necessary dependencies like database connections, the Temporal client, and the culprit notifier. The use of builder functions (e.g., builders.NewAnomalyGroupStoreFromConfig
) allows the system to be flexible with regard to the actual implementations of these components, as long as they conform to the required interfaces.
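A hedged sketch of the BackendService contract and the registration loop described above; the field and method shapes are illustrative and may not match backend.go exactly.

```go
package backend

import "google.golang.org/grpc"

// AuthorizationPolicy mirrors the shared policy struct described above; the
// field names are illustrative.
type AuthorizationPolicy struct {
	AllowUnauthenticated  bool
	AuthorizedRoles       []string
	MethodAuthorizedRoles map[string][]string
}

// BackendService sketches the contract each hosted service fulfills so the
// backend can register it and wire up authorization.
type BackendService interface {
	GetServiceDescriptor() grpc.ServiceDesc
	RegisterGrpc(server *grpc.Server)
	GetAuthorizationPolicy() AuthorizationPolicy
}

// registerServices registers every hosted service with the main gRPC server,
// mirroring the configureServices/registerServices helpers described above.
func registerServices(server *grpc.Server, services []BackendService) {
	for _, s := range services {
		s.RegisterGrpc(server)
	}
}
```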
This backend module serves as a crucial intermediary, enhancing the robustness and maintainability of the Perf system by providing a well-defined internal API layer.
The go/bug
module is designed to facilitate the creation of URLs for reporting bugs or regressions identified within the Skia performance monitoring system. Its primary purpose is to dynamically generate these URLs based on a predefined template and specific details about the identified issue. This approach allows for flexible integration with various bug tracking systems, as the URL structure can be configured externally.
Core Functionality and Design:
The module centers around the concept of URI templates. Instead of hardcoding URL formats for specific bug trackers, it uses a template string that contains placeholders for relevant information. This makes the system adaptable to changes in bug tracker URL schemes or the adoption of new trackers without requiring code modifications.
The key function, Expand
, takes a URI template and populates it with details about the regression. These details include:
clusterLink
: A URL pointing to the specific performance data cluster that exhibits the regression. This provides direct context for anyone investigating the bug.
c (a provider.Commit): Information about the specific commit suspected of causing the regression. This includes the commit's URL, allowing for easy navigation to the code change. The use of the provider.Commit
type from perf/go/git/provider
indicates an integration with a system that can furnish commit details.message
: A user-provided message describing the regression. This allows the reporter to add specific observations or context.The Expand
function utilizes the gopkg.in/olivere/elastic.v5/uritemplates
library to perform the actual substitution of placeholders in the template string with the provided values. This library handles URL encoding of the substituted values, ensuring the generated URL is valid.
Key Components/Files:
bug.go
: This file contains the core logic for expanding URI templates.
Expand(uriTemplate string, clusterLink string, c provider.Commit, message string) string
: This is the primary function responsible for generating the bug reporting URL. It takes the template and the contextual information as input and returns the fully formed URL. If the template expansion fails (e.g., due to a malformed template), it logs an error using go.skia.org/infra/go/sklog
and returns an empty string or a partially formed URL depending on the nature of the error.ExampleExpand(uriTemplate string) string
: This function serves as a utility or example for demonstrating how to use the Expand
function. It calls Expand
with pre-defined example data for the cluster link, commit, and message. This can be useful for testing the template expansion logic or for providing a quick way to see how a given template would be populated.bug_test.go
: This file contains unit tests for the functionality in bug.go
.
TestExpand(t *testing.T)
: This test function verifies that the Expand
function correctly substitutes the provided values into the URI template and produces the expected URL. It uses the github.com/stretchr/testify/assert
library for assertions, ensuring that the generated URL matches the anticipated output, including proper URL encoding.Workflow:
A typical workflow involving this module would be:
Configuration: An external system (e.g., the Perf frontend) is configured with a URI template for the desired bug tracking system. This template will contain placeholders like {cluster_url}
, {commit_url}
, and {message}
. Example Template: https://bugtracker.example.com/new?summary=Regression%20Found&description=Regression%20details:%0ACluster:%20{cluster_url}%0ACommit:%20{commit_url}%0AMessage:%20{message}
Regression Identification: A user or an automated system identifies a performance regression.
Information Gathering: The system gathers the necessary information:
URL Generation: The Expand
function in go/bug
is called with the configured URI template and the gathered information.
template := "https://bugtracker.example.com/new?summary=Regression%20Found&description=Cluster:%20{cluster_url}%0ACommit:%20{commit_url}%0AMessage:%20{message}"
clusterURL := "https://perf.skia.org/t/?some_params"
commitData := provider.Commit{URL: "https://skia.googlesource.com/skia/+show/abcdef123"}
userMessage := "Significant drop in frame rate on TestXYZ."
bugReportURL := bug.Expand(template, clusterURL, commitData, userMessage)
Redirection/Display: The generated bugReportURL
is then presented to the user, who can click it to navigate to the bug tracker with the pre-filled information.
This design decouples the bug reporting logic from the specifics of any single bug tracking system, promoting flexibility and maintainability. The use of a standard URI template expansion library ensures robustness in URL generation.
The builders
module is responsible for constructing various core components of the Perf system based on instance configuration. This centralized approach to object creation prevents cyclical dependencies that could arise if configuration objects were directly responsible for building the components they configure. The module acts as a factory, taking an InstanceConfig
and returning fully initialized and operational objects like data stores, file sources, and caches.
The primary design goal is to decouple the configuration of Perf components from their instantiation. This allows for cleaner dependencies and makes it easier to manage the lifecycle of different parts of the system. For example, a TraceStore
needs a database connection, but the InstanceConfig
that defines the database connection string shouldn't also be responsible for creating the TraceStore
itself. The builders
module bridges this gap.
Key components and their instantiation logic:
builders.go
: This is the central file containing all the builder functions.NewDBPoolFromConfig
): This function is crucial as many other components rely on a database connection. It establishes a connection pool to the configured database (e.g., CockroachDB, Spanner).InstanceConfig
, configures pool parameters like maximum and minimum connections, and sets up a logging adapter (pgxLogAdaptor
) to integrate database logs with the application's logging system.singletonPool
. This ensures that only one database connection pool is created per application instance, preventing resource exhaustion and ensuring consistent database interaction. A mutex (singletonPoolMutex
) protects the creation of this singleton.timeout.New
wrapper. This enforces that all database operations are performed within a context that has a timeout, preventing indefinite blocking. InstanceConfig --> NewDBPoolFromConfig --> pgxpool.ParseConfig | +-> pgxpool.ConnectConfig --> rawPool | +-> timeout.New(rawPool) --> singletonPool (if schema check passes)
NewPerfGitFromConfig
): Constructs a perfgit.Git
object, which provides an interface to Git repository data.getDBPool
(which in turn uses NewDBPoolFromConfig
) and then instantiates perfgit.New
with this pool and the instance configuration.NewTraceStoreFromConfig
): Creates a tracestore.TraceStore
for managing performance trace data.TraceParamStore
(for managing trace parameter sets) and then instantiates the appropriate sqltracestore
.NewMetadataStoreFromConfig
): Creates a tracestore.MetadataStore
for managing metadata associated with traces.TraceStore
, it obtains a database pool and then creates an sqltracestore.NewSQLMetadataStore
.getDBPool
and then instantiate their respective SQL-backed store implementations (e.g., sqlalertstore
, sqlregression2store
).NewRegressionStoreFromConfig
has a conditional logic based on instanceConfig.UseRegression2
to instantiate either sqlregression2store
or sqlregressionstore
. This allows for migrating to a new regression store implementation controlled by configuration.NewGraphsShortcutStoreFromConfig
can return a cached version (graphsshortcutstore.NewCacheGraphsShortcutStore
) if localToProd
is true, indicating a local development or testing environment where a simpler in-memory cache might be preferred over a database-backed store.NewSourceFromConfig
): Creates a file.Source
which defines where Perf ingests data from (e.g., Google Cloud Storage, local directories).switch
statement based on instanceConfig.IngestionConfig.SourceConfig.SourceType
to instantiate either a gcssource
or a dirsource
.NewIngestedFSFromConfig
): Creates a fs.FS
(file system interface) that provides access to already ingested files.NewSourceFromConfig
, it switches on the source type to return a GCS or local file system implementation.GetCacheFromConfig
): Returns a cache.Cache
instance (either Redis-backed or local in-memory).instanceConfig.QueryConfig.CacheConfig.Type
to determine whether to create a redisCache
(connecting to a Google Cloud Redis instance) or a localCache
.The getDBPool
helper function is used internally by many builder functions. It acts as a dispatcher based on instanceConfig.DataStoreConfig.DataStoreType
, calling NewDBPoolFromConfig
with appropriate schema checking flags. This abstracts the direct call to NewDBPoolFromConfig
and centralizes the logic for selecting the database type.
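To illustrate the singleton behavior guarded by singletonPoolMutex, here is a simplified sketch using pgx v4's pgxpool directly; the real NewDBPoolFromConfig additionally configures pool sizes, a logging adapter, timeouts, and optional schema checks, all of which are omitted here.

```go
package builders

import (
	"context"
	"sync"

	"github.com/jackc/pgx/v4/pgxpool"
)

var (
	singletonPool      *pgxpool.Pool
	singletonPoolMutex sync.Mutex
)

// newDBPool returns a process-wide database pool, creating it on first use.
// This is a simplified sketch of the singleton pattern described above.
func newDBPool(ctx context.Context, connectionString string) (*pgxpool.Pool, error) {
	singletonPoolMutex.Lock()
	defer singletonPoolMutex.Unlock()
	if singletonPool != nil {
		return singletonPool, nil // reuse the existing pool for the rest of the process
	}
	cfg, err := pgxpool.ParseConfig(connectionString)
	if err != nil {
		return nil, err
	}
	pool, err := pgxpool.ConnectConfig(ctx, cfg)
	if err != nil {
		return nil, err
	}
	singletonPool = pool
	return singletonPool, nil
}
```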
The test file (builders_test.go
) ensures that these builder functions correctly instantiate objects and handle different configurations, including invalid ones. A notable aspect of the tests is the management of the singletonPool
. Since NewDBPoolFromConfig
creates a singleton, tests that require fresh database instances must explicitly clear this singleton (singletonPool = nil
) before calling the builder to avoid reusing a connection from a previous test. This is handled in newDBConfigForTest
.
The chromeperf
module facilitates interaction with the Chrome Perf backend, which is the system of record for performance data for Chromium. This module allows Perf to send and receive data from Chrome Perf.
The primary responsibility of this module is to abstract the communication details with the Chrome Perf API. It provides a typed Go interface to various Chrome Perf endpoints, handling request formatting, authentication, and response parsing.
This interaction is crucial for:
sqlreversekeymapstore
submodule, helps manage these differences.chromeperfClient.go
This file defines the generic ChromePerfClient
interface and its implementation, chromePerfClientImpl
. This is the core component responsible for making HTTP GET and POST requests to the Chrome Perf API.
Why: Abstracting the HTTP client allows for easier testing (by mocking the client) and centralizes the logic for handling authentication (using OAuth2 Google default token source) and constructing target URLs.
How:
google.DefaultTokenSource
for authentication.generateTargetUrl
constructs the correct API endpoint URL, differentiating between the Skia-Bridge proxy (https://skia-bridge-dot-chromeperf.appspot.com
) and direct calls to the legacy Chrome Perf endpoint (https://chromeperf.appspot.com
). The Skia-Bridge is generally preferred.SendGetRequest
and SendPostRequest
handle the actual HTTP communication, JSON marshalling/unmarshalling, and basic error handling, including checking for accepted HTTP status codes.
Example workflow for a POST request:
Caller -> chromePerfClient.SendPostRequest(ctx, "anomalies", "add", requestBody, &responseObj, []int{200}) | | (Serializes requestBody to JSON) v |--------------------------------------------------------------------------------------------------------| | generateTargetUrl("https://skia-bridge-dot-chromeperf.appspot.com/anomalies/add") | |--------------------------------------------------------------------------------------------------------| | v httpClient.Post(targetUrl, "application/json", jsonBody) | v (HTTP Request to Chrome Perf API) | v (Receives HTTP Response) | v (Checks if response status code is in acceptedStatusCodes) | v (Deserializes response body into responseObj) | v Caller (receives populated responseObj or error)
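The POST flow above can be approximated by a small helper like the one below. It is a simplified sketch: the function name and signature are illustrative, and the OAuth2 token source and Skia-Bridge URL construction of the real client are omitted.

```go
package chromeperf

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// sendPostRequest serializes the request body, posts it to the target URL,
// checks the status code against the accepted set, and decodes the JSON
// response into responseObj.
func sendPostRequest(ctx context.Context, client *http.Client, targetURL string, requestBody, responseObj any, acceptedStatusCodes []int) error {
	jsonBody, err := json.Marshal(requestBody)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, targetURL, bytes.NewReader(jsonBody))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	accepted := false
	for _, code := range acceptedStatusCodes {
		if resp.StatusCode == code {
			accepted = true
			break
		}
	}
	if !accepted {
		return fmt.Errorf("unexpected status code %d from %s", resp.StatusCode, targetURL)
	}
	return json.NewDecoder(resp.Body).Decode(responseObj)
}
```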
anomalyApi.go
This file builds upon chromeperfClient.go
to provide a specialized client for interacting with the /anomalies
endpoint in Chrome Perf. It defines the AnomalyApiClient
interface and its implementation anomalyApiClientImpl
.
Why: This client encapsulates the logic specific to anomaly-related operations, such as formatting requests for reporting regressions or fetching anomaly details, and parsing the specific JSON structures returned by these endpoints. It also handles the translation between Perf‘s trace identifiers and Chrome Perf’s test_path
format.
How:
ReportRegression
: Constructs a ReportRegressionRequest
and sends it to the anomalies/add
endpoint. This is how Perf informs Chrome Perf about a new regression.GetAnomalyFromUrlSafeKey
: Fetches details for a specific anomaly using its key from the anomalies/get
endpoint.GetAnomalies
: Retrieves anomalies for a list of tests within a specific commit range (min_revision
, max_revision
) by calling the anomalies/find
endpoint.traceNameToTestPath
converts Perf‘s comma-separated key-value trace names (e.g., ,benchmark=Blazor,bot=MacM1,...
) into Chrome Perf’s slash-separated test_path
(e.g., ChromiumPerf/MacM1/Blazor/...
).perfGit.CommitNumberFromGitHash
to resolve these.GetAnomaliesTimeBased
: Similar to GetAnomalies
, but fetches anomalies based on a time range (start_time
, end_time
) by calling the anomalies/find_time
endpoint.GetAnomaliesAroundRevision
: Fetches anomalies that occurred around a specific revision number.traceNameToTestPath
: This function is key for interoperability. It parses a Perf trace name (which is a string of key-value pairs) and constructs the corresponding test_path
string that Chrome Perf expects. It also handles an experimental feature (EnableSkiaBridgeAggregation
) which can modify how test paths are generated, particularly for aggregated statistics (e.g., ensuring testName_avg
is used if the stat
is value
).statToSuffixMap
and hasSuffixInTestValue
addresses historical inconsistencies where test names in Perf might or might not include statistical suffixes (like _avg
, _max
). The goal is to derive the correct Chrome Perf test_path
.
Workflow for fetching anomalies:
Perf UI/Backend -> anomalyApiClient.GetAnomalies(ctx, ["trace_A,key=val", "trace_B,key=val"], 100, 200) | v (For each traceName) traceNameToTestPath("trace_A,key=val") -> "chromeperf/test/path/A" | v chromeperfClient.SendPostRequest(ctx, "anomalies", "find", {Tests: ["path/A", "path/B"], MinRevision: "100", MaxRevision: "200"}, &anomaliesResponse, ...) | v (Parses anomaliesResponse, potentially resolving commit hashes to commit numbers) | v Perf UI/Backend (receives AnomalyMap)
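A rough sketch of the trace-name-to-test_path conversion used in the flow above. The key ordering and the omission of statistic-suffix handling are simplifying assumptions; the real traceNameToTestPath covers more keys and the EnableSkiaBridgeAggregation behavior.

```go
package chromeperf

import (
	"fmt"
	"strings"
)

// traceNameToTestPathSketch converts a Perf trace name such as
// ",benchmark=Blazor,bot=MacM1,master=ChromiumPerf,test=timeToFirstByte," into
// a slash-separated, Chrome Perf style test_path.
func traceNameToTestPathSketch(traceName string) (string, error) {
	params := map[string]string{}
	for _, pair := range strings.Split(strings.Trim(traceName, ","), ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return "", fmt.Errorf("malformed key=value pair %q", pair)
		}
		params[kv[0]] = kv[1]
	}
	// Assumed ordering of path components: master/bot/benchmark/test.
	ordered := []string{"master", "bot", "benchmark", "test"}
	parts := []string{}
	for _, key := range ordered {
		if v, ok := params[key]; ok {
			parts = append(parts, v)
		}
	}
	if len(parts) == 0 {
		return "", fmt.Errorf("no recognized keys in trace name %q", traceName)
	}
	return strings.Join(parts, "/"), nil
}
```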
alertGroupApi.go
This file provides a client for interacting with Chrome Perf's /alert_group
API, specifically to get details about alert groups. An alert group in Chrome Perf typically corresponds to a set of related anomalies (regressions).
Why: When Perf displays information about an alert (which might have originated from Chrome Perf), it needs to fetch details about the associated alert group, such as the specific anomalies included, the commit range, and other metadata.
How:
GetAlertGroupDetails
: Takes an alert group key and calls the alert_group/details
endpoint on Chrome Perf.AlertGroupDetails
struct holds the response, including a map of Anomalies
(where the value is the Chrome Perf test_path
) and start/end commit numbers/hashes.GetQueryParams
and GetQueryParamsPerTrace
: These methods are utilities to transform the AlertGroupDetails
into query parameters that can be used to construct URLs for Perf's own explorer page. This allows users to easily navigate from a Chrome Perf alert to viewing the corresponding data in Perf.GetQueryParams
aggregates all test path components (masters, bots, benchmarks, etc.) from all anomalies in the group into a single set of parameters.GetQueryParamsPerTrace
generates a separate set of query parameters for each individual anomaly in the alert group. Both methods involve parsing the test_path from Chrome Perf back into individual components.
Workflow for getting alert group details:
Perf Backend (e.g., when processing an incoming alert from Chrome Perf) | v alertGroupApiClient.GetAlertGroupDetails(ctx, "chrome_perf_group_key") | v chromeperfClient.SendGetRequest(ctx, "alert_group", "details", {key: "chrome_perf_group_key"}, &alertGroupResponse) | v (alertGroupResponse is populated) | v alertGroupResponse.GetQueryParams(ctx) -> Perf Explorer URL query params
store.go
and the sqlreversekeymapstore
submodule
store.go
defines the ReverseKeyMapStore
interface. The sqlreversekeymapstore
directory and its schema
subdirectory provide an SQL-based implementation of this interface.
Why: Test paths in Chrome Perf can contain characters that are considered “invalid” or are handled differently by Perf‘s parameter parsing (e.g., Perf’s trace keys are comma-separated key-value pairs, and the values themselves should ideally not interfere with this). When data is ingested into Perf from Chrome Perf, or when Perf constructs test paths to query Chrome Perf, these “invalid” characters in Chrome Perf test path components (like subtest names) might be replaced (e.g., with underscores).
This creates a problem: if Perf has test/foo_bar
and Chrome Perf has test/foo?bar
, Perf needs a way to know that foo_bar
corresponds to foo?bar
when querying Chrome Perf. The ReverseKeyMapStore
is designed to store these mappings.
How:
sqlreversekeymapstore/schema/schema.go
defines the SQL table schema ReverseKeyMapSchema
with columns:ModifiedValue
: The value as it appears in Perf (e.g., foo_bar
).ParamKey
: The parameter key this value belongs to (e.g., subtest_1
).OriginalValue
: The original value as it was in Chrome Perf (e.g., foo?bar
).ModifiedValue
and ParamKey
.sqlreversekeymapstore/sqlreversekeymapstore.go
implements the ReverseKeyMapStore
interface using a SQL database (configurable for CockroachDB or Spanner via different SQL statements).Create
: Inserts a new mapping. If a mapping for the ModifiedValue
and ParamKey
already exists (conflict), it does nothing. This is important because the mapping should be stable.Get
: Retrieves the OriginalValue
given a ModifiedValue
and ParamKey
.
This store is likely used during the process of converting between Perf trace parameters and Chrome Perf test paths, especially when generating requests to Chrome Perf. If a parameter value in Perf might have been modified from its Chrome Perf original, this store can be queried to get the original value needed for the Chrome Perf API call. The exact point of integration for creating these mappings (i.e., when are Create
calls made) is not explicitly detailed within this module but would typically happen when Perf first encounters/ingests a test path from Chrome Perf that requires modification.
For example, if anomalyApi.go
needs to construct a test_path
to query Chrome Perf based on parameters from Perf:
test=my_test, subtest_1=value_with_question_mark
test_path
segment for subtest_1
: - Call reverseKeyMapStore.Get(ctx, "value_with_question_mark", "subtest_1")
. - If it returns an original value like "value?with?question?mark"
, use that for the Chrome Perf API call. - Otherwise, use "value_with_question_mark"
.The store.go
file simply defines the interface, allowing for different backend implementations of this mapping store if needed, though sqlreversekeymapstore
is the provided concrete implementation.
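A hedged sketch of the mapping interface and a lookup helper for the substitution step described above; the method signatures are illustrative and may not match store.go exactly.

```go
package chromeperf

import "context"

// ReverseKeyMapStore sketches the mapping contract described above.
type ReverseKeyMapStore interface {
	// Create records that modifiedValue (as stored in Perf) corresponds to
	// originalValue (as used by Chrome Perf) for the given param key.
	Create(ctx context.Context, modifiedValue, paramKey, originalValue string) error
	// Get returns the original Chrome Perf value for a possibly-modified Perf value.
	Get(ctx context.Context, modifiedValue, paramKey string) (string, error)
}

// originalOrSame returns the Chrome Perf value to use in a test_path segment,
// falling back to the Perf value when no mapping exists. This mirrors the
// example lookup above and is illustrative only.
func originalOrSame(ctx context.Context, store ReverseKeyMapStore, value, paramKey string) string {
	if original, err := store.Get(ctx, value, paramKey); err == nil && original != "" {
		return original
	}
	return value
}
```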
The clustering2
module is responsible for grouping similar performance traces together using k-means clustering. This helps in identifying patterns and regressions in performance data by analyzing the collective behavior of traces rather than individual ones. The core idea is to represent each trace as a point in a multi-dimensional space and then find k
clusters of these points.
K-means is a well-understood and relatively efficient clustering algorithm suitable for the scale of performance data encountered. It partitions data into k
distinct, non-overlapping clusters. Each data point belongs to the cluster with the nearest mean (cluster centroid). This approach allows for the summarization of large numbers of traces into a smaller set of representative “shapes” or behaviors.
clustering.go
This file contains the primary logic for performing k-means clustering on performance traces.
ClusterSummary
: This struct represents a single cluster found by the k-means algorithm.
Centroid
: The average shape of all traces in this cluster. This is the core representation of the cluster's behavior.Keys
: A list of identifiers for the traces belonging to this cluster. These are sorted by their distance to the Centroid
, allowing users to quickly see the most representative traces. This is not serialized to JSON to keep the payload manageable, as it can be very large.Shortcut
: An identifier for a pre-computed set of Keys
, used for efficient retrieval and display in UIs.ParamSummaries
: A breakdown of the parameter key-value pairs present in the cluster and their prevalence (see valuepercent.go
). This helps in understanding what distinguishes this cluster (e.g., “all traces in this cluster are for arch=x86
”).StepFit
: Contains information about how well the Centroid
fits a step function. This is crucial for identifying regressions or improvements that manifest as sudden shifts in performance.StepPoint
: The specific data point (commit/timestamp) where the step (if any) in the Centroid
is detected.Num
: The total number of traces in this cluster.Timestamp
: Records when the cluster analysis was performed.NotificationID
: Stores the ID of any alert or notification sent regarding a significant step change detected in this cluster.ClusterSummaries
: A container for all the ClusterSummary
objects produced by a single clustering run, along with metadata like the K
value used and the StdDevThreshold
.
CalculateClusterSummaries
function: This is the main entry point for the clustering process.
dataframe.DataFrame
(which holds traces and their metadata) and converts each trace into a kmeans.Clusterable
object. The ctrace2.NewFullTrace
function is used here, which likely involves some form of normalization or feature extraction to make traces comparable. The stddevThreshold
parameter is used during this conversion, potentially to filter out noisy or flat traces.chooseK
): K-means requires an initial set of k
centroids. This function randomly selects k
traces from the input data to serve as the initial centroids. Random selection is a common and simple initialization strategy.kmeans.Do
function performs one iteration of the k-means algorithm:ctrace2.CalculateCentroid
function is likely responsible for computing the mean of a set of traces.MAX_KMEANS_ITERATIONS
or until the change in totalError
(sum of squared distances from each point to its centroid) between iterations falls below KMEAN_EPSILON
. This convergence criterion prevents unnecessary computations once the clusters stabilize.Progress
callback can be provided to monitor the clustering process, reporting the totalError
at each iteration.getClusterSummaries
): After the k-means algorithm converges, this function takes the final centroids and the original observations to generate ClusterSummary
objects for each cluster.ParamSummaries
(see valuepercent.go
) to describe the common characteristics of traces in that cluster.stepfit.GetStepFitAtMid
) on the cluster's centroid to identify significant performance shifts. The interesting
parameter likely defines a threshold for what constitutes a noteworthy step change, and stepDetection
specifies the algorithm or method used for step detection.ClusterSummary.Keys
lists the most representative traces first. A limited number of sample keys (config.MaxSampleTracesPerCluster
) are stored.ClusterSummary
objects are sorted, likely by the magnitude or significance of the detected step (StepFit.Regression
), to highlight the most impactful changes first.
Constants:
K
: The default number of clusters to find. 50 is chosen as a balance between granularity and computational cost.MAX_KMEANS_ITERATIONS
: A safeguard against non-converging k-means runs.KMEAN_EPSILON
: A threshold to determine convergence, balancing precision with computation time.
valuepercent.go
This file defines how to summarize and present the parameter distributions within a cluster.
ValuePercent
struct: Represents a specific parameter key-value pair (e.g., “config=8888”) and the percentage of traces in a cluster that have this pair. This provides a quantitative measure of how characteristic a parameter is for a given cluster.
SortValuePercentSlice
function: This is crucial for making the ParamSummaries
in ClusterSummary
human-readable and informative. The goal is to:
This complex sorting logic ensures that the most dominant and distinguishing parameters for a cluster are presented prominently. For example:
config=8888 90% config=565 10% arch=x86 80% arch=arm 20%
Here, “config” is listed before “arch” because its top value (“config=8888”) has a higher percentage (90%) than the top value for “arch” (“arch=x86” at 80%).
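The ordering illustrated by this example can be sketched as follows: values are grouped by parameter key, sorted by percentage within each key, and keys are ordered by the percentage of their top value. The tie-breaking rules of the real SortValuePercentSlice may differ, so treat this as an approximation under those assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ValuePercent pairs a "key=value" string with the percentage of traces in a
// cluster that carry it.
type ValuePercent struct {
	Value   string // e.g. "config=8888"
	Percent int
}

// sortValuePercentSketch groups values by their parameter key, sorts each
// group by percent, and orders groups by the percent of their highest value.
func sortValuePercentSketch(values []ValuePercent) []ValuePercent {
	byKey := map[string][]ValuePercent{}
	for _, vp := range values {
		key := strings.SplitN(vp.Value, "=", 2)[0]
		byKey[key] = append(byKey[key], vp)
	}
	type group struct {
		top    int
		sorted []ValuePercent
	}
	groups := []group{}
	for _, vps := range byKey {
		sort.Slice(vps, func(i, j int) bool { return vps[i].Percent > vps[j].Percent })
		groups = append(groups, group{top: vps[0].Percent, sorted: vps})
	}
	// Keys whose dominant value covers more of the cluster come first.
	sort.Slice(groups, func(i, j int) bool { return groups[i].top > groups[j].top })
	result := []ValuePercent{}
	for _, g := range groups {
		result = append(result, g.sorted...)
	}
	return result
}

func main() {
	input := []ValuePercent{
		{"arch=x86", 80}, {"config=8888", 90}, {"arch=arm", 20}, {"config=565", 10},
	}
	for _, vp := range sortValuePercentSketch(input) {
		fmt.Printf("%s %d%%\n", vp.Value, vp.Percent)
	}
}
```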
Input: DataFrame (traces, headers), K, StdDevThreshold, ProgressCallback, InterestingThreshold, StepDetectionMethod 1. [clustering.go: CalculateClusterSummaries] a. Initialize empty list of observations. b. For each trace in DataFrame.TraceSet: i. Create ClusterableTrace (ctrace2.NewFullTrace) using trace data and StdDevThreshold. ii. Add to observations list. c. If no observations, return error. d. [clustering.go: chooseK] i. Randomly select K observations to be initial centroids. e. Initialize lastTotalError = 0.0 f. Loop MAX_KMEANS_ITERATIONS times OR until convergence: i. [kmeans.Do] -> new_centroids 1. Assign each observation to its closest centroid (from previous iteration or initial). 2. Recalculate centroids (ctrace2.CalculateCentroid) based on assigned observations. ii. [kmeans.TotalError] -> currentTotalError iii. If ProgressCallback provided, call it with currentTotalError. iv. If |currentTotalError - lastTotalError| < KMEAN_EPSILON, break loop. v. lastTotalError = currentTotalError g. [clustering.go: getClusterSummaries] -> clusterSummaries i. [kmeans.GetClusters] -> allClusters (list of observations per centroid) ii. For each cluster in allClusters and its corresponding centroid: 1. Create new ClusterSummary. 2. [clustering.go: getParamSummaries] (using cluster members) -> ParamSummaries a. [clustering.go: GetParamSummariesForKeys] i. Count occurrences of each param=value in cluster keys. ii. Convert counts to ValuePercent structs. iii. [valuepercent.go: SortValuePercentSlice] -> sorted ParamSummaries. 3. [stepfit.GetStepFitAtMid] (on centroid values, StdDevThreshold, InterestingThreshold, StepDetectionMethod) -> StepFit, StepPoint. 4. Set ClusterSummary.Num = number of members in cluster. 5. Sort cluster members by distance to centroid. 6. Populate ClusterSummary.Keys with top N sorted member keys. 7. Populate ClusterSummary.Centroid with centroid values. iii. Sort all ClusterSummary objects (e.g., by StepFit.Regression). h. Populate ClusterSummaries struct with results, K, and StdDevThreshold. i. Return ClusterSummaries. Output: ClusterSummaries object or error.
This process effectively transforms raw trace data into a structured summary that highlights significant patterns and changes, facilitating performance analysis and regression detection.
The /go/config
module defines the configuration structure for Perf instances and provides utilities for loading, validating, and managing these configurations. It plays a crucial role in customizing the behavior of a Perf instance, from data ingestion and storage to alert notifications and UI presentation.
Core Responsibilities and Design:
The primary responsibility of this module is to define and manage the InstanceConfig
struct. This struct is a comprehensive container for all settings that govern a Perf instance. The design emphasizes:
InstanceConfig
struct (config.go
), the module provides a single source of truth. This simplifies understanding the state of an instance and reduces the chances of configuration drift.encoding/json
for this, making it easy to create, read, and modify configurations./go/config/validate/validate.go
, /go/config/validate/instanceConfigSchema.json
).instanceConfigSchema.json
) formally defines the structure and types of the InstanceConfig
. This schema is automatically generated from the Go struct definition using the /go/config/generate/main.go
program, ensuring the schema stays in sync with the code.validate.InstanceConfigFromFile
function uses this schema to validate a configuration file before attempting to deserialize it. This allows for early detection of malformed or incomplete configurations.BackendFlags
, FrontendFlags
, IngestFlags
, and MaintenanceFlags
(config.go
). These structs group related command-line flags and provide methods (AsCliFlags
) to convert them into cli.Flag
slices, compatible with the github.com/urfave/cli/v2
library. This design keeps flag definitions organized and associated with the components they configure.InstanceConfig
is designed to be extensible. New configuration options can be added as new fields to the relevant sub-structs. The JSON schema generation and validation mechanisms will automatically adapt to these changes.Key Components and Files:
config.go
: This is the heart of the module.InstanceConfig
struct, which aggregates various sub-configuration structs like AuthConfig
, DataStoreConfig
, IngestionConfig
, GitRepoConfig
, NotifyConfig
, IssueTrackerConfig
, AnomalyConfig
, QueryConfig
, TemporalConfig
, and DataPointConfig
. Each of these sub-structs groups settings related to a specific aspect of the Perf system (e.g., authentication, data storage, data ingestion).DataStoreType
, SourceType
, GitAuthType
, GitProvider
, TraceFormat
) to provide clear and constrained options for certain configuration values.DurationAsString
, a custom type for handling time.Duration
serialization and deserialization as strings in JSON, which is more human-readable than nanosecond integers. It also provides a custom JSON schema for this type.MaxSampleTracesPerCluster
, MinStdDev
, GotoRange
, and QueryMaxRunTime
are defined here, providing default values or limits used across the application./go/config/validate/validate.go
:InstanceConfig
beyond what the JSON schema can enforce. This includes semantic checks, such as ensuring that required fields are present based on the values of other fields (e.g., API keys for issue tracker notifications).InstanceConfigFromFile
function is the primary entry point for loading and validating a configuration file. It first performs schema validation and then calls the Validate
function for further business logic checks.NotifyConfig
by attempting to format them with sample data. This helps catch template syntax errors early./go/config/validate/instanceConfigSchema.json
:InstanceConfig
JSON files. It is used by validate.go
to perform initial validation of configuration files./go/config/generate/main.go
:instanceConfigSchema.json
file based on the InstanceConfig
struct definition in config.go
. This ensures that the schema is always up-to-date with the Go code. The //go:generate
directive at the top of the file allows for easy regeneration of the schema.config_test.go
and /go/config/validate/validate_test.go
:DurationAsString
), and validation logic. The tests for validate.go
include checks against actual configuration files used in production (//perf:configs
), ensuring that the validation logic is robust and correctly handles real-world scenarios.
Workflows:
1. Loading and Validating a Configuration File:
User provides config file path (e.g., "configs/nano.json") | V Application calls validate.InstanceConfigFromFile("configs/nano.json") | V validate.go: Reads the JSON file content. | V validate.go: Validates content against instanceConfigSchema.json (using jsonschema.Validate). | \ | (If schema violation) \ V V Error returned with schema violations. Deserializes JSON into config.InstanceConfig struct. | V validate.go: Calls Validate(instanceConfig) for further business logic checks. | (e.g., API key presence, template validity) | | (If validation error) V Error returned. | V (If all valid) Returns the populated config.InstanceConfig struct. | V Application sets config.Config = returnedInstanceConfig | V Perf instance uses config.Config for its operations.
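A trimmed sketch of the deserialization step in the workflow above, including the string-based duration handling that DurationAsString provides. The struct here is a heavily reduced, hypothetical stand-in for InstanceConfig, its field names are illustrative, and the JSON-schema validation performed by the validate package is omitted.

```go
package config

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// DurationAsString serializes a time.Duration as a human-readable string
// (e.g. "5m") rather than nanoseconds; the real type also supplies a custom
// JSON schema.
type DurationAsString time.Duration

func (d DurationAsString) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(d).String())
}

func (d *DurationAsString) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = DurationAsString(parsed)
	return nil
}

// instanceConfigSketch is a heavily trimmed stand-in for InstanceConfig.
type instanceConfigSketch struct {
	URL             string            `json:"URL"`
	RefreshInterval DurationAsString  `json:"refresh_interval"`
}

// loadConfigSketch mirrors only the read-and-deserialize step of the workflow.
func loadConfigSketch(path string) (*instanceConfigSketch, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg instanceConfigSketch
	if err := json.Unmarshal(b, &cfg); err != nil {
		return nil, fmt.Errorf("invalid config %s: %w", path, err)
	}
	return &cfg, nil
}
```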
2. Generating the JSON Schema:
This is typically done during development when the InstanceConfig
struct changes.
Developer modifies config.InstanceConfig struct in config.go | V Developer runs `go generate` in the /go/config/generate directory (or via bazel) | V /go/config/generate/main.go: Calls jsonschema.GenerateSchema("../validate/instanceConfigSchema.json", &config.InstanceConfig{}) | V jsonschema library: Introspects the config.InstanceConfig struct and its fields. | V jsonschema library: Generates a JSON Schema definition. | V /go/config/generate/main.go: Writes the generated schema to /go/config/validate/instanceConfigSchema.json.
The design prioritizes robustness through schema and semantic validation, maintainability through structured Go types and centralized configuration, and ease of use through standard JSON format and command-line flag integration. The separation of schema generation (generate
subdirectory) and validation (validate
subdirectory) keeps concerns distinct.
The ctrace2
module provides the functionality to adapt trace data (represented as a series of floating-point values) for use with k-means clustering algorithms. The primary goal is to transform raw trace data into a format that is suitable for distance calculations and centroid computations, which are fundamental operations in k-means. This involves normalization and handling of missing data points.
In performance analysis, traces often represent measurements over time or across different configurations. Clustering these traces helps identify groups of similar performance characteristics. However, raw trace data might have issues that hinder effective clustering:
The ctrace2
module addresses these by:
vec32.Norm
function from the go/vec32
module is leveraged for this. Before normalization, any missing data points (vec32.MissingDataSentinel
) are filled in using vec32.Fill
, which likely interpolates or uses a similar strategy to replace them.minStdDev
parameter is used during normalization. If the calculated standard deviation of a trace is below this minimum, the minStdDev
value is used instead. This is a practical approach to handle traces with very little variation without excluding them from clustering.ClusterableTrace
Structure: This structure wraps the trace data (Key
and Values
) and implements the kmeans.Clusterable
and kmeans.Centroid
interfaces from the perf/go/kmeans
module. This makes ClusterableTrace
instances directly usable by the k-means algorithm.ctrace.go
: This is the core file of the module.ClusterableTrace
struct:Key
(a string identifier for the trace) and Values
(a slice of float32
representing the normalized data points).Distance(c kmeans.Clusterable) float64
method: Calculates the Euclidean distance between the current ClusterableTrace
and another ClusterableTrace
. This is crucial for the k-means algorithm to determine how similar two traces are. The calculation assumes that both traces have the same number of data points (a guarantee maintained by NewFullTrace
). For each point i in trace1 and trace2: diff_i = trace1.Values[i] - trace2.Values[i] squared_diff_i = diff_i * diff_i Sum all squared_diff_i Distance = Sqrt(Sum)
AsClusterable() kmeans.Clusterable
method: Returns the ClusterableTrace
itself, satisfying the kmeans.Centroid
interface requirement.Dup(newKey string) *ClusterableTrace
method: Creates a deep copy of the ClusterableTrace
with a new key. This is useful when you need to manipulate a trace without affecting the original.NewFullTrace(key string, values []float32, minStdDev float32) *ClusterableTrace
function:ClusterableTrace
instances from raw trace data.key
(string identifier), raw values
([]float32
), and a minStdDev
. 2. Creates a copy of the input values
to avoid modifying the original slice. 3. Calls vec32.Fill()
on the copied values. This step handles missing data points by filling them, likely through interpolation or a similar imputation technique provided by the go/vec32
module. 4. Calls vec32.Norm()
on the filled values, using minStdDev
. This normalizes the trace data so that its standard deviation is effectively 1.0 (or adjusted if the original standard deviation was below minStdDev
). 5. Returns a new ClusterableTrace
with the provided key
and the processed (filled and normalized) values
. Input: key, raw_values, minStdDev ------------------------------------ copied_values = copy(raw_values) filled_values = vec32.Fill(copied_values) normalized_values = vec32.Norm(filled_values, minStdDev) Output: ClusterableTrace{Key: key, Values: normalized_values}
CalculateCentroid(members []kmeans.Clusterable) kmeans.Centroid
function:kmeans.CalculateCentroid
function type. Given a slice of ClusterableTrace
instances (which are members of a cluster), it computes their centroid.float32
(mean
) with the same length as the Values
of the first member trace. 2. It iterates through each member trace in the members
slice. 3. For each member, it iterates through its Values
and adds each value to the corresponding element in the mean
slice. 4. After summing up all values component-wise, it divides each element in the mean
slice by the total number of members
to get the average value for each dimension. 5. It returns a new ClusterableTrace
representing the centroid. The key for this centroid trace is set to CENTROID_KEY
(“special_centroid”). Input: members (list of ClusterableTraces) ------------------------------------------ Initialize mean_values = [0.0, 0.0, ..., 0.0] (same length as members[0].Values) For each member_trace in members: For each i from 0 to len(member_trace.Values) - 1: mean_values[i] = mean_values[i] + member_trace.Values[i] For each i from 0 to len(mean_values) - 1: mean_values[i] = mean_values[i] / len(members) Output: ClusterableTrace{Key: CENTROID_KEY, Values: mean_values}
CENTROID_KEY
constant:
The interaction with the go/vec32
module is crucial for data preprocessing (filling missing values and normalization), while the perf/go/kmeans
module provides the interfaces that ctrace2
implements to be compatible with k-means clustering algorithms.
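Condensing the pieces described in this section, a sketch of the module's core follows (types and signatures are taken from the descriptions above; the exact code in ctrace.go and the vec32/kmeans packages may differ in detail):

```go
// Sketch of the core of ctrace2, condensed from the descriptions above.
package ctrace2

import (
	"math"

	"go.skia.org/infra/go/vec32"
	"go.skia.org/infra/perf/go/kmeans"
)

// CENTROID_KEY is the key given to computed centroid traces.
const CENTROID_KEY = "special_centroid"

// ClusterableTrace wraps a trace so it can be used by the k-means algorithm.
type ClusterableTrace struct {
	Key    string
	Values []float32
}

// Distance returns the Euclidean distance between two traces of equal length.
func (t *ClusterableTrace) Distance(c kmeans.Clusterable) float64 {
	o := c.(*ClusterableTrace)
	sum := 0.0
	for i, v := range t.Values {
		d := float64(v - o.Values[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// AsClusterable satisfies the kmeans.Centroid interface.
func (t *ClusterableTrace) AsClusterable() kmeans.Clusterable { return t }

// NewFullTrace copies, fills, and normalizes raw values into a ClusterableTrace.
func NewFullTrace(key string, values []float32, minStdDev float32) *ClusterableTrace {
	v := make([]float32, len(values))
	copy(v, values)
	vec32.Fill(v)            // Replace missing-data sentinels.
	vec32.Norm(v, minStdDev) // Normalize; minStdDev guards near-constant traces.
	return &ClusterableTrace{Key: key, Values: v}
}

// CalculateCentroid returns the component-wise mean of the members as a new trace.
func CalculateCentroid(members []kmeans.Clusterable) kmeans.Centroid {
	first := members[0].(*ClusterableTrace)
	mean := make([]float32, len(first.Values))
	for _, m := range members {
		for i, v := range m.(*ClusterableTrace).Values {
			mean[i] += v
		}
	}
	for i := range mean {
		mean[i] /= float32(len(members))
	}
	return &ClusterableTrace{Key: CENTROID_KEY, Values: mean}
}
```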
The culprit
module is responsible for identifying, storing, and notifying about commits that are likely causes of performance regressions. It integrates with anomaly detection and subscription systems to automate the process of pinpointing culprits and alerting relevant parties.
store.go
& sqlculpritstore/sqlculpritstore.go
store.go
defines the Store
interface, which outlines the contract for culprit data operations like Get
, Upsert
, and AddIssueId
.sqlculpritstore/sqlculpritstore.go
provides a SQL-based implementation of this interface. It uses a SQL database (configured via pool.Pool
) to store culprit information.Upsert
method is crucial. It either inserts a new culprit record or updates an existing one if a commit has already been identified as a culprit for a different anomaly group. This prevents duplicate culprit entries for the same commit. It also links the culprit to the anomaly_group_id
.AddIssueId
method updates a culprit record to include the ID of an issue (e.g., a bug tracker ticket) that was created for it, and also maintains a map between the anomaly group and the issue ID. This is important for tracking and referencing.sqlculpritstore/schema/schema.go
) includes fields for commit details (host, project, ref, revision), associated anomaly group IDs, and associated issue IDs. An index on (revision, host, project, ref)
helps in efficiently querying for existing culprits.Store
) decouples the rest of the module from the specific database implementation, allowing for easier testing and potential future changes in the storage backend.Upsert
logic is designed to handle cases where the same commit might be identified as a culprit for multiple regressions (different anomaly groups). Instead of creating duplicate entries, it appends the new anomaly_group_id
to the existing record.group_issue_map
as JSONB allows flexible storage of the mapping between anomaly groups and the specific issue filed for that group in the context of this culprit.formatter/formatter.go
Formatter
interface with methods GetCulpritSubjectAndBody
(for new culprit notifications) and GetReportSubjectAndBody
(for new anomaly group reports).MarkdownFormatter
is the concrete implementation. It uses Go's text/template
package to render notification messages.InstanceConfig
. If not provided, default templates are used.TemplateContext
and ReportTemplateContext
provide the data that can be used within the templates (e.g., commit details, subscription information, anomaly group details).buildCommitURL
, buildAnomalyGroupUrl
, and buildAnomalyDetails
are available within the templates to construct URLs and format anomaly details.formatter/noop.go
: Provides a NoopFormatter
that generates empty subjects and bodies, useful for disabling notifications or for testing scenarios where actual formatting is not needed.transport/transport.go
Transport
interface with the SendNewNotification
method.IssueTrackerTransport
is the concrete implementation for interacting with an issue tracker (e.g., Google Issue Tracker/Buganizer).go.skia.org/infra/go/issuetracker/v1
client library.secret
package.SendNewNotification
is called, it constructs an issuetracker.Issue
object based on the provided subject, body, and subscription details (like component ID, priority, CCs, hotlists).SendNewNotificationSuccess
, SendNewNotificationFail
) are recorded to monitor the success rate of sending notifications.Transport
interface allows for different notification mechanisms to be plugged in (e.g., email, Slack) in the future.transport/noop.go
: Provides a NoopTransport
that doesn't actually send any notifications, useful for disabling notifications or for testing.notify/notify.go
Formatter
and a Transport
.CulpritNotifier
interface with methods NotifyCulpritFound
and NotifyAnomaliesFound
.DefaultCulpritNotifier
implements this interface. It takes a formatter.Formatter
and a transport.Transport
as dependencies.GetDefaultNotifier
factory function determines which Formatter
and Transport
to use based on the InstanceConfig.IssueTrackerConfig.NotificationType
. If NoneNotify
, it uses NoopFormatter
and NoopTransport
. If IssueNotify
, it sets up MarkdownFormatter
and IssueTrackerTransport
.NotifyCulpritFound
:GetCulpritSubjectAndBody
to get the message content.SendNewNotification
to send the message.NotifyAnomaliesFound
:GetReportSubjectAndBody
.SendNewNotification
.service/service.go
culprit.proto
. This is the main entry point for external systems (like a bisection service or an anomaly detection pipeline) to interact with the culprit module.pb.CulpritServiceServer
interface.anomalygroup.Store
, culprit.Store
, subscription.Store
, and notify.CulpritNotifier
.PersistCulprit
RPC:culpritStore.Upsert
to save the identified culprit commits and associate them with the anomaly_group_id
.anomalygroupStore.AddCulpritIDs
to link the newly created/updated culprit IDs back to the anomaly group. [Client (e.g., Bisection Service)] | v [PersistCulpritRequest {Commits, AnomalyGroupID}] | v [culpritService.PersistCulprit] | \ | `-> [culpritStore.Upsert(AnomalyGroupID, Commits)] -> Returns CulpritIDs | | | v `<----------------------- [anomalygroupStore.AddCulpritIDs(AnomalyGroupID, CulpritIDs)] | v [PersistCulpritResponse {CulpritIDs}] | v [Client]
GetCulprit
RPC:culpritStore.Get
to retrieve culprit details by their IDs.NotifyUserOfCulprit
RPC:culpritStore.Get
.AnomalyGroup
using anomalygroupStore.LoadById
.Subscription
associated with the anomaly group using subscriptionStore.GetSubscription
.notifier.NotifyCulpritFound
for each culprit to send a notification (e.g., file a bug).culpritStore.AddIssueId
to store the generated issue ID with the culprit and the specific anomaly group. [Client (e.g., Bisection Service after PersistCulprit)] | v [NotifyUserOfCulpritRequest {CulpritIDs, AnomalyGroupID}] | v [culpritService.NotifyUserOfCulprit] |-> [culpritStore.Get(CulpritIDs)] -> Culprits |-> [anomalygroupStore.LoadById(AnomalyGroupID)] -> AnomalyGroup | | -> [subscriptionStore.GetSubscription(AnomalyGroup.SubName, AnomalyGroup.SubRev)] -> Subscription | (For each Culprit in Culprits) | | | `-> [notifier.NotifyCulpritFound(Culprit, Subscription)] -> Returns IssueID | | | v | `-> [culpritStore.AddIssueId(Culprit.ID, IssueID, AnomalyGroupID)] | v [NotifyUserOfCulpritResponse {IssueIDs}] | v [Client]
NotifyUserOfAnomaly
RPC:AnomalyGroup
and its associated Subscription
.notifier.NotifyAnomaliesFound
to send a notification about the group of anomalies (e.g., file a summary bug). [Client (e.g., Anomaly Detection Service)] | v [NotifyUserOfAnomalyRequest {AnomalyGroupID, Anomalies[]}] | v [culpritService.NotifyUserOfAnomaly] |-> [anomalygroupStore.LoadById(AnomalyGroupID)] -> AnomalyGroup | | -> [subscriptionStore.GetSubscription(AnomalyGroup.SubName, AnomalyGroup.SubRev)] -> Subscription | `-> [notifier.NotifyAnomaliesFound(AnomalyGroup, Subscription, Anomalies[])] -> Returns IssueID | v [NotifyUserOfAnomalyResponse {IssueID}] | v [Client]
PrepareSubscription
is a helper function used to potentially override or mock subscription details for testing or during transitional phases before full sheriff configuration is active. This is a temporary measure.GetAuthorizationPolicy
) is currently set to allow unauthenticated access, which might need to be revisited for production environments.proto/v1/culprit_service.proto
Commit
: Represents a source code commit.Culprit
: Represents an identified culprit commit, including its ID, the commit details, associated anomaly group IDs, and issue IDs. It also includes group_issue_map
to track which issue was filed for which anomaly group in the context of this culprit.Anomaly
: Represents a detected performance anomaly (duplicated from anomalygroup service for potential independent evolution).PersistCulpritRequest
/Response
: For storing new culprits.GetCulpritRequest
/Response
: For retrieving existing culprits.NotifyUserOfAnomalyRequest
/Response
: For triggering notifications about a new set of anomalies (anomaly group).NotifyUserOfCulpritRequest
/Response
: For triggering notifications about newly identified culprits.Anomaly
message is duplicated from the anomalygroup
service. This choice was made to allow the culprit
service and anomalygroup
service to evolve their respective Anomaly
definitions independently if needed in the future, avoiding tight coupling.group_issue_map
in the Culprit
message is important for scenarios where a single culprit might be associated with multiple anomaly groups, and each of those (culprit, group) pairs might result in a distinct bug being filed.mocks/
subdirectories)culprit
module (e.g., Store
, Formatter
, Transport
, CulpritNotifier
, CulpritServiceServer
).mockery
.AnomalyGroup
.AnomalyGroup
.CulpritService.PersistCulprit
RPC with the identified Commit
(s) and the AnomalyGroupID
.culpritService
uses culpritStore.Upsert
to save these commits as Culprit
records, linking them to the AnomalyGroupID
.anomalygroupStore.AddCulpritIDs
to update the AnomalyGroup
record with the IDs of these new culprits.CulpritService.NotifyUserOfCulprit
RPC with the CulpritID
(s) and the AnomalyGroupID
.culpritService
retrieves the full Culprit
details and the associated Subscription
.DefaultCulpritNotifier
is invoked:MarkdownFormatter
generates the subject and body for the notification.IssueTrackerTransport
sends this formatted message to the issue tracker, creating a new bug.culpritService
calls culpritStore.AddIssueId
to associate this bug ID with the specific Culprit
and AnomalyGroupID
. This flow ensures that culprits are stored, linked to their regressions, and users are notified through the configured channels. The modular design allows for flexibility in how each step (storage, formatting, transport) is implemented.
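For reference, a minimal sketch of the culprit Store contract (Get, Upsert, AddIssueId) described earlier in this section; the method signatures, parameter types, and import path are illustrative, not the exact definitions from store.go:

```go
// Sketch of the culprit Store contract; signatures are illustrative.
package culprit

import (
	"context"

	pb "go.skia.org/infra/perf/go/culprit/proto/v1"
)

// Store abstracts persistence of culprit records so the rest of the module is
// decoupled from the SQL implementation in sqlculpritstore.
type Store interface {
	// Get returns the culprits with the given IDs.
	Get(ctx context.Context, ids []string) ([]*pb.Culprit, error)

	// Upsert inserts new culprit records for the given commits, or, if a commit
	// is already recorded as a culprit, appends the anomaly group ID to the
	// existing record instead of creating a duplicate. It returns the culprit IDs.
	Upsert(ctx context.Context, anomalyGroupID string, commits []*pb.Commit) ([]string, error)

	// AddIssueId records the issue filed for this culprit and maintains the
	// mapping between the anomaly group and that issue.
	AddIssueId(ctx context.Context, culpritID, issueID, anomalyGroupID string) error
}
```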
The dataframe
module provides the DataFrame
data structure and related functionality for handling and manipulating performance trace data. It is a core component for querying, analyzing, and visualizing performance metrics within the Skia Perf system.
Key Design Principles:
DataFrame
encapsulates a types.TraceSet
, which is a map of trace keys to their corresponding performance values. It also maintains a paramtools.ReadOnlyParamSet
, which describes the unique parameter key-value pairs present in the TraceSet
. This allows for efficient filtering and aggregation based on trace characteristics.DataFrame
are defined by ColumnHeader
structs, each containing a commit offset and a timestamp. This ties the performance data directly to specific points in the codebase's history.DataFrameBuilder
interface decouples the DataFrame
creation logic from the underlying data source. This allows for different implementations to fetch data (e.g., from a database) while providing a consistent API for consumers.Join
), filtering traces (FilterOut
), slicing data (Slice
), and compressing data by removing empty columns (Compress
). These operations are designed with performance considerations in mind.
Key Components and Files:
dataframe.go
: This is the central file defining the DataFrame
struct and its associated methods.
DataFrame
struct:TraceSet
: Stores the actual performance data, mapping trace keys (strings representing parameter combinations like “,arch=x86,config=8888,”) to types.Trace
(slices of float32 values).Header
: A slice of *ColumnHeader
pointers, defining the columns of the DataFrame. Each ColumnHeader
links a column to a specific commit (Offset
) and its Timestamp
.ParamSet
: A paramtools.ReadOnlyParamSet
that contains all unique key-value pairs from the keys in TraceSet
. This is crucial for understanding the dimensions of the data and for building UI controls for filtering. It's rebuilt by BuildParamSet()
.Skip
: An integer indicating if any commits were skipped during data retrieval to keep the DataFrame size manageable (related to MAX_SAMPLE_SIZE
).DataFrameBuilder
interface: Defines the contract for objects that can construct DataFrame
instances. This allows for different data sources or retrieval strategies. Key methods include:NewFromQueryAndRange
: Creates a DataFrame based on a query and a time range.NewFromKeysAndRange
: Creates a DataFrame for specific trace keys over a time range.NewNFromQuery
/ NewNFromKeys
: Creates a DataFrame with the N most recent data points for matching traces or specified keys.NumMatches
/ PreflightQuery
: Used to estimate the size of the data that a query will return, often for UI feedback or to refine queries.ColumnHeader
struct: Represents a single column in the DataFrame, typically corresponding to a commit. It contains:Offset
: A types.CommitNumber
identifying the commit.Timestamp
: The timestamp of the commit in seconds since the Unix epoch.NewEmpty()
: Creates an empty DataFrame.NewHeaderOnly()
: Creates a DataFrame with populated headers (commits within a time range) but no trace data. This can be useful for setting up the structure before fetching actual data.FromTimeRange()
: Retrieves commit information (headers and commit numbers) for a given time range from a perfgit.Git
instance. This is a foundational step in populating the Header
of a DataFrame.MergeColumnHeaders()
: A utility function that takes two slices of ColumnHeader
and merges them into a single sorted slice, returning mapping indices to reconstruct traces. This is essential for the Join
operation.Join()
: Combines two DataFrames into a new DataFrame. It merges their headers and trace data. If traces exist in one DataFrame but not the other for a given key, missing data points (vec32.MissingDataSentinel
) are inserted. The ParamSet
of the resulting DataFrame is the union of the input ParamSets. DataFrame A (Header: [C1, C3], TraceX: [v1, v3]) | V DataFrame B (Header: [C2, C3], TraceX: [v2', v3']) | V Joined DataFrame (Header: [C1, C2, C3], TraceX: [v1, v2', v3/v3']) (TraceY from A or B padded with missing data)
BuildParamSet()
: Recalculates the ParamSet
for a DataFrame based on the current keys in its TraceSet
. This is called after operations like FilterOut
that might change the set of traces.FilterOut()
: Removes traces from the TraceSet
based on a provided TraceFilter
function. It then calls BuildParamSet()
to update the ParamSet
.Slice()
: Returns a new DataFrame that is a view into a sub-section of the original DataFrame's columns. The underlying trace data is sliced, not copied, for efficiency.Compress()
: Creates a new DataFrame by removing any columns (and corresponding data points in traces) that contain only missing data sentinels across all traces. This helps in reducing data size and focusing on relevant data points.dataframe_test.go
: Contains unit tests for the functionality in dataframe.go
. These tests cover various scenarios, including empty DataFrames, different merging and joining cases, filtering, slicing, and compression. The tests often use gittest
for creating mock Git repositories to test time range queries.
/go/dataframe/mocks/DataFrameBuilder.go
: This file contains a mock implementation of the DataFrameBuilder
interface, generated using the testify/mock
library. This mock is used in tests of other packages that depend on DataFrameBuilder
, allowing them to simulate DataFrame creation without needing a real data source or Git repository.
Workflows:
Fetching Data for Display/Analysis:
DataFrameBuilder
(e.g., one that queries a CockroachDB instance) uses NewFromQueryAndRange
.FromTimeRange
(which calls perfgit.Git.CommitSliceFromTimeRange
). This populates the Header
.TraceSet
.ParamSet
using BuildParamSet()
.DataFrame
is returned.Client Request (Query, TimeRange) | V DataFrameBuilder.NewFromQueryAndRange(ctx, begin, end, query, ...) | +-> FromTimeRange(ctx, git, begin, end, ...) // Get commit headers | | | V | perfgit.Git.CommitSliceFromTimeRange() | | | V | [ColumnHeader{Offset, Timestamp}, ...] | +-> DataSource.QueryTraces(query, commit_numbers) // Fetch trace data | | | V | types.TraceSet | +-> DataFrame.BuildParamSet() // Populate ParamSet | V DataFrame{Header, TraceSet, ParamSet}
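The consumer side of the operations above might look roughly like the following sketch. It assumes the DataFrame comes from a DataFrameBuilder call such as NewFromQueryAndRange, and it assumes the TraceFilter passed to FilterOut has the shape func(types.Trace) bool; neither assumption is guaranteed to match the exact signatures in dataframe.go:

```go
// Sketch: strip traces that are entirely missing data, then compress away
// columns that are now empty across all remaining traces.
package example

import (
	"go.skia.org/infra/go/vec32"
	"go.skia.org/infra/perf/go/dataframe"
	"go.skia.org/infra/perf/go/types"
)

func dropEmptyTraces(df *dataframe.DataFrame) *dataframe.DataFrame {
	df.FilterOut(func(tr types.Trace) bool {
		for _, v := range tr {
			if v != vec32.MissingDataSentinel {
				return false // Keep: the trace has at least one real value.
			}
		}
		return true // Drop: every point is the missing-data sentinel.
	})
	// FilterOut has already rebuilt the ParamSet; Compress removes columns
	// that no longer carry any data.
	return df.Compress()
}
```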
Joining DataFrames (e.g., from different sources or queries):
DataFrame
instances, dfA
and dfB
, are available.Join(dfA, dfB)
is called.MergeColumnHeaders(dfA.Header, dfB.Header)
creates a unified header and maps to align traces.TraceSet
is built. For each key:dfA
but not dfB
, its trace is copied, padded with missing values for columns unique to dfB
.dfB
but not dfA
, its trace is copied, padded with missing values for columns unique to dfA
.ParamSet
s of dfA
and dfB
are combined.DataFrame
is returned.Filtering Data:
DataFrame
df
exists.TraceFilter
function myFilter
is defined (e.g., to remove traces with all zero values).df.FilterOut(myFilter)
is called.df.TraceSet
. If myFilter
returns true
for a trace, that trace is deleted from the TraceSet
.df.BuildParamSet()
is called to reflect the potentially reduced set of parameters.
Constants:
DEFAULT_NUM_COMMITS
: Default number of commits to fetch when using methods like NewNFromQuery
. Set to 50.MAX_SAMPLE_SIZE
: A limit on the number of commits (columns) a DataFrame might contain, especially when downsampling. Set to 5000. (Note: The downsample
parameter in FromTimeRange
is currently ignored, meaning this might not be strictly enforced by that specific function directly but could be a target for other parts of the system or future enhancements.)
The dfbuilder
module is responsible for constructing DataFrame
objects. DataFrames
are fundamental data structures in Perf, representing a collection of performance traces (time series data) along with their associated parameters and commit information. This module acts as an intermediary between the raw trace data stored in a TraceStore
and the higher-level analysis and visualization components that consume DataFrames
.
The core design revolves around efficiently fetching and organizing trace data based on various querying criteria. This involves interacting with a perfgit.Git
instance to resolve commit ranges and timestamps, and a tracestore.TraceStore
to retrieve the actual trace data.
Key Responsibilities and Components:
dfbuilder.go
: This is the central file implementing the DataFrameBuilder
interface.builder
struct: This struct holds the necessary dependencies like perfgit.Git
, tracestore.TraceStore
, tracecache.TraceCache
, and configuration parameters (e.g., tileSize
, numPreflightTiles
, QueryCommitChunkSize
). It also maintains metrics for various DataFrame construction operations.NewDataFrameBuilderFromTraceStore
): Initializes a builder
instance. An important configuration here is filterParentTraces
. If enabled, the builder will attempt to remove redundant parent traces when child traces (more specific traces) exist. For example, if traces for test=foo,subtest=bar
and test=foo
both exist, the latter might be filtered out if filterParentTraces
is true.NewFromQueryAndRange
):config=8888
) within a given time period.dataframe.FromTimeRange
(which internally queries perfgit.Git
) to get a list of ColumnHeader
(commit information) and CommitNumber
s within the specified time range. It also handles downsampling if requested. 2. It then determines the relevant tiles to query from the TraceStore
based on the commit numbers (sliceOfTileNumbersFromCommits
). 3. The core data fetching happens in the new
method. This method queries the TraceStore
for matching traces per tile concurrently using errgroup.Group
for parallelism. This is a key optimization to speed up data retrieval, especially over large time ranges spanning multiple tiles. 4. A tracesetbuilder.TraceSetBuilder
is used to efficiently aggregate the traces fetched from different tiles into a single types.TraceSet
and paramtools.ParamSet
. 5. Finally, it constructs and returns a compressed DataFrame
. NewFromQueryAndRange | -> dataframe.FromTimeRange (get commits in time range from Git) | -> sliceOfTileNumbersFromCommits (determine tiles to query) | -> new (concurrently query TraceStore for each tile) | -> TraceStore.QueryTraces (for each tile) | -> tracesetbuilder.Add (aggregate results) | -> tracesetbuilder.Build | -> DataFrame.Compress
NewFromKeysAndRange
):NewFromQueryAndRange
in terms of getting commit information for the time range. However, instead of querying by a query.Query
object, it directly calls TraceStore.ReadTraces
for each relevant tile, providing the list of trace keys. Results are then aggregated. This is generally faster if the exact trace keys are known as it avoids the overhead of query parsing and matching within the TraceStore
.NewNFromQuery
, NewNFromKeys
):QueryCommitChunkSize
if configured), until N
data points are collected for the matching traces. 1. It starts from a given end
time (or the latest commit if end
is zero). 2. It determines an initial beginIndex
and endIndex
for commit numbers. The QueryCommitChunkSize
can influence this beginIndex
to fetch a larger chunk of commits at once, potentially improving parallelism in the new
method. 3. In a loop: - It fetches commit headers and indices for the current beginIndex
-endIndex
range. - It calls the new
method (for NewNFromQuery
) or a similar tile-based fetching logic (for NewNFromKeys
) to get a DataFrame
for this smaller range. - It counts non-missing data points in the fetched DataFrame
. If no data is found for maxEmptyTiles
consecutive attempts, it stops to prevent searching indefinitely through sparse data. - It appends the data from the fetched DataFrame
to the result DataFrame
, working backward from the N
th slot. - It then adjusts beginIndex
and endIndex
to move to the previous chunk of commits/tiles. 4. If filterParentTraces
is enabled, it calls filterParentTraces
to remove redundant parent traces from the final TraceSet
. 5. The resulting DataFrame
might have traces of length less than N
if not enough data points were found. It trims the traces if necessary. NewNFromQuery (or NewNFromKeys) | -> findIndexForTime (get commit number for 'end' time) | -> Loop (until N points are found or maxEmptyTiles reached): | -> fromIndexRange (get commits for current chunk) | -> new (or similar logic for keys) (fetch data for this chunk) | -> Aggregate data into result DataFrame | -> Update beginIndex/endIndex to previous chunk | -> [Optional] filterParentTraces | -> Trim traces if fewer than N points found
PreflightQuery
):DataFrame
, it's useful to get an estimate of how many traces will match and what the resulting ParamSet
will look like. This allows UIs to present filter options dynamically.TraceStore
. 2. It queries the numPreflightTiles
most recent tiles (concurrently) for trace IDs matching the query q
. This uses getTraceIds
, which first attempts to fetch from tracecache
and falls back to TraceStore.QueryTracesIDOnly
. 3. The trace IDs (which are paramtools.Params
) found are used to build up a ParamSet
. 4. The count of matching traces from the tile with the most matches is taken as the estimated count. 5. Crucially, for parameter keys present in the input query q
, it replaces the values in the computed ParamSet
with all values for those keys from the referenceParamSet
. This ensures that the UI can still offer all possible filter options for parameters the user has already started filtering on. 6. The resulting ParamSet
is normalized. PreflightQuery | -> TraceStore.GetLatestTile | -> Loop (for numPreflightTiles, concurrently): | -> getTraceIds (TileN, query) // Checks tracecache first, then TraceStore.QueryTracesIDOnly | -> [If cache miss] TraceStore.QueryTracesIDOnly | -> [If cache miss & tracecache enabled] tracecache.CacheTraceIds | -> Aggregate Params into a new ParamSet -> Update max count | -> Update ParamSet with values from referenceParamSet for keys in the original query | -> Normalize ParamSet
NumMatches
):PreflightQuery
that only returns the estimated number of matching traces.TraceStore.QueryTracesIDOnly
and returns the higher of the two counts.filterParentTraces
function):tracefilter.NewTraceFilter()
. For each trace key in the input TraceSet
:paramtools.Params
.traceFilter
.traceFilter.GetLeafNodeTraceKeys()
returns only the keys corresponding to the most specific (leaf) traces in the hierarchical structure implied by the paths.TraceSet
is built containing only these leaf node traces.getTraceIds
, cacheTraceIdsIfNeeded
):QueryTracesIDOnly
can still be somewhat expensive if performed frequently on the same tiles and queries (e.g., during PreflightQuery
). Caching the results (the list of matching trace IDs/params) can significantly speed this up.getTraceIds
function first attempts to retrieve trace IDs from the tracecache.TraceCache
. If there's a cache miss or the cache is not configured, it queries the TraceStore
. If a database query was performed and the cache is configured, cacheTraceIdsIfNeeded
is called to store the results in the cache for future requests. The cache key is typically a combination of the tile number and the query string.
Design Choices and Trade-offs:
TraceStore
organizes data into tiles. Most dfbuilder
operations that involve fetching data across a range of commits are designed to process these tiles concurrently. This improves performance by parallelizing I/O and computation.tracesetbuilder
: This utility is used to efficiently merge trace data coming from different tiles (which might have different sets of commits) into a coherent TraceSet
and ParamSet
.QueryCommitChunkSize
: This parameter in NewNFromQuery
allows fetching data in larger chunks than a single tile. This can increase parallelism in the underlying new
method call, but fetching too large a chunk might lead to excessive memory usage or longer latency for the first chunk.maxEmptyTiles
/ newNMaxSearch
: When searching backward for N data points, these constants prevent indefinite searching if the data is very sparse or the query matches very few traces.singleTileQueryTimeout
: This guards against queries on individual tiles taking too long, which could happen with “bad” tiles containing excessive data or due to backend issues. This is particularly important for operations like NewNFromQuery
or PreflightQuery
which might issue many such single-tile queries.PreflightQuery
: PreflightQuery
is often called by UIs to populate filter options. Caching the results of QueryTracesIDOnly
(which provides the raw data for ParamSet
construction in preflight) via tracecache
helps make these UI interactions faster.
The dfbuilder_test.go
file provides comprehensive unit tests for these functionalities, covering various scenarios including empty queries, queries matching data in single or multiple tiles, N-point queries, and preflight operations with and without caching. It uses gittest
for creating a mock Git history and sqltest
(for Spanner) or mock implementations for the TraceStore
and TraceCache
.
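As a rough illustration of the per-tile concurrency used by the builder's new method, the pattern looks like the sketch below. The names and signatures are hypothetical, not the actual builder internals; in particular, queryTile stands in for a TraceStore.QueryTraces call scoped to one tile, and the real code merges results through a tracesetbuilder.TraceSetBuilder rather than a plain map append:

```go
package example

import (
	"context"
	"sync"

	"golang.org/x/sync/errgroup"
)

// queryTilesConcurrently issues one query per tile in parallel and merges the
// results, mirroring the tile-parallel fetch described above.
func queryTilesConcurrently(ctx context.Context, tiles []int32,
	queryTile func(ctx context.Context, tile int32) (map[string][]float32, error),
) (map[string][]float32, error) {
	var mu sync.Mutex
	merged := map[string][]float32{}

	g, ctx := errgroup.WithContext(ctx)
	for _, tile := range tiles {
		tile := tile // Capture the loop variable for the goroutine.
		g.Go(func() error {
			traces, err := queryTile(ctx, tile)
			if err != nil {
				return err
			}
			mu.Lock()
			defer mu.Unlock()
			for key, values := range traces {
				// Illustrative only: the real builder aligns values from
				// different tiles to the correct commit columns.
				merged[key] = append(merged[key], values...)
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return merged, nil
}
```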
dfiter
Module Documentation
The dfiter
module is responsible for efficiently creating and providing dataframe.DataFrame
objects, which are fundamental data structures used in regression detection within the Perf application. It acts as an iterator, allowing consuming code to process DataFrames one by one. This is particularly useful for performance reasons, as constructing and holding all possible DataFrames in memory simultaneously could be resource-intensive.
The core purpose of dfiter
is to abstract away the complexities of fetching and structuring data from the underlying trace store and Git history. It ensures that DataFrames are generated with the correct dimensions and data points based on user-defined queries, commit ranges, and alert configurations.
The dfiter
module employs a “slicing” strategy for generating DataFrames. This means it typically fetches a larger, encompassing DataFrame from the dataframe.DataFrameBuilder
and then yields smaller, overlapping sub-DataFrames.
Why this approach?
DataFrameBuilder
) is often more efficient than making numerous small queries. The slicing operation itself is a relatively cheap in-memory operation.Key Components and Responsibilities:
DataFrameIterator
Interface:Next() bool
: Advances the iterator to the next DataFrame. Returns true
if a next DataFrame is available, false
otherwise.Value(ctx context.Context) (*dataframe.DataFrame, error)
: Returns the current DataFrame.dataframeSlicer
struct:DataFrameIterator
. It embodies the slicing strategy described above.dataframe.DataFrame
(df
), the desired size
of the sliced DataFrames (determined by alert.Radius
), and the current offset
for slicing. The Next()
method checks if another slice of the specified size
can be made, and Value()
performs the actual slicing using df.Slice()
.NewDataFrameIterator
Function:DataFrameIterator
instances. It encapsulates the logic for determining how the initial, larger DataFrame should be fetched based on the input parameters.queryAsString
into a query.Query
object.domain.Offset
: - domain.Offset == 0
(Continuous/Sliding Window Mode): - This mode is typically used for ongoing regression detection across a range of recent commits. - It fetches a DataFrame of domain.N
commits ending at domain.End
. - Settling Time: If anomalyConfig.SettlingTime
is configured, it adjusts domain.End
to exclude very recent data points that might not have “settled” (e.g., due to data ingestion delays or pending backfills). This prevents alerts on potentially incomplete or volatile fresh data. - The dataframeSlicer
will then produce overlapping DataFrames of size 2*alert.Radius + 1
. - domain.Offset != 0
(Specific Commit/Exact DataFrame Mode): - This mode is used when analyzing a specific commit or a small, fixed window around it (e.g., when a user clicks on a specific point in a chart to see its details or re-runs detection for a particular regression). - It aims to return a single DataFrame. - The size of this DataFrame is 2*alert.Radius + 1
. - To determine the End
time for fetching data, it calculates the commit alert.Radius
positions after the domain.Offset
. This ensures the commit at domain.Offset
is centered within the radius. For example, if domain.Offset
is commit 21 and alert.Radius
is 3, it will fetch data up to commit 24 (21 + 3
). The resulting DataFrame will then contain commits [18, 19, 20, 21, 22, 23, 24]
. This is a specific requirement to ensure consistency with how different step detection algorithms expect their input DataFrames.dataframe.DataFrameBuilder
(dfBuilder
) to construct the initial DataFrame (dfBuilder.NewNFromQuery
). This involves querying the trace store and potentially Git history.2*alert.Radius + 1
commits). If not, it returns ErrInsufficientData
. This is crucial because regression detection algorithms require a minimum amount of data to operate correctly.metrics2.GetCounter("perf_regression_detection_floats")
. This helps in monitoring the data processing load.dataframeSlicer
instance initialized with the fetched DataFrame and the calculated slice size.ErrInsufficientData
:1. Continuous Regression Detection (Sliding Window):
This typically happens when domain.Offset
is 0.
[Caller] [NewDataFrameIterator] [DataFrameBuilder] | -- Request with query, domain (N, End), alert (Radius) --> | | | | -- Parse query | | | -- (If anomalyConfig.SettlingTime > 0) Adjust domain.End --> | | | -- dfBuilder.NewNFromQuery(ctx, domain.End, q, domain.N) --> | | | | -- Query TraceStore | | | -- Build large DataFrame | | | <----- DataFrame (df) | | -- Check if len(df.Header) >= 2*Radius+1 | | | -- (If insufficient) Return ErrInsufficientData ----------- | | | -- Create dataframeSlicer(df, size=2*Radius+1, offset=0) | | <----------------- DataFrameIterator (slicer) ------------- | | [Caller] [dataframeSlicer] | | | -- it.Next() ---------------------------------------------> | | | -- return offset+size <= len(df.Header) | <------------------------------ true ---------------------- | | -- it.Value() --------------------------------------------> | | | -- subDf = df.Slice(offset, size) | | -- offset++ | <-------------------------- subDf, nil -------------------- | | -- (Process subDf) | | ... (loop Next()/Value() until Next() returns false) ... |
2. Specific Commit Analysis (Exact DataFrame):
This typically happens when domain.Offset
is non-zero.
[Caller] [NewDataFrameIterator] [Git] [DataFrameBuilder] | -- Request with query, domain (Offset), alert (Radius) --> | | | | | -- Parse query | | | | -- targetCommitNum = domain.Offset + alert.Radius | | | | -- perfGit.CommitFromCommitNumber(targetCommitNum) ------> | | | | | -- Lookup commit | | | <----------------------------- commitDetails, nil --------- | | | | -- dfBuilder.NewNFromQuery(ctx, commitDetails.Timestamp, | | | | q, n=2*Radius+1) ------------> | | | | | | -- Query TraceStore | | | | -- Build DataFrame (size 2*R+1) | | <-------------------------------------------------------- DataFrame (df) ----- | | | -- Check if len(df.Header) >= 2*Radius+1 | | | | -- (If insufficient) Return ErrInsufficientData --------- | | | | -- Create dataframeSlicer(df, size=2*Radius+1, offset=0) | | | <----------------------- DataFrameIterator (slicer) ------ | | | [Caller] [dataframeSlicer] | | | -- it.Next() ---------------------------------------------> | | | -- return offset+size <= len(df.Header) (true for the first call) | <------------------------------ true ---------------------- | | -- it.Value() --------------------------------------------> | | | -- subDf = df.Slice(offset, size) (returns the whole df) | | -- offset++ | <-------------------------- subDf, nil -------------------- | | -- (Process subDf) | | -- it.Next() ---------------------------------------------> | | | -- return offset+size <= len(df.Header) (false for subsequent calls) | <------------------------------ false --------------------- |
This design allows for flexible and efficient generation of DataFrames tailored to the specific needs of regression detection, whether it's scanning a wide range of recent commits or focusing on a particular point in time. The use of an iterator pattern also helps manage memory consumption by processing DataFrames sequentially.
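A compact, self-contained sketch of the slicing behavior described above follows. Here frame is a stand-in for dataframe.DataFrame, and the real dataframeSlicer differs in field and method details:

```go
package example

import "context"

// frame stands in for dataframe.DataFrame: a header of commit columns plus traces.
type frame struct {
	Header []int // One entry per commit column; trace data elided for brevity.
}

// slice returns a view of size columns starting at offset (stand-in for DataFrame.Slice).
func (f *frame) slice(offset, size int) *frame {
	return &frame{Header: f.Header[offset : offset+size]}
}

// dataframeSlicer yields overlapping sub-frames of a fixed size, advancing one
// column per call, mirroring the dfiter behavior described above.
type dataframeSlicer struct {
	df     *frame
	size   int // Typically 2*alert.Radius + 1.
	offset int
}

// Next reports whether another slice of length size can still be taken.
func (s *dataframeSlicer) Next() bool {
	return s.offset+s.size <= len(s.df.Header)
}

// Value returns the current slice and advances the offset.
func (s *dataframeSlicer) Value(ctx context.Context) (*frame, error) {
	sub := s.df.slice(s.offset, s.size)
	s.offset++
	return sub, nil
}
```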
The dryrun
module provides the capability to test an alert configuration and preview the regressions it would identify without actually creating an alert or sending notifications. This is a crucial tool for developers and performance engineers to fine-tune alert parameters and ensure they accurately capture relevant performance changes.
The core idea is to simulate the regression detection process for a given alert configuration over a historical range of data. This allows users to iterate on alert definitions, observe the potential impact of those definitions, and avoid alert fatigue caused by poorly configured alerts.
The primary responsibility of the dryrun
module is to handle HTTP requests for initiating and reporting the progress of these alert simulations.
dryrun.go
: This is the heart of the dryrun
module. It defines the Requests
struct, which manages the state and dependencies required for processing dry run requests. It also contains the HTTP handler (StartHandler
) that orchestrates the dry run process.
Requests
struct:
Why: Encapsulates all necessary dependencies (like perfgit.Git
for Git interactions, shortcut.Store
for shortcut lookups, dataframe.DataFrameBuilder
for data retrieval, progress.Tracker
for reporting progress, and regression.ParamsetProvider
for accessing parameter sets) into a single unit. This promotes modularity and makes it easier to manage and test the dry run functionality.
How: It is instantiated via the New
function, which takes these dependencies as arguments. This allows for dependency injection, making the component more testable and flexible.
StartHandler
function:
Why: This is the entry point for initiating a dry run. It handles the incoming HTTP request, validates the alert configuration, and kicks off the asynchronous regression detection process.
How:
progress.Tracker
to allow clients to monitor the status of the long-running dry run operation.detectorResponseProcessor
callback function. This function is invoked by the underlying regression.ProcessRegressions
function whenever potential regressions are found.regression
module to focus on detection, while the dryrun
module handles the presentation and progress updates for the dry run scenario.ClusterResponse
objects from the regression detection, converts them into user-friendly RegressionAtCommit
structures (which include commit details and the detected regression), and updates the Progress
object with these results. This enables real-time feedback to the user as regressions are identified.regression.ProcessRegressions
function is then called in the goroutine, passing the alert request, the callback, and other necessary dependencies. This function iterates through the relevant data, applies the alert's clustering and detection logic, and invokes the callback for each identified cluster.Progress
object, allowing the client to start polling for updates.RegressionAtCommit
struct:
Why: Provides a structured way to represent a regression found at a specific commit. This includes both the commit information (CID
) and the details of the regression itself (Regression
).
How: It's a simple struct used for marshalling the results into JSON for the client.
Client (UI/API) --HTTP POST /dryrun/start with AlertConfig--> Requests.StartHandler | V [Validate AlertConfig] | +----------------------------------+----------------------------------+ | (Validation Fails) | (Validation Succeeds) V V [Update Progress with Error] [Add to Progress Tracker] | | V V Respond to Client with Error Progress Launch Goroutine: regression.ProcessRegressions(...) | V [Iterate through data, detect regressions] | V For each potential regression cluster: Invoke `detectorResponseProcessor` callback | V Callback: [Convert ClusterResponse to RegressionAtCommit] [Update Progress with new RegressionAtCommit] | V (Client polls for Progress updates) | V When ProcessRegressions completes: [Update Progress: Finished or Error]
The StartHandler
effectively acts as a controller that receives the request, performs initial setup and validation, and then delegates the heavy lifting of regression detection to the regression.ProcessRegressions
function, ensuring the HTTP request can return quickly while the background processing continues. The callback mechanism allows the dryrun
module to react to findings from the regression
module in a way that's specific to the dry run use case (i.e., accumulating and formatting results for client display).
The favorites
module provides functionality for users to save and manage “favorite” configurations or views within the Perf application. This allows users to quickly return to specific data explorations or commonly used settings.
The core design philosophy is to provide a persistent storage mechanism for user-specific preferences related to application state (represented as URLs). This is achieved through a Store
interface, which abstracts the underlying data storage, and a concrete SQL-based implementation.
store.go
: This file defines the central Store
interface.
List
operation to retrieve all favorites for a specific user and a Liveness
check.Favorite
: This struct represents a single favorite item, containing fields like ID
, UserId
, Name
, Url
, Description
, and LastModified
. The Url
is a key piece of data, as it allows the application to reconstruct the state the user wants to save.SaveRequest
: This struct is used for creating and updating favorites, encapsulating the data needed for these operations, notably excluding the ID
(which is generated or already known) and LastModified
(which is handled by the store).Liveness
method is a bit of an outlier. It's used to check the health of the database connection. It was placed in this store somewhat arbitrarily: because the favorites store plays no essential role compared with the more critical stores, it is a relatively safe place to perform this check without impacting core performance data operations.
sqlfavoritestore/sqlfavoritestore.go
: This file provides the SQL implementation of the Store
interface.
Store
interface. These statements interact with a Favorites
table.FavoriteStore
struct holds a database connection pool (pool.Pool
).Get
, Create
, Update
, Delete
, and List
execute their corresponding SQL statements against the database.LastModified
) are handled automatically during create and update operations to track when a favorite was last changed.skerr.Wrapf
to provide context to any database errors.sqlfavoritestore/schema/schema.go
: This file defines the SQL schema for the Favorites
table.
FavoriteSchema
struct uses struct tags (sql:"..."
) to define column names, types, constraints (like PRIMARY KEY
, NOT NULL
), and indexes. The byUserIdIndex
is crucial for efficiently listing favorites for a specific user.mocks/Store.go
: This file contains a generated mock implementation of the Store
interface.
Store
interface. They allow tests to simulate different store behaviors (e.g., successful operations, errors) without requiring an actual database connection.mockery
tool. It provides a Store
struct that embeds mock.Mock
from the testify
library. Each method of the interface has a corresponding mock function that can be configured to return specific values or errors.
1. Creating a New Favorite:
User Action (e.g., clicks "Save as Favorite" in UI) | V Application Handler | V [favorites.Store.Create] is called with user ID, name, URL, description | V [sqlfavoritestore.FavoriteStore.Create] | V Generates current timestamp for LastModified | V Executes INSERT SQL statement: INSERT INTO Favorites (user_id, name, url, description, last_modified) VALUES (...) | V Database stores the new favorite record | V Returns success/error to Application Handler
2. Listing User's Favorites:
User navigates to "My Favorites" page | V Application Handler | V [favorites.Store.List] is called with the current user's ID | V [sqlfavoritestore.FavoriteStore.List] | V Executes SELECT SQL statement: SELECT id, name, url, description FROM Favorites WHERE user_id=$1 | V Database returns rows matching the user ID | V [sqlfavoritestore.FavoriteStore.List] scans rows into []*favorites.Favorite | V Returns list of favorites to Application Handler | V UI displays the list
3. Retrieving a Specific Favorite (e.g., when a user clicks on a favorite to load it):
User clicks on a specific favorite in their list | V Application Handler (obtains favorite ID) | V [favorites.Store.Get] is called with the favorite ID | V [sqlfavoritestore.FavoriteStore.Get] | V Executes SELECT SQL statement: SELECT id, user_id, name, url, description, last_modified FROM Favorites WHERE id=$1 | V Database returns the single matching favorite row | V [sqlfavoritestore.FavoriteStore.Get] scans row into a *favorites.Favorite struct | V Returns the favorite object to Application Handler | V Application uses the `Url` from the favorite object to restore the application state
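A condensed sketch of the Store contract these workflows rely on is shown below. The method set and struct fields follow the descriptions above, but the field types and exact signatures are illustrative; see store.go for the authoritative definitions:

```go
// Sketch of the favorites Store contract; types and signatures are illustrative.
package favorites

import (
	"context"
	"time"
)

// Favorite is a saved application state, keyed by the URL needed to restore it.
type Favorite struct {
	ID           string
	UserId       string
	Name         string
	Url          string
	Description  string
	LastModified time.Time
}

// SaveRequest carries the user-supplied fields for Create and Update; the ID
// and LastModified are managed by the store itself.
type SaveRequest struct {
	UserId      string
	Name        string
	Url         string
	Description string
}

// Store abstracts persistence of favorites; sqlfavoritestore is the SQL-backed
// implementation and mocks/Store.go provides a generated test double.
type Store interface {
	Get(ctx context.Context, id string) (*Favorite, error)
	Create(ctx context.Context, req *SaveRequest) error
	Update(ctx context.Context, req *SaveRequest, id string) error
	Delete(ctx context.Context, id string) error
	List(ctx context.Context, userID string) ([]*Favorite, error)
	// Liveness doubles as a lightweight database health check.
	Liveness(ctx context.Context) error
}
```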
The file
module and its submodules are responsible for providing a unified interface for accessing files from different sources, such as local directories or Google Cloud Storage (GCS). This abstraction allows the Perf ingestion system to treat files consistently regardless of their origin.
The central idea is to define a file.Source
interface that abstracts the origin of files. Implementations of this interface are then responsible for monitoring their respective sources (e.g., a GCS bucket via Pub/Sub notifications, or a local directory) and emitting file.File
objects through a channel when new files become available.
The file.File
struct encapsulates the essential information about a file: its name, an io.ReadCloser
for its contents, its creation timestamp, and optionally, the associated pubsub.Message
if the file originated from a GCS Pub/Sub notification. This optional field is crucial for acknowledging the message after successful processing, or nack'ing it if an error occurs, ensuring reliable message handling in a distributed system.
file.go
This file defines the core File
struct and the Source
interface.
File
struct: Represents a single file.
Name
: The identifier for the file (e.g., gs://bucket/object
or a local path).Contents
: An io.ReadCloser
to read the file's content. This design allows for streaming file data, which is memory-efficient, especially for large files. The consumer is responsible for closing this reader.Created
: The timestamp when the file was created or last modified (depending on the source).PubSubMsg
: A pointer to a pubsub.Message
. This is populated if the file notification came from a Pub/Sub message (e.g., GCS object change notifications). It's used to Ack
or Nack
the message, indicating successful processing or a desire to retry/dead-letter.Source
interface: Defines the contract for file sources.
Start(ctx context.Context) (<-chan File, error)
: This method initiates the process of watching for new files. It returns a read-only channel (<-chan File
) through which File
objects are sent as they are discovered. The method is designed to be called only once per Source
instance. This design ensures that the resource setup and monitoring logic (like starting a Pub/Sub subscription listener or initiating a directory walk) is done once.file.Source
dirsource
The dirsource
submodule provides an implementation of file.Source
that reads files from a local filesystem directory.
New(dir string)
: Constructs a DirSource
for a given directory path. It resolves the path to an absolute path.Start(_ context.Context)
: When called, it initiates a filepath.Walk
over the specified directory.file.File
object.ModTime
of the file is used as the Created
timestamp, which is a known simplification for its intended use cases.file.File
objects are sent to an unbuffered channel.New(directory) -> DirSource instance | V DirSource.Start() --> Goroutine starts | V filepath.Walk(directory) | +----------------------+ | | V V For each file: For each directory: os.Open(path) (skip) Create file.File{Name, Contents, ModTime} Send file.File to channel | V Caller receives file.File from channel
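The behavior in the diagram above could be implemented along the following lines. This is a simplified sketch, not the actual dirsource code: the file package import path is assumed, and error handling is abbreviated:

```go
package example

import (
	"context"
	"io/fs"
	"os"
	"path/filepath"

	"go.skia.org/infra/perf/go/file"
)

// walkDir emits a file.File for every regular file under dir, using the file's
// ModTime as the Created timestamp, mirroring the DirSource behavior above.
func walkDir(ctx context.Context, dir string) (<-chan file.File, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return nil, err
	}
	ch := make(chan file.File) // Unbuffered, as in the description above.
	go func() {
		defer close(ch)
		_ = filepath.Walk(abs, func(path string, info fs.FileInfo, err error) error {
			if err != nil || info.IsDir() {
				return err // Skip directories; propagate walk errors.
			}
			f, err := os.Open(path)
			if err != nil {
				return err
			}
			ch <- file.File{Name: path, Contents: f, Created: info.ModTime()}
			return nil
		})
	}()
	return ch, nil
}
```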
gcssource
The gcssource
submodule implements file.Source
for files stored in Google Cloud Storage, using Pub/Sub notifications for new file events.
New(ctx context.Context, instanceConfig *config.InstanceConfig)
:Subscription
name from instanceConfig
or generate one based on the Topic
(often adding a suffix like -prod
or using a round-robin scheme for load distribution if multiple ingester instances are running).sub.Subscription
object to manage receiving messages from the configured Pub/Sub topic/subscription. A key configuration here is ReceiveSettings.MaxExtension = -1
. This disables automatic ack deadline extension by the Pub/Sub client library. The rationale is that the gcssource
itself will explicitly Ack
or Nack
messages. If automatic extension were enabled and the processing of a file took longer than the extension period, the message might be redelivered while still being processed, leading to duplicate processing or other issues. By disabling it, the ingester has full control over the message lifecycle.filter.Filter
based on AcceptIfNameMatches
and RejectIfNameMatches
regular expressions provided in the instanceConfig
. This allows for fine-grained control over which files are processed based on their GCS object names.Start(ctx context.Context)
:file.File
objects.s.subscription.Receive(ctx, s.receiveSingleEventWrapper)
.Receive
method blocks until a message is available or the context is cancelled.receiveSingleEventWrapper
is called for each Pub/Sub message.receiveSingleEventWrapper
and receiveSingleEvent
):Data
is expected to be a JSON payload describing a GCS object event (specifically, {"bucket": "...", "name": "..."}
).gs://
URI is constructed from the bucket and name.filter.Filter
(configured with regexes) is applied. If the filename is rejected, the message is acked (as there's no point retrying), and processing stops for this event.Sources
list in instanceConfig.IngestionConfig.SourceConfig.Sources
. These are typically gs://
prefixes. If the filename doesn‘t match any of these prefixes, it’s considered an unexpected file, the message is acked, and processing stops. This ensures that the ingester only processes files from explicitly configured GCS locations.obj.Attrs(ctx)
is called to get metadata like the creation time. If this fails (e.g., object deleted between notification and processing, or transient network error), the message is nacked (if dead-lettering is not enabled) or handled by the dead-letter policy, as retrying might succeed.obj.NewReader(ctx)
is called to get an io.ReadCloser
for the file's content. If this fails, the message is nacked (or dead-lettered).file.File
: A file.File
struct is created with the GCS path, the reader, the attrs.Created
time, and the original pubsub.Message
. This file.File
is sent to the fileChannel
.receiveSingleEvent
function returns true
if the initial stages of processing (up to sending to the channel) were successful and the message should be acked from Pub/Sub's perspective (meaning it was valid, filtered appropriately, and the object was accessible). It returns false
for transient errors where a retry might help (e.g., failing to get object attributes or reader).receiveSingleEventWrapper
then uses this boolean:s.deadLetterEnabled
):receiveSingleEvent
returned false
(transient error or should retry), the message is Nack()
-ed. This typically sends it to a dead-letter topic if configured, or allows Pub/Sub to redeliver it after a backoff.receiveSingleEvent
returned true
, the message is not explicitly Ack()
-ed here. The acknowledgement is deferred to the consumer of the file.File
(i.e., the ingester). This is a critical design choice: the message is only truly “done” when the file content has been fully processed by the downstream system.receiveSingleEvent
returned true
, the message is Ack()
-ed.receiveSingleEvent
returned false
, the message is Nack()
-ed.gcssource
itself doesn‘t always immediately Ack
messages upon successful GCS interaction. Instead, it passes the *pubsub.Message
along in the file.File
struct. This allows the ultimate consumer of the file’s content (e.g., the Perf ingestion pipeline) to Ack
the message only after it has successfully processed and stored the data. This provides end-to-end processing guarantees. If processing fails downstream, the message can be Nack
-ed, leading to a retry or dead-lettering.filter.Filter
and prefix-based SourceConfig.Sources
) ensure that only desired files are processed.Ack
(e.g., file explicitly filtered out) and those that warrant a Nack
(e.g., transient GCS errors), especially when dead-letter queues are in use.maxParallelReceives
) for Pub/Sub messages, although currently set to 1. This can be tuned for performance.New(config) -> GCSSource instance (GCS/PubSub clients, filter initialized) | V GCSSource.Start() --> Goroutine starts PubSub subscription.Receive loop | V PubSub message arrives | V receiveSingleEventWrapper(msg) | V receiveSingleEvent(msg) | +-> Deserialize msg data (JSON: bucket, name) -> Error? Ack, return. | +-> Filter filename (regex) -> Rejected? Ack, return. | +-> Check if filename matches config.Sources prefixes -> No match? Ack, return. | +-> GCS: storageClient.Object(bucket, name).Attrs() -> Error? Nack (retryable), return. | +-> GCS: object.NewReader() -> Error? Nack (retryable), return. | V Create file.File{Name, Contents, Created, PubSubMsg: msg} Send file.File to fileChannel | V Caller receives file.File from channel (Caller later Acks/Nacks msg via file.File.PubSubMsg)
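From the consumer's side, the deferred acknowledgement described above looks roughly like the sketch below. It illustrates the pattern rather than the actual ingester code, and the file package import path is an assumption:

```go
package example

import (
	"context"
	"io"

	"go.skia.org/infra/perf/go/file"
)

// consume drains a file.Source channel, acking each Pub/Sub message only after
// the file has been fully processed, and nacking it so it is retried or
// dead-lettered when processing fails.
func consume(ctx context.Context, files <-chan file.File, process func(io.Reader) error) {
	for f := range files {
		err := process(f.Contents)
		_ = f.Contents.Close()
		if f.PubSubMsg == nil {
			continue // e.g. a local dirsource file: nothing to acknowledge.
		}
		if err != nil {
			f.PubSubMsg.Nack() // Retry later or route to the dead-letter topic.
		} else {
			f.PubSubMsg.Ack() // Processing finished; the message is done.
		}
	}
}
```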
This modular approach to file sourcing makes the Perf ingestion system flexible and easier to test and maintain. New file sources can be added by simply implementing the file.Source
interface.
The filestore
module provides an abstraction layer for interacting with different file storage systems. It defines a common interface, leveraging Go's io/fs.FS
, allowing the application to read files regardless of whether they are stored locally or in a cloud storage service like Google Cloud Storage (GCS). This design promotes flexibility and testability by decoupling file access logic from the specific storage implementation.
The primary goal is to enable Perf, the performance monitoring system, to seamlessly access data files from various sources. Perf often deals with large datasets and trace files, which might be stored in GCS for scalability and durability or locally during development and testing. By using this module, Perf components can be written to consume data using the standard fs.FS
interface without needing to know the underlying storage details.
Key components:
local
: This submodule provides an implementation of fs.FS
for the local file system.
local.New(rootDir string)
function initializes a filesystem
struct. This struct stores the absolute path to a rootDir
and uses os.DirFS(rootPath)
to create an fs.FS
instance scoped to that directory. When Open(name string)
is called, it calculates the path relative to rootDir
and then uses the underlying os.DirFS
to open the file. This ensures that file access is contained within the specified root directory.local.go
file contains the filesystem
struct and its methods. The core logic resides in the New
function for initialization and the Open
method for file access. filepath.Abs
and filepath.Rel
are used to correctly handle and relativize paths.gcs
: This submodule implements fs.FS
for Google Cloud Storage.
gcs.New(ctx context.Context)
function initializes a filesystem
struct. It authenticates with GCS using google.DefaultTokenSource
to obtain an OAuth2 token source and then creates a *storage.Client
. The Open(name string)
method expects a GCS URI (e.g., gs://bucket-name/path/to/file
). It parses this URI into a bucket name and object path using parseNameIntoBucketAndPath
. Then, it uses the storage.Client
to get a *storage.Reader
for the specified object. This reader is wrapped in a custom file
struct which implements fs.File
.gcs.go
file defines the filesystem
struct, which holds the *storage.Client
, and the file
struct, which wraps *storage.Reader
. The New
function handles GCS client initialization and authentication. The Open
method is responsible for parsing GCS URIs and obtaining a reader for the object. Notably, the Stat()
method for gcs.file
is intentionally not implemented (returns ErrNotImplemented
) because Perf's current usage patterns do not require it, simplifying the implementation. The parseNameIntoBucketAndPath
helper function is crucial for translating the GCS URI format into the bucket and object path components required by the GCS client library.Workflow: Opening a File (Conceptual)
The client code (e.g., a component within Perf) would typically decide which filestore implementation to use based on configuration or the nature of the file path.
Initialization:
fsImpl, err := local.New("/path/to/data/root")
fsImpl, err := gcs.New(context.Background())
File Access:
The client calls file, err := fsImpl.Open("relative/path/to/file.json") (for local) or file, err := fsImpl.Open("gs://my-bucket/data/some_trace.json") (for GCS).
Behind the Scenes:
Local: local.Open("relative/path/to/file.json") | V Calculates absolute path based on rootDir | V Calls os.DirFS(rootDir).Open("relative/path/to/file.json") | V Returns fs.File (os.File)
GCS: gcs.Open("gs://my-bucket/data/some_trace.json") | V parseNameIntoBucketAndPath("gs://my-bucket/data/some_trace.json") --> "my-bucket", "data/some_trace.json" | V gcsClient.Bucket("my-bucket").Object("data/some_trace.json").NewReader() | V Wraps storage.Reader in gcs.file | V Returns fs.File (gcs.file)
Reading Data:
fs.File
(e.g., file.Read(buffer)
) in a standard way, irrespective of whether it's an os.File
or a gcs.file
wrapping a storage.Reader
.This abstraction allows Perf to be agnostic to the underlying storage mechanism when reading files, simplifying its data processing pipelines.
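As a rough illustration of this decoupling, the sketch below reads a file purely through the standard io/fs interfaces. It is a minimal sketch rather than Perf's actual wiring: the example path, the "gs://" prefix check, and the way the GCS-backed fs.FS would be selected are assumptions standing in for the configuration-driven choice described above.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
	"strings"
)

// readAll reads a file through the generic fs.FS interface, so the caller
// does not care whether the data lives on local disk or in GCS.
func readAll(fsys fs.FS, name string) ([]byte, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}

func main() {
	// For a local root we can build an fs.FS directly from the standard
	// library; the local.New wrapper described above adds root-dir scoping
	// and path relativization on top of os.DirFS.
	var fsys fs.FS = os.DirFS("/path/to/data/root")

	// Hypothetical chooser: in Perf the decision between the local and GCS
	// implementations comes from configuration; here a "gs://" prefix would
	// select the GCS-backed fs.FS (gcs.New) instead.
	name := "relative/path/to/file.json"
	if strings.HasPrefix(name, "gs://") {
		log.Fatal("GCS-backed fs.FS not wired up in this sketch")
	}

	b, err := readAll(fsys, name)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes\n", len(b))
}
```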
The frontend
module serves as the backbone for the Perf web UI. It's responsible for handling HTTP requests, rendering HTML templates, and interacting with various backend services and data stores to provide a comprehensive performance analysis platform.
The design philosophy emphasizes a separation of concerns. The core frontend.go
file initializes and wires together various components, while the api
subdirectory houses specific handlers for different categories of user interactions (e.g., alerts, graphs, regressions). This modular approach simplifies development, testing, and maintenance.
Key Components and Responsibilities:
frontend.go
:
New
, initialize
): This is the entry point. It sets up logging, metrics, reads configuration (config.Config
), initializes database connections (TraceStore, AlertStore, RegressionStore, etc.), and establishes connections to external services like Git and potentially Chrome Perf.loadTemplates
, templateHandler
): It loads HTML templates from the dist
directory (produced by the build system). These templates are Go templates, allowing for dynamic data injection. Snippets for Google Analytics (googleanalytics.html
) and cookie consent (cookieconsent.html
) are embedded and can be included in the rendered pages.getPageContext
): This crucial function generates a JavaScript object (window.perf
) that is embedded in every HTML page. This object contains configuration values and settings that the client-side JavaScript needs to function correctly, such as API URLs, display preferences, and feature flags. This avoids hardcoding such values in the JavaScript and allows for easier configuration.GetHandler
, getFrontendApis
): It defines the HTTP routes and associates them with their respective handler functions. This is where chi
router is configured. It also instantiates and registers all the API handlers from the api
sub-module.loginProvider
, RoleEnforcedHandler
): It integrates with an authentication system (e.g., proxylogin
) to determine user identity and roles. RoleEnforcedHandler
is a middleware to protect certain endpoints based on user roles.progressTracker
): For operations that might take a significant amount of time (e.g., generating complex data frames for graphs, running regression detection), it uses a progress.Tracker
. This allows the frontend to initiate a task, return an ID to the client, and let the client poll for status and results, preventing HTTP timeouts for long operations./_/frame/start
with query details.frameStartHandler
creates a progress
object, adds it to progressTracker
.frame.ProcessFrameRequest
.frameStartHandler
immediately returns the progress
object's ID./_/status/{id}
./_/frame/results/{id}
(managed by progressTracker
) once finished.gotoHandler
, old URL handlers): Handles redirects for old URLs to new ones and provides a /g/
endpoint to navigate to specific views based on a Git hash.liveness
): Provides a /liveness
endpoint that checks the health of critical dependencies (like the database connection) for Kubernetes.api
(subdirectory): This directory contains the specific HTTP handlers for various features of Perf. Each API is typically encapsulated in its own file (e.g., alertsApi.go
, graphApi.go
) and implements the FrontendApi
interface, primarily its RegisterHandlers
method. This design promotes modularity.
alertsApi.go
: Manages CRUD operations for alert configurations (alerts.Alert
). It interacts with alerts.ConfigProvider
(for fetching configurations, potentially cached) and alerts.Store
(for persistence). It also handles trying out bug filing and notification sending for alerts. Includes endpoints to list subscriptions and manage dry-run requests for alert configurations.anomaliesApi.go
: Provides endpoints for fetching anomaly data. It has two modes of operation:cleanTestName
) addresses potential incompatibilities in test naming conventions or characters between systems.subscription.Store
, alerts.Store
). This allows Perf instances to manage their own anomaly data.favoritesApi.go
: Manages user-specific and instance-wide favorite links. User favorites are stored in favorites.Store
, while instance-wide favorites can be defined in the main configuration file (config.Config.Favorites
). It provides endpoints to list, create, delete, and update favorites.graphApi.go
: Handles requests related to plotting graphs.frameStartHandler
): As described above, this initiates the potentially long process of fetching trace data and constructing a dataframe.DataFrame
. It uses dfbuilder.DataFrameBuilder
for this.cidHandler
, cidRangeHandler
, shiftHandler
): Provides details about specific commits or ranges of commits by interacting with perfgit.Git
.detailsHandler
, linksHandler
): Fetches raw data or metadata for a specific trace point at a particular commit. This involves reading from tracestore.TraceStore
and potentially the ingestedFS
(filesystem where raw ingested data is stored) to get information like associated benchmark links from the original JSON files.pinpointApi.go
: Facilitates interaction with the Pinpoint bisection service. It allows users to create bisection jobs (to identify the commit that caused a performance regression) or try jobs (to test a patch). It can proxy requests to a legacy Pinpoint service or a newer backend service.queryApi.go
: Supports the query construction UI.initpageHandler
, getParamSet
): Provides the initial set of queryable parameters (keys and their possible values) to populate the UI. This uses psrefresh.ParamSetRefresher
which periodically updates this canonical paramset based on recent data, ensuring the UI reflects available data.countHandler
, nextParamListHandler
): As the user builds a query in the UI, these handlers can estimate the number of matching traces or provide the next relevant parameter values based on the current partial query. This gives users immediate feedback. The nextParamListHandler
is tailored for UIs where parameter selection is ordered (e.g., Chromeperf's UI).regressionsApi.go
: Deals with detected regressions.regressionRangeHandler
, regressionCountHandler
, alertsHandler
, regressionsHandler
): Fetches regression data from regression.Store
based on time ranges, alert configurations, or subscriptions. It can filter by user ownership or category.triageHandler
): Allows users (editors) to mark regressions as triaged (e.g., “positive”, “negative”, “ignored”) and associate them with bug reports. If a regression is marked as negative, it can generate a bug report URL using a configurable template.clusterStartHandler
): Allows users to initiate the regression detection process for a specific query or set of parameters. This is also a long-running operation managed by progressTracker
.anomalyHandler
, alertGroupQueryHandler
): Provides redirect URLs to the appropriate graph view for a given anomaly ID or alert group ID from Chromeperf. This involves generating graph shortcuts.sheriffConfigApi.go
: Handles interactions related to LUCI Config for sheriff configurations.getMetadataHandler
): Provides metadata to LUCI Config, indicating which configuration files (e.g., skia-sheriff-configs.cfg
) Perf owns and the URL for validating changes to these files. This is part of an automated config management system.validateConfigHandler
): Receives configuration content from LUCI Config and validates it (e.g., using sheriffconfig.ValidateContent
). Returns success or a structured error message.shortcutsApi.go
: Manages the creation and retrieval of shortcuts.keysHandler
): Allows storing a set of trace keys (queries) and getting a short ID for them. This is used, for example, by the “Share” button on the explore page.getGraphsShortcutHandler
, createGraphsShortcutHandler
): Manages shortcuts for more complex graph configurations, which can include multiple queries and formulas. These are used for sharing multi-graph views.triageApi.go
: Provides endpoints for triaging anomalies, specifically those originating from or managed by Chromeperf. This includes filing new bugs, associating anomalies with existing bugs, and performing actions like ignoring or nudging anomalies. It interacts with chromeperf.ChromePerfClient
and potentially an issuetracker.IssueTracker
implementation.userIssueApi.go
: Manages user-reported issues (Buganizer annotations) associated with specific data points (a trace at a commit). This allows users to link external bug reports directly to performance data points in the UI. It uses userissue.Store
for persistence.The overall goal of the frontend
module is to provide a responsive and informative user interface by efficiently querying and presenting performance data, while also enabling users to configure alerts, triage regressions, and collaborate on performance analysis. The interaction with various stores and services is abstracted to keep the request handling logic focused.
The go/git
module provides an abstraction layer for interacting with Git repositories. It is designed to efficiently retrieve and cache commit information, which is essential for performance analysis in Skia Perf. The primary goal is to offer a consistent interface for accessing commit data, regardless of whether the underlying data source is a local Git checkout or a remote Gitiles API.
Design Decisions and Implementation Choices:
/go/git/schema/schema.go
.provider.Provider
interface (defined in /go/git/provider/provider.go
). This allows for different implementations of how Git data is fetched. Currently, two providers are implemented:git_checkout
: Interacts with a local Git repository by shelling out to git
commands. This is suitable for environments where a local checkout is available and preferred.gitiles
: Uses the Gitiles API to fetch commit data. This is useful when direct repository access is not feasible or when leveraging Google's infrastructure for Git operations. The choice of provider is determined by the instance configuration, as seen in /go/git/providers/builder.go
.CommitNumber
s to commits as they are ingested. This provides a simple, ordered way to refer to commits.instanceConfig.GitRepoConfig.CommitNumberRegex
). This is useful for repositories like Chromium that embed a commit position in their messages. The repoSuppliedCommitNumber
flag in impl.go
controls this behavior.cache
in impl.go
) is used for frequently accessed commit details (CommitFromCommitNumber
). This further speeds up lookups for commonly requested commits. The size of this cache is defined by commitCacheSize
.StartBackgroundPolling
method in impl.go
initiates a goroutine that periodically calls the Update
method. This ensures that the local database cache stays synchronized with the remote repository.impl.go
. This helps in organizing and managing the queries. Separate statements are defined for different SQL dialects if needed (e.g., insert
vs insertSpanner
).BadCommit
constant provides a sentinel value for functions returning provider.Commit
to indicate an error or an invalid commit.Key Responsibilities and Components:
interface.go
(Git Interface):Git
interface, which is the public contract for this module. It specifies all the operations that can be performed to retrieve commit information.impl.go
(Git Implementation):Impl
struct, which is the primary implementation of the Git
interface.Update
method): This is a crucial method responsible for fetching new commits from the configured provider.Provider
and storing them in the SQL database. It determines the last known commit and fetches all subsequent commits.repoSuppliedCommitNumber
is true, it parses the commit number from the commit body using commitNumberRegex
.CommitNumberFromGitHash
: Retrieves the sequential CommitNumber
for a given Git hash.CommitFromCommitNumber
: Retrieves the full provider.Commit
details for a given CommitNumber
. Uses the LRU cache.CommitNumberFromTime
: Finds the CommitNumber
closest to (but not after) a given timestamp.CommitSliceFromTimeRange
, CommitSliceFromCommitNumberRange
: Fetches slices of commits based on time or commit number ranges.GitHashFromCommitNumber
: Retrieves the Git hash for a given CommitNumber
.PreviousGitHashFromCommitNumber
, PreviousCommitNumberFromCommitNumber
: Finds the Git hash or commit number of the commit immediately preceding a given commit number.CommitNumbersWhenFileChangesInCommitNumberRange
: Identifies commit numbers within a range where a specific file was modified. This involves converting commit numbers to hashes and then querying the provider.Provider
.urlFromParts
): Constructs a URL to view a specific commit, respecting configurations like DebouceCommitURL
or custom CommitURL
formats.updateCalled
, commitNumberFromGitHashCalled
) to monitor the usage and performance of different operations.provider/provider.go
(Provider Interface and Commit Struct):provider.Provider
interface, which abstracts the source of Git commit data. Implementations of this interface (like git_checkout
and gitiles
) handle the actual fetching of data.provider.Commit
struct, which is the standard representation of a commit used throughout the go/git
module and its providers. It includes fields like GitHash
, Timestamp
, Author
, Subject
, and Body
. The Body
is particularly important when repoSuppliedCommitNumber
is true, as it's parsed to extract the commit number.providers/builder.go
(Provider Factory):New
function, which acts as a factory for creating provider.Provider
instances based on the instanceConfig.GitRepoConfig.Provider
setting. This allows the system to dynamically choose between git_checkout
or gitiles
(or potentially other future providers).providers/git_checkout/git_checkout.go
(CLI Git Provider):provider.Provider
by executing git
command-line operations.CommitsFromMostRecentGitHashToHead
: Uses git rev-list
to get commit information.GitHashesInRangeForFile
: Uses git log
to find changes to a specific file.parseGitRevLogStream
: A helper function to parse the output of git rev-list --pretty
.providers/gitiles/gitiles.go
(Gitiles Provider):provider.Provider
by interacting with a Gitiles API endpoint.CommitsFromMostRecentGitHashToHead
: Uses gr.LogFnBatch
to fetch commits in batches. It handles logic for main branches versus other branches and respects the startCommit
.GitHashesInRangeForFile
: Uses gr.Log
with appropriate path filtering.Update
is a no-op for Gitiles as the API always provides the latest data.schema/schema.go
(Database Schema):Commit
struct with SQL annotations, representing the structure of the Commits
table in the database. This table stores the cached commit information.gittest/gittest.go
(Test Utilities):NewForTest
) for setting up test environments. This includes creating a temporary Git repository, populating it with commits, and initializing a test database. This is crucial for writing reliable unit and integration tests for the go/git
module and its components.mocks/Git.go
(Mock Implementation):Git
interface, generated by mockery
. This is used in tests of other modules that depend on go/git
, allowing them to isolate their tests from actual Git operations or database interactions.Key Workflows:
Initial Population / Update:
Application -> Impl.Update()
  '-> Provider.Update() (e.g., git pull for git_checkout)
  '-> Impl.getMostRecentCommit() (from local DB)
  '-> Provider.CommitsFromMostRecentGitHashToHead(mostRecentDBHash, ...)
  '-> (For each new commit from Provider)
        '-> [If repoSuppliedCommitNumber] Impl.getCommitNumberFromCommit(commit.Body)
        '-> Impl.CommitNumberFromGitHash(commit.GitHash) (check if it already exists)
        '-> DB.Exec(INSERT INTO Commits ...)
Fetching Commit Details by CommitNumber:
Application -> Impl.CommitFromCommitNumber(commitNum)
  '-> Check LRU cache (cache.Get(commitNum))
        '-> [If found] Return cached provider.Commit
  '-> [If not in LRU] DB.QueryRow(SELECT ... FROM Commits WHERE commit_number=$1)
        '-> Construct provider.Commit
        '-> Add to LRU cache (cache.Add(commitNum, commit))
        '-> Return provider.Commit
Finding Commits Where a File Changed:
Application -> Impl.CommitNumbersWhenFileChangesInCommitNumberRange(beginNum, endNum, file)
  '-> Impl.PreviousGitHashFromCommitNumber(beginNum) -> beginHash (or Impl.GitHashFromCommitNumber if beginNum is 0 and the start commit is used)
  '-> Impl.GitHashFromCommitNumber(endNum) -> endHash
  '-> Provider.GitHashesInRangeForFile(beginHash, endHash, file) -> changedGitHashes[]
  '-> (For each changedGitHash)
        '-> Impl.CommitNumberFromGitHash(changedGitHash) -> commitNum
        '-> Add commitNum to the result list
  '-> Return the result list
This structure allows Perf to efficiently query and manage Git commit information, supporting its core functionality of tracking performance data across different versions of the codebase.
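As an illustration of the repo-supplied commit number parsing mentioned above (the repoSuppliedCommitNumber path), the sketch below pulls a commit position out of a commit message body with a regular expression. The exact pattern is configured per instance via GitRepoConfig.CommitNumberRegex, so the Chromium-style "Cr-Commit-Position" regex used here is only an assumed example, not the module's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Example pattern for Chromium-style commit positions; real instances supply
// their own regex via GitRepoConfig.CommitNumberRegex.
var commitNumberRegex = regexp.MustCompile(`Cr-Commit-Position: refs/heads/(?:main|master)@\{#(\d+)\}`)

// commitNumberFromBody extracts the repo-supplied commit number from a commit
// message body, returning an error if no match is found.
func commitNumberFromBody(body string) (int, error) {
	m := commitNumberRegex.FindStringSubmatch(body)
	if m == nil {
		return 0, fmt.Errorf("no commit position found in body")
	}
	return strconv.Atoi(m[1])
}

func main() {
	body := "Fix flaky test\n\nCr-Commit-Position: refs/heads/main@{#123456}"
	n, err := commitNumberFromBody(body)
	fmt.Println(n, err) // 123456 <nil>
}
```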
The graphsshortcut
module provides a mechanism for storing and retrieving shortcuts for graph configurations in Perf. Users often define complex sets of graphs for analysis. Instead of redefining these configurations each time or relying on cumbersome URL sharing, this module allows users to save a collection of graph configurations and access them via a unique, shorter identifier. This significantly improves usability and sharing of common graph views.
The core idea is to represent a set of graphs, each with its own configuration (queries, formulas, keys), as a GraphsShortcut
object. This object can then be persisted and retrieved using a Store
interface. A key design decision is the generation of a unique ID for each GraphsShortcut
. This ID is a hash (MD5) of the content of the shortcut, ensuring that identical graph configurations will always have the same ID. This also provides a form of de-duplication. To ensure consistent ID generation, the queries and formulas within each graph configuration are sorted alphabetically before hashing. However, the order of the GraphConfig
objects within a GraphsShortcut
does affect the generated ID.
User defines graph configurations --> [GraphsShortcut object] -- InsertShortcut --> [Store] --> Generates ID (MD5 hash) --> Persists (ID, Shortcut)
User provides ID ------------------------------------------------> [Store] -- GetShortcut --> [GraphsShortcut object] --> Display Graphs
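The ID computation can be pictured with the sketch below: sort the queries and formulas inside each graph config, serialize the whole shortcut, and take an MD5 hash of the result. The struct layout and the serialization used here are assumptions; only the sort-then-hash idea (and the fact that GraphConfig order still matters) is taken from the description above.

```go
package main

import (
	"crypto/md5"
	"encoding/json"
	"fmt"
	"sort"
)

// Hypothetical mirrors of the structs described above.
type GraphConfig struct {
	Queries  []string
	Formulas []string
	Keys     string
}

type GraphsShortcut struct {
	Graphs []GraphConfig
}

// getID sorts queries and formulas in place (so their internal order doesn't
// change the hash) and then hashes the serialized shortcut. The order of the
// GraphConfig entries themselves still affects the result.
func getID(s GraphsShortcut) string {
	for i := range s.Graphs {
		sort.Strings(s.Graphs[i].Queries)
		sort.Strings(s.Graphs[i].Formulas)
	}
	b, _ := json.Marshal(s) // serialization format is an assumption
	return fmt.Sprintf("%x", md5.Sum(b))
}

func main() {
	a := GraphsShortcut{Graphs: []GraphConfig{{Queries: []string{"config=8888", "arch=x86"}}}}
	b := GraphsShortcut{Graphs: []GraphConfig{{Queries: []string{"arch=x86", "config=8888"}}}}
	fmt.Println(getID(a) == getID(b)) // true: query order doesn't matter
}
```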
graphsshortcut.go
: This file defines the central data structures and the Store
interface.
GraphConfig
: Represents the configuration for a single graph. It contains:Queries
: A slice of strings, where each string represents a query used to fetch data for the graph.Formulas
: A slice of strings, representing any formulas applied to the data.Keys
: A string, likely representing a pre-selected set of traces or keys to focus on.GraphsShortcut
: This is the primary object that is stored and retrieved. It's essentially a list of GraphConfig
objects.GetID()
: A method on GraphsShortcut
that calculates a unique MD5 hash based on its content. This method is crucial for identifying and de-duplicating shortcuts. It sorts queries and formulas within each GraphConfig
before hashing to ensure that the order of these internal elements doesn't change the ID.Store
: An interface defining the contract for persisting and retrieving GraphsShortcut
objects. It has two methods:InsertShortcut
: Takes a GraphsShortcut
and stores it, returning its generated ID.GetShortcut
: Takes an ID and returns the corresponding GraphsShortcut
.graphsshortcutstore/
: This subdirectory contains implementations of the graphsshortcut.Store
interface.
graphsshortcutstore.go
(GraphsShortcutStore
): This provides an SQL-backed implementation of the Store
.sql.Pool
) to manage database connections.InsertShortcut
: Marshals the GraphsShortcut
object into JSON and stores it as a string in the GraphsShortcuts
table along with its pre-computed ID. It uses ON CONFLICT (id) DO NOTHING
to avoid errors if the same shortcut (and thus same ID) is inserted multiple times.GetShortcut
: Retrieves the JSON string from the database based on the ID and unmarshals it back into a GraphsShortcut
object.cachegraphsshortcutstore.go
(cacheGraphsShortcutStore
): This provides an in-memory cache-backed implementation of the Store
.cache.Cache
client.InsertShortcut
: Marshals the GraphsShortcut
to JSON and stores it in the cache using the shortcut's ID as the cache key.GetShortcut
: Retrieves the JSON string from the cache by ID and unmarshals it.schema/schema.go
: Defines the SQL table schema for GraphsShortcuts
. The table primarily stores the id
(TEXT, PRIMARY KEY) and the graphs
(TEXT, storing the JSON representation of the GraphsShortcut
).graphsshortcuttest/graphsshortcuttest.go
: This file provides a suite of common tests that can be run against any implementation of the graphsshortcut.Store
interface.
InsertGet
: Verifies that a shortcut can be inserted and then retrieved, and that the retrieved shortcut is identical to the original (accounting for sorted queries/formulas).GetNonExistent
: Ensures that attempting to retrieve a shortcut with an unknown ID results in an error.mocks/Store.go
: This file contains a mock implementation of the graphsshortcut.Store
interface, generated by the testify/mock
library.
Store
interface without needing a real database or cache. They allow for controlled testing of different scenarios, such as simulating errors from the store.In summary, the graphsshortcut
module provides a flexible way to save and share complex graph views by defining a clear data structure (GraphsShortcut
), a standardized way to identify them (GetID
), and an interface (Store
) for various persistence mechanisms, with current implementations for SQL databases and in-memory caches.
The /go/ingest
module is responsible for the entire process of taking performance data files, parsing them, and storing the data into a trace store. This involves identifying the format of the input file, extracting relevant measurements and metadata, associating them with specific commits, and then writing this information to the configured data storage backend.
A key design principle is to support multiple ingestion file formats and to be resilient to errors in individual files. The system attempts to parse files in a specific order, falling back to legacy formats if the primary parsing fails. This allows for graceful upgrades of the ingestion format over time without breaking existing data producers.
The ingestion process also handles trybot data, extracting issue and patchset information, which is crucial for pre-submit performance analysis.
/go/ingest/filter/filter.go
This component provides a mechanism to selectively process or ignore input files based on their names using regular expressions.
Why: In many scenarios, not all files in a data source are relevant for performance analysis. For example, temporary files, logs, or files matching specific patterns might need to be excluded. This filter allows for fine-grained control over which files are ingested.
How:
accept
and reject
.accept
regex, if provided, means only filenames matching this regex will be considered for processing. If empty, all files are initially accepted.reject
regex, if provided, means any filename matching this regex will be ignored, even if it matched the accept
regex. If empty, no files are rejected based on this rule.Reject(name string) bool
method implements this logic: a file is rejected if it doesn't match the accept
regex (if one is provided) OR if it does match the reject
regex (if one is provided).Workflow:
File Name -> Filter.Reject()
  |
  accept regex provided? -- Yes -> name matches accept? -- No  -> REJECT
  |                                                     '-- Yes -> (continue)
  '-- No -> (continue)
  |
  reject regex provided? -- Yes -> name matches reject? -- Yes -> REJECT
  |                                                     '-- No  -> ACCEPT
  '-- No -> ACCEPT
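A minimal sketch of that accept/reject rule using Go's regexp package; the struct and field names here are hypothetical, and only the decision logic mirrors the description above.

```go
package main

import (
	"fmt"
	"regexp"
)

// filter is a hypothetical stand-in for the real Filter type.
type filter struct {
	accept *regexp.Regexp // nil means "accept everything initially"
	reject *regexp.Regexp // nil means "reject nothing"
}

// Reject returns true if the file name should be ignored: either it fails the
// accept regex (when one is configured) or it matches the reject regex.
func (f filter) Reject(name string) bool {
	if f.accept != nil && !f.accept.MatchString(name) {
		return true
	}
	if f.reject != nil && f.reject.MatchString(name) {
		return true
	}
	return false
}

func main() {
	f := filter{
		accept: regexp.MustCompile(`\.json$`),
		reject: regexp.MustCompile(`^tmp/`),
	}
	fmt.Println(f.Reject("nano-json-v1/results.json")) // false: accepted
	fmt.Println(f.Reject("tmp/results.json"))          // true: matches reject
	fmt.Println(f.Reject("results.txt"))               // true: fails accept
}
```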
/go/ingest/format/format.go
and /go/ingest/format/legacyformat.go
These files define the structure of the data files that the ingestion system can understand. format.go
defines the current standard format (Version 1), while legacyformat.go
defines an older format primarily used by nanobench.
Why: A well-defined input format is essential for reliable data ingestion. Versioning allows the format to evolve while maintaining backward compatibility or clear error handling for older, unsupported versions. The current format (Format
struct) is designed to be flexible, allowing for common metadata (like git hash, issue/patchset), global key-value pairs applicable to all results, and a list of individual results. Each result can have its own set of keys and either a single measurement or a map of “sub-measurements” (e.g., min, max, median for a single test). This structure allows for rich and varied performance data to be represented. The legacy format (BenchData
) exists to support older systems that still produce data in that schema.
How:
format.go
(Version 1):Format
struct: The top-level structure. Contains Version
, GitHash
, optional trybot info (Issue
, Patchset
), a global Key
map, a slice of Result
structs, and global Links
.Result
struct: Represents one or more measurements. It has its own Key
map (which gets merged with the global Key
), and critically, either a single Measurement
(float32) or a Measurements
map.SingleMeasurement
struct: Used within Measurements
map. It allows associating a value
(e.g., “min”, “median”) with a Measurement
(float32) and optional Links
. This is how multiple metrics for a single conceptual test run are represented.Parse(r io.Reader)
: Decodes JSON data from a reader into a Format
struct. It specifically checks fileFormat.Version == FileFormatVersion
.Validate(r io.Reader)
: Uses a JSON schema (formatSchema.json
) to validate the structure of the input data. This ensures that incoming files adhere to the expected contract, preventing malformed data from causing issues downstream.GetLinksForMeasurement(traceID string)
: Retrieves links associated with a specific measurement, combining global links with measurement-specific ones.legacyformat.go
:BenchData
struct: Defines the older nanobench format. It has fields like Hash
, Issue
, PatchSet
, Key
, Options
, and Results
. The Results
are nested maps leading to BenchResult
.BenchResult
: A map representing individual test results, typically map[string]interface{}
where values are float64s, except for an “options” key.ParseLegacyFormat(r io.Reader)
: Decodes JSON data into a BenchData
struct.The system will first attempt to parse an input file using format.Parse
. If that fails (e.g., due to a version mismatch or JSON parsing error), it may then attempt to parse it using format.ParseLegacyFormat
as a fallback.
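The fallback behaviour can be sketched as follows. Here parseV1 and parseLegacy are hypothetical stand-ins for format.Parse and format.ParseLegacyFormat, and the JSON field names are assumptions; the point being illustrated is trying the current format first and creating a fresh reader before the legacy attempt, since both parsers consume their input.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

type v1File struct {
	Version int    `json:"version"`
	GitHash string `json:"git_hash"`
}

type legacyFile struct {
	Hash string `json:"hash"`
}

// parseV1 stands in for format.Parse: it rejects files whose version field
// doesn't match the expected format version.
func parseV1(r io.Reader) (v1File, error) {
	var f v1File
	if err := json.NewDecoder(r).Decode(&f); err != nil {
		return f, err
	}
	if f.Version != 1 {
		return f, errors.New("not a version 1 file")
	}
	return f, nil
}

// parseLegacy stands in for format.ParseLegacyFormat.
func parseLegacy(r io.Reader) (legacyFile, error) {
	var f legacyFile
	err := json.NewDecoder(r).Decode(&f)
	return f, err
}

// gitHashFromFile tries the current format first and falls back to the legacy
// format, mirroring the order described above.
func gitHashFromFile(contents []byte) (string, error) {
	if f, err := parseV1(bytes.NewReader(contents)); err == nil {
		return f.GitHash, nil
	}
	// Reset by creating a fresh reader before the second attempt.
	f, err := parseLegacy(bytes.NewReader(contents))
	if err != nil {
		return "", err
	}
	return f.Hash, nil
}

func main() {
	legacy := []byte(`{"hash": "abcdef123456"}`)
	h, err := gitHashFromFile(legacy)
	fmt.Println(h, err) // abcdef123456 <nil>
}
```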
/go/ingest/format/formatSchema.json
This file contains the JSON schema definition for the Format
struct defined in format.go
.
Why: A JSON schema provides a formal, machine-readable definition of the expected data structure. This is used for validation, ensuring that ingested files conform to the specified format. This helps catch errors early and provides clear feedback on what is wrong with a non-conforming file.
How: It's a standard JSON Schema file. The format.Validate
function uses this schema to check the structure and types of the fields in an incoming JSON file. The schema is embedded into the Go binary.
/go/ingest/format/generate/main.go
This is a utility program used to automatically generate formatSchema.json
from the Go Format
struct definition.
Why: Manually keeping a JSON schema synchronized with Go struct definitions is error-prone. This generator ensures that the schema always accurately reflects the Go types.
How: It uses the go.skia.org/infra/go/jsonschema
library, which can reflect on Go structs and produce a corresponding JSON schema. The //go:generate
directive in the file allows this program to be run easily (e.g., via go generate
).
/go/ingest/parser/parser.go
This is the core component responsible for taking an input file (as file.File
), attempting to parse it using the defined formats, and extracting the performance data into a standardized intermediate representation.
Why: This component decouples the specifics of file formats from the process of writing data to the trace store. It handles the logic of trying different parsers, extracting common information like Git hashes and trybot details, and transforming the data into lists of parameter maps (paramtools.Params
) and corresponding measurement values (float32
). It also enforces rules like branch name filtering and parameter key/value validation.
How:
New(...)
: Initializes a Parser
with instance-specific configurations, such as recognized branch names and a regex for invalid characters in parameter keys/values.Parse(ctx context.Context, file file.File)
: This is the main entry point for processing a regular data file.extractFromVersion1File
(which uses format.Parse
).extractFromLegacyFile
(which uses format.ParseLegacyFormat
).ErrFileShouldBeSkipped
.query.ForceValidWithRegex
based on the invalidParamCharRegex
from the instance configuration. This is crucial because trace IDs (which are derived from these parameters) often have restrictions on allowed characters.params
(a slice of paramtools.Params
), values
(a slice of float32
), the gitHash
, any global links
from the file, and an error.ParseTryBot(file file.File)
: A specialized function to extract only the Issue
and Patchset
information from a file, trying both V1 and legacy formats. This is likely used for systems that only need to identify the tryjob associated with a file without processing all the measurement data.ParseCommitNumberFromGitHash(gitHash string)
: Extracts an integer commit number from a specially formatted git hash string (e.g., “CP:12345” -> 12345). This supports systems that use such commit identifiers.getParamsAndValuesFromLegacyFormat
and getParamsAndValuesFromVersion1Format
do the actual work of traversing the parsed file structures (BenchData
or Format
) and flattening them into the params
and values
slices.f.Results
. If a Result
has a single Measurement
, it combines f.Key
and result.Key
to form the paramtools.Params
.Result
has Measurements
(a map of string
to []SingleMeasurement
), it iterates through this map. For each entry, it takes the map's key and the Value
from SingleMeasurement
to add more key-value pairs to the paramtools.Params
.GetSamplesFromLegacyFormat(b *format.BenchData)
: Extracts raw sample data (if present) from the legacy format. This seems to be for specific use cases where individual sample values, rather than just aggregated metrics, are needed.Key Workflow (Simplified Parse
):
Input: file.File
Output: ([]paramtools.Params, []float32, gitHash, links, error)
1. Read file contents.
2. Attempt to parse as Version 1 Format: `f, err := format.Parse(contents)`
   If success:
     `params, values := getParamsAndValuesFromVersion1Format(f, p.invalidParamCharRegex)`
     `gitHash = f.GitHash`
     `links = f.Links`
     `commonKeys = f.Key`
   Else (error): reset the reader and attempt to parse as Legacy Format: `benchData, err := format.ParseLegacyFormat(contents)`
     If success:
       `params, values := getParamsAndValuesFromLegacyFormat(benchData)`
       `gitHash = benchData.Hash`
       `links = nil` (the legacy format doesn't have global links in the same way)
       `commonKeys = benchData.Key`
     Else (error): return error.
3. `branch, ok := p.checkBranchName(commonKeys)`; if !ok, return `ErrFileShouldBeSkipped`.
4. If len(params) == 0, return `ErrFileShouldBeSkipped`.
5. Return `params, values, gitHash, links, nil`.
/go/ingest/process/process.go
This component orchestrates the entire ingestion pipeline. It takes files from a source (e.g., a directory, GCS bucket), uses the parser
to extract data, interacts with git
to resolve commit information, and then writes the processed data to a tracestore.TraceStore
and tracestore.MetadataStore
. It also handles sending Pub/Sub events for ingested files.
Why: This provides the high-level control flow for ingestion. It manages concurrency (multiple worker goroutines), error handling at a macro level (retries for writing to the store), and integration with external systems like Git and Pub/Sub.
How:
Start(...)
:file.Source
(to get files), the tracestore.TraceStore
and tracestore.MetadataStore
(to write data), and perfgit.Git
(to map git hashes to commit numbers).worker
goroutines specified by numParallelIngesters
.worker
listens on a channel provided by the file.Source
.worker(...)
:parser.Parser
instance.file.File
objects from the channel.workerInfo.processSingleFile
.workerInfo.processSingleFile(f file.File)
: This is the heart of the per-file processing.p.Parse(ctx, f)
to get params
, values
, gitHash
, and fileLinks
.Parse
:parser.ErrFileShouldBeSkipped
, acks the Pub/Sub message (if any) and skips.gitHash
is empty, logs an error and nacks.p.ParseCommitNumberFromGitHash
.g.GetCommitNumber(ctx, gitHash, commitNumberFromFile)
to resolve the gitHash
(or verify the supplied commit number) against the Git repository. It includes logic to update the local Git repository clone if the hash isn't initially found. If the commit cannot be resolved, it logs an error, acks the Pub/Sub message (as retrying won't help for an unknown commit), and skips.paramtools.ParamSet
from all the extracted params
.tracestore.TraceStore
using store.WriteTraces
or store.WriteTraces2
(depending on instanceConfig.IngestionConfig.TraceValuesTableInlineParams
). This involves retries in case of transient store errors.WriteTraces2
suggests an optimized path where some parameter data might be stored directly with trace values, potentially for performance reasons.sendPubSubEvent
to publish information about the ingested file (trace IDs, paramset, filename) to a configured Pub/Sub topic. This allows other services to react to new data ingestion.fileLinks
were present in the input, it calls metadataStore.InsertMetadata
to store these links.sendPubSubEvent(...)
: If a FileIngestionTopicName
is configured, this function constructs an ingestevents.IngestEvent
containing the trace IDs, the overall ParamSet
for the file, and the filename. It then publishes this event to the specified Pub/Sub topic.Overall Ingestion Workflow:
File Source (e.g., GCS bucket watcher)
  |
  v
[ file.File channel ] -> Worker Goroutine(s)
  |
  v
processSingleFile(file)
  |
  +-> Parser.Parse(file)             (skip the file if parsing fails or it should be skipped)
  +-> Git.GetCommitNumber(hash)      (update the repo if the hash is not found)
  +-> ParamSet creation
  +-> TraceStore.WriteTraces(...)    (with retries)
  +-> sendPubSubEvent                (if the write succeeds)
  +-> MetadataStore.InsertMetadata   (if links exist)
This architecture allows for robust and scalable ingestion of performance data from various sources and formats, with clear separation of concerns between parsing, data transformation, Git interaction, and storage. The use of Pub/Sub facilitates downstream processing and real-time reactions to newly ingested data.
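The concurrency shape of the ingester, a channel of incoming files fanned out to numParallelIngesters worker goroutines, can be sketched with plain Go primitives. The File type and processSingleFile body below are hypothetical placeholders for file.File and the parse/resolve/write pipeline described above.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// File is a hypothetical stand-in for file.File.
type File struct{ Name string }

// processSingleFile is a placeholder for the parse -> resolve commit ->
// write traces pipeline described above.
func processSingleFile(ctx context.Context, f File) {
	fmt.Println("ingesting", f.Name)
}

// startWorkers fans files out to n worker goroutines, mirroring the
// numParallelIngesters behaviour.
func startWorkers(ctx context.Context, n int, files <-chan File) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range files {
				processSingleFile(ctx, f)
			}
		}()
	}
	return &wg
}

func main() {
	files := make(chan File)
	wg := startWorkers(context.Background(), 4, files)
	for _, name := range []string{"a.json", "b.json", "c.json"} {
		files <- File{Name: name}
	}
	close(files)
	wg.Wait()
}
```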
The ingestevents
module is designed to facilitate the communication of ingestion completion events via PubSub. This is a critical part of the event-driven alerting system within Perf, where the completion of data ingestion for a file triggers subsequent processes like regression detection in a clusterer.
The core of this module revolves around the IngestEvent
struct. This struct encapsulates the necessary information to be transmitted when a file has been successfully ingested. It includes:
TraceIDs
: A slice of strings representing all the unencoded trace identifiers found within the ingested file. These IDs are fundamental for identifying the specific data points that have been processed.ParamSet
: An unencoded, read-only representation of the paramtools.ParamSet
that summarizes the TraceIDs
. This provides a consolidated view of the parameters associated with the ingested traces.Filename
: The name of the file that was ingested. This helps in tracking the source of the ingested data.To handle the transmission of IngestEvent
data over PubSub, the module provides two key functions:
CreatePubSubBody
: This function takes an IngestEvent
struct as input and prepares it for PubSub transmission. The “how” here involves a two-step process:
IngestEvent
is first encoded into a JSON format. This provides a structured and widely compatible representation of the data. The JSON bytes are then gzip-compressed to keep the PubSub message payload small.
IngestEvent (struct) ---> JSON Encoding ---> Gzip Compression ---> []byte (for PubSub)
DecodePubSubBody
: This function performs the reverse operation. It takes a byte slice (presumably received from a PubSub message) and decodes it back into an IngestEvent
struct. The process is the reverse of the encoding path: the compressed byte slice is first gzip-decompressed, and the resulting JSON is then decoded into an IngestEvent struct. Error handling is incorporated at each step to manage potential issues during decompression or JSON decoding.
[]byte (from PubSub) ---> Gzip Decompression ---> JSON Decoding ---> IngestEvent (struct)
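A minimal sketch of that encode/decode round trip using only the standard library; the IngestEvent fields mirror the description above, but this is illustrative code rather than the module's actual implementation.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// IngestEvent mirrors the fields described above.
type IngestEvent struct {
	TraceIDs []string
	ParamSet map[string][]string
	Filename string
}

// encode JSON-encodes the event and then gzip-compresses the result,
// producing the bytes that would be published to PubSub.
func encode(ev IngestEvent) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(ev); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode: gunzip first, then JSON-decode.
func decode(b []byte) (IngestEvent, error) {
	var ev IngestEvent
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return ev, err
	}
	defer zr.Close()
	if err := json.NewDecoder(zr).Decode(&ev); err != nil {
		return ev, err
	}
	return ev, nil
}

func main() {
	in := IngestEvent{
		TraceIDs: []string{",arch=x86,config=8888,test=draw_a_circle,units=ms,"},
		ParamSet: map[string][]string{"arch": {"x86"}, "config": {"8888"}},
		Filename: "gs://skia-perf/nano-json-v1/some_file.json",
	}
	b, _ := encode(in)
	out, _ := decode(b)
	fmt.Println(out.Filename == in.Filename) // true
}
```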
The primary responsibility of this module is therefore to provide a standardized and efficient way to serialize and deserialize ingestion event information for PubSub communication. The design choice of using JSON for structure and gzip for compression balances readability, interoperability, and an efficient use of PubSub resources.
The file ingestevents.go
contains the definition of the IngestEvent
struct and the implementation of the CreatePubSubBody
and DecodePubSubBody
functions. The corresponding test file, ingestevents_test.go
, ensures that the encoding and decoding processes work correctly, verifying that an IngestEvent
can be successfully round-tripped through the serialization and deserialization process.
The initdemo
module provides a command-line application designed to initialize a database instance, specifically targeting CockroachDB or a Spanner emulator, for demonstration or development purposes.
Its primary purpose is to automate the creation of a named database and the application of the latest database schema. This ensures a consistent and ready-to-use database environment, removing the manual steps often required for setting up a database for applications like Skia Perf.
The core functionality revolves around connecting to a specified database URL, attempting to create the database (gracefully handling cases where it already exists), and then executing the appropriate schema definition. The choice of schema (standard SQL or Spanner-specific) is determined by a command-line flag.
Key Components and Responsibilities:
main.go
: This is the entry point and sole Go source file for the application.--databasename
: Specifies the name of the database to be created (defaults to “demo”). This allows users to customize the database name for different environments or purposes.--database_url
: Provides the connection string for the CockroachDB instance (defaults to a local instance postgresql://root@127.0.0.1:26257/?sslmode=disable
). This allows connection to different database servers or configurations.--spanner
: A boolean flag that, when set, instructs the application to use the Spanner-specific schema. This is crucial for ensuring compatibility when targeting a Spanner emulator, which may have different SQL syntax or feature support compared to CockroachDB.pgxpool
library, which is a PostgreSQL driver and connection pool for Go. This library was chosen for its robustness and performance in handling PostgreSQL-compatible databases like CockroachDB.CREATE DATABASE
SQL statement. The implementation includes error handling to log an informational message if the database already exists, rather than failing, making the script idempotent in terms of database creation.SET DATABASE
to switch the current session's context to the newly created (or existing) database. This is a CockroachDB-specific command.--spanner
flag, it selects the appropriate schema definition.--spanner
is false, it uses sql.Schema
from the //perf/go/sql
module, which contains the standard SQL schema for Perf.--spanner
is true, it uses spanner.Schema
from the //perf/go/sql/spanner
module, which contains the schema adapted for Spanner. This separation allows maintaining distinct schema versions tailored to the nuances of each database system.Workflow:
The typical workflow of the initdemo
application can be visualized as:
Parse Flags: Application Start
-> Read --databasename, --database_url, --spanner
Connect to Database: Use --database_url
-> pgxpool.Connect()
-> Connection Pool (conn)
Create Database: conn
+ Use --databasename
-> Execute "CREATE DATABASE <name>"
|
+-- Success
|
+-- Error (e.g., already exists)
-> Log Info "Database <name> already exists."
Set Active Database (if not Spanner): Is --spanner false?
|
+-- Yes
-> conn
+ Use --databasename
-> Execute "SET DATABASE <name>"
| |
| +-- Error
-> sklog.Fatal()
|
+-- No (Spanner enabled)
-> Skip this step
Select Schema: Is --spanner true?
|
+-- Yes
-> dbSchema = spanner.Schema
|
+-- No
-> dbSchema = sql.Schema
Apply Schema: conn
+ dbSchema
-> Execute schema DDL
|
+-- Error
-> sklog.Fatal()
Close Connection: conn.Close()
-> Application End
This process ensures that a target database is either created or confirmed to exist, and then the correct schema is applied, making it ready for use. The choice of using pgxpool
for database interaction and providing separate schema definitions for standard SQL and Spanner demonstrates a design focused on supporting multiple database backends for the Perf system. The error handling, particularly for the database creation step, aims for robust and user-friendly operation.
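A compressed sketch of that flow is shown below. The pgx/v4 import path, the exact SQL strings, and the flag handling are assumptions; the real main.go also applies sql.Schema or spanner.Schema afterwards, which is omitted here.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"

	"github.com/jackc/pgx/v4/pgxpool"
)

func main() {
	databaseName := flag.String("databasename", "demo", "Name of the database to create.")
	databaseURL := flag.String("database_url", "postgresql://root@127.0.0.1:26257/?sslmode=disable", "Connection string.")
	flag.Parse()

	ctx := context.Background()
	conn, err := pgxpool.Connect(ctx, *databaseURL)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// CREATE DATABASE is treated as idempotent: an "already exists" error is
	// only logged, matching the behaviour described above.
	if _, err := conn.Exec(ctx, fmt.Sprintf("CREATE DATABASE %s", *databaseName)); err != nil {
		log.Printf("Database %s already exists (or could not be created): %s", *databaseName, err)
	}

	// Switch the session to the target database (CockroachDB-specific).
	if _, err := conn.Exec(ctx, fmt.Sprintf("SET DATABASE = %s", *databaseName)); err != nil {
		log.Fatal(err)
	}

	// The real tool would now execute the schema DDL (sql.Schema or spanner.Schema).
}
```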
This module provides an interface and implementation for interacting with the Google Issue Tracker API, specifically tailored for Perf's needs. The primary goal is to abstract the complexities of the Issue Tracker API and provide a simpler, more focused way to retrieve issue details and add comments to existing issues. This allows other parts of the Perf system to integrate with issue tracking without needing to directly handle API authentication, request formatting, or response parsing.
The module is designed around the IssueTracker
interface, which defines the core operations:
Listing Issues (ListIssues
): This function allows retrieving details for a set of specified issue IDs.
- **Why**: Perf often needs to fetch information about bugs that have been filed (e.g., to display their status or link to them from alerts). Providing a bulk retrieval mechanism based on IDs is efficient.
- **How**: The implementation takes a `ListIssuesRequest` containing a slice of integer issue IDs. It constructs a query string by joining these IDs with " | " (the OR operator in the Issue Tracker query language) and wrapping the result as `id:(...)`. This formatted query is then sent to the Issue Tracker API.
- **Example Workflow**:
  Perf System --- ListIssuesRequest (IDs: [123, 456]) ---> issuetracker Module
    |
    v
  Construct Query: "id:(123 | 456)"
    |
    v
  Issue Tracker API <--- GET Request --- issueTrackerImpl
    |
    v
  Perf System <--- []*issuetracker.Issue --- Response Parsing <--- API Response
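The query construction step can be illustrated with a tiny helper; the function name is hypothetical, and only the "id:(123 | 456)" shape shown in the workflow above is assumed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildIssueQuery joins issue IDs with the Issue Tracker OR operator and
// wraps them in an id:(...) filter, matching the workflow above.
func buildIssueQuery(ids []int) string {
	parts := make([]string, 0, len(ids))
	for _, id := range ids {
		parts = append(parts, strconv.Itoa(id))
	}
	return fmt.Sprintf("id:(%s)", strings.Join(parts, " | "))
}

func main() {
	fmt.Println(buildIssueQuery([]int{123, 456})) // id:(123 | 456)
}
```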
Creating Comments (CreateComment
): This function allows adding a new comment to an existing issue.
- **Why**: Perf might need to automatically update bugs with new information, such as when a regression is fixed or when more data about an alert becomes available.
- **How**: It takes a `CreateCommentRequest` containing the `IssueId` and the `Comment` string. The implementation constructs an `issuetracker.IssueComment` object and uses the Issue Tracker client library to post this comment to the specified issue.
- **Example Workflow**:
  Perf System --- CreateCommentRequest (ID: 789, Comment: "...") ---> issuetracker Module
    |
    v
  Issue Tracker API <--- POST Request --- issueTrackerImpl
    |
    v
  Perf System <--- CreateCommentResponse <--- Response Parsing <--- API Response
issuetracker.go
:
IssueTracker
interface: Defines the contract for interacting with the issue tracker. This allows for decoupling the client code from the specific implementation and facilitates testing using mocks.issueTrackerImpl
struct: The concrete implementation of the IssueTracker
interface. It holds an instance of the issuetracker.Service
client, which is the generated Go client for the Google Issue Tracker API.NewIssueTracker
function: This is the factory function for creating an issueTrackerImpl
instance.config.IssueTrackerConfig
. It then uses google.DefaultClient
with the “https://www.googleapis.com/auth/buganizer” scope to obtain an authenticated HTTP client. This client and the API key are then used to initialize the issuetracker.Service
.BasePath
of the issuetracker.Service
is explicitly set to “https://issuetracker.googleapis.com” to ensure it points to the correct API endpoint.ListIssuesRequest
, CreateCommentRequest
, CreateCommentResponse
): These simple structs define the data structures for requests and responses, making the interface clear and easy to use. They are designed to be minimal and specific to the needs of the Perf system.mocks/IssueTracker.go
:
IssueTracker
interface, generated using the testify/mock
library.issuetracker
module. They allow tests to simulate various responses (success, failure, specific data) from the issue tracker without making actual API calls. This makes tests faster, more reliable, and independent of external services.IssueTracker
mock struct embeds mock.Mock
and provides mock implementations for ListIssues
and CreateComment
. The NewIssueTracker
function in this file is a constructor for the mock, which also sets up test cleanup to assert that all expected mock calls were made.IssueTracker
) promotes loose coupling and testability. Consumers depend on the abstraction rather than the concrete implementation.skerr.Wrapf
to wrap errors, providing context and making debugging easier. It also includes input validation for CreateCommentRequest
to prevent invalid API calls.sklog.Debugf
) are included to trace requests and responses, which can be helpful during development and troubleshooting.The module relies on the external go.skia.org/infra/go/issuetracker/v1
library, which is the auto-generated client for the Google Issue Tracker API. This design choice leverages existing, well-tested client libraries instead of reimplementing API interaction from scratch.
This module provides a generic implementation of the k-means clustering algorithm. The primary goal is to offer a flexible way to group a set of data points (observations) into a predefined number of clusters (k) based on their similarity. The “similarity” is determined by a distance metric, and the “center” of each cluster is represented by a centroid.
The module is designed with generality in mind. Instead of being tied to a specific data type or distance metric, it uses interfaces (Clusterable
, Centroid
) and a function type (CalculateCentroid
). This approach allows users to define their own data structures and distance calculations, making the k-means algorithm applicable to a wide variety of problems.
Interfaces for Flexibility:
Clusterable
: This is a marker interface. Any data type that needs to be clustered must satisfy this interface. In practice, this means you can use interface{}
and then perform type assertions within your custom distance and centroid calculation functions. This design choice prioritizes ease of use for simple cases, where the same type might represent both an observation and a centroid.Centroid
: This interface defines the contract for centroids.AsClusterable() Clusterable
: This method is crucial for situations where a centroid itself can be treated as a data point (e.g., when calculating distances or when a centroid is part of the initial observation set). It allows the algorithm to seamlessly integrate centroids into lists of clusterable items. If a centroid cannot be meaningfully converted to a Clusterable
, it returns nil
.Distance(c Clusterable) float64
: This method is the core of the similarity measure. It calculates the distance between the centroid and a given Clusterable
data point. The user provides the specific implementation for this, enabling the use of various distance metrics (Euclidean, Manhattan, etc.).CalculateCentroid func([]Clusterable) Centroid
: This function type defines how a new centroid is computed from a set of Clusterable
items belonging to a cluster. This allows users to implement different strategies for centroid calculation, such as taking the mean, median, or other representative points.Lloyd's Algorithm Implementation:
The core clustering logic is implemented in the Do
function, which performs a single iteration of Lloyd's algorithm. This is a common and relatively straightforward iterative approach to k-means.
The KMeans
function orchestrates multiple iterations of Do
. A key design consideration here is the convergence criteria. Currently, it runs for a fixed number of iterations (iters
). A more sophisticated approach would be to iterate until the total error (or the change in centroid positions) falls below a certain threshold, indicating that the clusters have stabilized. This was likely deferred for simplicity in the initial implementation, but it's an important aspect for practical applications to avoid unnecessary computations or premature termination.
Why modify centroids in-place in Do
?
The Do
function modifies the centroids
slice passed to it. The documentation explicitly advises calling it as centroids = Do(observations, centroids, f)
. This design choice might have been made for efficiency, avoiding the allocation of a new centroids slice in every iteration if the number of centroids remains the same. However, it also means the caller needs to be aware of this side effect. The function does return the potentially new slice of centroids, which is important because centroids can be “lost” if a cluster becomes empty.
kmeans.go
: This is the sole source file and contains all the logic for the k-means algorithm.
Clusterable
(interface): Defines the contract for data points that can be clustered. Its main purpose is to allow generic collections of items.Centroid
(interface): Defines the contract for cluster centers, including how to calculate their distance to data points and how to treat them as data points themselves.CalculateCentroid
(function type): A user-provided function that defines the logic for computing a new centroid from a group of data points. This separation of concerns is key to the module's flexibility.closestCentroid(observation Clusterable, centroids []Centroid) (int, float64)
: A helper function that finds the index of the centroid closest to a given observation and the distance to it. This is a fundamental step in assigning observations to clusters.Do(observations []Clusterable, centroids []Centroid, f CalculateCentroid) []Centroid
:Observations --> [Find Closest Centroid for each] --> Temporary Cluster Assignments
2. For each temporary cluster, it recalculates a new centroid using the user-provided f
function. Temporary Cluster Assignments --> [Group by Cluster] --> Sets of Clusterable items | V [Apply 'f'] --> New Centroids
3. If a cluster becomes empty (no observations are closest to its centroid), that centroid is effectively removed in this iteration, as f
will not be called for an empty set of Clusterable
items, and newCentroids
will not include it.KMeans
function clearer. The in-place modification (and return value) addresses the potential for the number of centroids to change.GetClusters(observations []Clusterable, centroids []Centroid) ([][]Clusterable, float64)
:AsClusterable()
is not nil).totalError
.KMeans(observations []Clusterable, centroids []Centroid, k, iters int, f CalculateCentroid) ([]Centroid, [][]Clusterable)
:Initial Centroids --(iter 1)--> Do() --(updates)--> Centroids' | --(iter 2)--> Do() --(updates)--> Centroids'' ... --(iter 'iters')--> Do() --(updates)--> Final Centroids | V GetClusters() --> Final Clusters
iters
) is a straightforward stopping condition, though, as mentioned, convergence-based stopping would be more robust. The k
parameter seems redundant given that the initial number of centroids is determined by len(centroids)
. If k
was intended to specify the desired number of clusters and the initial centroids
were just starting points, the implementation would need to handle cases where len(centroids)
!= k
. However, the current Do
function naturally adjusts the number of centroids if some clusters become empty.TotalError(observations []Clusterable, centroids []Centroid) float64
:GetClusters
and returns the totalError
computed by it.1. Single K-Means Iteration (Do
function):
Input: Observations (O), Current Centroids (C_curr), CalculateCentroid function (f)
1. For each Observation o in O:
   Find c_closest in C_curr such that Distance(o, c_closest) is minimized.
   Assign o to the cluster associated with c_closest.
   ---> Result: a mapping of each Observation to a Centroid index.
2. Initialize NewCentroids (C_new) as an empty list.
3. For each unique Centroid index j (from 0 to k-1):
   a. Collect all Observations (O_j) assigned to cluster j.
   b. If O_j is not empty: calculate new_centroid_j = f(O_j) and add it to C_new.
   ---> Some original centroids may have no observations assigned, so C_new may have fewer centroids than C_curr.
Output: New Centroids (C_new)
2. Full K-Means Clustering (KMeans
function):
Input: Observations (O), Initial Centroids (C_init), Number of Iterations (iters), CalculateCentroid function (f)
1. Set CurrentCentroids = C_init.
2. Loop 'iters' times:
   CurrentCentroids = Do(O, CurrentCentroids, f) // perform one iteration
   ---> CurrentCentroids are updated.
3. FinalCentroids = CurrentCentroids.
4. Clusters, TotalError = GetClusters(O, FinalCentroids)
   ---> Assigns each observation to its final cluster based on FinalCentroids. The first element of each sub-array in Clusters is the centroid itself.
Output: FinalCentroids, Clusters
The unit tests in kmeans_test.go
provide excellent examples of how to implement the Clusterable
, Centroid
, and CalculateCentroid
requirements for a simple 2D point scenario. They demonstrate the expected behavior of the Do
and KMeans
functions, including edge cases like empty inputs or losing centroids when clusters become empty.
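A comparable, self-contained sketch of satisfying those requirements for 2D points is shown below. The interfaces are mirrored locally rather than imported, and the type names (point, meanCentroid) are assumptions; only the contract shapes come from the description above.

```go
package main

import (
	"fmt"
	"math"
)

// Local mirrors of the module's interfaces (the real ones live in kmeans.go).
type Clusterable interface{}

type Centroid interface {
	AsClusterable() Clusterable
	Distance(c Clusterable) float64
}

type CalculateCentroid func([]Clusterable) Centroid

// point is a 2D observation that also serves as its own centroid, similar to
// the setup used in kmeans_test.go.
type point struct{ x, y float64 }

func (p point) AsClusterable() Clusterable { return p }

func (p point) Distance(c Clusterable) float64 {
	o := c.(point)
	return math.Hypot(p.x-o.x, p.y-o.y)
}

// meanCentroid computes the mean of a cluster's members as its new centroid.
func meanCentroid(members []Clusterable) Centroid {
	var sx, sy float64
	for _, m := range members {
		p := m.(point)
		sx += p.x
		sy += p.y
	}
	n := float64(len(members))
	return point{x: sx / n, y: sy / n}
}

func main() {
	var f CalculateCentroid = meanCentroid
	c := f([]Clusterable{point{0, 0}, point{2, 2}})
	fmt.Println(c.Distance(point{1, 1})) // 0: the mean of the two points
}
```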
The maintenance
module in Perf is responsible for executing a set of long-running background processes that are essential for the health and operational integrity of a Perf instance. These tasks ensure that data is kept up-to-date, system configurations are current, and storage is managed efficiently. The module is designed to be started once and run continuously, performing its duties at predefined intervals.
The core design principle behind the maintenance
module is to centralize various periodic tasks that would otherwise be scattered or require manual intervention. By consolidating these operations, the system becomes more robust and easier to manage.
Key design choices include:
config.MaintenanceFlags
) and the instance-specific configuration (config.InstanceConfig
). This provides flexibility for different Perf deployments and operational needs.builders.NewDBPoolFromConfig
), Git interfaces (builders.NewPerfGitFromConfig
), and caching mechanisms (builders.GetCacheFromConfig
) are created and passed into the respective maintenance tasks. This promotes modularity and testability.sklog
) to provide visibility into its operations and to aid in diagnosing issues. While errors in one task might be logged, the overall Start
function aims to keep other independent tasks running.flags.MigrateRegressions
, instanceConfig.EnableSheriffConfig
). This allows for gradual rollouts and testing in production environments.The maintenance
module orchestrates several distinct background processes.
1. Core Initialization and Schema Management (maintenance.go
)
tracing.Init
: Sets up the distributed tracing system.builders.NewDBPoolFromConfig
: Establishes a connection pool to the database.expectedschema.ValidateAndMigrateNewSchema
: Checks the current database schema version against the expected version defined in the codebase. If they don't match, it applies the necessary migrations to bring the schema up to date. This is a critical step to prevent data corruption or application errors due to schema mismatches.2. Git Repository Synchronization (maintenance.go
)
builders.NewPerfGitFromConfig
: Creates an instance of perfgit.Git
, which provides an interface to the Git repository.g.StartBackgroundPolling(ctx, gitRepoUpdatePeriod)
: This method launches a goroutine within the perfgit
component. This goroutine periodically fetches the latest changes from the remote Git repository (origin) and updates the local representation, typically also updating a Commits
table in the database with new commit information. The gitRepoUpdatePeriod
constant (e.g., 1 minute) defines how frequently this update occurs.3. Regression Schema Migration (maintenance.go
)
flags.MigrateRegressions
flag.migration.New
: Creates a Migrator
instance, likely configured with database connections for both the old and new regression storage mechanisms.migrator.RunPeriodicMigration(regressionMigratePeriod, regressionMigrationBatchSize)
: Starts a goroutine that, at intervals defined by regressionMigratePeriod
, processes a regressionMigrationBatchSize
number of regressions, moving them from the old storage to the new. This batching approach prevents overwhelming the database and allows the migration to proceed incrementally.4. Sheriff Configuration Import (maintenance.go
)
instanceConfig.EnableSheriffConfig
and a non-empty instanceConfig.InstanceName
.AlertStore
and SubscriptionStore
for managing alert and subscription data within Perf.luciconfig.NewApiClient
: Creates a client to communicate with the LUCI Config service.sheriffconfig.New
: Initializes the SheriffConfig
service, which encapsulates the logic for fetching, parsing, and applying Sheriff configurations.sheriffConfig.StartImportRoutine(configImportPeriod)
: Launches a goroutine that periodically (every configImportPeriod
) polls the LUCI Config service for the specified instance. If new or updated configurations are found, they are processed and stored/updated in Perf's database (e.g., in the Alerts
and Subscriptions
tables).5. Query Cache Refresh (maintenance.go
)
Why: To speed up common queries (e.g., retrieving the set of available trace parameters, known as ParamSets), Perf can cache this information. This component is responsible for periodically rebuilding and refreshing these caches.
How:
flags.RefreshQueryCache
flag.builders.NewTraceStoreFromConfig
: Gets an interface to the trace data.dfbuilder.NewDataFrameBuilderFromTraceStore
: Creates a utility for building data frames from traces, which is likely used to derive the ParamSet.psrefresh.NewDefaultParamSetRefresher
: Initializes a component specifically designed to refresh ParamSets. It uses the DataFrameBuilder
to scan trace data and determine the current set of unique parameter key-value pairs.psRefresher.Start(time.Hour)
: Starts a goroutine to refresh the primary ParamSet (perhaps stored directly in the database or an in-memory representation) hourly.builders.GetCacheFromConfig
: If a distributed cache like Redis is configured, this obtains a client for it.psrefresh.NewCachedParamSetRefresher
: Wraps the primary psRefresher
with a caching layer.cacheParamSetRefresher.StartRefreshRoutine(redisCacheRefreshPeriod)
: Starts another goroutine that takes the ParamSet generated by psRefresher
and populates the external cache (e.g., Redis) at redisCacheRefreshPeriod
intervals (e.g., every 4 hours). This provides a faster lookup path for frequently accessed ParamSet data.Workflow:
Trace Data --> DataFrameBuilder --> ParamSetRefresher (generates primary ParamSet)
                                            |
                                            v
                                 CachedParamSetRefresher --> External Cache (e.g., Redis)
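The `Start(time.Hour)` and `StartRefreshRoutine(redisCacheRefreshPeriod)` calls above both follow the same pattern: launch a goroutine that redoes its work on a fixed interval. A minimal sketch of that pattern is shown below with a hypothetical `refresh` callback and standard-library logging; the real routines differ in details such as metrics, error handling, and use of sklog.

```go
package main

import (
	"context"
	"log"
	"time"
)

// startPeriodicRefresh launches a goroutine that runs refresh immediately and
// then again on every tick, stopping when the context is cancelled.
func startPeriodicRefresh(ctx context.Context, period time.Duration, refresh func(context.Context) error) {
	go func() {
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		for {
			if err := refresh(ctx); err != nil {
				log.Printf("refresh failed: %s", err)
			}
			select {
			case <-ticker.C:
			case <-ctx.Done():
				return
			}
		}
	}()
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	startPeriodicRefresh(ctx, 100*time.Millisecond, func(context.Context) error {
		log.Println("rebuilding ParamSet cache (hypothetical work)")
		return nil
	})
	<-ctx.Done() // in perfserver the caller blocks forever instead (select {}).
}
```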
6. Old Data Deletion (deletion/deleter.go
, maintenance.go
)
Why: Over time, Perf accumulates a large amount of data, including regression information and associated shortcuts (which are often links or identifiers for specific data views). To manage storage costs and maintain system performance, very old data that is unlikely to be accessed needs to be periodically deleted.
How:
flags.DeleteShortcutsAndRegressions
flag.deletion.New(db, ...)
: Initializes a Deleter
object. This object encapsulates the logic for identifying and removing outdated regressions and shortcuts. It takes a database connection pool (db
) and the datastore type. Internally, it creates instances of sqlregressionstore
and sqlshortcutstore
to interact with the respective database tables.deleter.RunPeriodicDeletion(deletionPeriod, deletionBatchSize)
: This method in maintenance.go
calls the RunPeriodicDeletion
method on the Deleter
instance.deleter.go
, RunPeriodicDeletion
starts a goroutine.deletionPeriod
(e.g., every 15 minutes).d.DeleteOneBatch(deletionBatchSize)
.Deleter.DeleteOneBatch(shortcutBatchSize)
:d.getBatch(ctx, shortcutBatchSize)
to identify a batch of regressions and shortcuts eligible for deletion.Deleter.getBatch(...)
:Regressions
table.Regressions
table for ranges of commits, starting from the oldest.Low
and High
StepPoint
s.StepPoint
's timestamp is older than the defined ttl
(Time-To-Live, currently -18 months), the associated shortcut and the commit number of the regression are marked for deletion.shortcutBatchSize
.d.deleteBatch(ctx, commitNumbers, shortcuts)
to perform the actual deletion.Deleter.deleteBatch(...)
:commitNumbers
and calls d.regressionStore.DeleteByCommit()
for each, removing the regression data associated with that commit.shortcuts
and calls d.shortcutStore.DeleteShortcut()
for each, removing the shortcut entry.Deletion Workflow:
Timer (every deletionPeriod) --> DeleteOneBatch
                                     |
                                     v
                          getBatch (identifies old data based on TTL)
                                     |
                                     | Returns (commit_numbers_to_delete, shortcut_ids_to_delete)
                                     v
                          deleteBatch (deletes in a transaction)
                                     |
                                     +--> RegressionStore.DeleteByCommit
                                     +--> ShortcutStore.DeleteShortcut
The ttl
variable in deleter.go
is set to -18 months, meaning regressions and their associated shortcuts older than 1.5 years are targeted for deletion. This value was determined based on stakeholder requirements for data retention.
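As a rough sketch of how such a TTL check can be expressed in Go (the actual fields and query in deleter.go may differ), the -18 month retention window translates directly into a time comparison:

```go
package main

import (
	"fmt"
	"time"
)

// ttlMonths mirrors the -18 month retention window described above.
const ttlMonths = -18

// isExpired reports whether a StepPoint-style timestamp is older than the
// cutoff derived from "now" minus the TTL.
func isExpired(stepPointTimestamp, now time.Time) bool {
	cutoff := now.AddDate(0, ttlMonths, 0)
	return stepPointTimestamp.Before(cutoff)
}

func main() {
	now := time.Now()
	old := now.AddDate(-2, 0, 0)     // two years ago: eligible for deletion
	recent := now.AddDate(0, -1, 0)  // one month ago: retained
	fmt.Println(isExpired(old, now), isExpired(recent, now)) // true false
}
```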
The select {}
at the end of the Start
function in maintenance.go
is a common Go idiom to make the main goroutine (the one that called Start
) block indefinitely. Since all the actual work is done in background goroutines launched by Start
, this prevents the Start
function from returning and thus keeps the maintenance processes alive.
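A stripped-down illustration of that idiom (not the actual Start function):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Launch background work, as maintenance.Start does for each task.
	go func() {
		for range time.Tick(time.Second) {
			fmt.Println("background task tick")
		}
	}()

	// Block forever so the process (and its goroutines) stays alive.
	select {}
}
```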
The notify
module in Perf is responsible for handling notifications related to performance regressions. It provides a flexible framework for formatting and sending notifications through various channels like email, issue trackers, or custom endpoints like Chromeperf.
Core Concepts and Design:
The notification system is built around a few key abstractions:
Notifier
Interface (notify.go
): This is the central interface for sending notifications. It defines methods for:
RegressionFound
: Called when a new regression is detected.RegressionMissing
: Called when a previously detected regression is no longer found (e.g., due to new data or fixes).ExampleSend
: Used for sending test/dummy notifications to verify configuration.UpdateNotification
: For updating an existing notification (e.g., adding a comment to an issue).Formatter
Interface (notify.go
): This interface is responsible for constructing the content (body and subject) of a notification. Implementations exist for:
HTMLFormatter
(html.go
): Generates HTML-formatted notifications, suitable for email.MarkdownFormatter
(markdown.go
): Generates Markdown-formatted notifications, suitable for issue trackers or other systems that support Markdown. The formatters use Go's text/template
package, allowing for customizable notification messages. Templates can access a TemplateContext
(or AndroidBugTemplateContext
for Android-specific notifications) which provides data about the regression, commit, alert, etc.Transport
Interface (notify.go
): This interface defines how a formatted notification is actually sent. Implementations include:
EmailTransport
(email.go
): Sends notifications via email using the emailclient
module.IssueTrackerTransport
(issuetracker.go
): Interacts with an issue tracking system (configured for Google's Issue Tracker/Buganizer) to create or update issues. It uses the go/issuetracker/v1
client and requires an API key for authentication.NoopTransport
(noop.go
): A “do nothing” implementation, useful for disabling notifications or for testing.NotificationDataProvider
Interface (notification_provider.go
): This interface is responsible for gathering the necessary data to populate the notification templates.
defaultNotificationDataProvider
uses a Formatter
to generate the notification body and subject based on RegressionMetadata
.androidNotificationProvider
(android_notification_provider.go
) is a specialized provider for Android-specific bug reporting. It uses its own AndroidBugTemplateContext
which includes Android-specific details like Build ID diff URLs. It leverages the MarkdownFormatter
for content generation but with Android-specific templates.Workflow for Sending a Notification (Simplified):
alerter
module).Notifier
's RegressionFound
method is called with details about the regression (commit, alert configuration, cluster summary, etc.).Notifier
(typically defaultNotifier
) uses its NotificationDataProvider
to get the raw notification data (body and subject).NotificationDataProvider
populates a context object (e.g., TemplateContext
or AndroidBugTemplateContext
).Formatter
(e.g., MarkdownFormatter
) to execute the appropriate template with this context, producing the final body and subject.Notifier
then calls its Transport
's SendNewRegression
method, passing the formatted body and subject.Transport
implementation handles the actual sending (e.g., makes an API call to the issue tracker or sends an email).

Regression Detected --> Notifier.RegressionFound(...)
                               |
                               v
        NotificationDataProvider.GetNotificationDataRegressionFound(...)
                               |
                               | (Populates Context, e.g., TemplateContext)
                               v
        Formatter.FormatNewRegressionWithContext(...)
                               |
                               | (Uses Go templates)
                               v
                  Formatted Body & Subject
                               |
                               v
        Transport.SendNewRegression(body, subject)
                               |
                               +------------------> EmailTransport --> Email Server
                               |
                               +------------------> IssueTrackerTransport --> Issue Tracker API
                               |
                               +------------------> NoopTransport --> (Does nothing)
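The sketch below mirrors that flow with deliberately simplified, hypothetical interface shapes; the real interfaces in notify.go take richer arguments (commit, alert configuration, cluster summary, etc.) and the method sets are larger.

```go
package main

import "fmt"

// notificationData is a simplified stand-in for the body/subject pair.
type notificationData struct{ Subject, Body string }

// dataProvider and transport are reduced versions of the roles described above.
type dataProvider interface {
	GetNotificationDataRegressionFound(meta string) (notificationData, error)
}

type transport interface {
	SendNewRegression(subject, body string) (threadingID string, err error)
}

// defaultNotifierSketch composes a provider with a transport, mirroring how
// defaultNotifier wires a NotificationDataProvider, Formatter, and Transport.
type defaultNotifierSketch struct {
	provider  dataProvider
	transport transport
}

func (n defaultNotifierSketch) RegressionFound(meta string) error {
	data, err := n.provider.GetNotificationDataRegressionFound(meta)
	if err != nil {
		return err
	}
	_, err = n.transport.SendNewRegression(data.Subject, data.Body)
	return err
}

// Toy implementations so the example runs.
type markdownProvider struct{}

func (markdownProvider) GetNotificationDataRegressionFound(meta string) (notificationData, error) {
	return notificationData{Subject: "Regression detected", Body: "Details: " + meta}, nil
}

type stdoutTransport struct{}

func (stdoutTransport) SendNewRegression(subject, body string) (string, error) {
	fmt.Printf("SEND %q: %s\n", subject, body)
	return "issue-1", nil
}

func main() {
	n := defaultNotifierSketch{provider: markdownProvider{}, transport: stdoutTransport{}}
	_ = n.RegressionFound(",arch=x86,config=8888,test=draw_a_circle,units=ms,")
}
```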
Key Files and Responsibilities:
notify.go
:
Notifier
, Formatter
, Transport
.defaultNotifier
implementation, which orchestrates the notification process by composing a NotificationDataProvider
, Formatter
, and Transport
.New()
factory function that constructs the appropriate Notifier
based on the NotifyConfig
. This is the main entry point for creating a notifier.TemplateContext
used by generic formatters.getRegressionMetadata
to fetch additional information like source file links from TraceStore
if the alert is for an individual trace.notification_provider.go
:
NotificationDataProvider
interface.defaultNotificationDataProvider
which uses a generic Formatter
.Notifier
or Transport
mechanisms.android_notification_provider.go
:
NotificationDataProvider
specifically for Android bug creation.AndroidBugTemplateContext
to provide Android-specific data to templates, such as GetBuildIdUrlDiff
for generating links to compare Android build CLs.MarkdownFormatter
but configures it with Android-specific notification templates defined in the NotifyConfig
. This allows Android teams to customize their bug reports.markdown.go
& html.go
:
Formatter
interface for Markdown and HTML respectively.MarkdownFormatter
can be configured with custom templates via NotifyConfig
. It also provides a buildIDFromSubject
template function, specifically designed for Android's commit message format, to extract build IDs.viewOnDashboard
is a utility function to construct a URL to the Perf explore page for the given regression.email.go
& issuetracker.go
& noop.go
:
Transport
interface.email.go
: Uses emailclient
to send emails. Splits comma/space-separated recipient lists.issuetracker.go
: Interacts with the Google Issue Tracker API. It requires API key secrets (configured via NotifyConfig
) and uses OAuth2 for authentication. It can create new issues and update existing ones (e.g., to mark them obsolete).noop.go
: A null implementation for disabling notifications.chromeperfnotifier.go
:
Notifier
interface directly, without using the Formatter
or Transport
abstractions in the same way as defaultNotifier
. This is because it communicates directly with the Chrome Performance Dashboard's Anomaly API.ReportRegression
).isParamSetValid
, getTestPath
) to ensure the data conforms to Chromeperf's requirements (e.g., specific param keys like master
, bot
, benchmark
, test
).improvement_direction
parameter and the step direction.commitrange.go
:
URLFromCommitRange
, a utility function to generate a URL for a commit or a range of commits. If a commitRangeURLTemplate
is provided (e.g., via configuration), it will be used to create a URL showing the diff between two commits. Otherwise, it defaults to the individual commit's URL. This is used by formatters to create links in notifications.common/notificationData.go
:
NotificationData
(simple struct for body and subject) and RegressionMetadata
(a comprehensive struct holding all relevant information about a regression needed for notification generation). This promotes a common data structure for passing regression details.Configuration and Customization (NotifyConfig
):
The behavior of the notify
module is heavily influenced by config.NotifyConfig
. This configuration allows users to:
Notifications
field): None
, HTMLEmail
, MarkdownIssueTracker
, ChromeperfAlerting
, AnomalyGrouper
.NotificationDataProvider
: DefaultNotificationProvider
or AndroidNotificationProvider
.Subject
, Body
, MissingSubject
, MissingBody
). This is particularly relevant for MarkdownFormatter
and androidNotificationProvider
.IssueTrackerTransport
(API key secret locations).This design allows for flexibility in how notifications are generated and delivered, catering to different needs and integrations. For instance, the Android team can have highly customized bug reports, while other users might prefer standard email notifications. The ChromeperfNotifier
demonstrates a direct integration with another system, bypassing some of the general-purpose formatting/transport layers when a specific API is targeted.
The notifytypes
module in Perf defines the various types of notification mechanisms that can be triggered in response to performance regressions or other significant events. It also defines types for data providers that supply the necessary information for these notifications. This module serves as a central point for enumerating and categorizing notification strategies, enabling flexible and extensible notification handling within the Perf system.
The primary goal of this module is to provide a structured and type-safe way to manage notification types.
Type
string, new notification methods can be easily added in the future without requiring significant code changes in consuming modules. This promotes loose coupling and allows the notification system to evolve independently.HTMLEmail
, MarkdownIssueTracker
) instead of raw strings makes the code more self-documenting and reduces the likelihood of errors due to typos.NotificationDataProviderType
allows for different sources or formats of data to be used for generating notifications, separating the concern of what data is needed from how the notification is delivered. This is crucial, for example, when different platforms (like Android) might require specific data formatting or additional information.Type
(string alias): The Type
is defined as an alias for string
. This allows for string-based storage and transmission of notification types (e.g., in configuration files or database entries) while still providing a degree of type safety within Go code.Type
. This ensures that only valid, predefined notification types can be used.HTMLEmail
: Indicates notifications sent as HTML-formatted emails. This is suitable for rich content and direct user communication.MarkdownIssueTracker
: Represents notifications formatted in Markdown, intended for integration with issue tracking systems. This facilitates automated ticket creation or updates.ChromeperfAlerting
: Specifies that regression data should be sent to the Chromeperf alerting system. This allows for integration with a specialized alerting infrastructure.AnomalyGrouper
: Designates that regressions should be processed by an anomaly grouping logic, which then determines the appropriate action. This enables more sophisticated handling of multiple related anomalies.None
: A special type indicating that no notification should be sent. This is useful for disabling notifications in certain contexts or for configurations where alerting is not desired.AllNotifierTypes
Slice: This public variable provides a convenient way for other parts of the system to iterate over or validate against all known notification types.NotificationDataProviderType
(string alias): Similar to Type
, this defines the kind of data provider to use for notifications.DefaultNotificationProvider
: Represents the standard or default data provider.AndroidNotificationProvider
: Indicates a specialized data provider tailored for Android-specific notification requirements. This might involve fetching different metrics, formatting data in a particular way, or including Android-specific metadata.notifytypes.go
: This is the sole file in the module and contains all the definitions.HTMLEmail
, MarkdownIssueTracker
, ChromeperfAlerting
, AnomalyGrouper
, None
). This acts as a contract for other modules that implement or consume notification functionalities.DefaultNotificationProvider
, AndroidNotificationProvider
) that can be used to source information for notifications. This allows the notification system to adapt to different data sources or formats.AllNotifierTypes
variable makes it easy for other components to get a list of all valid notification types, for example, for display in a UI or for validation purposes.While this module itself doesn't implement workflows, it underpins them. A typical conceptual workflow where these types would be used is:
Regression Event
  --> Configuration Lookup (specifies notifytypes.Type, e.g., HTMLEmail)
      (Based on the notifytypes.Type from the configuration, the appropriate notifier implementation is selected.)
  --> Notification System
      (Based on the notifytypes.NotificationDataProviderType, the corresponding data provider is chosen.)
  --> Data Provider (e.g., AndroidNotificationProvider)
  --> Notification Delivered (e.g., Email Sent)
For example, if a regression is detected for an Android benchmark and the configuration specifies HTMLEmail
as the Type
and AndroidNotificationProvider
as the NotificationDataProviderType
:
Regression Event
-> Config: {Type: HTMLEmail, DataProvider: AndroidNotificationProvider}
-> Select EmailNotifier
-> Select AndroidDataProvider
-> AndroidDataProvider fetches data
-> EmailNotifier formats and sends HTML email
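A condensed sketch of the string-alias pattern the module uses is shown below. The constant names, the AllNotifierTypes slice, and the provider types come from the description above; the concrete string values and the IsValid helper are illustrative guesses, not the module's actual code.

```go
package notifytypes

// Type names the notification mechanism to use for an alert.
type Type string

const (
	HTMLEmail            Type = "html_email"
	MarkdownIssueTracker Type = "markdown_issuetracker"
	ChromeperfAlerting   Type = "chromeperf"
	AnomalyGrouper       Type = "anomaly_grouper"
	None                 Type = "none"
)

// AllNotifierTypes lets callers enumerate or validate the known types.
var AllNotifierTypes = []Type{HTMLEmail, MarkdownIssueTracker, ChromeperfAlerting, AnomalyGrouper, None}

// NotificationDataProviderType selects how notification data is gathered.
type NotificationDataProviderType string

const (
	DefaultNotificationProvider NotificationDataProviderType = "default"
	AndroidNotificationProvider NotificationDataProviderType = "android"
)

// IsValid reports whether t is one of the known notification types.
// (A hypothetical helper, shown only to illustrate how AllNotifierTypes can be used.)
func IsValid(t Type) bool {
	for _, known := range AllNotifierTypes {
		if t == known {
			return true
		}
	}
	return false
}
```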
The perf-tool
module provides a command-line interface (CLI) for interacting with various aspects of the Perf performance monitoring system. It allows developers and administrators to manage configurations, inspect data, perform database maintenance tasks, and validate ingestion files.
The primary motivation behind perf-tool
is to offer a centralized and scriptable way to perform common Perf operations that would otherwise require manual intervention or direct database interaction. This simplifies workflows and enables automation of routine tasks.
The core functionality is organized into subcommands, each addressing a specific area of Perf:
config
: Manages Perf instance configurations.create-pubsub-topics-and-subscriptions
: Sets up the necessary Google Cloud Pub/Sub topics and subscriptions required for data ingestion. This is crucial for ensuring that Perf instances can receive and process performance data.validate
: Checks the syntax and validity of a Perf instance configuration file. This helps prevent deployment of misconfigured instances.tiles
: Interacts with the tiled data storage used by Perf's tracestore
. Tiles are segments of time-series data.last
: Displays the index of the most recent tile, providing insight into the current state of data ingestion.list
: Shows a list of recent tiles and the number of traces they contain, useful for understanding data volume and distribution.traces
: Allows querying and exporting trace data.list
: Retrieves and displays the IDs of traces that match a given query within a specific tile. This is useful for ad-hoc data exploration.export
: Exports trace data matching a query and commit range to a JSON file. This enables external analysis or data migration.ingest
: Manages the data ingestion process.force-reingest
: Triggers the re-ingestion of data files from Google Cloud Storage (GCS) for a specified time range. This is useful for reprocessing data after configuration changes or to fix ingestion errors. The workflow is:validate
: Validates the format and content of an ingestion file against the expected schema and parsing rules. This helps ensure data quality before ingestion.database
: Provides tools for backing up and restoring Perf database components. This is critical for disaster recovery and data migration.backup
:alerts
: Backs up alert configurations to a zip file.shortcuts
: Backs up saved shortcut configurations to a zip file.regressions
: Backs up regression data (detected performance changes) and associated shortcuts to a zip file. It backs up data up to a specified date (defaulting to four weeks ago). The process involves iterating backward through commits in batches, fetching regressions for each commit range, and storing them along with any shortcuts referenced in those regressions.restore
:alerts
: Restores alert configurations from a backup file.shortcuts
: Restores shortcut configurations from a backup file.regressions
: Restores regression data and their associated shortcuts from a backup file. It's important to note that restoring regressions also attempts to re-create the associated shortcuts.trybot
: Contains experimental functionality related to trybot (pre-submit testing) data.reference
: Generates a synthetic nanobench reference file. This file is constructed by loading a specified trybot results file, identifying all trace IDs within it, and then fetching historical sample data for these traces from the main Perf instance (specifically, from the last N ingested files). The aggregated historical samples are then formatted into a new nanobench JSON file. This allows for comparing trybot results against a baseline derived from recent production data using tools like nanostat
.markdown
: Generates Markdown documentation for the perf-tool
CLI itself.The main.go
file sets up the CLI application using the urfave/cli
library. It defines flags, commands, and subcommands, and maps them to corresponding functions in the application
package. It handles flag parsing, configuration loading (from a file, with optional connection string overrides), and initialization of logging.
The application/application.go
file defines the Application
interface and its concrete implementation app
. This interface abstracts the core logic for each command, promoting testability and separation of concerns. The app
struct implements methods that interact with various Perf components like tracestore
, alertStore
, shortcutStore
, regressionStore
, and GCS.
Key design choices include:
Application
interface): This allows for mocking the application logic during testing (as seen in main_test.go
and application/mocks/Application.go
), ensuring that the CLI command parsing and flag handling can be tested independently of the actual backend operations.--config_filename
), which defines data store connections, GCS sources, etc. This makes the tool adaptable to different Perf deployments.perf/go/builders
are used to instantiate components like TraceStore
, AlertStore
, etc., based on the provided instance configuration. This centralizes component creation logic.encoding/gob
. This provides a simple and portable backup solution.regressionBatchSize
) to manage memory and avoid overwhelming the database.ingest force-reingest
command leverages Pub/Sub by publishing messages that mimic GCS notifications, effectively triggering the standard ingestion pipeline.The application/mocks/Application.go
file contains a mock implementation of the Application
interface, generated by the mockery
tool. This is used in main_test.go
to test the command-line argument parsing and dispatch logic without actually performing the underlying operations.
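To illustrate how a mockery-generated mock is typically used in such tests, here is a hedged sketch built around a hypothetical one-method interface; the real Application interface has many more methods, and its mock is generated rather than handwritten.

```go
package app_test

import (
	"testing"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
)

// Application is a tiny, hypothetical slice of the real interface.
type Application interface {
	ConfigValidate(filename string) error
}

// MockApplication is roughly what mockery generates for the interface above.
type MockApplication struct{ mock.Mock }

func (m *MockApplication) ConfigValidate(filename string) error {
	args := m.Called(filename)
	return args.Error(0)
}

// runConfigValidateCommand stands in for the CLI dispatch logic under test.
func runConfigValidateCommand(app Application, filename string) error {
	return app.ConfigValidate(filename)
}

func TestConfigValidateCommand_CallsApplication(t *testing.T) {
	m := &MockApplication{}
	m.On("ConfigValidate", "demo.json").Return(nil)

	require.NoError(t, runConfigValidateCommand(m, "demo.json"))
	m.AssertExpectations(t)
}
```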
The perfclient
module provides an interface for sending performance data to Skia Perf's ingestion system. The primary goal of this module is to abstract the complexities of interacting with Google Cloud Storage (GCS), which is the underlying mechanism Perf uses for data ingestion. By providing a dedicated client, it simplifies the process for other applications and services that need to report performance metrics.
The core design centers around a ClientInterface
and its concrete implementation, Client
. This approach allows for easy mocking and testing, promoting loose coupling between the perfclient
and its consumers.
Key Components and Responsibilities:
perf_client.go
:
ClientInterface
: This interface defines the contract for pushing performance data. The key method is PushToPerf
. The decision to use an interface here is crucial for testability and dependency injection. It allows consumers to use a real GCS-backed client in production and a mock client in tests.Client
: This struct is the concrete implementation of ClientInterface
. It holds a gcs.GCSClient
instance, which is responsible for the actual communication with Google Cloud Storage, and a basePath
string that specifies the root directory within the GCS bucket where performance data will be stored. The constructor New
takes these as arguments, allowing users to configure the GCS bucket and the top-level folder for their data.PushToPerf
method: This is the workhorse of the module.time.Time
object (now
), a folderName
, a filePrefix
, and a format.BenchData
struct (which represents the performance metrics).format.BenchData
is first marshaled into a JSON string. This is the standard format Perf expects for ingestion.gzip
. This is a performance optimization, as GCS can automatically decompress gzipped files with the correct ContentEncoding
header, reducing storage costs and transfer times.objectPath
helper function. This path incorporates the basePath
, the current timestamp (formatted as YYYY/MM/DD/HH/
), the folderName
, and a filename composed of the filePrefix
, an MD5 hash of the JSON data, and a millisecond-precision timestamp. The inclusion of the MD5 hash helps in avoiding duplicate uploads of identical data and can be useful for debugging or data verification. The timestamp in the path and filename ensures that data from different runs or times are stored separately and can be easily queried.storageClient.SetFileContents
method. Crucially, it sets ContentEncoding: "gzip"
and ContentType: "application/json"
in the gcs.FileWriteOptions
. This metadata informs GCS about the compression and data type, enabling features like automatic decompression.objectPath
function: This helper function is responsible for constructing the unique GCS path for each performance data file. The rationale for this specific path structure (basePath/YYYY/MM/DD/HH/folderName/filePrefix_hash_timestamp.json
) is to organize data chronologically and by task, making it easier to browse, query, and manage within GCS. The hash ensures uniqueness and integrity.mock_perf_client.go
:
MockPerfClient
: This provides a mock implementation of ClientInterface
using the testify/mock
library. This is essential for unit testing components that depend on perfclient
without requiring actual GCS interaction. It allows developers to define expected calls to PushToPerf
and verify that their code interacts with the client correctly. The NewMockPerfClient
constructor returns a pointer to ensure that the methods provided by mock.Mock
(like On
and AssertExpectations
) are accessible.Workflow: Pushing Performance Data
The primary workflow involves a client application using perfclient
to send performance data:
Client App                       perfclient.Client                        gcs.GCSClient
    |                                   |                                       |
    | -- PushToPerf(now, folder,        |                                       |
    |    prefix, data) --------------->  |                                       |
    |                                   | -- Marshal data to JSON               |
    |                                   | -- Compress JSON (gzip)               |
    |                                   | -- Construct GCS objectPath           |
    |                                   |    (includes time, folder,            |
    |                                   |     prefix, data hash)                |
    |                                   | -- SetFileContents(path, options,     |
    |                                   |    compressed_data) ----------------> |
    |                                   |                                       | -- Upload to GCS with gzip
    |                                   |                                       |    encoding and JSON content type
    |                                   | <------------------------------------ | -- Return success/error
    | <--------------------------------- |                                       |
    | -- Receive success/error           |                                       |
The design emphasizes creating a clear separation of concerns: the perfclient
handles the formatting, compression, and path generation logic specific to Perf's ingestion requirements, while the underlying gcs.GCSClient
handles the raw GCS communication. This makes the perfclient
a focused and reusable component for any system needing to integrate with Skia Perf.
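A rough, self-contained sketch of the path construction and gzip step described above follows. The path layout mirrors the document's description, but the exact helper names, separators, and upload options of the real client are not reproduced here.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/md5"
	"encoding/json"
	"fmt"
	"path"
	"time"
)

// objectPath builds a GCS object name of the approximate form
// basePath/YYYY/MM/DD/HH/folderName/filePrefix_<md5>_<millis>.json.
func objectPath(basePath string, now time.Time, folderName, filePrefix string, jsonBody []byte) string {
	hash := fmt.Sprintf("%x", md5.Sum(jsonBody))
	timePath := now.UTC().Format("2006/01/02/15")
	name := fmt.Sprintf("%s_%s_%d.json", filePrefix, hash, now.UnixMilli())
	return path.Join(basePath, timePath, folderName, name)
}

// gzipBytes compresses the JSON payload so it could be uploaded with
// ContentEncoding: "gzip" and ContentType: "application/json".
func gzipBytes(b []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(b); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	bench := map[string]any{"results": map[string]float64{"draw_a_circle": 1.23}}
	body, _ := json.Marshal(bench)
	compressed, _ := gzipBytes(body)
	p := objectPath("task-perf-data", time.Now(), "nightly", "nanobench", body)
	fmt.Printf("would upload %d gzipped bytes to gs://<bucket>/%s\n", len(compressed), p)
}
```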
The perfresults
module is responsible for fetching, parsing, and processing performance results data generated by Telemetry-based benchmarks in the Chromium project. This data typically resides in perf_results.json
files. The module provides functionalities to:
perf_results.json
files.perf_results.json
files. These files contain sets of histograms, where each histogram represents a specific benchmark measurement. The parser extracts these histograms and associated metadata.The primary goal is to provide a reliable and efficient way to access and utilize Chromium's performance data for analysis and monitoring.
The process of loading performance results from a Buildbucket build involves several steps:
Buildbucket ID -> BuildInfo -> Swarming Task ID -> Child Swarming Task IDs -> CAS Outputs -> PerfResults
Buildbucket Interaction (buildbucket.go
):
bbClient
interacts with the Buildbucket PRPC API to fetch build details using a given buildID
. It specifically requests fields like builder
, status
, infra.backend.task.id
(for the Swarming task ID), output.properties
(for git revision information), and input.properties
(for perf_dashboard_machine_group
).BuildInfo
struct is populated with this information, providing a consolidated view of the build's context. The GetPosition()
method on BuildInfo
is crucial as it determines the commit identifier (either commit position or git hash) used for associating the performance data with a specific point in the codebase.Swarming Interaction (swarming.go
):
swarmingClient
uses the Swarming PRPC API.findChildTaskIds
: Given a parent Swarming task ID (obtained from BuildInfo
), this function lists all child tasks by querying for tasks with a matching parent_task_id
tag. The query is scoped by the parent task's creation and completion timestamps to narrow down the search.findTaskCASOutputs
: For each child task ID, this function retrieves the task result, specifically looking for the CasOutputRoot
. This reference points to the RBE-CAS location where the task's output files (including perf_results.json
) are stored.RBE-CAS Interaction (rbecas.go
):
perf_results.json
files are stored in RBE-CAS. RBE-CAS provides efficient and reliable storage for large build artifacts.RBEPerfLoader
uses the RBE SDK to interact with CAS.fetchPerfDigests
: Given a CAS reference (pointing to the root directory of a task's output), this function:Directory
proto.GetDirectoryTree
.perf_results.json
. The path structure is expected to be benchmark_name/perf_results.json
, allowing association of results with a specific benchmark.loadPerfResult
: Given a digest for a perf_results.json
file, this reads the blob from CAS and parses it using NewResults
.LoadPerfResults
: This orchestrates the loading for multiple CAS references (from multiple child Swarming tasks). It iterates through each CAS reference, fetches the digests of perf_results.json
files, loads each file, and then merges results from the same benchmark. Merging is important because a single benchmark might have its results split across multiple files or tasks.Orchestration (perf_loader.go
):
loader.LoadPerfResults
method coordinates the entire workflow:bbClient
to get BuildInfo
.swarmingClient
to find child task IDs and then their CAS outputs.checkCasInstances
) to ensure all CAS outputs come from the same RBE instance, simplifying client initialization.RBEPerfLoader
(via rbeProvider
for testability) for the determined CAS instance.RBEPerfLoader.LoadPerfResults
with the list of CAS references to fetch and parse all perf_results.json
files.rbeProvider
is a good example of dependency injection, allowing tests to mock the RBE-CAS interaction.perf_results_parser.go
)perf_results.json
files have a specific, somewhat complex structure. A dedicated parser is needed to extract meaningful data (histograms and their metadata).PerfResults
struct is the main container, holding a map of TraceKey
to Histogram
.TraceKey
uniquely identifies a trace, composed of ChartName
(metric name), Unit
, Story
(user journey/test case), Architecture
, and OSName
. These fields are extracted from the histogram's own properties and its associated “diagnostics” which are references to other metadata objects within the JSON file.Histogram
stores the SampleValues
(the actual measurements).NewResults
uses json.NewDecoder
to process the input io.Reader
in a streaming fashion.perf_results.json
files can be very large (10MB+). Reading the entire file into memory before parsing would be inefficient and could lead to high memory usage. Streaming allows processing the JSON array element by element.[
of the JSON array.decoder.More()
is true, decoding each element into a singleEntry
struct.singleEntry
is a union-like struct that can hold different types of objects found in the JSON (histograms, generic sets, date ranges, related name maps). This is determined by checking fields like Name
(present for histograms) or Type
.entry.Name != ""
), it’s converted to TraceKey
and Histogram
via histogramRaw.asTraceKeyAndHistogram
. This conversion involves looking up GUIDs from the histogram’s Diagnostics
map in a locally maintained metadata
map (md
).GenericSet
, DateRange
, RelatedNameMap
) are stored in the md
map, keyed by their GUID
, so they can be referenced by histograms later in the stream.pr.Histograms
. If a TraceKey
already exists, sample values are appended.]
of the JSON array.Histogram
type provides methods for common aggregations (Min, Max, Mean, Stddev, Sum, Count). AggregationMapping
provides a convenient way to access these aggregation functions by string keys, which is used by downstream consumers like the ingestion module.UnmarshalJSON
: An UnmarshalJSON
method exists, which reads the entire byte slice into memory. This is less efficient and marked for deprecation in favor of NewResults
.ingest/
)This submodule focuses on transforming the parsed PerfResults
into the format.Format
structure required by the Perf ingestion system.
json.go
(ConvertPerfResultsFormat
):
PerfResults
structure is not directly ingestible. It needs to be reshaped.(TraceKey, Histogram)
pair in the input PerfResults
.format.Result
. The Key
map within format.Result
is populated from TraceKey
fields (chart, unit, story, arch, os).Measurements
map within format.Result
is populated by calling toMeasurement
on the Histogram
.toMeasurement
iterates through perfresults.AggregationMapping
, applying each aggregation function to the histogram's samples. Each resulting aggregation (e.g., “max”, “mean”) becomes a format.SingleMeasurement
with the aggregation type as its Value
and the computed metric as its Measurement
.format.Format
object includes the version, commit hash (GitHash
), and any provided headers and links.gcs.go
:
convertPath
: Constructs a GCS path like gs://<bucket>/ingest/<time_path>/<build_info_path>/<benchmark>
.convertTime
: Formats a time.Time
into YYYY/MM/DD/HH
(UTC).convertBuildInfo
: Formats BuildInfo
into <MachineGroup>/<BuilderName>
. It defaults MachineGroup
to “ChromiumPerf” and BuilderName
to “BuilderNone” if they are empty.isInternal
: Determines if the results are internal or public based on the BuilderName
. It checks against a list of known external bot configurations (pinpoint/go/bot_configs
). If not found, it defaults to internal. This determines whether PublicBucket
(chrome-perf-public
) or InternalBucket
(chrome-perf-non-public
) is used.perf_loader.go
: Orchestrates the loading of performance results from Buildbucket. NewLoader().LoadPerfResults()
is the main entry point.buildbucket.go
: Handles interaction with the Buildbucket API to fetch build metadata. Defines BuildInfo
.swarming.go
: Handles interaction with the Swarming API to find child tasks and their CAS outputs.rbecas.go
: Handles interaction with RBE-CAS to download and parse perf_results.json
files. Defines RBEPerfLoader
.perf_results_parser.go
: Parses the content of perf_results.json
files. Defines PerfResults
, TraceKey
, Histogram
, and the streaming NewResults
parser.ingest/json.go
: Transforms parsed PerfResults
into the format.Format
structure for ingestion.ingest/gcs.go
: Provides utilities to determine GCS paths for storing transformed results.cli/main.go
: A command-line interface utility that uses the perfresults
library to fetch results for a given Buildbucket ID and outputs them as JSON files in the ingestion format. This serves as a practical example and a tool for ad-hoc data retrieval.testdata/
: Contains JSON files used for replaying HTTP and gRPC interactions during tests (*.json
, *.rpc
), and sample perf_results.json
files for parser testing. replay_test.go
sets up the replay mechanism.

User/System --Buildbucket ID--> perf_loader.LoadPerfResults()
  |
  +--> buildbucket.findBuildInfo() --PRPC call--> Buildbucket API
  |      (Returns BuildInfo: Swarming Task ID, Git Revision, Machine Group, etc.)
  |
  +--> swarming.findChildTaskIds() --PRPC call--> Swarming API (using Parent Task ID)
  |      (Returns list of Child Swarming Task IDs)
  |
  +--> swarming.findTaskCASOutputs() --PRPC calls--> Swarming API (for each Child Task ID)
  |      (Returns list of CASReference objects)
  |      (Error if CAS instances differ for CASReferences)
  |
  +--> rbecas.RBEPerfLoader.LoadPerfResults() (with list of CASReferences)
         |
         +--> For each CASReference:
         |      |
         |      +--> rbecas.fetchPerfDigests() --RBE SDK calls--> RBE-CAS
         |      |      (Returns map of benchmark_name to digest of perf_results.json)
         |      |
         |      +--> For each (benchmark_name, digest):
         |             |
         |             +--> rbecas.loadPerfResult() --RBE SDK call (ReadBlob)--> RBE-CAS
         |             |      |
         |             |      +--> perf_results_parser.NewResults() (Parses JSON stream)
         |             |             (Returns PerfResults object for this file)
         |             |
         |             +--> (Merge with existing PerfResults for the same benchmark_name)
         |
         (Returns map[benchmark_name]*PerfResults and BuildInfo)
CLI User --Build ID, Output Dir--> cli/main.main()
  |
  +--> perfresults.NewLoader().LoadPerfResults(Build ID)
  |      (Executes the Primary Workflow described above)
  |      (Returns BuildInfo, map[benchmark]*PerfResults)
  |
  +--> For each (benchmark, perfResult) in results:
         |
         +--> ingest.ConvertPerfResultsFormat(perfResult, buildInfo.GetPosition(), headers, links)
         |      (Transforms PerfResults to ingest.Format)
         |
         +--> Marshal ingest.Format to JSON
         |
         +--> Write JSON to output file: <outputDir>/<benchmark>_<BuildID>.json
         |
         +--> Print output filename to stdout
The workflows/worker/main.go
file sets up a Temporal worker. Currently, it’s a basic skeleton that initializes a worker and connects to a Temporal server. It doesn’t register any specific activities or workflows from the perfresults
module itself. Its presence suggests an intention to integrate perfresults
functionalities into Temporal workflows in the future, possibly for automated ingestion or processing tasks. The worker itself is a generic Temporal worker setup.
The module employs a robust testing strategy:
_test.go
file with unit tests for its specific logic. For example, perf_results_parser_test.go
tests the JSON parsing, and buildbucket_test.go
tests BuildInfo
logic.replay_test.go
, testdata/
):cloud.google.com/go/httpreplay
. Recorded interactions are stored as .json
files in testdata/
.cloud.google.com/go/rpcreplay
. Recorded interactions are stored as gzipped .rpc
files in testdata/
.-record_path
) controls whether tests run in replay mode (reading from testdata/
) or record mode (writing new replay files to the specified path). This allows updating replay files when external APIs change or new test cases are needed.setupReplay()
and newRBEReplay()
in replay_test.go
are helper functions that configure the HTTP client and RBE client for either recording or replaying.testdata/perftest/
): Contains various perf_results.json
files (e.g., full.json
, empty.json
, merged.json
) to test different scenarios for the perf_results_parser.go
. This ensures the parser correctly handles different valid and edge-case inputs.cli/main.go
): The CLI itself serves as an integration test for the core loading and conversion logic. Its tests (perf_loader_test.go
for example) often use the replay mechanism to test the end-to-end flow from Build ID to parsed PerfResults
.This combination ensures both isolated unit correctness and reliable integration testing without external dependencies during typical test runs.
The perfserver
module serves as the central executable for the Perf performance monitoring system. It consolidates various essential components into a single command-line tool, simplifying deployment and management. The primary goal is to provide a unified entry point for running the web UI, data ingestion processes, regression detection, and maintenance tasks. This approach avoids the complexity of managing multiple separate services and their configurations.
The module leverages the urfave/cli
library to define and manage sub-commands, each corresponding to a distinct functional area of Perf. This design allows for clear separation of concerns while maintaining a single binary. Configuration for each sub-command is handled through flags, with the config
package providing structured types for these flags.
Key components and their responsibilities:
main.go
: This is the entry point of the perfserver
executable.
Why: It orchestrates the initialization and execution of the different Perf sub-systems.
How: It defines a cli.App
with several sub-commands:
frontend
: This sub-command launches the main web user interface for Perf.
frontend
component (from //perf/go/frontend
). Configuration is passed via config.FrontendFlags
. The frontend
component itself handles serving HTTP requests and rendering the UI.maintenance
: This sub-command starts background maintenance tasks.
maintenance
component (from //perf/go/maintenance
). It first validates the instance configuration (using //perf/go/config/validate
) and then starts the maintenance routines. Prometheus metrics are exposed for monitoring.ingest
: This sub-command runs the data ingestion process.
- **Why**: To continuously import performance data from various sources (e.g., build artifacts, test results) and populate the central data store (TraceStore).
- **How**: It initializes and runs the ingestion process logic (from `//perf/go/ingest/process`). Similar to `maintenance`, it validates the instance configuration. It supports parallel ingestion for improved throughput. Prometheus metrics are also exposed.
- Data Ingestion Workflow:
  `Configured Sources --> [Ingest Process] --Parses/Validates--> [TraceStore]`
  (the ingest process handles incoming files and populates the trace data)
cluster
: This sub-command runs the regression detection process.
- **Why**: To automatically analyze incoming performance data against configured alerts and identify significant performance regressions.
- **How**: Interestingly, this sub-command also utilizes the `frontend.New` and `f.Serve()` mechanism, similar to the `frontend` sub-command. This suggests that the regression detection logic might be tightly coupled with, or exposed through, the same underlying service framework as the main UI, potentially for sharing configuration or common infrastructure. It uses `config.FrontendFlags`, but specifically for clustering-related settings (indicated by `AsCliFlags(true)`).
- Regression Detection Workflow:
  `[TraceStore] --New Data--> [Cluster Process] --Applies Alert Rules--> [Alerts/Notifications]`
  (the cluster process identifies regressions, feeding results back into the alerting loop)
markdown
: A utility sub-command to generate Markdown documentation for perfserver
itself.
ToMarkdown()
method provided by the urfave/cli
library.Logging: The Before
hook in the cli.App
configures sklog
to output logs to standard output, ensuring that operational messages from any sub-command are visible.
Configuration Loading: For sub-commands like ingest
and maintenance
, instance configuration is loaded from a specified file (ConfigFilename
flag) and validated using //perf/go/config/validate
. The database connection string can be overridden via a command-line flag.
Metrics: The ingest
and maintenance
sub-commands initialize Prometheus metrics, allowing for monitoring of their operational health and performance.
The design emphasizes modularity by delegating the core logic of each function (UI, ingestion, clustering, maintenance) to dedicated packages (//perf/go/frontend
, //perf/go/ingest/process
, //perf/go/maintenance
). perfserver
acts as the conductor, parsing command-line arguments, loading appropriate configurations, and invoking the correct sub-system. This structure makes the overall Perf system more maintainable and easier to understand, as each component has a well-defined responsibility.
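A minimal sketch of that sub-command layout is shown below, assuming the v2 API of urfave/cli; the real main.go wires each Action to the corresponding Perf package, defines many more flags, and configures sklog in the Before hook.

```go
package main

import (
	"fmt"
	"os"

	cli "github.com/urfave/cli/v2"
)

func main() {
	app := &cli.App{
		Name:  "perfserver",
		Usage: "Run one of the Perf services.",
		Before: func(c *cli.Context) error {
			// The real app points sklog at stdout here.
			return nil
		},
		Commands: []*cli.Command{
			{
				Name:  "frontend",
				Usage: "Run the web UI.",
				Action: func(c *cli.Context) error {
					fmt.Println("would call frontend.New(...).Serve()")
					return nil
				},
			},
			{
				Name:  "ingest",
				Usage: "Run the data ingestion process.",
				Action: func(c *cli.Context) error {
					fmt.Println("would validate the instance config and start ingestion")
					return nil
				},
			},
			{
				Name:  "maintenance",
				Usage: "Run background maintenance tasks.",
				Action: func(c *cli.Context) error {
					fmt.Println("would call maintenance.Start(...)")
					return nil
				},
			},
		},
	}
	if err := app.Run(os.Args); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```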
The /go/pinpoint
module provides a Go client for interacting with the Pinpoint service, which is part of Chromeperf. Pinpoint is a performance testing and analysis tool used to identify performance regressions and improvements. This client enables other Go applications within the Skia infrastructure to programmatically trigger Pinpoint jobs.
Core Functionality:
The primary purpose of this module is to abstract the complexities of making HTTP requests to the Pinpoint API. It handles authentication, request formatting, and response parsing. This allows other services to easily initiate two main types of Pinpoint jobs:
pinpointURL
endpoint.pinpointLegacyURL
for these types of jobs.Design Decisions and Implementation Choices:
pinpointURL
) and legacy try jobs (pinpointLegacyURL
). The client reflects this by having separate methods (CreateBisect
and CreateTryJob
) and corresponding request URL builder functions (buildBisectRequestURL
and buildTryJobRequestURL
). This design choice directly maps to the underlying Pinpoint API structure, making it clear which type of job is being created.buildBisectRequestURL
and buildTryJobRequestURL
functions are responsible for constructing these URLs by populating url.Values
and then encoding them. This is a direct consequence of how the Pinpoint API is designed.google.DefaultTokenSource
) with the auth.ScopeUserinfoEmail
scope. This is a standard approach for service-to-service authentication within the Google Cloud ecosystem, ensuring secure communication with the Pinpoint API.go/metrics2
to track the number of times bisect and try jobs are called and the number of times these calls fail. This is crucial for monitoring the reliability and usage of the Pinpoint integration.go/skerr
for wrapping errors. This provides more context to errors, making debugging easier. For example, if a Pinpoint request fails, the HTTP status code and response body are included in the error message.pinpoint/go/bot_configs
: For try jobs, the target
parameter is required by the Pinpoint API. This target
is derived from the Configuration
(bot) and Benchmark
using the bot_configs.GetIsolateTarget
function. This indicates a specific configuration setup for running the performance tests.test_path
Parameter for Bisect Jobs: The Pinpoint API requires a test_path
parameter for bisect jobs. This parameter is constructed by joining several components like “ChromiumPerf”, configuration, benchmark, chart, and story. This specific formatting is a legacy requirement of the Chromeperf API.bug_id
for Bisect Jobs: The Pinpoint API mandates the bug_id
parameter for bisect jobs. If not provided by the caller, the client defaults it to "null"
. This reflects a specific constraint of the upstream service.tags
Parameter: Both job types include a tags
parameter set to {"origin":"skia_perf"}
. This helps in tracking and filtering jobs originating from the Skia infrastructure within the Pinpoint system.Key Components/Files:
pinpoint.go
: This is the sole Go file in the module and contains all the logic.Client
struct: Represents the Pinpoint client. It holds the authenticated http.Client
and counters for metrics.New()
function: The constructor for the Client
. It initializes the HTTP client with appropriate authentication.CreateLegacyTryRequest
and CreateBisectRequest
structs: Define the structure of the data required to create try jobs and bisect jobs, respectively. These fields directly map to the parameters expected by the Pinpoint API.CreatePinpointResponse
struct: Defines the structure of the JSON response from Pinpoint, which includes the JobID
and JobURL
.CreateTryJob()
method:CreateLegacyTryRequest
and a context.Context
.buildTryJobRequestURL
to construct the request URL.pinpointLegacyURL
.CreatePinpointResponse
.CreateBisect()
method:CreateTryJob()
, but takes a CreateBisectRequest
.buildBisectRequestURL
.pinpointURL
.buildTryJobRequestURL()
function:CreateLegacyTryRequest
.Benchmark
and Configuration
.target
using bot_configs.GetIsolateTarget
.url.Values
with all relevant parameters from the request, including hardcoded values like comparison_mode
and tags
.buildBisectRequestURL()
function:CreateBisectRequest
.url.Values
with parameters from the request.bug_id
if not provided.test_path
parameter based on available request fields.tags
parameter.Key Workflows:
Creating a Bisect Job:
Application Code                   go/pinpoint.Client                            Pinpoint API
----------------                   ------------------                            ------------
1. CreateBisectRequest data -----> 2. Calls client.CreateBisect()
                                   3. buildBisectRequestURL()
                                      (constructs URL with params)
                                   4. HTTP POST to pinpointURL ----------------> 5. Processes request
                                   7. Receives HTTP response <------------------ 6. Returns JSON response
                                   8. Parses JSON into CreatePinpointResponse
          <----------------------- 9. Returns CreatePinpointResponse
Creating a Try Job (A/B Test):
Application Code                   go/pinpoint.Client                            Pinpoint API (Legacy)
----------------                   ------------------                            ---------------------
1. CreateLegacyTryRequest data --> 2. Calls client.CreateTryJob()
                                   3. buildTryJobRequestURL()
                                      (gets 'target' from bot_configs,
                                       constructs URL with params)
                                   4. HTTP POST to pinpointLegacyURL ----------> 5. Processes request
                                   7. Receives HTTP response <------------------ 6. Returns JSON response
                                   8. Parses JSON into CreatePinpointResponse
          <----------------------- 9. Returns CreatePinpointResponse
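A hedged sketch of the request-building step follows, using only standard-library pieces. Only the parameters quoted in the text above (test_path, bug_id, tags) appear; the "/"-joined test_path, the endpoint URL, and the helper name are illustrative assumptions rather than the client's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildBisectValues sketches how a buildBisectRequestURL-style function might
// populate url.Values; the real function derives many more fields from a
// CreateBisectRequest.
func buildBisectValues(configuration, benchmark, chart, story, bugID string) url.Values {
	if bugID == "" {
		bugID = "null" // the API requires bug_id, so the client defaults it
	}
	v := url.Values{}
	// Joining with "/" is an assumption about the test_path layout.
	v.Set("test_path", strings.Join([]string{"ChromiumPerf", configuration, benchmark, chart, story}, "/"))
	v.Set("bug_id", bugID)
	v.Set("tags", `{"origin":"skia_perf"}`)
	return v
}

func main() {
	v := buildBisectValues("linux-perf", "speedometer2", "RunsPerMinute", "Speedometer2", "")
	// The client then POSTs the encoded values to the (hypothetical) endpoint.
	fmt.Println("https://pinpoint.example.com/api/new?" + v.Encode())
}
```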
The pivot
module provides functionality analogous to pivot tables in spreadsheets or GROUP BY
operations in SQL. Its primary purpose is to aggregate and summarize trace data within a DataFrame
based on specified grouping criteria and operations. This allows users to transform raw trace data into more insightful, summarized views, facilitating comparisons and analysis across different dimensions of the data. For example, one might want to compare the performance of ‘arm’ architecture machines against ‘intel’ architecture machines by summing or averaging their respective performance metrics.
The core of the pivot
module revolves around the Request
struct and the Pivot
function.
Request
Struct:
The Request
struct encapsulates the parameters for a pivot operation. It defines:
GroupBy
: A slice of strings representing the parameter keys to group the traces by. This is the fundamental dimension along which the data will be aggregated. For instance, if GroupBy
is ["arch"]
, all traces with the same ‘arch’ value will be grouped together.Operation
: An Operation
type (e.g., Sum
, Avg
, Geo
) that specifies how the values within each group of traces should be combined. This operation is applied to each point in the traces within a group, resulting in a new, summarized trace for that group.Summary
: An optional slice of Operation
types. If provided, these operations are applied to the resulting traces from the GroupBy
step. Each Summary
operation generates a single value (a column in the final output if viewed as a table) for each grouped trace. If Summary
is empty, the output is a DataFrame
where each row is a summarized trace (suitable for plotting).Pivot
Function Workflow:
The Pivot
function executes the aggregation and summarization process. Here's a breakdown of its key steps and the reasoning behind them:
Input Validation (req.Valid()
):
GroupBy
keys or invalid Operation
or Summary
values.GroupBy
is non-empty and if the specified Operation
and Summary
operations are among the predefined valid operations (AllOperations
).Initialization and Grouping Structure (groupedTraceSets
):
types.TraceSet
containing traces belonging to that group.groupedTraceSets
by determining all possible unique combinations of values for the GroupBy
keys present in the input DataFrame
's ParamSet
. This is done using df.ParamSet.CartesianProduct(req.GroupBy)
. This pre-population ensures that even groups with no matching traces are considered, although they will be filtered out later if they remain empty.DataFrame
(df.TraceSet
).req.GroupBy
to form a groupKey
using groupKeyFromTraceKey
. This function ensures that only traces containing all the GroupBy
keys contribute to a group. If a trace is missing a GroupBy
key, it's ignored.types.TraceSet
associated with its groupKey
in groupedTraceSets
Input DataFrame (df.TraceSet)
  |
  v
For each traceID, trace in df.TraceSet:
    Parse traceID into params
    groupKey = groupKeyFromTraceKey(params, req.GroupBy)
    If groupKey is valid:
        Add trace to groupedTraceSets[groupKey]
  |
  v
Grouped Traces (groupedTraceSets)
Applying the GroupBy Operation:
req.Operation
.groupedTraceSets
.groupByOperation
function corresponding to req.Operation
(obtained from opMap
) to the types.TraceSet
of that group. The opMap
is a crucial design choice, mapping Operation
constants to their respective implementation functions (one for grouping traces, another for summarizing single traces). This provides a clean and extensible way to manage different aggregation functions.ret.TraceSet
of the new DataFrame
.ctx.Err()
) is checked periodically to allow for early termination if the operation is cancelled.Grouped Traces (groupedTraceSets) | v For each groupID, traces in groupedTraceSets: If len(traces) > 0: summarizedTrace = opMap[req.Operation].groupByOperation(traces) ret.TraceSet[groupID] = summarizedTrace | v DataFrame with GroupBy Applied (ret)
Building ParamSet for the Result:
DataFrame
needs its own ParamSet
reflecting the new structure where trace keys only contain the GroupBy
parameters.ret.BuildParamSet()
is called.Applying Summary Operations (Optional):
req.Summary
is specified. This is useful for generating tabular summaries rather than plots.req.Summary
is empty, the original DataFrame
's Header
is used for the new DataFrame
, and the function returns. The result is a DataFrame
of summarized traces.req.Summary
is not empty:ret.TraceSet
.types.Trace
(called summaryValues
) whose length is equal to the number of Summary
operations.Operation
in req.Summary
, it applies the corresponding summaryOperation
function (from opMap
) to the current grouped trace. The result is stored in summaryValues
.ret.TraceSet[groupKey]
is replaced with summaryValues
.Header
of the ret
DataFrame
is rebuilt. Each column in the header now corresponds to one of the Summary
operations, with offsets from 0 to len(req.Summary) - 1
DataFrame with GroupBy Applied (ret)
  |
  v
If len(req.Summary) > 0:
    For each groupKey, trace in ret.TraceSet:
        summaryValues = new Trace of length len(req.Summary)
        For i, op in enumerate(req.Summary):
            summaryValues[i] = opMap[op].summaryOperation(trace)
        ret.TraceSet[groupKey] = summaryValues
    Adjust ret.Header to match Summary operations
  |
  v
Final Pivoted DataFrame (ret)
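The following compact sketch runs the same two-stage flow on toy data, using the structured trace-key format described elsewhere in this document. It is not the module's implementation: the real code operates on DataFrame/TraceSet types and dispatches operations through opMap, whereas here a Sum group-by and an Avg summary are hard-coded.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseTraceKey turns ",arch=x86,config=8888," into a param map.
func parseTraceKey(key string) map[string]string {
	params := map[string]string{}
	for _, part := range strings.Split(strings.Trim(key, ","), ",") {
		if kv := strings.SplitN(part, "=", 2); len(kv) == 2 {
			params[kv[0]] = kv[1]
		}
	}
	return params
}

// groupKey builds the group identifier from the GroupBy keys; traces missing
// any GroupBy key are skipped, matching the behavior described above.
func groupKey(params map[string]string, groupBy []string) (string, bool) {
	parts := make([]string, 0, len(groupBy))
	for _, k := range groupBy {
		v, ok := params[k]
		if !ok {
			return "", false
		}
		parts = append(parts, k+"="+v)
	}
	return "," + strings.Join(parts, ",") + ",", true
}

func main() {
	traces := map[string][]float32{
		",arch=arm,test=draw_a_circle,":   {1, 2, 3},
		",arch=arm,test=draw_a_square,":   {3, 4, 5},
		",arch=intel,test=draw_a_circle,": {2, 2, 2},
	}
	groupBy := []string{"arch"}

	// GroupBy step with a Sum operation: element-wise sum per group.
	grouped := map[string][]float32{}
	for id, trace := range traces {
		gk, ok := groupKey(parseTraceKey(id), groupBy)
		if !ok {
			continue
		}
		if _, exists := grouped[gk]; !exists {
			grouped[gk] = make([]float32, len(trace))
		}
		for i, v := range trace {
			grouped[gk][i] += v
		}
	}

	// Summary step with an Avg operation: one value per grouped trace.
	keys := make([]string, 0, len(grouped))
	for k := range grouped {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		var sum float32
		for _, v := range grouped[k] {
			sum += v
		}
		fmt.Printf("%s avg=%.2f\n", k, sum/float32(len(grouped[k])))
	}
}
```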
Operations (Operation type and opMap):
The module defines a set of standard operations: Sum, Avg, Geo, Std, Count, Min, and Max. Operation is a string constant. opMap is a map where each Operation key maps to an operationFunctions struct. This struct holds two function pointers:
groupByOperation: Takes a types.TraceSet (a group of traces) and returns a single aggregated types.Trace. These functions are typically sourced from the go/calc module.
summaryOperation: Takes a single []float32 (a trace) and returns a single float32 summary value. These functions are typically sourced from go/vec32 or defined locally (like stdDev).
Adding a new operation amounts to defining a new Operation constant and populating opMap with the appropriate implementation functions. A minimal sketch of this dispatch pattern follows.
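The following is an illustrative, self-contained version of such a dispatch table in Go. It is not the module's code: the Sum and Avg implementations here stand in for the functions the real opMap draws from go/calc and go/vec32.

package main

import "fmt"

// Operation names an aggregation; in the real module this is a string constant type.
type Operation string

const (
	OpSum Operation = "sum"
	OpAvg Operation = "avg"
)

// operationFunctions pairs the two flavors of an operation: one collapses a
// group of traces into a single trace (column-wise), the other collapses a
// single trace into a single summary value.
type operationFunctions struct {
	groupByOperation func(traces [][]float32) []float32
	summaryOperation func(trace []float32) float32
}

// sumTraces assumes a non-empty group of equal-length traces.
func sumTraces(traces [][]float32) []float32 {
	out := make([]float32, len(traces[0]))
	for _, tr := range traces {
		for i, v := range tr {
			out[i] += v
		}
	}
	return out
}

func sumTrace(trace []float32) float32 {
	var s float32
	for _, v := range trace {
		s += v
	}
	return s
}

var opMap = map[Operation]operationFunctions{
	OpSum: {groupByOperation: sumTraces, summaryOperation: sumTrace},
	OpAvg: {
		groupByOperation: func(traces [][]float32) []float32 {
			out := sumTraces(traces)
			for i := range out {
				out[i] /= float32(len(traces))
			}
			return out
		},
		summaryOperation: func(trace []float32) float32 {
			return sumTrace(trace) / float32(len(trace))
		},
	},
}

func main() {
	group := [][]float32{{1, 2, 3}, {3, 4, 5}}
	fmt.Println(opMap[OpSum].groupByOperation(group))    // [4 6 8]
	fmt.Println(opMap[OpAvg].summaryOperation(group[0])) // 2
}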
Error Handling:
The Pivot function returns an error if req.Valid() fails or if an error occurs during grouping (e.g., a GroupBy key is not found in the ParamSet of the input DataFrame). Context cancellation is also handled, allowing long-running pivot operations to be interrupted. Errors are wrapped using skerr.Wrap to provide context.
pivot.go: This is the main file containing all the logic for the pivot functionality.
Request struct: Defines the parameters for a pivot operation. Its design allows for flexible grouping and summarization.
Operation type and constants: Define the set of available aggregation operations.
opMap variable: A critical data structure mapping Operation types to their respective implementation functions for both grouping and summarizing. This is the heart of how different operations are dispatched.
Pivot function: The primary public function that performs the pivot operation. Its step-by-step process of grouping, applying the main operation, and then optionally applying summary operations is central to its functionality.
groupKeyFromTraceKey function: A helper function responsible for constructing the group identifier for each trace based on the GroupBy keys. It handles cases where a trace might not have all the required keys.
Valid() method on Request: Ensures that the pivot request is well-formed before processing begins.
pivot_test.go
: Contains unit tests for the pivot module. It uses the
testify
assertion library and defines test cases that cover different aspects of the Request
validation, groupKeyFromTraceKey
logic, and the Pivot
function itself with various combinations of Operation
and Summary
settings. The dataframeForTesting()
helper function provides a consistent dataset for testing.This module is designed to be a general-purpose tool for transforming and understanding large datasets of traces by allowing users to aggregate data along arbitrary dimensions and apply various statistical operations.
The /go/progress
module provides a mechanism for tracking the progress of long-running tasks on the backend and exposing this information to the UI. This is crucial for user experience in applications where operations like data queries or complex computations can take a significant amount of time. Without progress tracking, users might perceive the application as unresponsive or encounter timeouts.
Many backend operations, such as those initiated by API endpoints like /frame/start
or /dryrun/start
, are asynchronous. The initial HTTP request might return quickly, but the actual work continues in the background. This module addresses the need to track the status of that background work, expose intermediate messages and partial results, and report completion or failure back to the UI.
The core idea is to represent the state of a long-running task as a Progress
object. This object can be updated by the task as it executes. A Tracker
then manages multiple Progress
objects, making them accessible via HTTP polling.
Key Components:
progress.go: Defines the Progress interface and its concrete implementation progress.
Progress interface: This is the central abstraction for a single long-running task.
Message(key, value string): Allows the task to report arbitrary key-value string pairs. This is flexible enough to accommodate diverse progress information (e.g., current step, commit being processed, number of items filtered). If a key already exists, its value is updated.
Results(interface{}): Stores intermediate or final results of the task. The interface{} type allows any JSON-serializable data to be stored. This is useful for showing partial results or accumulating data incrementally.
Error(string): Marks the task as failed and stores an error message.
Finished(): Marks the task as successfully completed.
FinishedWithResults(interface{}): Atomically sets the results and marks the task as finished. This is preferred over separate Results() and Finished() calls to avoid race conditions where the UI might poll between the two calls.
Status() Status: Returns the current status (Running, Finished, Error).
URL(string): Sets the URL that the client should poll for further updates. This is typically set by the Tracker.
JSON(w io.Writer) error: Serializes the current progress state (status, messages, results, next URL) into JSON and writes it to the provided writer.
progress struct (concrete implementation): Uses a sync.Mutex to ensure thread-safe updates to its internal SerializedProgress state. This is critical because long-running tasks often execute in separate goroutines, and the Progress object might be accessed concurrently by the task updating its state and by the Tracker serving HTTP requests. The state is held in a SerializedProgress struct, which is designed for easy JSON serialization. A Progress object starts in the Running state; once it transitions to Finished or Error, it becomes immutable, and any attempt to modify it (e.g., calling Message() or Results() again) will result in a panic. This design simplifies reasoning about the lifecycle of a task's progress.
SerializedProgress struct: Defines the JSON structure sent to the client. It includes the Status, an array of Message (key-value pairs), the Results (if any), and the URL for the next poll.
Status enum: Running, Finished, Error.
tracker.go: Defines the Tracker interface and its concrete implementation tracker.
Tracker interface: Manages a collection of Progress objects.
Add(prog Progress): Registers a new Progress object with the tracker. The tracker assigns a unique ID to this progress and sets its polling URL.
Handler(w http.ResponseWriter, r *http.Request): An HTTP handler function that clients use to poll for progress updates. It extracts the progress ID from the request URL, retrieves the corresponding Progress object, and sends its JSON representation.
Start(ctx context.Context): Starts a background goroutine for periodic cleanup of completed tasks from the cache.
tracker struct (concrete implementation): Uses a Least Recently Used (LRU) cache (github.com/hashicorp/golang-lru) to store cacheEntry objects. basePath is a string prefix for the polling URLs (e.g., /_/status/); each progress object gets a unique ID appended to this base path to form its polling URL. The cacheEntry struct wraps a Progress object and a Finished timestamp; the timestamp is used by the cleanup routine to determine when a completed task can be removed from the cache. The Start method launches a goroutine that periodically calls singleStep, which iterates through the cache, records the Finished timestamp in a cacheEntry when the corresponding Progress object transitions out of the Running state, and removes entries that have been in the Finished or Error state for longer than cacheDuration (currently 5 minutes). This prevents the cache from holding onto completed tasks indefinitely. The tracker uses github.com/google/uuid to generate unique IDs for each tracked Progress. This makes the polling URLs distinct and hard to guess.
Starting and Tracking a Long-Running Task:
Backend HTTP Handler (e.g., /api/start_long_task)
  |
  | 1. Create a new Progress object:
  |      prog := progress.New()
  |
  | 2. Add it to the global Tracker instance:
  |      trackerInstance.Add(prog)  // Tracker sets prog.URL() internally
  |
  | 3. Respond to the initial HTTP request with the Progress JSON.
  |      // The client now has prog.URL() to poll.
  |      prog.JSON(w)
  V
Goroutine (executing the long-running task)
  |
  | 1. Periodically update progress:
  |      prog.Message("Step", "Processing item X")
  |      prog.Message("PercentComplete", "30%")
  |      prog.Results(partialData)  // Optional: intermediate results
  |
  | 2. When finished:
  |      If error:
  |        prog.Error("Something went wrong")
  |      Else:
  |        prog.FinishedWithResults(finalData)
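The following sketch shows the producer side of this workflow in Go, assuming a much-reduced Progress type (only Message and FinishedWithResults are modeled, and statuses are plain strings). It illustrates the locking and immutability ideas described above; it is not the real /go/progress API.

package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// progress is a minimal stand-in for the module's Progress implementation.
type progress struct {
	mu    sync.Mutex
	state struct {
		Status   string            `json:"status"`
		Messages map[string]string `json:"messages"`
		Results  interface{}       `json:"results,omitempty"`
	}
}

func newProgress() *progress {
	p := &progress{}
	p.state.Status = "Running"
	p.state.Messages = map[string]string{}
	return p
}

func (p *progress) Message(key, value string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state.Status != "Running" {
		panic("progress is finished and immutable") // mirrors the contract described above
	}
	p.state.Messages[key] = value
}

func (p *progress) FinishedWithResults(results interface{}) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.state.Results = results
	p.state.Status = "Finished"
}

func (p *progress) JSON() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	b, _ := json.Marshal(p.state)
	return string(b)
}

func main() {
	prog := newProgress()

	done := make(chan struct{})
	go func() { // the long-running task
		prog.Message("Step", "Processing item X")
		prog.Message("PercentComplete", "30%")
		prog.FinishedWithResults(map[string]int{"itemsProcessed": 42})
		close(done)
	}()
	<-done

	// A polling handler would serialize the current state for the client.
	fmt.Println(prog.JSON())
}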
Client Polling for Updates:
Client (e.g., browser UI)
  |
  | 1. Receives initial response with prog.URL (e.g., /_/status/some-uuid)
  |
  | 2. Makes a GET request to prog.URL
  V
Backend Tracker.Handler
  |
  | 1. Extracts "some-uuid" from the request path.
  |
  | 2. Looks up the Progress object in its cache using "some-uuid".
  |      If not found --> HTTP 404 Not Found
  |
  | 3. Calls prog.JSON(w) to send the current state.
  V
Client
  |
  | 1. Receives JSON with current status, messages, results.
  |
  | 2. If Status is "Running", schedules another poll to prog.URL.
  |
  | 3. If Status is "Finished" or "Error", displays final results/error and stops polling.
Tracker Cache Management (Background Process):
Tracker.Start()
  |
  V
Goroutine (periodic execution, e.g., every minute)
  |
  | Calls tracker.singleStep()
  |   |
  |   V
  | Iterate through cache entries:
  |   - If Progress.Status() is not "Running" AND cacheEntry.Finished is zero:
  |       Set cacheEntry.Finished = now()
  |   - If cacheEntry.Finished is not zero AND now() > cacheEntry.Finished + cacheDuration:
  |       Remove entry from cache
  |   - Update metrics (numEntriesInCache)
  |
  V
(Loop back to periodic execution)
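A compact sketch of the singleStep bookkeeping above, using a plain map instead of the module's LRU cache. The cacheDuration value mirrors the 5 minutes mentioned earlier; everything else (types, field names) is hypothetical.

package main

import (
	"fmt"
	"time"
)

const cacheDuration = 5 * time.Minute

type cacheEntry struct {
	status   string    // "Running", "Finished", or "Error"
	finished time.Time // zero until the task leaves the Running state
}

// singleStep marks newly completed entries and evicts ones that finished
// more than cacheDuration ago.
func singleStep(cache map[string]*cacheEntry, now time.Time) {
	for id, e := range cache {
		if e.status != "Running" && e.finished.IsZero() {
			e.finished = now
		}
		if !e.finished.IsZero() && now.After(e.finished.Add(cacheDuration)) {
			delete(cache, id)
		}
	}
}

func main() {
	cache := map[string]*cacheEntry{
		"a": {status: "Running"},
		"b": {status: "Finished"},                                           // gets a timestamp this step
		"c": {status: "Error", finished: time.Now().Add(-10 * time.Minute)}, // old enough to evict
	}
	singleStep(cache, time.Now())
	fmt.Println(len(cache)) // 2: "a" still running, "b" timestamped, "c" evicted
}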
This system provides a robust and flexible way to communicate the progress of backend tasks to the user interface, improving the overall user experience for operations that might otherwise seem opaque or unresponsive. The use of JSON for data interchange makes it easy for web frontends to consume the progress information.
The psrefresh
module is designed to manage and provide access to paramtools.ParamSet
instances, which are collections of key-value pairs representing the parameters of traces in a performance monitoring system. The primary goal is to efficiently retrieve and cache these parameter sets, especially for frequently accessed queries, to reduce database load and improve response times.
The module addresses the need for up-to-date parameter sets by periodically fetching data from a trace store (represented by the OPSProvider
interface). It combines parameter sets from recent time intervals (tiles) to provide a comprehensive view of available parameters.
A key challenge is handling potentially large and complex parameter sets. To mitigate this, the module offers a caching layer (CachedParamSetRefresher
). This caching mechanism is configurable and can pre-populate caches (e.g., local in-memory or Redis) with filtered parameter sets based on predefined query levels. This pre-population significantly speeds up queries that match these common filter patterns.
Key Components and Responsibilities:
psrefresh.go: Defines the core interfaces, OPSProvider and ParamSetRefresher.
OPSProvider: Abstractly represents a source of ordered parameter sets (e.g., a trace data store). It provides methods to get the latest "tile" (a time-based segment of data) and the parameter set for a specific tile. This abstraction allows psrefresh to be independent of the underlying data storage implementation.
ParamSetRefresher: Defines the contract for components that can provide the full parameter set and parameter sets filtered by a query. It also includes a Start method to initiate the refresh process.
The file also provides defaultParamSetRefresher, which is the standard implementation of ParamSetRefresher. It periodically fetches parameter sets from the OPSProvider and merges parameter sets from a configurable number of recent tiles to create a comprehensive view. A background goroutine (refresh) periodically calls oneStep. The oneStep method fetches the latest tile, then iterates backward through the configured number of previous tiles, retrieving and merging their parameter sets using paramtools.ParamSet.AddParamSet. The resulting merged set is then normalized and stored. A sync.Mutex is used to protect concurrent access to the ps (paramtools.ReadOnlyParamSet) field, ensuring thread safety when GetAll is called. GetParamSetForQuery delegates the actual filtering and counting of traces to a dataframe.DataFrameBuilder, demonstrating a separation of concerns. UpdateQueryValueWithDefaults is a helper to automatically add default parameter selections to queries if configured, simplifying common query patterns.
cachedpsrefresh.go: Defines CachedParamSetRefresher, which wraps a defaultParamSetRefresher and adds a caching layer. It holds a cache.Cache instance (which could be local, Redis, etc.) and a defaultParamSetRefresher.
PopulateCache: This is a crucial method that proactively fills the cache. It uses the QueryCacheConfig (part of config.QueryConfig) to determine which levels of parameter sets to cache. For each configured Level 1 key/value it calls psRefresher.PreflightQuery (via the dfBuilder) to get the filtered parameter set and the count of matching traces, and calls populateChildLevel to cache parameter sets for combinations of Level 1 and Level 2 parameters. Cache keys are built by paramSetKey and countKey, ensuring a consistent naming scheme.
GetParamSetForQuery: When a query is made, getParamSetForQueryInternal first tries to retrieve the result from the cache (keyed via getParamSetKey). It only attempts to serve from the cache if the query matches the configured cache levels (1 or 2 parameters, potentially adjusted for default parameters). On a hit it reconstructs the paramtools.ParamSet from the cached string and retrieves the count; on a miss it falls back to psRefresher.GetParamSetForQuery.
StartRefreshRoutine: This method starts a goroutine that periodically calls PopulateCache to keep the cached data fresh.
Key Workflows:
Initialization and Periodic Refresh (Default Refresher):
NewDefaultParamSetRefresher(opsProvider, ...) -> pf
pf.Start(refreshPeriod)
  -> pf.oneStep()  // Initial fetch
     -> opsProvider.GetLatestTile() -> latestTile
     -> LOOP (numParamSets times):
        -> opsProvider.GetParamSet(tile) -> individualPS
        -> mergedPS.AddParamSet(individualPS)
        -> tile = tile.Prev()
     -> mergedPS.Normalize()
     -> pf.ps = mergedPS.Freeze()
  -> GO pf.refresh()
     -> LOOP (every refreshPeriod):
        -> pf.oneStep()  // Subsequent fetches
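The tile-merging loop can be sketched as follows. The ParamSet and OPSProvider types here are deliberately simplified stand-ins for paramtools.ParamSet and the module's OPSProvider interface, so method names and signatures do not match the real code.

package main

import "fmt"

// ParamSet is a simplified stand-in for paramtools.ParamSet: key -> set of values.
type ParamSet map[string]map[string]bool

// Add merges another ParamSet into this one.
func (p ParamSet) Add(other ParamSet) {
	for k, values := range other {
		if p[k] == nil {
			p[k] = map[string]bool{}
		}
		for v := range values {
			p[k][v] = true
		}
	}
}

// OPSProvider is a simplified stand-in for the module's interface.
type OPSProvider interface {
	GetLatestTile() int
	GetParamSet(tile int) ParamSet
}

// oneStep merges the param sets of the latest numTiles tiles, newest first.
func oneStep(ops OPSProvider, numTiles int) ParamSet {
	merged := ParamSet{}
	tile := ops.GetLatestTile()
	for i := 0; i < numTiles && tile >= 0; i++ {
		merged.Add(ops.GetParamSet(tile))
		tile-- // tile.Prev() in the real code
	}
	return merged
}

// fakeProvider serves canned per-tile param sets for the example.
type fakeProvider struct{ tiles []ParamSet }

func (f fakeProvider) GetLatestTile() int            { return len(f.tiles) - 1 }
func (f fakeProvider) GetParamSet(tile int) ParamSet { return f.tiles[tile] }

func main() {
	ops := fakeProvider{tiles: []ParamSet{
		{"config": {"8888": true}},
		{"config": {"gpu": true}, "arch": {"x86": true}},
	}}
	fmt.Println(oneStep(ops, 2)) // config has both values, arch has x86
}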
Cache Population (Cached Refresher):
NewCachedParamSetRefresher(defaultRefresher, cacheImpl) -> cr cr.StartRefreshRoutine(cacheRefreshPeriod) -> cr.PopulateCache() // Initial population -> defaultRefresher.GetAll() -> fullPS -> // For each configured Level 1 key/value: -> qValues = {level1Key: [level1Value]} -> defaultRefresher.UpdateQueryValueWithDefaults(qValues) // If applicable -> query.New(qValues) -> lv1Query -> defaultRefresher.dfBuilder.PreflightQuery(ctx, lv1Query, fullPS) -> count, filteredPS -> psCacheKey = paramSetKey(qValues, [level1Key]) -> cr.addToCache(ctx, psCacheKey, filteredPS.ToString(), count) -> // If Level 2 is configured: -> cr.populateChildLevel(ctx, level1Key, level1Value, filteredPS, level2Key, level2Values) -> // For each configured Level 2 value: -> qValues = {level1Key: [level1Value], level2Key: [level2Value]} -> ... (similar PreflightQuery and addToCache) -> GO LOOP (every cacheRefreshPeriod): -> cr.PopulateCache() // Subsequent cache refreshes
Querying with Cache: cr.GetParamSetForQuery(ctx, queryObj, queryValues) -> cr.getParamSetForQueryInternal(ctx, queryObj, queryValues) -> cr.getParamSetKey(queryValues) -> cacheKey, err -> IF cacheKey is valid AND exists: -> cache.GetValue(ctx, cacheKey) -> cachedParamSetString -> cache.GetValue(ctx, countKey(cacheKey)) -> cachedCountString -> paramtools.FromString(cachedParamSetString) -> paramSet -> strconv.ParseInt(cachedCountString) -> count -> RETURN count, paramSet, nil -> ELSE (cache miss or invalid key for caching): -> defaultRefresher.GetParamSetForQuery(ctx, queryObj, queryValues) -> count, paramSet, err -> RETURN count, paramSet, err
The use of config.QueryConfig
and config.Experiments
allows for instance-specific tuning of caching behavior (which keys/values to pre-populate) and handling of default parameters. The separation between defaultParamSetRefresher
and CachedParamSetRefresher
promotes modularity, allowing the caching layer to be optional or replaced with different caching strategies if needed.
The redis
module in Skia Perf is designed to manage interactions with Redis instances, primarily to support and optimize the query UI. It leverages Redis for caching frequently accessed data, thereby improving the responsiveness and performance of the Perf frontend.
The core idea is to periodically fetch information about available Redis instances within a Google Cloud Project and then interact with a specific, configured Redis instance to store or retrieve cached data. This cached data typically represents results of expensive computations or frequently requested data points, like recent trace data for specific queries.
Key Responsibilities and Components:
redis.go
: This is the central file of the module.
RedisWrapper
interface: Defines the contract for Redis-related operations. This abstraction allows for easier testing and potential future replacements of the underlying Redis client implementation. The key methods are:StartRefreshRoutine
: Initiates a background process (goroutine) that periodically discovers and interacts with the configured Redis instance.ListRedisInstances
: Retrieves a list of all Redis instances available within a specified GCP project and location.RedisClient
struct: This is the concrete implementation of the RedisWrapper
interface.gcp_redis.CloudRedisClient
for interacting with the Google Cloud Redis API (e.g., listing instances).tracestore.TraceStore
, which is likely used to fetch the data that needs to be cached in Redis.tilesToCache
field suggests that the caching strategy might involve pre-calculating and storing “tiles” of data, which is a common pattern in Perf systems for displaying graphs over time.NewRedisClient
: The constructor for RedisClient
.StartRefreshRoutine
: - Why: To ensure that Perf is always aware of the correct Redis instance to use and to periodically update the cache. Network configurations or instance details might change, and this routine helps adapt to such changes. - How: It takes a refreshPeriod
and a config.InstanceConfig
(which is actually redis_client.RedisConfig
in the current implementation, indicating the target project, zone, and instance name). It then starts a goroutine that, at regular intervals defined by refreshPeriod
: _ Calls ListRedisInstances
to get all Redis instances in the configured project/zone. _ Iterates through the instances to find the one matching the config.Instance
name. * If the target instance is found, it calls RefreshCachedQueries
. [StartRefreshRoutine] | V (Goroutine - Ticks every 'refreshPeriod') | V [ListRedisInstances] -> (GCP API Call) -> [List of Redis Instances] | V (Find Target Instance by Name) | V (If Target Found) [RefreshCachedQueries]
ListRedisInstances
:gcpClient
(an instance of cloud.google.com/go/redis/apiv1.CloudRedisClient
) to make an API call to GCP to list instances under the given parent
(e.g., “projects/my-project/locations/us-central1”). It iterates through the results and returns a slice of redispb.Instance
objects.RefreshCachedQueries
:instance.Host
and instance.Port
) using github.com/redis/go-redis/v9
.r.mutex.Lock()
) to prevent concurrent modifications to the cache or shared resources, though the current implementation only has placeholder logic.GET
a key named “FullPS”.SET
s the key “FullPS” to the current time, with an expiration of 30 seconds.TODO(wenbinzhang)
and tilesToCache
): This method is expected to be expanded to:traceStore
.tilesToCache
parameter suggests it might pre-cache a certain number of recent “tiles” of trace data.mocks/RedisWrapper.go
: This file contains a mock implementation of the RedisWrapper
interface, generated by the mockery
tool.
RedisWrapper
. By using a mock, tests can simulate various Redis behaviors (e.g., successful connection, instance not found, errors) without needing an actual Redis instance or GCP connectivity.RedisWrapper
struct that embeds mock.Mock
from the testify
library. For each method in the RedisWrapper
interface, there's a corresponding method in the mock that records calls and can be configured to return specific values or errors, allowing test authors to define expected interactions.Design Decisions and Rationale:
RedisWrapper
): Using an interface decouples the rest of the Perf system from the concrete Redis client implementation. This is good for:mocks
package.RedisWrapper
without affecting its consumers.StartRefreshRoutine
provides a more robust approach.go-redis
).cloud.google.com/go/redis/apiv1
for GCP infrastructure management.github.com/redis/go-redis/v9
for standard Redis data operations. This ensures reliance on well-maintained and feature-rich libraries.Workflow: Cache Refresh Process
The primary workflow driven by this module is the periodic refresh of cached data:
System Starts | V Initialize RedisClient (NewRedisClient) | V Call StartRefreshRoutine | V [Background Goroutine - Loop every 'refreshPeriod'] | |--> 1. List GCP Redis Instances (ListRedisInstances) | - Input: GCP project, location | - Output: List of *redispb.Instance | |--> 2. Identify Target Redis Instance | - Based on configuration (e.g., instance name) | |--> 3. If Target Instance Found: Refresh Cache (RefreshCachedQueries) | |--> a. Connect to Target Redis (using go-redis) | - Host, Port from *redispb.Instance | |--> b. Determine data to cache (e.g., recent trace data for popular queries) | - Likely involves `traceStore` | |--> c. Write data to Redis (SET commands) | - Use appropriate keys and expiration times | |--> (Current placeholder: SET "FullPS" = current_time with 30s TTL)
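Assuming the placeholder behavior described above (a GET of "FullPS" followed by a SET with a 30-second TTL), a rough equivalent with github.com/redis/go-redis/v9 could look like the sketch below; the address and error handling are illustrative, and a real implementation would cache trace data instead.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// refreshFullPS mirrors the placeholder logic: read the "FullPS" key, then
// overwrite it with the current time and a 30-second expiration.
func refreshFullPS(ctx context.Context, addr string) error {
	client := redis.NewClient(&redis.Options{Addr: addr}) // host:port from the discovered instance
	defer client.Close()

	prev, err := client.Get(ctx, "FullPS").Result()
	switch {
	case errors.Is(err, redis.Nil):
		fmt.Println("FullPS not cached yet")
	case err != nil:
		return err
	default:
		fmt.Println("previous FullPS value:", prev)
	}

	// A real implementation would cache recent trace tiles here instead.
	return client.Set(ctx, "FullPS", time.Now().Format(time.RFC3339), 30*time.Second).Err()
}

func main() {
	if err := refreshFullPS(context.Background(), "10.0.0.5:6379"); err != nil {
		fmt.Println("refresh failed:", err)
	}
}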
This module provides the foundational components for integrating Redis as a caching layer in Skia Perf, aiming to improve UI performance by serving frequently requested data quickly from an in-memory store. The current implementation focuses on instance discovery and has placeholder logic for the actual caching, which is expected to be expanded based on Perf's specific caching needs.
The /go/regression
module is responsible for detecting, storing, and managing performance regressions in Skia. It analyzes performance data over time, identifies significant changes (regressions or improvements), and provides mechanisms for triaging and tracking these changes.
Core Functionality & Design:
The primary goal is to automatically flag performance changes that might indicate a problem or an unexpected improvement. This involves:
Key Components & Files:
detector.go
: This file contains the core logic for processing regression detection requests.
ProcessRegressions
is the main entry point. It takes a RegressionDetectionRequest
(which specifies the alert configuration and the time domain to analyze) and a DetectorResponseProcessor
callback.GroupBy
parameters into multiple, more specific requests using allRequestsFromBaseRequest
. This allows for targeted analysis of specific trace groups.dfiter.DataFrameIterator
, which provides dataframes for analysis.tooMuchMissingData
) to ensure the reliability of the detection.clustering2.CalculateClusterSummaries
or individual StepFit via StepFit
) to identify clusters of traces exhibiting similar behavior. The choice of K (number of clusters for K-Means) can be automatic or user-specified.RegressionDetectionResponse
objects containing the cluster summaries and the relevant data frame. These responses are passed to the DetectorResponseProcessor
.shortcutFromKeys
for easier referencing.RegressionDetectionRequest -> Expand (if GroupBy) -> Multiple Requests | V For each Request: DataFrameIterator -> DataFrame -> Filter Traces -> Apply Clustering (KMeans or StepFit) | | V V Shortcut Creation <- ClusterSummaries -> DetectorResponseProcessor
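As a rough illustration of the per-trace step-fit path mentioned above (described in more detail under stepfit.go below), the following sketch classifies traces into "low" and "high" groups based on a simplified midpoint step statistic. The statistic and threshold are stand-ins, not stepfit.GetStepFitAtMid.

package main

import (
	"fmt"
	"math"
)

// stepAtMid returns the difference of means after vs. before the midpoint,
// normalized by the overall standard deviation. This is only a stand-in for
// the real step-fit statistic.
func stepAtMid(trace []float64) float64 {
	mid := len(trace) / 2
	mean := func(xs []float64) float64 {
		s := 0.0
		for _, x := range xs {
			s += x
		}
		return s / float64(len(xs))
	}
	before, after := trace[:mid], trace[mid:]
	all := mean(trace)
	variance := 0.0
	for _, x := range trace {
		variance += (x - all) * (x - all)
	}
	sd := math.Sqrt(variance / float64(len(trace)))
	if sd == 0 {
		return 0
	}
	return (mean(after) - mean(before)) / sd
}

func main() {
	traces := map[string][]float64{
		",test=a,": {10, 10, 10, 10, 20, 20, 20, 20}, // step up
		",test=b,": {20, 20, 20, 20, 10, 10, 10, 10}, // step down
		",test=c,": {10, 10, 11, 10, 10, 11, 10, 10}, // flat
	}
	const interesting = 1.0 // threshold for an "interesting" step

	var low, high []string
	for id, tr := range traces {
		switch step := stepAtMid(tr); {
		case step >= interesting:
			high = append(high, id)
		case step <= -interesting:
			low = append(low, id)
		}
	}
	fmt.Println("high:", high, "low:", low)
}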
regression.go
: Defines the primary data structures for representing regressions and their triage status.
Regression
: The central struct holding Low
and High
ClusterSummary
objects (from clustering2
), the FrameResponse
(data context), and TriageStatus
for both low and high. It also includes fields for the newer regression2
schema (like Id
, CommitNumber
, AlertId
, MedianBefore
, MedianAfter
).TriageStatus
: Represents whether a regression is Untriaged
, Positive
(expected/acceptable), or Negative
(a bug).AllRegressionsForCommit
: A container for all regressions found for a specific commit, keyed by the alert ID.Merge
: A method to combine information from two Regression
objects, typically used when new data provides a more significant regression for an existing alert.types.go
: Defines the Store
interface, which abstracts the persistence layer for regressions.
Store
interface specifies methods for:Range
: Retrieving regressions within a commit range.SetHigh
/SetLow
: Storing newly detected high/low regressions.TriageHigh
/TriageLow
: Updating the triage status of regressions.Write
: Bulk writing of regressions.GetRegressionsBySubName
, GetByIDs
: Retrieving regressions based on subscription names or specific IDs (primarily for the regression2
schema).GetOldestCommit
, GetRegression
: Utility methods for fetching specific data.DeleteByCommit
: Removing regressions associated with a commit.stepfit.go
: Implements an alternative regression detection strategy that analyzes each trace individually using step fitting.
GroupBy
is used in an alert, or when K-Means clustering is not the desired approach. It focuses on finding significant steps in individual time series.StepFit
function iterates through each trace in the input DataFrame
.stepfit.GetStepFitAtMid
to determine if there's a significant step (low or high) around the midpoint of the trace.stddevThreshold
and interesting
parameters), the trace is added to either the low
or high
ClusterSummary
.low
and high
summaries collect all traces that show a downward or upward step, respectively.ParamSummaries
) are generated for the keys within these clusters.fromsummary.go
: Provides a utility function to convert a RegressionDetectionResponse
into a Regression
object.
Regression
type used for storage and display.RegressionFromClusterResponse
takes a RegressionDetectionResponse
, an alerts.Alert
configuration, and a perfgit.Git
instance.ClusterSummary
objects in the response.Low
or High
fields of the Regression
object. It prioritizes the regression with the largest absolute magnitude if multiple are found.Submodules:
continuous/
(continuous.go
): Manages the continuous, background detection of regressions.
Continuous
struct: Holds dependencies like perfgit.Git
, regression.Store
, alerts.ConfigProvider
, notify.Notifier
, etc.Run()
: The main entry point, which starts either event-driven or polling-based regression detection.RunEventDrivenClustering
):FileIngestionTopicName
indicating new data ingestion (ingestevents.IngestEvent
).getTraceIdConfigsForIngestEvent
(which calls matchingConfigsFromTraceIDs
).matchingConfigsFromTraceIDs
refines alert queries if GroupBy
is present to be more specific to the incoming trace.ProcessAlertConfig
(or ProcessAlertConfigForTraces
if StepFitGrouping
is used) for each matching config and the specific traces.RunContinuousClustering
):pollingDelay
), fetches all alert configurations using buildConfigAndParamsetChannel
.ProcessAlertConfig
for each configuration.ProcessAlertConfig()
:GroupBy
alerts to ensure the query is valid and returns data.regression.ProcessRegressions
to perform the actual detection.clusterResponseProcessor
(which is reportRegressions
) is called with the detection results.reportRegressions()
:RegressionDetectionResponse
), it determines the commit and previous commit details.updateStoreAndNotification
to persist the regression and send notifications.updateStoreAndNotification()
:regression.Store
.store.SetLow
or store.SetHigh
) and sends a notification via notifier.RegressionFound
. The notification ID is stored with the regression.notifier.UpdateNotification
.EventDrivenRegressionDetection
flag.Pub/Sub Message (New Data) -> Decode IngestEvent -> Get Matching Alert Configs | V For each (Config, Matched Traces): ProcessAlertConfig -> regression.ProcessRegressions | V reportRegressions -> updateStoreAndNotification | | V V Store Notifier
migration/
(migrator.go
): Handles the data migration from an older regressions
table schema to the newer regressions2
schema.
Regression2Schema
) aims to store regression data more granularly, typically one row per detected step (high or low), rather than combining high and low for the same commit/alert into a single JSON blob.RegressionMigrator
: Contains instances of the legacy sqlregressionstore.SQLRegressionStore
and the new sqlregression2store.SQLRegression2Store
.RunPeriodicMigration
: Sets up a ticker to periodically run RunOneMigration
.RunOneMigration
/ migrateRegressions
:legacyStore.GetRegressionsToMigrate
).Regression
object:regression2
schema (e.g., Id
, PrevCommitNumber
, MedianBefore
, MedianAfter
, IsImprovement
, ClusterType
) if they are not already present from the legacy data. This is crucial as the sqlregression2store.WriteRegression
expects these.sqlregression2store.WriteRegression
function might split a single legacy Regression
object (if it has both High
and Low
components) into two separate entries in the Regressions2
table, one for HighClusterType
and one for LowClusterType
.Regressions
table as migrated using legacyStore.MarkMigrated
, storing the new regression ID.sqlregressionstore/
: Implements the regression.Store
interface using a generic SQL database. This is the older SQL storage mechanism.
SQLRegressionStore
: The main struct, holding a database connection pool (pool.Pool
) and prepared SQL statements. It supports different SQL dialects (e.g., CockroachDB via statements
, Spanner via spannerStatements
).sqlregressionstore/schema/RegressionSchema.go
) typically stores one row per (commit_number, alert_id)
pair. The actual regression.Regression
object (which might contain both high and low details, along with the frame) is serialized into a JSON string and stored in a regression TEXT
column.readModifyWrite
: A core helper function that encapsulates the common pattern of reading a Regression
from the DB, allowing a callback to modify it, and then writing it back. This is done within a transaction to prevent lost updates. If mustExist
is true, it errors if the regression isn't found; otherwise, it creates a new one.SetHigh
/SetLow
: Use readModifyWrite
to update the High
or Low
part of the JSON-serialized Regression
object. They also update the triage status to Untriaged
if it was previously None
.TriageHigh
/TriageLow
: Use readModifyWrite
to update the HighStatus
or LowStatus
within the JSON-serialized Regression
.GetRegressionsToMigrate
: Fetches regressions that haven't been migrated to the regression2
schema.MarkMigrated
: Updates a row to indicate it has been migrated, storing the new regression_id
from the regression2
table.Regression
object as JSON can make querying for specific aspects of the regression (e.g., only high regressions, or regressions with a specific triage status) less efficient and more complex. This is one of the motivations for the sqlregression2store
.sqlregression2store/
: Implements the regression.Store
interface using a newer SQL schema (Regressions2
).
sqlregressionstore
by storing regression data in a more normalized and queryable way.SQLRegression2Store
: The main struct.sqlregression2store/schema/Regression2Schema.go
): Designed to store each regression step (high or low) as a separate row. Key columns include id
(UUID, primary key), commit_number
, prev_commit_number
, alert_id
, creation_time
, median_before
, median_after
, is_improvement
, cluster_type
(e.g., “high”, “low”), cluster_summary
(JSONB), frame
(JSONB), triage_status
, and triage_message
.writeSingleRegression
: The core writing function. It takes a regression.Regression
object and writes its relevant parts (either high or low, but not both in the same DB row) to the Regressions2
table.convertRowToRegression
: Converts a database row from Regressions2
back into a regression.Regression
object. Depending on the cluster_type
in the row, it populates either the High
or Low
part of the Regression
object.SetHigh
/SetLow
:updateBasedOnAlertAlgo
.updateBasedOnAlertAlgo
: This function is crucial. It considers the Algo
type of the alert (KMeansGrouping
vs. StepFitGrouping
).KMeansGrouping
, it expects to potentially update an existing regression for the same (commit_number, alert_id)
as new data might refine the cluster. It uses readModifyWriteCompat
to achieve this.StepFitGrouping
(individual trace analysis), it generally expects to create a new regression entry if one doesn‘t exist for the exact frame, avoiding updates to pre-existing ones unless it’s truly a new detection.updateFunc
passed to updateBasedOnAlertAlgo
populates the necessary fields in the regression.Regression
object (e.g., setting r.High
or r.Low
, and calling populateRegression2Fields
).populateRegression2Fields
: This helper populates the fields specific to the Regressions2
schema (like PrevCommitNumber
, MedianBefore
, MedianAfter
, IsImprovement
) from the ClusterSummary
and FrameResponse
within the Regression
object.WriteRegression
(used by migrator): If a legacy Regression
object has both High
and Low
components, this function splits it and calls writeSingleRegression
twice, creating two rows in Regressions2
.Range
: When retrieving regressions, if multiple rows from Regressions2
correspond to the same (commit_number, alert_id)
(e.g., one for high, one for low), it merges them back into a single regression.Regression
object for compatibility with how the rest of the system might expect the data.Overall Workflow Example (Simplified):
continuous.go
):Continuous
identifies relevant alerts.Alert
configurations.ProcessAlertConfig
is called.detector.go
):ProcessRegressions
fetches data, builds DataFrame
s.stepfit.go
) is applied.RegressionDetectionResponse
s are generated.continuous.go
calls back into regression
store):reportRegressions
processes these responses.updateStoreAndNotification
interacts with a regression.Store
implementation (e.g., sqlregression2store.go
):SetLow
or SetHigh
on the store.sqlregression2store
) writes the data to the Regressions2
table, potentially creating a new row or updating an existing one based on the alert's algorithm type.The system is designed to be modular, with interfaces like regression.Store
and alerts.ConfigProvider
allowing for flexibility in implementation details. The migration path from sqlregressionstore
to sqlregression2store
highlights the evolution towards a more structured and queryable data model for regressions.
The samplestats
module is designed to perform statistical analysis on sets of performance data, specifically to identify significant changes between two sample sets, often referred to as “before” and “after” states. This is crucial for detecting regressions or improvements in performance metrics over time or across different code versions.
The core functionality revolves around comparing these two sets of samples for each trace (a unique combination of parameters identifying a specific test or metric). It calculates various statistical metrics for each set and then employs statistical tests to determine if the observed differences are statistically significant.
Key Design Choices and Implementation Details:
Config
struct provides a centralized way to control the analysis process. This includes setting the alpha level, choosing the statistical test, enabling outlier removal, and deciding whether to include all traces in the output or only those with significant changes. This configurability makes the module adaptable to various analysis needs.Result
struct encapsulates the outcome of the analysis, including a list of Row
structs (one per trace) and a count of skipped traces. Each Row
contains the trace identifier, its parameters, the calculated metrics for both “before” and “after” samples, the percentage delta, the p-value, and any informational notes (e.g., errors during statistical test calculation). This structured output facilitates further processing or display of the results.Delta
. This allows users to quickly identify the most impactful changes. The Order
type and functions like ByName
, ByDelta
, and Reverse
provide a flexible sorting mechanism.Responsibilities and Key Components:
analyze.go
: This is the heart of the module.
Analyze
function: This is the primary entry point. It takes the Config
and two maps of samples (before
and after
, where keys are trace IDs and values are parser.Samples
).calculateMetrics
(from metrics.go
) for both “before” and “after” samples.Config.Test
setting, it performs either the Mann-Whitney U test or the Two Sample Welch's t-test using functions from the github.com/aclements/go-moremath/stats
library.p < alpha
, it calculates the percentage Delta
between the means. Otherwise, Delta
is NaN
.Row
struct with all the calculated information.Config.All
is false.Row
s based on Config.Order
(or by Delta
if no order is specified) using the Sort
function from sort.go
.Result
struct containing the list of Row
s and the count of skipped traces.Config
struct: Defines the parameters that control the analysis, such as Alpha
for p-value cutoff, Order
for sorting, IQRR
for outlier removal, All
for including all results, and Test
for selecting the statistical test.Result
struct: Encapsulates the output of the Analyze
function, holding the Rows
of analysis data and the Skipped
count.Row
struct: Represents the analysis results for a single trace, including its name, parameters, “before” and “after” Metrics
, the percentage Delta
, the P
value, and any Note
.metrics.go
: This file is responsible for calculating basic statistical metrics from a given set of sample values.
calculateMetrics
function: Takes a Config
(primarily to check IQRR
) and parser.Samples
.Config.IQRR
is true, it applies the Interquartile Range Rule to filter out outliers from samples.Values
. The values within 1.5 * IQR from the first and third quartiles are retained.Mean
, StdDev
(standard deviation), and Percent
(coefficient of variation: StdDev / Mean * 100
) of the (potentially filtered) values.Metrics
struct, along with the (potentially filtered) Values
.Metrics
struct: Holds the calculated Mean
, StdDev
, raw Values
(after potential outlier removal), and Percent
(coefficient of variation).sort.go
: This file provides utilities for sorting the results (Row
slices).
Order
type: A function type func(rows []Row, i, j int) bool
defining a less-than comparison for sorting Row
s.ByName
function: An Order
implementation that sorts rows alphabetically by Row.Name
.ByDelta
function: An Order
implementation that sorts rows by Row.Delta
. It specifically places NaN
delta values (insignificant changes) at the beginning.Reverse
function: A higher-order function that takes an Order
and returns a new Order
that represents the reverse of the input order.Sort
function: A convenience function that sorts a slice of Row
s in place using sort.SliceStable
and a given Order
.Illustrative Workflow (Simplified Analyze
Process):
Input: before_samples, after_samples, config

For each trace_id in (before_samples keys + after_samples keys):
    If trace_id not in before_samples OR trace_id not in after_samples:
        Increment skipped_count
        Continue
    before_metrics = calculateMetrics(config, before_samples[trace_id])
    after_metrics = calculateMetrics(config, after_samples[trace_id])
    If config.Test == UTest:
        p_value = MannWhitneyUTest(before_metrics.Values, after_metrics.Values)
    Else (config.Test == TTest):
        p_value = TwoSampleWelchTTest(before_metrics.Values, after_metrics.Values)
    alpha = config.Alpha (or defaultAlpha if config.Alpha is 0)
    If p_value < alpha:
        delta = ((after_metrics.Mean / before_metrics.Mean) - 1) * 100
    Else:
        delta = NaN
        If NOT config.All:
            Continue  // Skip if not showing all results and change is not significant
    Add new Row{Name: trace_id, Delta: delta, P: p_value, ...} to results_list

Sort results_list using config.Order (or ByDelta by default)
Return Result{Rows: results_list, Skipped: skipped_count}
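Two pieces of this flow can be sketched in isolation: the IQR outlier rule used by calculateMetrics and the percentage delta that is only reported for significant changes. The quartile rule below uses a simple nearest-rank approximation and the p-value is supplied by the caller, so this is illustrative rather than the module's implementation.

package main

import (
	"fmt"
	"math"
	"sort"
)

// iqrFilter keeps values within 1.5*IQR of the first and third quartiles.
// Quartiles are computed with a simple nearest-rank rule for brevity.
func iqrFilter(values []float64) []float64 {
	sorted := append([]float64{}, values...)
	sort.Float64s(sorted)
	q1 := sorted[len(sorted)/4]
	q3 := sorted[(3*len(sorted))/4]
	iqr := q3 - q1
	lo, hi := q1-1.5*iqr, q3+1.5*iqr
	kept := []float64{}
	for _, v := range values {
		if v >= lo && v <= hi {
			kept = append(kept, v)
		}
	}
	return kept
}

func mean(values []float64) float64 {
	s := 0.0
	for _, v := range values {
		s += v
	}
	return s / float64(len(values))
}

// delta returns the percentage change of after vs. before, but only when the
// supplied p-value clears alpha; otherwise it returns NaN, matching how
// insignificant rows are handled above.
func delta(before, after []float64, p, alpha float64) float64 {
	if p >= alpha {
		return math.NaN()
	}
	return (mean(after)/mean(before) - 1) * 100
}

func main() {
	before := iqrFilter([]float64{10, 11, 9, 10, 10, 42}) // 42 is dropped as an outlier
	after := iqrFilter([]float64{12, 13, 11, 12, 12})
	fmt.Println(before)
	fmt.Println(delta(before, after, 0.01, 0.05)) // ≈ +20%
}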
The sheriffconfig
module is responsible for managing configurations for Skia Perf's anomaly detection and alerting system. These configurations, known as “Sheriff Configs,” are defined in Protocol Buffer format and are typically stored in LUCI Config. This module handles fetching these configurations, validating them, and transforming them into a format suitable for storage and use by other Perf components, specifically the alerts
and subscription
modules.
The core idea is to allow users to define rules for which performance metrics they care about and how anomalies in those metrics should be detected and handled. This provides a flexible and centralized way to manage alerting for a large number of performance tests.
Key Responsibilities and Components:
Protocol Buffer Definitions (/proto/v1
):
sheriff_config.proto
: Defines the main messages like SheriffConfig
, Subscription
, AnomalyConfig
, and Rules
.SheriffConfig
: The top-level message, containing a list of Subscription
s. This represents the entire set of alerting configurations for a Perf instance.Subscription
: Represents a user‘s or team’s interest in a specific set of metrics. It includes details for creating bug reports (e.g., contact email, bug component, labels, priority, severity) and a list of AnomalyConfig
s that define how to detect anomalies for the metrics covered by this subscription.AnomalyConfig
: Specifies the parameters for anomaly detection for a particular subset of metrics. This includes:Rules
: Define which metrics this AnomalyConfig
applies to, using match
and exclude
patterns. These patterns are query strings (e.g., “master=ChromiumPerf&benchmark=Speedometer2”).step
(algorithm for step detection), radius
(commits to consider), threshold
(sensitivity), minimum_num
(number of interesting traces to trigger an alert), sparse
(handling of missing data), k
(for K-Means clustering), group_by
(for breaking down clustering), direction
(up, down, or both), action
(no action, triage, or bisect), and algo
(clustering algorithm like StepFit or KMeans).Rules
: Contains lists of match
and exclude
strings. Match strings define positive criteria for selecting metrics, while exclude strings define negative criteria. The combination allows for precise targeting of metrics.sheriff_config.pb.go
: The Go code generated from sheriff_config.proto
. This provides the Go structs and methods to work with these configurations programmatically.generate.go
: Contains go:generate
directives used to regenerate sheriff_config.pb.go
whenever sheriff_config.proto
changes. This ensures the Go code stays in sync with the proto definition.Validation (/validate
):
validate.go
: This is crucial for ensuring the integrity and correctness of Sheriff Configurations before they are processed or stored. It performs a series of checks:match
and exclude
strings in Rules
are well-formed query strings. It checks for valid regex if a value starts with ~
. It also enforces that exclude patterns only target a single key-value pair.AnomalyConfig
has at least one match
pattern.name
, contact_email
, bug_component
, and instance
are present. It also checks that each subscription has at least one AnomalyConfig
.Subscription
and that all subscription names within a config are unique.DeserializeProto
: A helper function to convert a base64 encoded string (as typically retrieved from LUCI Config) into a SheriffConfig
protobuf message.Service (/service
):
service.go
: This component orchestrates the process of fetching Sheriff Configurations from LUCI Config, processing them, and storing them in the database.New
function: Initializes the sheriffconfigService
, taking dependencies like a database connection pool (sql.Pool
), subscription.Store
, alerts.Store
, and a luciconfig.ApiClient
. If no luciconfig.ApiClient
is provided, it creates one.ImportSheriffConfig
method: This is the main entry point for importing configurations.luciconfig.ApiClient
to fetch configurations from a specified LUCI Config path (e.g., “skia-sheriff-configs.cfg”).processConfig
.subscription_pb.Subscription
objects into the subscriptionStore
and all alerts.SaveRequest
objects into the alertStore
within a single database transaction. This ensures atomicity – either all changes are saved, or none are.processConfig
method:pb.SheriffConfig
protobuf message using prototext.Unmarshal
.pb.SheriffConfig
using validate.ValidateConfig
.pb.Subscription
in the config:instance
field, only processing those matching the service's configured instance (e.g., “chrome-internal”). This allows multiple Perf instances to share a config file but only import relevant subscriptions.makeSubscriptionEntity
to convert the pb.Subscription
into a subscription_pb.Subscription
(the format used by the subscription
module).subscriptionStore
. If it does, it means this specific version of the subscription has already been imported, so it‘s skipped. This prevents redundant database writes and processing if the LUCI config file hasn’t actually changed for that subscription.makeSaveRequests
to generate alerts.SaveRequest
objects for each alert defined within that subscription.makeSubscriptionEntity
function: Transforms a pb.Subscription
(from Sheriff Config proto) into a subscription_pb.Subscription
(for the subscription
datastore), mapping fields and applying default priorities/severities if not specified.makeSaveRequests
function:pb.AnomalyConfig
within a pb.Subscription
.match
rule within the pb.AnomalyConfig.Rules
:buildQueryFromRules
to construct the actual query string that will be used to select metrics for this alert.createAlert
to create an alerts.Alert
object, populating it with parameters from the pb.AnomalyConfig
and the parent pb.Subscription
.alerts.Alert
in an alerts.SaveRequest
along with the subscription name and revision.createAlert
function: Populates an alerts.Alert
struct. This involves:AnomalyConfig_Step
, AnomalyConfig_Direction
, AnomalyConfig_Action
, AnomalyConfig_Algo
) to their corresponding internal types used by the alerts
module (e.g., alerts.Direction
, types.RegressionDetectionGrouping
, types.StepDetection
, types.AlertAction
). This is done using maps like directionMap
, clusterAlgoMap
, etc.radius
, minimum_num
, sparse
, k
, group_by
if they are not explicitly set in the AnomalyConfig
.buildQueryFromRules
function: Constructs a canonical query string from a match
string and a list of exclude
strings. It parses them as URL query parameters, combines them (with !
for excludes), sorts the parts alphabetically, and joins them with &
. This ensures that equivalent rules always produce the same query string.getPriorityFromProto
and getSeverityFromProto
functions: Convert the enum values for priority and severity from the proto definition to the integer values expected by the subscription
module, applying defaults if the proto value is “unspecified.”StartImportRoutine
and ImportSheriffConfigOnce
: Provide functionality to periodically fetch and import configurations, making the system self-updating when LUCI configs change.Workflow: Importing a Sheriff Configuration
LUCI Config Change (e.g., new revision of skia-sheriff-configs.cfg) | v Sheriffconfig Service (triggered by timer or manual call) | |--- 1. luciconfigApiClient.GetProjectConfigs("skia-sheriff-configs.cfg") --> Fetches raw config content + revision | v For each config file content: | |--- 2. processConfig(configContent, revision) | | | |--- 2a. prototext.Unmarshal(configContent) --> pb.SheriffConfig | | | |--- 2b. validate.ValidateConfig(pb.SheriffConfig) --> Error or OK | | | v | For each pb.Subscription in pb.SheriffConfig: | | | |--- 2c. If subscription.Instance != service.Instance --> Skip | | | |--- 2d. subscriptionStore.GetSubscription(name, revision) --> ExistingSubscription? | | | |--- 2e. If ExistingSubscription == nil (new or updated): | | | | | |--- makeSubscriptionEntity(pb.Subscription, revision) --> subscription_pb.Subscription | | | | | |--- makeSaveRequests(pb.Subscription, revision) | | | | | | | v | | | For each pb.AnomalyConfig in pb.Subscription: | | | | | | | v | | | For each matchRule in pb.AnomalyConfig.Rules: | | | | | | | |--- buildQueryFromRules(matchRule, excludeRules) --> queryString | | | | | | | |--- createAlert(queryString, pb.AnomalyConfig, pb.Subscription, revision) --> alerts.Alert | | | | | | | ---> Collect alerts.SaveRequest | | | | | ---> Collect subscription_pb.Subscription | v Database Transaction (BEGIN) | |--- 3. subscriptionStore.InsertSubscriptions(collected_subscriptions) | |--- 4. alertStore.ReplaceAll(collected_save_requests) | Database Transaction (COMMIT or ROLLBACK)
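A hedged sketch of the canonicalization performed by the buildQueryFromRules step described above: match and exclude patterns are parsed as URL query parameters, excluded values get a '!' prefix, and the parts are sorted and joined with '&'. The exact formatting of the real helper may differ.

package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// buildQuery combines one match pattern and a list of exclude patterns into a
// single canonical query string.
func buildQuery(match string, excludes []string) (string, error) {
	parts := []string{}

	matchValues, err := url.ParseQuery(match)
	if err != nil {
		return "", err
	}
	for key, values := range matchValues {
		for _, v := range values {
			parts = append(parts, fmt.Sprintf("%s=%s", key, v))
		}
	}

	for _, exclude := range excludes {
		excludeValues, err := url.ParseQuery(exclude)
		if err != nil {
			return "", err
		}
		for key, values := range excludeValues {
			for _, v := range values {
				parts = append(parts, fmt.Sprintf("%s=!%s", key, v))
			}
		}
	}

	// Sorting makes equivalent rules produce identical query strings.
	sort.Strings(parts)
	return strings.Join(parts, "&"), nil
}

func main() {
	q, err := buildQuery(
		"master=ChromiumPerf&benchmark=Speedometer2",
		[]string{"bot=android-go"},
	)
	fmt.Println(q, err) // benchmark=Speedometer2&bot=!android-go&master=ChromiumPerf <nil>
}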
This module acts as a critical bridge, translating human-readable (and machine-parsable via proto) alerting definitions into the concrete data structures used by Perf's backend alerting and subscription systems. The validation step is key to preventing malformed configurations from breaking the alerting pipeline. The revision checking mechanism ensures efficiency by only processing changes.
The shortcut
module provides functionality for creating, storing, and retrieving “shortcuts”. A shortcut is essentially a named list of trace keys. These trace keys typically represent specific performance metrics or configurations. The primary purpose of shortcuts is to provide a convenient way to refer to a collection of traces with a short, memorable identifier, rather than having to repeatedly specify the full list of keys. This is particularly useful for sharing links to specific views in the Perf UI or for programmatic access to predefined sets of performance data.
The core component is the Store
interface, defined in shortcut.go
. This interface abstracts the underlying storage mechanism, allowing different implementations to be used (e.g., in-memory for testing, SQL database for production). The key operations defined by the Store
interface are:
Insert
: Adds a new shortcut to the store. It takes an io.Reader
containing the shortcut data (typically JSON) and returns a unique ID for the shortcut.InsertShortcut
: Similar to Insert
, but takes a Shortcut
struct directly.Get
: Retrieves a shortcut given its ID.GetAll
: Returns a channel that streams all stored shortcuts. This is useful for tasks like data migration.DeleteShortcut
: Removes a shortcut from the store.A Shortcut
itself is a simple struct containing a slice of strings, where each string is a trace key.
The generation of shortcut IDs is handled by the IDFromKeys
function. This function takes a Shortcut
struct, sorts its keys alphabetically (to ensure that the order of keys doesn't affect the ID), and then computes an MD5 hash of the concatenated keys. A prefix “X” is added to this hash for historical reasons, maintaining compatibility with older systems. This deterministic ID generation ensures that the same set of keys will always produce the same shortcut ID.
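The ID derivation can be sketched directly from this description: sort the keys, hash their concatenation with MD5, and prepend "X". The join separator used below is an assumption; only the overall scheme mirrors IDFromKeys.

package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// idFromKeys derives a deterministic shortcut ID from a list of trace keys.
func idFromKeys(keys []string) string {
	sorted := append([]string{}, keys...)
	sort.Strings(sorted) // key order must not affect the ID
	sum := md5.Sum([]byte(strings.Join(sorted, "\n"))) // separator is an assumption
	return "X" + fmt.Sprintf("%x", sum)                // "X" prefix kept for historical compatibility
}

func main() {
	a := idFromKeys([]string{",arch=x86,test=a,", ",arch=arm,test=b,"})
	b := idFromKeys([]string{",arch=arm,test=b,", ",arch=x86,test=a,"})
	fmt.Println(a == b, a) // true, same ID regardless of input order
}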
Workflow for creating and retrieving a shortcut:
Creation: Client Code
---(JSON data or Shortcut struct)---> Store.Insert
or Store.InsertShortcut
Store
---(Generates ID using IDFromKeys, marshals to JSON if needed)---> Underlying Storage (e.g., SQL DB)
Underlying Storage
---> Store
---(Returns Shortcut ID)---> Client Code
Retrieval: Client Code
---(Shortcut ID)---> Store.Get
Store
---(Queries by ID)---> Underlying Storage (e.g., SQL DB)
Underlying Storage
---(Returns stored JSON or data)---> Store
Store
---(Unmarshals to Shortcut struct, sorts keys)---> Client Code
(receives Shortcut struct)
The sqlshortcutstore
subdirectory provides a concrete implementation of the Store
interface using an SQL database (specifically designed for CockroachDB, as indicated by test setup and migration references). The sqlshortcutstore.go
file contains the logic for interacting with the database, including SQL statements for inserting, retrieving, and deleting shortcuts. Shortcut data is stored as JSON strings in the database. The schema for the Shortcuts
table is implicitly defined by the SQL statements and further clarified in sqlshortcutstore/schema/schema.go
, which defines a ShortcutSchema
struct mirroring the table structure (though this struct is primarily for documentation or ORM-like purposes and not directly used in the raw SQL interaction in sqlshortcutstore.go
).
Testing is a significant aspect of this module:
shortcut_test.go
contains unit tests for the IDFromKeys
function, ensuring its correctness and deterministic behavior.shortcuttest
provides a suite of common tests (InsertGet
, GetNonExistent
, GetAll
, DeleteShortcut
) that can be run against any implementation of the shortcut.Store
interface. This promotes consistency and ensures that different store implementations behave as expected. The InsertGet
test, for example, verifies that a stored shortcut can be retrieved and that the keys are sorted upon retrieval, even if they were not sorted initially.sqlshortcutstore_test.go
utilizes the tests from shortcuttest
to validate the SQLShortcutStore
implementation against a test database.mocks/Store.go
provides a mock implementation of the Store
interface, generated by the mockery
tool. This is useful for testing components that depend on shortcut.Store
without needing a real storage backend.The go/sql
module serves as the central hub for managing the SQL database schema used by the Perf application. It defines the structure of the database tables and provides utilities for schema generation, validation, and migration. This module ensures that the application's database schema is consistent, well-defined, and can evolve smoothly over time.
Key Responsibilities and Components:
Schema Definition (schema.go
, spanner/schema_spanner.go
):
CREATE TABLE
statements that define the structure of all tables used by Perf. Having the schema defined in code (generated from Go structs) provides a single source of truth and allows for easier version control and programmatic manipulation.schema.go
: Defines the schema for CockroachDB.spanner/schema_spanner.go
: Defines the schema for Spanner. Spanner has slightly different SQL syntax and features (e.g., TTL INTERVAL
), necessitating a separate schema definition.tosql
utility (see below). This ensures that the SQL schema accurately reflects the Go struct definitions in other modules (e.g., perf/go/alerts/sqlalertstore/schema
).CREATE TABLE
statements, these files also export slices of strings representing the column names for each table. This can be useful for constructing SQL queries programmatically.Table Struct Definition (tables.go
):
Tables
which aggregates all the individual table schema structs from various Perf sub-modules (like alerts
, anomalygroup
, git
, etc.).Tables
struct serves as the input to the tosql
schema generator. By referencing schema structs from other modules, it ensures that the generated SQL schema is consistent with how data is represented and manipulated throughout the application. The //go:generate
directives at the top of this file trigger the tosql
utility to regenerate the schema files when necessary.Schema Generation Utility (tosql/main.go
):
schema.go
and spanner/schema_spanner.go
) from the Go struct definitions.sql.Tables
struct (defined in tables.go
) as input and uses the go/sql/exporter
module to translate the Go struct tags and field types into corresponding SQL CREATE TABLE
statements. It supports different SQL dialects (CockroachDB and Spanner) and can handle specific features like Spanner's TTL (Time To Live) for tables. The schemaTarget
flag controls which database dialect is generated.Expected Schema and Migration (expectedschema/
):
Why: As the application evolves, the database schema needs to change. This submodule manages schema migrations, ensuring that the live database can be updated to new versions without downtime or data loss. It also validates that the current database schema matches an expected version.
How:
embed.go
: This file uses go:embed
to embed JSON representations of the current (schema.json
, schema_spanner.json
) and previous (schema_prev.json
, schema_prev_spanner.json
) expected database schemas. These JSON files are generated by the exportschema
utility. Load()
and LoadPrev()
functions provide access to these deserialized schema descriptions.
migrate.go
: This is the core of the schema migration logic.
FromLiveToNext
, FromNextToLive
, and their Spanner equivalents) that describe how to upgrade the database from the “previous” schema version to the “next” (current) schema version, and how to roll back that change. Crucially, schema changes must be backward and forward compatible because during a deployment, old and new versions of the application might run concurrently.ValidateAndMigrateNewSchema
is the key function. It:actual == next
, no migration is needed.actual == prev
and actual != next
, it executes the FromLiveToNext
SQL statements to upgrade the database schema.actual
matches neither prev
nor next
, it indicates an unexpected schema state and returns an error, preventing application startup. This is a critical safety check.ValidateAndMigrateNewSchema
) -> New instances (frontend, ingesters)Deployment Starts | V Maintenance Task Runs | +------------------------------------+ | Calls ValidateAndMigrateNewSchema | +------------------------------------+ | V Is schema == previous_expected_schema? --Yes--> Apply `FromLiveToNext` SQL | No | V V Is schema == current_expected_schema? ---Yes---> Migration Successful / No Action | No V Error: Schema mismatch! Halt. | V New Application Instances Start (if migration was successful)
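The decision logic of ValidateAndMigrateNewSchema reduces to a three-way comparison, sketched below with schemas as plain strings and the migration statements replaced by a callback; this shows only the control flow, not the real schema types or SQL execution.

package main

import (
	"errors"
	"fmt"
)

// validateAndMigrate applies the FromLiveToNext migration only when the live
// schema matches the previous expected schema; any other mismatch is an error.
func validateAndMigrate(actual, prev, next string, applyFromLiveToNext func() error) error {
	switch {
	case actual == next:
		return nil // already up to date, nothing to do
	case actual == prev:
		return applyFromLiveToNext() // upgrade the live schema
	default:
		return errors.New("live schema matches neither the previous nor the current expected schema")
	}
}

func main() {
	apply := func() error {
		fmt.Println("executing FromLiveToNext statements")
		return nil
	}
	fmt.Println(validateAndMigrate("v1", "v1", "v2", apply)) // migrates, then <nil>
	fmt.Println(validateAndMigrate("v2", "v1", "v2", apply)) // <nil>, no action
	fmt.Println(validateAndMigrate("v0", "v1", "v2", apply)) // error: unexpected schema
}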
Test files (migrate_test.go
, migrate_spanner_test.go
): These files contain unit tests to verify the schema migration logic for both CockroachDB and Spanner. They test scenarios where no migration is needed, migration is required, and the schema is in an invalid state.
Schema Export Utility (`exportschema/main.go`):

Why: The `expectedschema` submodule needs JSON representations of the “current” and “previous” database schemas to perform validation and migration. This utility generates these JSON files.

How: It takes the `sql.Tables` struct (for CockroachDB) or `spanner.Schema` (for Spanner) and uses the `go/sql/schema/exportschema` module to serialize the schema description into a JSON format. The output of this utility is typically checked into version control as `schema.json`, `schema_prev.json`, etc., within the `expectedschema` directory.

The typical workflow for a schema change involves:

1. Update the Go schema struct for the affected table (e.g., `alerts.AlertSchema`).
2. Run `go generate ./...` within `perf/go/sql/` to regenerate `schema.go` and `spanner/schema_spanner.go`.
3. Copy the current `expectedschema/schema.json` to `expectedschema/schema_prev.json` (and similarly for Spanner).
4. Run the `exportschema` binary (e.g., `bazel run //perf/go/sql/exportschema -- --out perf/go/sql/expectedschema/schema.json`) to generate the new `expectedschema/schema.json`.
5. Write the `FromLiveToNext` and `FromNextToLive` SQL statements in `expectedschema/migrate.go`.
6. Update `sql_test.go` (`LiveSchema`, `DropTables`) if necessary.

Testing Utilities (`sqltest/sqltest.go`):
`NewCockroachDBForTests`: Sets up a connection to a local CockroachDB instance (managed by `cockroachdb_instance.Require`), creates a new temporary database for the test, applies the current `sql.Schema`, and registers a cleanup function to drop the database after the test.

`NewSpannerDBForTests`: Similarly, sets up a connection to a local Spanner emulator (via PGAdapter, required by `pgadapter.Require`), applies the current `spanner.Schema`, and prepares it for tests.

Schema Tests (`sql_test.go`): This file defines `DropTables` (to clean up) and `LiveSchema` / `LiveSchemaSpanner`. `LiveSchema` represents the schema before the latest change defined in `expectedschema/migrate.go`'s `FromLiveToNext`. The tests then:

1. Apply `DropTables` to ensure a clean slate.
2. Apply `LiveSchema` to simulate the state of the database before the pending migration.
3. Apply `expectedschema.FromLiveToNext` (or its Spanner equivalent).
4. Compare the result with applying `sql.Schema` (or `spanner.Schema`) directly to a fresh database (which represents the target state). They should be identical.

This comprehensive approach to schema management ensures that Perf's database can be reliably deployed, maintained, and evolved. The separation of concerns (schema definition, generation, validation, migration, and testing) makes the system robust and easier to understand.
The stepfit
module is designed to analyze time-series data, specifically performance traces, to detect significant changes or “steps.” It employs various statistical algorithms to determine if a step up (performance improvement), a step down (performance regression), or no significant change has occurred in the data. This module is crucial for automated performance monitoring, allowing for the identification of impactful changes in system behavior.
The core idea is to fit a step function to the input trace data. A step function is a simple function that is constant except for a single jump (the “step”) at a particular point (the “turning point”). The module calculates the best fit for such a function and then evaluates the characteristics of this fit to determine the nature and significance of the step.
Key Components and Logic:
The primary entity in this module is the StepFit
struct. It encapsulates the results of the step detection analysis:
- `LeastSquares`: This field stores the Least Squares Error (LSE) of the fitted step function. A lower LSE generally indicates a better fit of the step function to the data. It's important to note that not all step detection algorithms calculate or use LSE; in such cases, this field is set to `InvalidLeastSquaresError`.
- `TurningPoint`: This integer indicates the index in the input trace where the step function changes its value. It essentially marks the location of the detected step.
- `StepSize`: This float represents the magnitude of the change in the step function. A negative `StepSize` implies a step up in the trace values (conventionally a performance regression, e.g., increased latency). Conversely, a positive `StepSize` indicates a step down (conventionally a performance improvement, e.g., decreased latency).
- `Regression`: This value is a metric used to quantify the significance or “interestingness” of the detected step. Its calculation varies depending on the chosen `stepDetection` algorithm. For the `OriginalStep` algorithm, it's calculated as `StepSize / LSE` (or `StepSize / stddevThreshold` if LSE is too small). A larger absolute value of `Regression` implies a more significant step. For `AbsoluteStep`, `PercentStep`, and `CohenStep`, `Regression` is directly related to the `StepSize` (or a normalized version of it). For `MannWhitneyU`, `Regression` represents the p-value of the test.
- `Status`: This is an enumerated type (`StepFitStatus`) indicating the overall assessment of the step:
  - `LOW`: A step down was detected, often interpreted as a performance improvement.
  - `HIGH`: A step up was detected, often interpreted as a performance regression.
  - `UNINTERESTING`: No significant step was found.

The main function responsible for performing the analysis is `GetStepFitAtMid`. It takes the following inputs:
- `trace`: A slice of `float32` representing the time-series data to be analyzed.
- `stddevThreshold`: A threshold for standard deviation. This is used in the `OriginalStep` algorithm for normalizing the trace and as a floor for standard deviation in other algorithms like `CohenStep` to prevent division by zero or near-zero values.
- `interesting`: A threshold value used to determine if a calculated `Regression` value is significant enough to be classified as `HIGH` or `LOW`. The exact interpretation of this threshold depends on the `stepDetection` algorithm.
- `stepDetection`: An enumerated type (`types.StepDetection`) specifying which algorithm to use for step detection.

Workflow of `GetStepFitAtMid`:
Initialization and Preprocessing:
- A `StepFit` struct is initialized with `Status` set to `UNINTERESTING`.
- If the trace is shorter than `minTraceSize` (currently 3), the function returns the initialized `StepFit`, as there isn't enough data to analyze.
- If `stepDetection` is `types.OriginalStep`, the input `trace` is duplicated and normalized (mean centered and scaled by its standard deviation, unless the standard deviation is below `stddevThreshold`).
- For the other `stepDetection` types, if the trace has an odd length, the last element is dropped to make the trace length even. This is because these algorithms typically compare the first half of the trace with the second half.

Step Detection Algorithm Execution: The function then proceeds based on the selected `stepDetection` algorithm. The core logic involves splitting the (potentially modified) trace roughly in half at the `TurningPoint` (which is `len(trace) / 2`) and comparing statistics of the two halves; a small sketch of this halves comparison follows the list below.
- **`types.OriginalStep`:**
  - Calculates the mean of the first half (`y0`) and the second half (`y1`) of the (normalized) trace.
  - Computes the Sum of Squared Errors (SSE) for fitting `y0` to the first half and `y1` to the second half. The `LeastSquares` error (`lse`) is derived from this SSE.
  - `StepSize` is `y0 - y1`.
  - `Regression` is calculated as `StepSize / lse` (or `StepSize / stddevThreshold` if `lse` is too small). Note: The original implementation has a slight deviation from the standard definition of standard error in this calculation.
- **`types.AbsoluteStep`:**
  - `StepSize` is `y0 - y1`.
  - `Regression` is simply the `StepSize`.
  - The step is considered interesting if the absolute value of `StepSize` meets the `interesting` threshold.
- **`types.Const`:**
  - This algorithm behaves differently. It focuses on the absolute value of the trace point at the `TurningPoint` (`trace[i]`).
  - `StepSize` is `abs(trace[i]) - interesting`.
  - `Regression` is `-1 * abs(trace[i])`. This is done so that larger deviations (regressions) result in more negative `Regression` values, which are then flagged as `HIGH`.
- **`types.PercentStep`:**
  - `StepSize` is `(y0 - y1) / y0`, representing the percentage change relative to the mean of the first half.
  - Handles potential `Inf` or `NaN` results from the division (e.g., if `y0` is zero).
  - `Regression` is the `StepSize`.
- **`types.CohenStep`:**
  - Calculates Cohen's d, a measure of effect size.
  - `StepSize` is `(y0 - y1) / s_pooled`, where `s_pooled` is the pooled standard deviation of the two halves (or `stddevThreshold` if `s_pooled` is too small or NaN).
  - `Regression` is the `StepSize`.
- **`types.MannWhitneyU`:**
  - Performs a Mann-Whitney U test (a non-parametric test) to determine if the two halves of the trace come from different distributions.
  - `StepSize` is `y0 - y1`.
  - `Regression` is the p-value of the test.
  - `LeastSquares` is set to the U-statistic from the test.
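To make the halves comparison concrete, here is a small, self-contained sketch in the spirit of the `AbsoluteStep` description above. It is an illustration only, not the module's code; the real `GetStepFitAtMid` also handles normalization, the other algorithms, and the minimum-length check:

```go
package main

import "fmt"

// absoluteStep compares the means of the two halves of a trace, mirroring the
// AbsoluteStep description: StepSize = y0 - y1 and Regression = StepSize.
func absoluteStep(trace []float32, interesting float32) (stepSize float32, isInteresting bool) {
	// Drop the last point if the length is odd so the halves are equal.
	if len(trace)%2 == 1 {
		trace = trace[:len(trace)-1]
	}
	mid := len(trace) / 2 // the turning point

	mean := func(xs []float32) float32 {
		var sum float32
		for _, x := range xs {
			sum += x
		}
		return sum / float32(len(xs))
	}
	y0 := mean(trace[:mid]) // mean of the first half
	y1 := mean(trace[mid:]) // mean of the second half

	stepSize = y0 - y1
	// The step is interesting if |StepSize| meets the threshold.
	if stepSize >= interesting || stepSize <= -interesting {
		isInteresting = true
	}
	return stepSize, isInteresting
}

func main() {
	trace := []float32{10, 10, 10, 10, 14, 14, 14, 14}
	step, ok := absoluteStep(trace, 2.0)
	fmt.Printf("StepSize=%v interesting=%v\n", step, ok) // StepSize=-4 interesting=true
}
```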
Status Determination:
- For `types.MannWhitneyU`: If `Regression` (the p-value) is less than or equal to the `interesting` threshold (e.g., 0.05), a significant difference is detected. `Status` (`HIGH` or `LOW`) is then determined by the sign of `StepSize`: if `StepSize` is negative (step up), `Status` is `HIGH`; otherwise, it's `LOW`. The `Regression` value is then negated if the status is `HIGH`, to align with the convention that more negative values are “worse.”
- For the other algorithms: If `Regression` is greater than or equal to `interesting`, `Status` is `LOW`. If `Regression` is less than or equal to `-interesting`, `Status` is `HIGH`. Otherwise, `Status` remains `UNINTERESTING`.

Return Result: The populated `StepFit` struct, containing `LeastSquares`, `TurningPoint`, `StepSize`, `Regression`, and `Status`, is returned.
Design Rationale:
- Offering a choice of algorithms (`OriginalStep`, `AbsoluteStep`, `PercentStep`, `CohenStep`, `MannWhitneyU`) provides flexibility. Different datasets and performance characteristics may be better suited to different statistical approaches. For instance, `MannWhitneyU` is non-parametric and makes fewer assumptions about the data distribution, which can be beneficial for noisy or non-Gaussian data. `AbsoluteStep` and `PercentStep` offer simpler, more direct ways to define a regression based on absolute or relative changes.
- The single `GetStepFitAtMid` function consolidates the logic for all supported algorithms, making it easier to manage and extend.
- `StepFit` structure: The `StepFit` struct provides a well-defined way to communicate the results of the analysis, separating the raw metrics (like `StepSize`, `LeastSquares`) from the final interpretation (`Status`).
- `interesting` threshold: The `interesting` parameter allows users to customize the sensitivity of the step detection. This is crucial because what constitutes a “significant” change can vary greatly depending on the context of the performance metric being monitored.
- `stddevThreshold`: This parameter helps in handling cases with very low variance, preventing numerical instability (like division by zero) and ensuring that normalization in `OriginalStep` behaves reasonably.
- The `GetStepFitAtMid` name implies that the step detection is focused around the middle of the trace. This is a common approach for detecting a single, prominent step. More complex scenarios with multiple steps would require different techniques.

Why specific implementation choices?
- Normalization in `OriginalStep`: Normalizing the trace in the `OriginalStep` algorithm (as described in the linked blog post) aims to make the detection less sensitive to the absolute scale of the data and more focused on the relative change.
- Even trace length outside `OriginalStep`: For algorithms other than `OriginalStep`, ensuring an even trace length by potentially dropping the last point simplifies the division of the trace into two equal halves for comparison.
- Handling `Inf` and `NaN` in `PercentStep`: Explicitly checking for and handling `Inf` and `NaN` values that can arise from division by zero (when `y0` is zero) makes the `PercentStep` calculation more robust.
- `Regression` as p-value for `MannWhitneyU`: Using the p-value as the `Regression` metric for `MannWhitneyU` directly reflects the statistical significance of the observed difference between the two halves of the trace. The `interesting` threshold then acts as the significance level (alpha).
- `InvalidLeastSquaresError`: This constant provides a clear way to indicate when LSE is not applicable or not calculated by a particular algorithm, avoiding confusion with a calculated LSE of 0 or a negative value.

In essence, the `stepfit` module provides a toolkit for identifying abrupt changes in performance data, offering different lenses (algorithms) through which to view and quantify these changes. The design prioritizes flexibility in algorithm choice and user-configurable sensitivity to cater to diverse performance analysis needs.
The subscription
module manages alerting configurations, known as subscriptions, for anomalies detected in performance data. It provides the means to define, store, and retrieve these configurations.
The core concept is that a “subscription” dictates how the system should react when an anomaly is found. This includes details like which bug tracker component to file an issue under, what labels to apply, who to CC on the bug, and the priority/severity of the issue. This allows for automated and consistent handling of performance regressions.
Subscriptions are versioned using an infra_internal
Git hash (revision). This allows for tracking changes to subscription configurations over time and ensures that the correct configuration is used based on the state of the infrastructure code.
Key Components and Files:
store.go
: Defines the Store
interface. This interface is the central abstraction for interacting with subscription data. It dictates the operations that any concrete subscription storage implementation must provide. This design choice allows for flexibility in the underlying storage mechanism (e.g., SQL database, in-memory store for testing).
- `GetSubscription`: Retrieves a specific version of a subscription.
- `GetActiveSubscription`: Retrieves the currently active version of a subscription by its name. This is likely the most common retrieval method for active alerting.
- `InsertSubscriptions`: Allows for batch insertion of new subscriptions. This is typically done within a database transaction to ensure atomicity – either all subscriptions are inserted, or none are. This is crucial when updating configurations, as it prevents a partially updated state. The implementation in `sqlsubscriptionstore` deactivates all existing subscriptions before inserting the new ones as active, effectively replacing the entire active set.
- `GetAllSubscriptions`: Retrieves all historical versions of all subscriptions.
- `GetAllActiveSubscriptions`: Retrieves all currently active subscriptions. This is useful for systems that need to know all current alerting rules.

`proto/v1/subscription.proto`
: Defines the structure of a Subscription
using Protocol Buffers. This is the canonical data model for subscriptions.
Fields: `name`, `revision`, `bug_labels`, `hotlists`, `bug_component`, `bug_priority`, `bug_severity`, `bug_cc_emails`, `contact_email`. Each field directly maps to a configuration aspect for bug filing and contact information.

`sqlsubscriptionstore/sqlsubscriptionstore.go`
: Provides a concrete implementation of the Store
interface using an SQL database (specifically designed for CockroachDB, as indicated by the use of pgx
).
Store
interface. When inserting subscriptions, it first deactivates all existing subscriptions and then inserts the new ones as active. This ensures that only the latest set of configurations is considered active.is_active
boolean column in the database schema (sqlsubscriptionstore/schema/schema.go
) is key to this “active version” concept.sqlsubscriptionstore/schema/schema.go
: Defines the SQL table schema for storing subscriptions.
name
and revision
. This allows multiple versions of the same named subscription to exist, identified by their revision. The is_active
field differentiates the current version from historical ones.mocks/Store.go
: Contains a mock implementation of the Store
interface, generated by the mockery
tool.
Store
interface without requiring an actual database connection. This makes tests faster, more reliable, and isolates the unit under test.Key Workflows:
Updating Subscriptions: This typically happens when configurations in infra_internal
are changed.
External Process (e.g., config syncer) | v Reads new subscription definitions (likely from files) | v Parses definitions into []*pb.Subscription | v Calls store.InsertSubscriptions(ctx, newSubscriptions, tx) | |--> [SQL Transaction Start] | | | v | sqlsubscriptionstore: Deactivate all existing subscriptions (UPDATE Subscriptions SET is_active=false WHERE is_active=true) | | | v | sqlsubscriptionstore: Insert each new subscription with is_active=true (INSERT INTO Subscriptions ...) | | | v |--> [SQL Transaction Commit/Rollback]
This ensures that the update is atomic. If any part fails, the transaction is rolled back, leaving the previous set of active subscriptions intact.
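A minimal sketch of that transactional replacement, using a stand-in transaction type and an abbreviated column list (the real store uses pgx and writes the full set of subscription fields):

```go
package main

import (
	"context"
	"fmt"
)

// execer is a minimal stand-in for a SQL transaction (e.g. a pgx transaction).
type execer interface {
	Exec(ctx context.Context, sql string, args ...interface{}) error
}

type subscription struct {
	Name     string
	Revision string
}

// insertSubscriptions deactivates the current active set and inserts the new
// subscriptions as active. Run inside a single transaction so a failure rolls
// everything back and the previous active set stays intact.
func insertSubscriptions(ctx context.Context, tx execer, subs []subscription) error {
	if err := tx.Exec(ctx, `UPDATE Subscriptions SET is_active=false WHERE is_active=true`); err != nil {
		return err
	}
	for _, s := range subs {
		// Column list abbreviated; the real table also stores bug component,
		// labels, priority, severity, CC emails, etc.
		if err := tx.Exec(ctx,
			`INSERT INTO Subscriptions (name, revision, is_active) VALUES ($1, $2, true)`,
			s.Name, s.Revision); err != nil {
			return err
		}
	}
	return nil
}

// fakeTx prints the statements so the sketch runs without a database.
type fakeTx struct{}

func (fakeTx) Exec(ctx context.Context, sql string, args ...interface{}) error {
	fmt.Println(sql, args)
	return nil
}

func main() {
	_ = insertSubscriptions(context.Background(), fakeTx{},
		[]subscription{{Name: "chromium-perf", Revision: "abc123"}})
}
```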
Anomaly Detection Triggering Alerting: Anomaly Detector | v Identifies an anomaly and the relevant subscription name (e.g., based on metric patterns) | v Calls store.GetActiveSubscription(ctx, subscriptionName) | v sqlsubscriptionstore: Retrieves the active subscription (SELECT ... FROM Subscriptions WHERE name=$1 AND is_active=true) | v Anomaly Detector uses the pb.Subscription details (bug component, labels, etc.) to file a bug.
This module provides a robust and versioned way to manage alerting rules, ensuring that performance regressions are handled consistently and routed appropriately. The separation of interface and implementation, along with the use of Protocol Buffers, contributes to a maintainable and extensible system.
The tracecache
module provides a mechanism for caching trace identifiers (trace IDs) associated with specific tiles and queries. This caching layer significantly improves performance by reducing the need to repeatedly compute or fetch trace IDs, which can be a computationally expensive operation.
Core Functionality & Design Rationale:
The primary purpose of tracecache
is to store and retrieve lists of trace IDs. Trace IDs are represented as paramtools.Params
, which are essentially key-value pairs that uniquely identify a specific trace within the performance monitoring system.
The caching strategy is built around the concept of a “tile” and a “query.”
The tile number identifies the range of commits the cached trace IDs belong to, while a query, represented by `query.Query`, defines the specific parameters used to filter traces. Different queries will yield different sets of trace IDs. By combining the tile number and a string representation of the query, a unique cache key is generated. This ensures that cached data is specific to the exact combination of commit range and filter criteria.
The module relies on an external caching implementation provided via the go/cache.Cache
interface. This design choice promotes flexibility, allowing different caching backends (e.g., in-memory, Redis, Memcached) to be used without modifying the tracecache
logic itself. This separation of concerns is crucial for adapting to various deployment environments and performance requirements.
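A rough sketch of the caching idea, with stand-in types (the real module uses `paramtools.Params`, `query.Query`, and the injected `go/cache.Cache` client):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// params mirrors the idea of paramtools.Params: key/value pairs identifying a trace.
type params map[string]string

// cacheKey combines the tile number with a canonical string form of the query,
// so cached entries are specific to one (tile, query) combination.
func cacheKey(tileNumber int, queryKeyValueString string) string {
	return fmt.Sprintf("%d_%s", tileNumber, queryKeyValueString)
}

func main() {
	traceIDs := []params{
		{"arch": "x86", "config": "8888", "test": "draw_a_circle"},
		{"arch": "arm", "config": "8888", "test": "draw_a_circle"},
	}

	// The query string form here is illustrative.
	key := cacheKey(42, "config=8888&test=draw_a_circle")

	// Serialize to JSON before handing the value to the cache backend,
	// since most backends store strings or byte slices.
	b, _ := json.Marshal(traceIDs)
	fmt.Println("SetValue", key, string(b))

	// On a cache hit the JSON is unmarshalled back into the slice of params.
	var got []params
	_ = json.Unmarshal(b, &got)
	fmt.Println("GetValue ->", got)
}
```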
Key Components:
traceCache.go
: This is the sole file in the module and contains the implementation of the TraceCache
struct and its associated methods.TraceCache
struct:cache.Cache
. This is the underlying cache client used for storing and retrieving data.New(cache cache.Cache) *TraceCache
:TraceCache
. It takes a cache.Cache
instance as an argument, which will be used for all caching operations. This dependency injection allows the caller to provide any cache implementation that conforms to the cache.Cache
interface.CacheTraceIds(ctx context.Context, tileNumber types.TileNumber, q *query.Query, traceIds []paramtools.Params) error
:cacheKey
using the tileNumber
and the query.Query
.traceIds
(a slice of paramtools.Params
) are then serialized into a JSON string using the toJSON
helper function. This serialization is necessary because most cache backends store data as strings or byte arrays. JSON is chosen for its human-readability and widespread support.cacheClient.SetValue
method to store the JSON string under the generated cacheKey
.GetTraceIds(ctx context.Context, tileNumber types.TileNumber, q *query.Query) ([]paramtools.Params, error)
:cacheKey
in the same way as CacheTraceIds
.cacheClient.GetValue
.cacheJson
is empty), it returns nil
for both the trace IDs and the error, indicating a cache miss.paramtools.Params
using json.Unmarshal
.traceIdCacheKey(tileNumber types.TileNumber, q query.Query) string
:tileNumber
(an integer) and a string representation of the query.Query
(obtained via q.KeyValueString()
) separated by an underscore. This format ensures uniqueness and provides some human-readable context within the cache keys.toJSON(obj interface{}) (string, error)
:[]paramtools.Params
before caching.Workflow for Caching Trace IDs:
tileNumber
, query.Query
, []paramtools.Params
(trace IDs to cache)CacheTraceIds
is called.traceIdCacheKey(tileNumber, query)
generates a unique key. tileNumber + "_" + query.KeyValueString() ---> cacheKey
toJSON(traceIds)
serializes the list of trace IDs into a JSON string. []paramtools.Params --json.Marshal--> jsonString
t.cacheClient.SetValue(ctx, cacheKey, jsonString)
stores the JSON string in the underlying cache.Workflow for Retrieving Trace IDs:
tileNumber
, query.Query
GetTraceIds
is called.traceIdCacheKey(tileNumber, query)
generates the cache key (same logic as above). tileNumber + "_" + query.KeyValueString() ---> cacheKey
t.cacheClient.GetValue(ctx, cacheKey)
attempts to retrieve the value from the cache. cacheClient --GetValue(cacheKey)--> jsonString (or empty if not found)
jsonString
is empty (cache miss): Return nil
, nil
.jsonString
is not empty (cache hit): json.Unmarshal([]byte(jsonString), &traceIds)
deserializes the JSON string back into []paramtools.Params
. jsonString --json.Unmarshal--> []paramtools.Params
[]paramtools.Params
and nil
error.The tracefilter
module provides a mechanism to organize and filter trace data based on their hierarchical paths. The core idea is to represent traces within a tree structure, where each node in the tree corresponds to a segment of the trace's path. This allows for efficient filtering of traces, specifically to identify “leaf” traces – those that do not have any further sub-paths.
This approach is particularly useful in scenarios where traces have a parent-child relationship implied by their path structure. For instance, in performance analysis, a trace like /root/p1/p2/p3/t1
might represent a specific test (t1
) under a series of nested configurations (p1
, p2
, p3
). If there's another trace /root/p1/p2
, it could be considered a “parent” or an aggregate trace. The tracefilter
helps in identifying only the most specific, or “leaf,” traces, effectively filtering out these higher-level parent traces.
The primary component is the TraceFilter
struct.
TraceFilter
struct:
traceKey
: A string identifier associated with the trace path ending at this node. For the root of the tree, this is initialized to “HEAD”.value
: The string value of the current path segment this node represents.children
: A map where keys are the next path segments and values are pointers to child TraceFilter
nodes. This map forms the branches of the tree.children
allows for efficient lookup and addition of child nodes based on the next path segment.traceKey
at each node allows associating an identifier with a complete path as it's being built.NewTraceFilter()
function:
TraceFilter
tree.TraceFilter
node. The traceKey
is set to “HEAD” as a sentinel value for the root, and its children
map is initialized as empty, ready to have paths added to it.AddPath(path []string, traceKey string)
method:
Purpose: Adds a new trace, defined by its path
(a slice of strings representing path segments) and its unique traceKey
, to the filter tree.
How it works:
path
.path
already exists as a child of the current node, it moves to that existing child.TraceFilter
node is created for that segment, its value
is set to the segment string, its traceKey
is set to the input traceKey
, and it's added to the children
map of the current node.path
.Why this design?
traceKey
with each newly created node ensures that even intermediate nodes (which might later become leaves if no further sub-paths are added) have an associated key.Example: Adding path ["root", "p1", "p2"] with key "keyA" Initial Tree: (HEAD) After AddPath(["root", "p1", "p2"], "keyA"): (HEAD) | +-- ("root", key="keyA") | +-- ("p1", key="keyA") | +-- ("p2", key="keyA") <- Leaf node initially
If we then add ["root", "p1", "p2", "t1"]
with key "keyB"
:
(HEAD) | +-- ("root", key="keyB") // traceKey updated if path is prefix | +-- ("p1", key="keyB") | +-- ("p2", key="keyB") | +-- ("t1", key="keyB") <- New leaf node
Note: The traceKey
of an existing node is updated by AddPath
if the new path being added shares that node as a prefix. This ensures that the traceKey
stored at a node corresponds to the longest path ending at that node if it's also a prefix of other paths. However, the primary use of GetLeafNodeTraceKeys
relies on the traceKey
of nodes that become leaves.
GetLeafNodeTraceKeys()
method:
Purpose: Retrieves the traceKey
s of all traces that are considered “leaf” nodes in the tree. A leaf node is a node that has no children.
How it works:
len(tf.children) == 0
), its traceKey
is considered a leaf key and is added to the result list.Why this design?
Workflow for GetLeafNodeTraceKeys: Start at (CurrentNode) | V Is CurrentNode a leaf (no children)? | +-- YES --> Add CurrentNode.traceKey to results | +-- NO --> For each ChildNode in CurrentNode.children: | V Recursively call GetLeafNodeTraceKeys on ChildNode | V Append results from ChildNode to overall results | V Return aggregated results
Consider the following traces and their paths:
traceA
: path ["config", "test_group", "test1"]
, key "keyA"
traceB
: path ["config", "test_group"]
, key "keyB"
traceC
: path ["config", "test_group", "test2"]
, key "keyC"
traceD
: path ["config", "other_group", "test3"]
, key "keyD"
Tree Construction (AddPath
calls):
tf.AddPath(["config", "test_group", "test1"], "keyA")
tf.AddPath(["config", "test_group"], "keyB")
"test_group"
initially created by keyA
will have its traceKey
updated to "keyB"
.tf.AddPath(["config", "test_group", "test2"], "keyC")
tf.AddPath(["config", "other_group", "test3"], "keyD")
The tree would look something like this (simplified, showing relevant traceKeys for leaf potential):
(HEAD) | +-- ("config") | +-- ("test_group", traceKey likely updated by "keyB" during AddPath) | | | +-- ("test1", traceKey="keyA") <-- Leaf | | | +-- ("test2", traceKey="keyC") <-- Leaf | +-- ("other_group") | +-- ("test3", traceKey="keyD") <-- Leaf
Filtering (GetLeafNodeTraceKeys()
call):
GetLeafNodeTraceKeys()
is called on the root:"config"
."test_group"
. This node has children ("test1"
and "test2"
), so its key ("keyB"
) is not added."test1"
. This is a leaf. "keyA"
is added."test2"
. This is a leaf. "keyC"
is added."other_group"
."test3"
. This is a leaf. "keyD"
is added.The result would be ["keyA", "keyC", "keyD"]
. Notice that "keyB"
is excluded because the path ["config", "test_group"]
has sub-paths (.../test1
and .../test2
), making it a non-leaf node in the context of trace specificity.
This module provides a clean and efficient way to identify the most granular traces in a dataset where hierarchy is defined by path structure.
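A compact, self-contained sketch of this tree (an illustrative re-implementation, not the module's code):

```go
package main

import "fmt"

// node mirrors the TraceFilter idea: each node holds the key of the most
// recently added path ending at or passing through it, plus child segments.
type node struct {
	traceKey string
	children map[string]*node
}

func newNode(key string) *node { return &node{traceKey: key, children: map[string]*node{}} }

// addPath walks or creates one node per path segment, recording traceKey along the way.
func (n *node) addPath(path []string, traceKey string) {
	cur := n
	for _, seg := range path {
		child, ok := cur.children[seg]
		if !ok {
			child = newNode(traceKey)
			cur.children[seg] = child
		} else {
			child.traceKey = traceKey // existing prefix node picks up the newer key
		}
		cur = child
	}
}

// leafKeys collects the traceKeys of nodes with no children, i.e. the most specific traces.
func (n *node) leafKeys() []string {
	if len(n.children) == 0 {
		return []string{n.traceKey}
	}
	var out []string
	for _, c := range n.children {
		out = append(out, c.leafKeys()...)
	}
	return out
}

func main() {
	root := newNode("HEAD")
	root.addPath([]string{"config", "test_group", "test1"}, "keyA")
	root.addPath([]string{"config", "test_group"}, "keyB")
	root.addPath([]string{"config", "test_group", "test2"}, "keyC")
	root.addPath([]string{"config", "other_group", "test3"}, "keyD")
	fmt.Println(root.leafKeys()) // keyA, keyC and keyD in some order; keyB is filtered out.
}
```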
The tracesetbuilder
module is designed to efficiently construct a types.TraceSet
and its corresponding paramtools.ReadOnlyParamSet
from multiple, potentially disparate, sets of trace data. This is particularly useful when dealing with performance data that might arrive in chunks (e.g., from different “Tiles” of data) and needs to be aggregated into a coherent view across a series of commits.
The core challenge this module addresses is the concurrent and distributed nature of processing trace data. If multiple traces with the same identifier (key) were processed by different workers simultaneously without coordination, it could lead to race conditions and incorrect data. Similarly, simply locking the entire TraceSet
for each update would create a bottleneck.
The tracesetbuilder
solves this by employing a worker pool (mergeWorkers
). The key design decision here is to distribute the work based on the trace key. Each trace key is hashed (using crc32.ChecksumIEEE
), and this hash determines which mergeWorker
is responsible for that specific trace. This ensures that all data points for a single trace are always processed by the same worker, thereby avoiding the need for explicit locking at the individual trace level within the worker. Each mergeWorker
maintains its own types.TraceSet
and paramtools.ParamSet
.
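The key-to-worker routing can be pictured in a few lines; the pool size here is illustrative:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const numWorkers = 8 // illustrative pool size

// workerIndex picks the mergeWorker responsible for a trace key. Because the
// hash depends only on the key, every data point for the same trace always
// lands on the same worker, so no per-trace locking is needed.
func workerIndex(traceKey string) uint32 {
	return crc32.ChecksumIEEE([]byte(traceKey)) % numWorkers
}

func main() {
	key := ",arch=x86,config=8888,test=draw_a_circle,units=ms,"
	fmt.Println("worker:", workerIndex(key))
	fmt.Println("same key, same worker:", workerIndex(key) == workerIndex(key))
}
```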
Key Components and Workflow:
TraceSetBuilder
:
- **Responsibilities:** - Manages a pool of `mergeWorker` instances. - Provides the `Add` method to ingest new trace data. - Provides the `Build` method to consolidate results from all workers and return the final `TraceSet` and `ReadOnlyParamSet`. - Provides the `Close` method to shut down the worker pool. - **`New(size int)`:** Initializes the `TraceSetBuilder`. The `size` parameter is crucial as it defines the expected length of each trace in the final, consolidated `TraceSet`. This allows the builder to pre-allocate trace slices of the correct length, filling in missing data points as necessary. It creates `numWorkers` instances of `mergeWorker`. - **`Add(commitNumberToOutputIndex map[types.CommitNumber]int32, commits
[]provider.Commit, traces types.TraceSet)`:** This is the entry point for feeding data into the builder.
traces
: A types.TraceSet
representing a chunk of data (e.g., from a single tile). -commits
: A slice of provider.Commit
objects corresponding to the data points in thetraces
.commitNumberToOutputIndex
: A map that dictates where each data point from the input traces
(identified by its types.CommitNumber
) should be placed in the final output trace. This mapping is essential for correctly aligning data points that might come from different sources or represent different commit ranges.traces
:paramtools.Params
.request
struct containing the key, params, the trace data itself, thecommitNumberToOutputIndex
map, and thecommits
slice.numWorkers
.request
to thech
channel of the selected mergeWorker
.sync.WaitGroup
is incremented for each trace added, ensuring Build
waits for all processing to complete.Build(ctx context.Context)
:Add
operations to be processed by the workers (using t.wg.Wait()
).mergeWorkers
.traceSet
andparamSet
from eachmergeWorker
into a single, finaltypes.TraceSet
andparamtools.ParamSet
.paramSet
to create a paramtools.ReadOnlyParamSet
.TraceSet
andReadOnlyParamSet
.Close()
: Iterates through the mergeWorkers
and closes their respective input channels (ch
). This signals the worker goroutines to terminate once they have processed all pending requests.mergeWorker
:
request
objects sent to its channel.types.TraceSet
and paramtools.ParamSet
.TraceSet
with new data points, placing them correctly according to request.commitNumberToOutputIndex
.ParamSet
.newMergeWorker(wg *sync.WaitGroup, size int)
: Creates a mergeWorker
and starts its goroutine.types.TraceSet
and paramtools.ParamSet
.request
objects from its ch
channel.request
:m.traceSet
for the given req.key
. If creating, it uses types.NewTrace(size)
to ensure the trace has the correct final length.req.commits
and uses req.commitNumberToOutputIndex
to determine the correct destination index in its local trace for each data point in req.trace
.req.params
to its m.paramSet
.sync.WaitGroup
(m.wg.Done()
) to signal completion of this piece of work.Process(req *request)
: Sends a request to the worker's channel.Close()
: Closes the worker's input channel.request
struct:
mergeWorker
. It encapsulates the trace key, its parsed parameters, the actual trace data segment, the mapping of commit numbers to output indices, and the corresponding commit metadata.Workflow Diagram:
TraceSetBuilder.New(outputTraceLength) | V +-----------------------------------------------------------------------+ | TraceSetBuilder (manages WaitGroup and pool of mergeWorkers) | +-----------------------------------------------------------------------+ | ^ | Add(commitMap1, commits1, traces1) | Build() waits for WaitGroup | Add(commitMap2, commits2, traces2) | V | +-----------------------------------------------------------------------+ | For each trace in input: | | 1. Parse key -> params | | 2. Create 'request' struct | | 3. Hash key -> workerIndex | | 4. Send 'request' to mergeWorkers[workerIndex].ch | | 5. Increment WaitGroup | +-----------------------------------------------------------------------+ | | | ... (numWorkers times) V V V +--------+ +--------+ +--------+ | mergeW_0 | | mergeW_1 | | mergeW_N | (Each runs in its own goroutine) | .ch | | .ch | | .ch | | .traceSet| | .traceSet| | .traceSet| | .paramSet| | .paramSet| | .paramSet| +--------+ +--------+ +--------+ ^ ^ ^ | Process request: | | - Get/Create local trace for req.key (length: outputTraceLength) | | - For each point in req.trace: | | - Use req.commitNumberToOutputIndex[commitNum] to find dstIdx | | - localTrace[dstIdx] = req.trace[srcIdx] | | - Add req.params to local paramSet | | - Decrement WaitGroup | | | | --------------------- (When TraceSetBuilder.Build() is called) | V +-----------------------------------------------------------------------+ | TraceSetBuilder.Build(): | | 1. Wait for all 'Add' operations (WaitGroup.Wait()) | | 2. Create finalTraceSet, finalParamSet | | 3. For each mergeWorker: | | - Merge worker.traceSet into finalTraceSet | | - Merge worker.paramSet into finalParamSet | | 4. Normalize and Freeze finalParamSet | | 5. Return finalTraceSet, finalParamSet (ReadOnly) | +-----------------------------------------------------------------------+ | V +-----------------------------------------------------------------------+ | TraceSetBuilder.Close(): | | - Close channels of all mergeWorkers (signals them to terminate) | +-----------------------------------------------------------------------+
The use of numWorkers
and channelBufferSize
are constants that can be tuned for performance based on the expected workload and system resources. The CRC32 hash provides a reasonably good distribution of keys across workers, minimizing the chance of one worker becoming a bottleneck. The sync.WaitGroup
is essential for ensuring that the Build
method doesn't prematurely try to aggregate results before all input data has been processed by the workers.
The design allows for efficient, concurrent processing of large volumes of trace data by partitioning the work based on trace identity and then merging the results, making it suitable for building comprehensive views of performance metrics over time.
The tracestore
module defines interfaces and implementations for storing and retrieving performance trace data. It's a core component of the Perf system, enabling the analysis of performance metrics over time and across different configurations.
The primary goal of tracestore
is to provide an efficient and scalable way to manage large volumes of trace data. This involves:
tracestore
uses an inverted index. This index maps key-value pairs to the trace IDs that contain them within each tile.go/cache/memcached
) for broader caching strategies.tracecache
for caching the results of QueryTracesIDOnly
to speed up repeated queries.TraceStore
, MetadataStore
, TraceParamStore
) to allow for different backend implementations. This promotes flexibility and testability. The primary implementation provided is sqltracestore
, which uses an SQL database.TraceStore
handles the core logic of reading and writing trace values and their associated parameters.MetadataStore
manages metadata associated with source files (e.g., links to dashboards or logs).TraceParamStore
specifically handles the mapping between trace IDs (MD5 hashes of trace names) and their full parameter sets. This separation helps in optimizing storage and retrieval for these distinct types of data.The tracestore
module is primarily defined by a set of interfaces and their SQL-based implementations.
tracestore.go
This file defines the main TraceStore
interface. It outlines the contract for any system that wants to store and retrieve performance traces. Key responsibilities include:
WriteTraces
, WriteTraces2
): Ingesting new performance data points. Each data point is associated with a specific commit, a set of parameters (defining the trace, e.g., config=8888,arch=x86
), a value, the source file it came from, and a timestamp.WriteTraces
method is designed to handle potentially large batches of data efficiently. Implementations often involve chunking data and performing parallel writes to the underlying storage.WriteTraces2
is a newer variant, potentially for different storage schemas or optimizations (e.g., denormalizing common params directly into the trace values table as seen in TraceValues2Schema
).ReadTraces
, ReadTracesForCommitRange
): Retrieving trace data for specific keys (trace names) within a given tile or commit range.QueryTraces
, QueryTracesIDOnly
):QueryTraces
allows searching for traces based on a query.Query
object (which specifies parameter key-value pairs). It returns the actual trace values and associated commit information.QueryTracesIDOnly
is an optimization that returns only the paramtools.Params
(effectively the identifying parameters) of traces matching a query. This is useful when only the list of matching traces is needed, not their values.GetLatestTile
, TileNumber
, TileSize
, CommitNumberOfTileStart
): Provides methods for interacting with the tiled storage system.GetParamSet
): Retrieving the paramtools.ReadOnlyParamSet
for a specific tile. A ParamSet represents all unique key-value pairs present in the traces within that tile, which is crucial for UI elements like query builders.GetSource
, GetLastNSources
, GetTraceIDsBySource
): Retrieving information about the origin of trace data, such as the ingested file name.metadatastore.go
This file defines the MetadataStore
interface. Its responsibility is to manage metadata associated with source files.
InsertMetadata
: Stores links or other metadata for a given source file name.GetMetadata
: Retrieves the stored metadata for a source file. This can be used, for example, to link from a data point back to the original log file or a specific dashboard view related to the data ingestion.traceparamstore.go
This file defines the TraceParamStore
interface. This store is dedicated to managing the relationship between a trace's unique identifier (typically an MD5 hash of its full parameter string) and the actual paramtools.Params
object.
WriteTraceParams
: Stores the mapping from trace IDs to their parameter sets. This is done to avoid repeatedly parsing or storing the full parameter string for every data point of a trace.ReadParams
: Retrieves the paramtools.Params
for a given set of trace IDs.sqltracestore
This submodule provides the SQL-based implementation of the TraceStore
, MetadataStore
, and TraceParamStore
interfaces.
sqltracestore.go
: Implements the TraceStore
interface.
sqltracestore/schema/schema.go
) involving tables like TraceValues
(for actual metric values), Postings
(the inverted index), ParamSets
(per-tile parameter information), and SourceFiles
.WriteTraces
is called, it performs several actions:SourceFiles
table with the new source filename if it's not already present.ParamSets
table for the current tile with any new key-value pairs from the incoming traces. This uses a cache to avoid redundant writes.TraceValues
table (or TraceValues2
for WriteTraces2
). _ If the trace ID and its key-value pairs are not already in the Postings
table for the current tile (checked via cache), it inserts them. _ Stores the mapping of the trace ID to its paramtools.Params
in the TraceParams
table via the TraceParamStore
. All these writes are typically batched and parallelized for efficiency.QueryTracesIDOnly
):ParamSet
for the target tile.query.Query
and the tile's ParamSet
.restrictByCounting
): It attempts to optimize the query by first running COUNT(*)
queries for each part of the query plan. The part of the plan that matches the fewest traces (below a threshold) is then used to fetch its corresponding trace IDs. These IDs are then used to construct a restrictClause
(e.g., AND trace_id IN (...)
) that is appended to the queries for the other parts of the plan. This significantly speeds up queries where one filter is much more selective than others.Postings
table (using the restrictClause
if applicable) to get a stream of matching traceIDForSQL
.traceIDForSQL
from each part of the plan are then intersected (using newIntersect
) to find the trace IDs that satisfy all AND conditions of the query.TraceParamStore
to fetch their full paramtools.Params
.QueryTraces
, ReadTraces
): Once the trace IDs (and thus their full names) are known (either from QueryTracesIDOnly
or directly provided), it queries the TraceValues
table to fetch the actual floating-point values for those traces within the specified commit range or tile. It also fetches commit information from the Commits
table.enableFollowerReads
configuration, which adds AS OF SYSTEM TIME '-5s'
to certain read queries, allowing them to potentially hit read replicas and reduce load on the primary, at the cost of slightly stale data.spanner.go
) to account for syntax differences or performance characteristics (e.g., UPSERT
vs. ON CONFLICT
).sqlmetadatastore.go
: Implements the MetadataStore
interface. It uses an Metadata
SQL table that links a source_file_id
(from SourceFiles
) to a JSONB column storing the metadata map.
sqltraceparamstore.go
: Implements the TraceParamStore
interface. It uses a TraceParams
SQL table that stores trace_id
(bytes) and their corresponding params
(JSONB). Writes are chunked and can be parallelized.
intersect.go
: Provides helper functions (newIntersect
, newIntersect2
) to compute the intersection of multiple sorted channels of traceIDForSQL
. This is crucial for implementing the AND logic in QueryTracesIDOnly
. It builds a binary tree of newIntersect2
operations for efficiency, avoiding slower reflection-based approaches.
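The pairwise intersection of sorted streams can be sketched as follows (illustrative; the real `newIntersect2` operates on `traceIDForSQL` values and also honors context cancellation):

```go
package main

import "fmt"

// intersect2 merges two ascending, duplicate-free channels and emits only the
// values present in both, preserving sorted order.
func intersect2(a, b <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		av, aok := <-a
		bv, bok := <-b
		for aok && bok {
			switch {
			case av < bv:
				av, aok = <-a
			case bv < av:
				bv, bok = <-b
			default: // av == bv: present in both streams.
				out <- av
				av, aok = <-a
				bv, bok = <-b
			}
		}
	}()
	return out
}

// fromSlice turns a sorted slice into a channel, standing in for a DB result stream.
func fromSlice(xs ...string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, x := range xs {
			ch <- x
		}
	}()
	return ch
}

func main() {
	a := fromSlice("id1", "id3", "id5", "id9")
	b := fromSlice("id3", "id4", "id9")
	for id := range intersect2(a, b) {
		fmt.Println(id) // id3, id9
	}
}
```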
schema/schema.go
: Defines Go structs that mirror the SQL table schemas. This is used for documentation and potentially could be used with ORM-like tools if needed, though the current implementation uses direct SQL templating.
TraceValuesSchema
: Stores individual data points (value, commit, source file) keyed by trace ID.TraceValues2Schema
: An alternative/extended schema for trace values, potentially denormalizing common parameters like benchmark
, bot
, test
, etc., for direct querying.SourceFilesSchema
: Maps source file names to integer IDs.ParamSetsSchema
: Stores the unique key-value pairs present in each tile.PostingsSchema
: The inverted index, mapping (tile, key-value) to trace IDs.MetadataSchema
: Stores JSON metadata for source files.TraceParamsSchema
: Maps trace IDs (MD5 hashes) to their full paramtools.Params
(stored as JSON).spanner.go
: Contains SQL templates and specific configurations (like parallel pool sizes for writes) tailored for Google Cloud Spanner.
mocks
TraceStore.go
: Provides a mock implementation of the TraceStore
interface, generated by the mockery
tool. This is essential for unit testing components that depend on TraceStore
without needing a full database setup.Caller (e.g., ingester) -> TraceStore.WriteTraces(ctx, commitNumber, params[], values[], paramset, sourceFile, timestamp) | `-> SQLTraceStore.WriteTraces | | 1. Tile Calculation: tileNumber = TileNumber(commitNumber) | | 2. Source File ID: | `-> updateSourceFile(ctx, sourceFile) -> sourceFileID | (Queries SourceFiles table, inserts if not exists) | | 3. ParamSet Update (for the tile): | For each key, value in paramset: | If not in cache(tileNumber, key, value): | Add to batch for ParamSets table insertion | Execute batch insert into ParamSets, update cache | | 4. For each trace (params[i], values[i]): | | a. Trace ID Calculation: traceID_md5_hex = md5(query.MakeKey(params[i])) | | | | b. Store Trace Params: | | `-> TraceParamStore.WriteTraceParams(ctx, {traceID_md5_hex: params[i]}) | | (Inserts into TraceParams table if not exists) | | | | c. Add to TraceValues Batch: (traceID_md5_hex, commitNumber, values[i], sourceFileID) | | | | d. Postings Update (for the tile): | | If not in cache(tileNumber, traceID_md5_hex): // Marks this whole trace as processed for postings | | For each key, value in params[i]: | | Add to batch for Postings table: (tileNumber, "key=value", traceID_md5_hex) | | 5. Execute batch insert into TraceValues (or TraceValues2) | | 6. Execute batch insert into Postings, update postings cache
QueryTracesIDOnly
)Caller -> TraceStore.QueryTracesIDOnly(ctx, tileNumber, query) | `-> SQLTraceStore.QueryTracesIDOnly | | 1. Get ParamSet for tile: | `-> GetParamSet(ctx, tileNumber) -> tileParamSet | (Checks OPS cache, falls back to querying ParamSets table) | | 2. Generate Query Plan: plan = query.QueryPlan(tileParamSet) | (If plan is empty or invalid for tile, return empty channel) | | 3. Optimization (restrictByCounting): | | For each part of 'plan' (key, or_values[]): | | `-> DB: COUNT(*) FROM Postings WHERE tile_number=... AND key_value IN (...) LIMIT threshold | | Find the plan part (minKey, minValues) with the smallest count (if count < threshold). | | If any count is 0, plan is skippable. | | If minKey found: | | `-> DB: SELECT trace_id FROM Postings WHERE tile_number=... AND key_value IN (minValues) | | `-> restrictClause = "AND trace_id IN (result_ids...)" | | 4. Execute Query for each plan part (concurrently): | For each key, values[] in 'plan' (excluding minKey if restrictClause is used): | `-> DB: SELECT trace_id FROM Postings | WHERE tile_number=tileNumber AND key_value IN ("key=value1", "key=value2"...) | [restrictClause] | ORDER BY trace_id | -> channel_for_key_N (stream of traceIDForSQL) | | 5. Intersect Results: | `-> newIntersect(ctx, [channel_for_key_1, channel_for_key_2,...]) -> finalTraceIDsChannel (stream of unique traceIDForSQL) | | 6. Fetch Full Params (concurrently, in chunks): | For each batch of unique traceIDForSQL from finalTraceIDsChannel: | `-> TraceParamStore.ReadParams(ctx, batch_of_ids) -> map[traceID]Params | For each Params in map: | Send Params to output channel | `-> Returns output channel of paramtools.Params
This structured approach, combining interfaces with a robust SQL implementation, allows tracestore
to serve as a reliable and performant foundation for Perf's data storage needs.
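As one concrete detail from the write path above, the trace ID used as a database key is the MD5 of the trace's key string. A minimal sketch, with an illustrative key builder standing in for `query.MakeKey`:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// makeKey builds a structured trace key from params in the ",k=v,...," style,
// sorted by key so the same params always produce the same key string.
func makeKey(params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(",")
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s,", k, params[k])
	}
	return b.String()
}

func main() {
	params := map[string]string{"arch": "x86", "config": "8888", "test": "draw_a_circle"}
	key := makeKey(params)
	traceID := fmt.Sprintf("%x", md5.Sum([]byte(key))) // hex MD5 used as the stable trace ID
	fmt.Println(key)
	fmt.Println(traceID)
}
```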
High-Level Overview
The /go/tracing
module is responsible for initializing and configuring tracing capabilities within the Perf application. It leverages the OpenCensus library to provide distributed tracing, allowing developers to understand the flow of requests across different services and components. This is crucial for debugging performance issues, identifying bottlenecks, and gaining insights into the application's behavior in a distributed environment.
Design Decisions and Implementation Choices
The core design principle behind this module is to centralize tracing initialization. This ensures consistency in how tracing is set up across different parts of the application.
Conditional Initialization: The Init
function provides different initialization paths based on whether the application is running in a local
development environment or a deployed environment.
loggingtracer.Initialize()
is called. This likely configures a simpler, console-based tracer. The rationale is that in local development, detailed, distributed tracing might be overkill, and logging traces to the console is often sufficient for debugging.tracing.Initialize
function from the shared go.skia.org/infra/go/tracing
library is used. This enables more sophisticated tracing, likely integrating with a backend tracing system like Jaeger or Stackdriver Trace.Configuration-Driven Sampling: The cfg.TraceSampleProportion
(of type config.InstanceConfig
) determines the sampling rate for traces. This allows administrators to control the volume of trace data generated, balancing the need for detailed information with the cost and overhead of storing and processing traces. A value of 0.0
would likely disable tracing, while 1.0
would trace every request.
Automatic Project ID Detection: The autoDetectProjectID
constant being an empty string suggests that the underlying tracing.Initialize
function is capable of automatically determining the Google Cloud Project ID when running in a GCP environment. This simplifies configuration as the project ID doesn't need to be explicitly passed.
Metadata Enrichment: The map[string]interface{}
passed to tracing.Initialize
includes:
podName
: This value is retrieved from the MY_POD_NAME
environment variable. This is a common practice in Kubernetes environments to identify the specific pod generating the trace, which is invaluable for pinpointing issues.instance
: This is derived from cfg.InstanceName
. This helps differentiate traces originating from different Perf instances (e.g., “perf-prod”, “perf-staging”).Responsibilities and Key Components/Files
tracing.go
: This is the sole file in this module and contains the Init
function.
Init(local bool, cfg *config.InstanceConfig) error
function:local
boolean flag and an InstanceConfig
pointer as input. 2. If local
is true
, it calls loggingtracer.Initialize()
. This indicates a preference for a simpler, possibly console-based, tracing mechanism for local development. local=true ----> loggingtracer.Initialize()
3. If local
is false
, it proceeds to initialize tracing for a deployed environment. - It retrieves the TraceSampleProportion
from the cfg
. - It retrieves the InstanceName
from cfg
to be used as an attribute. - It calls tracing.Initialize
from the shared go.skia.org/infra/go/tracing
library. - It passes the sampling proportion, autoDetectProjectID
(an empty string, relying on automatic detection), and a map of attributes (podName
from the environment and instance
from the config). local=false | V Read cfg.TraceSampleProportion Read cfg.InstanceName Read os.Getenv("MY_POD_NAME") | V tracing.Initialize(sample_proportion, "", {podName, instance})
go.skia.org/infra/go/tracing
) for common functionality, promoting code reuse.Dependencies:
//go/tracing
(likely go.skia.org/infra/go/tracing
): This is the core shared tracing library providing the Initialize
function for robust, distributed tracing. It handles the actual setup of exporters (e.g., to Stackdriver, Jaeger) and samplers.//go/tracing/loggingtracer
: This dependency provides a simpler tracer implementation, probably for logging traces to standard output, suitable for local development environments where a full-fledged tracing backend might not be available or necessary.//perf/go/config
: This module provides the InstanceConfig
struct, which contains application-specific configuration, including the TraceSampleProportion
and InstanceName
used by the tracing initialization. This decouples tracing configuration from the tracing logic itself.Key Workflows/Processes
Tracing Initialization Workflow:
Application Startup | V Call perf/go/tracing.Init(isLocal, instanceConfig) | +---- isLocal is true? ----> Call loggingtracer.Initialize() --> Tracing active (console/simple) | | | V | Application proceeds | +---- isLocal is false? ---> Read TraceSampleProportion from instanceConfig Read InstanceName from instanceConfig Read MY_POD_NAME environment variable | V Call shared go.skia.org/infra/go/tracing.Initialize(...) with sampling rate and attributes (podName, instance) | V Tracing active (distributed, e.g., Stackdriver) | V Application proceeds
This workflow illustrates how the Init
function adapts the tracing setup based on the execution context (local vs. deployed) and external configuration. The goal is to provide appropriate tracing capabilities with minimal boilerplate in the rest of the application.
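A minimal sketch of this split; the config struct and the print statements stand in for `config.InstanceConfig` and the calls into the shared tracing libraries:

```go
package main

import (
	"fmt"
	"os"
)

// instanceConfig stands in for the fields of config.InstanceConfig used here.
type instanceConfig struct {
	InstanceName          string
	TraceSampleProportion float64
}

// initTracing mirrors the branching described above: a simple logging tracer
// locally, and a sampled, attribute-tagged distributed tracer when deployed.
func initTracing(local bool, cfg instanceConfig) error {
	if local {
		fmt.Println("using logging tracer for local development")
		return nil
	}
	attrs := map[string]interface{}{
		"podName":  os.Getenv("MY_POD_NAME"),
		"instance": cfg.InstanceName,
	}
	fmt.Printf("initializing distributed tracing: sample=%v attrs=%v\n",
		cfg.TraceSampleProportion, attrs)
	return nil
}

func main() {
	_ = initTracing(false, instanceConfig{InstanceName: "perf-prod", TraceSampleProportion: 0.2})
}
```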
The /go/trybot
module is responsible for managing performance data generated by trybots. Trybots are automated systems that run tests on code changes (patches or changelists) before they are merged into the main codebase. This module handles the ingestion, storage, and retrieval of these trybot results, allowing developers and performance engineers to analyze the performance impact of proposed code changes.
The core idea is to provide a way to compare the performance characteristics of a pending change against the baseline performance of the current codebase. This helps in identifying potential performance regressions or improvements early in the development cycle.
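The central record this module passes around can be sketched as follows; the field set follows the description below, while the concrete types and sample values are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// tryFile mirrors the TryFile record described below: one ingested results
// file tied to a specific changelist and patchset.
type tryFile struct {
	CL          string    // changelist identifier, e.g. a Gerrit change ID
	PatchNumber int       // patchset within the changelist
	Filename    string    // where the results live, e.g. a gs:// URL
	Timestamp   time.Time // when the result file was created
}

func main() {
	f := tryFile{
		CL:          "123456",
		PatchNumber: 3,
		Filename:    "gs://skia-perf/trybot/results.json", // hypothetical path
		Timestamp:   time.Now(),
	}
	fmt.Printf("%+v\n", f)
}
```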
/go/trybot/trybot.go
This file defines the central data structure TryFile
.
TryFile
: This struct represents a single file containing trybot results.CL
: The identifier of the changelist (e.g., a Gerrit change ID). This is crucial for associating results with a specific code change.PatchNumber
: The specific patchset within the changelist. Code review systems often allow multiple iterations (patchsets) for a single changelist.Filename
: The name of the file where the trybot results are stored, often including a scheme like gs://
indicating its location (e.g., in Google Cloud Storage).Timestamp
: When the result file was created. This is important for tracking and ordering results./go/trybot/ingester
This submodule is responsible for taking raw result files and transforming them into the TryFile
format that the rest of the system understands.
/go/trybot/ingester/ingester.go
: Defines the Ingester
interface.
Ingester
interface: Specifies a contract for components that can process incoming files (represented by file.File
) and produce a stream of trybot.TryFile
objects. The Start
method initiates this processing, typically in a background goroutine. This design allows for different sources or formats of trybot results to be plugged into the system./go/trybot/ingester/gerrit/gerrit.go
: Provides a concrete implementation of the Ingester
interface, specifically for handling trybot results originating from Gerrit code reviews.
Gerrit
struct: Implements ingester.Ingester
. It uses a parser.Parser
(from /perf/go/ingest/parser
) to understand the content of the result files.New
function: Constructor for the Gerrit
ingester.Start
method:file.File
objects.parser.ParseTryBot
. This method extracts the changelist ID (issue
) and patchset number.trybot.TryFile
is created with the extracted CL, patch number, filename, and creation timestamp.TryFile
is then sent to an output channel.parseCounter
, parseFailCounter
) to track the success and failure rates of parsing.files
) and output (ret
) facilitates asynchronous processing, meaning the ingester can process files as they become available without blocking other operations./go/trybot/store
This submodule is responsible for persisting and retrieving TryFile
information and the associated performance measurements.
/go/trybot/store/store.go
: Defines the TryBotStore
interface.TryBotStore
interface: This interface outlines the contract for storing and retrieving trybot data. This abstraction allows different database backends (e.g., CockroachDB, in-memory stores for testing) to be used.Write(ctx context.Context, tryFile trybot.TryFile) error
: Persists a TryFile
and its associated data.List(ctx context.Context, since time.Time) ([]ListResult, error)
: Retrieves a list of unique changelist/patchset combinations that have been processed since a given time. ListResult
contains the CL
(as a string) and Patch
number.Get(ctx context.Context, cl types.CL, patch int) ([]GetResult, error)
: Fetches all performance results for a specific changelist and patch number. GetResult
contains the TraceName
(a unique identifier for a specific metric and parameter combination) and its measured Value
./go/trybot/store/mocks/TryBotStore.go
: Provides a mock implementation of TryBotStore
, generated by the mockery
tool. This is essential for unit testing components that depend on TryBotStore
without needing a real database./go/trybot/results
This submodule focuses on loading and preparing trybot results for analysis and presentation, often by comparing them to baseline data.
/go/trybot/results/results.go
: Defines the structures for requesting and representing analyzed trybot results.
Kind type (TryBot, Commit): Distinguishes whether the analysis request is for trybot data (pre-submit) or for data from an already landed commit (post-submit). This allows the system to handle both scenarios.
TryBotRequest struct: Represents a request from a client (e.g., a UI) to get analyzed performance data. It includes the Kind, CL and PatchNumber (for TryBot kind), and CommitNumber and Query (for Commit kind). The Query is used to filter the traces to be analyzed when looking at landed commits.
TryBotResult struct: Contains the analysis results for a single trace.
Params: The key-value parameters that uniquely identify the trace.
Median, Lower, Upper, StdDevRatio: Statistical measures derived from the trace data. StdDevRatio is a key metric indicating how much a new value deviates from the historical distribution, helping to flag regressions or improvements.
Values: A slice of recent historical values for the trace, with the last value being either the trybot result or the value at the specified commit.
TryBotResponse struct: The overall response to a TryBotRequest.
Header: Column headers for the data, typically representing commit information.
Results: A slice of TryBotResult for each analyzed trace.
ParamSet: A collection of all unique parameter key-value pairs present in the results, useful for filtering in a UI.
Loader interface: Defines a contract for components that can take a TryBotRequest and produce a TryBotResponse. This involves fetching relevant data, performing statistical analysis, and formatting it.
/go/trybot/results/dfloader/dfloader.go
: Implements the results.Loader
interface using a dataframe.DataFrameBuilder
. DataFrames are a common way to represent tabular data for analysis.
Loader struct: Holds references to a dataframe.DataFrameBuilder (for constructing DataFrames from trace data), a store.TryBotStore (for fetching trybot-specific measurements), and perfgit.Git (for resolving commit information).
TraceHistorySize constant: Defines how many historical data points to load for each trace for comparison.
New function: Constructor for the Loader.
Load method: This is the core logic for generating the TryBotResponse.
Determine Timestamp: If the request is for a Commit, it fetches the commit details (including its timestamp) using perfgit.Git. Otherwise, it uses the current time.
Parse Query: If the request kind is Commit, the provided Query string is parsed. An empty query for a Commit request is an error.
Fetch Baseline Data (DataFrame):
If Kind is Commit: It uses dfb.NewNFromQuery to load a DataFrame containing the last TraceHistorySize+1 data points for traces matching the query, up to the commit's timestamp. The "+1" is to hold the value at the commit itself or to be a placeholder.
If Kind is TryBot: a. It first calls store.Get to retrieve the specific trybot measurements for the given CL and PatchNumber. b. It then extracts the trace names from these trybot results. c. It calls dfb.NewNFromKeys to load a DataFrame with TraceHistorySize+1 historical data points for these specific trace names. d. Crucially, it then replaces the last value in each trace within the DataFrame with the corresponding value obtained from the store.Get call. This effectively injects the trybot's measurement into the historical context for comparison. e. If a trybot result exists for a trace that has no historical data in the DataFrame, that trace is removed from the analysis, and rebuildParamSet is flagged.
Prepare Response Header: The DataFrame's header (commit information) is used for the response. If it's a TryBot request, the last header entry (representing the trybot data point) has its Offset set to types.BadCommitNumber to indicate it's not a landed commit.
Calculate Statistics: For each trace in the DataFrame, its key is parsed into paramtools.Params, and vec32.StdDevRatio is called with the trace values (which now include the trybot value at the end if applicable). This function calculates the median, lower/upper bounds, and the standard deviation ratio, and a results.TryBotResult is created. If the StdDevRatio calculation fails (e.g., insufficient data), the trace is skipped, and rebuildParamSet is flagged.
Sort Results: The TryBotResult slice is sorted by StdDevRatio in descending order. This prioritizes potential regressions (high positive ratio) and significant improvements (high negative ratio).
Normalize ParamSet: If rebuildParamSet is true (due to missing traces or parsing errors), the ParamSet for the response is regenerated from the final set of TryBotResults.
Finally, the results.TryBotResponse is assembled and returned.
/go/trybot/samplesloader
This submodule deals with loading raw sample data from trybot result files. Sometimes, instead of just a single aggregated value, trybots might output multiple raw measurements (samples) for a metric.
/go/trybot/samplesloader/samplesloader.go
: Defines the SamplesLoader
interface.
SamplesLoader
interface: Specifies a method Load(ctx context.Context, filename string) (parser.SamplesSet, error)
that takes a filename (URL to the result file) and returns a parser.SamplesSet
. A SamplesSet
is a map where keys are trace identifiers and values are parser.Samples
(which include parameters and a slice of raw float64 sample values).
/go/trybot/samplesloader/gcssamplesloader/gcssamplesloader.go
: Implements SamplesLoader
for files stored in Google Cloud Storage (GCS).
loader struct: Holds a gcs.GCSClient for interacting with GCS and a parser.Parser.
New function: Constructor for the GCS samples loader.
Load method: Parses the filename (which is a GCS URL like gs://bucket/path/file.json) to extract the bucket and path, uses the storageClient to read the content of the file from GCS, parses the content with format.ParseLegacyFormat (assuming a specific JSON structure for these sample files), and converts it into a parser.SamplesSet using parser.GetSamplesFromLegacyFormat.
A simplified workflow could look like this:
File Arrival: A new trybot result file appears (e.g., uploaded to GCS).
New File (e.g., in GCS)
Ingestion: An ingester.Ingester
(like ingester.gerrit.Gerrit
) detects and processes this file.
File --> [Gerrit Ingester] --parses--> trybot.TryFile{CL, PatchNum, Filename, Timestamp}
Storage: The TryFile
metadata and potentially the parsed values are written to the store.TryBotStore
.
trybot.TryFile --> [TryBotStore.Write] --> Database
(The actual performance values might be stored alongside the TryFile
metadata or linked via the Filename
if they are in a separate detailed file).
Analysis Request: A user or an automated system requests analysis for a particular CL/Patch via a UI or API, sending a results.TryBotRequest
.
UI/API --sends--> results.TryBotRequest{Kind=TryBot, CL="123", PatchNumber=1}
Data Loading and Comparison: The results.dfloader.Loader
handles this request. results.TryBotRequest | v [dfloader.Loader.Load] | +--(A)--> [TryBotStore.Get(CL, PatchNum)] --> Trybot specific values (Value_T) for traces T1, T2... | +--(B)--> [DataFrameBuilder.NewNFromKeys(traceNames=[T1,T2...])] --> Historical data for T1, T2... | (e.g., [V1_hist1, V1_hist2, ..., V1_histN, _placeholder_]) | +--(C)--> Combine: Replace _placeholder_ with Value_T | (e.g., for T1: [V1_hist1, V1_hist2, ..., V1_histN, V1_T]) | +--(D)--> Calculate StdDevRatio, Median, etc. for each trace | +--(E)--> Sort results | v results.TryBotResponse (sent back to UI/API)
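The comparison in step (D) relies on vec32.StdDevRatio; its exact formula is not reproduced here, but the idea it captures — how far the injected trybot value sits from the historical distribution — can be illustrated with the following self-contained approximation. All names below are hypothetical and this is not the real implementation.

```go
// Illustrative only: the real statistic comes from vec32.StdDevRatio, whose
// exact definition may differ. This sketch shows the idea of comparing the
// newest value (the trybot measurement) against the historical distribution.
package main

import (
	"fmt"
	"math"
	"sort"
)

// stdDevRatio returns how many historical standard deviations the last value
// sits away from the historical median. A large positive value suggests a
// regression, a large negative value a significant improvement.
func stdDevRatio(values []float32) (ratio, median float32, err error) {
	if len(values) < 3 {
		return 0, 0, fmt.Errorf("need at least 3 values, got %d", len(values))
	}
	history := values[:len(values)-1]
	last := values[len(values)-1]

	sorted := append([]float32(nil), history...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	median = sorted[len(sorted)/2]

	var sum, sumSq float64
	for _, v := range history {
		sum += float64(v)
		sumSq += float64(v) * float64(v)
	}
	n := float64(len(history))
	variance := sumSq/n - (sum/n)*(sum/n)
	stdDev := float32(math.Sqrt(math.Max(variance, 1e-12)))

	return (last - median) / stdDev, median, nil
}

func main() {
	// Nine historical points plus a trybot value injected at the end.
	trace := []float32{10, 10.1, 9.9, 10.2, 10, 10.1, 9.8, 10, 10.1, 12.5}
	ratio, median, _ := stdDevRatio(trace)
	fmt.Printf("median=%.2f stdDevRatio=%.1f\n", median, ratio)
}
```

Sorting by this ratio in descending order, as the Loader does, naturally surfaces the largest suspected regressions first.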
This module is crucial for proactive performance monitoring, enabling teams to catch performance regressions before they land in the main codebase, by systematically ingesting, storing, and analyzing the performance data generated during the pre-submit testing phase. The use of interfaces for storage (TryBotStore
), ingestion (Ingester
), and results loading (results.Loader
) makes the system flexible and extensible.
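To make those contracts concrete, here is a minimal, self-contained sketch of the trybot types and interfaces as described in this section. Field and parameter names (e.g. PatchNumber, the local File and CL stand-ins, and the Start signature) are illustrative; the authoritative definitions live under /go/trybot.

```go
// Package trybotsketch is a condensed sketch of the contracts described above.
package trybotsketch

import (
	"context"
	"time"
)

// CL is a changelist identifier, e.g. a Gerrit issue number as a string.
type CL string

// File is a stand-in for the file.File type handed to an Ingester.
type File struct {
	Name     string
	Contents []byte
}

// TryFile describes one uploaded trybot result file.
type TryFile struct {
	CL          CL        // Changelist the results were produced for.
	PatchNumber int       // Patchset within the changelist.
	Filename    string    // Source file the results came from.
	Timestamp   time.Time // When the result file was created.
}

// Ingester turns incoming files into a stream of TryFiles.
type Ingester interface {
	// Start begins processing in the background and returns a channel of
	// successfully parsed TryFiles.
	Start(ctx context.Context, files <-chan File) (<-chan TryFile, error)
}

// ListResult and GetResult mirror the result rows described for TryBotStore.
type ListResult struct {
	CL    string
	Patch int
}

type GetResult struct {
	TraceName string
	Value     float32
}

// TryBotStore persists TryFiles and their measurements.
type TryBotStore interface {
	Write(ctx context.Context, tryFile TryFile) error
	List(ctx context.Context, since time.Time) ([]ListResult, error)
	Get(ctx context.Context, cl CL, patch int) ([]GetResult, error)
}
```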
The go/ts
module serves as a utility to generate TypeScript definition files from Go structs. This is crucial for maintaining type safety and consistency between the Go backend and the TypeScript frontend, particularly when dealing with JSON data structures that are exchanged between them. The core problem this module solves is bridging the gap between Go's static typing and TypeScript's type system for data interchange, ensuring that changes in Go struct definitions are automatically reflected in the frontend's TypeScript types.
The primary component is the main.go
file. Its responsibility is to:
-o
) where the generated TypeScript file will be written.go2ts.Generator
: This is the core engine from the go/go2ts
library responsible for the Go-to-TypeScript conversion.GenerateNominalTypes = true
: This setting likely ensures that the generated TypeScript types are nominal (i.e., types are distinct based on their name, not just their structure), which can provide stronger type checking.AddIgnoreNil
: This is used for specific Go types like paramtools.Params
, paramtools.ParamSet
, paramtools.ReadOnlyParamSet
, and types.TraceSet
. This suggests that nil
values for these types in Go should likely be treated as optional or nullable fields in TypeScript, or perhaps excluded from the generated types if they are always expected to be non-nil when serialized.generator.AddMultiple
to register a wide array of Go structs from various perf
submodules (e.g., alerts
, chromeperf
, clustering2
, frontend/api
, regression
). These are the structs that are serialized to JSON and consumed by the frontend. By registering them, the generator knows which Go types to convert into corresponding TypeScript interfaces or types.addMultipleUnions
helper function and generator.AddUnionToNamespace
are used to register Go union types (often represented as a collection of constants or an interface implemented by several types). This ensures that TypeScript enums or union types are generated, reflecting the possible values or types a Go field can hold. The typeName
argument in unionAndName
and the namespace argument in AddUnionToNamespace
control how these unions are named and organized in the generated TypeScript.generator.AddToNamespace
is used to group related types under a specific namespace in the generated TypeScript, improving organization (e.g., pivot.Request{}
is added to the pivot
namespace).generator.Render(w)
writes the generated TypeScript definitions to the specified output file.The design decision to use a dedicated program for this generation task, rather than manual synchronization or other methods, highlights the importance of automation and reducing the likelihood of human error in keeping backend and frontend types aligned. The reliance on the go/go2ts
library centralizes the core conversion logic, making this module a consumer and orchestrator of that library for the specific needs of the Skia Perf application.
A key workflow is triggered by the //go:generate
directive at the top of main.go
: //go:generate bazelisk run --config=mayberemote //:go -- run . -o ../../modules/json/index.ts
This command, when go generate
is run (typically as part of a build process), executes the compiled go/ts
program.
Workflow:
perf
submodule that is serialized to JSON for the UI.go generate
within the go/ts
module's directory (or a higher-level directory that includes it).go:generate
directive executes the main
function in go/ts/main.go
.main.go
-> Uses go2ts.Generator
-> Registers relevant Go structs and unions.go2ts.Generator
-> Analyzes registered Go types -> Generates corresponding TypeScript definitions.main.go
-> Writes the TypeScript definitions to ../../modules/json/index.ts
.The choice of specific structs and unions registered in main.go
reflects the data contracts between the Perf backend and its frontend UI. Any Go struct that is part of an API response or request payload handled by the frontend needs to be included here.
This module defines core data types used throughout the Perf application. These types provide a standardized way to represent fundamental concepts related to commits, performance data (traces), and alert configurations. The design prioritizes clarity, type safety, and consistency across different parts of the system.
CommitNumber
(types.go
): Represents a unique, sequential identifier for a commit within a repository.
Commit numbering starts at CommitNumber(0), and the underlying type is int32. It includes an Add method for safe offsetting and a BadCommitNumber constant (-1) to represent invalid or non-existent commit numbers.
CommitNumberSlice
(types.go
): A utility type to enable sorting of CommitNumber
slices, which is useful for various data processing and display tasks.TileNumber
(types.go
): Represents an index for a “tile” in the TraceStore
. Performance data (traces) are often stored in chunks or tiles for efficient storage and retrieval.
int32
. Functions like TileNumberFromCommitNumber
and TileCommitRangeForTileNumber
manage the mapping between commit numbers and tile numbers based on a configurable tileSize
. The Prev()
method allows navigation to the preceding tile, and BadTileNumber
(-1
) indicates an invalid tile.Workflow: Commit to Tile Mapping
CommitNumber ----(tileSize)----> TileNumberFromCommitNumber() ----> TileNumber | V TileCommitRangeForTileNumber() ----> (StartCommit, EndCommit)
Trace
(types.go
): Represents a sequence of performance measurements, typically corresponding to a specific metric over a series of commits.
[]float32
. The NewTrace
function initializes a trace of a given length with a special vec32.MISSING_DATA_SENTINEL
value, which is crucial for distinguishing between actual zero values and missing data points. This leverages the go.skia.org/infra/go/vec32
package for optimized float32 vector operations.TraceSet
(types.go
): A collection of Trace
s, keyed by a string identifier (trace ID).
map[string]Trace
.RegressionDetectionGrouping
(types.go
): An enumeration defining how traces are grouped for regression detection.
KMeansGrouping
(cluster traces by shape) and StepFitGrouping
(analyze each trace individually for steps). ToClusterAlgo
provides a safe way to convert strings to this type.StepDetection
(types.go
): An enumeration defining the algorithms used to detect significant steps (changes) in individual traces or cluster centroids.
OriginalStep
, AbsoluteStep
, PercentStep
, CohenStep
, and MannWhitneyU
. ToStepDetection
ensures type-safe conversion from strings.AlertAction
(types.go
): An enumeration defining the actions to be taken when an anomaly (potential regression) is detected by an alert configuration.
NoAction
, FileIssue
, and Bisection
.Domain
(types.go
): Specifies the range of commits over which an operation (like regression detection) should be performed.
N
(number of commits) and End
(timestamp for the end of the range) or an Offset
(a specific commit number).ProgressCallback
(types.go
): A function type used to provide feedback on the progress of long-running operations.
func(message string)
.CL
(types.go
): Represents a Change List identifier (e.g., a GitHub Pull Request number).
string
.AnomalyDetectionNotifyType
(types.go
): Defines the notification mechanism for anomalies.
IssueNotify
(send to issue tracker) and NoneNotify
(no notification).ProjectId
(types.go
): Represents a project identifier.
string
with a predefined list AllProjectIds
.AllMeasurementStats
(types.go
): A list of valid statistical suffixes that can be part of performance measurement keys (e.g., “avg”, “max”).
[]string
slice.The unit tests in types_test.go
focus on validating the logic of CommitNumber
arithmetic and the mapping between CommitNumber
and TileNumber
, ensuring the core indexing mechanisms are correct.
The /go/ui
module is responsible for handling frontend requests and preparing data for display in the Perf UI. Its primary purpose is to bridge the gap between user interactions on the frontend (e.g., selecting time ranges, defining queries, or applying formulas) and the backend data sources and processing logic.
This module is designed to be the central point for fetching and transforming performance data into a format that can be readily consumed by the UI. It orchestrates interactions with various other modules, such as those responsible for accessing Git history (/go/git
), building dataframes (/go/dataframe
), handling data shortcuts (/go/shortcut
), and calculating derived metrics (/go/calc
).
The key rationale behind this module's existence is to encapsulate the complexity of data retrieval and preparation, providing a clean and consistent API for the frontend. This separation of concerns allows the frontend to focus on presentation and user interaction, while the backend handles the intricacies of data access and manipulation.
The main workflow involves receiving a FrameRequest
from the frontend, processing it to fetch and transform data, and then returning a FrameResponse
containing the prepared data and display instructions.
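As a rough illustration of the request/response shapes involved, here is a hypothetical sketch. The real FrameRequest and FrameResponse structs in /go/ui/frame carry more fields and different names; everything below is a stand-in for explanation only.

```go
// Illustrative stand-ins for the frontend request/response shapes described
// above; not the actual /go/ui/frame definitions.
package uisketch

// RequestType selects how the commit range is specified.
type RequestType int

const (
	RequestTimeRange RequestType = iota // explicit begin/end timestamps
	RequestCompact                      // the N most recent commits
)

// FrameRequest captures what the UI is asking for.
type FrameRequest struct {
	RequestType RequestType
	Begin, End  int64    // Unix timestamps when RequestType is RequestTimeRange.
	NumCommits  int      // Used when RequestType is RequestCompact.
	Queries     []string // Trace-selection queries, e.g. "arch=x86&config=8888".
	Formulas    []string // Calculated traces, e.g. "ave(filter(\"config=8888\"))".
	Keys        string   // A shortcut ID naming an explicit set of trace keys.
}

// FrameResponse packages the prepared data and display hints for the UI.
type FrameResponse struct {
	DisplayMode string   // How the UI should render the result.
	Msg         string   // Any informational or warning message.
	Anomalies   []string // Identifiers of anomalies attached to traces (stand-in).
	// The DataFrame with trace values and the commit header would appear here.
}
```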
/go/ui/frame/frame.go
: This is the core file of the module.FrameRequest
) and backend responses (FrameResponse
). FrameRequest
captures user inputs like time ranges, queries, formulas, and pivot table configurations. FrameResponse
packages the resulting data, along with display hints and any relevant messages.FrameRequest
objects. This involves dispatching tasks to other modules based on the request parameters. For example, it uses the dataframe.DataFrameBuilder
to fetch data based on queries or trace keys, the calc
module to evaluate formulas, and the pivot
module to restructure data for pivot tables.REQUEST_TIME_RANGE
) or a fixed number of recent commits (REQUEST_COMPACT
).anomalies.Store
and associates them with the relevant traces in the response. This can be done based on time ranges or commit revision numbers.ProcessFrameRequest
function is the main entry point for handling a request. It creates a frameRequestProcess
struct to manage the state of the request processing.DataFrame
.reportError
to ensure consistent logging and error propagation.progress.Progress
interface, allowing the frontend to display updates during long-running requests.REQUEST_TIME_RANGE
and REQUEST_COMPACT
request types caters to different user needs: exploring specific historical periods versus viewing the latest trends.FrameResponse
aims to provide users with immediate context about significant performance changes alongside the raw data. The system supports fetching anomalies based on either time or revision ranges, offering flexibility depending on how anomalies are tracked and stored.ResponseFromDataFrame
function acts as a final assembly step, taking a processed DataFrame
and enriching it with SKP change information, display mode, and handling potential truncation.A typical request processing flow might look like this:
Frontend Request (FrameRequest) | V ProcessFrameRequest() in frame.go | +------------------------------+-----------------------------+--------------------------+ | | | | V V V V (If Queries exist) (If Formulas exist) (If Keys exist) (If Pivot requested) doSearch() doCalc() doKeys() pivot.Pivot() | | | | V V V V dfBuilder.NewFromQuery...() calc.Eval() with dfBuilder.NewFromKeys...() Restructure DataFrame rowsFromQuery/Shortcut() | | | | +------------------------------+-----------------------------+--------------------------+ | V DataFrame construction and merging | V (If anomaly search enabled) addTimeBasedAnomaliesToResponse() OR addRevisionBasedAnomaliesToResponse() | V anomalyStore.GetAnomalies...() | V ResponseFromDataFrame() | V getSkps() (Find significant file changes) | V Truncate response if too large | V Set DisplayMode | V Backend Response (FrameResponse) | V Frontend UI
The urlprovider
module is designed to generate URLs for various pages within the Perf application. This centralized approach ensures consistency in URL generation across different parts of the application and simplifies the process of linking to specific views with pre-filled parameters. The key motivation is to abstract away the complexities of URL query parameter construction and to provide a simple interface for generating links to common Perf views like “Explore”, “MultiGraph”, and “GroupReport”.
The core component of this module is the URLProvider
struct. An instance of URLProvider
is initialized with a perfgit.Git
object. This dependency is crucial because some URL generation, particularly for time-range-based views, requires fetching commit information (specifically timestamps) from the Git repository to define the “begin” and “end” parameters of the URL.
urlprovider.go
: This file contains the primary logic for the URL provider.URLProvider
struct: Holds a reference to a perfgit.Git
instance. This allows it to interact with the Git repository to fetch commit details needed for constructing time-based query parameters.New(perfgit perfgit.Git) *URLProvider
: This constructor function creates and returns a new instance of URLProvider
. It takes a perfgit.Git
object as an argument, which is stored within the struct. This design choice makes the URLProvider
stateful with respect to its Git interaction capabilities.Explore(...) string
: This method generates a URL for the “Explore” page (/e/
).getQueryParams
to construct the common query parameters like begin
, end
, and disable_filter_parent_traces
. The begin
and end
timestamps are derived from the provided startCommitNumber
and endCommitNumber
by querying the perfGit
instance. The end
timestamp is intentionally shifted forward by one day to ensure that anomalies at the very end of the selected range are visible on the graph.parameters
map (which contains key-value pairs for filtering traces) into a URL-encoded query string using GetQueryStringFromParameters
. This encoded string is assigned to the queries
parameter of the final URL.queryParams
(passed as url.Values
) can be merged into the URL./e/?
.MultiGraph(...) string
: This method generates a URL for the “MultiGraph” page (/m/
).Explore
, it uses getQueryParams
to build the common time-range and filtering parameters.shortcut
parameter with the provided shortcutId
.queryParams
can also be merged./m/?
.GroupReport(param string, value string) string
: This static function generates a URL for the “Group Report” page (/u/
).Explore
and MultiGraph
, it does not inherently depend on a time range derived from commits, nor does it require complex parameter encoding.param
against a predefined list of allowed parameters (anomalyGroupID
, anomalyIDs
, bugID
, rev
, sid
). This is a security and correctness measure to prevent arbitrary parameters from being injected.param
is valid, it constructs a simple URL with the provided param
and value
.param
is invalid.URLProvider
) because it doesn't need access to the perfGit
instance or any other state within URLProvider
. This simplifies its usage for cases where only a group report URL is needed without initializing a full URLProvider
.getQueryParams(...) url.Values
: This private helper method is responsible for creating the base set of query parameters common to Explore
and MultiGraph
.fillCommonParams
to set the begin
and end
parameters based on commit numbers.disable_filter_parent_traces=true
if requested.queryParams
provided by the caller.fillCommonParams(...)
: This private helper populates the begin
and end
timestamp parameters in the provided url.Values
.perfGit
instance to look up the Commit
objects corresponding to the startCommitNumber
and endCommitNumber
. The timestamps from these commits are then used. As mentioned earlier, the end
timestamp is adjusted by adding one day. This separation of concerns keeps the main Explore
and MultiGraph
methods cleaner.GetQueryStringFromParameters(parameters map[string][]string) string
: This helper method converts a map of string slices (representing query parameters where a single key can have multiple values) into a URL-encoded query string.Generating an “Explore” Page URL:
Caller provides: context, startCommitNum, endCommitNum, filterParams, disableFilterParent, otherQueryParams | v URLProvider.Explore() | +-------------------------------------+ | | v v getQueryParams() GetQueryStringFromParameters(filterParams) | | +--> fillCommonParams() +--> Encode filterParams | | | | +--> perfGit.CommitFromCommitNumber() -> Get start timestamp | | | | +--> perfGit.CommitFromCommitNumber() -> Get end timestamp, add 1 day | | | | +----------------------------------------+ | | | v | Combine begin, end, disableFilterParent, otherQueryParams into url.Values | | +-------------------------------------+ | v Combine base URL ("/e/?"), common query params, and encoded filterParams string | v Return final URL string
Generating a “MultiGraph” Page URL:
Caller provides: context, startCommitNum, endCommitNum, shortcutId, disableFilterParent, otherQueryParams | v URLProvider.MultiGraph() | v getQueryParams() | +--> fillCommonParams() | | | +--> perfGit.CommitFromCommitNumber() -> Get start timestamp | | | +--> perfGit.CommitFromCommitNumber() -> Get end timestamp, add 1 day | | | +----------------------------------------+ | | | v | Combine begin, end, disableFilterParent, otherQueryParams into url.Values | v Add "shortcut=shortcutId" to url.Values | v Combine base URL ("/m/?") and all query params | v Return final URL string
Generating a “Group Report” Page URL:
Caller provides: paramName, paramValue | v urlprovider.GroupReport() | v Validate paramName against allowed list | +-- (Valid) --> Construct URL: "/u/?" + paramName + "=" + paramValue | | | v | Return URL string | +-- (Invalid) --> Return "" (empty string)
The design emphasizes reusability of common parameter generation logic (getQueryParams
, fillCommonParams
) and clear separation of concerns for generating URLs for different Perf pages. The dependency on perfgit.Git
is explicitly managed through the URLProvider
struct, making it clear when Git interaction is necessary.
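A sketch of the Explore URL assembly using net/url is shown below. Parameter names (begin, end, queries, disable_filter_parent_traces) follow the description above; the commit-to-timestamp resolution that the real URLProvider performs via perfgit.Git is stubbed out with fixed times.

```go
// A sketch of Explore URL construction as described above; not the actual
// urlprovider implementation.
package main

import (
	"fmt"
	"net/url"
	"time"
)

func exploreURL(begin, end time.Time, filter map[string][]string, disableFilterParentTraces bool) string {
	// Encode the trace-filter parameters into a single query string, which
	// becomes the value of the "queries" parameter.
	encodedFilter := url.Values(filter).Encode()

	v := url.Values{}
	v.Set("begin", fmt.Sprintf("%d", begin.Unix()))
	// The end timestamp is pushed forward by one day so anomalies at the very
	// end of the range remain visible.
	v.Set("end", fmt.Sprintf("%d", end.Add(24*time.Hour).Unix()))
	if disableFilterParentTraces {
		v.Set("disable_filter_parent_traces", "true")
	}
	v.Set("queries", encodedFilter)

	return "/e/?" + v.Encode()
}

func main() {
	begin := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	end := time.Date(2024, 1, 8, 0, 0, 0, 0, time.UTC)
	fmt.Println(exploreURL(begin, end, map[string][]string{
		"config": {"8888"},
		"test":   {"draw_a_circle"},
	}, true))
}
```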
The userissue
module is responsible for managing the association between specific data points in Perf (identified by a trace key and a commit position) and Buganizer issues. This allows users to flag specific performance regressions or anomalies and link them directly to a tracking issue.
The core of this module is the Store
interface, which defines the contract for persisting and retrieving these user-issue associations. The primary implementation of this interface is sqluserissuestore
, which leverages a SQL database (specifically CockroachDB in this context) to store the data.
Key Responsibilities and Components:
store.go
: This file defines the central UserIssue
struct and the Store
interface.
UserIssue
struct: Represents a single association. It contains:UserId
: The email of the user who made the association.TraceKey
: A string uniquely identifying a performance metric's trace (e.g., “,arch=x86,config=Release,test=MyTest,”).CommitPosition
: An integer representing a specific point in the commit history where the data point exists.IssueId
: The numerical ID of the Buganizer issue.Store
interface: This interface dictates the operations that any backing store for user issues must support:Save(ctx context.Context, req *UserIssue) error
: Persists a new UserIssue
association. The implementation must handle potential conflicts, such as trying to save a duplicate entry (same trace key and commit position).Delete(ctx context.Context, traceKey string, commitPosition int64) error
: Removes an existing user-issue association based on its unique trace key and commit position. It should handle cases where the specified association doesn't exist.GetUserIssuesForTraceKeys(ctx context.Context, traceKeys []string, startCommitPosition int64, endCommitPosition int64) ([]UserIssue, error)
: Retrieves all UserIssue
associations for a given set of trace keys within a specified range of commit positions. This is crucial for displaying these associations on performance graphs or reports.sqluserissuestore/sqluserissuestore.go
: This is the SQL-backed implementation of the Store
interface.
go.skia.org/infra/go/sql/pool
for managing database connections.listUserIssues
, use Go's text/template
package to dynamically construct the IN
clause for multiple traceKeys
. This is a common pattern to avoid SQL injection vulnerabilities and handle variadic inputs efficiently.Save
: Inserts a new row into the UserIssues
table. It includes a last_modified
timestamp.Delete
: First, it attempts to retrieve the issue to ensure it exists before attempting deletion. This provides a more informative error message if the record is not found.GetUserIssuesForTraceKeys
: Constructs a SQL query using a template to select issues matching the provided trace keys and commit position range. It then iterates over the query results and populates a slice of UserIssue
structs.sqluserissuestore/schema/schema.go
: This file defines the Go struct UserIssueSchema
which directly maps to the SQL table schema for UserIssues
.
user_id TEXT NOT NULL
trace_key TEXT NOT NULL
commit_position INT NOT NULL
issue_id INT NOT NULL
last_modified TIMESTAMPTZ DEFAULT now()
PRIMARY KEY(trace_key, commit_position)
: The combination of trace_key
and commit_position
uniquely identifies a user issue, preventing multiple issues from being associated with the exact same data point.mocks/Store.go
: This contains a mock implementation of the Store
interface, generated using the testify/mock
library.
userissue.Store
without requiring a live database connection. It allows developers to define expected calls and return values for the store's methods.Workflow Example: Saving a User Issue
userissue.UserIssue
struct and calls the Save
method on an instance of userissue.Store
(likely sqluserissuestore.UserIssueStore
). User Request (UI) | v API Endpoint | v Backend Handler | | Creates userissue.UserIssue{UserId:"...", TraceKey:"...", CommitPosition:123, IssueId:45678} v userissue.Store.Save(ctx, &issue) | v sqluserissuestore.UserIssueStore.Save() | | Constructs SQL: INSERT INTO UserIssues (...) VALUES ($1, $2, $3, $4, $5) v SQL Database (UserIssues Table) <-- Row inserted
Workflow Example: Retrieving User Issues for a Chart
GetUserIssuesForTraceKeys
on the userissue.Store
. Chart Display Request (UI) | | Provides: traceKeys=["trace1", "trace2"], startCommit=100, endCommit=200 v API Endpoint | v Backend Handler | v userissue.Store.GetUserIssuesForTraceKeys(ctx, traceKeys, startCommit, endCommit) | v sqluserissuestore.UserIssueStore.GetUserIssuesForTraceKeys() | | Constructs SQL: SELECT ... FROM UserIssues WHERE trace_key IN ('trace1', 'trace2') AND commit_position>=100 AND commit_position<=200 v SQL Database (UserIssues Table) | | Returns rows matching the query v Backend Handler | | Formats response v API Endpoint | v UI (displays issue markers on chart)
The design emphasizes a clear separation of concerns with the Store
interface, allowing for different storage backends if necessary (though SQL is the current and likely long-term choice). The SQL implementation is straightforward, using parameterized queries for security and templates for dynamic query construction where appropriate.
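Condensed into code, the contract looks roughly like the sketch below. The method signatures are taken from the description above; the integer widths chosen for CommitPosition and IssueId are assumptions, and the authoritative definitions are in store.go.

```go
// A condensed sketch of the user-issue contract described above.
package userissuesketch

import "context"

// UserIssue associates one data point (trace key + commit position) with a
// Buganizer issue.
type UserIssue struct {
	UserId         string // Email of the user who made the association.
	TraceKey       string // e.g. ",arch=x86,config=Release,test=MyTest,"
	CommitPosition int64
	IssueId        int64
}

// Store is the persistence contract; sqluserissuestore is the SQL-backed
// implementation.
type Store interface {
	Save(ctx context.Context, req *UserIssue) error
	Delete(ctx context.Context, traceKey string, commitPosition int64) error
	GetUserIssuesForTraceKeys(ctx context.Context, traceKeys []string, startCommitPosition int64, endCommitPosition int64) ([]UserIssue, error)
}
```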
This module defines and implements Temporal workflows for automating tasks related to performance anomaly detection and analysis in Skia Perf. It orchestrates interactions between various services like the AnomalyGroup service, Culprit service, and Gerrit service to achieve end-to-end automation. The primary goal is to streamline the process of identifying performance regressions, finding their root causes (culprits), and notifying relevant parties.
The workflows are designed to be resilient and fault-tolerant, leveraging Temporal's capabilities for retries and state management. This ensures that even if individual steps or external services encounter transient issues, the overall process can continue and eventually complete.
The module is structured into a public API (workflows.go
) and an internal implementation package (internal/
).
workflows.go
:
ProcessCulprit
, MaybeTriggerBisection
): These string constants are the canonical names used to invoke the respective workflows via the Temporal client. Using constants helps avoid typos and ensures consistency.ProcessCulpritParam
, ProcessCulpritResult
, MaybeTriggerBisectionParam
, MaybeTriggerBisectionResult
): These structs define the data that needs to be passed into a workflow and the data that a workflow is expected to return upon completion. They ensure type safety and clarity in communication.internal/
package: This package contains the actual implementation of the workflows and their associated activities. Activities are the building blocks of Temporal workflows, representing individual units of work that can be executed, retried, and timed out independently.
options.go
:
regularActivityOptions
: Defines default options (e.g., 1-minute timeout, 10 retry attempts) for standard activities that are expected to complete quickly, like API calls to other services.childWorkflowOptions
: Defines options for child workflows (e.g., 12-hour execution timeout, 4 retry attempts). This longer timeout accommodates potentially resource-intensive tasks like bisections which involve compilation and testing.maybe_trigger_bisection.go
:
MaybeTriggerBisectionWorkflow
, which is the core logic for deciding whether to automatically find the cause of a performance regression (bisection) or to simply report the anomaly._WAIT_TIME_FOR_ANOMALIES
, e.g., 30 minutes). This allows time for related anomalies to be detected and grouped together, potentially providing a more comprehensive picture before taking action. Wait for more anomalies ->
Load Anomaly Group (Activity) AnomalyGroup Service <---> Workflow
GroupAction
field of the anomaly group: - If BISECT
: a. Load Top Anomaly: Fetches the most significant anomaly within the group. b. Resolve Commit Hashes: Converts the start and end commit positions of the anomaly into Git commit hashes using an activity that interacts with a Gerrit/Crrev service. Get Commit Hashes (Activity) Gerrit/Crrev Service <---> Workflow
c. Launch Bisection (Child Workflow): Triggers a separate CulpritFinderWorkflow
(defined in the pinpoint/go/workflows
module) as a child workflow. This child workflow is responsible for performing the actual bisection. - A unique ID is generated for the Pinpoint job. - The child workflow is configured with ParentClosePolicy: ABANDON
, meaning it will continue running even if this parent workflow terminates. This is crucial because bisections can be long-running. - Callback parameters are passed to the child workflow so it knows how to report its findings back (e.g., which Anomaly Group ID it's associated with, which Culprit service to use). Launch Pinpoint Bisection Workflow -----------------> Pinpoint.CulpritFinderWorkflow (Child)
d. Update Anomaly Group: Records the ID of the launched bisection job back into the AnomalyGroup. Update Anomaly Group with Bisection ID (Activity) AnomalyGroup Service <---> Workflow
- If REPORT
: a. Load Top Anomalies: Fetches a list of the top N anomalies in the group. b. Notify User: Calls an activity that uses the Culprit service to file a bug or send a notification about these anomalies. Notify User of Anomalies (Activity) Culprit Service <--------> Workflow
parseStatisticNameFromChart
, benchmarkStoriesNeedUpdate
, updateStoryDescriptorName
: These functions handle specific data transformations needed to correctly format parameters for the Pinpoint bisection request, often due to legacy conventions or differences in how metrics are named.process_culprit.go
:
ProcessCulpritWorkflow
, which handles the results of a completed bisection (i.e., when one or more culprits are identified).Persist Culprit (Activity) Culprit Service <--------> Workflow
Notify User of Culprit (Activity) Culprit Service <--------> Workflow
ParsePinpointCommit
: Handles the parsing of repository URLs from the Pinpoint commit format (e.g., https://{host}/{project}.git
) into separate host and project components required by the Culprit service.anomalygroup_service_activity.go
:
LoadAnomalyGroupByID
: Fetches an anomaly group by its ID.FindTopAnomalies
: Retrieves the most significant anomalies within a group.UpdateAnomalyGroup
: Updates an existing anomaly group (e.g., to add a bisection ID).culprit_service_activity.go
:
anomalygroup_service_activity.go
, this encapsulates communication with the Culprit service.PeristCulprit
: Stores culprit information.NotifyUserOfCulprit
: Notifies users about a found culprit (e.g., by creating a bug).NotifyUserOfAnomaly
: Notifies users about a set of anomalies (used when the group action is REPORT
).gerrit_service_activity.go
:
GetCommitRevision
: Takes a commit position (as an integer) and returns its corresponding Git hash.worker/main.go
:
main
function sets up the worker, connects it to the Temporal server, and registers the workflows and activities it's capable of handling.localhost.dev
or a production queue name). Workflows and activities are dispatched to workers listening on the correct task queue.ProcessCulpritWorkflow
and MaybeTriggerBisectionWorkflow
with the worker, associating them with their public names (e.g., workflows.ProcessCulprit
).CulpritServiceActivity
, AnomalyGroupServiceActivity
, GerritServiceActivity
) with the worker.1. Anomaly Group Processing and Potential Bisection (MaybeTriggerBisectionWorkflow
)
External Trigger (e.g., new AnomalyGroup created) | v Start MaybeTriggerBisectionWorkflow(AG_ID) | +----------------------------------+ | Wait (e.g., 30 mins) | +----------------------------------+ | v LoadAnomalyGroupByID(AG_ID) ----> AnomalyGroup Service | +-----------+ | GroupAction?| +-----------+ / \ / \ BISECT REPORT | | v v FindTopAnomalies(AG_ID, Limit=1) FindTopAnomalies(AG_ID, Limit=10) | | v v GetCommitRevision(StartCommit) --> Gerrit Anomalies --> Convert to CulpritService format | | v v GetCommitRevision(EndCommit) --> Gerrit NotifyUserOfAnomaly(AG_ID, Anomalies) --> Culprit Service | v Execute Pinpoint.CulpritFinderWorkflow (Child) | (Async, ParentClosePolicy=ABANDON) | Params: {StartHash, EndHash, Config, Benchmark, Story, ... | CallbackParams: {AG_ID, CulpritServiceURL, GroupingTaskQueue}} | v UpdateAnomalyGroup(AG_ID, BisectionID) --> AnomalyGroup Service | v End Workflow
2. Processing Bisection Results (ProcessCulpritWorkflow
)
This workflow is typically triggered as a callback by the Pinpoint CulpritFinderWorkflow
when it successfully identifies a culprit.
Pinpoint.CulpritFinderWorkflow completes | (Calls back to Temporal, invoking ProcessCulpritWorkflow) v Start ProcessCulpritWorkflow(Commits, AG_ID, CulpritServiceURL) | +----------------------------------+ | Convert Pinpoint Commits to | | Culprit Service Format | | (Parse Repository URLs) | +----------------------------------+ | v PersistCulprit(Commits, AG_ID) --------> Culprit Service | (Returns CulpritIDs) v NotifyUserOfCulprit(CulpritIDs, AG_ID) -> Culprit Service | (Returns IssueIDs, e.g., bug numbers) v End Workflow
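Using the Temporal Go SDK, options with the characteristics described in options.go would plausibly look like the following sketch. The values mirror the description above (a 1-minute activity timeout with 10 attempts; a 12-hour child-workflow timeout with 4 attempts and an ABANDON parent-close policy); the actual definitions in the module may differ.

```go
// A sketch of the activity and child-workflow options described above.
package internalsketch

import (
	"time"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// regularActivityOptions suits quick API calls to the AnomalyGroup, Culprit,
// and Gerrit services. Applied with workflow.WithActivityOptions(ctx, ...).
var regularActivityOptions = workflow.ActivityOptions{
	StartToCloseTimeout: time.Minute,
	RetryPolicy: &temporal.RetryPolicy{
		MaximumAttempts: 10,
	},
}

// childWorkflowOptions suits the long-running Pinpoint bisection, which keeps
// running even if the parent workflow goes away. Applied with
// workflow.WithChildOptions(ctx, ...).
var childWorkflowOptions = workflow.ChildWorkflowOptions{
	WorkflowExecutionTimeout: 12 * time.Hour,
	RetryPolicy: &temporal.RetryPolicy{
		MaximumAttempts: 4,
	},
	ParentClosePolicy: enumspb.PARENT_CLOSE_POLICY_ABANDON,
}
```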
The /integration
module provides a dataset and tools for conducting integration tests on the Perf performance monitoring system. Its primary purpose is to offer a controlled and reproducible environment for verifying the ingestion and processing capabilities of Perf.
The core of this module is the data
subdirectory. This directory houses a collection of JSON files, each representing performance data associated with specific commits from the perf-demo-repo
(https://github.com/skia-dev/perf-demo-repo.git). These files are structured according to the format.Format
schema defined in go.skia.org/infra/perf/go/ingest/format
. This standardized format is crucial as it allows Perf's ‘dir’ type ingester to directly consume these files. The dataset is intentionally designed to include a mix of valid data points and specific error conditions:
perf-demo-repo
.demo_data_commit_10.json
) contains a git_hash
that does not correspond to an actual commit in the perf-demo-repo
. This allows testing how Perf handles data associated with unknown or invalid commit identifiers.malformed.json
is intentionally not a valid JSON file. This is used to test Perf's error handling capabilities when encountering incorrectly formatted input data.The generation of these data files is handled by generate_data.go
. This Go program is responsible for creating the JSON files in the data
directory. It uses a predefined list of commit hashes from the perf-demo-repo
and generates random but plausible performance metrics for each. The inclusion of this generator script is important because it allows developers to easily modify, expand, or regenerate the test dataset if the testing requirements change or if new scenarios need to be covered. The script uses math/rand
for generating some variability in the measurement values, ensuring the data isn't entirely static while still being predictable.
The key workflow for utilizing this module in an integration test scenario would look something like this:
/integration/data
directory. Perf Instance --> Ingester (type: 'dir') --> /integration/data/*.json
malformed.json
file.The BUILD.bazel
file defines how the components of this module are built.
data
filegroup
makes the JSON test files available to other parts of the system, specifically for use in performance testing (//perf:__subpackages__
).integration_lib
go_library
encapsulates the logic from generate_data.go
.integration
go_binary
provides an executable to run generate_data.go
, allowing for easy regeneration of the test data.
In essence, the /integration
module provides a self-contained, version-controlled set of test data and a mechanism to regenerate it. This is crucial for ensuring the stability and correctness of Perf's data ingestion pipeline by providing a consistent baseline for integration testing. The choice to include both valid and intentionally erroneous data points allows for comprehensive testing of Perf's data handling capabilities, including its robustness in the face of invalid input.
The /jupyter
module provides tools and examples for interacting with Skia's performance data, specifically data from perf.skia.org
. The primary goal is to enable users to programmatically query, analyze, and visualize performance metrics using the power of Python libraries like Pandas, NumPy, and Matplotlib within a Jupyter Notebook environment.
The core functionality revolves around fetching and processing performance data. This is achieved by providing Python functions that abstract the complexities of interacting with the perf.skia.org
API. This allows users to focus on the data analysis itself rather than the underlying data retrieval mechanisms.
Key Components/Files:
/jupyter/Perf+Query.ipynb
: This is a Jupyter Notebook that serves as both an example and a utility library.
Why: It demonstrates how to use the provided Python functions to query performance data. It also contains the definitions of these key functions, making it a self-contained environment for performance analysis. The notebook format is chosen for its interactive nature, allowing users to execute code snippets, see results immediately, and experiment with different queries and visualizations.
How:
perf_calc(formula)
: This function is designed to evaluate a specific formula against the performance data. It takes a string formula
(e.g., 'count(filter(\"\"))'
) as input. The formula is sent to the perf.skia.org
backend for processing. This function is useful when you need to perform calculations or aggregations on the data directly on the server side before retrieving it.
perf_query(query)
: This function allows for more direct querying of performance data based on key-value pairs. It takes a query string (e.g., 'source_type=skp&sub_result=min_ms'
) that specifies the parameters for data retrieval. This is suitable when you want to fetch raw or filtered trace data.
perf_impl(body)
: This is an internal helper function used by both perf_calc
and perf_query
. It handles the actual HTTP communication with perf.skia.org
. It first determines the time range for the query (typically the last 50 commits by default) by fetching initial page data. Then, it sends the query or formula to the /_/frame/start
endpoint, polls the /_/frame/status
endpoint until the request is successful, and finally retrieves the results from /_/frame/results
. The results are then processed into a Pandas DataFrame, which is a powerful data structure for analysis in Python. A special value 1e32
from the backend (often representing missing or invalid data) is converted to np.nan
(Not a Number) for better handling in Pandas.
paramset()
: This utility function fetches the available parameter set from perf.skia.org
. This is useful for discovering the possible values for different dimensions like ‘model’, ‘test’, ‘cpu_or_gpu’, etc., which can then be used to construct more targeted queries.
Examples: The notebook is rich with examples showcasing how to use perf_calc
and perf_query
, plot the resulting DataFrames using Pandas' built-in plotting capabilities or Matplotlib directly, normalize data, calculate means, and perform more complex analyses like finding the noisiest hardware models or comparing CPU vs. GPU performance for specific tests. These examples serve as practical starting points for users.
Workflow (Simplified perf_impl
):
Client (Jupyter Notebook) -- GET /_/initpage/ --> perf.skia.org (Get time bounds)
perf.skia.org -- Initial Data (JSON) --> Client
Client -- POST /_/frame/start (with query/formula & time bounds) --> perf.skia.org
perf.skia.org -- Request ID (JSON) --> Client
Client -- GET /_/frame/status/{ID} --> perf.skia.org (Loop until 'Success')
perf.skia.org -- Status (JSON) --> Client
Client -- GET /_/frame/results/{ID} --> perf.skia.org
perf.skia.org -- Performance Data (JSON) --> Client
Client (Python): Parse JSON -> Create Pandas DataFrame -> Return DataFrame to user.
/jupyter/README.md
: This file provides instructions on setting up the necessary Python environment to run Jupyter Notebooks and the required libraries (Pandas, SciPy, Matplotlib).
virtualenv
) is recommended to isolate project dependencies and avoid conflicts.pip
, python-dev
, and python-virtualenv
using apt-get
(assuming a Debian-based Linux system). It then shows how to create a virtual environment, activate it, upgrade pip
, and install jupyter
, notebook
, scipy
, pandas
, and matplotlib
within that isolated environment. Finally, it explains how to run the Jupyter Notebook server and deactivate the environment when done. This ensures a reproducible and clean setup for users wanting to utilize the Perf+Query.ipynb
notebook.The design emphasizes ease of use for data analysts and developers who need to interact with Skia's performance data. By leveraging Jupyter Notebooks, it provides an interactive and visual way to explore performance trends and issues. The abstraction of API calls into simple Python functions (perf_calc
, perf_query
) significantly lowers the barrier to entry for accessing this rich dataset.
The /lint
module is responsible for ensuring code quality and consistency within the project by integrating and configuring JSHint, a popular JavaScript linting tool.
The primary goal of this module is to provide a standardized way to identify and report potential errors, stylistic issues, and anti-patterns in the JavaScript codebase. This helps maintain code readability, reduces the likelihood of bugs, and promotes adherence to established coding conventions.
The core component of this module is the reporter.js
file. This file defines a custom reporter function that JSHint will use to format and output the linting results.
The decision to implement a custom reporter stems from the need to present linting errors in a clear, concise, and actionable format. Instead of relying on JSHint's default output, which might be too verbose or not ideally suited for the project's workflow, reporter.js
provides a tailored presentation.
The reporter
function within reporter.js
takes an array of error objects (res
) as input, where each object represents a single linting issue found by JSHint. It then iterates through these error objects and constructs a formatted string for each error. The format chosen is filename:line:character message
, which directly points developers to the exact location of the issue in the source code.
For example: src/myFile.js:10:5 Missing semicolon
This specific format is chosen for its commonality in development tools and its ease of integration with various editors and IDEs, allowing developers to quickly navigate to the reported errors.
After processing all errors, if any were found, the reporter
function aggregates the formatted error strings and prints them to the standard output (process.stdout.write
). Additionally, it appends a summary line indicating the total number of errors found, ensuring that developers have a quick overview of the linting status. The pluralization of “error” vs. “errors” is also handled for grammatical correctness.
The workflow can be visualized as:
JSHint analysis --[error objects]--> reporter.js --[formatted errors & summary]--> stdout
By controlling the output format, this module ensures that linting feedback is consistently presented and easily digestible, contributing to a more efficient development process. The design prioritizes providing actionable information to developers, enabling them to address code quality issues promptly.
This module is responsible for managing SQL database schema migrations for Perf. Perf utilizes SQL backends to store various data, including trace data, shortcuts, and alerts. As the application evolves, the database schema may need to change. This module provides the mechanism to apply these changes and to upgrade existing databases to the schema expected by the current Perf version.
The core of this system relies on the github.com/golang-migrate/migrate/v4
library. This library provides a robust framework for versioning database schemas and applying migrations in a controlled manner.
The key design principle is to have a versioned set of SQL scripts for each supported SQL dialect. This allows Perf to:
Each SQL dialect (e.g., CockroachDB) has its own subdirectory within the /migrations
module. The naming convention for these directories is critical: they must match the values defined in sql.Dialect
.
Inside each dialect-specific directory, migration files are organized by version.
0001_
, 0002_
)..up.
file (e.g., 0001_create_initial_tables.up.sql
): Contains SQL statements to apply the schema changes for that version..down.
file (e.g., 0001_create_initial_tables.down.sql
): Contains SQL statements to revert the schema changes introduced by the corresponding .up.
file.This paired approach ensures that migrations can be applied and rolled back smoothly.
Key Files and Responsibilities:
README.md
: Provides a high-level overview of the migration system, explaining its purpose and the use of the golang-migrate/migrate
library. It also details the directory structure and file naming conventions for migration scripts.cockroachdb/
: This directory contains the migration scripts specifically for the CockroachDB dialect.cockroachdb/0001_create_initial_tables.up.sql
: This is the first migration script for CockroachDB. It defines the initial schema for Perf, creating tables such as TraceValues
, SourceFiles
, ParamSets
, Postings
, Shortcuts
, Alerts
, Regressions
, and Commits
. The table definitions include primary keys, indexes, and column types tailored for efficient data storage and retrieval specific to Perf's needs (e.g., storing trace data, associating traces with source files, managing alert configurations, and tracking commit history). The schema is designed to support the various functionalities of Perf, such as querying traces by parameters, retrieving trace values over commit ranges, and linking regressions to specific alerts and commits.cockroachdb/0001_create_initial_tables.down.sql
: This file is intended to contain SQL statements to drop the tables created by its corresponding .up.
script. However, as a safety precaution against accidental data loss, it is currently empty. The design acknowledges the potential danger of automated table drops in a production environment.cdb.sql
: This is a utility SQL script designed for developers to interact with and test queries against a CockroachDB instance populated with Perf data. It includes sample INSERT
statements to populate tables with test data and various SELECT
queries demonstrating common data retrieval patterns used by Perf. This file is not part of the automated migration process but serves as a helpful tool for development and debugging. It showcases how to query for traces based on parameters, retrieve trace values, find the most recent tile, and get source file information. It also includes examples of more complex queries involving INTERSECT
and JOIN
operations, reflecting the kinds of queries Perf might execute.test.sql
: Similar to cdb.sql
, this script is for testing and experimentation, but it's tailored for a SQLite database. It creates a schema similar to the CockroachDB one (though potentially simplified or with slight variations due to dialect differences) and populates it with test data. It contains a series of CREATE TABLE
, INSERT
, and SELECT
statements that developers can use to quickly set up a local test environment and verify SQL logic.batch-delete.sh
and batch-delete.sql
: These files provide a mechanism for performing batch deletions of specific parameter data from the ParamSets
table in a CockroachDB instance.batch-delete.sql
: Contains the DELETE
SQL statement. It is designed to be edited directly to specify the deletion criteria (e.g., tile_number
, param_key
, param_value
ranges) and the LIMIT
for the number of rows deleted in each batch. This batching approach is crucial for deleting large amounts of data without overwhelming the database or causing long-running transactions.batch-delete.sh
: A shell script that repeatedly executes batch-delete.sql
using the cockroach sql
command-line tool. It runs in a loop with a short sleep interval, allowing for controlled, iterative deletion. This script assumes that a port-forward to the CockroachDB instance is already established. This utility is likely used for data cleanup or maintenance tasks that require removing specific, potentially large, datasets.Migration Workflow (Conceptual):
When Perf starts or when a migration command is explicitly run:
Determine Current Schema Version: The golang-migrate/migrate
library connects to the database and checks the current schema version (often stored in a dedicated migrations table managed by the library itself).
Identify Target Schema Version: This is typically the highest version number found among the migration files for the configured SQL dialect.
Apply Pending Migrations:
- If the current schema version is lower than the target version, the library iteratively executes the `.up.sql` files in ascending order of their version numbers, starting from the version immediately following the current one, up to the target version. - Each successful `.up.` migration updates the schema version in the database. Example: Current Version = 0, Target Version = 2 `DB State (v0) --> Run
0001**.up.sql --> DB State (v1) --> Run 0002**.up.sql --> DB State (v2)`
Rollback Migrations (if needed):
- If a user needs to revert to an older schema version, the library can execute the `.down.sql` files in descending order. Example: Current Version = 2, Target Rollback Version = 0 `DB State (v2) -->
Run 0002**.down.sql --> DB State (v1) --> Run 0001**.down.sql --> DB State (v0)`
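For orientation, this is roughly how the golang-migrate library is driven against a migrations directory like this one. The connection string and directory path are examples only; Perf wires this up through its own maintenance tooling rather than a standalone program like the sketch below.

```go
// A sketch of applying the versioned migrations with golang-migrate/migrate.
package main

import (
	"errors"
	"log"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/cockroachdb" // cockroachdb:// driver
	_ "github.com/golang-migrate/migrate/v4/source/file"          // file:// source
)

func main() {
	m, err := migrate.New(
		"file://migrations/cockroachdb",
		"cockroachdb://root@localhost:26257/perf?sslmode=disable",
	)
	if err != nil {
		log.Fatal(err)
	}
	// Apply every pending .up.sql in version order.
	if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
		log.Fatal(err)
	}
	version, dirty, err := m.Version()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("schema at version %d (dirty=%v)", version, dirty)
}
```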
The BUILD.bazel
file defines a filegroup
named cockroachdb
which bundles all files under the cockroachdb/
subdirectory. This is likely used by other parts of the Perf build system, perhaps to package these migration scripts or make them accessible to the Perf application when it needs to perform migrations.
The modules
directory contains a collection of frontend TypeScript modules that constitute the building blocks of the Perf web application's user interface. These modules primarily define custom HTML elements (web components) and utility functions for various UI functionalities, data processing, and interaction with backend services. The architecture emphasizes modularity, reusability, and a component-based approach, largely leveraging the Lit library for creating custom elements and elements-sk
for common UI widgets.
The design philosophy encourages separation of concerns:
dataframe
and progress
manage data fetching, processing, and state.paramtools
, pivotutil
, cid
, and trybot
provide common functionalities for data manipulation, key parsing, and specific calculations.themes
module ensures a consistent visual appearance, building upon infra-sk
's theming capabilities.json
module defines TypeScript interfaces that mirror backend Go structures, ensuring type safety in client-server communication.This modular structure aims to create a maintainable and scalable frontend codebase. Each module typically includes its core logic, associated styles, demo pages for isolated development and testing, and unit/integration tests.
A significant portion of the modules is dedicated to creating custom HTML elements that serve as interactive UI components. These elements often encapsulate complex behavior and interactions, simplifying their use in higher-level page components.
Data Visualization and Interaction:
- plot-simple-sk: A custom-built canvas-based plotting element for rendering interactive line graphs, optimized for performance with features like dual canvases, Path2D objects, and k-d trees for point proximity.
- plot-google-chart-sk: An alternative plotting element that wraps the Google Charts library, offering a rich set of features and interactivity like panning, zooming, and trace visibility toggling.
- plot-summary-sk: Displays a summary plot (often using Google Charts) and allows users to select a range, which is useful for overview and drill-down scenarios.
- chart-tooltip-sk: Provides a detailed, interactive tooltip for data points on charts, showing commit information, anomaly details, and actions like bisection or requesting traces.
- graph-title-sk: Displays a structured title for graphs, showing key-value parameter pairs associated with the plotted data.
- word-cloud-sk: Visualizes key-value pairs and their frequencies as a textual list with proportional bars.

Alert and Regression Management:
- alert-config-sk: A UI for creating and editing alert configurations, including query definition, detection algorithms, and notification settings.
- alerts-page-sk: A page for viewing, creating, and managing all alert configurations.
- cluster-summary2-sk: Displays a detailed summary of a performance cluster, including a plot, statistics, and triage controls.
- anomalies-table-sk: Renders a sortable and interactive table of detected performance anomalies, allowing for grouping and bulk actions like triage and graphing.
- anomaly-sk: Displays detailed information about a single performance anomaly.
- triage-status-sk: A simple button-like element indicating the current triage status of a cluster and allowing users to initiate the triage process.
- triage-menu-sk: Provides a menu for bulk triage actions on selected anomalies, including assigning bugs or marking them as ignored.
- new-bug-dialog-sk: A dialog for filing new bugs related to anomalies, pre-filling details.
- existing-bug-dialog-sk: A dialog for associating anomalies with existing bug reports.
- user-issue-sk: Manages the association of user-reported Buganizer issues with specific data points.
- bisect-dialog-sk: A dialog for initiating a Pinpoint bisection process to find the commit causing a regression.
- pinpoint-try-job-dialog-sk: A (legacy) dialog for initiating Pinpoint A/B try jobs to request additional traces.
- triage-page-sk: A page dedicated to viewing and triaging regressions based on time range and filters.
- regressions-page-sk: A page for viewing regressions associated with specific “subscriptions” (e.g., sheriff configs).
- subscription-table-sk: Displays details of a subscription and its associated alerts.
- revision-info-sk: Displays information about anomalies detected around a specific revision.

Data Input and Selection:
- query-sk: A comprehensive UI for constructing complex queries by selecting parameters and their values.
- paramset-sk: Displays a set of parameters and their values, often used to summarize a query or data selection.
- query-chooser-sk: Combines paramset-sk (for summary) and query-sk (in a dialog) for a compact query selection experience.
- query-count-sk: Shows the number of items matching a given query, fetching this count from a backend endpoint.
- commit-detail-picker-sk: Allows users to select a specific commit from a range, typically presented in a dialog with date range filtering.
- commit-detail-panel-sk: Displays a list of commit details, making them selectable.
- commit-detail-sk: Displays information about a single commit with action buttons.
- calendar-input-sk: A date input field combined with a calendar picker dialog.
- calendar-sk: A standalone interactive calendar widget.
- day-range-sk: Allows selection of a “begin” and “end” date.
- domain-picker-sk: Allows selection of a data domain either by date range or by a number of recent commits.
- test-picker-sk: A guided, multi-step picker for selecting tests or traces by sequentially choosing parameter values.
- picker-field-sk: A text input field with a filterable dropdown menu of predefined options, built using Vaadin ComboBox.
- algo-select-sk: A dropdown for selecting a clustering algorithm.
- split-chart-menu-sk: A menu for selecting an attribute by which to split a chart.
- pivot-query-sk: A UI for configuring pivot table requests (group by, operations, summaries).
- triage2-sk: A set of three buttons for selecting a triage status (positive, negative, untriaged).
- tricon2-sk: An icon that visually represents one of the three triage states.

Data Display and Structure:
- pivot-table-sk: Displays pivoted DataFrame data in a sortable table.
- json-source-sk: A dialog for viewing the raw JSON source data for a specific trace point.
- ingest-file-links-sk: Displays relevant links (e.g., to Swarming, Perfetto) associated with an ingested data point.
- point-links-sk: Displays links from ingestion files and generates commit range links between data points.
- commit-range-sk: Dynamically generates a URL to a commit range viewer based on begin and end commits.

Scaffolding and Application Structure:
- perf-scaffold-sk: Provides the consistent layout, header, and navigation sidebar for all Perf application pages.
- explore-simple-sk: The core element for exploring and visualizing performance data, including querying, plotting, and anomaly interaction.
- explore-sk: Wraps explore-simple-sk, adding features like user authentication, default configurations, and optional integration with test-picker-sk.
- explore-multi-sk: Allows displaying and managing multiple explore-simple-sk graphs simultaneously, with shared controls and shortcut management.
- favorites-dialog-sk: A dialog for adding or editing bookmarked “favorites” (named URLs).
- favorites-sk: Displays and manages a user's list of favorites.

Backend Interaction and Data Processing Utilities:
- cid/cid.ts: Provides lookupCids to fetch detailed commit information based on commit numbers.
- common/plot-builder.ts & common/plot-util.ts: Utilities for transforming DataFrame and TraceSet data into formats suitable for plotting libraries (especially Google Charts) and for creating consistent chart options.
- common/test-util.ts: Sets up mocked API responses (fetch-mock) for various backend endpoints, facilitating isolated testing and demo page development.
- const/const.ts: Defines shared constants, notably MISSING_DATA_SENTINEL for representing missing data points, ensuring consistency with the backend.
- csv/index.ts: Converts DataFrame objects into CSV format for data export.
- dataframe/index.ts & dataframe/dataframe_context.ts: Core logic for managing and manipulating DataFrame objects. DataFrameRepository (a LitElement context provider) handles fetching, caching, merging, and providing DataFrame and DataTable objects to consuming components.
- dataframe/traceset.ts: Utilities for extracting and formatting information from trace keys within DataFrames/DataTables, such as generating chart titles and legends.
- errorMessage/index.ts: A wrapper around elements-sk's errorMessage to display persistent error messages by default.
- json/index.ts: Contains TypeScript interfaces and types that define the structure of JSON data exchanged with the backend, crucial for type safety and often auto-generated from Go structs.
- paramtools/index.ts: Client-side utilities for creating, parsing, and manipulating ParamSet objects and structured trace keys (e.g., makeKey, fromKey, queryFromKey).
- pivotutil/index.ts: Utilities for validating pivot table requests (pivot.Request) and providing descriptions for pivot operations.
- progress/progress.ts: Implements startRequest for initiating and polling the status of long-running server-side tasks, providing progress updates to the UI.
- trace-details-formatter/traceformatter.ts: Provides TraceFormatter implementations (default and Chrome-specific) for converting trace parameter sets to display strings and vice-versa for querying.
- trybot/calcs.ts: Calculates and aggregates stddevRatio values from Perf trybot results, grouping them by parameter to identify performance impacts.
- trybot-page-sk: A page for analyzing performance regressions based on commit or trybot run, using trybot/calcs for analysis.
- window/index.ts: Utilities related to the browser window object, including parsing build tag information from window.perf.image_tag.

Core Architectural Patterns:
- Custom elements extend ElementSk from infra-sk, using declarative templating (lit-html) and reactive updates.
- stateReflector (from infra-sk) is frequently used to synchronize component state with URL query parameters, enabling bookmarking and shareable views (e.g., alerts-page-sk, explore-simple-sk, triage-page-sk).
- Lit contexts (@lit/context) are used for providing shared data down the component tree without prop drilling, notably in dataframe/dataframe_context.ts for DataFrame objects.
- Components communicate through custom events (e.g., query-sk emits query-change, triage-status-sk emits start-triage).
- The fetch API is used for backend communication. Promises and async/await are standard for handling these asynchronous operations. Spinners (spinner-sk) provide user feedback during loading.
- Each module is a separate build target with its own BUILD.bazel files. This allows for better organization and easier maintenance.
- Modules ship with demo pages (*-demo.html, *-demo.ts) for isolated development and visual testing, Karma unit tests (*_test.ts), and Puppeteer end-to-end/screenshot tests (*_puppeteer_test.ts). fetch-mock is extensively used in demos and tests to simulate backend responses.

This comprehensive set of modules forms a rich ecosystem for building and maintaining the Perf application's frontend, with a strong emphasis on modern web development practices and reusability.
The alert
module is responsible for validating the configuration of alerts within the Perf system. Its primary function is to ensure that alert definitions adhere to a set of predefined rules, guaranteeing their proper functioning and preventing errors. This module plays a crucial role in maintaining the reliability of the alerting system by catching invalid configurations before they are deployed.
The core design principle behind this module is simplicity and focused responsibility. Instead of incorporating complex validation logic directly into other parts of the system (like the UI or backend services that handle alert creation/modification), this module provides a dedicated, reusable validation function. This promotes modularity and makes the validation logic easier to maintain and update.
The choice of using a simple function (validate
) that returns a string (empty for valid, error message for invalid) is intentional. This approach is straightforward to understand and integrate into various parts of the application. It avoids throwing exceptions for validation failures, which can sometimes complicate control flow, and instead provides clear, human-readable feedback.
The current validation is intentionally minimal, focusing on the essential requirement of a non-empty query. This is a pragmatic approach, starting with the most critical validation and allowing for the addition of more complex rules as the system evolves. The dependency on //perf/modules/json:index_ts_lib
indicates that the structure of an Alert
is defined externally, and this module consumes that definition.
index.ts: This is the central file of the module. It contains the logic for validating Alert configurations and exposes the validate(alert: Alert): string function:
- It takes an Alert object (as defined in the ../json module) as input.
- It performs checks against the alert object. Currently, it verifies that the query property of the Alert is present and not an empty string.
- It returns an empty string if the Alert configuration is valid. If any check fails, it returns a string containing a descriptive error message indicating why the Alert is considered invalid. This message is intended to be user-friendly and help in correcting the configuration.

Alert Validation Workflow:
External System (e.g., UI, API) -- Passes Alert object --> [alert/index.ts: validate()] | V [ Is alert.query non-empty? ] | +--------------------------+--------------------------+ | (Yes) | (No) V V [ Returns "" (empty string) ] [ Returns "An alert must have a non-empty query." ] | | V V External System <-- Receives validation result -- [ Interprets result (valid/invalid) ]
This workflow illustrates how an external system would interact with the validate
function. The external system provides an Alert
object, and the validate
function returns a string. The external system then uses this string to determine if the alert configuration is valid and can proceed accordingly (e.g., save the alert, display an error to the user).
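A minimal sketch of this validator, consistent with the workflow above (the import path and the `query` field follow the description in this section; treat the snippet as illustrative rather than the exact source):

```ts
import { Alert } from '../json';

// Returns '' when the Alert is valid, otherwise a human-readable error message.
export function validate(alert: Alert): string {
  if (!alert.query) {
    return 'An alert must have a non-empty query.';
  }
  return '';
}
```

Because the result is a plain string, callers can decide for themselves whether to block a save, surface the message inline, or both.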
The alert-config-sk
module provides a custom HTML element, <alert-config-sk>
, designed for creating and editing alert configurations within the Perf application. This element serves as a user interface for defining the conditions under which an alert should be triggered, how regressions are detected, and where notifications should be sent.
Core Functionality and Design:
The primary goal of alert-config-sk
is to offer a comprehensive yet user-friendly way to manage alert settings. It encapsulates all the necessary input fields and logic for defining an Alert
object, which is a central data structure in Perf for representing alert configurations.
Key design considerations include:
window.perf.notifications
, window.perf.display_group_by
, window.perf.need_alert_action
). This allows the same component to present different options depending on the specific Perf instance‘s configuration or the user’s context. For example, the notification options (email vs. issue tracker) and the visibility of “Group By” settings can change._config
object, and changes to the element's properties (like config
, paramset
) trigger re-renders.query-chooser-sk
for selecting traces, algo-select-sk
for choosing clustering algorithms, and various elements-sk
components (e.g., select-sk
, multi-select-sk
, checkbox-sk
) for standard UI inputs. This promotes consistency and reduces redundant code.Key Components and Files:
alert-config-sk.ts
: This is the heart of the module, defining the AlertConfigSk
class which extends ElementSk
.config
: An Alert
object representing the current alert configuration being edited. This is the primary data model for the component.paramset
: A ParamSet
object providing the available parameters and their values for constructing queries (used by query-chooser-sk
).key_order
: An array of strings dictating the preferred order of keys in the query-chooser-sk
.template
static method): Uses lit-html
to define the structure and content of the element. It dynamically renders sections based on the current configuration and global settings (e.g., window.perf.notifications
).query-change
from query-chooser-sk
, selection-changed
from select-sk
) to update the _config
object.thresholdDescriptors
object maps step detection algorithms to their corresponding units and descriptive labels, ensuring the “Threshold” input field is always relevant.?
operator in lit-html or if
statements in helper functions like _groupBy
) is used to show/hide UI elements based on window.perf
flags.testBugTemplate()
: Sends a POST
request to /_/alert/bug/try
to test the configured bug URI template.testAlert()
: Sends a POST
request to /_/alert/notify/try
to test the alert notification setup.toDirection()
, toConfigState()
: Convert string values from UI selections to the appropriate enum types for the Alert
object.indexFromStep()
: Determines the correct selection index for the “Step Detection” dropdown based on the current _config.step
value.alert-config-sk.scss
: Contains the SASS styles for the element, ensuring a consistent look and feel within the Perf application. It imports styles from themes_sass_lib
and buttons_sass_lib
for theming and button styling.alert-config-sk-demo.html
and alert-config-sk-demo.ts
: Provide a demonstration page for the alert-config-sk
element.alert-config-sk
and buttons to manipulate global window.perf
settings, allowing developers to test different UI states of the component.paramset
and config
data, and provides event listeners for the control buttons to refresh the alert-config-sk
component and display its current state. This is crucial for development and testing.alert-config-sk_puppeteer_test.ts
: Contains Puppeteer tests for the component. These tests verify that the component renders correctly in different states (e.g., with/without group_by, different notification options) by interacting with the demo page and taking screenshots.index.ts
: A simple entry point that imports and thereby registers the alert-config-sk
custom element, making it available for use in HTML.Workflow Example: Editing an Alert
Initialization:
- alert-config-sk is added to the DOM.
- The paramset property is set, providing the available trace parameters.
- The config property is set with the Alert object to be edited (or a default new configuration).
- window.perf settings influence which UI sections are initially visible.

User Interaction:
- The user edits fields such as the Query (via query-chooser-sk), Grouping (via algo-select-sk), Step Detection, Threshold, etc.
- When a dropdown selection changes (e.g., in a select-sk), the inner element emits selection-changed.
- alert-config-sk listens for this event, and the handler in alert-config-sk.ts updates the corresponding property in its internal _config object (e.g., this._config.step = newStepValue).

User interacts with <select-sk id="step"> | V <select-sk> emits 'selection-changed' event | V AlertConfigSk.stepSelectionChanged(event) is called | V this._config.step is updated | V this._render() is (indirectly) called by Lit | V UI updates, e.g., label for "Threshold" input changes
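A condensed sketch of that update step (the types and option list below are simplified placeholders, not the real definitions; the actual element mutates its internal _config and calls this._render()):

```ts
// Simplified placeholders; the real Alert type comes from //perf/modules/json.
type StepDetection = string;
interface AlertConfigLite { step: StepDetection; }

// Hypothetical ordered list mirroring the dropdown's entries.
const STEP_OPTIONS: StepDetection[] = ['option-a', 'option-b', 'option-c'];

// Map the index carried by select-sk's 'selection-changed' event back onto the
// alert configuration; indexFromStep() performs the inverse mapping in the element.
function applyStepSelection(
  config: AlertConfigLite,
  e: CustomEvent<{ selection: number }>
): AlertConfigLite {
  return { ...config, step: STEP_OPTIONS[e.detail.selection] ?? config.step };
}
```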
Testing Configuration (Optional):
AlertConfigSk.testBugTemplate()
is called./_/alert/bug/try
.AlertConfigSk.testAlert()
is called./_/alert/notify/try
.Saving Changes:
alert-config-sk
is responsible for retrieving the updated config
object from the alert-config-sk
element (e.g., element.config
) and persisting it (e.g., by sending it to a backend API). alert-config-sk
itself does not handle the saving of the configuration to a persistent store.This element aims to simplify the complex task of configuring alerts by providing a structured and reactive interface, abstracting away the direct manipulation of the underlying Alert
JSON object for the end-user.
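A sketch of how a parent page might persist the edited configuration; /_/alert/update is the endpoint used by alerts-page-sk (described next), and the import path is illustrative:

```ts
import { AlertConfigSk } from './alert-config-sk';

// Pull the edited Alert back out of the element and send it to the backend.
async function saveAlert(element: AlertConfigSk): Promise<void> {
  const cfg = element.config;
  const resp = await fetch('/_/alert/update', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(cfg),
  });
  if (!resp.ok) {
    throw new Error(`Failed to save alert: ${resp.statusText}`);
  }
}
```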
The alerts-page-sk
module provides a user interface for managing and configuring alerts within the Perf application. Users can view, create, edit, and delete alert configurations. The page displays existing alerts in a table and provides a dialog for detailed configuration of individual alerts. It interacts with a backend API to fetch and persist alert data.
Why a dedicated page for alerts? Centralizing alert management provides a clear and focused interface for users responsible for monitoring performance metrics. This separation of concerns simplifies the overall application structure and user experience.
How are alerts displayed and managed? Alerts are displayed in a tabular format, offering a quick overview of key information like name, query, owner, and status. Icons are used for common actions like editing and deleting, enhancing usability. A modal dialog, utilizing the <dialog>
HTML element and the alert-config-sk
component, is employed for focused editing of individual alert configurations. This approach avoids cluttering the main page and provides a dedicated space for detailed settings.
Why use Lit for templating? Lit is used for its efficient rendering and component-based architecture. This allows for a declarative way to define the UI and manage its state, making the code more maintainable and easier to understand. The use of html
tagged template literals provides a clean and JavaScript-native way to write templates.
How is user authorization handled? The page checks if the logged-in user has an ‘editor’ role. This is determined by fetching the user‘s status from /_/login/status
. Editing and creation functionalities are disabled if the user lacks the necessary permissions, preventing unauthorized modifications. The logged-in user’s email is also pre-filled as the owner for new alerts.
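A sketch of that permission check, assuming the /_/login/status response exposes the user's email and roles (the exact shape comes from the shared login infrastructure):

```ts
// Assumed response shape; the real type is provided by the login infrastructure.
interface LoginStatus {
  email: string;
  roles: string[];
}

async function fetchEditorStatus(): Promise<{ isEditor: boolean; email: string }> {
  const resp = await fetch('/_/login/status');
  const status = (await resp.json()) as LoginStatus;
  return { isEditor: status.roles.includes('editor'), email: status.email };
}
```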
Why is fetch-mock
used in the demo? fetch-mock
is utilized in the demo (alerts-page-sk-demo.ts
) to simulate backend API responses. This allows for isolated testing and development of the frontend component without requiring a running backend. It enables developers to define expected responses for various API endpoints, facilitating a predictable environment for UI development and testing.
How are API interactions handled? The component uses the fetch
API to communicate with the backend. Helper functions like jsonOrThrow
and okOrThrow
are used to simplify response handling and error management. Specific endpoints are used for listing (/_/alert/list/...
), creating (/_/alert/new
), updating (/_/alert/update
), and deleting (/_/alert/delete/...
) alerts.
Why distinguish between “Alert” and “Component” in the UI? The UI adapts to display either an “Alert” field or an “Issue Tracker Component” field based on the window.perf.notifications
global setting. This allows the application to integrate with different notification systems. If markdown_issuetracker
is configured, it links directly to the relevant issue tracker component.
alerts-page-sk.ts
: This is the core TypeScript file defining the AlertsPageSk
custom element.
connectedCallback()
: Initializes the component by fetching initial data (paramset and alert list).list()
: Fetches and re-renders the list of alerts.add()
: Initiates the creation of a new alert by fetching a default configuration from the server and opening the edit dialog.edit()
: Opens the edit dialog for an existing alert.accept()
: Handles the submission of changes from the edit dialog, sending an update request to the server.delete()
: Sends a request to the server to delete an alert.openOnLoad()
: Checks the URL for an alert ID on page load and, if present, opens the edit dialog for that specific alert. This allows for direct linking to an alert's configuration.alerts
: An array holding the currently displayed alert configurations._cfg
: The Alert
object currently being edited in the dialog.isEditor
: A boolean indicating if the current user has editing privileges.dialog
: A reference to the HTML <dialog>
element used for editing.alertconfig
: A reference to the alert-config-sk
element within the dialog.alerts-page-sk.scss
: Contains the SASS/CSS styles for the alerts-page-sk
element.
alerts-page-sk-demo.ts
: Provides a demonstration and development environment for the alerts-page-sk
component.
fetch-mock
to simulate backend API responses for /login/status
, /_/count/
, /_/alert/update
, /_/alert/list/...
, /_/initpage/
, and /_/alert/new
. This allows the component to be developed and tested in isolation.window.perf
properties that might affect the component's behavior (e.g., key_order
, display_group_by
, notifications
).alerts-page-sk
elements into the demo HTML page.alerts-page-sk-demo.html
: The HTML structure for the demo page.
alerts-page-sk
component is rendered for demonstration purposes. Includes an <error-toast-sk>
for displaying error messages.alerts-page-sk_puppeteer_test.ts
: Contains Puppeteer tests for the alerts-page-sk
component.
index.ts
: A simple entry point that imports and thereby registers the alerts-page-sk
custom element.
1. Viewing Alerts:
User navigates to the alerts page | V alerts-page-sk.connectedCallback() | +----------------------+ | | V V fetch('/_/initpage/') fetch('/_/alert/list/false') // Fetch paramset and initial alert list | | V V Update `paramset` Update `alerts` array | | +----------------------+ | V _render() // Lit renders the table with alerts
2. Creating a New Alert:
User clicks "New" button (if isEditor === true) | V alerts-page-sk.add() | V fetch('/_/alert/new') // Get a template for a new alert | V Update `cfg` with the new alert template (owner set to current user) | V dialog.showModal() // Show the alert-config-sk dialog | V User fills in alert details in alert-config-sk | V User clicks "Accept" | V alerts-page-sk.accept() | V cfg = alertconfig.config // Get updated config from alert-config-sk | V fetch('/_/alert/update', { method: 'POST', body: JSON.stringify(cfg) }) // Send new alert to backend | V alerts-page-sk.list() // Refresh the alert list
3. Editing an Existing Alert:
User clicks "Edit" icon next to an alert (if isEditor === true) | V alerts-page-sk.edit() with the selected alert's data | V Set `origCfg` (deep copy of current `cfg`) Set `cfg` to the selected alert's data | V dialog.showModal() // Show the alert-config-sk dialog pre-filled with alert data | V User modifies alert details in alert-config-sk | V User clicks "Accept" | V alerts-page-sk.accept() | V cfg = alertconfig.config // Get updated config | V IF JSON.stringify(cfg) !== JSON.stringify(origCfg) THEN fetch('/_/alert/update', { method: 'POST', body: JSON.stringify(cfg) }) // Send updated alert | V alerts-page-sk.list() // Refresh list ENDIF
4. Deleting an Alert:
User clicks "Delete" icon next to an alert (if isEditor === true) | V alerts-page-sk.delete() with the selected alert's ID | V fetch('/_/alert/delete/{alert_id}', { method: 'POST' }) // Send delete request | V alerts-page-sk.list() // Refresh the alert list
5. Toggling “Show Deleted Configs”:
User clicks "Show deleted configs" checkbox | V alerts-page-sk.showChanged() | V Update `showDeleted` property based on checkbox state | V alerts-page-sk.list() // Fetches alerts based on the new `showDeleted` state
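A sketch of the list() fetch implied by these workflows; jsonOrThrow is the shared helper mentioned earlier, the import paths are indicative, and mapping showDeleted directly into the URL is assumed from the /_/alert/list/false example above:

```ts
import { jsonOrThrow } from '../../../infra-sk/modules/jsonOrThrow/jsonOrThrow';
import { Alert } from '../json';

// Fetch the alert list, including soft-deleted configs when showDeleted is true.
async function listAlerts(showDeleted: boolean): Promise<Alert[]> {
  return jsonOrThrow(await fetch(`/_/alert/list/${showDeleted}`));
}
```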
The algo-select-sk
module provides a custom HTML element that allows users to select a clustering algorithm. This component is crucial for applications where different clustering approaches might yield better results depending on the data or the analytical goal.
The core purpose of this module is to present a user-friendly way to switch between available clustering algorithms, specifically “k-means” and “stepfit”. It encapsulates the selection logic and emits an event when the chosen algorithm changes, allowing other parts of the application to react accordingly.
The “why” behind this module is the need for a standardized and reusable UI component for algorithm selection. Instead of each part of an application implementing its own dropdown or radio buttons for algorithm choice, algo-select-sk
provides a consistent look and feel.
The “how” involves leveraging the select-sk
custom element from the elements-sk
library to provide the actual dropdown functionality. algo-select-sk
builds upon this by:
algo
attribute (and corresponding property) to store and reflect the currently selected algorithm.algo-change
event with the new algorithm in the detail
object. This decoupling allows other components to listen for changes without direct dependencies on algo-select-sk
.The choice to use select-sk
as a base provides a consistent styling and behavior aligned with other elements in the Skia infrastructure.
algo-select-sk.ts
: This is the heart of the module.AlgoSelectSk
class: This ElementSk
subclass defines the custom element's behavior.template
: Uses lit-html
to render the underlying select-sk
element with predefined div
elements representing the algorithm options (“K-Means” and “Individual” which maps to “stepfit”). The selected
attribute on these divs is dynamically updated based on the current algo
property.connectedCallback
and attributeChangedCallback
: Ensure the element renders correctly when added to the DOM or when its algo
attribute is changed programmatically._selectionChanged
method: This is the event handler for the selection-changed
event from the inner select-sk
element. When triggered, it updates the algo
property of algo-select-sk
and then dispatches the algo-change
custom event. This is the primary mechanism for communicating the selected algorithm to the outside world. User interacts with <select-sk> | V <select-sk> emits 'selection-changed' event | V AlgoSelectSk._selectionChanged() is called | V Updates internal 'algo' property | V Dispatches 'algo-change' event with { algo: "new_value" }
algo
getter/setter: Provides a programmatic way to get and set the selected algorithm. The setter ensures that only valid algorithm values (‘kmeans’ or ‘stepfit’) are set, defaulting to ‘kmeans’ for invalid inputs. This adds a layer of robustness.toClusterAlgo
function: A utility function to validate and normalize the input string to one of the allowed ClusterAlgo
types. This prevents invalid algorithm names from being propagated.AlgoSelectAlgoChangeEventDetail
interface: Defines the structure of the detail
object for the algo-change
event, ensuring type safety for event consumers.algo-select-sk.scss
: Provides minimal styling, primarily ensuring that the cursor is a pointer when hovering over the element, indicating interactivity. It imports shared color and theme styles.index.ts
: A simple entry point that imports algo-select-sk.ts
, ensuring the custom element is defined and available for use when the module is imported.algo-select-sk-demo.html
and algo-select-sk-demo.ts
: These files provide a demonstration page for the algo-select-sk
element.algo-select-sk
, including one with a pre-selected algorithm and one in dark mode, to showcase its appearance.algo-change
event from one of the instances and displays the event detail in a <pre>
tag. This serves as a live example of how to consume the event.algo-select-sk_puppeteer_test.ts
: Contains Puppeteer tests to verify the component renders correctly and basic functionality. It checks for the presence of the elements on the demo page and takes a screenshot for visual regression testing.The component is designed to be self-contained and easy to integrate. By simply including the element in HTML and listening for the algo-change
event, developers can incorporate algorithm selection functionality into their applications.
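Consuming the element is then just a matter of registering it and listening for that event; a sketch (import paths assume use from within this module):

```ts
import './index'; // registers <algo-select-sk>
import { AlgoSelectAlgoChangeEventDetail } from './algo-select-sk';

const selector = document.createElement('algo-select-sk');
document.body.appendChild(selector);

selector.addEventListener('algo-change', (e: Event) => {
  const detail = (e as CustomEvent<AlgoSelectAlgoChangeEventDetail>).detail;
  console.log('Clustering algorithm is now', detail.algo); // 'kmeans' or 'stepfit'
});
```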
anomalies-table-sk
)The anomalies-table-sk
module provides a custom HTML element for displaying a sortable and interactive table of performance anomalies. Its primary purpose is to present anomaly data in a clear, actionable format, allowing users to quickly identify, group, triage, and investigate performance regressions or improvements.
Anomaly
objects in a tabular format. Each row represents an anomaly and displays key information such as bug ID, revision range, test path, and metrics like delta percentage and absolute delta.triage-menu-sk
to allow users to assign bug IDs, mark anomalies as invalid or ignored, or reset their triage state./u/?anomalyIDs=...
)./m/...
)./u/?bugID=...
).groupAnomalies
method iterates through the anomaly list, merging anomalies into existing groups if their revision ranges intersect, or creating new groups otherwise.sort-sk
element, which observes changes to data attributes on the table rows. This avoids server roundtrips for simple sorting operations.this._render()
) only when necessary, such as when data changes, groups are expanded/collapsed, or selections are updated. This improves performance.AnomalyGroup
Class: A simple AnomalyGroup
class is used to manage collections of related anomalies and their expanded state. This provides a clear structure for handling grouped data.showPopup
boolean property.anomalies_checked
when the selection state of an anomaly changes. This allows parent components or other parts of the application to react to user selections./_anomalies/group_report
backend API. This API is designed to provide a consolidated view or a shared identifier (sid
) for a group of anomalies, which is then used to construct the graph URL. This is preferred over constructing potentially very long URLs with many individual anomaly IDs.group_report
API to provide context (one week before and after the anomaly) in the graph.ChromeTraceFormatter
to correctly format trace queries for linking to the graph explorer.themes_sass_lib
, buttons_sass_lib
, and select_sass_lib
for a consistent look and feel. Specific styles handle the appearance of regression vs. improvement, expanded rows, and the triage popup.anomalies-table-sk.ts
: This is the core file containing the LitElement class definition for AnomaliesTableSk
. It implements all the logic for rendering the table, handling user interactions, grouping anomalies, and interacting with backend services for triage and graphing.populateTable(anomalyList: Anomaly[])
: The primary method to load data into the table. It triggers grouping and rendering.generateTable()
, generateGroups()
, generateRows()
: Template methods responsible for constructing the HTML structure of the table using lit-html
.groupAnomalies()
: Implements the logic for grouping anomalies based on overlapping revision ranges.openReport()
: Handles the logic for generating a URL to graph the selected anomalies, potentially calling the /_anomalies/group_report
API.togglePopup()
: Manages the visibility of the triage menu popup.anomalyChecked()
: Handles checkbox state changes and updates the checkedAnomaliesSet
.openMultiGraphUrl()
: Constructs the URL for viewing an anomaly's trend in the multi-graph explorer, fetching time range context via an API call.anomalies-table-sk.scss
: Contains the SCSS styles specific to the anomalies table, defining its layout, appearance, and the styling for different states (e.g., improvement, regression, expanded rows).index.ts
: A simple entry point that imports and registers the anomalies-table-sk
custom element.anomalies-table-sk-demo.ts
and anomalies-table-sk-demo.html
: Provide a demonstration page for the component, showcasing its usage with sample data and interactive buttons to populate the table and retrieve checked anomalies. The demo also sets up a global window.perf
object with configuration typically provided by the Perf application environment.1. Displaying and Grouping Anomalies:
[User Action: Page Load with Anomaly Data] | v AnomaliesTableSk.populateTable(anomalyList) | v AnomaliesTableSk.groupAnomalies() |-> For each Anomaly in anomalyList: | |-> Try to merge with existing AnomalyGroup (if revision ranges intersect) | |-> Else, create new AnomalyGroup | v AnomaliesTableSk._render() | v [DOM Update: Table is rendered with grouped anomalies, groups initially collapsed]
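The grouping rule itself is interval intersection over revision ranges; a minimal sketch with simplified fields (start_revision and end_revision mirror the revision fields on Anomaly, and "first intersecting group wins" is an assumption about tie-breaking):

```ts
interface AnomalyLike {
  start_revision: number;
  end_revision: number;
}

class AnomalyGroup<T extends AnomalyLike> {
  anomalies: T[] = [];
  expanded = false;

  intersects(a: AnomalyLike): boolean {
    // Two revision ranges overlap when neither ends before the other starts.
    return this.anomalies.some(
      (b) => a.start_revision <= b.end_revision && b.start_revision <= a.end_revision
    );
  }
}

function groupAnomalies<T extends AnomalyLike>(anomalies: T[]): AnomalyGroup<T>[] {
  const groups: AnomalyGroup<T>[] = [];
  for (const a of anomalies) {
    const target = groups.find((g) => g.intersects(a)) ?? new AnomalyGroup<T>();
    if (target.anomalies.length === 0) groups.push(target); // newly created group
    target.anomalies.push(a);
  }
  return groups;
}
```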
2. Selecting and Triaging Anomalies:
[User Action: Clicks checkbox for an anomaly or group] | v AnomaliesTableSk.anomalyChecked() or AnomalySk.toggleChildrenCheckboxes() |-> Updates `checkedAnomaliesSet` |-> Updates header checkbox state if needed |-> Emits 'anomalies_checked' event |-> Enables/Disables "Triage" and "Graph" buttons based on selection | v [User Action: Clicks "Triage" button (if enabled)] | v AnomaliesTableSk.togglePopup() |-> Shows TriageMenuSk popup |-> TriageMenuSk.setAnomalies(checkedAnomalies) | v [User interacts with TriageMenuSk (e.g., assigns bug, marks invalid)] | v TriageMenuSk makes API request (e.g., to /_/triage) | v [Application reloads data or updates table based on triage result]
3. Graphing Selected Anomalies:
[User Action: Selects one or more anomalies] | v [User Action: Clicks "Graph" button (if enabled)] | v AnomaliesTableSk.openReport() | |--> If single anomaly selected: | |-> window.open(`/u/?anomalyIDs={id}`, '_blank') | |--> If multiple anomalies selected: |-> Call fetchGroupReportApi(idString) | |-> POST to /_/anomalies/group_report with anomaly IDs | |-> Receives response with `sid` (shared ID) | |-> window.open(`/u/?sid={sid}`, '_blank')
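A hedged sketch of that branching (the endpoints, query parameters, and sid field follow the flow above; the request body shape is an assumption):

```ts
// Open the explorer for the selected anomalies, using a shared ID for large selections.
async function openReport(selectedIds: string[]): Promise<void> {
  if (selectedIds.length === 1) {
    window.open(`/u/?anomalyIDs=${selectedIds[0]}`, '_blank');
    return;
  }
  const resp = await fetch('/_/anomalies/group_report', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ anomalyIDs: selectedIds.join(',') }), // assumed request shape
  });
  const { sid } = (await resp.json()) as { sid: string };
  window.open(`/u/?sid=${sid}`, '_blank');
}
```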
4. Expanding/Collapsing an Anomaly Group:
[User Action: Clicks expand/collapse button on a group row] | v AnomaliesTableSk.expandGroup(anomalyGroup) |-> Toggles `anomalyGroup.expanded` boolean | v AnomaliesTableSk._render() | v [DOM Update: Rows within the group are shown or hidden]
The anomaly-sk
module provides a custom HTML element <anomaly-sk>
and related functionalities for displaying details about performance anomalies. It's designed to present information about a specific anomaly, including its severity, the affected revision range, and a link to the associated bug report. A key utility function, getAnomalyDataMap
, is also provided to process raw anomaly data into a format suitable for plotting.
Key Responsibilities and Components:
anomaly-sk.ts
: This is the core file defining the <anomaly-sk>
custom element.
ElementSk
and uses the lit-html
library for templating. It accepts an Anomaly
object as a property and dynamically renders a table displaying information like the score before and after the anomaly, percentage change, revision range, improvement status, and bug ID.lookupCids
function from the cid
module to construct a clickable link to the commit range.bug_host_url
property allows customization of the bug tracker URL.formatRevisionRange
method asynchronously fetches commit hashes for the start and end revisions of the anomaly to create a link to the commit range view. If window.perf.commit_range_url
is not defined, it simply displays the revision numbers.getAnomalyDataMap
(function in anomaly-sk.ts
):
plot-simple-sk
. This function bridges the gap between the raw data representation and the visual representation of anomalies on a graph.TraceSet
(a collection of traces), ColumnHeader[]
(representing commit points on the x-axis), an AnomalyMap
(mapping trace IDs and commit IDs to Anomaly
objects), and a list of highlight_anomalies
IDs.TraceSet
. If a trace has anomalies listed in the AnomalyMap
, it then iterates through those anomalies.cid
) with the offset
in the ColumnHeader
. A crucial detail is that if an exact commit ID match isn’t found in the header (e.g., due to a data upload failure for that specific commit), it will associate the anomaly with the next available commit point. This ensures that anomalies are still visualized even if their precise commit data point is missing, rather than being omitted entirely.highlight_anomalies
input.AnomalyData
objects, each containing the x
, y
coordinates, the Anomaly
object itself, and a highlight
flag.Input: TraceSet: { "traceA": [10, 12, 15*], ... } (*value at commit 101) Header: [ {offset: 99}, {offset: 100}, {offset: 101} ] AnomalyMap: { "traceA": { "101": AnomalyObjectA } } HighlightList: [] getAnomalyDataMap | V Output: { "traceA": [ { x: 2, y: 15, anomaly: AnomalyObjectA, highlight: false } ], ... }
anomaly-sk.scss
: This file contains the SCSS styles for the <anomaly-sk>
element.
themes_sass_lib
).th
and td
elements within the anomaly-sk
component.anomaly-sk-demo.html
and anomaly-sk-demo.ts
: These files set up a demonstration page for the <anomaly-sk>
element.
anomaly-sk-demo.html
includes instances of <anomaly-sk>
with different IDs. anomaly-sk-demo.ts
initializes these components with sample Anomaly
data. It also mocks the /_/cid/
API endpoint using fetch-mock
to simulate responses for commit detail lookups, which is crucial for the formatRevisionRange
functionality to work in the demo. Global window.perf
configurations are also set up, as the component relies on them (e.g., commit_range_url
).Test Files (anomaly-sk_test.ts
, anomaly-sk_puppeteer_test.ts
):
anomaly-sk_test.ts
: Contains unit tests for the getAnomalyDataMap
function (verifying its mapping logic, especially the handling of missing commit points) and for static utility methods within AnomalySk
like formatPercentage
and the asynchronous formatRevisionRange
. It uses fetch-mock
to control API responses for CID lookups.anomaly-sk_puppeteer_test.ts
: Contains browser-based integration tests using Puppeteer. It verifies that the demo page renders correctly and takes screenshots for visual regression testing.Workflow for Displaying an Anomaly:
Anomaly
object is passed to the anomaly
property of the <anomaly-sk>
element. <anomaly-sk .anomaly=${someAnomalyObject}></anomaly-sk>
set anomaly()
setter in AnomalySk
is triggered.this.formatRevisionRange()
to asynchronously prepare the revision range display.formatRevisionRange
extracts start_revision
and end_revision
.lookupCids([start_rev_num, end_rev_num])
which makes a POST request to /_/cid/
.window.perf.commit_range_url
is set, it constructs an <a>
tag with the URL populated with the fetched hashes. Otherwise, it just formats the revision numbers as text.TemplateResult
is stored in this._revision
.this._render()
is called, which re-renders the component's template.AnomalySk.template
) displays the table:getPercentChange
).this.revision
template generated in step 3).AnomalySk.formatBug
, potentially linking to this.bugHostUrl
).This module effectively isolates the presentation and data transformation logic related to individual anomalies, making it a maintainable and reusable piece of the Perf frontend. The handling of potentially missing data points in getAnomalyDataMap
shows a robust design choice for dealing with real-world data imperfections.
bisect-dialog-sk
)The bisect-dialog-sk
module provides a user interface element for initiating a bisection process within the Perf application. This is specifically designed to help pinpoint the commit that introduced a performance regression or improvement, primarily for Chrome.
The primary responsibility of this module is to present a dialog to the user, pre-filled with relevant information extracted from a chart tooltip (e.g., when a user identifies an anomaly in a performance graph). It allows the user to confirm or modify these parameters and then submit a request to the backend to start a bisection task.
Performance analysis often involves identifying the exact change that caused a shift in metrics. A manual bisection process can be tedious and error-prone. This dialog streamlines this by:
alogin-sk
module to fetch the logged-in user's email, which is a required parameter for the bisect request.Initialization and Pre-filling:
setBisectInputParams
method is called with details like the testPath
, startCommit
, endCommit
, bugId
, story
, and anomalyId
.User Interaction and Submission:
open()
method displays the modal dialog.postBisect
method is invoked.Request Construction and API Call:
postBisect
gathers the current values from the form fields.testPath
to extract components like the benchmark
, chart
, and statistic
. The logic for deriving chart
and statistic
involves checking the last part of the test name against a predefined list of STATISTIC_VALUES
(e.g., “avg”, “count”).CreateBisectRequest
object is constructed with all the necessary parameters.fetch
call is made to the /_/bisect/create
endpoint with the JSON payload.Response Handling:
errorMessage
, and the dialog remains open, allowing the user to correct any issues or retry.Simplified Bisect Request Workflow:
User Clicks Bisect Trigger (e.g., on chart) | V Calling Code prepares `BisectPreloadParams` | V `bisect-dialog-sk.setBisectInputParams(params)` | V `bisect-dialog-sk.open()` | V Dialog is Displayed (pre-filled) | V User reviews/modifies data & Clicks "Bisect" | V `bisect-dialog-sk.postBisect()` | V `testPath` is parsed (extract benchmark, chart, statistic) | V `CreateBisectRequest` object is built | V `fetch POST /_/bisect/create` with request data | V Handle API Response: - Success -> Close dialog, Show success notification (external) - Error -> Show error message, Keep dialog open
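To illustrate the chart/statistic derivation, a hedged sketch; STATISTIC_VALUES here lists only the statistics named in this document, and the underscore-suffix convention is an assumption about how a statistic is appended to a chart name:

```ts
// Only the statistics named in this document; the real list may differ.
const STATISTIC_VALUES = ['avg', 'count', 'max', 'min'];

// Given a chart name like 'loading_avg' (hypothetical), split off a trailing
// statistic suffix when one is present; otherwise the statistic is left empty.
function splitChartAndStatistic(chartName: string): { chart: string; statistic: string } {
  const parts = chartName.split('_');
  const last = parts[parts.length - 1];
  if (parts.length > 1 && STATISTIC_VALUES.includes(last)) {
    return { chart: parts.slice(0, -1).join('_'), statistic: last };
  }
  return { chart: chartName, statistic: '' };
}
```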
bisect-dialog-sk.ts
: This is the core TypeScript file defining the BisectDialogSk
custom element.
BisectDialogSk
class: Extends ElementSk
and manages the dialog's state, rendering, and interaction logic.BisectPreloadParams
interface: Defines the structure of the initial data passed to the dialog.template
: A lit-html template defining the dialog's HTML structure, including input fields for test path, bug ID, start/end commits, story, and an optional patch. It also includes a close icon, a spinner for loading states, and submit/close buttons.connectedCallback()
: Initializes the element, sets up property upgrades, queries for DOM elements (dialog, form, spinner, button), and attaches an event listener to the form‘s submit event. It also fetches the logged-in user’s status.setBisectInputParams()
: Populates the internal state and input fields with data provided externally.open()
: Shows the modal dialog and ensures the submit button is enabled.closeBisectDialog()
: Closes the dialog.postBisect()
: This is the heart of the submission logic. It:testPath
to extract various components required for the bisect request (like benchmark
, chart
, story
, statistic
). The logic for chart
and statistic
derivation is particularly important here.CreateBisectRequest
payload.POST
request to the /_/bisect/create
endpoint.STATISTIC_VALUES
: A constant array used to determine if the last part of a test name is a statistic (e.g., avg
, min
, max
).bisect-dialog-sk.scss
: Contains the SASS styles for the dialog, ensuring it aligns with the application's theme. It styles the dialog itself, input fields, and the footer elements.
index.ts
: A simple entry point that imports and thus registers the bisect-dialog-sk
custom element.
BUILD.bazel
: Defines the build rules for this module, specifying its dependencies (SASS, TypeScript, other SK elements like alogin-sk
, select-sk
, spinner-sk
, close-icon-sk
) and sources. The dependencies highlight its reliance on common UI components and infrastructure modules for features like login status and error messaging.
ElementSk
): Encapsulating the dialog as a custom element promotes reusability and modularity. It can be easily integrated into different parts of the Perf application where bisection capabilities are needed.lit-html
for Templating: Provides an efficient and declarative way to define the dialog's HTML structure and update it based on its state./_/bisect/create
) is tailored for Chrome's bisection infrastructure. The project: 'chromium'
in the request payload confirms this.jsonOrThrow
and errorMessage
provides a standard way to handle API errors and inform the user.spinner-sk
element gives visual feedback during the asynchronous fetch
operation, improving user experience.calendar-input-sk
)The calendar-input-sk
module provides a user-friendly way to select dates. It combines a standard text input field for manual date entry with a button that reveals a calendar-sk
element within a dialog for visual date picking. This approach offers flexibility for users who prefer typing dates directly and those who prefer a visual calendar interface.
calendar-input-sk.ts
: This is the core file defining the CalendarInputSk
custom element.
<input type="text">
element for direct date input. A pattern
attribute ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}
) and a title
are used to guide the user on the expected YYYY-MM-DD
format. An error indicator (✗
) is shown if the input doesn't match the pattern.<button>
element, styled with a date-range-icon-sk
, triggers the display of the calendar.<dialog>
element is used to present the calendar-sk
element. This choice simplifies the implementation of modal behavior.openHandler
method is responsible for showing the dialog. It uses a Promise
to manage the asynchronous nature of user interaction with the dialog (either selecting a date or canceling). This makes the event handling logic cleaner and easier to follow.inputChangeHandler
is triggered when the user types into the text field. It validates the input against the defined pattern. If valid, it parses the date string and updates the displayDate
property.calendarChangeHandler
is invoked when a date is selected from the calendar-sk
component within the dialog. It resolves the aforementioned Promise
with the selected date.dialogCancelHandler
is called when the dialog is closed without a date selection (e.g., by pressing the “Cancel” button or the Escape key). It rejects the Promise
.input
custom event (of type CustomEvent<Date>
) is dispatched whenever the selected date changes, whether through the text input or the calendar dialog. This allows parent components to react to date selections.displayDate
property acts as the single source of truth for the currently selected date. Setting this property will update both the text input and the date displayed in the calendar-sk
when it's opened.lit-html
library for templating, providing a declarative way to define the element's structure and efficiently update the DOM.ElementSk
, inheriting common functionalities for Skia custom elements.calendar-input-sk.scss
: This file contains the styling for the calendar-input-sk
element.
--error
, --on-surface
, --surface-1dp
) for theming, allowing the component‘s appearance to adapt to different contexts (like dark mode). The .invalid
class is conditionally displayed based on the input field’s validity state using the :invalid
pseudo-class.index.ts
: This file simply imports and thereby registers the calendar-input-sk
custom element.
calendar-input-sk-demo.html
/ calendar-input-sk-demo.ts
: These files constitute a demonstration page for the calendar-input-sk
element.
<calendar-input-sk>
in various configurations. The TypeScript file initializes these instances, sets initial displayDate
values, and demonstrates how to listen for the input
event. It also shows an example of programmatically setting an invalid value in one of the input fields.1. Selecting a Date via Text Input:
User types "2023-10-26" into text input | V inputChangeHandler in calendar-input-sk.ts | +-- (Input is valid: matches pattern "YYYY-MM-DD") --> Parse "2023-10-26" into a Date object | | | V | Update _displayDate property | | | V | Render component (updates input field's .value) | | | V | Dispatch "input" CustomEvent<Date> | +-- (Input is invalid: e.g., "2023-") --> Do nothing (CSS shows error indicator)
2. Selecting a Date via Calendar Dialog:
User clicks calendar button | V openHandler in calendar-input-sk.ts | V dialog.showModal() is called | V <dialog> with <calendar-sk> is displayed | +-- User selects a date in <calendar-sk> --> <calendar-sk> dispatches "change" event | | | V | calendarChangeHandler in calendar-input-sk.ts | | | V | dialog.close() | | | V | Promise resolves with the selected Date | +-- User clicks "Cancel" button or presses Esc --> dialog dispatches "cancel" event | V dialogCancelHandler in calendar-input-sk.ts | V dialog.close() | V Promise rejects
If Promise resolves (date selected):
openHandler continues after await | V Update _displayDate property with the resolved Date | V Render component (updates input field's .value) | V Dispatch "input" CustomEvent<Date> | V Focus on the text input field
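The Promise plumbing can be sketched roughly like this (resolve/reject are driven by calendar-sk's 'change' event and the dialog's 'cancel' event, as described above; element lookups are simplified for brevity):

```ts
// Rough sketch of the openHandler pattern.
async function openCalendarDialog(): Promise<void> {
  const dialog = document.querySelector('dialog')!;
  const calendar = dialog.querySelector('calendar-sk')!;

  try {
    const picked = await new Promise<Date>((resolve, reject) => {
      calendar.addEventListener(
        'change',
        (e: Event) => {
          dialog.close();
          resolve((e as CustomEvent<Date>).detail);
        },
        { once: true }
      );
      dialog.addEventListener('cancel', () => reject(new Error('cancelled')), { once: true });
      dialog.showModal();
    });
    // Update the text input and notify listeners, mirroring the flow above.
    console.log('selected', picked);
  } catch {
    // Dialog dismissed without a selection; nothing to do.
  }
}
```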
The design emphasizes a clear separation of concerns: the calendar-sk
handles the visual calendar logic, while calendar-input-sk
manages the integration of text input and the dialog presentation. The use of a Promise
in openHandler
simplifies the handling of the asynchronous dialog interaction, leading to more readable and maintainable code.
The calendar-sk
module provides a custom HTML element <calendar-sk>
that displays an interactive monthly calendar. This element was created to address limitations with the native HTML <input type="date">
element, specifically its lack of Safari support and the inability to style the pop-up calendar. Furthermore, it aims to be more themeable and accessible than other existing web component solutions like Elix.
The core philosophy behind calendar-sk
is to provide a user-friendly, accessible, and customizable date selection experience. Accessibility is a key consideration, with design choices informed by WAI-ARIA practices for date pickers. This includes keyboard navigation and appropriate ARIA attributes.
Key Responsibilities and Components:
calendar-sk.ts
: This is the heart of the module, defining the CalendarSk
custom element which extends ElementSk
.lit-html
library for templating, dynamically generating the HTML for the calendar grid. The calendar displays one month at a time.CalendarSk.template
) constructs the overall table structure, including navigation buttons for changing the year and month, and headers for the year and month.CalendarSk.rowTemplate
is responsible for rendering each week (row) of the calendar.CalendarSk.buttonForDateTemplate
creates the individual day buttons. It handles logic for disabling buttons for dates outside the current month and highlighting the selected date and today's date._displayDate
(a JavaScript Date
object) which represents the currently selected or focused date.CalendarDate
class is a helper to simplify comparisons of year, month, and date, as JavaScript Date
objects can be tricky with timezones and direct comparisons.getNumberOfDaysInMonth
and firstDayIndexOfMonth
are used to correctly layout the days within the grid.navigate-before-icon-sk
and navigate-next-icon-sk
) for incrementing/decrementing the month and year. Methods like incYear
, decYear
, incMonth
, and decMonth
handle the logic for updating _displayDate
and re-rendering. A crucial detail in month/year navigation is handling cases where the current day (e.g., 31st) doesn't exist in the target month (e.g., February). In such scenarios, the date is adjusted to the last valid day of the target month.keyboardHandler
method implements navigation using arrow keys (day/week changes) and PageUp/PageDown keys (month changes). This handler is designed to be attached to a parent element (like a dialog or the document) to allow for controlled event handling, especially when multiple keyboard-interactive elements are on a page. When a key is handled, it prevents further event propagation and focuses the newly selected date button.Intl.DateTimeFormat
to display month names and weekday headers according to the specified locale
property or the browser's default locale. The buildWeekDayHeader
method dynamically generates these headers.change
custom event ( CustomEvent<Date>
) whenever a new date is selected by clicking on a day. The event detail contains the selected Date
object.calendar-sk.scss
. It imports styles from //perf/modules/themes:themes_sass_lib
and //elements-sk/modules/styles:buttons_sass_lib
.calendar-sk.scss
: This file contains the SASS/CSS styles for the <calendar-sk>
element. It defines the visual appearance of the calendar grid, buttons, headers, and how selected or “today” dates are highlighted. It relies on CSS variables (e.g., --background
, --secondary
, --surface-1dp
) for theming, allowing the look and feel to be customized by the consuming application.calendar-sk-demo.html
and calendar-sk-demo.ts
: These files set up a demonstration page for the calendar-sk
element.calendar-sk-demo.html
includes instances of the calendar, some in dark mode and one configured for a different locale (zh-Hans-CN
), to showcase its versatility.calendar-sk-demo.ts
initializes these calendar instances, sets their initial displayDate
and locale
, and attaches event listeners to log the change
event. It also demonstrates how to hook up the keyboardHandler
.index.ts
: A simple entry point that imports and thus registers the calendar-sk
custom element, making it available for use in HTML.Key Workflows:
Initialization and Rendering: ElementSk constructor
-> connectedCallback
-> buildWeekDayHeader
-> _render
(calls CalendarSk.template
)
<calendar-sk>
element is added to the DOM, its connectedCallback
is invoked._displayDate
.Date Selection (Click): User clicks on a date button -> dateClick
method -> Updates _displayDate
-> Dispatches change
event with the new Date
-> _render
(to update UI, e.g., highlight new selection)
User clicks a date button.
[date button] --click--> dateClick(event)
  +--> d = new Date(this._displayDate)   (create copy)
  +--> d.setDate(event.target.dataset.date)   (update day)
  +--> dispatchEvent(new CustomEvent<Date>('change', { detail: d }))
  +--> this._displayDate = d
  +--> this._render()
Month/Year Navigation (Click): User clicks “Previous Month” button -> decMonth
method -> Calculates new year, monthIndex, and date (adjusting for days in month) -> Updates _displayDate
with the new Date
-> _render
(to display the new month/year)
User clicks "Previous Month" button.
[Previous Month button] --click--> decMonth()
  +--> Calculate new year, month, date (adjusting for month boundaries and days in month)
  +--> this._displayDate = new Date(newYear, newMonthIndex, newDate)
  +--> this._render()
Keyboard Navigation: User presses “ArrowRight” while calendar (or its container) has focus -> keyboardHandler(event)
-> case 'ArrowRight': this.incDay();
-> incDay
method updates _displayDate
(e.g., from May 21 to May 22) -> this._render()
-> e.stopPropagation(); e.preventDefault();
-> this.querySelector<HTMLButtonElement>('button[aria-selected="true"]')!.focus();
User presses ArrowRight key.
keydown event (ArrowRight) ---> keyboardHandler(event)
  + (matches case 'ArrowRight')
  +--> this.incDay()
  |      +--> this._displayDate = new Date(year, monthIndex, date + 1)
  |      +--> this._render()
  +--> event.stopPropagation()
  +--> event.preventDefault()
  +--> Focus the newly selected day button.
The use of zero-indexed months (monthIndex
) internally, as is common with the JavaScript Date
object, is a deliberate choice for consistency with the underlying API, though it requires careful handling to avoid off-by-one errors, especially when calculating things like the number of days in a month.
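A sketch of the month-decrement clamping described above (getNumberOfDaysInMonth mirrors the helper named earlier; the real method operates on the element's _displayDate rather than a free-standing Date):

```ts
// Number of days in the given zero-indexed month: day 0 of the *next* month is
// the last day of this one.
function getNumberOfDaysInMonth(year: number, monthIndex: number): number {
  return new Date(year, monthIndex + 1, 0).getDate();
}

// Move a date one month back, clamping the day so it stays valid
// (e.g., 2024-03-31 -> 2024-02-29).
function decMonth(d: Date): Date {
  let year = d.getFullYear();
  let monthIndex = d.getMonth() - 1;
  if (monthIndex < 0) {
    monthIndex = 11;
    year -= 1;
  }
  const day = Math.min(d.getDate(), getNumberOfDaysInMonth(year, monthIndex));
  return new Date(year, monthIndex, day);
}
```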
The chart-tooltip-sk
module provides a custom HTML element, <chart-tooltip-sk>
, designed to display detailed information about a specific data point on a chart. This tooltip is intended to be interactive, offering context-sensitive actions and information relevant to performance monitoring and analysis. It can be triggered by hovering over or clicking on a chart point.
The design philosophy behind this module is to centralize the presentation of complex data point information and related actions. Instead of scattering this logic across various chart implementations, chart-tooltip-sk
encapsulates it, promoting reusability and maintainability. It aims to provide a rich user experience by surfacing relevant details like commit information, anomaly status, bug tracking, and actions like bisection or requesting further traces.
The primary responsibility of chart-tooltip-sk
is to render a tooltip with relevant information and interactive elements based on the data point it's associated with.
Core Functionality & Design Choices:
The load() method is the main entry point for populating the tooltip with data. It accepts various parameters like the trace index, test name, y-value, date, commit position, anomaly details, and bug information. This comprehensive loading mechanism allows the parent charting component (e.g., explore-simple-sk) to provide all necessary context.

The fetch_details() method is responsible for asynchronously retrieving commit details using the /_/cid/ endpoint. This is done to avoid loading all commit details upfront for every point on a chart, which could be performance-intensive. The _always_show_commit_info and _skip_commit_detail_display flags (sourced from window.perf) allow for configurable display of commit details, catering to different instance needs.

Anomaly handling: the tooltip uses anomaly-sk for consistent formatting of anomaly data, triage-menu-sk to allow users to triage new anomalies (e.g., create bugs, mark as not a bug), and user-issue-sk to display and manage Buganizer issues linked to a data point (even if it's not a formal anomaly). Users can associate existing bugs or create new ones. The bug_host_url (from window.perf) is used to construct links to the bug tracking system.

Pinpoint integration: a bisect button is shown (when _show_pinpoint_buttons is true, typically for Chromium instances) that opens bisect-dialog-sk. This allows users to initiate a bisection to find the exact commit that caused a regression. A second button (also gated on _show_pinpoint_buttons) opens pinpoint-try-job-dialog-sk. This is used to request more detailed trace data for a specific commit.

Point links: the tooltip embeds point-links-sk to show relevant links for a data point based on instance configuration (e.g., links to V8 or WebRTC specific commit ranges). This is configured via keys_for_commit_range and keys_for_useful_links in window.perf. If enabled (show_json_file_display in window.perf), it provides a way to view the raw JSON data for the point via json-source-sk.

The moveTo() method handles the dynamic positioning of the tooltip relative to the mouse cursor or the selected chart point. It intelligently adjusts its position to stay within the viewport and avoid overlapping critical chart elements. Styling lives in chart-tooltip-sk.scss, including themes imported from //perf/modules/themes:themes_sass_lib, and md-elevation is used for a Material Design-inspired shadow effect.

Key Files:
chart-tooltip-sk.ts: The core TypeScript file defining the ChartTooltipSk class, its properties, methods, and HTML template (using lit-html). This is where the primary logic for data display, interaction handling, and integration with sub-components resides.

chart-tooltip-sk.scss: The SASS file containing the styles for the tooltip element.

index.ts: A simple entry point that imports and registers the chart-tooltip-sk custom element.

chart-tooltip-sk-demo.html & chart-tooltip-sk-demo.ts: Files for demonstrating the tooltip's functionality. The demo sets up mock data and fetchMock to simulate API responses, allowing isolated testing and visualization of the component.

BUILD.bazel: Defines how the element and its demo page are built, including dependencies on other Skia Elements and Perf modules like anomaly-sk, commit-range-sk, triage-menu-sk, etc.

Workflow Example: Displaying Tooltip on Chart Point Click (Fixed Tooltip)
User clicks a point on a chart
  |
  V
Parent Chart Component (e.g., explore-simple-sk)
  1. Determines data for the clicked point (coordinates, commit, trace info).
  2. Optionally fetches commit details if not already available.
  3. Optionally checks its anomaly map for anomaly data.
  4. Calls `chartTooltipSk.load(...)` with all relevant data, setting `tooltipFixed = true` and providing a close button action.
  5. Calls `chartTooltipSk.moveTo({x, y})` to position the tooltip.
  |
  V
chart-tooltip-sk
  1. `load()` method populates internal properties (_test_name, _y_value, _commit_info, _anomaly, etc.).
  2. `_render()` is triggered (implicitly or explicitly).
  3. The lit-html template in `static template` is evaluated:
     - Basic info (test name, value, date) is displayed.
     - If `commit_info` is present, commit details (author, message, hash) are shown.
     - If `_anomaly` is present:
       - Anomaly metrics are displayed.
       - If `anomaly.bug_id === 0`, `triage-menu-sk` is shown.
       - If `anomaly.bug_id > 0`, bug ID is shown with an unassociate button.
       - Pinpoint job links are shown if available.
     - If `tooltip_fixed` is true:
       - "Bisect" and "Request Trace" buttons are shown (if configured).
       - `user-issue-sk` is shown (if not an anomaly).
       - `json-source-sk` button/link is shown (if configured).
       - The close icon is visible.
  4. Child components like `commit-range-sk`, `point-links-sk`, `user-issue-sk`, `triage-menu-sk` are updated with their respective data.
  5. `moveTo()` positions the rendered `div.container` on the screen.
  |
  V
User interacts with buttons (e.g., "Bisect", "Triage", "Close")
  |
  V
chart-tooltip-sk or its child components handle the interaction
  - e.g., clicking "Bisect" calls `openBisectDialog()`, which shows `bisect-dialog-sk`.
  - e.g., clicking "Close" executes the `_close_button_action` passed during `load()`.
This modular approach ensures that chart-tooltip-sk
is a self-contained, feature-rich component for displaying detailed contextual information and actions related to data points in performance charts.
This module, /modules/cid
, provides functionality for interacting with Commit IDs (CIDs), which are also referred to as CommitNumbers. The primary purpose of this module is to facilitate the retrieval of detailed commit information based on a set of commit numbers and their corresponding sources.
The core functionality revolves around the lookupCids
function. This function is designed to be a simple and efficient way to fetch commit details from a backend endpoint.
Why Asynchronous Operations?
The lookup of commit information involves a network request to a backend service (/_/cid/
). Network requests are inherently asynchronous. Therefore, lookupCids
returns a Promise
. This allows the calling code to continue execution while the commit information is being fetched and to handle the response (or any potential errors) when it becomes available. This non-blocking approach is crucial for maintaining a responsive user interface or efficient server-side processing.
Why JSON for Data Exchange?
JSON (JavaScript Object Notation) is used as the data format for both the request and the response.
The cids (an array of CommitNumber objects) are serialized into a JSON string and sent in the body of the HTTP POST request. JSON is a lightweight and widely supported format, making it ideal for client-server communication.

The response from the server is likewise JSON, conforming to the CIDHandlerResponse type. The jsonOrThrow utility (imported from ../../../infra-sk/modules/jsonOrThrow) is used to parse this JSON response. This utility simplifies error handling by automatically throwing an error if the response is not valid JSON or if the HTTP request itself fails.

Why POST Request?
A POST request is used instead of a GET request for sending the cids
. While GET requests are often used for retrieving data, they are typically limited in the amount of data that can be sent in the URL (e.g., through query parameters). Since the number of cids
to look up could be large, sending them in the request body via a POST request is a more robust and scalable approach. The Content-Type: application/json
header informs the server that the request body contains JSON data.
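A minimal, self-contained sketch of the call described above. The types are simplified stand-ins for the real definitions in ../json, and the real module uses jsonOrThrow instead of the inline status check shown here:

```ts
// Illustrative sketch only: CommitNumber and CIDHandlerResponse are simplified.
type CommitNumber = number;
interface CIDHandlerResponse {
  commitSlice: { hash: string; author: string; message: string }[];
}

// POST the commit numbers as a JSON body to /_/cid/ and parse the JSON reply.
async function lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponse> {
  const resp = await fetch('/_/cid/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(cids),
  });
  if (!resp.ok) {
    // jsonOrThrow performs this check in the real module; plain fetch does not
    // reject on HTTP error statuses, so it must be handled explicitly.
    throw new Error(`HTTP ${resp.status}`);
  }
  return resp.json() as Promise<CIDHandlerResponse>;
}

// Usage: the caller awaits the promise or chains .then()/.catch().
lookupCids([64809, 64810]).then((r) => console.log(r.commitSlice));
```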
cid.ts: This is the sole TypeScript file in the module and contains the implementation of the lookupCids function.

lookupCids(cids: CommitNumber[]): Promise<CIDHandlerResponse>: Takes an array of CommitNumber objects and asynchronously fetches detailed commit information for each from the /_/cid/ backend endpoint. The cids array is converted into a JSON string and included as the request body, the appropriate headers (Content-Type: application/json) are set, and the fetch API is used to make the network request. The response is handled by jsonOrThrow: if the request is successful and the response is valid JSON, it resolves the promise with the parsed CIDHandlerResponse; otherwise, it rejects the promise with an error.

Dependencies: jsonOrThrow (from ../../../infra-sk/modules/jsonOrThrow) for robust JSON parsing and error handling, and CommitNumber and CIDHandlerResponse (from ../json), which are type definitions that define the structure of the input commit identifiers and the expected response from the backend.

The typical workflow for using this module is as follows:
1. Caller: has an array of CommitNumber objects.
2. Caller -> lookupCids(cids).
3. lookupCids: serializes cids to JSON.
4. lookupCids -> Backend: POST request with JSON body to /_/cid/.
5. Backend: receives the POST request.
6. Backend: processes the cids.
7. Backend: generates a CIDHandlerResponse.
8. Backend -> lookupCids: sends the JSON response.
9. lookupCids: receives the response.
10. lookupCids: jsonOrThrow parses the response (throws an error on failure).
11. Caller: receives a Promise that resolves with the CIDHandlerResponse (or rejects with an error).
The cluster-lastn-page-sk
module provides a user interface for testing and configuring alert configurations by running them against a recent range of commits. This allows users to “dry run” an alert to see what regressions it would detect before saving it to run periodically.
Core Functionality:
The primary purpose of this module is to facilitate the iterative process of defining effective alert configurations. Instead of deploying an alert and waiting for it to trigger (potentially with undesirable results), users can simulate its behavior on historical data. This helps in fine-tuning parameters like the detection algorithm, radius, sparsity, and interestingness threshold.
Key Components and Files:
cluster-lastn-page-sk.ts: This is the heart of the module, defining the ClusterLastNPageSk custom element.

It manages the page's state, including the alert configuration being tested (this.state), the commit range (this.domain), and the results of the dry run (this.regressions). It utilizes stateReflector to potentially persist and restore parts of this state in the URL, allowing users to share specific configurations or test setups.

It hosts the main UI controls: a dialog for editing the alert configuration (alert-config-dialog, which hosts an alert-config-sk element), a domain-picker-sk element for choosing the commit range, a "Run" button (the run() method), a "Create/Update Alert" button (the writeAlert() method), and a triage dialog (triage-cluster-dialog, which hosts a cluster-summary2-sk element).

On load it fetches the initial paramset and a default new alert configuration from /_/initpage/ and /_/alert/new respectively. A dry run POSTs to the /_/dryrun/start endpoint to initiate the clustering and regression detection process. It uses the startRequest utility from ../progress/progress to handle the asynchronous request and display progress. When the user is satisfied with an alert configuration, it is sent to /_/alert/update to save or update it in the backend.

It uses lit-html for templating and dynamically renders the UI based on the current state, including the controls, the progress of a running dry run, and a table of detected regressions. The table displays commit details (commit-detail-sk) and triage status (triage-status-sk) for each detected regression.

cluster-lastn-page-sk.html (Demo Page): A simple HTML file that includes the cluster-lastn-page-sk element and an error-toast-sk for displaying global error messages. This is primarily used for demonstration and testing purposes.
cluster-lastn-page-sk-demo.ts
: Sets up mock HTTP responses using fetch-mock
for the demo page. This allows the cluster-lastn-page-sk
element to function in isolation without needing a live backend. It mocks endpoints like /_/initpage/
, /_/alert/new
, /_/count/
, and /_/loginstatus/
.
cluster-lastn-page-sk.scss
: Provides the styling for the cluster-lastn-page-sk
element and its dialogs, ensuring a consistent look and feel with the rest of the Perf application. It uses shared SASS libraries for buttons and themes.
Workflow for Testing an Alert Configuration:
Load Page: User navigates to the page.
cluster-lastn-page-sk fetches initial paramset and a default new alert configuration.

    User -> cluster-lastn-page-sk
    cluster-lastn-page-sk -> GET /_/initpage/   (fetches paramset)
    cluster-lastn-page-sk -> GET /_/alert/new   (fetches default alert)
Configure Alert: User clicks the “Configure Alert” button.
The dialog (alert-config-dialog) opens, showing alert-config-sk. When the user accepts, the state in cluster-lastn-page-sk is updated with the new configuration.

    User --clicks--> "Configure Alert" button
    cluster-lastn-page-sk --shows--> alert-config-dialog
    User --interacts with--> alert-config-sk
    User --clicks--> "Accept"
    alert-config-sk --updates--> cluster-lastn-page-sk.state
(Optional) Adjust Commit Range: User interacts with domain-picker-sk
to define the number of recent commits or a specific date range for the dry run.
cluster-lastn-page-sk.domain
is updated.Run Dry Run: User clicks the “Run” button.
cluster-lastn-page-sk constructs a RegressionDetectionRequest using the current alert state and domain, and POSTs it to /_/dryrun/start.

    User --clicks--> "Run" button
    cluster-lastn-page-sk --creates--> RegressionDetectionRequest
    cluster-lastn-page-sk --POSTs to--> /_/dryrun/start (with request body)
    (progress updates via startRequest callback)
    Backend --processes & clusters-->
    Backend --sends progress/results--> cluster-lastn-page-sk
    cluster-lastn-page-sk --updates--> UI (regressions table, status messages)
Review Results: User examines the table of regressions.
Each detected regression can be opened in the triage-cluster-dialog (showing cluster-summary2-sk) for more details.

Iterate or Save:
- If results are not satisfactory, the user goes back to step 2 to adjust the alert configuration and re-runs.
- If results are satisfactory, the user clicks "Create Alert" (or "Update Alert" if modifying an existing one).
- `cluster-lastn-page-sk` sends the current alert `state` to `/_/alert/update`.

    User --clicks--> "Create Alert" / "Update Alert" button
    cluster-lastn-page-sk --POSTs to--> /_/alert/update (with alert config)
    Backend --saves/updates alert-->
    Backend --responds with ID--> cluster-lastn-page-sk
    cluster-lastn-page-sk --updates--> UI (button text might change to "Update Alert")
Design Decisions:
The page is composed of existing reusable elements (alert-config-sk, domain-picker-sk, cluster-summary2-sk). This promotes modularity, reusability, and separation of concerns.

Long-running dry runs are handled asynchronously with progress feedback via the ../progress/progress utility, enhancing user experience.

stateReflector allows parts of the page's state (like the alert configuration) to be encoded in the URL. This is useful for sharing specific test scenarios or bookmarking them.

The demo page (cluster-lastn-page-sk-demo.ts) heavily relies on fetch-mock. This enables isolated development and testing of the UI component without a backend dependency, which is crucial for frontend unit/integration tests and local development.

The cluster-page-sk
module provides the user interface for Perf's trace clustering functionality. This allows users to identify groups of traces that exhibit similar behavior, which is crucial for understanding performance regressions or improvements across different configurations and tests.
Core Functionality and Design:
The primary goal of this page is to allow users to define a set of traces and then apply a clustering algorithm to them. The “why” behind this is to simplify the analysis of large datasets by grouping related performance changes. Instead of manually inspecting hundreds or thousands of individual traces, users can focus on a smaller number of clusters, each representing a distinct performance pattern.
The “how” involves several key components:
Defining the Scope of Analysis:
commit-detail-picker-sk
. The clustering will typically look at commits before and after this selected point. The state.offset
property stores the selected commit's offset.query-sk
and paramset-sk
. The state.query
holds this query. The query-count-sk
element provides feedback on how many traces match the current query.state.radius
.Clustering Algorithm and Parameters:
algo-select-sk
and stored in state.algo
. The choice of algorithm impacts how clusters are formed and what “similarity” means.state.k
.state.interesting
.state.sparse
) allows users to indicate if the data is sparse, meaning not all traces have data points for all commits. This affects how the clustering algorithm processes missing data.Executing the Clustering and Displaying Results:
start()
method constructs a RegressionDetectionRequest
object containing all the user-defined parameters. This request is sent to the /_/cluster/start
endpoint.progress
utility to manage the asynchronous request. It displays a spinner (spinner-sk
) and status messages (ele.status
, ele.runningStatus
) to keep the user informed. The requestId
property tracks the active request.RegressionDetectionResponse
contains a list of FullSummary
objects. Each FullSummary
represents a discovered cluster. These are rendered using multiple cluster-summary2-sk
elements. This component is responsible for visualizing the details of each cluster, including its member traces and regression information.sort-sk
.State Management:
The cluster-page-sk
component maintains its internal state in a State
object. This includes user selections like the query, commit offset, algorithm, and various parameters. Crucially, this state is reflected in the URL using the stateReflector
utility. This design decision ensures that:
The stateHasChanged()
method is called whenever a piece of the state is modified, triggering the stateReflector
to update the URL and potentially re-render the component.
Key Files and Their Roles:
cluster-page-sk.ts
: This is the main TypeScript file defining the ClusterPageSk
custom element. It orchestrates all the sub-components, manages the application state, handles user interactions (e.g., button clicks, input changes), makes API calls for clustering, and renders the results. It defines the overall layout and logic of the clustering page.cluster-page-sk.html
(inferred, as it's a LitElement): The HTML template is defined within cluster-page-sk.ts
using lit-html
. This template structures the page, embedding various custom elements for commit selection, query building, algorithm choice, and result display.cluster-page-sk.scss
: Provides the specific styling for the cluster-page-sk
element and its layout, ensuring a consistent look and feel.index.ts
: A simple entry point that imports and registers the cluster-page-sk
custom element, making it available for use in HTML.cluster-page-sk-demo.ts
& cluster-page-sk-demo.html
: These files set up a demonstration page for the cluster-page-sk
element. cluster-page-sk-demo.ts
uses fetch-mock
to simulate API responses, allowing the component to be developed and tested in isolation without needing a live backend. This is crucial for rapid development and ensuring the UI behaves correctly under various backend scenarios.State
class (within cluster-page-sk.ts
): Defines the structure of the data that is persisted in the URL and drives the component's behavior. It encapsulates all user-configurable options for the clustering process.Workflow Example: Performing a Cluster Analysis
User Interaction -> Component/State Change -> Backend Interaction

1. User navigates to the cluster page. `ClusterPageSk` initializes; `stateReflector` initializes from the URL or defaults. Fetches the initial paramset (`/_/initpage/`).
2. User selects a commit. `commit-detail-picker-sk` emits `commit-selected`; `state.offset` updates; `stateHasChanged()` called. (Potentially fetches commit details if not cached.)
3. User types a query (e.g., "config=gpu"). `query-sk` emits `query-change`; `state.query` updates; `stateHasChanged()` called. (Potentially `/_/count/` to update the trace count.)
4. User selects an algorithm (e.g., kmeans). `algo-select-sk` emits `algo-change`; `state.algo` updates; `stateHasChanged()` called.
5. User adjusts advanced parameters (K, radius, interestingness). Input elements update the corresponding `state` props; `stateHasChanged()` called.
6. User clicks "Run". The `start()` method is called; `requestId` is set; the spinner becomes active. POST to `/_/cluster/start` with a `RegressionDetectionRequest` (this is a long-running request).
7. Page periodically updates status. The `progress` utility polls for updates; `ele.runningStatus` updates. GET requests check progress.
8. Clustering completes. The `progress` utility resolves; the `summaries` array is populated with cluster data; `requestId` is cleared; the spinner stops. Final response from `/_/cluster/start` (or the progress endpoint) contains the `RegressionDetectionResponse`.
9. Results are displayed. `ClusterPageSk` re-renders, showing `cluster-summary2-sk` elements for each cluster.
This workflow highlights how user inputs are translated into state changes, which then drive API requests and ultimately update the UI to present the clustering results. The separation of concerns among various sub-components (for query, commit selection, etc.) makes the main cluster-page-sk
element more manageable.
The cluster-summary2-sk
module provides a custom HTML element for displaying detailed information about a cluster of performance test results. This includes visualizing the trace data, showing regression statistics, and allowing users to triage the cluster.
Core Functionality and Design:
The primary purpose of this element is to present a comprehensive summary of a performance cluster. It aims to provide all necessary information for a user to understand the nature of a performance change (regression or improvement) and take appropriate action (e.g., filing a bug, marking it as expected).
Key design considerations include:
The plot-simple-sk element is used to display the centroid trace of the cluster over time. This visual representation helps users quickly grasp the trend and identify the point of change. An “x-bar” can be displayed on the plot to highlight the specific commit where a step change is detected.

Regression statistics are labeled and formatted according to the StepDetection algorithm used (e.g., ‘absolute’, ‘percent’, ‘mannwhitneyu’). This ensures that the presented information is relevant and interpretable for the specific detection method.

An embedded commit-detail-panel-sk allows users to view details of the commit associated with the detected step point or any selected point on the trace plot. This is crucial for correlating performance changes with specific code modifications.

Unless it has the notriage attribute, the element includes a triage2-sk component. This allows authenticated users with “editor” privileges to set the triage status (e.g., “positive”, “negative”, “untriaged”) and add a message. This functionality is essential for tracking the investigation and resolution of performance issues.

A word-cloud-sk element displays a summary of the parameters that make up the traces in the cluster. This helps in understanding the common characteristics of the affected tests.

A commit-range-sk component allows users to define a range around the detected step or a selected commit, facilitating further investigation within the Perf application.

Key Components and Their Roles:
cluster-summary2-sk.ts
: This is the main TypeScript file defining the ClusterSummary2Sk
custom element.ClusterSummary2Sk
class: Extends ElementSk
and manages the element's state, rendering, and event handling.full_summary
, triage
, alert
): These properties receive the core data for the cluster. When full_summary
is set, it triggers the rendering of the plot, statistics, and commit details. The alert
property determines the labels and formatting for regression statistics. The triage
property reflects the current triage state.template
static method): Uses lit-html
to define the element's structure, binding data to various sub-components and display areas.open-keys
: Fired when the “View on dashboard” button is clicked, providing details for opening the explorer.triaged
: Fired when the triage status is updated, containing the new status and the relevant commit information.trace_selected
: Handles events from plot-simple-sk
when a point on the graph is clicked, triggering a lookup for the corresponding commit details.statusClass()
: Determines the CSS class for the regression display based on the severity (e.g., “high”, “low”).permaLink()
: Generates a URL to the triage page focused on the step point.lookupCids()
(static): A static method (delegating to ../cid/cid.ts
) used to fetch commit details based on commit numbers.labelsForStepDetection
: A crucial constant object that maps different StepDetection
algorithm names (e.g., ‘percent’, ‘mannwhitneyu’, ‘absolute’) to specific labels and number formatting functions for the regression statistics. This ensures that the displayed information is meaningful and correctly interpreted for the algorithm used to detect the cluster.cluster-summary2-sk.html
(template, rendered by cluster-summary2-sk.ts
): Defines the visual layout using HTML and embedded custom elements. It uses a CSS grid for positioning the main sections: regression summary, statistics, plot, triage status, commit details, actions, and word cloud.cluster-summary2-sk.scss
: Provides the styling for the element. It defines how different sections are displayed, including styles for regression severity (e.g., red for “high” regressions, green for “low”), button appearances, and responsive behavior (hiding the plot on smaller screens).cluster-summary2-sk-demo.html
and cluster-summary2-sk-demo.ts
: These files set up a demonstration page for the cluster-summary2-sk
element. The .ts
file provides mock data for FullSummary
, Alert
, and TriageStatus
to populate the demo instances of the element. It also demonstrates how to listen for the triaged
and open-keys
custom events.Workflows:
Initialization and Data Display:
full_summary
(containing cluster data and trace frame), alert
(details of the alert that triggered this cluster), and optionally triage
(current triage status) properties to the cluster-summary2-sk
element.set full_summary()
:summary
and frame
data.data-clustersize
).plot-simple-sk
with the centroid trace from summary.centroid
and time labels from frame.dataframe.header
.lookupCids
is called to fetch and display details for the commit at the step point in commit-detail-panel-sk
.set alert()
:labels
used for displaying regression statistics based on alert.step
and labelsForStepDetection
.set triage()
:triageStatus
and re-renders the triage controls.Host Application cluster-summary2-sk ---------------- ------------------- [Set full_summary data] --> Process data | +-> plot-simple-sk (Draws trace) | +-> commit-detail-panel-sk (Shows step commit) | +-> Display stats (regression, size, etc.) [Set alert data] ---------> Update regression labels/formatters [Set triage data] --------> Update triage2-sk state
User Triage:
triage2-sk
(selects status) and the message input field.update()
method is called:ClusterSummary2SkTriagedEventDetail
object is created containing the step_point
(as columnHeader
) and the current triageStatus
.triaged
custom event is dispatched with this detail.triaged
event to persist the triage status.User cluster-summary2-sk Host Application ---- ------------------- ---------------- Selects status ----> [triage2-sk updates value] Types message ----> [Input updates value] Clicks "Update" ---> update() | +-> Creates TriagedEventDetail | +-> Dispatches "triaged" event --> Listens and handles event (e.g., saves to backend)
Viewing on Dashboard:
openShortcut()
method is called:ClusterSummary2SkOpenKeysEventDetail
object is created with the shortcut
ID, begin
and end
timestamps from the frame, and the step_point
as xbar
.open-keys
custom event is dispatched.open-keys
and navigates the user to the explorer view with the provided parameters.User cluster-summary2-sk Host Application ---- ------------------- ---------------- Clicks "View on dash" --> openShortcut() | +-> Creates OpenKeysEventDetail | +-> Dispatches "open-keys" event --> Listens and handles event (e.g., navigates to explorer)
The cluster-summary2-sk
element plays a vital role in the Perf frontend by providing a focused and interactive view for analyzing individual performance regressions or improvements identified through clustering. Its integration with plotting, commit details, and triaging makes it a key tool for performance analysis workflows.
High-level Overview:
The commit-detail-panel-sk
module provides a custom HTML element <commit-detail-panel-sk>
designed to display a list of commit details. It offers functionality to make these commit entries selectable and emits an event when a commit is selected. This component is primarily used in user interfaces where users need to browse and interact with a sequence of commits.
Why and How:
The core purpose of this module is to present commit information in a structured and interactive way. Instead of simply displaying raw commit data, it leverages the commit-detail-sk
element (an external dependency) to render each commit with relevant information like author, message, and a link to the commit.
The design decision to make commits selectable (via the selectable
attribute) enhances user interaction. When a commit is clicked in “selectable” mode, it triggers a commit-selected
custom event. This event carries detailed information about the selected commit, including its index in the list, a concise description, and the full commit object. This allows parent components or applications to react to user selections and perform actions based on the chosen commit (e.g., loading further details, navigating to a specific state).
The implementation uses Lit library for templating and rendering. The commit data is provided via the details
property, which expects an array of Commit
objects (defined in perf/modules/json
). The component dynamically generates table rows for each commit.
The visual appearance is controlled by commit-detail-panel-sk.scss
. It defines styles for the panel, including highlighting the selected row and adjusting opacity based on the selectable
state. The styling aims for a clean and readable presentation of commit information.
A hide
property is also available to conditionally show or hide the entire commit list. This is useful for scenarios where the panel's visibility needs to be controlled dynamically by the parent application.
Key Components/Files:
commit-detail-panel-sk.ts
: This is the heart of the module. It defines the CommitDetailPanelSk
class, which extends ElementSk
.Commit
objects (_details
property).template
and rows
static methods)._click
method).selectable
attribute is present), it dispatches the commit-selected
custom event with relevant commit data.selectable
, selected
, and hide
attributes and their corresponding properties, re-rendering the component when these change.commit-detail-sk
element to display individual commit details within each row.commit-detail-panel-sk.scss
: This file contains the SASS styles for the component.selectable
.--primary
, --surface-1dp
) from //perf/modules/themes:themes_sass_lib
for consistent theming.commit-detail-panel-sk-demo.ts
and commit-detail-panel-sk-demo.html
: These files provide a demonstration page for the component.<commit-detail-panel-sk>
element in an HTML page.details
property and how to listen for the commit-selected
event.index.ts
: A simple entry point that imports and registers the commit-detail-panel-sk
custom element, making it available for use.BUILD.bazel
: Defines how the module is built and its dependencies. For instance, it declares commit-detail-sk
as a runtime dependency and Lit as a TypeScript dependency.commit-detail-panel-sk_puppeteer_test.ts
: Contains Puppeteer tests to verify the component's rendering and basic functionality.Key Workflows:
Initialization and Rendering:
Parent Application --> Sets 'details' property of <commit-detail-panel-sk> with Commit[]
  |
  V
commit-detail-panel-sk.ts --> _render() is called
  |
  V
Lit template generates <table>
  |
  V
For each Commit in 'details':
  Generates <tr> containing <commit-detail-sk .cid=Commit>
Commit Selection (when selectable is true):

    User --> Clicks on a <tr> in the <commit-detail-panel-sk>
      |
      V
    commit-detail-panel-sk.ts --> _click(event) handler is invoked
      |
      V
    Determines the clicked commit's index and data
      |
      V
    Sets 'selected' attribute/property to the index of the clicked commit
      |
      V
    Dispatches 'commit-selected' CustomEvent with
      { selected: index, description: string, commit: Commit }
      |
      V
    Parent Application --> Listens for 'commit-selected' event and processes the event.detail
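A minimal sketch of driving the panel from a parent page, based on the properties, attributes, and event described above. The Commit type is simplified and the source of the commit data is assumed:

```ts
// Simplified stand-in for the Commit type from perf/modules/json.
interface Commit { hash: string; author: string; message: string; offset: number }

// Commits would come from the application (e.g. a backend fetch); an empty
// placeholder keeps this sketch self-contained.
const fetchedCommits: Commit[] = [];

const panel = document.querySelector('commit-detail-panel-sk')! as HTMLElement & {
  details: Commit[];
};

panel.setAttribute('selectable', ''); // enable row selection
panel.details = fetchedCommits;       // triggers rendering of one row per commit

panel.addEventListener('commit-selected', (e: Event) => {
  const { selected, description, commit } = (e as CustomEvent).detail;
  console.log(`row ${selected} selected: ${description}`, commit);
});
```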
The design favors declarative attribute-based configuration (e.g., selectable
, selected
) and event-driven communication for user interactions, which are common patterns in web component development.
The commit-detail-picker-sk
module provides a user interface element for selecting a specific commit from a range of commits. It's designed to be a reusable component that simplifies the process of commit selection within applications that need to interact with commit histories.
Core Functionality and Design:
The primary purpose of commit-detail-picker-sk
is to allow users to browse and select a commit. This is achieved by presenting a button that, when clicked, opens a dialog.
[Button: "Author - Commit Message"] --- (click) ---> [Dialog Opens]
commit-detail-panel-sk
: This submodule is responsible for displaying the list of commits fetched from the backend. Users can click on a commit in this panel to select it.day-range-sk
component allows users to specify a time window for fetching commits. This is crucial for performance and usability, as it prevents loading an overwhelming number of commits at once. When the date range changes, the component automatically fetches the relevant commits. [day-range-sk] -- (date range change) --> [Fetch Commits for New Range] | V [commit-detail-panel-sk updates]
spinner-sk
element provides visual feedback to the user while commits are being fetched, indicating that an operation is in progress.Data Flow and State Management:
/_/cidRange/
endpoint. The request body includes the begin
and end
timestamps of the desired range and optionally the offset
of a currently selected commit (to ensure it's included in the results if it falls outside the new range).

    User Action (e.g., change date range)
      |
      V
    [commit-detail-picker-sk]
      |
      V
    (Constructs RangeRequest: {begin, end, offset})
    POST /_/cidRange/
      |
      V
    (Receives Commit[] array)
    [commit-detail-picker-sk]
      |
      V
    (Updates internal 'details' array)
    [commit-detail-panel-sk]
    (Re-renders with new commit list)
commit-detail-panel-sk
, the panel emits a commit-selected
event. - commit-detail-picker-sk
listens for this event and updates its internal selected
index. - The dialog is then closed, and the main button‘s text updates to reflect the new selection. - Crucially, commit-detail-picker-sk
itself emits a commit-selected
event. This allows parent components to react to the user’s choice. The detail of this event is of type CommitDetailPanelSkCommitSelectedDetails
, containing information about the selected commit.

    [commit-detail-panel-sk] -- (internal click on a commit) --> Emits 'commit-selected' (internal)
      |
      V
    [commit-detail-picker-sk] -- (handles internal event) --> Updates 'selected' index
                                                              Updates button text
                                                              Closes dialog
                                                              Emits 'commit-selected' (external)
selection
property): The component exposes a selection
property (of type CommitNumber
). If this property is set externally, the component will attempt to fetch commits around that CommitNumber
and pre-select it in the panel.Key Files and Responsibilities:
commit-detail-picker-sk.ts
: This is the core TypeScript file defining the CommitDetailPickerSk
custom element.commit-detail-panel-sk
, and day-range-sk
. It handles fetching commit data, managing the selection state, and emitting the final commit-selected
event.open()
, close()
), handling range changes (rangeChange()
), updating the commit list (updateCommitSelections()
), and processing selections from the panel (panelSelect()
). The selection
getter/setter allows for programmatic control of the selected commit.commit-detail-picker-sk.scss
: Contains the SASS/CSS styles for the component.--on-background
, --background
).dialog
element, the buttons within it, and ensures proper display and spacing of child components like day-range-sk
.commit-detail-picker-sk-demo.html
& commit-detail-picker-sk-demo.ts
: These files provide a demonstration page for the component.commit-detail-picker-sk
, mocks the backend API call (/_/cidRange/
) using fetch-mock
to provide sample commit data, and sets up an event listener to display the commit-selected
event details.commit-detail-panel-sk
: Used within the dialog to list and allow selection of individual commits. commit-detail-picker-sk
passes the fetched details
(array of Commit
objects) to this panel.day-range-sk
: Used to allow the user to define the time window for which commits should be fetched. Its day-range-change
event triggers a refetch in the picker.spinner-sk
: Provides visual feedback during data loading.ElementSk
: Base class from infra-sk
providing common custom element functionality.jsonOrThrow
: Utility for parsing JSON responses and throwing an error if parsing fails or the response is not OK.errorMessage
: Utility for displaying error messages to the user.The design focuses on encapsulation: the commit-detail-picker-sk
component manages its internal state (current range, fetched commits, selected index) and exposes a clear interface for interaction (a button to open, a selection
property, and a commit-selected
event). This makes it easy to integrate into larger applications that require users to pick a commit from a potentially large history.
The commit-detail-sk
module provides a custom HTML element <commit-detail-sk>
designed to display concise information about a single commit. This element is crucial for user interfaces where presenting commit details in a structured and interactive manner is necessary.
In applications dealing with version control systems, there's often a need to display details of individual commits. This could be for reviewing changes, navigating commit history, or linking to related actions like exploring code changes, viewing clustered data, or triaging issues associated with a commit. The commit-detail-sk
element encapsulates this functionality, offering a reusable and consistent way to present commit information.
The core of the module is the CommitDetailSk
class, which extends ElementSk
. This class defines the structure and behavior of the <commit-detail-sk>
element.
Key Responsibilities and Components:
commit-detail-sk.ts
: This is the heart of the module.
CommitDetailSk
custom element.Commit
object (defined in perf/modules/json
) as input via the cid
property. This object contains details like the commit hash, author, message, timestamp, and URL.template
function, using lit-html
, defines the HTML structure of the element. It displays:diffDate
).cid.url
.openLink
method handles the click events on these buttons, opening the respective links in a new browser window/tab.upgradeProperty
is used to ensure that the cid
property is correctly initialized if it's set before the element is fully connected to the DOM.commit-detail-sk.scss
: This file contains the styling for the <commit-detail-sk>
element.
--blue
, --primary
), allowing the component to adapt to different visual themes (light and dark mode, as demonstrated in the demo).//perf/modules/themes:themes_sass_lib
and //elements-sk/modules:colors_sass_lib
to ensure consistency with the broader application's design system.commit-detail-sk-demo.html
and commit-detail-sk-demo.ts
: These files provide a demonstration page for the <commit-detail-sk>
element.
<commit-detail-sk>
in both light and dark mode contexts.Commit
data. It also simulates a click on the element to potentially reveal more details or actions if such functionality were implemented (though in the current version, the “tip” div with buttons is always visible). The Date.now
function is mocked to ensure consistent output for the diffDate
calculation in the demo and tests.Workflow Example: Displaying Commit Information and Actions
1. Application provides a `Commit` object.
   e.g., { hash: "abc123...", author: "user@example.com", ... }

2. The `Commit` object is assigned to the `cid` property of a `<commit-detail-sk>` element.
   <commit-detail-sk .cid=${commitData}></commit-detail-sk>

3. `CommitDetailSk` element renders:
   [abc123...] - [user@example.com] - [2 days ago] - [Commit message]
   +----------------------------------------------------------------+
   | [Explore] [Cluster] [Triage] [Commit (link to commit source)]  |  <- Action buttons
   +----------------------------------------------------------------+

4. User clicks an action button (e.g., "Explore").

5. `openLink` method is called with a generated URL (e.g., "/g/e/abc123...").

6. A new browser tab opens to the specified URL.
This design promotes reusability and separation of concerns. The element focuses solely on presenting commit information and providing relevant action links, making it easy to integrate into various parts of an application that need to display commit details. The use of lit-html
for templating allows for efficient rendering and updates.
The commit-range-sk
module provides a custom HTML element, <commit-range-sk>
, designed to display a link representing a range of commits within a Git repository. This functionality is particularly useful in performance analysis tools where identifying the specific commits that introduced a performance regression or improvement is crucial.
Core Functionality and Design:
The primary purpose of commit-range-sk
is to dynamically generate a URL that points to a commit range viewer (e.g., a Git web interface like Gerrit or GitHub). This URL is constructed based on a “begin” and an “end” commit.
Identifying the Commit Range:
trace
(an array of numerical data points, where each point corresponds to a commit), a commitIndex
(the index within the trace
array that represents the “end” commit of interest), and header
information (which maps trace indices to commit metadata like offset
or commit number).commitIndex
and the header
.commitIndex - 1
in the trace
. It skips over any entries marked with MISSING_DATA_SENTINEL
(indicating commits for which there's no data point) until it finds a valid previous commit.Converting Commit Numbers to Hashes:
window.perf.commit_range_url
, typically requires Git commit hashes (SHAs) rather than internal commit numbers or offsets.commit-range-sk
element uses a commitNumberToHashes
function to perform this conversion.defaultcommitNumberToHashes
, makes an asynchronous call to a backend service (likely /
/cid/``) by invokinglookupCids
from the//perf/modules/cid:cid_ts_lib
module. This service is expected to return the commit hashes corresponding to the provided commit numbers.commitNumberToHashes
with a mock function during testing (as seen in commit-range-sk_test.ts
).URL Construction and Display:
window.perf.commit_range_url
template. This template usually contains placeholders like {begin}
and {end}
which are replaced with the actual commit hashes.<begin_offset + 1> - <end_offset>
”. Otherwise, it will just show the “<end_offset>
”. The +1
for the begin offset in a range is to ensure the displayed range starts after the last known good commit.showLinks
property:showLinks
is false
(default, or when the element is merely hovered over in some UIs), only the text representing the commit(s) is displayed.showLinks
is true
, a fully formed hyperlink (<a>
tag) is rendered.Key Components/Files:
commit-range-sk.ts
: This is the core file defining the CommitRangeSk
custom element.
ElementSk
, a base class for custom elements in the Skia infrastructure._trace
, _commitIndex
, _header
, _url
, _text
, and _commitIds
.recalcLink()
method is central to its operation. It's triggered whenever relevant input properties (trace
, commitIndex
, header
) change. This method orchestrates the process of finding commit IDs, converting them to hashes, and generating the URL and display text.setCommitIds()
implements the logic for determining the start and end commit numbers based on the input trace and header, handling missing data points.lit/html
library for templating, allowing for efficient rendering and updates to the DOM.commit-range-sk-demo.ts
and commit-range-sk-demo.html
: These files provide a demonstration page for the commit-range-sk
element.
commit-range-sk-demo.ts
sets up a mock environment, including mocking the fetch
call to /
/cid/``usingfetch-mock
. This is crucial for demonstrating the element's behavior without needing a live backend.window.perf
object with necessary configuration, such as the commit_range_url
template.<commit-range-sk>
element and populates its properties to showcase its functionality.commit-range-sk_test.ts
: This file contains unit tests for the CommitRangeSk
element.
chai
for assertions and setUpElementUnderTest
for easy instantiation of the element in a test environment.commitNumberToHashes
method on the element instance to provide controlled hash values and assert the correctness of the generated URL and text, especially in scenarios involving MISSING_DATA_SENTINEL
.BUILD.bazel
: Defines how the module is built, its dependencies (e.g., //infra-sk/modules/ElementSk
, //perf/modules/json
, lit
), and how the demo page and tests are structured.
Workflow Example: Generating a Commit Range Link
Initialization:
<commit-range-sk>
sets the global window.perf.commit_range_url
(e.g., "http://example.com/range/{begin}/{end}"
).<commit-range-sk>
element is added to the DOM.Property Setting:
- The application provides data to the element:
  - `element.trace = [10, MISSING_DATA_SENTINEL, 12, 15];`
  - `element.header = [{offset: C1}, {offset: C2}, {offset: C3}, {offset: C4}];` (where C1-C4 are commit numbers)
  - `element.commitIndex = 3;` (points to the data `15` and commit `C4`)
  - `element.showLinks = true;`
recalcLink()
Triggered:
recalcLink()
.Determine Commit IDs (setCommitIds()
):
header[commitIndex].offset
=> C4
.commitIndex - 1 = 2
. trace[2]
is 12
(not missing). So, header[2].offset
=> C3
._commitIds
becomes [C3, C4]
.Check if Range (isRange()
):
C3 + 1 === C4
? Let's assume C3
and C4
are not consecutive (e.g., C3=100
, C4=102
). isRange()
returns true
."${C3 + 1} - ${C4}"
(e.g., "101 - 102"
).Convert Commit IDs to Hashes (commitNumberToHashes
):
- `commitNumberToHashes([C3, C4])` is called.
- Internally, this likely makes a POST request to `/_/cid/` with `[C3, C4]`.
- Backend returns: `{ commitSlice: [{hash: "hash_for_C3"}, {hash: "hash_for_C4"}] }`.
["hash_for_C3", "hash_for_C4"]
.Construct URL:
url = window.perf.commit_range_url
(e.g., "http://example.com/range/{begin}/{end}"
)url = url.replace('{begin}', "hash_for_C3")
url = url.replace('{end}', "hash_for_C4")
_url
becomes "http://example.com/range/hash_for_C3/hash_for_C4"
.Render:
- Since `showLinks` is true, the template becomes: `<a href="http://example.com/range/hash_for_C3/hash_for_C4" target="_blank">101 - 102</a>`
- The element updates its content with this HTML.
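The URL-construction step above amounts to simple placeholder substitution on the configured template. A minimal sketch, with an illustrative template and hashes (the real element reads the template from window.perf.commit_range_url and the hashes from the /_/cid/ lookup):

```ts
// Replace the {begin} and {end} placeholders in a commit-range URL template.
function buildCommitRangeUrl(template: string, beginHash: string, endHash: string): string {
  return template.replace('{begin}', beginHash).replace('{end}', endHash);
}

buildCommitRangeUrl(
  'http://example.com/range/{begin}/{end}',
  'hash_for_C3',
  'hash_for_C4'
); // "http://example.com/range/hash_for_C3/hash_for_C4"
```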
This workflow demonstrates how commit-range-sk
encapsulates the logic for finding relevant commits, converting their identifiers, and presenting a user-friendly link to explore changes between them, abstracting away the complexities of interacting with commit data and URL templates.
The common
module houses utility functions and data structures that are shared across various parts of the Perf application, particularly those related to data visualization and testing. Its primary purpose is to promote code reuse and maintain consistency in how data is processed and displayed.
The module's responsibilities can be broken down into the following areas:
Plot Data Construction and Formatting:
Why: Visualizing performance data often involves transforming raw data into formats suitable for charting libraries (like Google Charts). This process needs to be standardized to ensure plots are consistent and correctly represent the underlying information.
How:
plot-builder.ts
: This file is central to preparing data for plotting.
convertFromDataframe
: This function is crucial for adapting data organized in a DataFrame
structure (where traces are rows) into a format suitable for Google Charts, which typically expects data in columns. It essentially transposes the TraceSet
. The domain
parameter allows specifying whether the x-axis should represent commit positions, dates, or both, providing flexibility in how time-series data is visualized.
Input DataFrame (TraceSet):
      TraceA: [val1, val2, val3]
      TraceB: [valA, valB, valC]
      Header: [commit1, commit2, commit3]

    convertFromDataframe (domain='commit') ->

    Output for Google Chart:
      ["Commit Position", "TraceA", "TraceB"]
      [commit1_offset,    val1,     valA    ]
      [commit2_offset,    val2,     valB    ]
      [commit3_offset,    val3,     valC    ]
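A simplified TypeScript sketch of this transposition, assuming a TraceSet of equal-length traces and commit offsets as the x-domain (the real convertFromDataframe also handles date and combined domains):

```ts
// Simplified TraceSet: trace id -> array of values, one per commit.
type TraceSet = { [traceId: string]: number[] };

function toGoogleChartRows(traces: TraceSet, header: number[]): (string | number)[][] {
  const ids = Object.keys(traces);
  // First row: x-axis label followed by one column per trace.
  const rows: (string | number)[][] = [['Commit Position', ...ids]];
  header.forEach((commitOffset, i) => {
    // Each subsequent row: the commit offset, then the i-th value of every trace.
    rows.push([commitOffset, ...ids.map((id) => traces[id][i])]);
  });
  return rows;
}

toGoogleChartRows({ TraceA: [1, 2, 3], TraceB: [4, 5, 6] }, [100, 101, 102]);
```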
ConvertData
: This function takes a ChartData
object, which is a more abstract representation of plot data (lines with x, y coordinates and labels), and transforms it into the specific array-of-arrays format required by Google Charts. This abstraction allows other parts of the application to work with ChartData
without needing to know the exact details of the charting library's input format.
Input ChartData:
      xLabel: "Time"
      lines: {
        "Line1": [{x: t1, y: v1}, {x: t2, y: v2}],
        "Line2": [{x: t1, y: vA}, {x: t2, y: vB}]
      }

    ConvertData ->

    Output for Google Chart:
      ["Time", "Line1", "Line2"]
      [t1,     v1,      vA     ]
      [t2,     v2,      vB     ]
mainChartOptions
and SummaryChartOptions
: These functions provide pre-configured option objects for Google Line Charts. They encapsulate common styling and behavior (like colors, axis formatting, tooltip behavior, and null interpolation) to ensure a consistent look and feel for different types of charts (main detail charts vs. summary overview charts). This avoids repetitive configuration and makes it easier to maintain visual consistency. The options are also designed to adapt to the current theme (light/dark mode) by using CSS custom properties.
defaultColors
: A predefined array of colors used for chart series, ensuring a consistent and visually distinct palette.
Plotting Utilities:
Why: Beyond basic data transformation, there are common tasks related to preparing data specifically for plotting, such as associating anomalies with data points and handling missing values.
How:
plot-util.ts
: This file contains helper functions that build upon plot-builder.ts
.
CreateChartDataFromTraceSet
: This function serves as a higher-level constructor for ChartData
. It takes a raw TraceSet
(a dictionary where keys are trace identifiers and values are arrays of numbers), corresponding x-axis labels (commit numbers or dates), the desired x-axis format, and anomaly information. It then iterates through the traces, constructs DataPoint
objects (which include x, y, and any associated anomaly), and organizes them into the ChartData
structure. A key aspect is its handling of MISSING_DATA_SENTINEL
to exclude missing points from the chart data, relying on the charting library's interpolation. It also uses findMatchingAnomaly
to link anomalies to their respective data points.
Input TraceSet:
      "trace_foo": [10, 12, MISSING_DATA_SENTINEL, 15]
    xLabels: [c1, c2, c3, c4]
    Anomalies: { "trace_foo": [{x: c2, y: 12, anomaly: {...}}] }

    CreateChartDataFromTraceSet ->

    Output ChartData:
      lines: {
        "trace_foo": [
          {x: c1, y: 10, anomaly: null},
          {x: c2, y: 12, anomaly: {...}},
          // Point for c3 is skipped due to MISSING_DATA_SENTINEL
          {x: c4, y: 15, anomaly: null}
        ]
      }
      ...
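A simplified sketch of the per-trace loop this describes; the DataPoint and anomaly shapes are illustrative stand-ins for the real Perf JSON types:

```ts
// Sentinel value as documented for /modules/const.
const MISSING_DATA_SENTINEL = 1e32;

interface DataPoint { x: number; y: number; anomaly: object | null }

function buildLine(
  values: number[],
  xLabels: number[],
  anomalies: Map<number, object> // keyed by x, for this trace only (assumed shape)
): DataPoint[] {
  const points: DataPoint[] = [];
  values.forEach((y, i) => {
    if (y === MISSING_DATA_SENTINEL) return; // skip missing samples entirely
    const x = xLabels[i];
    points.push({ x, y, anomaly: anomalies.get(x) ?? null });
  });
  return points;
}
```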
findMatchingAnomaly
: A utility to efficiently check if a given data point (identified by its trace key, x-coordinate, and y-coordinate) corresponds to a known anomaly. This is used by CreateChartDataFromTraceSet
to enrich data points with anomaly details.
Test Utilities:
test-util.ts
: This file provides functions to set up a common testing and demo environment.setUpExploreDemoEnv
: This is a comprehensive function that uses fetch-mock
to intercept various API calls that are typically made by Perf frontend components (e.g., explore page, alert details). It returns predefined, static responses for endpoints like /_/login/status
, /_/initpage/...
, /_/count/
, /_/frame/start
, /_/defaults/
, /_/status/...
, /_/cid/
, /_/details/
, /_/shortcut/get
, /_/nextParamList/
, and /_/shortcut/update
.paramSet
data, DataFrame
structures, commit information, and default configurations. This ensures that components relying on these API calls behave predictably in a test or demo environment. The function also checks for a proxy_endpoint
cookie to avoid mocking if a real backend is being proxied for development or demo purposes.The /modules/const
module serves as a centralized repository for constants utilized throughout the Perf UI. Its primary purpose is to ensure consistency and maintainability by providing a single source of truth for values that are shared across different parts of the user interface.
A key design decision behind this module is to manage values that might also be defined in the backend. This avoids potential discrepancies and ensures that frontend and backend systems operate with the same understanding of specific sentinel values or configurations.
The core responsibility of this module is to define and export these shared constants.
One of the key components is the const.ts
file. This file contains the actual definitions of the constants. A notable constant defined here is MISSING_DATA_SENTINEL
.
The MISSING_DATA_SENTINEL
constant (value: 1e32
) is critical for representing missing data points within traces. The backend uses this specific floating-point value to indicate that a sample is absent. The choice of 1e32
is deliberate. JSON, the data interchange format used, does not natively support NaN
(Not a Number) or infinity values (+/- Inf
). Therefore, a valid float32
that has a compact JSON representation and is unlikely to clash with actual data values was chosen. It is imperative that this frontend constant remains synchronized with the MissingDataSentinel
constant defined in the backend Go package //go/vec32/vec
. This synchronization ensures that both the UI and the backend correctly interpret missing data.
Any part of the Perf UI that needs to interpret or display trace data, especially when dealing with potentially incomplete datasets, will rely on this MISSING_DATA_SENTINEL
. For instance, charting libraries or data table components might use this constant to visually differentiate missing points or to exclude them from calculations.
Workflow involving MISSING_DATA_SENTINEL
:
Backend Data Generation --> Data contains MissingDataSentinel from //go/vec32/vec
  |
  V
Data Serialization (JSON) --> 1e32 is used for missing data
  |
  V
Frontend Data Fetching
  |
  V
Frontend UI Component (e.g., a chart)
  |
  V
UI uses MISSING_DATA_SENTINEL from /modules/const/const.ts to identify missing points
  |
  V
Appropriate rendering (e.g., gap in a line chart, specific placeholder in a table)
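A small sketch of the kind of check a consuming UI component might perform; the helper name and import path are illustrative, not part of the module's API:

```ts
// The constant itself is exported by /modules/const/const.ts (value 1e32).
import { MISSING_DATA_SENTINEL } from '../const/const'; // path is an assumption

// Returning null lets most charting libraries render a gap instead of a point.
function toChartValue(v: number): number | null {
  return v === MISSING_DATA_SENTINEL ? null : v;
}
```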
The /modules/csv
module provides functionality to convert DataFrame
objects, a core data structure representing performance or experimental data, into the Comma Separated Values (CSV) format. This conversion is essential for users who wish to export data for analysis in external tools, spreadsheets, or for archival purposes.
The primary challenge in converting a DataFrame
to CSV lies in representing the potentially sparse and varied parameter sets associated with each trace (data series) in a flat, tabular format. The DataFrame
stores traces indexed by a “trace ID,” which is a string encoding of key-value pairs representing the parameters that uniquely identify that trace.
The conversion process addresses this challenge through a multi-step approach:
Parameter Key Consolidation:
parseIdsIntoParams
function takes an array of trace IDs and transforms each ID string back into its constituent key-value parameter pairs. This is achieved by leveraging the fromKey
function from the //perf/modules/paramtools
module.allParamKeysSorted
function then iterates through all these parsed parameter sets to identify the complete, unique set of all parameter keys present across all traces. These keys are then sorted alphabetically. This sorted list of unique parameter keys will form the initial set of columns in the CSV, ensuring a consistent order and comprehensive representation of all parameters.Pseudocode for parameter key consolidation:
traceIDs = ["key1=valueA,key2=valueB", "key1=valueC,key3=valueD"] parsedParams = {} for each id in traceIDs: parsedParams[id] = fromKey(id) // e.g., {"key1=valueA,key2=valueB": {key1:"valueA", key2:"valueB"}} allKeys = new Set() for each params in parsedParams.values(): for each key in params.keys(): allKeys.add(key) sortedColumnNames = sorted(Array.from(allKeys)) // e.g., ["key1", "key2", "key3"]
Header Row Generation:

- The `dataFrameToCSV` function begins by constructing the header row of the CSV.
- The header row starts with the `sortedColumnNames` derived in the previous step.
- These are followed by timestamps from the `DataFrame`'s `header` property. Each element in `df.header` typically represents a point in time (or a commit, build, etc.), and its `timestamp` field is converted into an ISO 8601 formatted date string.

Pseudocode for header row generation:
csvHeader = sortedColumnNames
for each columnHeader in df.header:
    csvHeader.push(new Date(columnHeader.timestamp * 1000).toISOString())
csvLines.push(csvHeader.join(','))
Data Row Generation:

- For each trace in `df.traceset` (excluding “special_” traces, which are likely internal or metadata traces not intended for direct CSV export):
  - The parameter values corresponding to the `sortedColumnNames` are retrieved. If a trace does not have a value for a particular parameter key, an empty string is used, ensuring that each row has the same number of columns corresponding to the parameter keys.
  - The trace's data values are then appended. The `MISSING_DATA_SENTINEL` (defined in `//perf/modules/const`) is a special value indicating missing data; this is converted to an empty string in the CSV to represent a null or missing value. Other numerical values are appended directly.

Pseudocode for data row generation:
for each traceId, traceData in df.traceset:
    if traceId starts with "special_":
        continue
    traceParams = parsedParams[traceId]
    rowData = []
    for each columnName in sortedColumnNames:
        rowData.push(traceParams[columnName] or "")  // Add parameter value or empty string
    for each value in traceData:
        if value is MISSING_DATA_SENTINEL:
            rowData.push("")
        else:
            rowData.push(value)
    csvLines.push(rowData.join(','))
Final CSV String Assembly:

- All generated lines (header and data rows) are joined with newline characters (`\n`) to produce the complete CSV string.

The design prioritizes creating a CSV that is both human-readable and easily parsable by other tools. By dynamically determining the parameter columns based on the input `DataFrame` and sorting them, it ensures that all relevant trace metadata is included in a consistent manner. The explicit handling of `MISSING_DATA_SENTINEL` ensures that missing data is represented clearly as empty fields.
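As a usage illustration, here is a minimal sketch of how a UI component might call `dataFrameToCSV` and hand the result to the browser as a downloadable file. Only `dataFrameToCSV` and the `DataFrame` type come from the modules described here; the import paths, download mechanics, and file name are assumptions:

```typescript
// Only dataFrameToCSV (//perf/modules/csv) and DataFrame (//perf/modules/json)
// are real; the relative import paths and download mechanics are illustrative.
import { dataFrameToCSV } from '../csv/index';
import { DataFrame } from '../json';

const downloadAsCSV = (df: DataFrame): void => {
  const csv = dataFrameToCSV(df);
  const blob = new Blob([csv], { type: 'text/csv' });
  const url = URL.createObjectURL(blob);
  const anchor = document.createElement('a');
  anchor.href = url;
  anchor.download = 'traces.csv'; // Hypothetical file name.
  anchor.click();
  URL.revokeObjectURL(url);
};
```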
The key files in this module are:
- `index.ts`: This file contains the core logic for the CSV conversion. It houses the `parseIdsIntoParams`, `allParamKeysSorted`, and the main `dataFrameToCSV` functions. It leverages helper functions from `//perf/modules/paramtools` for parsing trace ID strings and relies on constants from `//perf/modules/const` for identifying missing data.
- `index_test.ts`: This file provides unit tests for the `dataFrameToCSV` function. It defines a sample `DataFrame` with various scenarios, including different parameter sets per trace and missing data points, and asserts that the generated CSV matches the expected output. This is crucial for ensuring the correctness and robustness of the CSV generation logic.

The dependencies on `//perf/modules/const` (for `MISSING_DATA_SENTINEL`) and `//perf/modules/json` (for the `DataFrame`, `ColumnHeader`, and `Params` types) indicate that this module is tightly integrated with the broader data representation and handling mechanisms of the Perf system. The dependency on `//perf/modules/paramtools` (for `fromKey`) highlights its role in interpreting the structured information encoded within trace IDs.
The `dataframe` module is designed to manage and manipulate time-series data, specifically performance testing traces, within the Perf application. It provides a centralized way to fetch, store, and process trace data, enabling functionalities like visualizing performance trends, identifying anomalies, and managing user-reported issues.

The core idea is to have a reactive data repository that components can consume. This allows for efficient data loading and updates, especially when dealing with large datasets and dynamic time ranges. Instead of each component fetching and managing its own data, components can rely on a shared `DataFrameRepository` to handle these tasks. This promotes consistency and reduces redundant data fetching.
`dataframe_context.ts`

This file defines the `DataFrameRepository` class, which acts as the central data store and manager. It is implemented as a LitElement (`<dataframe-repository-sk>`) that doesn't render any UI itself but provides data and loading states through Lit contexts.
Why a LitElement with Contexts? Using a LitElement allows easy integration into the existing component-based architecture. Lit contexts (`@lit/context`) provide a clean and reactive way for child components to consume the `DataFrame` and related information without prop drilling or complex event bus implementations.
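As an illustration of the context-based consumption this enables, here is a minimal sketch of a child element subscribing to `dataframeContext`. The element name, relative import paths, and property name are hypothetical; the context and `DataFrame` type are the ones described here:

```typescript
// Sketch of a consumer; import paths for dataframeContext and DataFrame are assumptions.
import { LitElement, html } from 'lit';
import { customElement, state } from 'lit/decorators.js';
import { consume } from '@lit/context';
import { dataframeContext } from '../dataframe/dataframe_context';
import { DataFrame } from '../json';

@customElement('trace-count-sk')
export class TraceCountSk extends LitElement {
  // Re-renders whenever DataFrameRepository publishes a new DataFrame.
  @consume({ context: dataframeContext, subscribe: true })
  @state()
  private dataframe?: DataFrame;

  render() {
    const count = Object.keys(this.dataframe?.traceset ?? {}).length;
    return html`<span>${count} traces loaded</span>`;
  }
}
```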
Core Functionalities:
Data Fetching:

- `resetTraces(range, paramset)`: Fetches an initial set of traces based on a time range and a `ParamSet` (a set of key-value pairs defining the traces to query). This is typically called when the user defines a new query.

  User defines query -> explore-simple-sk calls resetTraces()
  -> DataFrameRepository fetches data from /_/frame/start
  -> Updates internal _header, _traceset, anomaly, userIssues
  -> Provides DataFrame, DataTable, AnomalyMap, UserIssueMap via context

- `extendRange(offsetInSeconds)`: Fetches additional data to extend the current time range, either forwards or backwards. This is used for infinite scrolling or when the user wants to see more data. To improve performance for large range extensions, it slices the requested range into smaller chunks (`chunkSize`) and fetches them concurrently (see the sketch below).

  User scrolls/requests more data -> UI calls extendRange()
  -> DataFrameRepository slices the range into chunks if needed
  -> Fetches data for each chunk from /_/frame/start concurrently
  -> Merges new data with existing _header, _traceset, anomaly
  -> Provides updated DataFrame, DataTable, AnomalyMap via context
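To make the chunking behaviour concrete, here is a minimal sketch of slicing a large range into fixed-size chunks and fetching them concurrently. The chunk size, request field names, and function names are illustrative assumptions, not the repository's actual code:

```typescript
// Illustrative only: the real DataFrameRepository slices and merges internally.
const CHUNK_SIZE_SECONDS = 30 * 24 * 60 * 60; // Assumed chunk size (30 days).

interface Range {
  begin: number; // Unix seconds.
  end: number;
}

/** Splits [begin, end) into consecutive chunks no longer than CHUNK_SIZE_SECONDS. */
const sliceRange = ({ begin, end }: Range): Range[] => {
  const chunks: Range[] = [];
  for (let start = begin; start < end; start += CHUNK_SIZE_SECONDS) {
    chunks.push({ begin: start, end: Math.min(start + CHUNK_SIZE_SECONDS, end) });
  }
  return chunks;
};

/** Fetches each chunk from /_/frame/start concurrently and returns the raw responses. */
const fetchChunks = async (range: Range, query: string): Promise<unknown[]> => {
  const requests = sliceRange(range).map((chunk) =>
    fetch('/_/frame/start', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Field names here are assumptions about the FrameRequest shape.
      body: JSON.stringify({ begin: chunk.begin, end: chunk.end, queries: [query] }),
    }).then((resp) => resp.json())
  );
  return Promise.all(requests);
};
```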
- Both fetch paths go through the `/_/frame/start` endpoint, sending a `FrameRequest` which includes the time range, query (derived from the `ParamSet`), and timezone.

Data Caching and Merging:
_header
(array of ColumnHeader
objects, representing commit points/timestamps) and _traceset
(a TraceSet
object mapping trace keys to their data arrays).MISSING_DATA_SENTINEL
to maintain alignment with the header.Anomaly Management:
AnomalyMap
) along with the trace data.updateAnomalies(anomalies, id)
: Allows merging new anomalies and removing specific anomalies (e.g., when an anomaly is nudged or re-triaged). This uses mergeAnomaly
and removeAnomaly
from index.ts
.User-Reported Issue Management:
getUserIssues(traceKeys, begin, end)
: Fetches user-reported issues (e.g., Buganizer bugs linked to specific data points) from the /_/user_issues/
endpoint for a given set of traces and commit range.updateUserIssue(traceKey, commitPosition, bugId)
: Updates the local cache of user issues, typically after a new issue is filed or an existing one is modified.norm()
) before querying for user issues to ensure issues are found even if the displayed trace is a transformed version of the original.Google DataTable Conversion:
DataFrame
into a google.visualization.DataTable
format using convertFromDataframe
(from perf/modules/common:plot-builder_ts_lib
). This DataTable
is then provided via dataTableContext
and is typically consumed by charting components like <plot-google-chart-sk>
.DataFrameRepository.loadPromise
).State Management:
loading
: A boolean provided via dataframeLoadingContext
to indicate if a data request is in flight._requestComplete
: A Promise that resolves when the current data fetching operation completes. This can be used to coordinate actions that depend on data being available.Contexts Provided:
dataframeContext
: Provides the current DataFrame
object.dataTableContext
: Provides the google.visualization.DataTable
derived from the DataFrame
.dataframeAnomalyContext
: Provides the AnomalyMap
for the current data.dataframeUserIssueContext
: Provides the UserIssueMap
for the current data.dataframeLoadingContext
: Provides a boolean indicating if data is currently being loaded.dataframeRepoContext
: Provides the DataFrameRepository
instance itself, allowing consumers to call its methods (e.g., extendRange
).index.ts
This file contains utility functions for manipulating `DataFrame` structures, similar to its Go counterpart (`//perf/go/dataframe/dataframe.go`). These functions are crucial for merging, slicing, and analyzing the data.
Key Functions:
- `findSubDataframe(header, range, domain)`: Given a `DataFrame` header and a time/offset range, this function finds the start and end indices within the header that correspond to the given range. This is essential for slicing data.
- `generateSubDataframe(dataframe, range)`: Creates a new `DataFrame` containing only the data within the specified index range of the original `DataFrame`.
- `mergeAnomaly(anomaly1, ...anomalies)`: Merges multiple `AnomalyMap` objects into a single one. If anomalies exist for the same trace and commit, the later ones in the arguments list will overwrite earlier ones. It always returns a non-null `AnomalyMap`.
- `removeAnomaly(anomalies, id)`: Creates a new `AnomalyMap` excluding any anomalies with the specified `id`. This is used when an anomaly is moved or re-triaged on the backend, and the old entry needs to be cleared.
- `findAnomalyInRange(allAnomaly, range)`: Filters an `AnomalyMap` to include only anomalies whose commit positions fall within the given commit range.
- `mergeColumnHeaders(a, b)`: Merges two arrays of `ColumnHeader` objects, producing a new sorted array of unique headers. It also returns mapping objects (`aMap`, `bMap`) that indicate the new index of each header from the original arrays. This is fundamental for the `join` operation.
- `join(a, b)`: Combines two `DataFrame` objects into a new one (see the sketch after this list). It merges the headers with `mergeColumnHeaders`, then builds the combined `traceset`. For each trace in the original DataFrames, it uses the `aMap` and `bMap` to place the trace data points into the correct slots in the new, longer trace arrays, filling gaps with `MISSING_DATA_SENTINEL`. It also merges the `paramset` from both DataFrames.
- `buildParamSet(d)`: Reconstructs the `paramset` of a `DataFrame` based on the keys present in its `traceset`. This ensures the `paramset` accurately reflects the data.
- `timestampBounds(df)`: Returns the earliest and latest timestamps present in the `DataFrame`'s header.
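To illustrate the index-mapping idea behind `mergeColumnHeaders` and `join`, here is a small self-contained sketch over plain commit offsets. It is not the module's actual types or code, only a demonstration of the technique:

```typescript
// Simplified illustration of the aMap/bMap idea behind mergeColumnHeaders:
// merge two sorted lists of commit offsets and record where each original
// index lands in the merged list.
const mergeOffsets = (a: number[], b: number[]) => {
  const merged = Array.from(new Set([...a, ...b])).sort((x, y) => x - y);
  const indexOf = new Map(merged.map((offset, i): [number, number] => [offset, i]));
  const aMap = new Map(a.map((offset, i): [number, number] => [i, indexOf.get(offset)!]));
  const bMap = new Map(b.map((offset, i): [number, number] => [i, indexOf.get(offset)!]));
  return { merged, aMap, bMap };
};

// join() uses maps like these to copy each trace's values into the longer,
// merged trace array, leaving MISSING_DATA_SENTINEL in slots a trace lacks.
const { merged, aMap } = mergeOffsets([100, 102], [101, 102, 103]);
// merged = [100, 101, 102, 103]; aMap = { 0 -> 0, 1 -> 2 }
```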
`traceset.ts`

This file provides utility functions for extracting and formatting information from the trace keys within a `DataFrame` or `DataTable`. Trace keys are strings that encode various parameters (e.g., `",benchmark=Speedometer,test=MotionMark,"`).
Key Functions:
- `getAttributes(df)`: Extracts all unique attribute keys (e.g., “benchmark”, “test”) present across all trace keys in a `DataFrame`.
- `getTitle(dt)`: Identifies the common key-value pairs across all trace labels in a `DataTable`. These common pairs form the “title” of the chart, representing what all displayed traces have in common.
  - Why a `DataTable` input? This function is often used directly with the `DataTable` that feeds a chart, as column labels in the `DataTable` are typically the trace keys.
- `getLegend(dt)`: Identifies the key-value pairs that are not common across all trace labels in a `DataTable`. These differing parts form the “legend” for each trace, distinguishing them from one another. Keys missing from a particular trace are given the placeholder `"untitled_key"` for consistency in display.
- `titleFormatter(title)`: Formats the output of `getTitle` (an object) into a human-readable string, typically by joining values with ‘/’.
- `legendFormatter(legend)`: Formats the output of `getLegend` (an array of objects) into an array of human-readable strings.
- `getLegendKeysTitle(label)`: Takes a legend object (for a single trace) and creates a string by joining its keys, often used as a title for the legend section.
- `isSingleTrace(dt)`: Checks if a `DataTable` contains data for only a single trace (i.e., has 3 columns: domain, commit position/date, and one trace).
- `findTraceByLabel(dt, legendTraceId)`: Finds the column label (trace key) in a `DataTable` that matches the given `legendTraceId`.
- `findTracesForParam(dt, paramKey, paramValue)`: Finds all trace labels in a `DataTable` that contain a specific key-value pair.
- `removeSpecialFunctions(key)`: A helper used internally to strip function wrappers (like `norm(...)`) from trace keys before processing, ensuring that the underlying parameters are correctly parsed.

Design Rationale for Title/Legend Generation: When multiple traces are plotted, the title should reflect what's common among them (e.g., “benchmark=Speedometer”), and the legend should highlight what's different (e.g., “test=Run1” vs. “test=Run2”). These functions automate this process by analyzing the trace keys; a worked example follows below.
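For example, here is a minimal sketch (not the module's actual code) of splitting trace keys into a common “title” part and per-trace “legend” parts:

```typescript
// Illustrative only; the real getTitle/getLegend operate on DataTable column labels.
const traceKeys = [
  ',benchmark=Speedometer,test=Run1,',
  ',benchmark=Speedometer,test=Run2,',
];

// Parse ",k=v,k=v," style keys into plain objects.
const parse = (key: string): Record<string, string> =>
  Object.fromEntries(
    key
      .split(',')
      .filter((kv) => kv.includes('='))
      .map((kv) => kv.split('=') as [string, string])
  );

const parsed = traceKeys.map(parse);
const title: Record<string, string> = {};
for (const [k, v] of Object.entries(parsed[0])) {
  if (parsed.every((p) => p[k] === v)) {
    title[k] = v; // Shared by every trace -> belongs in the title.
  }
}
const legend = parsed.map((p) =>
  Object.fromEntries(Object.entries(p).filter(([k]) => !(k in title)))
);
// title  = { benchmark: 'Speedometer' }
// legend = [ { test: 'Run1' }, { test: 'Run2' } ]
```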
1. User navigates to a page or submits a query.
2. <explore-simple-sk> (or similar component) determines initial time range and ParamSet.
3. Calls `dataframeRepository.resetTraces(initialRange, initialParamSet)`.
4. DataFrameRepository:
   a. Sets `loading = true`.
   b. Constructs `FrameRequest`.
   c. POSTs to `/_/frame/start`.
   d. Receives `FrameResponse` (containing DataFrame and AnomalyMap).
   e. Updates its internal `_header`, `_traceset`, `anomaly`.
   f. Calls `setDataFrame()`:
      i. Updates `this.dataframe` (triggers `dataframeContext`).
      ii. Converts the DataFrame to a `google.visualization.DataTable`.
      iii. Updates `this.data` (triggers `dataTableContext`).
   g. Updates `this.anomaly` (triggers `dataframeAnomalyContext`).
   h. Sets `loading = false`.
5. Charting components (consuming `dataTableContext`) re-render with the new data.
6. Other UI elements (consuming `dataframeContext`, `dataframeAnomalyContext`) update.
1. User action triggers a request to load more data (e.g., scrolls near edge of chart).
2. UI component calls `dataframeRepository.extendRange(offsetInSeconds)`.
3. DataFrameRepository:
   a. Sets `loading = true`.
   b. Calculates the new time range (`deltaRange`).
   c. Slices the new range into chunks if `offsetInSeconds` is large (`sliceRange`).
   d. For each chunk:
      i. Constructs `FrameRequest`.
      ii. POSTs to `/_/frame/start`.
   e. `Promise.all` awaits all chunk responses.
   f. Filters out empty/error responses and sorts responses by timestamp.
   g. Merges `header` and `traceset` from sorted responses into existing `_header` and `_traceset`.
      - For the traceset: pads with `MISSING_DATA_SENTINEL` if a trace is missing in a new chunk.
   h. Merges `anomalymap` from sorted responses into existing `anomaly`.
   i. Calls `setDataFrame()` (as in initial load).
   j. Sets `loading = false`.
4. Charting components and other UI elements update.
1. Charting component (e.g., <perf-explore-sk>) has access to the `DataTable` via `dataTableContext`.
2. It calls `getTitle(dataTable)` and `getLegend(dataTable)` from `traceset.ts`.
3. It then uses `titleFormatter` and `legendFormatter` to get displayable strings.
4. Renders these strings as the chart title and legend series.
- `dataframe_context_test.ts`: Tests the `DataFrameRepository` class. It uses `fetch-mock` to simulate API responses from `/_/frame/start` and `/_/user_issues/`. Tests cover initialization, data loading (`resetTraces`), range extension (`extendRange`) with and without chunking, anomaly merging, and user issue fetching/updating.
- `index_test.ts`: Tests the utility functions in `index.ts`, such as `mergeColumnHeaders`, `join`, `findSubDataframe`, `mergeAnomaly`, etc. It uses manually constructed `DataFrame` objects to verify the logic of these data manipulation functions.
- `traceset_test.ts`: Tests the functions in `traceset.ts` for extracting titles and legends from trace keys. It generates `DataFrame` objects with various key combinations, converts them to `DataTable` (requiring the Google Chart API to be loaded), and then asserts the output of `getTitle`, `getLegend`, etc.
- `test_utils.ts`: Provides helper functions for tests, notably:
  - `generateFullDataFrame`: Creates mock `DataFrame` objects with specified structures, which is invaluable for setting up consistent test scenarios.
  - `generateAnomalyMap`: Creates mock `AnomalyMap` objects linked to a `DataFrame`.
  - `mockFrameStart`: A utility to easily mock the `/_/frame/start` endpoint with `fetch-mock`, returning parts of a provided full `DataFrame` based on the request's time range.
  - `mockUserIssues`: Mocks the `/_/user_issues/` endpoint.

The testing strategy relies heavily on creating controlled mock data and API responses to ensure that the data processing and fetching logic behaves as expected under various conditions.
The `day-range-sk` module provides a custom HTML element for selecting a date range. It allows users to pick a “begin” and “end” date, which is a common requirement in applications that deal with time-series data or event logging.

The primary goal of this module is to offer a user-friendly way to define a time interval. It achieves this by composing two `calendar-input-sk` elements, one for the start date and one for the end date. This design choice leverages an existing, well-tested component for date selection, promoting code reuse and consistency.
Key Components and Responsibilities:
day-range-sk.ts
: This is the core file defining the DayRangeSk
custom element.
ElementSk
, a base class for custom elements, providing lifecycle callbacks and rendering capabilities.lit-html
library for templating, rendering two calendar-input-sk
elements labeled “Begin” and “End”.begin
and end
dates are stored as attributes (and corresponding properties) representing Unix timestamps in seconds. This is a common and unambiguous way to represent points in time.calendar-input-sk
element fires an input
event (signifying a date change), the DayRangeSk
element updates its corresponding begin
or end
attribute and then dispatches a custom event named day-range-change
.day-range-change
event's detail
object contains the begin
and end
timestamps, allowing parent components to easily consume the selected range.begin
and end
are set if not provided: begin
defaults to 24 hours before the current time, and end
defaults to the current time. This provides a sensible initial state.connectedCallback
and attributeChangedCallback
are used to ensure the element renders correctly when added to the DOM or when its attributes are modified.day-range-sk.scss
: This file contains the styling for the day-range-sk
element.
themes.scss
) and defines specific styles for the labels and input fields within the day-range-sk
component, ensuring they adapt to light and dark modes.day-range-sk-demo.html
and day-range-sk-demo.ts
: These files provide a demonstration page for the day-range-sk
element.
day-range-sk
with different initial begin
and end
attributes.day-range-change
event from these instances and displays the event details in a <pre>
tag, demonstrating how to retrieve the selected date range.day-range-sk_puppeteer_test.ts
: This file contains Puppeteer tests for the day-range-sk
element.
loadCachedTestBed
utility to set up a testing environment, navigates to the demo page, and takes screenshots for visual regression testing. It also performs a basic smoke test to confirm the element is present on the page.Key Workflows:
Initialization: User HTML
-> day-range-sk (attributes: begin, end)
day-range-sk.connectedCallback()
IF begin/end not set
Set default begin (now - 24h), end (now)
_render()
Create two <calendar-input-sk> elements with initial dates
User Selects a New “Begin” Date: User interacts with "Begin" <calendar-input-sk>
<calendar-input-sk> fires "input" event (with new Date)
day-range-sk._beginChanged(event)
Update this.begin (convert Date to timestamp)
this._sendEvent()
Dispatch "day-range-change" event with { begin: new_begin_timestamp, end: current_end_timestamp }
User Selects a New “End” Date: User interacts with "End" <calendar-input-sk>
<calendar-input-sk> fires "input" event (with new Date)
day-range-sk._endChanged(event)
Update this.end (convert Date to timestamp)
this._sendEvent()
Dispatch "day-range-change" event with { begin: current_begin_timestamp, end: new_end_timestamp }
Parent Component Consumes Date Range: Parent Component
Listen for "day-range-change" on <day-range-sk>
On event:
Access event.detail.begin
Access event.detail.end
Perform actions with the new date range
The conversion between Date
objects (used by calendar-input-sk
) and numeric timestamps (used by day-range-sk
's attributes and events) is handled internally by the dateFromTimestamp
utility function and by using Date.prototype.valueOf() / 1000
. This design ensures that the day-range-sk
element exposes a simple, numeric API for its date range while leveraging a more complex date object-based component for the UI.
The domain-picker-sk
module provides a custom HTML element <domain-picker-sk>
that allows users to select a data domain. This domain can be defined in two ways: either as a specific date range or as a number of data points (commits) preceding a chosen end date. This flexibility is crucial for applications that need to visualize or analyze time-series data where users might want to focus on a specific period or view the most recent N data points.
The core design choice is to offer these two distinct modes of domain selection, catering to different user needs. The “Date Range” mode is useful when users know the specific start and end dates they are interested in. The “Dense” mode is more suitable when users want to see a fixed amount of recent data, regardless of the specific start date.
The component's state is managed internally and can also be set externally via the state
property. This state
object, defined by the DomainPickerState
interface, holds the begin
and end
timestamps (in Unix seconds), the num_commits
(for “Dense” mode), and the request_type
which indicates the current selection mode (0 for “Date Range” - RANGE
, 1 for “Dense” - DENSE
).
Key Files and Their Responsibilities:
domain-picker-sk.ts
: This is the heart of the module. It defines the DomainPickerSk
class, which extends ElementSk
.
lit-html
library for templating, allowing for efficient updates to the DOM when the state changes. The template
static method defines the basic structure, and _showRadio
and _requestType
static methods conditionally render different parts of the UI based on the current request_type
and the force_request_type
attribute._state
object. Initial default values are set in the constructor (e.g., end date is now, begin date is 24 hours ago, default num_commits
is 50).typeRange
, typeDense
, beginChange
, endChange
, and numChanged
update the internal _state
and then call render()
to reflect these changes in the UI.force_request_type
attribute ('range'
or 'dense'
) allows the consuming application to lock the picker into a specific mode, hiding the radio buttons that would normally allow the user to switch. This is useful when the application context dictates a specific type of domain selection. The attributeChangedCallback
and the getter/setter for force_request_type
handle this.radio-sk
for mode selection and calendar-input-sk
for date picking, promoting modularity and reuse.domain-picker-sk.scss
: This file contains the SASS styles for the component.
elements-sk/modules/styles
for consistency (e.g., buttons, colors).index.ts
: A simple entry point that imports and registers the domain-picker-sk
custom element.
import './domain-picker-sk';
which ensures the DomainPickerSk
class is defined and registered with the browser's CustomElementRegistry
via the define
function call within domain-picker-sk.ts
.domain-picker-sk-demo.html
and domain-picker-sk-demo.ts
: These files provide a demonstration page for the component.
domain-picker-sk-demo.html
includes instances of <domain-picker-sk>
, some with the force_request_type
attribute set. domain-picker-sk-demo.ts
initializes the state
of these demo instances with sample data.domain-picker-sk_puppeteer_test.ts
: Contains Puppeteer tests for the component.
puppeteer-tests/util
library to load the demo page and take screenshots, verifying the visual appearance of the component in its default state.Key Workflows/Processes:
Initialization and Rendering:
<domain-picker-sk>
element is added to the DOM.connectedCallback
is invoked.state
and force_request_type
are upgraded (if set as attributes before the element was defined)._state
is established (e.g., end = now, begin = 24h ago, mode = RANGE).render()
is called:force_request_type
. If set, it overrides _state.request_type
._showRadio
decides whether to show mode selection radio buttons._requestType
renders either the “Begin” date input (for RANGE mode) or the “Points” number input (for DENSE mode).

[DOM Insertion] -> connectedCallback() -> _upgradeProperty('state') -> _upgradeProperty('force_request_type') -> render()
-> [UI Displayed]
User Changes Mode (if force_request_type
is not set):
@change
event triggers typeRange()
or typeDense()
._state.request_type
is updated.render()
is called.

[User clicks radio] -> typeRange()/typeDense() -> _state.request_type updated -> render()
-> [UI Updates]
User Changes Date/Number of Commits:
<calendar-input-sk>
(for Begin/End dates) or the <input type="number">
(for Points).@input
(for calendar) or @change
(for number input) event triggers beginChange()
, endChange()
, or numChanged()
._state
(e.g., _state.begin
, _state.end
, _state.num_commits
) is updated.render()
is called (though in the case of date changes, the <calendar-input-sk>
handles its own visual update for the date display, and render()
here ensures the parent component is aware and can re-render if other parts depend on these values, although in the current implementation, render()
on the parent might be redundant for just date changes if no other part of this component's template changes directly).

[User changes input] -> beginChange()/endChange()/numChanged()
-> _state updated
-> render() // Potentially re-renders the component
-> [UI reflects new value]
The component emits no custom events itself but relies on the events from its child components (`radio-sk`, `calendar-input-sk`) to trigger internal state updates and re-renders. Consumers of `domain-picker-sk` would typically read the `state` property to get the user's selection.
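A small sketch of that consumption pattern; the `DomainPickerState` fields mirror those described above, while the element lookup and surrounding logic are illustrative:

```typescript
// Treat this interface as a sketch of the documented state shape.
interface DomainPickerState {
  begin: number;        // Unix seconds.
  end: number;          // Unix seconds.
  num_commits: number;  // Used in "Dense" mode.
  request_type: number; // 0 = RANGE, 1 = DENSE.
}

const picker = document.querySelector('domain-picker-sk') as HTMLElement & {
  state: DomainPickerState;
};

// e.g., when the user clicks an "Apply" button elsewhere in the page:
const { begin, end, num_commits, request_type } = picker.state;
if (request_type === 0) {
  console.log(`Query commits between ${begin} and ${end}`);
} else {
  console.log(`Query the last ${num_commits} commits ending at ${end}`);
}
```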
The `errorMessage` module provides a wrapper around the `errorMessage` function from the `elements-sk` library. Its primary purpose is to offer a more convenient way to display persistent error messages to the user.
Core Functionality and Design Rationale:
The key differentiation of this module lies in its default behavior for message display duration. While the `elements-sk` `errorMessage` function requires a duration to be specified for how long a message (often referred to as a “toast”) remains visible, this module defaults the duration to `0` seconds.
This design choice is intentional: a duration of `0` typically signifies that the error message will not automatically close. This is particularly useful in scenarios where an error is critical or requires user acknowledgment, and an auto-dismissing message might be missed. By defaulting to a persistent display, the module prioritizes ensuring the user is aware of the error.
Responsibilities and Key Components:
The module exposes a single function: `errorMessage`.
errorMessage(message: string | { message: string } | { resp: Response } | object, duration: number = 0): void
:message
parameter as the underlying elements-sk
function. This means it can handle plain strings, objects with a message
property, objects containing a Response
object (from which an error message can often be extracted), or generic objects.duration
parameter. If not explicitly provided by the caller, it defaults to 0
. This default triggers the persistent display behavior mentioned above.elementsErrorMessage
from the elements-sk
library, passing along the provided message
and the (potentially defaulted) duration
.Workflow:
The typical workflow for using this module is straightforward:
errorMessage
function is imported from this module.errorMessage
function is called with the error details.errorMessage("A critical error occurred.")
-> Displays “A critical error occurred.” indefinitely.errorMessage("Something went wrong.", 5000)
-> Displays “Something went wrong.” for 5 seconds (overriding the default).Essentially, this module acts as a thin convenience layer, promoting a specific error display pattern (persistent messages) by changing the default behavior of a more general utility. This reduces boilerplate for common use cases where persistent error notification is desired.
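Here is a hedged usage sketch; the import path is an assumption, while the accepted argument shapes (string or an object containing a `Response`) and the persistent default come from the description above:

```typescript
// Sketch of typical usage; only the behavior described above is assumed.
import { errorMessage } from '../errorMessage/errorMessage';

const loadDefaults = async (): Promise<void> => {
  try {
    const resp = await fetch('/_/defaults/');
    if (!resp.ok) {
      // Passing the Response lets the helper extract a useful message.
      errorMessage({ resp });
      return;
    }
    // ... use the defaults ...
  } catch (err) {
    errorMessage(`Failed to load defaults: ${err}`); // Stays visible until dismissed.
  }
};
```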
The `existing-bug-dialog-sk` module provides a user interface element for associating performance anomalies with existing bug reports in a bug tracking system (like Monorail). It's designed to be used within a larger performance monitoring application where users need to triage and manage alerts generated by performance regressions.
The core purpose of this module is to simplify the workflow of linking one or more detected anomalies to a pre-existing bug. Instead of manually navigating to the bug tracker and updating the bug, users can do this directly from the performance monitoring interface. This reduces context switching and streamlines the bug management process.
Key Components and Responsibilities:
existing-bug-dialog-sk.ts
: This is the heart of the module, defining the custom HTML element existing-bug-dialog-sk
.
open()
, closeDialog()
)._anomalies
)./_/triage/associate_alerts
) to create the association.anomaly-changed
. This event signals other parts of the application (e.g., charts or lists displaying anomalies) that the anomaly data has been updated (specifically, the bug_id
field) and they might need to re-render./_/anomalies/group_report
to get details of anomalies in the same group, including their associated bug_id
s. This endpoint might return a sid
(state ID) if the report generation is asynchronous, requiring a follow-up request./_/triage/list_issues
to fetch the titles of these bugs. This provides more context to the user than just showing bug IDs.setAnomalies()
method is crucial for initializing the dialog with the relevant anomaly data when it's about to be shown.window.perf.bug_host_url
to construct links to the bug tracker.existing-bug-dialog-sk.scss
: This file contains the SASS/CSS styles for the dialog.
--on-background
, --background
, etc.).index.ts
: This is a simple entry point that imports and registers the existing-bug-dialog-sk
custom element, making it available for use in HTML.
Workflow for Associating Anomalies with an Existing Bug:
setAnomalies()
on an existing-bug-dialog-sk
instance, passing the selected anomalies.open()
on the dialog instance.

Application -> existing-bug-dialog-sk: setAnomalies(anomalies)
Application -> existing-bug-dialog-sk: open()
existing-bug-dialog-sk -> Backend API (/anomalies/group_report): fetch_associated_bugs()
Backend API -> existing-bug-dialog-sk: (Associated Bug IDs)
existing-bug-dialog-sk -> Backend API (/triage/list_issues): fetch_bug_titles()
Backend API -> existing-bug-dialog-sk: (Bug Titles)
existing-bug-dialog-sk: Renders dialog with form & associated bugs list
User submits form -> existing-bug-dialog-sk
existing-bug-dialog-sk: _spinner.active = true (UI update: show spinner)
existing-bug-dialog-sk -> Backend API: fetch('/_/triage/associate_alerts', POST, {bug_id, keys})
Backend API -> existing-bug-dialog-sk: (Success/Failure)
existing-bug-dialog-sk: _spinner.active = false (UI update: hide spinner)
IF Success:
    closeDialog() (UI update: hide dialog)
    window.open(bug_url) (opens bug in new tab)
    dispatchEvent('anomaly-changed') -> Application (notifies other components)
IF Failure:
    errorMessage(msg) (UI update: show error toast)
Other parts of the UI (listening for `anomaly-changed`) update to reflect the new association.

The design prioritizes a clear and focused user experience for a common task in performance alert triaging. By integrating directly with the backend API for bug association and fetching related bug information, it aims to be an efficient tool for developers and SREs. The use of custom events allows for loose coupling with other components in the larger application.
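For illustration, a minimal sketch of a host component reacting to that event; only the event name comes from this module, the element lookup and handler body are illustrative:

```typescript
// Sketch: react to a completed bug association.
const dialog = document.querySelector('existing-bug-dialog-sk')!;

dialog.addEventListener('anomaly-changed', () => {
  // The anomalies' bug_id fields were updated on the backend; re-render any
  // chart or list in the host application that displays those anomalies.
  console.log('Anomaly-to-bug association changed; refresh views.');
});
```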
The `explore-multi-sk` module provides a user interface for displaying and interacting with multiple performance data graphs simultaneously. This is particularly useful when users need to compare different metrics, configurations, or time ranges side-by-side. The core idea is to leverage the functionality of individual `explore-simple-sk` elements, which represent single graphs, and manage their states and interactions within a unified multi-graph view.
State Management: A central `State` object within `explore-multi-sk` manages properties that are common across all displayed graphs. These include the time range (`begin`, `end`), display options (`showZero`, `dots`), and pagination settings (`pageSize`, `pageOffset`). This approach simplifies the overall state management and keeps the URL from becoming overly complex, as only a limited set of shared parameters need to be reflected in the URL.
Each individual graph (`explore-simple-sk` instance) maintains its own specific state related to the data it displays (formulas, queries, selected keys). `explore-multi-sk` stores an array of `GraphConfig` objects, where each object corresponds to an `explore-simple-sk` instance and holds its unique configuration.
The `stateReflector` utility is used to synchronize the shared `State` with the URL, allowing for bookmarking and sharing of multi-graph views.
Dynamic Graph Addition and Removal: Users can dynamically add new graphs to the view. When a new graph is added, an empty explore-simple-sk
instance is created and the user can then configure its data source (query or formula).
If the useTestPicker
option is enabled (often determined by backend defaults), instead of a simple “Add Graph” button, a test-picker-sk
element is displayed. This component provides a more structured way to select tests and parameters, and upon selection, a new graph is automatically generated and populated.
Graphs can also be removed. Event listeners are in place to handle remove-explore
custom events, which are typically dispatched by the individual explore-simple-sk
elements when a user closes them in a “Multiview” context (where useTestPicker
is active).
Pagination: To handle potentially large numbers of graphs, pagination is implemented using the pagination-sk
element. This allows users to view a subset of the total graphs at a time, improving performance and usability. The pageSize
and pageOffset
are part of the shared state.
Graph Manipulation (Split and Merge):
These operations primarily involve manipulating the graphConfigs
array and then re-rendering the graphs.
Shortcuts: The module supports saving and loading multi-graph configurations using shortcuts. When the configuration of graphs changes (traces added/removed, graphs split/merged), updateShortcutMultiview
is called. This function communicates with a backend service (/_/shortcut/get
and a corresponding save endpoint invoked by updateShortcut
from explore-simple-sk
) to store or retrieve the graphConfigs
associated with a unique shortcut ID. This ID is then reflected in the URL, allowing users to share specific multi-graph setups.
Synchronization of Interactions:
x-axis-toggled
is dispatched. explore-multi-sk
listens for this and updates the x-axis on all other visible graphs to maintain consistency.explore-multi-sk.ts
, the explore-simple-sk
component likely has mechanisms for plot selection. If the plotSummary
feature is active, selections on one graph might influence others, though the provided code for explore-multi-sk
doesn't directly show this cross-graph selection synchronization logic, but it does have syncChartSelection
which would handle this.Defaults and Configuration: The component fetches default configurations from a /_/defaults/
endpoint. These defaults can influence various aspects, such as: - Whether to use test-picker-sk
(useTestPicker
). - Default parameters and their order for test-picker-sk
(include_params
, default_param_selections
). This allows for instance-specific customization of the Perf UI.
explore-multi-sk.ts
:
ExploreMultiSk
custom element. It is responsible for:explore-simple-sk
graph elements.stateReflector
to update the URL based on the shared state.test-picker-sk
if enabled.explore-simple-sk
.pagination-sk
for displaying graphs in pages.test-picker-sk
for adding graphs when useTestPicker
is true.favorites-dialog-sk
to allow users to save graph configurations.explore-multi-sk.html
(Inferred from the Lit html
template in explore-multi-sk.ts
):
explore-multi-sk
element. This includes:test-picker-sk
element (conditionally visible).pagination-sk
elements for navigating through graph pages.#graphContainer
) where the individual explore-simple-sk
elements are dynamically rendered.<button>
elements for user actions.<test-picker-sk>
for test selection.<pagination-sk>
for graph pagination.<favorites-dialog-sk>
for saving favorites.div
(#graphContainer
) to hold the explore-simple-sk
instances.explore-multi-sk.scss
:
explore-multi-sk
element and its children. It ensures that the layout is appropriate for displaying multiple graphs and their controls.#menu
and #pagination
areas.explore-simple-sk
plots.#test-picker
and #add-graph-button
.1. Initial Load and State Restoration:
User navigates to URL with explore-multi-sk
-> explore-multi-sk.connectedCallback()
-> Fetch defaults from /_/defaults/
-> stateReflector() is initialized
-> State is read from URL (or defaults if URL is empty)
-> IF state.shortcut is present: Fetch graphConfigs from /_/shortcut/get using the shortcut ID
-> ELSE (or after fetching): For each graphConfig (or if starting fresh, often one empty graph is implied or added):
     Create/configure an explore-simple-sk instance
     Set its state based on the graphConfig and shared state
-> Add graphs to the current page based on pagination settings
-> Render the component
2. Adding a Graph (without Test Picker):
User clicks "Add Graph" button | V explore-multi-sk.addEmptyGraph() is called | V A new ExploreSimpleSk instance is created A new empty GraphConfig is added to this.graphConfigs | V explore-multi-sk.updatePageForNewExplore() | V IF current page is full: Increment pageOffset (triggering pageChanged) ELSE: Add new graph to current page | V The new explore-simple-sk element might open its query dialog for the user
3. Adding a Graph (with Test Picker):
TestPickerSk is visible (due to defaults or state)
-> User interacts with TestPickerSk, selects tests/parameters
-> User clicks "Plot" button in TestPickerSk
-> TestPickerSk dispatches 'plot-button-clicked' event
-> explore-multi-sk listens for 'plot-button-clicked'
-> explore-multi-sk.addEmptyGraph(unshift=true) is called (new graph at the top)
-> explore-multi-sk.addGraphsToCurrentPage() updates the view
-> TestPickerSk.createQueryFromFieldData() gets the query
-> The new ExploreSimpleSk instance has its query set
4. Splitting a Graph:
User has one graph with multiple traces and clicks "Split Graph"
-> explore-multi-sk.splitGraph()
-> this.getTracesets() retrieves traces from the first (and only) graph
-> this.clearGraphs() removes the existing graph configuration
-> FOR EACH trace in the retrieved traceset:
     this.addEmptyGraph()
     A new GraphConfig is created for this trace (e.g., config.queries = [queryFromKey(trace)])
-> this.updateShortcutMultiview() (new shortcut reflecting multiple graphs)
-> this.state.pageOffset is reset to 0
-> this.addGraphsToCurrentPage() renders the new set of individual graphs
5. Saving/Updating a Shortcut:
Graph configuration changes (e.g., trace added/removed, graph split/merged, new graph added)
-> explore-multi-sk.updateShortcutMultiview() is called
-> Calls exploreSimpleSk.updateShortcut(this.graphConfigs)
-> (Inside updateShortcut) IF graphConfigs is not empty:
     POST this.graphConfigs to the backend (e.g., /_/shortcut/new or /_/shortcut/update)
     Backend returns a new or existing shortcut ID
-> explore-multi-sk.state.shortcut is updated with the new ID
-> this.stateHasChanged() is called, triggering stateReflector to update the URL
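For orientation, a sketch of the per-graph configuration that gets persisted via these shortcuts; the field names are an assumption about the `GraphConfig` shape and are shown for illustration only:

```typescript
// Assumed shape of a single graph's configuration; not the module's actual type.
interface GraphConfig {
  formulas: string[]; // e.g., ['norm(filter("config=565"))']
  queries: string[];  // e.g., ['config=8888&test=draw_a_circle']
  keys: string;       // A keys shortcut ID, if traces were selected directly.
}

// A two-graph multiview: one query-based graph, one formula-based graph.
const graphConfigs: GraphConfig[] = [
  { formulas: [], queries: ['config=8888&test=draw_a_circle'], keys: '' },
  { formulas: ['norm(filter("config=565"))'], queries: [], keys: '' },
];
// explore-multi-sk persists an array like this through the shortcut endpoints
// and stores the returned shortcut ID in its URL-reflected state.
```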
The `explore-simple-sk` module provides a custom HTML element for exploring and visualizing performance data. It allows users to query, plot, and analyze traces, identify anomalies, and interact with commit details. This element is a core component of the Perf application's data exploration interface.
Core Functionality:
The element's primary responsibility is to provide a user interface for:
Key Design Decisions and Implementation Choices:
State
class in explore-simple-sk.ts
defines the structure of this state./frame/start
endpoint. The requestFrame
method handles initiating these requests and processing the responses. The FrameRequest
and FrameResponse
types define the communication contract with the server.plot-simple-sk
(a custom canvas-based plotter) and plot-google-chart-sk
(which wraps Google Charts). The choice of plotter can be configured.query-sk
for query input, paramset-sk
for displaying parameters, commit-detail-panel-sk
for commit information). This promotes modularity and reusability.explore-simple-sk
element through custom events. For example, when a query changes in query-sk
, it emits a query-change
event that explore-simple-sk
listens to.Key Files and Components:
explore-simple-sk.ts
: This is the main TypeScript file that defines the ExploreSimpleSk
custom element. It handles:explore-simple-sk.html
(embedded in explore-simple-sk.ts
): This Lit-html template defines the structure of the element's UI. It includes placeholders for various child components and dynamic content.explore-simple-sk.scss
: This SCSS file provides the styling for the element and its components.explore-simple-sk.ts
):query-sk
: For constructing and managing queries.paramset-sk
: For displaying and interacting with parameter sets.plot-simple-sk
/ plot-google-chart-sk
: For rendering the plots.commit-detail-panel-sk
: For displaying commit information.anomaly-sk
: For displaying and managing anomalies.Workflow Example: Plotting a Query
query-sk
element to define a query.query-sk
emits a query-change
event with the new query.explore-simple-sk
listens for this event, updates its internal state (specifically the queries
array in the State
object), and triggers a re-render.explore-simple-sk
constructs a FrameRequest
based on the updated state and calls requestFrame
to fetch data from the server. User Input (query-sk) -> Event (query-change) -> State Update (ExploreSimpleSk) -> Data Request (requestFrame)
FrameResponse
, explore-simple-sk
processes the data, updates its internal _dataframe
object, and prepares the data for plotting.explore-simple-sk
passes the processed data to the plot-simple-sk
or plot-google-chart-sk
element, which then renders the traces on the graph. Server Response (FrameResponse) -> Data Processing (ExploreSimpleSk) -> Plot Update (plot-simple-sk/plot-google-chart-sk) -> Visual Output
This workflow illustrates the reactive nature of the element, where user interactions trigger state changes, which in turn lead to data fetching and UI updates.
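To make the event wiring concrete, here is a minimal sketch of listening for the `query-change` event; the element names and event come from the description above, while the handler body and the event `detail` shape are assumptions:

```typescript
// Sketch of the query-change wiring; the detail shape is assumed.
const querySk = document.querySelector('query-sk')!;

querySk.addEventListener('query-change', (e: Event) => {
  const query = (e as CustomEvent<{ q: string }>).detail.q;
  // explore-simple-sk would store this in its State.queries and call requestFrame().
  console.log('New query:', query);
});
```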
The `explore-sk` module serves as the primary user interface for exploring and analyzing performance data within the Perf application. It provides a comprehensive view for users to query, visualize, and interact with performance traces.

The core functionality of `explore-sk` is built upon the `explore-simple-sk` element. `explore-sk` acts as a wrapper, enhancing `explore-simple-sk` with additional features like user authentication integration, default configuration loading, and the optional `test-picker-sk` for more guided query construction.
Key Responsibilities and Components:
explore-sk.ts
: This is the main TypeScript file defining the ExploreSk
custom element.
/_/defaults/
). This ensures that the exploration view is pre-configured with sensible starting points.alogin-sk
to determine the logged-in user's status. This information is used to enable features like “favorites” if a user is logged in.stateReflector
to persist and restore the state of the underlying explore-simple-sk
element in the URL. This allows users to share specific views or bookmark their current exploration state.test-picker-sk
. If the use_test_picker_query
flag is set in the state (often via URL parameters or defaults), the test-picker-sk
component is shown, providing a structured way to build queries based on available parameter keys and values.test-picker-sk
(e.g., plot-button-clicked
, remove-all
, populate-query
) and translates these into actions on the explore-simple-sk
element, such as adding new traces based on the selected test parameters or clearing the view.explore-simple-sk
.explore-simple-sk
(imported module): This is a fundamental building block that handles the core trace visualization, querying logic, and interaction with the graph.
explore-sk
delegates most of the heavy lifting related to data exploration to this component. It passes down the initial state, default configurations, and user-specific settings.test-picker-sk
(imported module): A component that allows users to build queries by selecting from available test parameters and their values.
explore-sk
then uses to fetch and display the corresponding traces via explore-simple-sk
. It can also be populated based on a highlighted trace, allowing users to quickly refine queries based on existing data.favorites-dialog-sk
(imported module): Enables users to save and manage their favorite query configurations.
explore-simple-sk
and its functionality is enabled by explore-sk
based on the user's login status.State Management (stateReflector
):
explore-sk
uses stateReflector
to listen for state changes in explore-simple-sk
. When the state changes, stateReflector
updates the URL. Conversely, when the page loads or the URL changes, stateReflector
parses the URL and applies the state to explore-simple-sk
.Workflow Example: Initial Page Load with Test Picker
explore-sk
element is connected to the DOM.connectedCallback
is invoked:/_/defaults/
.stateReflector
is initialized. If the URL contains state for explore-simple-sk
, it's applied.use_test_picker_query = true
.use_test_picker_query
is true:initializeTestPicker()
is called.test-picker-sk
element is made visible.test-picker-sk
is initialized with parameters from the defaults (e.g., include_params
, default_param_selections
) or from existing queries in the state.test-picker-sk
to select desired test parameters.test-picker-sk
.test-picker-sk
emits a plot-button-clicked
event.explore-sk
listens for this event:test-picker-sk
.exploreSimpleSk.addFromQueryOrFormula()
to add the new traces to the graph.explore-simple-sk
fetches the data, renders the traces, and emits a state_changed
event.stateReflector
captures this state_changed
event and updates the URL to reflect the new query.This workflow illustrates how explore-sk
acts as a central coordinator, integrating various specialized components to provide a cohesive data exploration experience. The design emphasizes modularity, with explore-simple-sk
handling the core plotting and test-picker-sk
offering an alternative query input mechanism, all managed and presented by explore-sk
.
The `favorites-dialog-sk` module provides a custom HTML element that displays a modal dialog for users to add or edit “favorites.” Favorites, in this context, are likely user-defined shortcuts or bookmarks to specific views or states within the application, identified by a name, description, and a URL.
Core Functionality and Design:
The primary purpose of this module is to present a user-friendly interface for managing these favorites. It‘s designed as a modal dialog to ensure that the user’s focus is on the task of adding or editing a favorite without distractions from the underlying page content.
Key Components:
favorites-dialog-sk.ts
: This is the heart of the module, defining the FavoritesDialogSk
custom element.
ElementSk
, a base class for custom elements in the Skia infrastructure, providing a common foundation.lit/html.js
) for templating, allowing for declarative and efficient rendering of the dialog's UI.open()
method is the public API for triggering the dialog. It accepts optional parameters for pre-filling the form when editing an existing favorite. Crucially, it returns a Promise
. This promise-based approach is a key design choice. It resolves when the favorite is successfully saved and rejects if the user cancels the dialog. This allows the calling code (likely a parent component managing the list of favorites) to react appropriately, for instance, by re-fetching the updated list of favorites only when a change has actually occurred.confirm()
method handles the submission logic. It performs basic validation (checking for empty name and URL) and then makes an HTTP POST request to either /_/favorites/new
or /_/favorites/edit
depending on whether a new favorite is being created or an existing one is being modified.spinner-sk
element is used to provide visual feedback to the user during the asynchronous operation of saving the favorite.errorMessage
to display issues to the user, such as network errors or validation failures from the backend.dismiss()
method handles the cancellation of the dialog, rejecting the promise returned by open()
.filterName
, filterDescription
, filterUrl
) update the component's internal state as the user types, and trigger re-renders via this._render()
.favorites-dialog-sk.scss
: This file contains the SASS styles for the dialog.
<dialog>
element, input fields, labels, and buttons, ensuring a consistent look and feel within the application's theme (as indicated by @import '../themes/themes.scss';
).favorites-dialog-sk-demo.html
/ favorites-dialog-sk-demo.ts
: These files provide a demonstration page for the favorites-dialog-sk
element.
open()
method of the favorites-dialog-sk
element with appropriate parameters.Workflow: Adding/Editing a Favorite
A typical workflow involving this dialog would be:
User Action: The user clicks a button (e.g., “Add Favorite” or an “Edit” icon next to an existing favorite) in the main application UI.
Dialog Invocation: The event handler for this action calls the open()
method of an instance of favorites-dialog-sk
.
open()
might be called with minimal or no arguments, defaulting the URL to the current page.open()
is called with the favId
, name
, description
, and url
of the favorite to be edited.User clicks "Add New" --> favoritesDialog.open('', '', '', 'current.page.url') | V Dialog Appears | V User fills form, clicks "Save" --> confirm() is called | V POST /_/favorites/new | V (Success) Dialog closes, open() Promise resolves | V Calling component re-fetches favorites -------------------------------- OR --------------------------------- User clicks "Edit Favorite" --> favoritesDialog.open('id123', 'My Fav', 'Desc', 'fav.url.com') | V Dialog Appears (pre-filled) | V User modifies form, clicks "Save" --> confirm() is called | V POST /_/favorites/edit (with 'id123') | V (Success) Dialog closes, open() Promise resolves | V Calling component re-fetches favorites -------------------------------- OR --------------------------------- User clicks "Cancel" or Close Icon --> dismiss() is called | V Dialog closes, open() Promise rejects | V Calling component does nothing (no re-fetch)
User Interaction: The user fills in or modifies the “Name,” “Description,” and “URL” fields in the dialog.
Submission/Cancellation:
confirm()
method is invoked.fetch
request is made to the backend API (/_/favorites/new
or /_/favorites/edit
).Promise
returned by open()
resolves.confirm
).dismiss()
method is invoked.Promise
returned by open()
rejects.Post-Dialog Action: The component that initiated the dialog (e.g., a favorites-sk
list component) uses the resolved/rejected state of the Promise
to decide whether to refresh its list of favorites. This is a key aspect of the design – it avoids unnecessary re-fetches if the user simply cancels the dialog.
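A minimal sketch of that promise-based pattern; the `open()` arguments follow the demo usage described in this document, while the element lookup and handler are illustrative:

```typescript
// Sketch of calling the dialog from a parent component.
const dialog = document.querySelector('favorites-dialog-sk') as HTMLElement & {
  open(favId?: string, name?: string, description?: string, url?: string): Promise<void>;
};

const onAddFavorite = async (): Promise<void> => {
  try {
    // Resolves only when the favorite was saved successfully.
    await dialog.open('', '', '', window.location.href);
    console.log('Saved; re-fetch the favorites list here.');
  } catch {
    // The user cancelled the dialog; nothing to refresh.
  }
};
```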
The design prioritizes a clear separation of concerns, using custom elements for UI encapsulation, SASS for styling, and a promise-based API for asynchronous operations and communication with parent components. This makes the favorites-dialog-sk
a reusable and well-defined piece of UI for managing user favorites.
The `favorites-sk` module provides a user interface element for displaying and managing a user's “favorites”. Favorites are essentially bookmarked URLs, categorized into sections. This module allows users to view their favorited links, edit their details (name, description, URL), and delete them.
Core Functionality & Design:
The primary responsibility of `favorites-sk` is to fetch favorite data from a backend endpoint (`/_/favorites/`) and render it in a user-friendly way. It also handles interactions for modifying these favorites, such as editing and deleting.
Data Fetching and Rendering:
connectedCallback
), the element attempts to fetch the favorites configuration from the backend.Favorites
JSON format (defined in perf/modules/json
), is stored in the favoritesConfig
property._render()
method is called to update the display.Favorite Management:
deleteFavoriteConfirm
method is invoked.window.confirm
) to prevent accidental deletions.deleteFavorite
sends a POST request to /_/favorites/delete
with the ID of the favorite to be removed.editFavorite
method.favorites-dialog-sk
element (defined in perf/modules/favorites-dialog-sk
).favorites-dialog-sk
is responsible for presenting a modal dialog where the user can modify the favorite's name, description, and URL.Error Handling:
errorMessage
utility (from elements-sk/modules/errorMessage
).Key Components/Files:
favorites-sk.ts
: This is the heart of the module. It defines the FavoritesSk
custom element, extending ElementSk
. It contains the logic for fetching, rendering, deleting, and initiating the editing of favorites.constructor()
: Initializes the element with its Lit-html template.deleteFavorite()
: Handles the asynchronous request to the backend for deleting a favorite.deleteFavoriteConfirm()
: Provides a confirmation step before actual deletion.editFavorite()
: Manages the interaction with the favorites-dialog-sk
for editing.template()
: The static Lit-html template function that defines the overall structure of the element.getSectionsTemplate()
: A helper function that dynamically generates the HTML for displaying sections and their links based on favoritesConfig
. It specifically adds edit/delete controls for the “My Favorites” section.fetchFavorites()
: Fetches the favorites data from the backend and triggers a re-render.connectedCallback()
: A lifecycle method that ensures favorites are fetched when the element is added to the page.favorites-sk.scss
: Provides the styling for the favorites-sk
element, defining its layout, padding, colors for links, and table appearance.index.ts
: A simple entry point that imports and registers the favorites-sk
custom element, making it available for use in HTML.favorites-sk-demo.html
& favorites-sk-demo.ts
: These files provide a demonstration page for the favorites-sk
element. The HTML includes an instance of <favorites-sk>
and a <pre>
tag to display events. The TypeScript file simply imports the element and sets up an event listener (though no custom events are explicitly dispatched by favorites-sk
in the provided code).Workflow: Deleting a Favorite
User Clicks "Delete" Button (for a link in "My Favorites") | V favorites-sk.ts: deleteFavoriteConfirm(id, name) | V window.confirm("Deleting favorite: [name]. Are you sure?") | +-- User clicks "Cancel" --> Workflow ends | V User clicks "OK" favorites-sk.ts: deleteFavorite(id) | V fetch('/_/favorites/delete', { method: 'POST', body: {id: favId} }) | +-- Network Error/Non-OK Response --> errorMessage() is called, display error | V Successful Deletion favorites-sk.ts: fetchFavorites() | V fetch('/_/favorites/') | V Parse JSON response, update this.favoritesConfig | V this._render() // Re-renders the component with the updated list
Workflow: Editing a Favorite
User Clicks "Edit" Button (for a link in "My Favorites") | V favorites-sk.ts: editFavorite(id, name, desc, url) | V Get reference to <favorites-dialog-sk id="fav-dialog"> | V favorites-dialog-sk.open(id, name, desc, url) // Opens the edit dialog | +-- User cancels dialog --> Promise rejects (potentially with undefined, handled) | V User submits changes in dialog Promise resolves | V favorites-sk.ts: fetchFavorites() // Re-fetches and re-renders the list | V fetch('/_/favorites/') | V Parse JSON response, update this.favoritesConfig | V this._render()
The design relies on Lit for templating and rendering, which provides efficient updates to the DOM when the favoritesConfig
data changes. The separation of concerns is evident: favorites-sk
handles the list display and top-level actions, while favorites-dialog-sk
manages the intricacies of the editing form.
graph-title-sk
)The graph-title-sk
module provides a custom HTML element designed to display titles for individual graphs in a structured and informative way. Its primary goal is to present key-value pairs of metadata associated with a graph in a visually clear and space-efficient manner.
The core of this module is the GraphTitleSk
custom element (graph-title-sk.ts
). Its main responsibilities are:
Data Reception and Storage: It receives a Map<string, string>
where keys represent parameter names (e.g., “bot”, “benchmark”) and values represent their corresponding values (e.g., “linux-perf”, “Speedometer2”). This map, along with the number of traces in the graph, is provided via the set()
method.
Dynamic Rendering: Based on the provided data, the element dynamically generates HTML to display the title. It iterates through the key-value pairs and renders them in a columnar layout. Each pair is displayed with the key (parameter name) in a smaller font above its corresponding value.
Handling Empty or Generic Titles:
titleEntries
map is empty but numTraces
is greater than zero, it displays a generic title like "Multi-trace Graph (X traces)" to indicate a graph with multiple data series without specific shared parameters.

Space Management and Truncation:
- The title entries are arranged in a flexible, wrapping layout (`display: flex; flex-wrap: wrap;`) using CSS (`graph-title-sk.scss`). This allows the title to adapt to different screen widths.
MAX_PARAMS
, currently 8), it initially displays only the first MAX_PARAMS
entries. A “Show Full Title” button (<md-text-button class="showMore">
) is then provided, allowing the user to expand the view and see all title entries. Conversely, a “Show Short Title” mechanism is implied (though not explicitly shown as a button in the current code, showShortTitles()
method exists) to revert to the truncated view.title
attribute of thediv
containing the value.ElementSk
): The component is built as a custom element extending ElementSk
. This aligns with the Skia infrastructure's approach to building reusable UI components and allows for easy integration into Skia applications.lit
library‘s html
template literal tag. This provides a declarative and efficient way to define the component’s view and update it when data changes. The _render()
method, inherited from ElementSk
, is called to trigger re-rendering when the internal state (_titleEntries
, numTraces
, showShortTitle
) changes.graph-title-sk.scss
). This separates presentation concerns from the component‘s logic. CSS variables (e.g., var(--primary)
) are used for theming, allowing the component’s appearance to be consistent with the overall application theme.set()
Method for Data Input: Instead of relying solely on HTML attributes for complex data like a map, a public set()
method is provided. This is a common pattern for custom elements when dealing with non-string data or when updates need to trigger specific internal logic beyond simple attribute reflection.MAX_PARAMS
) and provide a “Show Full Title” option is a user experience choice. It prioritizes a clean initial view for complex graphs while still allowing users to access all details if needed.

1. Initial Rendering with Data:
User/Application Code GraphTitleSk Element --------------------- -------------------- calls set(titleData, numTraces) --> stores titleData & numTraces calls _render() | V getTitleHtml() is invoked | V Iterates titleData: - Skips empty keys/values - If entries > MAX_PARAMS & showShortTitle is true: - Renders first MAX_PARAMS entries - Renders "Show Full Title" button - Else: - Renders all entries | V HTML template is updated with generated content Browser renders the title
2. Toggling Full/Short Title Display (when applicable):
User Interaction GraphTitleSk Element ---------------- -------------------- Clicks "Show Full Title" button --> onClick handler (showFullTitle) executes | V this.showShortTitle = false calls _render() | V getTitleHtml() is invoked | V Now renders ALL title entries because showShortTitle is false | V HTML template is updated Browser re-renders the title to show all entries
A similar flow occurs if a mechanism to call showShortTitles()
is implemented and triggered.
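As a rough illustration of the truncation behavior described above, a Lit template helper might look like the following sketch. Only MAX_PARAMS, the showMore button, and the use of the title attribute come from the prose; the CSS class names and exact markup are assumptions.

```ts
import { html, TemplateResult } from 'lit';

const MAX_PARAMS = 8; // Mirrors the constant described above.

// Render at most MAX_PARAMS entries while showShortTitle is true, plus a
// "Show Full Title" button when some entries were hidden.
function getTitleHtmlSketch(
  titleEntries: Map<string, string>,
  showShortTitle: boolean
): TemplateResult[] {
  const entries = [...titleEntries.entries()].filter(([k, v]) => k !== '' && v !== '');
  const visible = showShortTitle ? entries.slice(0, MAX_PARAMS) : entries;
  const parts = visible.map(
    ([key, value]) => html`
      <div class="column">
        <div class="param">${key}</div>
        <!-- The title attribute exposes the full value on hover. -->
        <div class="value" title=${value}>${value}</div>
      </div>`
  );
  if (showShortTitle && entries.length > MAX_PARAMS) {
    parts.push(html`<md-text-button class="showMore">Show Full Title</md-text-button>`);
  }
  return parts;
}
```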
The demo page (graph-title-sk-demo.html
and graph-title-sk-demo.ts
) showcases various states of the graph-title-sk
element, including:
numTraces
is 0 and the map is also empty, which would result in no title being displayed).

Overview:
The ingest-file-links-sk
module provides a custom HTML element, <ingest-file-links-sk>
, designed to display a list of relevant links associated with a specific data point in the Perf performance monitoring system. These links are retrieved from the ingest.Format
data structure, which can be generated by various ingestion processes. The primary purpose is to offer users quick access to related resources, such as Swarming task runs, Perfetto traces, or bot information, directly from the Perf UI.
Why:
Performance analysis often requires context beyond the raw data. Understanding the environment in which a test ran (e.g., specific bot configuration), or having direct access to detailed trace files, can be crucial for debugging performance regressions or understanding improvements. This module centralizes these relevant links in a consistent and easily accessible manner, improving the efficiency of performance investigations.
How:
The <ingest-file-links-sk>
element fetches link data asynchronously. When its load()
method is called with a CommitNumber
(representing a specific point in time or version) and a traceID
(identifying the specific data series), it makes a POST request to the /_/details/?results=false
endpoint. This endpoint is expected to return a JSON object conforming to the ingest.Format
structure.
The element then parses this JSON response. It specifically looks for the links
field within the ingest.Format
. If links
exist and the version
field in the ingest.Format
is present (indicating a modern format), the element dynamically renders a list of these links.
Key design considerations and implementation details:
spinner-sk
element is displayed while data is being loaded.links
object. If a value is a valid URL, it‘s rendered as an <a>
tag. Otherwise, it’s displayed as “Key: Value”.[Link Text](url)
) into standard HTML anchor tags. This allows ingestion processes to provide links in a more human-readable format if desired.version
field in the response. If it's missing, it assumes a legacy data format that doesn't support these links and gracefully avoids displaying anything.

Responsibilities and Key Components:
ingest-file-links-sk.ts
: This is the core file defining the IngestFileLinksSk
custom element.load(cid: CommitNumber, traceid: string)
method is the public API for triggering the data fetching and rendering process.displayLinks
static method is responsible for generating the TemplateResult
array for rendering the list items.isUrl
and removeMarkdown
helper functions provide utility for link processing.ingest-file-links-sk.scss
: This file contains the SASS styles for the custom element, defining its appearance, including list styling and spinner positioning.ingest-file-links-sk-demo.html
and ingest-file-links-sk-demo.ts
: These files provide a demonstration page for the element. The demo page uses fetch-mock
to simulate the backend API response, allowing developers to see the element in action and test its functionality in isolation.ingest-file-links-sk_test.ts
: This file contains unit tests for the IngestFileLinksSk
element. It uses fetch-mock
to simulate various API responses and asserts the element's behavior, such as correct link rendering, spinner state, and error handling.ingest-file-links-sk_puppeteer_test.ts
: This file contains Puppeteer-based end-to-end tests. These tests load the demo page in a headless browser and verify the element's visual rendering and basic functionality.

Key Workflow: Loading and Displaying Links
User Action/Page Load -> Calls ingest-file-links-sk.load(commit, traceID) | V ingest-file-links-sk: Show spinner-sk | V Make POST request to /_/details/?results=false (with commit and traceID in request body) | V Backend API: Processes request, retrieves links for the given commit and trace | V ingest-file-links-sk: Receives JSON response (ingest.Format) | +----------------------+ | | V V Response OK? Response Error? | | V V Parse links Display error message Hide spinner Hide spinner Render link list
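The link-processing helpers mentioned above could be sketched as follows. These are illustrative only: the real isUrl/removeMarkdown implementations may differ, and parseMarkdownLink is a hypothetical name used here to show the idea.

```ts
// Sketch: treat a value as a link only if it parses as a URL.
function isUrlSketch(value: string): boolean {
  try {
    new URL(value);
    return true;
  } catch {
    return false;
  }
}

// Sketch: turn a simple Markdown link like "[Task](https://example.com/task/123)"
// into its text and href parts, or return null if it is not a Markdown link.
function parseMarkdownLink(value: string): { text: string; href: string } | null {
  const match = value.match(/^\[(.+)\]\((.+)\)$/);
  return match ? { text: match[1], href: match[2] } : null;
}
```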
This module defines TypeScript interfaces and types that represent the structure of JSON data used throughout the Perf application. It essentially acts as a contract between the Go backend and the TypeScript frontend, ensuring data consistency and type safety.
Why:
The primary motivation for this module is to leverage TypeScript's strong typing capabilities. By defining these interfaces, we can catch potential data inconsistencies and errors at compile time rather than runtime. This is particularly crucial for a data-intensive application like Perf, where the frontend relies heavily on JSON responses from the backend.
Furthermore, these definitions are automatically generated from Go struct definitions. This ensures that the frontend and backend data models remain synchronized. Any changes to the Go structs will trigger an update to these TypeScript interfaces, reducing the likelihood of manual errors and inconsistencies.
How:
The index.ts
file contains all the interface and type definitions. These are organized into a flat structure for simplicity, with some nested namespaces (e.g., pivot
, progress
, ingest
) where logical grouping is beneficial.
A key design choice is the use of nominal typing for certain primitive types (e.g., CommitNumber
, TimestampSeconds
, Trace
). This is achieved by creating type aliases that are branded with a unique string literal type. For example:
```ts
export type CommitNumber = number & {
  _commitNumberBrand: 'type alias for number';
};

export function CommitNumber(v: number): CommitNumber {
  return v as CommitNumber;
}
```
This prevents accidental assignment of a generic number
to a CommitNumber
variable, even though they are structurally identical at runtime. This adds an extra layer of type safety, ensuring that, for example, a timestamp is not inadvertently used where a commit number is expected. Helper functions (e.g., CommitNumber(v: number)
) are provided for convenient type assertion.
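A brief illustration of how the branding plays out in practice; the import path to the generated definitions is an assumption.

```ts
import { CommitNumber } from '../json'; // Illustrative path.

const commit: CommitNumber = CommitNumber(64809); // OK: explicit helper assertion.

// const bad: CommitNumber = 64809; // Compile error: a plain number lacks the brand.
```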
Key Components/Files/Submodules:
index.ts
: This is the sole file in this module and contains all the TypeScript interface and type definitions. It serves as the single source of truth for JSON data structures used in the frontend.Alert
, DataFrame
, FrameRequest
, Regression
): These define the shape of complex JSON objects. For instance, the Alert
interface describes the structure of an alert configuration, including its query, owner, and various detection parameters. The DataFrame
interface represents the core data structure for displaying traces, including the actual trace data (traceset
), column headers (header
), and associated parameter sets (paramset
).ClusterAlgo
, StepDetection
, Status
): These define specific allowed string values for certain properties, acting like enums. For example, ClusterAlgo
can only be 'kmeans'
or 'stepfit'
, ensuring that only valid clustering algorithms are specified.CommitNumber
, TimestampSeconds
, Trace
, ParamSet
): As explained above, these provide stronger type checking for primitive types that have specific semantic meaning within the application. TraceSet
, for example, is a map where keys are trace identifiers (strings) and values are Trace
arrays (nominally typed number[]
).pivot.Request
, ingest.Format
): Some interfaces are grouped under namespaces to organize related data structures. For example, pivot.Request
defines the structure for requesting pivot table operations, including grouping criteria and aggregation operations. The ingest.Format
interface defines the structure of data being ingested into Perf, including metadata like Git hash and the actual performance results.ReadOnlyParamSet
, AnomalyMap
): These represent common data patterns. ReadOnlyParamSet
is a map of parameter names to arrays of their possible string values, marked as read-only to reflect its typical usage. AnomalyMap
is a nested map structure used to associate anomalies with specific commits and traces.

Workflow Example: Requesting and Displaying Trace Data
A common workflow involves the frontend requesting trace data from the backend and then displaying it.
Frontend (Client) prepares a FrameRequest
:
Client Code --> Creates `FrameRequest` object:

```ts
{
  begin: 1678886400,  // Start timestamp
  end: 1678972800,    // End timestamp
  queries: ["config=gpu&name=my_test_trace"],
  // ... other properties
}
```
Frontend sends the FrameRequest
to the Backend (Server).
Backend processes the request and generates a FrameResponse
:
Server Logic --> Processes `FrameRequest` --> Fetches data from database/cache --> Constructs `FrameResponse` object:

```ts
{
  dataframe: {
    traceset: {
      "config=gpu&name=my_test_trace": [10.1, 10.5, 9.8, ...Trace]
    },
    header: [
      { offset: 12345, timestamp: 1678886400 },
      ...ColumnHeader[]
    ],
    paramset: {
      "config": ["gpu", "cpu"],
      "name": ["my_test_trace"]
    }
  },
  skps: [0, 5, 10],  // Indices of significant points
  // ... other properties like msg, display_mode, anomalymap
}
```
Backend sends the FrameResponse
(as JSON) back to the Frontend.
Frontend receives the JSON and parses it, expecting it to conform to the FrameResponse
interface: Client Code --> Receives JSON --> Parses JSON into a `FrameResponse` typed object --> Uses `frameResponse.dataframe.traceset` to render charts --> Uses `frameResponse.dataframe.header` to display commit information
This typed interaction ensures that if the backend, for example, renamed traceset
to trace_data
in its Go struct, the automatic generation would update the DataFrame
interface. The TypeScript compiler would then flag an error in the frontend code trying to access frameResponse.dataframe.traceset
, preventing a runtime error and guiding the developer to update the frontend code accordingly.
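A minimal sketch of such a typed round trip, using the generated interfaces; the endpoint path '/_/frame/start' is assumed here purely for illustration, as is the import path.

```ts
import { FrameRequest, FrameResponse } from '../json'; // Illustrative path.

async function fetchFrame(req: FrameRequest): Promise<FrameResponse> {
  const resp = await fetch('/_/frame/start', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!resp.ok) {
    throw new Error(`Frame request failed: ${resp.status}`);
  }
  // The cast documents the expected shape; the compiler checks all downstream use.
  return (await resp.json()) as FrameResponse;
}
```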
The json-source-sk
module provides a custom HTML element, <json-source-sk>
, designed to display the raw JSON data associated with a specific data point in a trace. This is particularly useful in performance analysis and debugging scenarios where understanding the exact input data ingested by the system is crucial.
The core responsibility of this module is to fetch and present JSON data in a user-friendly dialog. It aims to simplify the process of inspecting the source data for a given commit and trace identifier.
The key component is the JSONSourceSk
class, defined in json-source-sk.ts
. This class extends ElementSk
, a base class for custom elements in the Skia infrastructure.
How it Works:
Initialization and Properties:
cid
: The Commit ID (represented as CommitNumber
), which identifies a specific version or point in time.traceid
: A string identifier for the specific trace being examined.traceid
is not a valid key (checked by validKey
from perf/modules/paramtools
), the control buttons are hidden.User Interaction and Data Fetching:
_loadSource
or _loadSourceSmall
methods, respectively._loadSourceImpl
. This implementation detail allows for sharing the core fetching logic while differentiating the request URL._loadSourceImpl
constructs a CommitDetailsRequest
object containing the cid
and traceid
./_/details/
endpoint.isSmall
is true), the URL includes ?results=false
, indicating to the backend that a potentially truncated or summarized version of the JSON is requested.spinner-sk
element is activated to provide visual feedback during the fetch operation.jsonOrThrow
. If the request is successful, the JSON data is formatted with indentation and stored in the _json
private property.errorMessage
(from perf/modules/errorMessage
) is used to display an error notification to the user.Displaying the JSON:
<dialog>
element (#json-dialog
).jsonFile()
method in the template is responsible for rendering the <pre>
tag containing the formatted JSON string, but only if _json
is not empty.showModal()
, providing a modal interface for viewing the JSON.#closeIcon
with a close-icon-sk
) allows the user to dismiss the dialog. Closing the dialog also clears the _json
property.

Design Rationale:
async/await
and fetch
allows for non-blocking data retrieval, ensuring the UI remains responsive while waiting for the server.jsonOrThrow
and errorMessage
provides a better user experience by informing users about issues during data retrieval.spinner-sk
element clearly indicates when data is being loaded.<dialog>
) for displaying the JSON helps focus the user's attention on the data without cluttering the main interface.json-source-sk.scss
) provides basic styling and leverages existing button styles (//elements-sk/modules/styles:buttons_sass_lib
). It also includes considerations for dark mode by using CSS variables like --on-background
and --background
.

Workflow Example: Viewing JSON Source
User Sets Properties Element Renders User Clicks Button Fetches Data Displays JSON -------------------- --------------- ------------------ ------------ ------------- [json-source-sk -> [Buttons visible] -> ["View Json File"] -> POST /_/details/ -> <dialog> .cid = 123 {cid, traceid} <pre>{json}</pre> .traceid = ",foo=bar,"] </dialog> (spinner active) | V Response Received (spinner inactive)
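A sketch of the request issued along the lines of _loadSourceImpl as described above; the exact CommitDetailsRequest shape is defined in the json module, so the body fields here are only what the prose names.

```ts
// Sketch only: fetch the ingestion JSON for a commit/trace pair and pretty-print it.
async function loadSourceSketch(
  cid: number,
  traceid: string,
  isSmall: boolean
): Promise<string> {
  const url = isSmall ? '/_/details/?results=false' : '/_/details/';
  const resp = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ cid: cid, traceid: traceid }),
  });
  if (!resp.ok) {
    throw new Error(`Details request failed: ${resp.status}`);
  }
  // Pretty-print with indentation before rendering it inside the <pre> tag.
  return JSON.stringify(await resp.json(), null, 2);
}
```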
The demo page (json-source-sk-demo.html
and json-source-sk-demo.ts
) illustrates how to use the <json-source-sk>
element. It sets up mock data using fetchMock
to simulate the backend endpoint and programmatically clicks the button to demonstrate the JSON loading functionality.
The Puppeteer test (json-source-sk_puppeteer_test.ts
) ensures the element renders correctly and performs basic visual regression testing.
The new-bug-dialog-sk
module provides a user interface element for filing new bugs related to performance anomalies. It aims to streamline the bug reporting process by pre-filling relevant information and integrating with the Buganizer issue tracker.
Core Functionality:
The primary responsibility of this module is to display a dialog that allows users to input details for a new bug. This dialog is populated with information derived from one or more selected Anomaly
objects. The user can then review and modify this information before submitting the bug.
Key Design Decisions and Implementation Choices:
getBugTitle()
, mimics the behavior of the legacy Chromeperf UI to maintain familiarity for users.getLabelCheckboxes()
and getComponentRadios()
.Anomaly
data. This ensures that only relevant options are presented to the user. Lit-html’s templating capabilities are used for this dynamic rendering./alogin-sk
./_/triage/file_bug
endpoint.spinner-sk
) is displayed during this operation to provide visual feedback.anomaly-changed
event is dispatched to notify other components (like explore-simple-sk
or chart-tooltip-sk
) that the anomalies have been updated with the new bug ID.error-toast-sk
, and the dialog remains open, allowing the user to retry or correct information.<dialog>
HTML element, which provides built-in accessibility and modal behavior.

Workflow: Filing a New Bug
setAnomalies()
method on new-bug-dialog-sk
, passing the relevant Anomaly
objects and associated trace names.open()
method. User Action (e.g., click "File Bug" button) | V External Component --[setAnomalies(anomalies, traceNames)]--> new-bug-dialog-sk | V External Component --[open()]--> new-bug-dialog-sk
new-bug-dialog-sk
fetches the current user's login status to pre-fill the CC field. - The _render()
method is called, which uses the Lit-html template. - getBugTitle()
generates a suggested title. - getLabelCheckboxes()
and getComponentRadios()
create the UI for selecting labels and components based on the input anomalies. - The dialog (<dialog id="new-bug-dialog">
) is displayed modally. new-bug-dialog-sk.open() | V [Fetch Login Status] --> Updates `_user` | V _render() |--> getBugTitle() --> Populates Title Input |--> getLabelCheckboxes() --> Creates Label Checkboxes |--> getComponentRadios() --> Creates Component Radios | V Dialog is displayed to the user
User clicks "Submit" | V Form Submit Event | V new-bug-dialog-sk.fileNewBug()
fileNewBug()
method is invoked. - The spinner is activated, and form buttons are disabled. - Form data (title, description, selected labels, selected component, assignee, CCs, anomaly keys, trace names) is collected. - A POST request is sent to /_/triage/file_bug
with the collected data. fileNewBug() | V [Activate Spinner, Disable Buttons] | V [Extract Form Data] | V fetch('/_/triage/file_bug', {POST, body: jsonData})
bug_id
. - The spinner is deactivated, and buttons are re-enabled. - The dialog is closed. - A new browser tab is opened to the URL of the created bug (e.g., https://issues.chromium.org/issues/BUG_ID
). - The bug_id
is updated in the local _anomalies
array. - An anomaly-changed
custom event is dispatched with the updated anomalies and bug ID. - Failure: - The server responds with an error. - The spinner is deactivated, and buttons are re-enabled. - An error message is displayed to the user via errorMessage()
. The dialog remains open. fetch Response | +-- Success (HTTP 200, valid JSON with bug_id) | | | V | [Deactivate Spinner, Enable Buttons] | | | V | closeDialog() | | | V | window.open(bugUrl, '_blank') | | | V | Update local _anomalies with bug_id | | | V | dispatchEvent('anomaly-changed', {anomalies, bugId}) | +-- Failure (HTTP error or invalid JSON) | V [Deactivate Spinner, Enable Buttons] | V errorMessage(errorMsg) --> Displays error toast
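A minimal sketch of the request/response handling in this step; field names beyond those mentioned in the prose (anomaly keys, trace names, bug_id) are assumptions, and the real method also manages the spinner and dialog state.

```ts
// Sketch: POST the collected form data and open the created bug in a new tab.
async function fileNewBugSketch(body: {
  title: string;
  description: string;
  keys: string[];        // Anomaly keys.
  trace_names: string[]; // Associated trace names.
}): Promise<void> {
  const resp = await fetch('/_/triage/file_bug', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!resp.ok) {
    throw new Error(`Filing bug failed: ${resp.status}`);
  }
  const { bug_id } = await resp.json();
  // Mirror the dialog's behavior: open the new bug in a separate tab.
  window.open(`https://issues.chromium.org/issues/${bug_id}`, '_blank');
}
```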
Key Files:
new-bug-dialog-sk.ts
: This is the core file containing the NewBugDialogSk
class definition, which extends ElementSk
. It includes the Lit-html template for the dialog, the logic for populating form fields based on Anomaly
data, handling form submission, interacting with the backend API to file the bug, and managing the dialog's visibility and state.new-bug-dialog-sk.scss
: This file defines the styles for the dialog, ensuring it integrates visually with the rest of the application and themes. It styles the dialog container, input fields, buttons, and the close icon.new-bug-dialog-sk-demo.ts
and new-bug-dialog-sk-demo.html
: These files provide a demonstration page for the new-bug-dialog-sk
element. The .ts
file sets up mock data (Anomaly
objects) and mock fetch responses to simulate the bug filing process, allowing for isolated testing and development of the dialog. The .html
file includes the new-bug-dialog-sk
element and a button to trigger its opening.index.ts
: This file simply imports new-bug-dialog-sk.ts
to ensure the custom element is defined and available for use.

The module relies on several other elements and libraries:
alogin-sk
: To determine the logged-in user for CC'ing.close-icon-sk
: For the dialog's close button.spinner-sk
: To indicate activity during bug filing.error-toast-sk
(via errorMessage
utility): To display error messages.lit
: For templating and component rendering.jsonOrThrow
: A utility for parsing JSON responses and throwing errors on failure.

The paramtools
module provides a TypeScript implementation of utility functions for manipulating parameter sets and structured keys. It mirrors the functionality found in the Go module /infra/go/paramtools
, which is the primary source of truth for these operations. The decision to replicate this logic in TypeScript is to enable client-side applications to perform these common tasks without needing to make server requests for simple transformations or validations. This approach improves performance and reduces server load for UI-driven interactions.
The core responsibility of this module is to provide robust and consistent ways to:
ParamSet
objects: ParamSet
s are used to represent collections of possible parameter values, often used for filtering or querying data.

Key functionalities and their “why” and “how”:
makeKey(params: Params | { [key: string]: string }): string
:
Params
object (a dictionary of string key-value pairs). It first checks if the params
object is empty, throwing an error if it is, as a key must represent at least one parameter. Then, it sorts the keys of the params
object alphabetically. Finally, it constructs the string by joining each key-value pair with =
and then joining these pairs with ,
, prefixing and suffixing the entire string with a comma. Input: { "b": "2", "a": "1", "c": "3" } | V Sort keys: [ "a", "b", "c" ] | V Format pairs: "a=1", "b=2", "c=3" | V Join and wrap: ",a=1,b=2,c=3,"
fromKey(structuredKey: string, attribute?: string): Params
:
Params
object, making it easier to work with the individual parameters programmatically. It also handles the removal of special functions that might be embedded in the key (e.g., norm(...)
for normalization).removeSpecialFunctions
to strip any function wrappers from the key. Then, it splits the key string by the comma delimiter. Each resulting segment (if not empty) is then split by the equals sign to separate the key and value. These key-value pairs are collected into a new Params
object. An optional attribute
parameter allows excluding a specific key from the resulting Params
object, which can be useful in scenarios where certain attributes are metadata and not part of the core parameters.removeSpecialFunctions(key: string): string
:
norm(...)
, avg(...)
) or special markers (e.g., special_zero
). This function is designed to strip these away, returning the “raw” underlying key. This is important when you need to work with the base parameters without the context of the applied function or special condition.function_name(,param1=value1,...)
. If a match is found, it extracts the content within the parentheses. The extracted string (or the original key if no function was found) is then processed by extractNonKeyValuePairsInKey
.extractNonKeyValuePairsInKey(key: string): string
: This helper function further refines the key string. It splits the string by commas and filters out any segments that do not represent a valid key=value
pair. This helps to remove extraneous parts like special_zero
that might be comma-separated but aren't true parameters. The valid pairs are then re-joined and wrapped with commas.validKey(key: string): boolean
:
avg(...)
) or other special trace types. This is a lightweight validation, as the server performs more comprehensive checks.addParamsToParamSet(ps: ParamSet, p: Params): void
:
Params
object) to an existing ParamSet
. ParamSet
s store unique values for each parameter key. This function ensures that when new parameters are added, only new values are appended to the existing lists for each key, maintaining uniqueness.Params
object (p
). For each key, it retrieves the corresponding array of values from the ParamSet
(ps
). If the key doesn‘t exist in ps
, a new array is created. If the value from p
is not already present in the array, it’s added.paramsToParamSet(p: Params): ParamSet
:
Params
object (representing one specific combination of parameters) into a ParamSet
. In a ParamSet
, each key maps to an array of values, even if there's only one value.ParamSet
. Then, for each key-value pair in the input Params
object, it creates a new entry in the ParamSet
where the key maps to an array containing just that single value.addParamSet(p: ParamSet, ps: ParamSet | ReadOnlyParamSet): void
:
ParamSet
(or ReadOnlyParamSet
) into another. This is useful for combining sets of available parameter options, for example, when aggregating data from multiple sources.ParamSet
(ps
). If a key from ps
is not present in the target ParamSet
(p
), the entire key and its value array (cloned) are added to p
. If the key already exists in p
, it iterates through the values in the source array and adds any values that are not already present in the target array for that key.toReadOnlyParamSet(ps: ParamSet): ReadOnlyParamSet
:
ParamSet
to an immutable ReadOnlyParamSet
. This is useful for signaling that a ParamSet
should not be modified further, typically when passing it to components or functions that expect read-only data.queryFromKey(key: string): string
:
a=1&b=2&c=3
). This is specifically useful for frontend applications, like explore-simple-sk
, where state or filters are often represented in the URL.fromKey
to parse the structured key into a Params
object. Then, it leverages the URLSearchParams
browser API to construct a query string from these parameters. This ensures proper URL encoding of keys and values. Input Key: ",a=1,b=2,c=3," | V fromKey -> Params: { "a": "1", "b": "2", "c": "3" } | V URLSearchParams -> Query String: "a=1&b=2&c=3"
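Minimal sketches of the key/Params round trip described above; the real implementations also handle special functions like norm(...) and perform additional validation, which these omit.

```ts
type Params = { [key: string]: string };

// Sort the keys, then join ",k=v," pairs and wrap the whole string in commas.
function makeKeySketch(params: Params): string {
  const keys = Object.keys(params);
  if (keys.length === 0) {
    throw new Error('Params must have at least one entry.');
  }
  keys.sort();
  return `,${keys.map((k) => `${k}=${params[k]}`).join(',')},`;
}

// Split the structured key back into a Params object.
function fromKeySketch(structuredKey: string): Params {
  const params: Params = {};
  structuredKey
    .split(',')
    .filter((pair) => pair !== '')
    .forEach((pair) => {
      const [key, value] = pair.split('=');
      params[key] = value;
    });
  return params;
}

// Build a URL query string via URLSearchParams, as described above.
function queryFromKeySketch(structuredKey: string): string {
  return new URLSearchParams(fromKeySketch(structuredKey)).toString();
}

// makeKeySketch({ b: '2', a: '1' })     -> ',a=1,b=2,'
// fromKeySketch(',a=1,b=2,')            -> { a: '1', b: '2' }
// queryFromKeySketch(',a=1,b=2,c=3,')   -> 'a=1&b=2&c=3'
```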
The design choice to have these functions operate with less stringent validation than their server-side Go counterparts is deliberate. The server remains the ultimate authority on data validity. These client-side functions prioritize ease of use and performance for UI interactions, assuming that the data they operate on has either originated from or will eventually be validated by the server.
The index_test.ts
file provides comprehensive unit tests for these functions, ensuring their correctness and robustness across various scenarios, including handling empty inputs, duplicate values, and special key formats. This focus on testing is crucial for maintaining the reliability of these foundational utility functions.
The perf-scaffold-sk
module provides a consistent layout and navigation structure for all pages within the Perf application. It acts as a wrapper, ensuring that common elements like the title bar, navigation sidebar, and error notifications are present and behave uniformly across different sections of Perf.
Core Responsibilities:
alogin-sk
), theme chooser (theme-chooser-sk
), and error/toast notifications (error-toast-sk
). It provides the main content area and allows for specific content (like help text) to be injected into the sidebar.

Key Components and Design Decisions:
perf-scaffold-sk.ts
: This is the heart of the module, defining the PerfScaffoldSk
custom element.
Why: Encapsulating the scaffold logic within a custom element promotes reusability and modularity. It allows any Perf page to adopt the standard layout simply by including this element.
How: It uses Lit for templating and rendering the structure (<app-sk>
, header
, aside#sidebar
, main
, footer
).
Content Redistribution: A crucial design choice is how it handles child elements. Since it doesn't use Shadow DOM for the main content area (to allow global styles to apply easily to the page content), it programmatically moves children of <perf-scaffold-sk>
into the <main>
section.
Process:
connectedCallback
is invoked, existing children of <perf-scaffold-sk>
are temporarily moved out.<main>
element.MutationObserver
is set up to watch for any new children added to <perf-scaffold-sk>
and similarly move them to <main>
.Sidebar Content: An exception is made for elements with the specific ID SIDEBAR_HELP_ID
. These are moved into the #help
div within the sidebar. This allows pages to provide context-specific help information directly within the scaffold.
```html
<perf-scaffold-sk>
  <!-- This will go into <main> -->
  <div>Page specific content</div>

  <!-- This will go into <aside>#help -->
  <div id="sidebar_help">Contextual help</div>
</perf-scaffold-sk>
```
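The redistribution pattern could be sketched as follows. SIDEBAR_HELP_ID and the #help/main targets follow the prose; the filtering details (such as skipping the scaffold's own rendered template) are assumptions.

```ts
// Sketch of moving light-DOM children into <main> or the sidebar #help div.
const SIDEBAR_HELP_ID = 'sidebar_help';

function redistributeChildren(scaffold: HTMLElement): void {
  const main = scaffold.querySelector('main')!;
  const help = scaffold.querySelector('#help')!;

  const place = (node: Node) => {
    if (!(node instanceof Element)) return;
    if (node.tagName === 'APP-SK') return; // Skip the scaffold's own template.
    (node.id === SIDEBAR_HELP_ID ? help : main).appendChild(node);
  };

  // Move the children that were present before the scaffold rendered itself.
  Array.from(scaffold.children).forEach(place);

  // Keep moving any children that are added to <perf-scaffold-sk> later.
  new MutationObserver((mutations) => {
    mutations.forEach((m) => m.addedNodes.forEach(place));
  }).observe(scaffold, { childList: true });
}
```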
Configuration via window.perf
: The scaffold reads various configuration options from the global window.perf
object. This allows instances of Perf to customize links (help, feedback, chat), behavior (e.g., show_triage_link
), and display information (e.g., instance URL, build tag). This makes the scaffold adaptable to different Perf deployments.
For example, the _helpUrl
and _reportBugUrl
are initialized with defaults but can be overridden by window.perf.help_url_override
and window.perf.feedback_url
respectively.
The visibility of the “Triage” link is controlled by window.perf.show_triage_link
.
Build Information: It displays the current application build tag, fetching it via getBuildTag()
from //perf/modules/window:window_ts_lib
and linking it to the corresponding commit in the buildbot git repository.
Instance Title: It can display the name of the Perf instance, extracted from window.perf.instance_url
.
perf-scaffold-sk.scss
: Defines the styles for the scaffold.
//perf/modules/themes:themes_sass_lib
. It defines the layout, including the sidebar width and the main content area's width (using calc(99vw - var(--sidebar-width))
to avoid horizontal scrollbars caused by 100vw
including the scrollbar width). It also styles the navigation links and other elements within the scaffold.perf-scaffold-sk-demo.html
& perf-scaffold-sk-demo.ts
: Provide a demonstration page for the scaffold.
perf-scaffold-sk-demo.ts
initializes a mock window.perf
object with various settings and then injects an instance of <perf-scaffold-sk>
with some placeholder content (including a div
with id="sidebar_help"
) into the perf-scaffold-sk-demo.html
page.

Workflow: Initializing and Rendering a Page with the Scaffold
<perf-scaffold-sk>
as its top-level layout element.

```html
<!-- new_query_page.html -->
<body>
  <perf-scaffold-sk>
    <!-- Content specific to the New Query page -->
    <query-composer-sk></query-composer-sk>
    <div id="sidebar_help">
      <p>Tips for creating new queries...</p>
    </div>
  </perf-scaffold-sk>
</body>
```
PerfScaffoldSk
element's connectedCallback
fires.perf-scaffold-sk.ts
: - Temporarily moves <query-composer-sk>
and <div id="sidebar_help">...</div>
out of perf-scaffold-sk
. - Renders its own internal template (header with title, login, theme chooser; sidebar with nav links; an empty main area as a placeholder for the page content; a placeholder for sidebar help; footer with error toast). - Moves <query-composer-sk> into the <main> element and <div id="sidebar_help"> into the div with id="help" within the sidebar. - A MutationObserver starts listening for any further children added directly to <perf-scaffold-sk>.
The final rendered structure (simplified) would look something like:
perf-scaffold-sk └── app-sk ├── header │ ├── h1.name (Instance Title) │ ├── div.spacer │ ├── alogin-sk │ └── theme-chooser-sk ├── aside#sidebar │ ├── div#links │ │ ├── a (New Query) │ │ ├── a (Favorites) │ │ └── ... (other nav links) │ ├── div#help │ │ └── div#sidebar_help (Content from original page) │ │ └── <p>Tips for creating new queries...</p> │ └── div#chat ├── main │ └── query-composer-sk (Content from original page) └── footer └── error-toast-sk
The picker-field-sk
module provides a custom HTML element that serves as a stylized text input field with an associated dropdown menu for selecting from a predefined list of options. This component is designed to offer a user-friendly way to pick a single value from potentially many choices, enhancing the user experience in forms or selection-heavy interfaces.
Core Functionality and Design:
The primary goal of picker-field-sk
is to present a familiar text input that, upon interaction (focus or click), reveals a filterable list of valid options. This addresses the need for a compact and efficient way to select an item, especially when the number of options is large.
The implementation leverages the Vaadin ComboBox component (@vaadin/combo-box
) for its underlying dropdown and filtering capabilities. This choice was made to utilize a well-tested and feature-rich component, avoiding the need to reimplement complex dropdown logic, keyboard navigation, and accessibility features. picker-field-sk
then wraps this Vaadin component, applying custom styling and providing a simplified API tailored to its specific use case.
Key Responsibilities and Components:
picker-field-sk.ts
: This is the heart of the module, defining the PickerFieldSk
custom element which extends ElementSk
.
label
: A string that serves as both the visual label above the input field and the placeholder text within it when empty. This provides context to the user about the expected input.options
: An array of strings representing the valid choices the user can select from. The component dynamically adjusts the width of the dropdown overlay to accommodate the longest option, ensuring readability.helperText
: An optional string displayed below the input field, typically used for providing additional guidance or information to the user.value-changed
: This custom event is dispatched whenever the selected value in the combo box changes. This includes selecting an item from the dropdown, typing a value that matches an option (due to autoselect
), or clearing the input. The new value is available in event.detail.value
. This event is crucial for parent components to react to user selections.focus()
: Programmatically sets focus to the input field.openOverlay()
: Programmatically opens the dropdown list of options. This is useful for guiding the user or for integrating with other UI elements.disable()
: Makes the input field read-only, preventing user interaction.enable()
: Removes the read-only state, allowing user interaction.clear()
: Clears the current value in the input field.setValue(val: string)
: Programmatically sets the value of the input field.getValue()
: Retrieves the current value of the input field.lit-html
for templating. The template renders a <vaadin-combo-box>
element and binds its properties and events to the PickerFieldSk
element's state.calculateOverlayWidth()
private method dynamically adjusts the --vaadin-combo-box-overlay-width
CSS custom property. It iterates through the options
to find the longest string and sets the overlay width to be slightly larger than this string, ensuring all options are fully visible without truncation. This is a key usability enhancement. User provides options --> PickerFieldSk.options setter | V calculateOverlayWidth() | V Find max option length | V Set --vaadin-combo-box-overlay-width CSS property
picker-field-sk.scss
: Contains the SASS styles for the component.
vaadin-combo-box
and its shadow parts (e.g., ::part(label)
, ::part(input-field)
, ::part(items)
) to customize its appearance to match the application's theme (including dark mode support).--vaadin-field-default-width
, --vaadin-combo-box-overlay-width
, and --lumo-text-field-size
are used to control the dimensions and sizing of the Vaadin component..darkmode picker-field-sk
, adjusting colors for labels, helper text, and input fields to ensure proper contrast and visual integration.index.ts
: A simple entry point that imports and thereby registers the picker-field-sk
custom element, making it available for use in HTML.
picker-field-sk-demo.html
& picker-field-sk-demo.ts
: These files create a demonstration page for the picker-field-sk
component.
picker-field-sk-demo.html
includes instances of the picker-field-sk
element and buttons to trigger its various functionalities (focus, fill, open overlay, disable/enable).picker-field-sk-demo.ts
contains JavaScript to initialize the demo elements with sample data (a large list of “speedometer” options to showcase performance with many items) and to wire up the buttons to the corresponding methods of the PickerFieldSk
instances. This allows developers to visually inspect and interact with the component.

Workflow Example: User Selects an Option
<picker-field-sk>
and sets its label
and options
properties. <picker-field-sk .label="Fruit" .options=${['Apple', 'Banana', 'Cherry']}></picker-field-sk>
picker-field-sk
input. User clicks/focuses input --> vaadin-combo-box internally handles focus/click | V vaadin-combo-box displays dropdown with options
vaadin-combo-box
filters the displayed options based on the typed text.User selects "Banana" --> vaadin-combo-box updates its internal value | V vaadin-combo-box emits 'value-changed' event
vaadin-combo-box
within picker-field-sk
emits its native value-changed
event. - The onValueChanged
method in PickerFieldSk
catches this event. - PickerFieldSk
then dispatches its own value-changed
custom event, with the selected value in event.detail.value
. picker-field-sk.onValueChanged(vaadinEvent) | V Dispatch new CustomEvent('value-changed', { detail: { value: vaadinEvent.detail.value }})
value-changed
event on the <picker-field-sk>
element, receives the event and can act upon the selected value. Parent component listens for 'value-changed' --> Accesses event.detail.value | V Update application state
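Illustrative consumer code for this last step; the import paths and the programmatic creation of the element are assumptions, while the property names and the value-changed event come from the prose.

```ts
import './index'; // Registers <picker-field-sk>; path is illustrative.
import { PickerFieldSk } from './picker-field-sk';

const picker = document.createElement('picker-field-sk') as PickerFieldSk;
picker.label = 'Fruit';
picker.options = ['Apple', 'Banana', 'Cherry'];
document.body.appendChild(picker);

// The selected value arrives in event.detail.value.
picker.addEventListener('value-changed', (e: Event) => {
  const value = (e as CustomEvent<{ value: string }>).detail.value;
  console.log('Selected:', value);
});
```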
This layered approach, building upon the Vaadin ComboBox, provides a robust and themeable selection component while abstracting away the complexities of the underlying library for the consumers of picker-field-sk
.

pinpoint-try-job-dialog-sk

The pinpoint-try-job-dialog-sk module provides a user interface element for initiating Pinpoint A/B try jobs.
module provides a user interface element for initiating Pinpoint A/B try jobs.
Purpose:
The primary reason for this module‘s existence within the Perf application is to allow users to request additional trace data for specific benchmark runs. While Pinpoint itself supports a wider range of try job use cases, this dialog is specifically tailored for this trace generation scenario. It’s important to note that this component is considered a legacy feature, and future development should favor the newer Pinpoint frontend.
How it Works:
The dialog is designed to gather the necessary parameters from the user to construct and submit a Pinpoint A/B try job request. This process involves:
alogin-sk
to determine the logged-in user. The user's email is included in the try job request.CreateLegacyTryRequest
object. This object encapsulates all the necessary information for the Pinpoint backend.testPath
(e.g., master/benchmark_name/story_name
) is parsed to extract the configuration (e.g., benchmark_name
) and the benchmark (e.g., story_name
).story
is typically the last segment of the testPath
.extra_test_args
field is formatted to include the user-provided tracing arguments./_/try/
endpoint with the JSON payload.jobUrl
for the newly created Pinpoint job. This URL is then displayed to the user, allowing them to navigate to the Pinpoint UI to monitor the job's progress.

Workflow:
User Interaction (e.g., click on chart tooltip) | V Dialog Pre-populated with context (testPath, commits) | V pinpoint-try-job-dialog-sk.open() | V User reviews/modifies input fields (Base Commit, Exp. Commit, Trace Args) | V User clicks "Send to Pinpoint" | V [pinpoint-try-job-dialog-sk] - Gathers input values - Retrieves logged-in user via alogin-sk - Constructs `CreateLegacyTryRequest` JSON - Sends POST request to /_/try/ | V [Backend Pinpoint Service] - Processes the request - Creates A/B try job - Returns jobUrl (success) or error | V [pinpoint-try-job-dialog-sk] - Displays spinner during request - On Success: - Displays link to the created Pinpoint job (jobUrl) - Hides spinner - On Error: - Displays error message - Hides spinner
Key Components/Files:
pinpoint-try-job-dialog-sk.ts
: This is the core TypeScript file that defines the custom element's logic.PinpointTryJobDialogSk
class: Extends ElementSk
and manages the dialog's state, user input, and interaction with the Pinpoint API.template
: Defines the HTML structure of the dialog using lit-html
. This includes input fields for commits and tracing arguments, a submit button, a spinner for loading states, and a link to the created Pinpoint job.connectedCallback()
: Initializes the dialog, sets up event listeners (e.g., for form submission, closing the dialog on outside click), and fetches the logged-in user's information.setTryJobInputParams(params: TryJobPreloadParams)
: Allows external components to pre-fill the dialog's input fields. This is crucial for integrating the dialog with other parts of the Perf UI, like chart tooltips.open()
: Displays the modal dialog.closeDialog()
: Closes the modal dialog.postTryJob()
: This is the central method for handling the job submission. It reads values from the input fields, constructs the CreateLegacyTryRequest
payload, and makes the fetch
call to the Pinpoint API. It also handles the UI updates based on the API response (showing the job URL or an error message).TryJobPreloadParams
interface: Defines the structure for the parameters used to pre-populate the dialog.pinpoint-try-job-dialog-sk.scss
: Contains the SASS/CSS styles for the dialog, ensuring it aligns with the application's visual theme. It styles the input fields, buttons, and the overall layout of the dialog.index.ts
: A simple entry point that imports and registers the pinpoint-try-job-dialog-sk
custom element.BUILD.bazel
: Defines the build rules for the module, specifying its dependencies (e.g., elements-sk
components like select-sk
, spinner-sk
, alogin-sk
, and Material Web components) and how it should be compiled.

Design Decisions:
bisect-dialog-sk
: The dialog's structure and initial functionality were adapted from an existing bisect dialog. This likely accelerated development by reusing common patterns for dialog interactions and API calls.CreateLegacyTryRequest
object is fully constructed on the client-side before being sent to the backend. This gives the frontend more control over the request parameters.<dialog>
HTML element provides built-in modal behavior, simplifying the implementation of showing and hiding the dialog.spinner-sk
component provides visual feedback to the user while the API request is in progress.

This component serves as a bridge for users of the Perf application to leverage Pinpoint's capabilities for generating detailed trace information, even as the broader Pinpoint tooling evolves.
The pivot-query-sk
module provides a custom HTML element for users to configure and interact with pivot table requests. Pivot tables are a powerful data summarization tool, and this element allows users to define how data should be grouped, what aggregate operations should be performed, and what summary statistics should be displayed.
The core of the module is the PivotQuerySk
class, which extends ElementSk
. This class manages the state of the pivot request and renders the UI for user interaction. It leverages other custom elements like multi-select-sk
and select-sk
to provide intuitive input controls.
Key Design Choices and Implementation Details:
pivot-changed
, whenever the user modifies any part of the pivot request. This allows consuming applications to react to changes in real-time. The event detail (PivotQueryChangedEventDetail
) contains the updated pivot.Request
object or null
if the current configuration is invalid. This decouples the UI component from the application logic that processes the pivot request.PivotQuerySk
element uses Lit's html
templating for rendering. It maintains internal state for the _pivotRequest
(the current pivot configuration) and _paramset
(the available options for grouping). When these properties are set or updated, the _render()
method is called to re-render the component, ensuring the UI reflects the current state.createDefaultPivotRequestIfNull()
method ensures that if _pivotRequest
is initially null
, it's initialized with a default valid structure before any user interaction attempts to modify it. This prevents errors and provides a sensible starting point._paramset
and the existing _pivotRequest
. The allGroupByOptions()
method is particularly noteworthy as it ensures that even if the _paramset
changes, any currently selected group_by
keys in the _pivotRequest
are still displayed as options. This prevents accidental data loss during _paramset
updates. It achieves this by concatenating keys from both sources, sorting, and then filtering out duplicates.pivotRequest
getter includes a call to validatePivotRequest
(from pivotutil
). This ensures that the component only returns a valid pivot.Request
object. If the current configuration is invalid, it returns null
. This promotes data integrity.

Responsibilities and Key Components:
pivot-query-sk.ts
: This is the main file defining the PivotQuerySk
custom element.
PivotQuerySk
class:pivot.Request
object, which defines the grouping, operation, and summary statistics for a pivot table.ParamSet
as input, which provides the available keys for the “group by” selection. This ParamSet
likely originates from the dataset being analyzed.multi-select-sk
.select
element.multi-select-sk
.pivot-changed
custom event when the user modifies the pivot request.PivotQueryChangedEventDetail
type: Defines the structure of the data passed in the pivot-changed
event.PivotQueryChangedEventName
constant: The string name of the custom event.groupByChanged
, operationChanged
, summaryChanged
): These methods are triggered by user interactions with the respective UI elements. They update the internal _pivotRequest
and then call emitChangeEvent
.emitChangeEvent()
: Constructs and dispatches the pivot-changed
event.pivotRequest
, paramset
): Provide controlled access to the element's core data, triggering re-renders when set.pivot-query-sk.scss
: Contains the styling for the pivot-query-sk
element. It ensures a consistent look and feel, leveraging styles from themes_sass_lib
and select_sass_lib
. The layout is primarily flex-based to arrange the different selection components.
pivot-query-sk-demo.html
and pivot-query-sk-demo.ts
: These files provide a demonstration page for the pivot-query-sk
element.
pivot-query-sk
.pivot.Request
data and a ParamSet
. It also includes an event listener for pivot-changed
to display the selected pivot configuration as JSON, illustrating how to consume the element's output.

Workflow for User Interaction and Event Emission:
Initialization:
pivot-query-sk
element is created.paramset
(available grouping keys) and optionally an initial pivotRequest
.User Modifies a Selection (e.g., changes a “group by” option):
multi-select-sk
(for “group by”) emits a selection-changed
event.PivotQuerySk.groupByChanged()
is called.createDefaultPivotRequestIfNull()
ensures _pivotRequest
is not null._pivotRequest.group_by
is updated based on the new selection.emitChangeEvent()
is called.Event Emission:
emitChangeEvent()
:pivotRequest
(which might be null
if invalid).CustomEvent
named pivot-changed
.detail
of the event is the current (potentially validated) pivotRequest
.Application Responds:
pivot-changed
events on the pivot-query-sk
element or one of its ancestors, receives the event.event.detail
(the pivot.Request
) to update its data display, fetch new data, or perform other actions.

This flow can be visualized as:
User Interaction (e.g., click on multi-select) | v Internal element event (e.g., @selection-changed from multi-select-sk) | v PivotQuerySk Event Handler (e.g., groupByChanged) | v Update internal _pivotRequest state | v PivotQuerySk.emitChangeEvent() | v Dispatch "pivot-changed" CustomEvent (with pivot.Request as detail) | v Consuming Application's Event Listener | v Application processes the new pivot.Request
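A short sketch of the consuming side of this flow; the import path for the pivot types is an assumption, while the event name and the pivot.Request-or-null detail come from the prose.

```ts
import { pivot } from '../json'; // Illustrative path.

const pivotQuery = document.querySelector('pivot-query-sk')!;
pivotQuery.addEventListener('pivot-changed', (e: Event) => {
  const req = (e as CustomEvent<pivot.Request | null>).detail;
  if (req === null) {
    // The current configuration is invalid; nothing to process yet.
    return;
  }
  console.log('group_by:', req.group_by, 'operation:', req.operation);
});
```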
The pivot-table-sk
module provides a custom HTML element, <pivot-table-sk>
, designed to display pivoted data in a tabular format. This element is specifically for DataFrames that have been pivoted and contain summary values, as opposed to summary traces (which would be displayed in a plot).
Core Functionality and Design
The primary purpose of pivot-table-sk
is to present complex, multi-dimensional data in an understandable and interactive table. The “why” behind its design is to offer a user-friendly way to explore summarized data that arises from pivoting operations.
Key design considerations include:
DataFrame
(from //perf/modules/json:index_ts_lib
) and a pivot.Request
(also from //perf/modules/json:index_ts_lib
) as input. The pivot.Request
is crucial as it dictates how the DataFrame
was originally pivoted, including the group_by
keys, the main operation
, and the summary
operations.group_by
keys and the summary
operations.pivot.Request
is suitable for display as a pivot table (using validateAsPivotTable
from //perf/modules/pivotutil:index_ts_lib
). This prevents rendering errors or confusing displays if the input data structure isn't appropriate.

Key Components and Files
pivot-table-sk.ts
: This is the heart of the module, defining the PivotTableSk
custom element.
PivotTableSk
class:ElementSk
(from //infra-sk/modules/ElementSk:index_ts_lib
).DataFrame
(df
), pivot.Request
(req
), and the original query
string.KeyValues
type and keyValuesFromTraceSet
function: This is a critical internal data structure. KeyValues
is an object where keys are trace keys (e.g., ',arch=x86,config=8888,'
) and values are arrays of strings. These string arrays represent the values of the parameters specified in req.group_by
, in the same order. For example, if req.group_by
is ['config', 'arch']
, then for the trace ',arch=arm,config=8888,'
, the corresponding KeyValues
entry would be ['8888', 'arm']
. This transformation is performed by keyValuesFromTraceSet
and is essential for rendering the “key” columns of the table and for sorting by these keys.SortSelection
class: Represents the sorting state of a single column. It stores:column
: The index of the column.kind
: Whether the column represents ‘keyValues’ (from group_by
) or ‘summaryValues’ (from summary
operations).dir
: The sort direction (‘up’ or ‘down’).toggleDirection
, buildCompare
(to create a JavaScript sort comparison function based on its state), and encode
/decode
for serialization.SortHistory
class: Manages the overall sorting state of the table.history
) of SortSelection
objects.SortSelection
is moved to the front of the history
array, and its direction is toggled.buildCompare
in SortHistory
creates a composite comparison function that iterates through the SortSelection
objects in history
. The first SortSelection
determines the primary sort order. If it results in a tie, the second SortSelection
is used to break the tie, and so on. This creates the effect of a stable sort across multiple user interactions without needing a true stable sort algorithm for each click.encode
/decode
methods to serialize the entire sort history (e.g., for persisting sort state in a URL).set()
method: The primary way to provide data to the component. It initializes keyValues
, sortHistory
, and the main compare
function. It can also accept an encodedHistory
string to restore a previous sort state.lit-html
for templating.queryDefinition()
: Renders the contextual information about the query and pivot operations.tableHeader()
, keyColumnHeaders()
, summaryColumnHeaders()
: Generate the table header row, including sort icons.sortArrow()
: Dynamically displays the correct sort icon (up arrow, down arrow, or neutral sort icon) based on the current SortHistory
.tableRows()
, keyRowValues()
, summaryRowValues()
: Generate the data rows of the table, applying the current sort order.displayValue()
: Formats numerical values for display, converting a special sentinel value (MISSING_DATA_SENTINEL
from //perf/modules/const:const_ts_lib
) to ‘-’.change
event when the user sorts the table. The event detail (PivotTableSkChangeEventDetail
) is the encoded SortHistory
string. This allows parent components to react to sort changes and potentially persist the state.paramset-sk
to display the query parameters.arrow-drop-down-icon-sk
, arrow-drop-up-icon-sk
, sort-icon-sk
) for the sort indicators.//perf/modules/json:index_ts_lib
for DataFrame
, TraceSet
, pivot.Request
types.//perf/modules/pivotutil:index_ts_lib
for operationDescriptions
and validateAsPivotTable
.//perf/modules/paramtools:index_ts_lib
for fromKey
(to parse trace keys into parameter sets).//infra-sk/modules:query_ts_lib
for toParamSet
(to convert a query string into a ParamSet
).pivot-table-sk.scss
: Provides the styling for the pivot-table-sk
element, including table borders, padding, text alignment, and cursor styles for interactive elements. It leverages themes from //perf/modules/themes:themes_sass_lib
.
index.ts
: A simple entry point that imports and thereby registers the pivot-table-sk
custom element.
pivot-table-sk-demo.html
& pivot-table-sk-demo.ts
:
pivot-table-sk
element.pivot-table-sk-demo.ts
creates sample DataFrame
and pivot.Request
objects and uses them to populate instances of pivot-table-sk
on the demo page. This is crucial for development and visual testing. It demonstrates valid use cases, cases with invalid pivot requests, and cases with null DataFrames to ensure the component handles these scenarios gracefully.

Test Files (pivot-table-sk_test.ts
, pivot-table-sk_puppeteer_test.ts
):
pivot-table-sk_test.ts
(Karma test): Contains unit tests for the PivotTableSk
element and its internal logic, particularly the SortSelection
and SortHistory
classes. It verifies:change
event is emitted with the correct encoded history).buildCompare
functions in SortSelection
and SortHistory
produce the correct sorting results for various data types and sort directions.encode
and decode
methods for SortSelection
and SortHistory
work correctly, allowing for round-tripping of sort state.keyValuesFromTraceSet
function correctly transforms TraceSet
data based on the pivot.Request
.pivot-table-sk_puppeteer_test.ts
(Puppeteer test): Performs end-to-end tests by loading the demo page in a headless browser.

Workflow Example: User Sorting the Table
Initial State:

- The `pivot-table-sk` element is initialized with a `DataFrame`, a `pivot.Request`, and an optional initial `encodedHistory` string.
- `pivot-table-sk` creates a `SortHistory` object. If `encodedHistory` is provided, `SortHistory.decode()` is called. Otherwise, a default sort order is established (usually based on the order of summary columns, then key columns, all initially 'up').
- `SortHistory.buildCompare()` generates the initial comparison function.
- Column headers initially display the neutral `sort-icon-sk`.

User Clicks a Column Header (e.g., "config" key column):

- `changeSort(columnIndex, 'keyValues')` is called within `pivot-table-sk`.
- `this.sortHistory.selectColumnToSortOn(columnIndex, 'keyValues')` is invoked:
  - The `SortSelection` for the "config" column is found in `this.sortHistory.history`.
  - It's removed from its current position.
  - Its `direction` is toggled (e.g., from 'up' to 'down').
  - This updated `SortSelection` is prepended to `this.sortHistory.history`.

        Before: [SummaryCol0(up), SummaryCol1(up), KeyCol0(config, up), KeyCol1(arch, up)]
        Click on KeyCol0 (config):
        After:  [KeyCol0(config, down), SummaryCol0(up), SummaryCol1(up), KeyCol1(arch, up)]

- `this.compare = this.sortHistory.buildCompare(...)` is called. A new composite comparison function is generated. Now, rows will primarily be sorted by "config" (descending). Ties will be broken by "SummaryCol0" (ascending), then "SummaryCol1" (ascending), and finally "KeyCol1" (ascending).
- A `CustomEvent('change')` is dispatched. The `event.detail` contains `this.sortHistory.encode()`, which is a string representation of the new sort order (e.g., "dk0-su0-su1-ku1").
- `this._render()` is called, re-rendering the table with the new sort order. The "config" column header now shows an `arrow-drop-down-icon-sk`.

User Clicks Another Column Header (e.g., "avg" summary column):

- The process repeats. The `SortSelection` for the "avg" column is moved to the front of `this.sortHistory.history` and its direction is toggled.

        Before: [KeyCol0(config, down), SummaryCol0(avg, up), SummaryCol1(sum, up), KeyCol1(arch, up)]
        Click on SummaryCol0 (avg):
        After:  [SummaryCol0(avg, down), KeyCol0(config, down), SummaryCol1(sum, up), KeyCol1(arch, up)]

- The table is re-rendered, now primarily sorted by "avg" (descending), with ties broken by "config" (descending), then "sum" (ascending), then "arch" (ascending).
This multi-level sorting, driven by the SortHistory
maintaining the sequence of user sort actions, is a key aspect of the “how” behind the pivot-table-sk
's user experience. It aims to provide a powerful yet familiar way to analyze pivoted data.
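To make the tie-breaking behaviour concrete, here is a minimal sketch of how a composite comparator can be built from an ordered sort history. The names (`SortSelectionLike`, `buildCompare`, `selectColumnToSortOn`) follow the description above, but the rows are simplified to numeric arrays and the exact signatures in `pivot-table-sk.ts` may differ.

```ts
// Minimal sketch of multi-level sorting driven by an ordered sort history.
type Direction = 'up' | 'down';

interface SortSelectionLike {
  column: number;      // index into the row's values
  direction: Direction;
}

// Build one comparison per selection and chain them: the first selection
// wins, later selections only break ties.
function buildCompare(history: SortSelectionLike[]): (a: number[], b: number[]) => number {
  return (a: number[], b: number[]): number => {
    for (const sel of history) {
      const diff = a[sel.column] - b[sel.column];
      if (diff !== 0) {
        return sel.direction === 'up' ? diff : -diff;
      }
      // Tie: fall through to the next selection in the history.
    }
    return 0;
  };
}

// Clicking a column moves its selection to the front and toggles its
// direction, which produces the stable-sort-like behaviour described above.
function selectColumnToSortOn(history: SortSelectionLike[], column: number): SortSelectionLike[] {
  const rest = history.filter((s) => s.column !== column);
  const current = history.find((s) => s.column === column);
  const direction: Direction = current?.direction === 'up' ? 'down' : 'up';
  return [{ column, direction }, ...rest];
}
```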
The pivotutil
module provides utility functions and constants for working with pivot table requests. Its primary purpose is to ensure the validity and integrity of pivot requests before they are processed, and to offer human-readable descriptions for pivot operations. This centralization of pivot-related logic helps maintain consistency and simplifies the handling of pivot table configurations across different parts of the application.
`index.ts`: This is the core file of the module and contains all the exported functionalities.

- `operationDescriptions`:
  - Pivot operations are identified internally by short codes (e.g., `avg`, `std`). To improve user experience and make UIs more understandable, a mapping to human-readable names is necessary.
  - It is a mapping whose keys are the `pivot.Operation` enum values (imported from `../json`) and whose values are their corresponding descriptive strings (e.g., "Mean", "Standard Deviation"). This allows for easy lookup and display of operation names.
- `validatePivotRequest(req: pivot.Request | null): string`:
  - Validates the basic structure of a `pivot.Request` object.
  - It first checks if the request is `null`. If so, it returns an error message.
  - It then verifies that the `group_by` property is present and is an array with at least one element. A pivot table fundamentally relies on grouping data, so this is a mandatory field.

        Input: pivot.Request | null
          |
          V
        Is request null? --(Yes)--> Return "Pivot request is null."
          | (No)
          V
        Is req.group_by null or empty? --(Yes)--> Return "Pivot must have at least one GroupBy."
          | (No)
          V
        Return "" (Valid)

- `validateAsPivotTable(req: pivot.Request | null): string`:
  - It first calls `validatePivotRequest` to ensure the basic structure of the request is valid. If `validatePivotRequest` returns an error, that error is immediately returned.
  - It then checks that the `summary` property of the request is present and is an array with at least one element. Summary operations (like sum, average, etc.) are essential for generating the aggregated values displayed in a pivot table. Without them, the request might be valid for plotting individual traces grouped by some criteria, but not for a typical pivot table with summarized data.
  - If the `summary` array is missing or empty, an error message is returned. Otherwise, an empty string is returned.

        Input: pivot.Request | null
          |
          V
        Call validatePivotRequest(req) --> invalidMsg
          |
          V
        Is invalidMsg not empty? --(Yes)--> Return invalidMsg
          | (No)
          V
        Is req.summary null or empty? --(Yes)--> Return "Must have at least one Summary operation."
          | (No)
          V
        Return "" (Valid for pivot table)
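Taken together, the two checks amount to only a few lines; a minimal sketch consistent with the flow charts above (the actual module defines the canonical error strings and the `pivot.Request` type comes from `../json`):

```ts
import { pivot } from '../json';

// Sketch of the validators described above; error strings mirror the flow charts.
export const validatePivotRequest = (req: pivot.Request | null): string => {
  if (!req) {
    return 'Pivot request is null.';
  }
  if (!req.group_by || req.group_by.length === 0) {
    return 'Pivot must have at least one GroupBy.';
  }
  return '';
};

export const validateAsPivotTable = (req: pivot.Request | null): string => {
  const invalidMsg = validatePivotRequest(req);
  if (invalidMsg !== '') {
    return invalidMsg; // structural problems take precedence
  }
  if (!req!.summary || req!.summary.length === 0) {
    return 'Must have at least one Summary operation.';
  }
  return '';
};
```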
`index_test.ts`: This file contains unit tests for the functions in `index.ts`.

- Ensures the correctness of the validation logic in the `pivotutil` module.
- Uses the `chai` assertion library to define test cases.
- For `validatePivotRequest`, it tests scenarios like:
  - A `null` request.
  - `group_by` being `null`.
  - `group_by` being an empty array.
- For `validateAsPivotTable`, it builds upon the `validatePivotRequest` checks and adds tests for:
  - `summary` being `null`.
  - `summary` being an empty array.
  - A valid request with at least one summary operation.

Each test asserts whether the validation functions return an empty string (for valid inputs) or a non-empty error message string (for invalid inputs) as expected.

The design decision to separate `validatePivotRequest` and `validateAsPivotTable` allows for more granular validation. Some parts of an application might only need the basic validation (e.g., ensuring data can be grouped), while others specifically require summary operations for display in a tabular format. This separation provides flexibility. The use of descriptive error messages aids in debugging and user feedback.
The plot-google-chart-sk
module provides a custom element for rendering interactive time-series charts using Google Charts. It is designed to display performance data, including anomalies and user-reported issues, and allows users to interact with the chart through panning, zooming, and selecting data points.
Key Responsibilities:
DataTable
objects, which are consumed from a Lit context (dataTableContext
). This DataTable
typically contains time-series data where the first column is a commit identifier (e.g., revision number or timestamp), the second is a date object, and subsequent columns represent different data traces.dataframeAnomalyContext
and dataframeUserIssueContext
).side-panel-sk
) that displays a legend for the plotted traces. Users can toggle the visibility of individual traces using checkboxes in the side panel.Design Decisions and Implementation Choices:
@google-web-components/google-chart
library for the core charting functionality. This provides a robust and feature-rich charting engine.DataTable
, anomaly information, and loading states from parent components or a centralized data store. This promotes a decoupled architecture.v-resizable-box-sk
: A dedicated component for the vertical selection box used in the “deltaY” mode. It calculates and displays the difference between the start and end points of the drag.drag-to-zoom-box-sk
: Handles the visual representation of the selection box during the drag-to-zoom interaction. It manages the display and dimensions of the box as the user drags.side-panel-sk
: Encapsulates the legend and trace visibility controls. This separation of concerns keeps the main chart component focused on plotting.selection-changed
, plot-data-mouseover
, plot-data-select
) to notify parent components of user interactions and chart state changes. This allows for integration with other parts of an application.md-icon
elements on top of the chart. Their positions are calculated based on the chart‘s layout and the data point coordinates. This approach avoids modifying the Google Chart’s internal rendering and allows for more flexible styling and interaction with these markers.this.chart
) and chart layout information (this.cachedChartArea
) to avoid redundant lookups.removedLabelsCache
to efficiently hide and show traces without reconstructing the entire DataView
each time.navigationMode
property (pan
, deltaY
, dragToZoom
) manages the current mouse interaction state. This simplifies event handling by directing mouse events to the appropriate logic based on the active mode.determineYAxisTitle
method attempts to create a meaningful Y-axis title by examining the unit
and improvement_direction
parameters from the trace names. It displays these only if they are consistent across all visible traces.Key Components/Files:
plot-google-chart-sk.ts
: The core component that orchestrates the chart display and interactions.DataTable
, AnomalyMap
, UserIssueMap
) via Lit context.side-panel-sk
to manage trace visibility.side-panel-sk.ts
: Implements the side panel containing the legend and checkboxes for toggling trace visibility.DataTable
.plot-google-chart-sk
.v-resizable-box-sk
.v-resizable-box-sk.ts
: A custom element for the vertical resizable selection box used during the delta calculation (Shift-click + drag).drag-to-zoom-box-sk.ts
: A custom element for the selection box used during the drag-to-zoom interaction (Ctrl-click + drag).plot-google-chart-sk-demo.ts
and plot-google-chart-sk-demo.html
: Provide a demonstration page showcasing the plot-google-chart-sk
element with sample data. This is crucial for development and testing.index.ts
: Serves as the entry point for the module, importing and registering all the custom elements defined within.Key Workflows:
Initial Chart Rendering: DataTable
(from context) -> plot-google-chart-sk
-> updateDataView()
-> Creates google.visualization.DataView
-> Sets columns based on domain
(commit/date) and visible traces -> updateOptions()
configures chart appearance (colors, axes, view window) -> plotElement.value.view = view
and plotElement.value.options = options
-> Google Chart renders. -> onChartReady()
: -> Caches chart object. -> Calls drawAnomaly()
, drawUserIssues()
, drawXbar()
.
Panning: User mousedown (not Shift or Ctrl) -> onChartMouseDown()
: navigationMode = 'pan'
User mousemove -> onWindowMouseMove()
: -> Calculates deltaX based on mouse movement and current domain. -> Updates this.selectedRange
. -> Calls updateOptions()
to update chart's horizontal view window. -> Dispatches selection-changing
event. User mouseup -> onWindowMouseUp()
: -> Dispatches selection-changed
event. -> navigationMode = null
.
Drag-to-Zoom: User Ctrl + mousedown -> onChartMouseDown()
: navigationMode = 'dragToZoom'
-> zoomRangeBox.value.initializeShow()
: Displays the drag box. User mousemove -> onWindowMouseMove()
: -> zoomRangeBox.value.handleDrag()
: Updates the drag box dimensions. User mouseup -> onChartMouseUp()
: -> Calculates zoom boundaries based on drag box and isHorizontalZoom
. -> zoomRangeBox.value.hide()
. -> showResetButton = true
. -> updateBounds()
: Updates chart's hAxis.viewWindow
or vAxis.viewWindow
. -> navigationMode = null
.
Delta Calculation (Shift-Click): User Shift + mousedown -> onChartMouseDown()
: navigationMode = 'deltaY'
-> deltaRangeBox.value.show()
: Displays the vertical resizable box. User mousemove -> onWindowMouseMove()
: -> deltaRangeBox.value.updateSelection()
: Updates box height and calculates delta. -> Updates sidePanel.value
with delta values. User Shift + mousedown (again) or regular mousedown -> onChartMouseDown()
: -> Toggles deltaRangeOn
. If finishing, sidePanel.value.showDelta = true
. User mouseup (after dragging) -> onChartMouseUp()
: -> Updates sidePanel.value
with final delta values. -> navigationMode = null
.
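The three interactions above are all routed through a single mode field; a simplified sketch of how mouse-down handling might branch on modifier keys follows. The handler and field names echo the workflows above, but the real `plot-google-chart-sk.ts` carries more state (delta toggling, reset button, event dispatch), so treat this as an illustration only.

```ts
// Simplified sketch of the modifier-key routing described in the workflows above.
type NavigationMode = 'pan' | 'deltaY' | 'dragToZoom' | null;

class InteractionState {
  navigationMode: NavigationMode = null;

  onChartMouseDown(e: MouseEvent): void {
    if (e.ctrlKey) {
      this.navigationMode = 'dragToZoom'; // Ctrl + drag: select a region to zoom into
    } else if (e.shiftKey) {
      this.navigationMode = 'deltaY';     // Shift + drag: measure a vertical delta
    } else {
      this.navigationMode = 'pan';        // plain drag: pan the horizontal view window
    }
  }

  onWindowMouseMove(_e: MouseEvent): void {
    switch (this.navigationMode) {
      case 'pan':
        break; // update hAxis.viewWindow from the mouse delta
      case 'dragToZoom':
        break; // grow/shrink the drag-to-zoom-box-sk
      case 'deltaY':
        break; // resize the v-resizable-box-sk and recompute the delta
      default:
        break; // no active interaction
    }
  }

  onWindowMouseUp(): void {
    this.navigationMode = null; // every interaction ends by clearing the mode
  }
}
```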
Toggling Trace Visibility: User clicks checkbox in side-panel-sk
-> side-panel-sk
dispatches side-panel-selected-trace-change
. plot-google-chart-sk
listens (sidePanelCheckboxUpdate()
): -> Updates this.removedLabelsCache
. -> Calls updateDataView()
: -> Recreates DataView
, hiding/showing columns based on removedLabelsCache
. -> Updates chart.
Anomaly/Issue Display: anomalyMap
or userIssues
(from context) changes -> plot-google-chart-sk.willUpdate()
-> plotElement.value.redraw()
(if chart already rendered). Chart redraw triggers onChartReady()
: -> drawAnomaly()
/ drawUserIssues()
: -> Iterates through anomalies/issues for visible traces. -> Calculates screen coordinates (x, y) using chart.getChartLayoutInterface().getXLocation()
and getYLocation()
. -> Clones template md-icon
elements from slots. -> Positions the icons absolutely within anomalyDiv
or userIssueDiv
.
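The coordinate translation behind the anomaly and user-issue markers can be sketched as follows. `getChartLayoutInterface()` with `getXLocation`/`getYLocation` is the standard Google Charts API for converting data values to pixel positions; the overlay container and structural types here are illustrative assumptions.

```ts
// Sketch: convert a data point (x value, trace value) into pixel coordinates
// and absolutely position an icon over the chart. The overlay element is
// assumed to share the chart's coordinate space.
interface ChartLayout {
  getXLocation(value: number): number; // pixels from the chart's left edge
  getYLocation(value: number): number; // pixels from the chart's top edge
}

interface ChartLike {
  getChartLayoutInterface(): ChartLayout;
}

function placeMarker(
  chart: ChartLike,
  overlay: HTMLElement,
  icon: HTMLElement,
  xValue: number,
  yValue: number
): void {
  const layout = chart.getChartLayoutInterface();
  icon.style.position = 'absolute';
  icon.style.left = `${layout.getXLocation(xValue)}px`;
  icon.style.top = `${layout.getYLocation(yValue)}px`;
  overlay.appendChild(icon);
}
```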
This detailed explanation should provide a solid understanding of the plot-google-chart-sk
module's purpose, architecture, and key functionalities.
The plot-simple-sk
module provides a custom HTML element for rendering 2D line graphs. It's designed to be interactive, allowing users to zoom, inspect individual data points, and highlight specific traces.
Core Functionality and Design:
The primary goal of plot-simple-sk
is to display time-series data or any data that can be represented as a set of (x, y) coordinates. Key design considerations include:
Performance: To handle potentially large datasets and maintain a smooth user experience, the element employs several optimization techniques:
<canvas>
elements stacked on top of each other.traces
) is for drawing the static parts of the plot: the lines, axes, and dots representing data points. These are pre-rendered into Path2D
objects for efficient redrawing.overlay
) is for dynamic elements that change frequently, such as crosshairs, zoom selection rectangles, and hover highlights. This separation prevents unnecessary redrawing of the entire plot.Path2D
Objects: Trace lines and data point dots are converted into Path2D
objects. This allows the browser to optimize their rendering, leading to faster redraws compared to repeatedly issuing drawing commands.kd.ts
) is used. This data structure allows for efficient searching of the closest point in a 2D space, crucial for interactivity with potentially many data points.recalcSearchTask
) or redrawing after a zoom (zoomTask
) are often scheduled using window.setTimeout
. This prevents these potentially expensive operations from blocking the main thread and ensures they only happen when necessary, improving responsiveness. requestAnimationFrame
is used for mouse movement updates to synchronize with the browser's repaint cycle.Interactivity:
detailsZoomRangesStack
), allowing users to progressively zoom in and potentially (though not explicitly stated as a current feature for out) navigate back through zoom levels.trace_focused
event.trace_selected
event.xbar
) or regions (bands
) can be drawn on the plot to mark specific x-axis values or ranges.Appearance and Theming:
width
and height
attributes of the custom element and uses ResizeObserver
to redraw when its dimensions change.window.devicePixelRatio
to render crisply on high-DPI displays by drawing to a larger canvas and then scaling it down with CSS transforms.elements-sk/themes
and uses CSS variables for colors (e.g., --on-background
, --success
, --failure
), allowing its appearance to be customized by the surrounding application's theme. It listens for theme-chooser-toggle
events to redraw when the theme changes.Key Files and Responsibilities:
plot-simple-sk.ts
: This is the heart of the module, defining the PlotSimpleSk
custom element.
ctx
for traces, overlayCtx
for overlays).lineData
(traces and their pre-rendered paths), labels
(x-axis tick information), current _zoom
state, detailsZoomRangesStack
for detail view zooms, hoverPt
, crosshair
, highlighted
traces, _xbar
, _bands
, and _anomalyDataMap
.theme-chooser-toggle
and ResizeObserver
events.addLines
, deleteLines
, removeAll
, and properties like highlight
, xbar
, bands
, zoom
, anomalyDataMap
, userIssueMap
, and dots
to control the plot's content and appearance.d3-scale
(specifically scaleLinear
) to map data coordinates (domain) to canvas pixel coordinates (range) and vice-versa. Functions like rectFromRange
and rectFromRangeInvert
handle these transformations for rectangular regions.PathBuilder
: A helper class to construct Path2D
objects for trace lines and dots based on the current scales and data.SearchBuilder
: A helper class to prepare the data points for the KDTree
by converting source coordinates to canvas coordinates.SummaryArea
and DetailArea
interfaces and manages their respective rectangles, axes, and scaling ranges.kd.ts
: Implements a k-d tree.
O(log n)
on average for search) to find the nearest data point to a given mouse coordinate on the canvas. This is crucial for interactivity like mouse hovering and clicking to identify specific points on traces.x
and y
properties), a distance metric function, and the dimensions to consider (['x', 'y']
). The nearest()
method is the primary interface used by plot-simple-sk.ts
.ticks.ts
: Responsible for generating appropriate tick marks and labels for the time-based x-axis.
Date
objects representing the x-axis values, it determines a sensible set of tick positions and their corresponding formatted string labels (e.g., “Jul”, “Mon, 8 AM”, “10:30 AM”).Intl.DateTimeFormat
. It aims for a reasonable number of ticks (MIN_TICKS
to MAX_TICKS
) and uses a fixTicksLength
function to thin out the ticks if too many are generated.ticks()
function returns an array of objects, each with an x
(index in the original data) and a text
(formatted label).plot-simple-sk.scss
: Contains the SASS/CSS styles for the plot-simple-sk
element.
themes.scss
and uses CSS variables (e.g., var(--on-background)
, var(--background)
) to ensure the plot‘s colors match the application’s theme.index.ts
: A simple entry point that imports plot-simple-sk.ts
to ensure the custom element is defined and registered with the browser.
Demo Files (plot-simple-sk-demo.html
, plot-simple-sk-demo.ts
, plot-simple-sk-demo.scss
):
plot-simple-sk
element's capabilities.plot-simple-sk-demo.ts
) contains the logic to interact with the plot, such as adding random trace data, highlighting traces, zooming, clearing the plot, and displaying anomaly markers. It also logs events emitted by the plot.Key Workflows:
Initialization and Rendering: ElementSk constructor
-> connectedCallback
-> render
render
-> _render
(lit-html template instantiation) -> canvas.getContext
-> updateScaledMeasurements
-> updateScaleRanges
-> recalcDetailPaths
-> recalcSummaryPaths
-> drawTracesCanvas
Adding Data (addLines
): addLines
-> Convert MISSING_DATA_SENTINEL
to NaN
-> Store in this.lineData
-> updateScaleDomains
-> recalcSummaryPaths
-> recalcDetailPaths
-> drawTracesCanvas
recalcDetailPaths
/ recalcSummaryPaths
-> For each line: PathBuilder
creates linePath
and dotsPath
. recalcDetailPaths
-> recalcSearch
(schedules recalcSearchImpl
) recalcSearchImpl
-> SearchBuilder
populates points -> new KDTree
Mouse Hover and Focus: mousemove
event -> this.mouseMoveRaw
updated raf
loop -> checks this.mouseMoveRaw
-> eventToCanvasPt
-> If this.pointSearch
: this.pointSearch.nearest(pt)
-> updates this.hoverPt
-> dispatches trace_focused
event -> Updates this.crosshair
(based on shift key and hoverPt
) -> drawOverlayCanvas
Zooming via Summary Drag: mousedown
on summary -> this.inZoomDrag = 'summary'
-> this.zoomBegin
set mousemove
(while dragging) -> raf
loop: -> eventToCanvasPt
-> clampToRect
(summary area) -> this.summaryArea.range.x.invert(pt.x)
to get source x -> this.zoom = [min_x, max_x]
(triggers _zoomImpl
via setter task) _zoomImpl
(after timeout) -> updateScaleDomains
-> recalcDetailPaths
-> drawTracesCanvas
mouseup
/ mouseleave
-> dispatches zoom
event -> this.inZoomDrag = 'no-zoom'
Zooming via Detail Area Drag: mousedown
on detail -> this.inZoomDrag = 'details'
-> this.zoomRect
initialized mousemove
(while dragging) -> raf
loop: -> eventToCanvasPt
-> clampToRect
(detail area) -> Updates this.zoomRect.width/height
-> drawOverlayCanvas
(to show the dragging rectangle) mouseup
/ mouseleave
-> dispatchZoomEvent
-> doDetailsZoom
doDetailsZoom
-> If zoom box is large enough: this.detailsZoomRangesStack.push(rectFromRangeInvert(...))
-> _zoomImpl
Drawing Process:
drawTracesCanvas()
:
this.ctx
).drawXAxis
(for detail).this.lineData
: draws line.detail.linePath
and line.detail.dotsPath
if this.dots
is true.drawXAxis
again (to draw labels outside the clipped region).this.summary
and not dragging zoom:drawYAxis
(for detail).drawOverlayCanvas()
.drawOverlayCanvas()
:
this.overlayCtx
).this.summary
:drawXBar
, drawBands
.detailsZoomRangesStack
is not empty.this._zoom
.drawXBar
, drawBands
.drawUserIssues
, drawAnomalies
.this.zoomRect
(dashed).This structured approach allows plot-simple-sk
to be both feature-rich and performant for visualizing and interacting with 2D data plots.
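The nearest-point lookup that drives hover and click interactions (the `kd.ts` k-d tree described above) can be illustrated with a small stand-in. This sketch keeps the same interface shape (canvas points with `x`/`y`, a `nearest()` query) but deliberately uses a plain linear scan instead of a real k-d tree; the k-d tree exists precisely to make this query fast for large point counts. The extra fields on the point type are assumptions for illustration.

```ts
// Conceptual stand-in for the KDTree usage described above (linear scan,
// not an actual k-d tree).
interface CanvasPoint {
  x: number;           // canvas pixel coordinates
  y: number;
  traceName: string;   // which trace the dot belongs to (assumed field)
  sourceIndex: number; // index back into the source data (assumed field)
}

const sqDist = (a: { x: number; y: number }, b: { x: number; y: number }): number =>
  (a.x - b.x) ** 2 + (a.y - b.y) ** 2;

class NearestPointIndex {
  constructor(private points: CanvasPoint[]) {}

  // Returns the point closest to the mouse position, which is how the
  // hover/click workflows decide which trace point is focused.
  nearest(pt: { x: number; y: number }): CanvasPoint | null {
    let best: CanvasPoint | null = null;
    let bestDist = Infinity;
    for (const p of this.points) {
      const d = sqDist(p, pt);
      if (d < bestDist) {
        bestDist = d;
        best = p;
      }
    }
    return best;
  }
}
```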
The plot-summary-sk
module provides a custom HTML element, <plot-summary-sk>
, designed to display a summary plot of performance data and allow users to select a range within that plot. This is particularly useful for visualizing trends over time or commit ranges and enabling interactive exploration of the data.
At its core, plot-summary-sk
leverages the Google Charts library to render an area chart. It's designed to work with a DataFrame
, a data structure commonly used in Perf for holding timeseries data. The element can display data based on either commit offsets or timestamps (domain
attribute).
Key Responsibilities:
summary_selected
custom event when the user makes or changes a selection. This event carries details about the selected range (start, end, value, and domain).DataFrameRepository
to fetch and append new data.Key Components/Files:
plot-summary-sk.ts
: This is the main file defining the PlotSummarySk
LitElement.DataFrame
data (from dataTableContext
) and renders it using <google-chart>
.selectedTrace
property.h-resizable-box-sk
element to provide the visual selection rectangle and handles the mouse events for drawing and resizing this selection.google-chart-ready
events to ensure operations like setting a selection programmatically happen after the chart is fully initialized.controlTemplate
for optional “load more data” buttons, which interact with a DataFrameRepository
(consumed via dataframeRepoContext
).ResizeObserver
to detect when the element is resized and triggers a chart redraw.h-resizable-box-sk.ts
: This file defines the HResizableBoxSk
LitElement, a reusable component for creating a horizontally resizable and draggable selection box.plot-summary-sk
component. This promotes reusability and simplifies the main component's logic.div
(.surface
) that represents the selection.mousedown
events on its container to initiate an action: ‘draw’ (if clicking outside the existing selection), ‘drag’ (if clicking inside the selection), ‘left’ (if clicking on the left edge), or ‘right’ (if clicking on the right edge).mousemove
events on the window
to update the selection‘s position and size during an action. This ensures interaction continues even if the mouse moves outside the element’s bounds.mouseup
events on the window
to finalize the action and emits a selection-changed
event with the new range.cursor: move
, cursor: ew-resize
).selectionRange
property (getter and setter) allows programmatic control and retrieval of the selection, defined by begin
and end
pixel offsets relative to the component.plot-summary-sk.css.ts
: Contains the CSS styles for the plot-summary-sk
element, defined as a Lit css
tagged template literal.h-resizable-box-sk
) absolutely over the chart, and styles the optional loading buttons and loading indicator.plot-summary-sk-demo.ts
and plot-summary-sk-demo.html
: Provide a demonstration page for the plot-summary-sk
element.plot-summary-sk
with different configurations (e.g., domain
, selectionType
). The TypeScript file generates sample DataFrame
objects, converts them to Google DataTable format, and populates the plot elements. It also listens for summary_selected
events and displays their details.*.test.ts
, *_puppeteer_test.ts
):plot-summary-sk_test.ts
, h_resizable_box_sk_test.ts
) verify individual component logic, such as programmatic selection and state changes. They often mock dependencies like the Google Chart library or use test utilities to generate data.plot-summary-sk_puppeteer_test.ts
) perform end-to-end testing by interacting with the component in a real browser environment. They simulate user actions like mouse drags and verify the emitted event details and visual output (via screenshots).Key Workflows:
Initialization and Data Display:
[DataFrame via context or property] | v plot-summary-sk | v [willUpdate/updateDataView] --> Converts DataFrame to Google DataTable | v <google-chart> --> Renders area chart | v [google-chart-ready event] --> plot-summary-sk may apply cached selection
User Selecting a Range by Drawing:
User mousedowns on <plot-summary-sk> (outside existing selection in h-resizable-box-sk) | v h-resizable-box-sk (action = 'draw') | v User moves mouse (mousemove on window) | v h-resizable-box-sk --> Updates selection box dimensions | v User mouseups (mouseup on window) | v h-resizable-box-sk --> Emits 'selection-changed' (with pixel coordinates) | v plot-summary-sk (onSelectionChanged) | v Converts pixel coordinates to data values (commit/timestamp) | v Emits 'summary_selected' (with data values)
User Resizing/Moving an Existing Selection:
User mousedowns on <h-resizable-box-sk> (on edge for resize, or middle for drag) | v h-resizable-box-sk (action = 'left'/'right'/'drag') | v User moves mouse (mousemove on window) | v h-resizable-box-sk --> Updates selection box position/dimensions | v User mouseups (mouseup on window) | v h-resizable-box-sk --> Emits 'selection-changed' | v plot-summary-sk (onSelectionChanged) --> Converts & Emits 'summary_selected'
Programmatic Selection: Application calls plotSummarySkElement.Select(beginHeader, endHeader) OR Application sets plotSummarySkElement.selectedValueRange = { begin: val1, end: val2 } | v plot-summary-sk | v Caches selectedValueRange (important if chart not ready) | v [If chart ready] --> Converts data values to pixel coordinates | v Sets selectionRange on <h-resizable-box-sk>
If the chart is not ready when selectedValueRange
is set, the conversion and setting of the h-resizable-box-sk
selection is deferred until the google-chart-ready
event fires.
The design separates the concerns of data plotting (Google Charts), interactive range selection UI (h-resizable-box-sk
), and the overall orchestration and data conversion logic (plot-summary-sk
). This makes the system more modular and easier to maintain. The use of LitElement and contexts allows for a reactive programming model and clean integration with other parts of the Perf application.
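A small usage sketch of wiring the element into a page: listening for the `summary_selected` event described above. The event detail carries the selected range (start, end, value, domain) per this section; the exact field names are not spelled out here, so the handler treats the detail generically.

```ts
// Sketch: reacting to a range selection made on <plot-summary-sk>.
const summary = document.querySelector('plot-summary-sk');

summary?.addEventListener('summary_selected', (e: Event) => {
  const detail = (e as CustomEvent).detail;
  // e.g. narrow the main chart to the selected commit/timestamp range.
  console.log('selected range', detail);
});
```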
The point-links-sk
module is a custom HTML element designed to display links associated with specific data points in a performance analysis context. These links often originate from ingestion files and can include commit details, build logs, or other relevant resources.
The primary purpose of this module is to provide users with quick access to contextual information related to a data point. It achieves this by:
Key Responsibilities and Components:
point-links-sk.ts
: This is the core file defining the PointLinksSk
custom element.ElementSk
from infra-sk
.load()
method: This is the main public method responsible for initiating the process of fetching and displaying links. It takes the current commit ID, the previous commit ID, a trace ID, and arrays of keys to identify which links should be treated as commit ranges and which are general “useful links”. It handles the logic for checking the cache, fetching data from the API, processing commit ranges, and updating the display.getLinksForPoint()
and invokeLinksForPointApi()
methods: These private methods handle the actual API interaction to retrieve link data. getLinksForPoint
attempts to fetch from /_/links/
first and falls back to /_/details/?results=false
if the initial attempt fails. It also includes workarounds for specific data inconsistencies (e.g., V8 and WebRTC URLs).renderPointLinks()
and renderRevisionLink()
methods: These methods, along with the static template
, use lit-html
to generate the HTML structure for displaying the links.getCommitIdFromCommitUrl
, getRepoUrlFromCommitUrl
, getFormattedCommitRangeText
, extractUrlFromStringForFuchsia
): These provide utility functions for parsing URLs and formatting text.commitPosition
, displayUrls
, displayTexts
): These store the state of the component, such as the current commit and the links to be displayed.point-links-sk.scss
: Provides the styling for the point-links-sk
element, ensuring a consistent look and feel, including styling for Material Design icons and buttons.index.ts
: A simple entry point that imports and thereby registers the point-links-sk
custom element.point-links-sk-demo.html
& point-links-sk-demo.ts
: These files set up a demonstration page for the point-links-sk
element. The point-links-sk-demo.ts
file uses fetch-mock
to simulate the backend API, allowing developers to test the component's behavior in isolation. It demonstrates how to instantiate and use the point-links-sk
element with different configurations.Workflow for Loading and Displaying Links:
The typical workflow when the load()
method is called can be visualized as:
Caller invokes pointLinksSk.load(currentCID, prevCID, traceID, rangeKeys, usefulKeys, cachedLinks) | V Check if links for (currentCID, traceID) exist in `cachedLinks` | +-- YES --> Use cached links | | | V | Render links | +-- NO ---> Fetch links for `currentCID` from API (`getLinksForPoint`) | V If `rangeKeys` are provided: | Fetch links for `prevCID` from API (`getLinksForPoint`) | For each key in `rangeKeys`: | Extract current commit hash from `currentCID` links | Extract previous commit hash from `prevCID` links | If hashes are different: | Generate "commit range" URL (e.g., .../+log/prevHash..currentHash) | Else (hashes are same): | Use current commit URL | Add to `displayUrls` and `displayTexts` | V If `usefulKeys` are provided: | For each key in `usefulKeys`: | Add corresponding link from `currentCID` links to `displayUrls` | V Update cache with newly fetched/generated links for (currentCID, traceID) | V Render links
This module is designed to be flexible, allowing the consuming application to specify which types of links should be processed for commit ranges and which should be displayed as direct links. The inclusion of error handling (via errorMessage
) and the fallback mechanism in API calls (/_/links/
then /_/details/
) make it more robust.
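The commit-range handling in the workflow above reduces to a small URL transformation. A sketch follows, using the gitiles-style `+log/prevHash..currentHash` format shown in the diagram; the helper names and URL parsing are illustrative, not the module's exact API.

```ts
// Sketch of the commit-range logic: given the same link key on the current
// and previous data points, either link the commit directly (hashes equal)
// or link a +log range between the two hashes.
function commitRangeUrl(currentCommitUrl: string, previousCommitUrl: string): string {
  // Assumes gitiles-style commit URLs of the form https://host/repo/+/<hash>.
  const repo = (url: string) => url.substring(0, url.lastIndexOf('/+/'));
  const hash = (url: string) => url.substring(url.lastIndexOf('/') + 1);

  const currentHash = hash(currentCommitUrl);
  const previousHash = hash(previousCommitUrl);
  if (currentHash === previousHash) {
    return currentCommitUrl; // nothing changed between the two points
  }
  // e.g. https://host/repo/+log/prevHash..currentHash
  return `${repo(currentCommitUrl)}/+log/${previousHash}..${currentHash}`;
}
```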
The progress
module provides a mechanism for initiating and monitoring the status of long-running tasks on the server. This is crucial for user experience, as it allows the client to display progress information and avoid appearing unresponsive during lengthy operations.
The core of this module is the startRequest
function. This function is designed to handle asynchronous server-side processes that might take a significant amount of time to complete.
How startRequest
Works:
Initiation:
startingURL
with a given body
. This request typically triggers the long-running task on the server.spinner-sk
element is provided, it's activated to visually indicate that a process is underway.Polling:
progress.SerializedProgress
. This object contains:status
: Indicates whether the task is “Running” or “Finished” (or potentially other states like “Error”).messages
: An array of key-value pairs providing more detailed information about the current state of the task (e.g., current step, progress percentage).url
: If the status
is “Running”, this URL is used for the next polling request to get updated progress.results
: If the status
is “Finished”, this field contains the final output of the long-running process.status
is “Running”, startRequest
will schedule a setTimeout
to make a GET request to the url
provided in the response after a specified period
. This creates a polling loop.Callback and Completion:
callback
function can be provided. This function is invoked after each successful fetch (both the initial request and every polling update), receiving the progress.SerializedProgress
object. This allows the UI to update with the latest progress information.status
that is not “Running” (e.g., “Finished”).startRequest
resolves with the final progress.SerializedProgress
object.spinner-sk
was provided, it is deactivated.Error Handling:
startRequest
is rejected with an error.Workflow Diagram:
Client UI startRequest Function Server ---------- --------------------- ------ | | | -- Call startRequest --> | | | -- POST to startingURL (body) --> | | | | | | <-- Response (SerializedProgress) -- | | | | -- (Optional) Activate -- | | Spinner | | | | | -- If status is "Running": --------> Schedule setTimeout(period) | | | | | V | | -- GET to progress.url -----------> | | | | | | <-- Response (SerializedProgress) -- | | | | | | --- (Invoke callback) ---------> Client UI (Update progress) | | | | | --- Loop back to "If status is 'Running'" | | | | -- If status is "Finished": -------> Resolve Promise | | | | -- (Optional) Deactivate | <----------------------------------- | Spinner | | | | <-- Promise Resolves ---- | | (SerializedProgress) |
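A usage sketch of the flow in the diagram above, calling `startRequest` with a progress callback. The argument order and exact types are illustrative (see `progress.ts` for the real signature), and the endpoint is hypothetical.

```ts
import { startRequest, messagesToErrorString } from './progress';

async function runLongTask(): Promise<void> {
  try {
    const prog = await startRequest(
      '/_/some/long/running/endpoint',            // hypothetical endpoint that starts the task
      { param: 'value' },                          // body of the initial POST
      300,                                         // polling period in ms (assumed position)
      null,                                        // optional spinner-sk; omitted here
      (p) => console.log('progress:', p.messages)  // invoked after every fetch
    );
    if (prog.status !== 'Finished') {
      // Surface whatever the server reported via the messages array.
      throw new Error(messagesToErrorString(prog.messages));
    }
    console.log('results:', prog.results);
  } catch (err) {
    console.error('long-running request failed:', err);
  }
}
```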
Key Files:
progress.ts
:
startRequest
: The primary function that orchestrates the entire progress monitoring flow. It encapsulates the logic for making the initial POST request and subsequent GET requests for polling. The use of a single processFetch
internal function is a design choice to reduce code duplication, as the response handling logic is identical for both the initial and polling fetches.messagesToErrorString
: A utility function designed to extract a user-friendly error message from the messages
array within SerializedProgress
. It prioritizes messages with the key “Error” but falls back to concatenating all messages if no specific error message is found. This ensures that some form of feedback is available even if the server doesn't explicitly flag an error.messagesToPreString
: Formats messages for display, typically within a <pre>
tag, by putting each key-value pair on a new line. This is useful for presenting detailed progress logs.messageByName
: Allows retrieval of a specific message's value by its key from the messages
array, with a fallback if the key is not found. This is useful for extracting specific pieces of information from the progress updates (e.g., the current step number).elements-sk/modules/spinner-sk
: Used to visually indicate that a background task is in progress.perf/modules/json
: Provides the progress.SerializedProgress
type definition, ensuring consistency in how progress information is structured between the client and server.progress_test.ts
:
progress.ts
module.startRequest
correctly handles different server response scenarios: immediate completion, one or more polling steps, and network errors.messagesToErrorString
, messageByName
) with various inputs.fetch-mock
to simulate server responses, allowing for controlled testing of the asynchronous network interactions without relying on an actual backend. This is crucial for creating reliable and fast unit tests.The design of this module prioritizes a clear separation of concerns. startRequest
focuses on the communication and polling logic, while the utility functions provide convenient ways to interpret and display the progress information received from the server. The use of Promises simplifies handling asynchronous operations, and the optional callback provides flexibility for updating the UI in real-time.
query-chooser-sk
)The query-chooser-sk
module provides a user interface element for selecting and modifying query parameters. It's designed to offer a compact way to display the currently active query and provide a mechanism to change it through a dialog.
The primary goal of query-chooser-sk
is to present a summarized view of the current query and allow users to edit it in a more detailed interface. This is achieved by:
paramset-sk
element. This gives users a quick overview of the active filters.query-sk
in a dialog: The dialog contains a query-sk
element. This is where the user can interactively build or modify their query by selecting values for different parameters.query-sk
element, query-count-sk
is used to display how many items match the currently constructed query. This provides immediate feedback to the user as they refine their selection.query-chooser-sk
listens for query-change
events from the embedded query-sk
element. When a change occurs, query-chooser-sk
updates its own current_query
and re-renders, effectively propagating the change. It also emits its own query-change
event, allowing parent components to react to query modifications.This design separates the concerns of displaying the current state from the more complex interaction of query building. The dialog provides a focused environment for query modification without cluttering the main UI.
query-chooser-sk.ts
: This is the core TypeScript file defining the QueryChooserSk
custom element.paramset-sk
), the query editing interface (query-sk
), and the match count display (query-count-sk
).current_query
, paramset
, key_order
, and count_url
which are essential for its operation and for configuring its child elements._editClick
and _closeClick
methods handle the opening and closing of the dialog._queryChange
method is crucial for reacting to changes in the embedded query-sk
element and updating the current_query
.query-chooser-sk.html
(template within query-chooser-sk.ts
): This Lit HTML template defines the structure of the element.div
with class row
to display the “Edit” button and the paramset-sk
summary.div
with id dialog
acts as the container for query-sk
, query-count-sk
, and the “Close” button. The visibility of this dialog is controlled by adding/removing the display
class.query-chooser-sk.scss
: This file provides the styling for the element. It ensures proper layout of the button, summary, and the dialog content. It also includes theming support.index.ts
: A simple entry point that imports and registers the query-chooser-sk
custom element.query-chooser-sk-demo.html
/ query-chooser-sk-demo.ts
: These files provide a demonstration page for the element, showcasing its usage with sample data and event handling. fetchMock
is used in the demo to simulate the count_url
endpoint.query-chooser-sk_puppeteer_test.ts
: Contains Puppeteer tests to verify the rendering and basic functionality of the element.The typical workflow for a user interacting with query-chooser-sk
is as follows:
User sees current query summary & "Edit" button | | (User clicks "Edit") V Dialog appears, showing: - `query-sk` (for selecting parameters/values) - `query-count-sk` (displaying number of matches) - "Close" button | | (User interacts with `query-sk`, changing selections) V `query-sk` emits "query-change" event | V `query-chooser-sk` (_queryChange method): - Updates its `current_query` property/attribute - Re-renders to reflect new `current_query` in summary & `query-count-sk` - Emits its own "query-change" event (for parent components) | | (User is satisfied with the new query) V User clicks "Close" | V Dialog is hidden | V `query-chooser-sk` displays the updated query summary.
The paramset
attribute is crucial as it provides the available keys and values that query-sk
will use to render its selection interface. The key_order
attribute influences the order in which parameters are displayed within query-sk
. The count_url
is passed directly to query-count-sk
to fetch the number of matching items for the current query.
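A short sketch of embedding `query-chooser-sk` in a page and reacting to its re-emitted `query-change` event. The event-detail shape (`{ q: ... }`) and the count endpoint path are assumptions for illustration; the `paramset`, `current_query`, and `count_url` properties are the ones described above.

```ts
// Sketch: configuring query-chooser-sk and listening for query changes.
const chooser = document.createElement('query-chooser-sk') as any;
chooser.paramset = {
  arch: ['x86', 'arm'],
  config: ['8888', 'gles'],
};
chooser.current_query = 'arch=x86';
chooser.count_url = '/_/count/'; // endpoint polled by query-count-sk (assumed path)

chooser.addEventListener('query-change', (e: Event) => {
  const newQuery = (e as CustomEvent).detail.q; // detail shape assumed
  console.log('query is now', newQuery);        // e.g. refresh plots elsewhere
});

document.body.appendChild(chooser);
```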
The query-count-sk
module provides a custom HTML element designed to display the number of results matching a given query. Its primary purpose is to offer a dynamic and responsive way to inform users about the scope of their queries in real-time, without requiring a full page reload or complex UI updates. This is particularly useful in applications where users frequently refine search criteria and need immediate feedback on the impact of those changes.
The core functionality revolves around the QueryCountSk
class, which extends ElementSk
. This class manages the state of the displayed count, handles asynchronous data fetching, and updates the UI accordingly.
Key Components and Design Decisions:
query-count-sk.ts
: This is the heart of the module.current_query
or url
attributes change, the element initiates a POST request to the specified url
.current_query
, and a default time window of the last 24 hours (begin
and end
timestamps). This design choice implies that the element is typically used for querying recent data.AbortController
. This is a crucial design decision for performance and responsiveness, especially when users rapidly change query parameters.count
(number of matches) and a paramset
(a read-only representation of parameters related to the query)._count
property stores the fetched count as a string, and _requestInProgress
is a boolean flag indicating whether a fetch operation is currently active. This flag is used to show/hide a loading spinner (spinner-sk
).lit-html
for efficient template rendering. The template displays the _count
and the spinner-sk
conditionally.paramset-changed
custom event is dispatched. This event carries the paramset
received from the server. This allows other components on the page to react to changes in the available parameters based on the current query results. This decoupling is a key design aspect for building modular UIs.errorMessage
utility (likely from perf/modules/errorMessage
). AbortErrors are handled gracefully by simply stopping the current operation without displaying an error, as this usually means the user initiated a new action.query-count-sk.scss
: Provides styling for the element, ensuring the count and spinner are displayed appropriately. The display: inline-block
and flexbox layout for the internal div
are chosen for simple alignment of the count and spinner.query-count-sk-demo.html
and query-count-sk-demo.ts
: These files provide a demonstration and testing environment for the query-count-sk
element.fetch-mock
to simulate server responses, allowing for isolated testing of the component's behavior.url
, current_query
).<error-toast-sk>
in the demo suggests that this is the intended mechanism for displaying errors surfaced by errorMessage
.index.ts
: A simple entry point that imports and registers the query-count-sk
custom element, making it available for use in an HTML page.Workflow for Displaying Query Count:
Initialization:
query-count-sk
element is added to the DOM.url
attribute (pointing to the backend endpoint) is set.Page query-count-sk | | |--(Set url)---->|
Query Update:
current_query
attribute is set or updated (e.g., by user input in another part of the application).Page query-count-sk | | |--(Set current_query)-->|
Data Fetching:
attributeChangedCallback
(or connectedCallback
on initial load) triggers the _fetch()
method._requestInProgress
is set to true
, and the spinner becomes visible.this.url
with the current_query
and time range.query-count-sk Server | | |--(Set _requestInProgress=true)------>| (Spinner shows) | | |----(POST / {q: current_query, ...})-->|
Response Handling:
Success:
{ count: N, paramset: {...} }
._count
is updated with N
._requestInProgress
is set to false
(spinner hides).paramset-changed
event is dispatched with the paramset
.query-count-sk Server | | |<----(HTTP 200, {count, paramset})----| | | |--(Update _count, _requestInProgress=false)-->| (Spinner hides, count updates) | | |--(Dispatch 'paramset-changed')------>| (Other components may react)
Error (e.g., network issue, server error):
_requestInProgress
is set to false
(spinner hides).error-toast-sk
).query-count-sk Server | | |<----(HTTP Error or Network Error)----| | | |--(Set _requestInProgress=false)------>| (Spinner hides) | | |--(Display error message)------------>|
Abort:
catch
block for AbortError
is entered.The design emphasizes responsiveness by aborting stale requests and provides a clear visual indication of ongoing activity (the spinner). The paramset-changed
event promotes loose coupling between components, allowing other parts of the application to adapt based on the query results without direct dependencies on query-count-sk
's internal implementation.
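The stale-request cancellation described above follows a common `AbortController` pattern; a minimal sketch is shown below. The request body fields (`q`, `begin`, `end`) and response fields (`count`, `paramset`) follow this section's description, but treat the exact wire format as illustrative.

```ts
// Sketch of the abort-and-refetch pattern: each new query cancels the
// in-flight request so only the latest count is displayed.
class CountFetcher {
  private controller: AbortController | null = null;

  async fetchCount(url: string, query: string): Promise<number | null> {
    this.controller?.abort();            // cancel any stale request
    this.controller = new AbortController();
    try {
      const resp = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          q: query,
          begin: Math.floor(Date.now() / 1000) - 24 * 60 * 60, // last 24 hours
          end: Math.floor(Date.now() / 1000),
        }),
        signal: this.controller.signal,
      });
      const json = await resp.json();
      return json.count;                 // server also returns a paramset
    } catch (err) {
      if ((err as Error).name === 'AbortError') return null; // superseded, ignore
      throw err;                         // real errors surface to the caller
    }
  }
}
```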
The regressions-page-sk
module provides a user interface for viewing and managing performance regressions. It allows users to select a “subscription” (often representing a team or area of ownership, like “Sheriff Config 1”) and then displays a list of detected performance anomalies (regressions or improvements) associated with that subscription.
The core functionality revolves around fetching and displaying this data in a user-friendly way.
Key Responsibilities and Components:
regressions-page-sk.ts
: This is the main TypeScript file that defines the RegressionsPageSk
custom HTML element.
State
interface, stateReflector
): The component maintains its UI state (selected subscription, whether to show triaged items or improvements, and a flag for using a Skia-specific backend) in the state
object. The stateReflector
utility is crucial here. It synchronizes this internal state with the URL query parameters. This means a user can bookmark a specific view (e.g., a particular subscription with improvements shown) and share it, or refresh the page and return to the same state.stateReflector
? It provides a clean way to manage application state that needs to be persistent across page loads and shareable via URLs, without manually parsing and updating the URL.fetchRegressions
, init
):init
method is called during component initialization and whenever the state changes significantly (like selecting a new subscription). It fetches the list of available subscriptions (sheriff lists) from either a legacy endpoint (/_/anomalies/sheriff_list
) or a Skia-specific one (/_/anomalies/sheriff_list_skia
) based on the state.useSkia
flag. The fetched subscriptions are then sorted alphabetically for display in a dropdown.fetchRegressions
method is responsible for fetching the actual anomaly data. It constructs a query based on the current state
(selected subscription, filters for triaged/improvements, and a cursor for pagination). It also chooses between legacy and Skia-specific anomaly list endpoints. The fetched anomalies are then appended to the cpAnomalies
array, and if a cursor is returned, a “Show More” button is made visible.template
, _render
): The component uses lit-html
for templating. The template
static method defines the HTML structure, which includes:<select id="filter">
) to choose a subscription.<subscription-table-sk>
to display details about the selected subscription and its associated alerts.<anomalies-table-sk>
to display the list of anomalies/regressions.spinner-sk
) to indicate loading states._render()
method (implicitly called by ElementSk
when properties change) re-renders the component with the latest data.filterChange
, triagedChange
, improvementChange
): These methods handle user interactions like selecting a subscription or toggling filters. They update the component's state
, trigger stateHasChanged
(which in turn updates the URL and can re-fetch data), and then explicitly call fetchRegressions
and _render
to reflect the changes.getRegTemplate
, regRowTemplate
): There's also code related to displaying regressions
directly in a table within this component (the regressions
property and getRegTemplate
). However, the primary display of anomalies seems to be delegated to anomalies-table-sk
. This older regression display logic might be for a previous version or a specific use case not currently active in the demo. The isRegressionImprovement
static method determines if a given regression object represents an improvement based on direction and cluster type.anomalies-table-sk
(external dependency): This component is responsible for rendering the detailed table of anomalies. regressions-page-sk
fetches the anomaly data and then passes it to anomalies-table-sk
for display. This promotes modularity, separating data fetching/management from presentation.
subscription-table-sk
(external dependency): This component displays information about the currently selected subscription, including any configured alerts. Similar to anomalies-table-sk
, it receives data from regressions-page-sk
.
regressions-page-sk.scss
: Provides styling for the regressions-page-sk
component, including colors for positive/negative changes and styles for spinners and buttons.
regressions-page-sk-demo.html
and regressions-page-sk-demo.ts
: These files set up a demonstration page for the regressions-page-sk
component.
regressions-page-sk-demo.ts
is particularly important for understanding how the component is intended to be used and tested. It initializes a global window.perf
object with configuration settings that the main component might rely on (though direct usage isn‘t evident in regressions-page-sk.ts
itself, it’s a common pattern in Perf).fetchMock
to simulate API responses for /users/login/status
, /_/subscriptions
, and /_/regressions
(which seems to be an older endpoint pattern compared to what regressions-page-sk.ts
uses). This mocking is crucial for creating a standalone demo environment.fetchMock
? It allows developers to work on and test the UI component without needing a live backend, ensuring predictable data and behavior for demos and tests.Workflow for Displaying Regressions:
Initialization (connectedCallback
, init
):
regressions-page-sk
element is added to the DOM.stateReflector
is set up to read initial state from URL or use defaults.init()
is called:<select id="filter">
).User Selects a Subscription (filterChange
):
filterChange("Sheriff Config 2")
is triggered.state.selectedSubscription
is updated to “Sheriff Config 2”.cpAnomalies
is cleared, anomalyCursor
is reset.stateHasChanged()
is called, updating the URL (e.g., ?selectedSubscription=Sheriff%20Config%202
).fetchRegressions()
is called.Fetching Anomalies (fetchRegressions
):
/_/anomalies/anomaly_list?sheriff=Sheriff%20Config%202
(or the Skia equivalent).Displaying Anomalies:
this.cpAnomalies
.subscriptionTable
is updated with subscription details and alerts from the response.anomaliesTable
(the anomalies-table-sk
instance) is populated with this.cpAnomalies
.User Action Component State API Interaction UI Update ----------- --------------- --------------- --------- Page Load | V regressions-page-sk.init() | state = {selectedSubscription:''} V fetch('/_/anomalies/sheriff_list') -> ["Sheriff1", "Sheriff2"] | subscriptionList = ["Sheriff1", "Sheriff2"] V Populate dropdown Disable filter buttons Selects "Sheriff1" | V regressions-page-sk.filterChange("Sheriff1") | state = {selectedSubscription:'Sheriff1', ...} | (URL updates via stateReflector) V regressions-page-sk.fetchRegressions() | anomaliesLoadingSpinner = true V fetch('/_/anomalies/anomaly_list?sheriff=Sheriff1') -> {anomaly_list: [...], anomaly_cursor: 'cursor123'} | cpAnomalies = [...], anomalyCursor = 'cursor123', showMoreAnomalies = true | anomaliesLoadingSpinner = false V Update anomaliesTable Update subscriptionTable Show "Show More" button Enable filter buttons Clicks "Show More" | V regressions-page-sk.fetchRegressions() (called by button click) | showMoreLoadingSpinner = true V fetch('/_/anomalies/anomaly_list?sheriff=Sheriff1&anomaly_cursor=cursor123') -> {anomaly_list: [more...], anomaly_cursor: null} | cpAnomalies = [all...], anomalyCursor = null, showMoreAnomalies = false | showMoreLoadingSpinner = false V Update anomaliesTable (append) Hide "Show More" button
Toggling Filters (e.g., “Show Triaged”, triagedChange
):
triagedChange()
is triggered.state.showTriaged
is toggled.stateHasChanged()
updates the URL (e.g., ?selectedSubscription=Sheriff%20Config%202&showTriaged=true
).fetchRegressions()
is called again, this time with triaged=true
in the query.The design separates concerns: regressions-page-sk
handles overall page logic, state, and orchestration of data fetching, while specialized components like anomalies-table-sk
and subscription-table-sk
handle the rendering of specific data views. The use of stateReflector
ensures the UI state is bookmarkable and shareable. The demo files with fetchMock
are critical for isolated development and testing of the UI component.
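The cursor-based fetching loop behind the "Show More" button can be sketched as below. The endpoint and query parameters (`sheriff`, `triaged`, `anomaly_cursor`) and response fields (`anomaly_list`, `anomaly_cursor`) are taken from the diagrams in this section; everything else is illustrative.

```ts
// Sketch of fetchRegressions-style pagination: request a page of anomalies
// for a subscription and keep the cursor for the next page.
interface AnomalyListResponse {
  anomaly_list: unknown[];
  anomaly_cursor: string | null;
}

async function fetchAnomalyPage(
  sheriff: string,
  cursor: string | null,
  showTriaged: boolean
): Promise<AnomalyListResponse> {
  const params = new URLSearchParams({ sheriff });
  if (showTriaged) params.set('triaged', 'true');
  if (cursor) params.set('anomaly_cursor', cursor);
  const resp = await fetch(`/_/anomalies/anomaly_list?${params.toString()}`);
  return resp.json();
}

// Accumulate pages until the server stops returning a cursor (the UI instead
// appends one page per "Show More" click).
async function fetchAll(sheriff: string): Promise<unknown[]> {
  const all: unknown[] = [];
  let cursor: string | null = null;
  do {
    const page = await fetchAnomalyPage(sheriff, cursor, false);
    all.push(...page.anomaly_list);
    cursor = page.anomaly_cursor;
  } while (cursor);
  return all;
}
```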
The report-page-sk
module is designed to display a detailed report page for performance anomalies. Its primary purpose is to provide users with a comprehensive view of selected anomalies, including their associated graphs and commit information, facilitating the analysis and understanding of performance regressions or improvements.
At its core, the report-page-sk
element orchestrates the display of several key pieces of information. It fetches anomaly data from a backend endpoint (/_/anomalies/group_report
) based on URL parameters (like revision, anomaly IDs, bug ID, etc.). This data is then used to populate an anomalies-table-sk
element, which presents a tabular view of the anomalies.
A crucial design decision is the use of an AnomalyTracker
class. This class is responsible for managing the state of each anomaly, including whether it's selected (checked) by the user, its associated graph, and the relevant time range for graphing. This separation of concerns keeps the main ReportPageSk
class cleaner and focuses its responsibilities on rendering and user interaction.
When an anomaly is selected in the table, report-page-sk
dynamically generates and displays an explore-simple-sk
graph for that anomaly. The explore-simple-sk
element is configured to show data around the anomaly's occurrence, typically a week before and after, to provide context. If multiple anomalies are selected, their graphs are displayed, and their heights are adjusted to fit the available space. A key feature is the synchronized X-axis across all displayed graphs, ensuring a consistent time scale for comparison.
The page also attempts to identify and display common commits related to the selected anomalies. It fetches commit details using the lookupCids
function and highlights commits that appear to be “roll” commits (e.g., “Roll repo from hash to hash”). For these roll commits, it provides a link to the underlying commit or the parent commit if the roll pattern is not directly parseable from the commit message, which can be helpful for developers to trace the source of a change.
Key Components and Responsibilities:
report-page-sk.ts
: This is the main TypeScript file defining the ReportPageSk
custom element.
ReportPageSk
class:/_/defaults/
) and then anomaly data based on URL parameters.AnomalyTracker
instance to manage the state of individual anomalies (selected, graphed, time range).anomalies-table-sk
and explore-simple-sk
graphs based on user interactions and fetched data. It uses the lit-html
library for templating.anomalies_checked
events from the anomalies-table-sk
to update the displayed graphs. It also handles x-axis-toggled
events from explore-simple-sk
to synchronize the x-axis across multiple graphs.explore-simple-sk
instance, configures its query based on the anomaly's test path, and sets the appropriate time range.spinner-sk
) during data fetching operations.AnomalyTracker
class:AnomalyDataPoint
objects, each containing an Anomaly
, its checked status, its associated ExploreSimpleSk
graph instance (if any), and its Timerange
.AnomalyDataPoint
interface: Defines the structure for storing information about a single anomaly within the AnomalyTracker
.report-page-sk.scss
: Contains the SASS/CSS styles for the report-page-sk
element, including styling for the common commits section and the dialog for displaying all commits (though the dialog itself is not fully implemented in the provided showAllCommitsTemplate
).
Data Fetching Workflow:
ReportPageSk
element is connected to the DOM.rev
, anomalyIDs
, bugID
) are read.fetchAnomalies()
is called./_/anomalies/group_report
with URL parameters in the body.anomaly_list
, timerange_map
, and selected_keys
.AnomalyTracker
is loaded with this data.anomalies-table-sk
is populated.User Interaction Workflow (Selecting an Anomaly):
anomalies-table-sk
.anomalies-table-sk
fires an anomalies_checked
custom event with the anomaly and its checked state.ReportPageSk
listens for this event.updateGraphs()
is called:addGraph()
is called.explore-simple-sk
instance is created and configured.AnomalyTracker
is updated with the new graph instance.AnomalyTracker
is updated to remove the graph reference.updateChartHeights()
is called to adjust the height of all visible graphs.The design emphasizes dynamic content loading and interactive exploration. By using separate custom elements for the table (anomalies-table-sk
) and graphs (explore-simple-sk
), the module maintains a good separation of concerns and leverages reusable components. The AnomalyTracker
further enhances this by encapsulating the state and logic related to individual anomalies.
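A structural sketch of the `AnomalyTracker` idea follows. The fields mirror the `AnomalyDataPoint` description above, while `Anomaly`, `ExploreSimpleSk`, and `Timerange` are simplified stand-ins for the real types.

```ts
// Structural sketch of the AnomalyTracker described above.
interface Timerange { begin: number; end: number; }
interface Anomaly { id: string; }
interface ExploreSimpleSk extends HTMLElement {}

interface AnomalyDataPoint {
  anomaly: Anomaly;
  checked: boolean;               // selected in anomalies-table-sk?
  graph: ExploreSimpleSk | null;  // the graph shown for it, if any
  timerange: Timerange;           // range used when plotting it
}

class AnomalyTracker {
  private points = new Map<string, AnomalyDataPoint>();

  load(anomalies: Anomaly[], timeranges: Map<string, Timerange>, selected: Set<string>): void {
    for (const a of anomalies) {
      this.points.set(a.id, {
        anomaly: a,
        checked: selected.has(a.id),
        graph: null,
        timerange: timeranges.get(a.id) ?? { begin: 0, end: 0 },
      });
    }
  }

  // Called when a checkbox toggles: attach or detach the graph for an anomaly.
  setGraph(id: string, graph: ExploreSimpleSk | null): void {
    const p = this.points.get(id);
    if (p) p.graph = graph;
  }
}
```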
The revision-info-sk
custom HTML element is designed to display information about anomalies detected around a specific revision. This is particularly useful for understanding the impact of a code change on performance metrics.
The core functionality revolves around fetching and presenting RevisionInfo
objects. A RevisionInfo
object contains details like the benchmark, bot, bug ID, start and end revisions of an anomaly, the associated test, and links to explore the anomaly further.
Key Components and Workflow:
revision-info-sk.ts
: This is the main TypeScript file defining the RevisionInfoSk
element.
State Management: The element maintains its state in a State
object, primarily storing the revisionId
. It utilizes stateReflector
from infra-sk/modules/statereflector
to keep the URL in sync with the element's state. This allows users to share links that directly open to a specific revision's information.
URL change
-> stateReflector updates State.revisionId
-> getRevisionInfo() is called
User types revision ID and clicks "Get Revision Information"
-> State.revisionId updated
-> stateReflector updates URL
-> getRevisionInfo() is called
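The wiring above can be condensed into a short sketch. The shape of stateReflector is assumed here (a state getter plus a setter invoked on URL changes, returning a callback to call when local state changes); treat it as an approximation of infra-sk/modules/statereflector rather than its exact API.

```
// Assumed shape of infra-sk's stateReflector, declared locally for the sketch.
type StateReflector = (
  getState: () => Record<string, unknown>,
  setState: (o: Record<string, unknown>) => void
) => () => void;
declare const stateReflector: StateReflector;

interface State { revisionId: string; }

class RevisionInfoWiringSketch {
  private state: State = { revisionId: '' };
  // Invoke after mutating this.state so the URL is refreshed.
  private stateHasChanged: () => void = () => {};

  connected(): void {
    this.stateHasChanged = stateReflector(
      /* getState */ () => ({ ...this.state }),
      /* setState */ (newState) => {
        this.state = newState as unknown as State;
        this.getRevisionInfo(); // URL change -> fetch
      }
    );
  }

  onUserInput(revisionId: string): void {
    this.state.revisionId = revisionId;
    this.stateHasChanged();     // local change -> URL update
    this.getRevisionInfo();
  }

  private getRevisionInfo(): void { /* fetch /_/revision/?rev=<revisionId> */ }
}
```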
Data Fetching (getRevisionInfo
): When a revision ID is provided (either via URL or user input), this method is triggered.
spinner-sk
) to indicate loading.fetch
request to the /_/revision/?rev=<revisionId>
endpoint.RevisionInfo
objects, is parsed using jsonOrThrow
.revisionInfos
are stored, and the UI is re-rendered to display the information.Rendering (template
, getRevInfosTemplate
, revInfoRowTemplate
): Lit-html is used for templating.
template
) includes an input field for the revision ID, a button to trigger fetching, a spinner, and a container for the revision information.getRevInfosTemplate
generates an HTML table if revisionInfos
is populated. This table includes a header row with a “select all” checkbox and columns for bug ID, revision range, master, bot, benchmark, and test.revInfoRowTemplate
renders each individual RevisionInfo
as a row in the table. Each row has a checkbox for selection, a link to the bug (if any), a link to explore the anomaly, and the other relevant details.
Multi-Graph Functionality: The element allows users to select multiple detected anomaly ranges and view them together on a multi-graph page.
checkbox-sk
) are provided for each revision info row and a “select all” checkbox. The toggleSelectAll
method handles the logic for the master checkbox.updateMultiGraphStatus
: This method is called whenever a checkbox state changes. It checks if any revisions are selected and enables/disables the “View Selected Graph(s)” button accordingly. It also updates the selectAll
state if no individual revisions are checked.getGraphConfigs
: This helper function takes an array of selected RevisionInfo
objects and transforms them into an array of GraphConfig
objects. Each GraphConfig
contains the query string associated with the anomaly.getMultiGraphUrl
: This asynchronous method constructs the URL for the multi-graph view.getGraphConfigs
to get the configurations for the selected revisions.updateShortcut
(from explore-simple-sk
) to generate a shortcut ID for the combined graph configurations. This typically involves a POST request to /_/shortcut/update
.begin
and end
timestamps) encompassing all selected anomalies.anomaly_ids
from the selected revisions to highlight them on the multi-graph page.begin
, end
timestamps, the shortcut
ID, the totalGraphs
, and highlight_anomalies
parameters.viewMultiGraph
: This method is called when the “View Selected Graph(s)” button is clicked.RevisionInfo
objects.getMultiGraphUrl
to generate the redirect URL.window.open(url, '_self')
) to the multi-graph page. If not, it displays an error message.
Styling (revision-info-sk.scss): Provides basic styling for the element, such as left-aligning table headers and styling the spinner.
index.ts
: Simply imports and thereby registers the revision-info-sk
custom element.
Demo Page (revision-info-sk-demo.html
, revision-info-sk-demo.ts
, revision-info-sk-demo.scss
):
revision-info-sk
element.revision-info-sk-demo.ts
file uses fetch-mock
to mock the /_/revision/
API endpoint. This is crucial for demonstrating the element's functionality without needing a live backend. When the demo page loads and the user interacts with the element (e.g., enters a revision ID '12345'), the mocked response is returned.
Design Decisions and Rationale:
<revision-info-sk>
) promotes reusability across different parts of the Perf application or potentially other Skia web applications.stateReflector
enhances user experience by allowing direct navigation to a revision's details via URL and updating the URL as the user interacts with the element. This makes sharing and bookmarking specific views straightforward.async/await
makes the code easier to read and manage compared to traditional Promise chaining.getMultiGraphUrl
. This separates concerns and makes the process of generating the complex URL clearer. It relies on the explore-simple-sk
module's updateShortcut
function, promoting reuse of existing shortcut generation logic.jsonOrThrow
is used to simplify error handling for fetch requests. The viewMultiGraph
method also includes basic error handling if the URL generation fails.
Workflow for Displaying Revision Information:
User Interaction / URL Change | v [revision-info-sk] stateReflector updates internal 'state.revisionId' | v [revision-info-sk] getRevisionInfo() called | +--------------------------------+ | | v v [revision-info-sk] shows spinner [revision-info-sk] makes fetch request to `/_/revision/?rev=<ID>` | | | v | [Backend] processes request, returns RevisionInfo[] | | | v +------------------> [revision-info-sk] receives JSON response, parses with jsonOrThrow | v [revision-info-sk] stores 'revisionInfos', hides spinner | v [revision-info-sk] re-renders using Lit-html templates to display table
Workflow for Viewing Multi-Graph:
User selects one or more revision info rows (checkboxes) | v [revision-info-sk] updateMultiGraphStatus() enables "View Selected Graph(s)" button | v User clicks "View Selected Graph(s)" button | v [revision-info-sk] viewMultiGraph() called | v [revision-info-sk] collects selected RevisionInfo objects | v [revision-info-sk] calls getMultiGraphUrl(selectedRevisions) | +------------------------------------------------------+ | | v v [getMultiGraphUrl] calls getGraphConfigs() to create GraphConfig[] [getMultiGraphUrl] calls updateShortcut(GraphConfig[]) | | (makes POST to /_/shortcut/update) | v | [Backend] returns shortcut ID | | +-------------------------------------> [getMultiGraphUrl] constructs final URL (with begin, end, shortcut, anomaly IDs) | v [viewMultiGraph] receives the multi-graph URL | v [Browser] navigates to the generated multi-graph URL
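The URL construction step in this workflow can be sketched as follows. The field names on RevisionInfo and GraphConfig, and the multi-graph page path, are assumptions made for the example only; the real logic lives in getGraphConfigs and getMultiGraphUrl and relies on explore-simple-sk's updateShortcut.

```
// Illustrative sketch of the multi-graph URL construction described above.
interface RevisionInfoLite {            // assumed, trimmed shape
  query: string;
  anomaly_ids: string[];
}
interface GraphConfigLite { queries: string[]; }

// One graph configuration per selected revision range, keyed by its trace query.
function getGraphConfigsSketch(revisions: RevisionInfoLite[]): GraphConfigLite[] {
  return revisions.map((r) => ({ queries: [r.query] }));
}

function buildMultiGraphUrlSketch(
  revisions: RevisionInfoLite[],
  shortcutId: string,   // id returned after POSTing the configs to /_/shortcut/update
  begin: number,        // earliest timestamp covering all selected anomalies
  end: number           // latest timestamp covering all selected anomalies
): string {
  const params = new URLSearchParams({
    begin: String(begin),
    end: String(end),
    shortcut: shortcutId,
    totalGraphs: String(revisions.length),
    highlight_anomalies: revisions.flatMap((r) => r.anomaly_ids).join(','),
  });
  return `/m/?${params.toString()}`; // the multi-graph page path is an assumption
}
```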
The split-chart-menu-sk
module provides a user interface element for selecting an attribute by which to split a chart. This is particularly useful in data visualization scenarios where users need to break down aggregated data into smaller, more specific views. For example, in a performance monitoring dashboard, a user might want to see performance metrics split by benchmark, specific test case (story), or sub-component (subtest).
The core functionality revolves around presenting a list of available attributes to the user in a dropdown menu. These attributes are dynamically derived from the underlying data. When an attribute is selected, the component emits an event, allowing other parts of the application to react and update the chart display accordingly.
Key Components and Design:
split-chart-menu-sk.ts
: This is the main TypeScript file that defines the SplitChartMenuSk
LitElement.
context
API (@consume
) to access data from two sources: dataframeContext
and dataTableContext
.dataframeContext
provides the DataFrame
(from //perf/modules/json:index_ts_lib
and //perf/modules/dataframe:dataframe_context_ts_lib
). The DataFrame
is the source from which the list of available attributes for splitting is derived. This design decouples the menu from the specifics of data fetching and management, allowing it to focus solely on the UI aspect of attribute selection. The getAttributes
function (from //perf/modules/dataframe:traceset_ts_lib
) is used to extract these attributes.dataTableContext
provides DataTable
(also from //perf/modules/dataframe:dataframe_context_ts_lib
). While consumed, its direct usage within this specific component's rendering logic isn't immediately apparent in the provided render
method, but it might be used by other parts of the application or for future enhancements.<md-outlined-button>
) labeled “Split By” serves as the trigger to open the menu.<md-menu>
), which is populated with <md-menu-item>
elements, one for each attribute retrieved from the DataFrame
.menuOpen
state property controls the visibility of the menu. Clicking the button toggles this state. The menu also closes itself via the @closed
event.bubbleAttribute
method is called. This method dispatches a custom event named split-chart-selection
.SplitChartSelectionEventDetails
) contains the selected attribute
(a string).bubbles: true
) and pass through shadow DOM boundaries (composed: true
), making it easy for ancestor elements to listen and react to the selection. This event-driven approach is crucial for decoupling the menu from the chart component or any other component that needs to know about the selected split attribute.split-chart-menu-sk.css.ts
(style
). This keeps the component's presentation concerns separate from its logic. The styles ensure the component is displayed as an inline block and sets a default background color, also styling the Material button.split-chart-menu-sk.css.ts
: This file defines the CSS styles for the component using Lit's css
tagged template literal. The primary styling focuses on the host element’s positioning and background, and customizing the Material Design button's border radius.
index.ts
: This file simply imports and registers the split-chart-menu-sk
custom element, making it available for use in HTML.
Workflow: Selecting a Split Attribute
Initialization:
split-chart-menu-sk
component is rendered.DataFrame
from the dataframeContext
.getAttributes()
method is called (implicitly via the render method's map function) to populate the list of attributes for the menu.User Interaction:
menuClicked
handler is invoked -> this.menuOpen
becomes true
.<md-menu>
component becomes visible, displaying the list of attributes.User split-chart-menu-sk DataFrame | | | |---Clicks "Split By"->| | | |---Toggles menuOpen=true-->| | | | | |<--Displays Menu-------| | | |
Attribute Selection:
- User clicks on an attribute in the menu (e.g., "benchmark").
- The `click` handler on `<md-menu-item>` calls `this.bubbleAttribute("benchmark")`.
- `bubbleAttribute` creates a `CustomEvent('split-chart-selection', { detail: { attribute: "benchmark" } })`.
- The event is dispatched.
``` User split-chart-menu-sk (Parent Component) | | | |---Clicks "benchmark"->| | | |---Calls bubbleAttribute("benchmark")-->| | | | | |---Dispatches "split-chart-selection" event--> (Listens for event) | | | | | | | |---Handles event, updates chart ```
Menu Closes:
<md-menu>
component emits a closed
event.menuClosed
handler is invoked -> this.menuOpen
becomes false
.
This design ensures that split-chart-menu-sk
is a self-contained, reusable UI component whose sole responsibility is to provide a way to select a splitting attribute and communicate that selection to the rest of the application via a well-defined event. The use of context for data consumption and custom events for output makes it highly decoupled and easy to integrate.
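A minimal sketch of that event contract is shown below. The event name and detail shape follow the description above; the dispatch helper and listener wiring are illustrative rather than the component's actual source.

```
// Sketch of dispatching and consuming the 'split-chart-selection' event.
interface SplitChartSelectionEventDetails { attribute: string; }

function bubbleAttributeSketch(host: HTMLElement, attribute: string): void {
  host.dispatchEvent(
    new CustomEvent<SplitChartSelectionEventDetails>('split-chart-selection', {
      detail: { attribute },
      bubbles: true,   // let ancestor elements hear it
      composed: true,  // cross shadow DOM boundaries
    })
  );
}

// An ancestor (e.g. the chart container) reacts to the selection:
document.addEventListener('split-chart-selection', (e: Event) => {
  const { attribute } = (e as CustomEvent<SplitChartSelectionEventDetails>).detail;
  console.log(`re-split the chart by ${attribute}`);
});
```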
The demo page (split-chart-menu-sk-demo.html
and split-chart-menu-sk-demo.ts
) demonstrates how to use the component and listen for the split-chart-selection
event. The Puppeteer test (split-chart-menu-sk_puppeteer_test.ts
) provides a basic smoke test and a visual regression test by taking a screenshot.
The subscription-table-sk
module provides a custom HTML element designed to display information about a “subscription” and its associated “alerts”. This is particularly useful in contexts where users need to understand the configuration of automated monitoring or alerting systems.
The core functionality is encapsulated within the subscription-table-sk.ts
file, which defines the SubscriptionTableSk
custom element. This element is built using Lit, a library for creating fast, lightweight web components.
Why and How:
The primary goal is to present complex subscription and alert data in a user-friendly and interactive manner. Instead of a static display, this component allows for toggling the visibility of the detailed alert configurations. This design choice avoids overwhelming the user with too much information upfront, providing a cleaner initial view focused on the subscription summary.
The SubscriptionTableSk
element takes Subscription
and Alert[]
objects as input. The Subscription
object contains general information like name, contact email, revision, bug tracking details (component, hotlists, priority, severity, CC emails). The Alert[]
array holds detailed configurations for individual alerts, including their query parameters, step algorithm, radius, and other specific settings.
Key Responsibilities and Components:
subscription-table-sk.ts
:SubscriptionTableSk
class: This is the heart of the module. It extends ElementSk
, a base class for Skia custom elements.subscription
and alerts
data internally.template
static method): It uses Lit's html
tagged template literal to define the structure and content of the element. It conditionally renders the subscription details and the alerts table based on the available data and the showAlerts
state.showAlerts
is true. This state is toggled by a button.load(subscription: Subscription, alerts: Alert[])
method: This public method is the primary way to feed data into the component. It updates the internal state and triggers a re-render.toggleAlerts()
method: This method flips the showAlerts
boolean flag and triggers a re-render, effectively showing or hiding the alerts table.formatRevision(revision: string)
method: A helper function to display the revision string as a clickable link, pointing to a specific configuration file URL. This improves usability by allowing users to quickly navigate to the source of the configuration.paramset-sk
integration: For displaying the alert query
, it utilizes the paramset-sk
element. The toParamSet
utility function (from infra-sk/modules/query
) is used to convert the query string into a format suitable for paramset-sk
, which then renders it as a structured set of key-value pairs. This enhances readability of complex query strings.subscription-table-sk.scss
): This file defines the visual appearance of the element. It uses SCSS and imports styles from shared libraries (themes_sass_lib
, buttons_sass_lib
, select_sass_lib
) to maintain a consistent look and feel with other Skia elements. The styles focus on clear presentation of information, with distinct sections for subscription details and the alerts table.Workflow: Displaying Subscription and Alerts
subscription-table-sk
is added to the DOM. <subscription-table-sk></subscription-table-sk>
load()
method on the element instance, passing in the Subscription
object and an array of Alert
objects. element.load(mySubscriptionData, myAlertsData);
SubscriptionTableSk
element updates its internal subscription
and alerts
properties.showAlerts
is set to false
by default upon loading new data._render()
method is called (implicitly by Lit or explicitly).template
function generates the HTML:click
event triggers the toggleAlerts()
method.
- showAlerts becomes true.
- _render() is called again.
- The template function now also renders the <table id="alerts-table">.
- The table header is displayed.
- For each Alert object in ele.alerts:
  - A table row (<tr>) is created.
  - Cells (<td>) display various alert properties (step algorithm, radius, k, etc.).
  - The alert query is passed to a <paramset-sk> element for structured display.
- The button label changes to "Hide Alert Configurations".
Diagram: Data Flow and Rendering
External Code ---> subscriptionTableSkElement.load(subscription, alerts) | V SubscriptionTableSk Internal State: - this.subscription = subscription - this.alerts = alerts - this.showAlerts = false (initially or after load) | V _render() ------> Lit Template Evaluation | ------------------------------------- | | V (if this.subscription is not null) V (if this.showAlerts is true) Render Subscription Details Render Alerts Table - Name, Email, Revision (formatted link) - Iterate through this.alerts - Bug info, Hotlists, CCs - For each alert: - "Show/Hide Alerts" Button - Display properties in <td> - Use <paramset-sk> for query
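A small usage sketch of this flow is given below. The Subscription and Alert shapes are trimmed stand-ins for the real types, and the field values are made up for illustration; only the load() and toggleAlerts() calls mirror the API described above.

```
// Illustrative usage of subscription-table-sk's load()/toggleAlerts() API.
interface SubscriptionLite { name: string; contact_email: string; revision: string; } // stand-in
interface AlertLite { query: string; radius: number; }                                 // stand-in

const table = document.querySelector('subscription-table-sk')! as HTMLElement & {
  load(sub: SubscriptionLite, alerts: AlertLite[]): void;
  toggleAlerts(): void;
};

table.load(
  { name: 'Example Subscription', contact_email: 'perf-team@example.com', revision: 'abc123' },
  [{ query: 'benchmark=memory&bot=linux-perf', radius: 7 }]
);

// Programmatically expand the alert configurations (same effect as clicking the button).
table.toggleAlerts();
```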
Demo Page (subscription-table-sk-demo.html
, subscription-table-sk-demo.ts
)
The demo page serves as an example and testing ground.
subscription-table-sk-demo.html
: Sets up the basic HTML structure, including instances of subscription-table-sk
(one for light mode, one for dark mode to test theming) and buttons to interact with them. It also includes an error-toast-sk
for displaying potential errors.subscription-table-sk-demo.ts
: Contains JavaScript to:subscription-table-sk
element.Subscription
and Alert
data.load()
method on the subscription-table-sk
instances with the sample data.toggleAlerts()
method on the instances.
This setup allows developers to see the component in action and verify its functionality with predefined data.
The test-picker-sk
module provides a custom HTML element, <test-picker-sk>
, designed to guide users in selecting a valid trace or test for plotting. It achieves this by presenting a series of dependent input fields, where the options available in each field are dynamically updated based on selections made in previous fields. This ensures that users can only construct valid combinations of parameters.
Core Functionality and Design:
The primary goal of test-picker-sk
is to simplify the process of selecting a specific data series (a “trace” or “test”) from a potentially large and complex dataset. This is often necessary in performance analysis tools where data is categorized by multiple parameters (e.g., benchmark, bot, specific test, sub-test variations).
The design enforces a specific order for filling out these parameters. This hierarchical approach is crucial because the valid options for a parameter often depend on the values chosen for its preceding parameters.
Key Components and Responsibilities:
test-picker-sk.ts
: This is the heart of the module, defining the TestPickerSk
custom element.
FieldInfo
class: This internal class is a simple data structure used to manage the state of each individual input field within the picker. It stores a reference to the PickerFieldSk
element, the parameter name (e.g., “benchmark”, “bot”), and the currently selected value.addChildField
): When a value is selected in a field, and if there are more parameters in the hierarchy, a new PickerFieldSk
input is dynamically added to the UI. The options for this new field are fetched from the backend. This progressive disclosure prevents overwhelming the user with too many options at once.callNextParamList
): The element interacts with a backend endpoint (/_/nextParamList/
). This endpoint is responsible for:_fieldData
, _currentIndex
): The _fieldData
array holds FieldInfo
objects for each parameter field. _currentIndex
tracks which field is currently active or the next to be added.value-changed
, plot-button-clicked
):value-changed
events from its child picker-field-sk
elements. When a value changes, it triggers logic to update subsequent fields and the match count.plot-button-clicked
custom event when the user clicks the “Add Graph” button. This event includes the fully constructed query string representing the selected trace.populateFieldDataFromQuery
): This method allows the picker to be initialized with a pre-existing query string. It will populate the fields sequentially based on the query parameters. If a parameter in the hierarchy is missing or empty in the query, the population stops at that point.onPlotButtonClick
, PLOT_MAXIMUM
): The “Add Graph” button is enabled only when the number of matching traces is within a manageable range (greater than 0 and less than or equal to PLOT_MAXIMUM
). This prevents users from attempting to plot an overwhelming number of traces.picker-field-sk
(Dependency): While not part of this module, test-picker-sk
heavily relies on the picker-field-sk
element. Each parameter in the test picker is represented by an instance of picker-field-sk
. This child component is responsible for displaying a label, an input field, and a dropdown menu of selectable options.
test-picker-sk.scss
: Defines the visual styling for the test-picker-sk
element and its internal components, ensuring a consistent look and feel. It styles the layout of the fields, the match count display, and the plot button.
Workflow: User Selecting a Test
Initialization (initializeTestPicker
):
test-picker-sk
is given an ordered list of parameter names (e.g., ['benchmark', 'bot', 'test']
) and optional default parameters.test-picker-sk
-> Backend (/_/nextParamList/
): Requests options for the first parameter (e.g., “benchmark”) with an empty query.User Interface: Backend: [test-picker-sk] | initializeTestPicker(['benchmark', 'bot', 'test'], {}) | ---> POST /_/nextParamList/ (q="") | (Processes request, queries data source) | <--- {paramset: {benchmark: ["b1", "b2"]}, count: 100} | (Renders first PickerFieldSk for "benchmark" with options "b1", "b2") [Benchmark: [select ▼]] [Matches: 100] [Add Graph (disabled)]
User Selects a Value:
picker-field-sk
for “benchmark” emits a value-changed
event.test-picker-sk
-> Backend: Requests options for the next parameter (“bot”), now including the selection benchmark=b1
in the query.User Interface: [Benchmark: [b1 ▼]] | (value-changed: {value: "b1"}) [test-picker-sk] | ---> POST /_/nextParamList/ (q="benchmark=b1") | (Processes request, filters based on benchmark=b1) | <--- {paramset: {bot: ["botX", "botY"]}, count: 20} | (Renders PickerFieldSk for "bot" with options "botX", "botY") [Benchmark: [b1 ▼]] [Bot: [select ▼]] [Matches: 20] [Add Graph (disabled)]
Process Repeats: This continues for each parameter in the hierarchy.
Final Selection and Plotting:
PLOT_MAXIMUM
, the “Add Graph” button enables.test-picker-sk
emits plot-button-clicked
with the final query (e.g., benchmark=b1&bot=botX&test=testZ
).User Interface: [Benchmark: [b1 ▼]] [Bot: [botX ▼]] [Test: [testZ ▼]] [Matches: 5] [Add Graph (enabled)] | (User clicks "Add Graph") [test-picker-sk] | emits 'plot-button-clicked' (detail: {query: "benchmark=b1&bot=botX&test=testZ"})
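A page embedding the picker can be wired up roughly as below. The plot-button-clicked event name and its query detail follow the description above; the exact initializeTestPicker signature is an assumption for this example.

```
// Sketch of a host page consuming test-picker-sk.
const picker = document.querySelector('test-picker-sk')! as HTMLElement & {
  // Assumed signature: ordered parameter names plus optional default parameters.
  initializeTestPicker(params: string[], defaults: Record<string, string>): void;
};

picker.initializeTestPicker(['benchmark', 'bot', 'test'], {});

picker.addEventListener('plot-button-clicked', (e: Event) => {
  const { query } = (e as CustomEvent<{ query: string }>).detail;
  // e.g. query === 'benchmark=b1&bot=botX&test=testZ'
  console.log(`add a graph for ${query}`);
});
```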
Why this Approach?
The test-picker-sk-demo.html
and test-picker-sk-demo.ts
files provide a runnable example of the component, mocking the backend /_/nextParamList/
endpoint to showcase its functionality without needing a live backend. This is essential for development and testing. The Puppeteer and Karma tests (test-picker-sk_puppeteer_test.ts
, test-picker-sk_test.ts
) ensure the component behaves as expected under various conditions.
The /modules/themes
module is responsible for defining the visual styling and theming for the application. It builds upon the base theming provided by infra-sk
and introduces application-specific overrides and additions.
Why and How:
The primary goal of this module is to establish a consistent and branded look and feel across the application. Instead of defining all styles from scratch, it leverages the infra-sk
theming library as a foundation. This promotes code reuse and ensures that common UI elements have a familiar appearance.
The approach taken is to:
themes.scss
file begins by importing the core styles from ../../../infra-sk/themes
. This brings in the foundational design system, including color palettes, typography, spacing, and component styles.https://fonts.googleapis.com/icon?family=Material+Icons
). This makes a wide range of standard icons readily available for use within the application's UI.infra-sk
theme and global changes from elements-sk
components. This means that themes.scss
focuses on styling aspects that are unique to this specific application or require modifications to the default infra-sk
appearance.Key Components and Files:
themes.scss
: This is the central SCSS (Sassy CSS) file for the module.
@import '../../../infra-sk/themes';
: This line incorporates the foundational theme from the infra-sk
library. The relative path indicates that infra-sk
is expected to be a sibling or ancestor directory in the project structure.@import url('https://fonts.googleapis.com/icon?family=Material+Icons');
: This directive pulls in the Material Icons font stylesheet, enabling the use of standard Google Material Design icons throughout the application.body { margin: 0; padding: 0; }
: This is an example of a global override. It resets the default browser margins and padding on the <body>
element, providing a cleaner baseline for layout. This is a common practice to ensure consistent spacing across different browsers. Other application-specific styles would follow this pattern, targeting specific elements or defining new CSS classes.BUILD.bazel
: This file defines how the themes.scss
file is processed and made available to the rest of the application.
sass_library
rule (defined in //infra-sk:index.bzl
) to compile the SCSS into CSS and declare it as a reusable library.load("//infra-sk:index.bzl", "sass_library")
: Imports the necessary Bazel rule for handling SASS compilation.sass_library(name = "themes_sass_lib", ...)
: Defines a SASS library target named themes_sass_lib
.srcs = ["themes.scss"]
: Specifies that themes.scss
is the source file for this library.visibility = ["//visibility:public"]
: Makes this compiled CSS library accessible to any other part of the project.deps = ["//infra-sk:themes_sass_lib"]
: Declares a dependency on the infra-sk
SASS library. This is crucial because themes.scss
imports styles from infra-sk
. The build system needs to know about this dependency to ensure infra-sk
styles are available during the compilation of themes.scss
.Workflow (Styling Application):
Browser Request --> HTML Document | v Link to Compiled CSS (from themes_sass_lib) | v Application of Styles: 1. Base browser styles 2. infra-sk/themes.scss styles (imported) 3. Material Icons styles (imported) 4. modules/themes/themes.scss overrides & additions (applied last, taking precedence) | v Rendered Page with Application-Specific Theme
In essence, this module provides a layered approach to theming. It starts with a robust base, incorporates external resources like icon fonts, and then applies specific customizations to achieve the desired visual identity for the application. The BUILD.bazel
file ensures that these SASS files are correctly processed and made available as CSS to the application during the build process.
This module provides a mechanism for formatting trace details and converting trace strings into query strings. The core idea is to offer a flexible way to represent and interpret trace information, accommodating different formatting conventions, particularly for Chrome-specific trace structures.
The “why” behind this module stems from the need to handle various trace formats. Different systems or parts of the application might represent trace identifiers (which are essentially a collection of parameters) in distinct ways. This module centralizes the logic for translating between these representations. For example, a compact string representation of a trace might be used in URLs or displays, while a more structured ParamSet
is needed for querying data.
The “how” is achieved through an interface TraceFormatter
and concrete implementations. This allows for different formatting strategies to be plugged in as needed. The GetTraceFormatter()
function acts as a factory, returning the appropriate formatter based on the application's configuration (window.perf.trace_format
).
Key Components/Files:
traceformatter.ts
: This is the central file containing the core logic.
TraceFormatter
interface: Defines the contract for all trace formatters. It mandates two primary methods:formatTrace(params: Params): string
: Takes a Params
object (a key-value map representing trace parameters) and returns a string representation of the trace. This is useful for displaying trace identifiers in a user-friendly or system-specific format.formatQuery(trace: string): string
: Takes a string representation of a trace and converts it into a query string (e.g., “key1=value1&key2=value2”). This is crucial for constructing API requests to fetch data related to a specific trace.DefaultTraceFormatter
class: Provides a basic implementation of TraceFormatter
.formatTrace
method generates a string like “Trace ID: ,key1=value1,key2=value2,...”. This is a generic way to represent the trace parameters.formatQuery
method currently returns an empty string, indicating that this default formatter doesn't have a specific logic for converting its trace string representation back into a query.ChromeTraceFormatter
class: Implements TraceFormatter
specifically for traces originating from Chrome's performance infrastructure.ChromeTraceFormatter
? Chrome's performance data often uses a hierarchical, slash-separated string to identify traces (e.g., master/bot/benchmark/test/subtest_1
). This formatter handles this specific convention.keys
array: This private property (['master', 'bot', 'benchmark', 'test', 'subtest_1', 'subtest_2', 'subtest_3']
) defines the expected order of parameters in the Chrome-style trace string. This order is significant for both formatting and parsing.formatTrace(params: Params): string
: It iterates through the predefined keys
and constructs a slash-separated string from the corresponding values in the input params
. Input Params: { master: "m", bot: "b", benchmark: "bm", test: "t" } keys: [ "master", "bot", "benchmark", "test", ... ] Output String: "m/b/bm/t"
formatQuery(trace: string): string
: This is the inverse operation. It takes a slash-separated trace string, splits it, and maps the parts back to the predefined keys
to build a ParamSet
. It then converts this ParamSet
into a standard URL query string. - Handling Statistics (Ad-hoc logic for Chromeperf/Skia bridge): A special piece of logic exists within formatQuery
related to window.perf.enable_skia_bridge_aggregation
. If a trace's ‘test’ value ends with a known statistic suffix (e.g., _avg
, _count
), this suffix is used to determine the stat
parameter in the output query, and the suffix is removed from the ‘test’ parameter. If no such suffix is found, a default stat
value of ‘value’ is added. This logic is a temporary measure to bridge formatting differences between Chromeperf and Skia systems and is intended to be removed once Chromeperf is deprecated. Input Trace String (enable_skia_bridge_aggregation = true): "master/bot/benchmark/test_name_max/subtest" Splits into: ["master", "bot", "benchmark", "test_name_max", "subtest"] Processed ParamSet: { master: ["master"], bot: ["bot"], benchmark: ["benchmark"], test: ["test_name"], stat: ["max"], subtest_1: ["subtest"] } Output Query: "master=master&bot=bot&benchmark=benchmark&test=test_name&stat=max&subtest_1=subtest"
STATISTIC_SUFFIX_TO_VALUE_MAP
: A map used by ChromeTraceFormatter
to translate common statistic suffixes (like “avg”, “count”) found in test names to their corresponding “stat” parameter values (like “value”, “count”).traceFormatterRecords
: A record (object map) that associates TraceFormat
enum values (like ''
for default, 'chrome'
for Chrome-specific) with their corresponding TraceFormatter
instances. This acts as a registry for available formatters.GetTraceFormatter()
function: This is the public entry point for obtaining a trace formatter. It reads window.perf.trace_format
(a global configuration setting) and returns the appropriate formatter instance from traceFormatterRecords
. If the format is not found, it defaults to DefaultTraceFormatter
.Global Config: window.perf.trace_format = "chrome" | v GetTraceFormatter() | v traceFormatterRecords["chrome"] | v Returns new ChromeTraceFormatter() instance
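The Chrome-style conversion can be summarized with a simplified re-implementation, shown below for illustration only; it omits the skia-bridge statistic-suffix handling, and the real logic lives in traceformatter.ts.

```
// Simplified sketch of the Chrome-style trace formatting described above.
type ParamsLite = Record<string, string>;

const CHROME_KEYS = ['master', 'bot', 'benchmark', 'test', 'subtest_1', 'subtest_2', 'subtest_3'];

// params -> "master/bot/benchmark/test/..."
function formatTraceSketch(params: ParamsLite): string {
  return CHROME_KEYS.filter((k) => params[k] !== undefined)
    .map((k) => params[k])
    .join('/');
}

// "master/bot/benchmark/test" -> "master=...&bot=...&benchmark=...&test=..."
function formatQuerySketch(trace: string): string {
  const out = new URLSearchParams();
  trace.split('/').forEach((value, i) => {
    if (i < CHROME_KEYS.length && value !== '') out.set(CHROME_KEYS[i], value);
  });
  return out.toString();
}

// formatTraceSketch({ master: 'm', bot: 'b', benchmark: 'bm', test: 't' }) === 'm/b/bm/t'
// formatQuerySketch('m/b/bm/t') === 'master=m&bot=b&benchmark=bm&test=t'
```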
traceformatter_test.ts
: Contains unit tests for the ChromeTraceFormatter
, specifically focusing on the formatQuery
method and its logic for handling statistic suffixes under different configurations of window.perf.enable_skia_bridge_aggregation
.
This module depends on:
infra-sk/modules:query_ts_lib
: For the fromParamSet
function, used to convert a ParamSet
object into a URL query string.perf/modules/json:index_ts_lib
: For type definitions like Params
, ParamSet
, and TraceFormat
.perf/modules/paramtools:index_ts_lib
: For the makeKey
function, used by DefaultTraceFormatter
to create a string representation of a Params
object.perf/modules/window:window_ts_lib
: To access global configuration values like window.perf.trace_format
and window.perf.enable_skia_bridge_aggregation
.
The triage-menu-sk
module provides a user interface element for managing and triaging anomalies in bulk. It's designed to streamline the process of handling multiple performance regressions or improvements detected in data.
The core purpose of this module is to allow users to efficiently take action on a set of selected anomalies. Instead of interacting with each anomaly individually, this menu provides centralized controls for common triage operations. This is crucial for workflows where many anomalies might be identified simultaneously, requiring a quick and consistent way to categorize or address them.
Key responsibilities and components:
triage-menu-sk.ts
: This is the heart of the module, defining the TriageMenuSk
custom element.
Anomaly
objects and associated trace_names
. This allows it to operate on multiple anomalies at once.new-bug-dialog-sk
element, allowing the user to create a new bug report associated with the selected anomalies.existing-bug-dialog-sk
element, enabling the user to link the selected anomalies to an already existing bug.NudgeEntry
class and related logic (generateNudgeButtons
, nudgeAnomaly
, makeNudgeRequest
) allow users to adjust the perceived start and end points of an anomaly. This is a subtle but important feature for refining the automated anomaly detection. The UI presents a set of buttons (e.g., -2, -1, 0, +1, +2) that shift the anomaly's boundaries._allowNudge
flag controls whether the nudge buttons are visible, allowing for contexts where nudging might not be appropriate (e.g., when multiple, disparate anomalies are selected)._anomalies
, _trace_names
) and the nudge options (_nudgeList
).makeEditAnomalyRequest
and makeNudgeRequest
methods handle sending HTTP POST requests to the /_/triage/edit_anomalies
endpoint. This endpoint is responsible for persisting the triage decisions (bug associations, ignore status, nudge adjustments) in the backend database.editAction
parameter in makeEditAnomalyRequest
can take values like IGNORE
, RESET
(to de-associate bugs), or implicitly associate with a bug ID when called from the bug dialogs.anomaly-changed
custom event. This event signals to parent components (likely a component displaying a list or plot of anomalies) that one or more anomalies have been modified and their representation needs to be updated. The event detail includes the affected traceNames
, the editAction
performed, and the updated anomalies
.Integration with Dialogs:
new-bug-dialog-sk
and existing-bug-dialog-sk
. When the user clicks “New Bug” or “Existing Bug”, this element calls the respective open()
methods on these dialog components.setAnomalies
methods, so the dialogs know which anomalies the bug report will be associated with.triage-menu-sk.html
(Implicit via Lit template in .ts
): Defines the visual structure of the menu, including the layout of the action buttons and the nudge buttons. The rendering is dynamic based on the number of selected anomalies and whether nudging is allowed.
triage-menu-sk.scss
: Provides the styling for the menu, ensuring it integrates visually with the surrounding application.
Key Workflow Example (Ignoring Anomalies):
triage-menu-sk
Receives Data: The parent component calls triageMenuSkElement.setAnomalies(selectedAnomalies, correspondingTraceNames, nudgeOptions)
.triage-menu-sk
re-renders, enabling the “Ignore” button (and potentially others). User Action (Selects Anomalies) --> Parent Component | v triage-menu-sk.setAnomalies() | v UI Renders (Buttons enabled)
User Click ("Ignore") --> triage-menu-sk.ignoreAnomaly() | v makeEditAnomalyRequest(anomalies, traces, "IGNORE") | v POST /_/triage/edit_anomalies | (Backend processes) v HTTP 200 OK | v Dispatch "anomaly-changed" event
makeEditAnomalyRequest
is called. It constructs a JSON payload with the anomaly keys, trace names, and the action “IGNORE”. This payload is sent to /_/triage/edit_anomalies
.triage-menu-sk
updates the local state of the anomalies (setting bug_id
to -2 for ignored anomalies) and dispatches the anomaly-changed
event.anomaly-changed
and updates its display to reflect that the anomalies are now ignored (e.g., by changing their color, removing them from an active list).
The design decision to have triage-menu-sk
orchestrate calls to the backend and then emit a generic anomaly-changed
event decouples it from the specifics of how anomalies are displayed. Parent components only need to know that anomalies have changed and can react accordingly. The use of dedicated dialog components (new-bug-dialog-sk
, existing-bug-dialog-sk
) encapsulates the complexity of bug reporting, keeping the triage menu itself focused on initiating these actions.
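The bulk-edit request and the follow-up notification can be sketched as below. The endpoint, the editAction values, and the anomaly-changed event come from the description above; the exact payload field names are assumptions made for the example.

```
// Hedged sketch of the edit request and change notification described above.
type EditAction = 'IGNORE' | 'RESET';

async function editAnomaliesSketch(
  anomalyKeys: string[],
  traceNames: string[],
  action: EditAction
): Promise<void> {
  const resp = await fetch('/_/triage/edit_anomalies', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Field names here are assumed, not the exact backend contract.
    body: JSON.stringify({ keys: anomalyKeys, trace_names: traceNames, action }),
  });
  if (!resp.ok) throw new Error(`edit_anomalies failed: ${resp.status}`);
}

// After a successful edit, notify ancestors so plots/tables can refresh.
function notifyAnomalyChanged(host: HTMLElement, traceNames: string[], action: EditAction): void {
  host.dispatchEvent(
    new CustomEvent('anomaly-changed', {
      detail: { traceNames, editAction: action },
      bubbles: true,
    })
  );
}
```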
(triage-page-sk)
The triage-page-sk
module provides the user interface for viewing and triaging regressions in performance data. It allows users to filter regressions based on time range, commit status (all, regressions, untriaged), and alert configurations. The primary goal is to present a clear overview of regressions and facilitate the process of identifying their cause and impact.
The module is responsible for:
/_/reg/
) to retrieve regression information for a specified time range and filter criteria. This data is then rendered in a tabular format, showing commits along with any associated regressions.stateReflector
utility from infra-sk/modules/statereflector
is used for this purpose.<dialog>
) containing the cluster-summary2-sk
element is displayed. This dialog allows the user to view details of the regression and assign a triage status (e.g., “positive”, “negative”, “acknowledged”)./_/triage/
) to persist the decision.triage-status-sk
element, which shows its current triage state.triage-page-sk.ts
: This is the core TypeScript file defining the TriagePageSk
custom element.State
interface to manage the component's configuration (begin/end timestamps, subset filter, alert filter).connectedCallback
initializes the stateReflector
to synchronize the component's state with the URL.updateRange()
is a crucial method that fetches regression data from the /_/reg/
endpoint whenever the state changes (e.g., date range or filter selection). It uses the fetch
API for network requests.template
function (using lit/html
) defines the HTML structure of the component, including the filter controls, the main table displaying regressions, and the triage dialog.commitsChange
, filterChange
, rangeChange
, triage_start
, and triaged
manage user input and interactions with child components.triage_start
method is triggered when a user wants to triage a specific regression. It prepares the data for the cluster-summary2-sk
element and displays the triage dialog.triaged
method is called when the user submits a triage decision from the cluster-summary2-sk
dialog. It sends a POST request to /_/triage/
with the triage information.stepUpAt
, stepDownAt
, alertAt
, etc., are used to determine how to render cells in the regression table based on the data received.calc_all_filter_options
dynamically generates the list of available alert filters based on categories returned from the backend.triage-page-sk.scss
: Contains the SASS/CSS styles for the triage-page-sk
element.triage-page-sk-demo.html
/ triage-page-sk-demo.ts
: Provide a demonstration page for the triage-page-sk
element.<triage-page-sk>
. The TypeScript file simply imports the main component to register it.1. Initial Page Load and Data Fetch:
User navigates to page / URL with state parameters | V triage-page-sk.connectedCallback() | V stateReflector initializes state from URL (or defaults) | V triage-page-sk.updateRange() | V FETCH /_/reg/ with current state (begin, end, subset, alert_filter) | V Backend responds with RegressionRangeResponse (header, table, categories) | V triage-page-sk.reg is updated | V triage-page-sk.calc_all_filter_options() (if categories present) | V triage-page-sk._render() displays the regression table
2. User Changes Filter or Date Range:
User interacts with <select> (commits/filter) or <day-range-sk> | V Event handler (e.g., commitsChange, filterChange, rangeChange) updates this.state | V this.stateHasChanged() (triggers stateReflector to update URL) | V triage-page-sk.updateRange() | V FETCH /_/reg/ with new state | V Backend responds with updated RegressionRangeResponse | V triage-page-sk.reg is updated | V triage-page-sk._render() re-renders the regression table with new data
3. User Initiates Triage:
User clicks on a regression in the table (within a <triage-status-sk> element) | V <triage-status-sk> emits 'start-triage' event with details (alert, full_summary, cluster_type) | V triage-page-sk.triage_start(event) | V this.dialogState is populated with event.detail | V this._render() (updates the <cluster-summary2-sk> properties within the dialog) | V this.dialog.showModal() (displays the triage dialog)
4. User Submits Triage:
User interacts with <cluster-summary2-sk> in the dialog and clicks "Save" (or similar) | V <cluster-summary2-sk> emits 'triaged' event with details (columnHeader, triage status) | V triage-page-sk.triaged(event) | V Constructs TriageRequest body (cid, triage, alert, cluster_type) | V this.dialog.close() | V this.triageInProgress = true; this._render() (shows spinner) | V FETCH POST /_/triage/ with TriageRequest | V Backend responds (e.g., with a bug link if applicable) | V this.triageInProgress = false; this._render() (hides spinner) | V (Optional) If json.bug exists, window.open(json.bug) | V (Implicit) The <triage-status-sk> for the triaged item may update its display, or a full data refresh might be triggered if necessary to show the updated status.
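Step 4 can be illustrated with a short sketch of the triage submission. The TriageRequest fields mirror the description (cid, triage, alert, cluster_type) and the optional bug link handling; treat the exact shapes as assumptions for this example rather than the real types.

```
// Sketch of submitting a triage decision to /_/triage/.
interface TriageRequestSketch {
  cid: unknown;                                   // commit id from the 'triaged' event
  triage: { status: string; message: string };    // the chosen triage status
  alert: unknown;                                 // the Alert config being triaged against
  cluster_type: string;                           // 'high' | 'low'
}

async function submitTriageSketch(req: TriageRequestSketch): Promise<void> {
  const resp = await fetch('/_/triage/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  const json = await resp.json();
  // The backend may hand back a bug link to open for the user.
  if (json.bug) window.open(json.bug);
}
```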
The page is built from several reusable custom elements (triage-page-sk, commit-detail-sk, day-range-sk, triage-status-sk, cluster-summary2-sk). This promotes modularity, reusability, and separation of concerns. Each component handles a specific piece of functionality. Asynchronous data fetching relies on fetch
and Promises. Spinners (spinner-sk
) are used to provide visual feedback to the user during these operations. A modal dialog (<dialog>
) is used for the triage process. This provides a focused interface for the user to review cluster details and make a triage decision without cluttering the main regression table.The triage-page-sk
serves as the central hub for users to actively engage with and manage performance regressions, making it a critical component in the performance monitoring workflow.
The triage-status-sk
module provides a custom HTML element designed to visually represent and interact with the triage status of a “cluster” within the Perf application. A cluster, in this context, likely refers to a group of related performance measurements or anomalies that require user attention and classification (triaging).
Core Functionality & Design:
The primary purpose of this element is to offer a concise and interactive way for users to understand the current triage state of a cluster and to initiate the triaging process.
Visual Indication: The element displays a button. The appearance of this button (specifically, an icon within it) changes based on the cluster's triage status: “positive,” “negative,” or “untriaged.” This provides an immediate visual cue to the user.
tricon2-sk
element to display the appropriate icon based on the triage.status
property. The styling for these states is defined in triage-status-sk.scss
, ensuring visual consistency with the application's theme (including dark mode).Initiating Triage: Clicking the button does not directly change the triage status within this element. Instead, it emits a custom event named start-triage
.
triage-status-sk
element focused and reusable. The actual triaging process likely involves a dialog or a more complex UI, which is beyond the scope of this simple button._start_triage
method is invoked on button click. This method constructs a detail
object containing all relevant information about the cluster (full_summary
, current triage
status, alert
configuration, cluster_type
, and a reference to the element itself) and dispatches the start-triage
CustomEvent
.Key Components & Files:
triage-status-sk.ts
: This is the heart of the module, defining the TriageStatusSk
custom element class which extends ElementSk
.triage
: An object of type TriageStatus
(defined in perf/modules/json
) holding the status
(‘positive’, ‘negative’, ‘untriaged’) and a message
string. This is the primary driver for the element's appearance.full_summary
: Potentially detailed information about the cluster, of type FullSummary
.alert
: Information about any alert configuration associated with the cluster, of type Alert
.cluster_type
: A string (‘high’ or ‘low’), likely indicating the priority or type of the cluster.lit-html
for templating (TriageStatusSk.template
). The template renders a <button>
containing a tricon2-sk
element. The class
of the button and the value
of the tricon2-sk
are bound to ele.triage.status
, dynamically changing the appearance._start_triage
method is responsible for creating and dispatching the start-triage
event.triage-status-sk.scss
: Defines the visual styling for the triage-status-sk
element. It includes specific styles for the different triage states (.positive
, .negative
, .untriaged
) and their hover states, ensuring they integrate with the application's themes (including dark mode variables like --positive
, --negative
, --surface
).index.ts
: A simple entry point that imports and thereby registers the triage-status-sk
custom element, making it available for use in HTML.triage-status-sk-demo.html
& triage-status-sk-demo.ts
: These files provide a demonstration page for the triage-status-sk
element.start-triage
event and how to programmatically set the triage
property of the element. This is crucial for developers to understand how to integrate and use the component.BUILD.bazel
: Defines how the module is built and its dependencies. It specifies tricon2-sk
as a UI dependency and includes necessary SASS and TypeScript libraries.triage-status-sk_puppeteer_test.ts
: Contains Puppeteer-based tests to ensure the element renders correctly and behaves as expected in a browser environment. This is important for maintaining code quality and preventing regressions.Workflow Example: User Initiates Triage
User sees a triage-status-sk button (e.g., showing an 'untriaged' icon) | V User clicks the button | V [triage-status-sk.ts] _start_triage() method is called | V [triage-status-sk.ts] Creates a 'detail' object with: - triage: { status: 'untriaged', message: '...' } - full_summary: { ... } - alert: { ... } - cluster_type: 'low' | 'high' - element: (reference to itself) | V [triage-status-sk.ts] Dispatches a 'start-triage' CustomEvent with the 'detail' object | V [Parent Component/Application Logic] Listens for 'start-triage' event | V [Parent Component/Application Logic] Receives event.detail | V [Parent Component/Application Logic] Uses the received data to: - Open a triage dialog - Populate the dialog with cluster details - Allow user to select a new triage status - (Potentially) update the original triage-status-sk element's 'triage' property after the dialog interaction is complete.
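The hand-off shown above can be sketched in a few lines. The detail fields follow the description, with stand-in types for brevity; the dispatch helper and page-level listener are illustrative, not the component's actual source.

```
// Sketch of the 'start-triage' event contract described above.
interface StartTriageDetailSketch {
  triage: { status: 'positive' | 'negative' | 'untriaged'; message: string };
  full_summary: unknown;
  alert: unknown;
  cluster_type: 'high' | 'low';
  element: HTMLElement;      // reference back to the triage-status-sk instance
}

// Inside the element, on button click:
function startTriageSketch(host: HTMLElement, detail: StartTriageDetailSketch): void {
  host.dispatchEvent(new CustomEvent('start-triage', { detail, bubbles: true }));
}

// A parent page opens the triage dialog when the event arrives:
document.addEventListener('start-triage', (e: Event) => {
  const detail = (e as CustomEvent<StartTriageDetailSketch>).detail;
  console.log(`open triage dialog for a ${detail.cluster_type} cluster`);
});
```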
This design allows triage-status-sk
to be a focused, presentational component, while the more complex logic of handling the triage process itself is managed elsewhere in the application. This promotes separation of concerns and reusability.
The triage2-sk
module provides a custom HTML element for selecting a triage status. This element is designed to be a simple, reusable UI component for indicating whether a particular item is “positive”, “negative”, or “untriaged”. Its primary purpose is to offer a standardized way to represent and interact with triage states across different parts of the Perf application.
The core of the module is the triage2-sk
custom element, defined in triage2-sk.ts
. This element leverages the Lit library for templating and rendering. It presents three buttons, each representing one of the triage states:
<check-circle-icon-sk>
).<cancel-icon-sk>
).<help-icon-sk>
).The “why” behind this design is to provide a clear visual representation of the current triage status and an intuitive way for users to change it. By using distinct icons and styling for each state, the element aims to reduce ambiguity.
Key Implementation Details:
triage2-sk.ts
: This is the main TypeScript file defining the TriageSk
class, which extends ElementSk
.
value
attribute (and corresponding property). It can be one of “positive”, “negative”, or “untriaged”. If no value is provided, it defaults to “untriaged”.change
. The detail
property of this event contains the new triage status as a string (e.g., “positive”). This allows parent components to react to changes in the triage status. User clicks "Positive" button | V triage2-sk sets its 'value' attribute to "positive" | V triage2-sk dispatches a 'change' event with detail: "positive"
template
static method uses Lit's html
tagged template literal to define the structure of the element. It dynamically sets the selected
attribute on the appropriate button based on the current value
.value
attribute. When this attribute changes (either programmatically or through user interaction), the attributeChangedCallback
is triggered, which re-renders the component and dispatches the change
event.isStatus
function ensures that the value
property is always one of the allowed Status
types, defaulting to “untriaged” if an invalid value is encountered. This contributes to the robustness of the component.triage2-sk.scss
: This file contains the SASS styles for the triage2-sk
element.
index.ts
: This file serves as the entry point for the module, exporting the TriageSk
class and ensuring the custom element is defined.
Demo and Testing:
triage2-sk-demo.html
and triage2-sk-demo.ts
: Provide a simple demonstration page showcasing the element in various states and how to listen for the change
event. This is useful for manual testing and visual inspection.triage2-sk_test.ts
: Contains Karma unit tests that verify the event emission and value changes of the component.triage2-sk_puppeteer_test.ts
: Includes Puppeteer-based end-to-end tests that check the rendering of the component in a browser environment and capture screenshots for visual regression testing.
The design choice of using custom elements and Lit allows for a modular and maintainable component that can be easily integrated into larger applications. The clear separation of concerns (logic in TypeScript, styling in SASS, and structure in the template) follows common best practices for web component development.
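A brief usage sketch: set the element's value and listen for the change event carrying the new status in its detail, as described above. The listener body is illustrative.

```
// Illustrative usage of triage2-sk.
const triage = document.createElement('triage2-sk') as HTMLElement & { value: string };
document.body.appendChild(triage);

triage.value = 'untriaged';

triage.addEventListener('change', (e: Event) => {
  const newStatus = (e as CustomEvent<string>).detail; // 'positive' | 'negative' | 'untriaged'
  console.log(`triage status is now ${newStatus}`);
});
```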
The tricon2-sk
module provides a custom HTML element <tricon2-sk>
designed to visually represent triage states. This component is crucial for user interfaces where quick identification of an item's status (e.g., in a bug tracker, code review system, or monitoring dashboard) is necessary.
The core idea is to offer a standardized, reusable icon that clearly communicates whether an item is “positive,” “negative,” or “untriaged.” This avoids inconsistencies and reduces cognitive load for users who frequently interact with such systems.
Key Components and Responsibilities:
tricon2-sk.ts
: This is the heart of the module. It defines the TriconSk
class, which extends ElementSk
(a base class for custom elements in the Skia infrastructure).
value
attribute.lit-html
library for templating, allowing for efficient rendering and updates.static template
function determines which icon to display (check-circle-icon-sk
for “positive”, cancel-icon-sk
for “negative”, and help-icon-sk
for “untriaged” or any other value). This design centralizes the icon selection logic.value
attribute is the primary interface for controlling the displayed icon. Changes to this attribute trigger a re-render via attributeChangedCallback
and _render()
.connectedCallback
ensures that the value
property is properly initialized if set before the element is attached to the DOM.check-circle-icon-sk
, cancel-icon-sk
, help-icon-sk
) from the elements-sk
module, promoting modularity and reuse of existing icon assets.tricon2-sk.scss
: This file handles the styling of the tricon2-sk
element and its internal icons.
--green
, --red
, --brown
) for the icon fill colors. This allows themes (defined in themes.scss
) to override these colors easily..body-sk
context and when .darkmode
is applied to .body-sk
. This ensures the icons maintain appropriate contrast and visibility across different UI themes. The fallback hardcoded colors (#388e3c
, etc.) provide a default styling if CSS variables are not defined by a theme.index.ts
: This file serves as the main entry point for the module when it's imported. Its sole responsibility is to import tricon2-sk.ts
, which in turn registers the <tricon2-sk>
custom element. This is a common pattern for organizing custom element definitions.
tricon2-sk-demo.html
and tricon2-sk-demo.ts
: These files create a demonstration page for the <tricon2-sk>
element.
tricon2-sk
element and how it appears in various theming contexts (default, with colors.css
theming, and with themes.css
in both light and dark modes). This is invaluable for development, testing, and documentation.<tricon2-sk>
element with different value
attributes. The accompanying TypeScript file simply imports the index.ts
of the tricon2-sk
module to ensure the custom element is defined before the browser tries to render it.tricon2-sk_puppeteer_test.ts
: This file contains automated UI tests for the tricon2-sk
element using Puppeteer.
tricon2-sk-demo.html
) in a headless browser, checks if the expected number of tricon2-sk
elements are present (a basic smoke test), and then takes a screenshot of the page. This ensures that changes to the component's appearance are caught early.Workflow: Displaying a Triage Icon
Usage: An application includes the <tricon2-sk>
element in its HTML, setting the value
attribute:
<tricon2-sk value="positive"></tricon2-sk>
Element Initialization (tricon2-sk.ts
):
TriconSk
class is instantiated.connectedCallback
is called, ensuring the value
property is synchronized with the attribute._render()
is called.Template Selection (tricon2-sk.ts
):
static template
function is invoked.this.value
(e.g., “positive”), it returns the corresponding HTML template: html<check-circle-icon-sk></check-circle-icon-sk>
.Icon Rendering:
<check-circle-icon-sk>
) renders itself.Styling (tricon2-sk.scss
):
- CSS rules are applied. For example, if the value is "positive": `tricon2-sk { check-circle-icon-sk { fill: var(--green); // Initially attempts to use the CSS variable } }`
- If themes are active (e.g., `.body-sk.darkmode`), more specific rules might override the fill color: `.body-sk.darkmode tricon2-sk { check-circle-icon-sk { fill: #4caf50; // Specific dark mode color } }`
Diagram: Attribute Change leading to Icon Update
[User/Application sets/changes 'value' attribute on <tricon2-sk>] | v [<tricon2-sk> element] | +---------------------+ | attributeChangedCallback() is triggered | +---------------------+ | v [this._render()] | v [TriconSk.template(this)] <-- Reads current 'this.value' | +-------------+-------------+ | (value is | (value is | (value is other) | "positive") | "negative") | v v v [Returns [Returns [Returns <check-...>] <cancel-...>] <help-...>] | v [lit-html updates the DOM with the new icon template] | v [Browser renders the new icon with appropriate CSS styles]
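The icon-selection step in this flow boils down to a value-to-template switch. The condensed sketch below mirrors that behavior using lit-html, as the element does; it is not the exact source.

```
// Condensed sketch of the value -> icon template selection described above.
import { html, TemplateResult } from 'lit';

function triconTemplateSketch(value: string): TemplateResult {
  switch (value) {
    case 'positive':
      return html`<check-circle-icon-sk></check-circle-icon-sk>`;
    case 'negative':
      return html`<cancel-icon-sk></cancel-icon-sk>`;
    default:
      return html`<help-icon-sk></help-icon-sk>`; // 'untriaged' and any other value
  }
}
```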
The design decision to use distinct, imported icon components (check-circle-icon-sk
, etc.) rather than, for example, a single SVG sprite or dynamically generating SVG paths, promotes better separation of concerns. Each icon can be managed and updated independently. The use of CSS variables for theming is a standard and flexible approach, allowing consuming applications to easily adapt the icon colors to their specific look and feel without modifying the component's core logic or styles directly.
The trybot
module provides utilities for processing and analyzing results from Perf trybots. Trybots are automated systems that run performance tests on code changes before they are submitted. This module focuses on calculating and presenting metrics that help developers understand the performance impact of their changes.
The core functionality revolves around aggregating and averaging stddevRatio
values across different parameter combinations. The stddevRatio
is a key metric representing the change in performance relative to the standard deviation of the baseline. A positive stddevRatio
generally indicates a performance regression, while a negative value suggests an improvement.
The primary goal is to help developers quickly identify which aspects of their change (represented by key-value parameters like model=GCE
or test=MyBenchmark
) are contributing most significantly to performance changes, both positive and negative. By grouping results by these parameters and calculating average stddevRatio
, the module provides a summarized view that highlights potential problem areas or confirms expected improvements.
calcs.ts
: This file contains the logic for performing calculations on trybot results.
byParams(res: TryBotResponse): AveForParam[]
: This is the central function of the module.
Why: Developers need a way to understand the overall performance impact of their changes across various configurations (e.g., different devices, tests, or operating systems). Simply looking at individual trace results can be overwhelming. This function provides a summarized view by grouping results by their parameters.
How:
It takes a TryBotResponse
object, which contains a list of individual test results (res.results
). Each result includes a stddevRatio
and a set of params
(key-value pairs describing the test configuration).
It iterates through each result and then through each key-value pair within that result's params
.
For each unique key=value
string (e.g., “model=GCE”), it maintains a running total of stddevRatio
values, the count of traces contributing to this total (n
), and counts of traces with positive (high
) or negative (low
) stddevRatio
. This aggregation happens in the runningTotals
object.
Input TryBotResponse.results: [ { params: {arch: "arm", os: "android"}, stddevRatio: 1.5 }, { params: {arch: "x86", os: "linux"}, stddevRatio: -0.5 }, { params: {arch: "arm", os: "ios"}, stddevRatio: 2.0 } ] -> runningTotals intermediate state (simplified): "arch=arm": { totalStdDevRatio: 3.5, n: 2, high: 2, low: 0 } "os=android": { totalStdDevRatio: 1.5, n: 1, high: 1, low: 0 } "arch=x86": { totalStdDevRatio: -0.5, n: 1, high: 0, low: 1 } "os=linux": { totalStdDevRatio: -0.5, n: 1, high: 0, low: 1 } "os=ios": { totalStdDevRatio: 2.0, n: 1, high: 1, low: 0 }
After processing all results, it calculates the average stddevRatio
for each key=value
pair by dividing totalStdDevRatio
by n
.
It constructs an array of AveForParam
objects. Each object represents a key=value
parameter and includes its calculated average stddevRatio
, the total number of traces (n
) that matched this parameter, and the counts of high and low stddevRatio
traces.
Finally, it sorts this array in descending order based on the aveStdDevRatio
. This crucial step brings the parameters associated with the largest average regressions (the most positive aveStdDevRatio values) to the top, making them easy to identify.
AveForParam
interface: Defines the structure for the output of byParams
. It holds the aggregated average stddevRatio
for a specific keyValue
pair, along with counts of traces.
runningTotal
interface: An internal helper interface used during the aggregation process within byParams
to keep track of sums and counts before the final average is computed.
calcs_test.ts
: This file contains unit tests for the functions in calcs.ts
.
It uses chai for assertions. Tests cover scenarios like:
- An empty input, for which byParams should return an empty list.
- Correct averaging of stddevRatio for multiple traces sharing common parameters. For example, if two traces have test=1, their stddevRatio values should be averaged for the test=1 entry in the output.
- Sorting of the results by aveStdDevRatio in descending order.
Calculating Average StdDevRatio by Parameter:
TryBotResponse | v byParams(response) | | 1. Initialize `runningTotals` (empty map) | | 2. For each `result` in `response.results`: | | | |-> For each `param` (key-value pair) in `result.params`: | | | |--> Generate `runningTotalsKey` (e.g., "model=GCE") | |--> Retrieve or create `runningTotal` entry for `runningTotalsKey` | |--> Update `totalStdDevRatio`, `n`, `high`, `low` in the entry | | 3. Initialize `ret` (empty array of AveForParam) | | 4. For each `runningTotalKey` in `runningTotals`: | | | |-> Calculate `aveStdDevRatio` = `runningTotal.totalStdDevRatio` / `runningTotal.n` | |-> Create `AveForParam` object | |-> Push to `ret` | | 5. Sort `ret` by `aveStdDevRatio` (descending) | v Array of AveForParam
This workflow allows users to quickly pinpoint which configuration parameters (like specific device models, operating systems, or test names) are associated with the most significant average performance changes in a given trybot run. The sorting ensures that the most impactful parameters are immediately visible.
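The aggregation can be sketched in TypeScript roughly as follows. This is a simplified illustration of the algorithm described above; the real calcs.ts uses the module's generated JSON types, so the interfaces here are assumptions for readability:

```ts
interface TryBotResult {
  params: { [key: string]: string };
  stddevRatio: number;
}

interface AveForParam {
  keyValue: string;       // e.g. "model=GCE"
  aveStdDevRatio: number; // average stddevRatio over matching traces
  n: number;              // number of traces that matched this key=value
  high: number;           // traces with positive stddevRatio
  low: number;            // traces with negative stddevRatio
}

// Group results by each key=value pair, average the stddevRatio per group,
// and sort so the largest average regressions come first.
function byParamsSketch(results: TryBotResult[]): AveForParam[] {
  const totals = new Map<string, { total: number; n: number; high: number; low: number }>();
  for (const res of results) {
    for (const [key, value] of Object.entries(res.params)) {
      const id = `${key}=${value}`;
      const t = totals.get(id) ?? { total: 0, n: 0, high: 0, low: 0 };
      t.total += res.stddevRatio;
      t.n += 1;
      if (res.stddevRatio > 0) t.high += 1;
      if (res.stddevRatio < 0) t.low += 1;
      totals.set(id, t);
    }
  }
  const ret: AveForParam[] = [];
  totals.forEach((t, keyValue) =>
    ret.push({ keyValue, aveStdDevRatio: t.total / t.n, n: t.n, high: t.high, low: t.low }),
  );
  ret.sort((a, b) => b.aveStdDevRatio - a.aveStdDevRatio); // descending
  return ret;
}
```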
The trybot-page-sk
module provides a user interface for analyzing performance regressions. It allows users to select either a specific commit from the repository or a trybot run (representing a potential code change) and then analyze performance metrics associated with that selection. The core purpose is to help developers identify and understand performance impacts before or after code submission.
Key Responsibilities and Components:
User Input and Selection:
- Users can select a specific commit via the commit-detail-picker-sk element. This allows them to investigate performance regressions that might have been introduced by a particular code change.
- Trybot (CL/patchset) selection is not fully wired up in trybot-page-sk.ts. (It appears to be a planned feature or a more complex interaction than commit selection.) The underlying TryBotRequest interface includes fields like cl and patch_number, indicating the intent to support this.
- Users build a trace query via query-sk. This query filters the performance traces to be considered (e.g., focusing on specific benchmarks, configurations, or architectures).
- The paramset-sk and query-count-sk elements provide feedback on the current query, showing the matching parameters and the number of traces that fit the criteria. This helps users refine their query to target the relevant data.
Data Fetching and Processing:
When the user clicks “Run”, the run
method is invoked. This method constructs a TryBotRequest
object based on the user's selections (commit number, query, or eventually CL/patch details). A POST request is made to the /_/trybot/load/
backend endpoint. This endpoint is responsible for fetching the relevant performance data (trace values, headers, parameter sets) for the specified commit/trybot and query. The startRequest
utility handles the asynchronous request and displays progress using a spinner-sk.
The response (TryBotResponse) contains the performance data, including:
- results: An array of individual trace results, each containing parameter values (params), actual metric values (values), and a stddevRatio (how many standard deviations the trace's value is from the median of its historical data).
- paramset: The complete set of parameters found across all returned traces.
- header: Information about the data points in each trace, likely including timestamps.
The byParams
function (from ../trybot/calcs
) is used to aggregate results by parameter key-value pairs, calculating average standard deviation ratios, counts, and high/low values for each group. This helps identify which parameters are most strongly correlated with performance changes.Results Display and Visualization:
- Clicking the timeline icon (timeline-icon-sk) for a trace in the individual results table renders its values over time on a plot-simple-sk element. Users can CTRL-click to plot multiple traces on the same graph for comparison.
- A “By Params” table summarizes the byParams calculation. For each parameter key-value pair (e.g., “config=gles”), it shows the average standard deviation ratio, the number of traces (N) in that group, and the highest/lowest individual trace values.
- When a “By Params” group is plotted, maxByParamsPlot traces from the selected group (sorted by stddevRatio) are plotted on a separate plot-simple-sk.
- The ID and parameter set of the currently focused trace on that plot are shown in by-params-traceid and by-params-paramset respectively. paramset-sk is used to display the parameters, highlighting the ones belonging to the focused trace.
State Management:
The element uses stateReflector to synchronize its internal state (this.state, which is a TryBotRequest object) with the URL. This means that the selected commit, query, and analysis type (“commit” or “trybot”) are reflected in the URL query parameters. This allows users to bookmark or share specific analysis views.
State changes are funneled through stateHasChanged(), which updates the URL via stateReflector and re-renders the component.
Styling and Structure:
The trybot-page-sk.scss
file defines the visual appearance and layout of the component, including styles for the query section, results tables, and plot areas.
Workflow Example (Commit Analysis):
User Selects Tab: User ensures the “Commit” tab is selected. [tabs-sk] --selects index 0--> [TrybotPageSk.tabSelected] --> state.kind = "commit" --> stateHasChanged()
User Selects Commit: User interacts with commit-detail-picker-sk
. [commit-detail-picker-sk] --commit-selected event--> [TrybotPageSk.commitSelected] --> state.commit_number = selected_commit_offset --> stateHasChanged() --> _render() (UI updates to show query section)
User Enters Query: User types into query-sk
.
[query-sk] --query-change event--> [TrybotPageSk.queryChange] --> state.query = new_query_string --> stateHasChanged() --> _render() (paramset-sk summary updates) [query-sk] --query-change-delayed event--> [TrybotPageSk.queryChangeDelayed] --> [query-count-sk].current_query = new_query_string (triggers count update)
User Clicks “Run”: [Run Button] --click--> [TrybotPageSk.run] --> spinner-sk.active = true --> startRequest('/_/trybot/load/', state, ...) --> HTTP POST to backend with { kind: "commit", commit_number: X, query: "Y" } <-- Backend responds with TryBotResponse (trace data, paramset, header) --> results = TryBotResponse --> byParams = byParams(results) --> spinner-sk.active = false --> _render() (results tables and plot areas become visible and populated)
User Interacts with Results:
- **Plotting Individual Trace:** `[Timeline Icon in Individual Table]
--click--> [TrybotPageSk.plotIndividualTrace(event, index)] --> individualPlot.addLines(...) --> displayedTrace = true --> _render() (individual plot becomes visible) - **Plotting By Params Group:**
[Timeline Icon in By Params Table] --click--> [TrybotPageSk.plotByParamsTraces(event, index)] --> Filters results.results for matching key=value --> byParamsPlot.addLines(...) --> byParamsParamSet.paramsets = [ParamSet of plotted traces] --> displayedByParamsTrace = true --> _render() (by params plot and its paramset become visible) - **Focusing Trace on By Params Plot:**
[by-params-plot] --trace_focused event--> [TrybotPageSk.byParamsTraceFocused] --> byParamsTraceID.innerText = focused_trace_name --> byParamsParamSet.highlight = fromKey(focused_trace_name) --> _render() (updates highlighted params in by-params-paramset)`
The design emphasizes providing both a high-level overview of potential regression areas (via “By Params”) and the ability to drill down into individual trace performance. The use of stddevRatio
as a primary metric helps quantify the significance of observed changes.
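Conceptually, the “Run” step reduces to something like the following sketch. Names follow the description above; the actual element uses the startRequest helper, element state, and generated JSON types, so this is an illustration rather than the real implementation:

```ts
import { byParams } from '../trybot/calcs';

interface TryBotRequest {
  kind: 'commit' | 'trybot';
  commit_number?: number;
  cl?: string;
  patch_number?: number;
  query: string;
}

// POST the current selection to the backend and aggregate the results.
async function runSketch(state: TryBotRequest) {
  const resp = await fetch('/_/trybot/load/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(state),
  });
  if (!resp.ok) {
    throw new Error(`Request failed: ${resp.status}`);
  }
  const results = await resp.json(); // TryBotResponse: { results, paramset, header }
  const summaries = byParams(results); // "By Params" aggregation
  return { results, summaries };
}
```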
(user-issue-sk)
The user-issue-sk
module provides a custom HTML element for associating and managing Buganizer issues with specific data points in the Perf application. This allows users to directly link performance regressions or anomalies to their corresponding bug reports, enhancing traceability and collaboration.
Why: Tracking issues related to performance data is crucial for effective debugging and resolution. This element centralizes the issue linking process within the Perf UI, providing a seamless experience for users to add, view, and remove bug associations.
How:
The core functionality revolves around the UserIssueSk
LitElement class. This class manages the display and interaction logic for associating a Buganizer issue with a data point identified by its trace_key
and commit_position
.
Key Responsibilities and Components:
- Login status: The element determines the current user's login status via alogin-sk. This is essential because only logged-in users can add or remove issue associations. If a user is not logged in, they can only view existing issue links.
- bug_id: This property determines the element's display.
  - bug_id === 0: Indicates no Buganizer issue is associated with the data point. The element will display an “Add Bug” button (if the user is logged in).
  - bug_id > 0: An existing Buganizer issue is linked. The element will display a link to the bug and, if the user is logged in, a “close” icon to remove the association.
  - bug_id === -1: This is a special state where the element renders nothing, effectively hiding itself. This might be used in scenarios where issue linking is not applicable.
- _text_input_active: A boolean flag that controls the visibility of the input field for entering a new bug ID.
- Templating: The render() method dynamically chooses between two main templates based on the bug_id and login status:
  - addIssueTemplate(): Shown when bug_id === 0 and the user is logged in. It initially displays an “Add Bug” button. Clicking this button reveals an input field for the bug ID and confirm/cancel icons.
  - showLinkTemplate(): Shown when bug_id > 0. It displays a formatted link to the Buganizer issue (using AnomalySk.formatBug). If the user is logged in, a “close” icon is also displayed to allow removal of the issue link.
- addIssue(): Triggered when a user submits a new bug ID. It makes a POST request to the /_/user_issue/save endpoint with the trace_key, commit_position, and the new issue_id.
- removeIssue(): Triggered when a logged-in user clicks the “close” icon next to an existing bug link. It makes a POST request to the /_/user_issue/delete endpoint with the trace_key and commit_position.
- Events: After a successful add or remove, the element dispatches a custom event named user-issue-changed. This event bubbles up and carries a detail object containing the trace_key, commit_position, and the new bug_id. This allows parent components or other parts of the application to react to changes in issue associations (e.g., by refreshing a list of user-reported issues).
- Error handling: The element uses the errorMessage utility from perf/modules/errorMessage to display feedback to the user in case of API errors or invalid input.
Key Files:
user-issue-sk.ts
: This is the heart of the module. It defines the UserIssueSk
LitElement, including its properties, styles, templates, and logic for interacting with the backend API and handling user input. The design focuses on conditional rendering based on the bug_id
and user login status. The API calls are standard fetch
requests.
index.ts
: A simple entry point that imports and registers the user-issue-sk
custom element, making it available for use in HTML.
BUILD.bazel
: Defines the build dependencies for the element, including alogin-sk
for authentication, anomaly-sk
for bug link formatting, icon elements for the UI, and Lit libraries for web component development.
Workflows:
Adding a New Issue: User (logged in) sees “Add Bug” button User clicks “Add Bug” -> activateTextInput()
is called -> _text_input_active
becomes true
-> Element re-renders to show input field, check icon, close icon User types bug ID into input field -> changeHandler()
updates _input_val
User clicks check icon -> addIssue()
is called -> Input validation (is _input_val
> 0?) -> POST request to /_/user_issue/save
with trace_key, commit_position, input_val -> On success: -> bug_id
is updated with _input_val
-> _input_val
reset to 0 -> _text_input_active
set to false
-> user-issue-changed
event is dispatched -> Element re-renders to show the new bug link and remove icon -> On failure: -> errorMessage
is displayed -> hideTextInput()
is called (resets state)
Viewing an Existing Issue: Element is initialized with bug_id > 0
-> render()
calls showLinkTemplate()
-> A link to perf.bug_host_url + bug_id
is displayed. -> If user is logged in, a “close” icon is also displayed.
Removing an Existing Issue: User (logged in) sees bug link and “close” icon User clicks “close” icon -> removeIssue()
is called -> POST request to /_/user_issue/delete
with trace_key, commit_position -> On success: -> bug_id
is set to 0 -> _input_val
reset to 0 -> _text_input_active
set to false
-> user-issue-changed
event is dispatched -> Element re-renders to show “Add Bug” button -> On failure: -> errorMessage
is displayed
The design prioritizes a clear separation of concerns: display logic is handled by LitElement's templating system, state is managed through properties, and backend interactions are encapsulated in dedicated asynchronous methods. The use of custom events allows for loose coupling with other components that might need to react to changes in issue associations.
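A condensed sketch of the add flow described above (illustrative; the real method lives on the UserIssueSk LitElement, also updates component state, and the exact request payload shape is inferred from the description):

```ts
// POST the new association to the backend and notify listeners.
async function addIssueSketch(
  ele: HTMLElement,
  traceKey: string,
  commitPosition: number,
  issueId: number,
) {
  const resp = await fetch('/_/user_issue/save', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      trace_key: traceKey,
      commit_position: commitPosition,
      issue_id: issueId,
    }),
  });
  if (!resp.ok) {
    throw new Error(`Saving issue failed: ${resp.status}`);
  }
  // Let parent components react (e.g., refresh their list of user issues).
  ele.dispatchEvent(
    new CustomEvent('user-issue-changed', {
      detail: { trace_key: traceKey, commit_position: commitPosition, bug_id: issueId },
      bubbles: true,
    }),
  );
}
```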
The window
module is designed to provide utility functions related to the browser's window
object, specifically focusing on parsing and interpreting configuration data embedded within it. This approach centralizes the logic for accessing and processing global configurations, making it easier to manage and test.
A key responsibility of this module is to extract and process build tag information. This information is embedded in window.perf.image_tag, a field of the global window.perf object, which is expected to be an SkPerfConfig (defined in //perf/modules/json:index_ts_lib). The getBuildTag
function is the primary component for this task.
The getBuildTag
function takes an image tag string as input (or defaults to window.perf?.image_tag
). Its core purpose is to parse this string and categorize the build tag. The function employs a specific parsing logic based on the structure of the image tag:
Initial Validation:
- The input string is split on the @ character.
- If the split does not produce at least two parts (e.g., there is no @, or @ is the first/last character), it's considered an invalid tag.
- It then checks whether the second part (the part after the @) starts with tag:. If not, it's also an invalid tag.
Input Tag String | V Split by '@' | V Check for at least 2 parts AND second part starts with "tag:" | +-- No --> Invalid Tag | V Proceed to type determination
Tag Type Determination: Based on the prefix of the raw tag (the part after tag:
):
- **Git Tag**: If the raw tag starts with `tag:git-`, it's classified as a 'git' type. The function extracts the first 7 characters of the Git hash. `rawTag starts with "tag:git-" | V Type: 'git' Tag: First 7 chars of Git hash`
- **Louhi Build Tag**: If the raw tag has a specific length (>= 38 characters) and contains `louhi` at a particular position (substring from index 25 to 30), it's classified as a 'louhi' type. The function extracts a 7-character identifier (substring from index 31 to 38) which typically represents a hash or version. `rawTag length >= 38 AND rawTag[25:30] == "louhi" | V Type: 'louhi' Tag: rawTag[31:38]`
- **Regular Tag**: If neither of the above conditions is met, it's considered a generic 'tag' type. The function returns the portion of the string after `tag:`. `Neither Git nor Louhi | V Type: 'tag' Tag: rawTag after "tag:"`
This structured approach ensures that different build tag formats can be reliably identified and their relevant parts extracted. The decision to differentiate between ‘git’, ‘louhi’, and generic ‘tag’ types allows downstream consumers of this information to handle them appropriately. For instance, a ‘git’ tag might be used to link to a specific commit, while a ‘louhi’ tag might indicate a specific build from an internal CI system.
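A minimal TypeScript sketch of the parsing rules described above might look like the following. The return shape and helper name are assumptions for illustration; the actual getBuildTag signature and handling of the "@ is first/last character" edge case may differ:

```ts
interface BuildTag {
  type: 'git' | 'louhi' | 'tag' | 'invalid';
  tag: string;
}

export function getBuildTagSketch(imageTag: string): BuildTag {
  const parts = imageTag.split('@');
  // Initial validation: need an '@' and the second part must start with "tag:".
  if (parts.length < 2 || !parts[1].startsWith('tag:')) {
    return { type: 'invalid', tag: '' };
  }
  const rawTag = parts[1];
  if (rawTag.startsWith('tag:git-')) {
    // e.g. "tag:git-<full hash>" -> first 7 characters of the Git hash.
    const start = 'tag:git-'.length;
    return { type: 'git', tag: rawTag.slice(start, start + 7) };
  }
  if (rawTag.length >= 38 && rawTag.substring(25, 30) === 'louhi') {
    // Louhi CI builds embed a short hash/version at a fixed offset.
    return { type: 'louhi', tag: rawTag.substring(31, 38) };
  }
  // Generic tag: everything after "tag:".
  return { type: 'tag', tag: rawTag.slice('tag:'.length) };
}
```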
The module also extends the global Window
interface to declare the perf: SkPerfConfig
property. This is a TypeScript feature that provides type safety when accessing window.perf
, ensuring that developers are aware of its expected structure.
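In TypeScript this kind of declaration typically looks like the following sketch (the import path for SkPerfConfig is an assumption based on the //perf/modules/json:index_ts_lib target):

```ts
import { SkPerfConfig } from '../json';

declare global {
  interface Window {
    // Populated server-side via the `window.perf = {%.context %};` script tag.
    perf: SkPerfConfig;
  }
}
```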
The window_test.ts
file provides unit tests for the getBuildTag
function, covering various scenarios including valid git tags, Louhi build tags, arbitrary tags, and different forms of invalid tags. These tests are crucial for verifying the correctness of the parsing logic and ensuring that changes to the function do not introduce regressions. The use of chai
for assertions is a standard practice for testing in this environment.
The word-cloud-sk
module provides a custom HTML element designed to visualize key-value pairs and their relative frequencies. This is particularly useful for displaying data from clusters or other datasets where understanding the distribution of different attributes is important.
The core idea is to present this frequency information in an easily digestible format, combining textual representation with a simple bar graph for each item. This allows users to quickly grasp the prevalence of certain key-value pairs within a dataset.
Key Components and Responsibilities:
word-cloud-sk.ts
: This is the heart of the module, defining the WordCloudSk
custom element which extends ElementSk
.
By extending ElementSk, it leverages common functionalities provided by the infra-sk library for custom elements.
The element uses the lit-html library for templating. The items
property, an array of ValuePercent
objects (defined in //perf/modules/json:index_ts_lib
), is the primary input. Each ValuePercent
object contains a value
(the key-value string) and a percent
(its frequency).
The template iterates over items and creates a table row for each. Each row displays the key-value string, its percentage as text, and a horizontal bar whose width is proportional to the percentage.
connectedCallback ensures that if the items property is set before the element is fully connected to the DOM, it's properly upgraded and the element is rendered.
The _render() method is called whenever the items property changes, ensuring the display is updated.
word-cloud-sk.scss
: This file contains the SASS styles for the word-cloud-sk
element.
The styles use CSS variables (--light-gray
, --on-surface
, --primary
), allowing the component to adapt to different themes (like light and dark mode) defined in //perf/modules/themes:themes_sass_lib
and //elements-sk/modules:colors_sass_lib
.
word-cloud-sk-demo.html
and word-cloud-sk-demo.ts
: These files provide a demonstration page for the word-cloud-sk
element.
word-cloud-sk-demo.html
includes multiple instances of the <word-cloud-sk>
tag, some within sections with different theming (e.g., dark mode). word-cloud-sk-demo.ts
then selects these instances and populates their items
property with sample data. This demonstrates how the component can be instantiated and how data is passed to it.
index.ts
: This file simply imports and thereby registers the word-cloud-sk
custom element.
Workflow: Data Display
The primary workflow involves providing data to the word-cloud-sk
element and its subsequent rendering:
Instantiation: An instance of <word-cloud-sk>
is created in HTML.
<word-cloud-sk></word-cloud-sk>
Data Provision: The items
property of the element is set with an array of ValuePercent
objects.
// In JavaScript/TypeScript: const wordCloudElement = document.querySelector('word-cloud-sk'); wordCloudElement.items = [ { value: 'arch=x86', percent: 100 }, { value: 'config=565', percent: 60 }, // ... more items ];
Rendering (_render()
called in word-cloud-sk.ts
):
The WordCloudSk element iterates through the _items array.
- For each item, a table row (<tr>) is generated.
- item.value is displayed in the first cell (<td>).
- item.percent is displayed as text (e.g., “60%”) in the second cell.
- A <div> element is created in the third cell. Its width style is set to item.percent pixels, creating a visual bar representation of the percentage.
The overall structure rendered looks like this (simplified):
<table> <tr> <!-- For item 1 --> <td class="value">[item1.value]</td> <td class="textpercent">[item1.percent]%</td> <td class="percent"> <div style="width: [item1.percent]px"></div> </td> </tr> <tr> <!-- For item 2 --> <td class="value">[item2.value]</td> <td class="textpercent">[item2.percent]%</td> <td class="percent"> <div style="width: [item2.percent]px"></div> </td> </tr> <!-- ... more rows --> </table>
This process ensures that whenever the input data changes, the visual representation of the word cloud is automatically updated. The use of CSS variables for styling allows the component to seamlessly integrate into applications with different visual themes.
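The row template can be sketched roughly as follows using lit-html (a simplified illustration; property names follow the description above and the ValuePercent import path is an assumption):

```ts
import { html, TemplateResult } from 'lit-html';
import { ValuePercent } from '../json'; // { value: string; percent: number }

// One <tr> per item, with the bar width driven by the percent value in pixels.
const rows = (items: ValuePercent[]): TemplateResult[] =>
  items.map(
    (item) => html`
      <tr>
        <td class="value">${item.value}</td>
        <td class="textpercent">${item.percent}%</td>
        <td class="percent">
          <div style="width: ${item.percent}px"></div>
        </td>
      </tr>
    `,
  );

const template = (items: ValuePercent[]) => html`<table>${rows(items)}</table>`;
```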
nanostat
is a command-line tool designed to compare and analyze the results of Skia's nanobench benchmark. It takes two JSON files generated by nanobench as input, representing “old” and “new” benchmark runs, and provides a statistical summary of the performance changes between them. This is particularly useful for developers to understand the performance impact of their code changes.
When making changes to a codebase, especially one as performance-sensitive as a graphics library like Skia, it's crucial to measure the impact on performance. Nanobench produces detailed raw data, but interpreting this data directly can be cumbersome. nanostat
was created to:
The core workflow of nanostat
involves several steps:
Input: It accepts two file paths as command-line arguments, pointing to the “old” and “new” nanobench JSON output files.
nanostat [options] old.json new.json
Parsing: The loadFileByName
function in main.go
is responsible for opening and parsing these JSON files. It uses the perf/go/ingest/format.ParseLegacyFormat
function to interpret the nanobench output structure and then perf/go/ingest/parser.GetSamplesFromLegacyFormat
to extract the raw sample values for each benchmark test. Each file's data is converted into a parser.SamplesSet
, which is a map where keys are test identifiers and values are slices of performance measurements (samples).
Statistical Analysis: The samplestats.Analyze
function (from the perf/go/samplestats
module) is the heart of the comparison. It takes the two parser.SamplesSet
(before and after samples) and a samplestats.Config
object as input. The configuration includes:
Alpha
: The significance level (default 0.05). A p-value below alpha indicates a significant difference.
IQRR: A boolean indicating whether to apply the Interquartile Range Rule to remove outliers from the sample data before analysis.
All: A boolean determining if all results (significant or not) should be displayed.
Test: The type of statistical test to perform (Mann-Whitney U test or Welch's T-test).
Order: The function used to sort the output rows.
For each common benchmark test found in both input files, samplestats.Analyze
calculates statistics for both sets of samples (mean, percentage deviation) and then performs the chosen statistical test to compare the two distributions. This yields a p-value.
Filtering and Sorting: Based on the config
, samplestats.Analyze
filters out rows where the change is not statistically significant (if config.All
is false). The remaining rows are then sorted according to config.Order
.
Output Formatting: The formatRows
function in main.go
takes the analyzed and sorted samplestats.Row
data and prepares it for display.
- It determines which parameter keys to include in the output (e.g., config, name, test). These are keys whose values differ across the benchmark results, helping to distinguish them.
- Results that are not statistically significant are included only when the --all flag is used.
- The formatted rows are written to stdout using text/tabwriter
to create a well-aligned table.Example output line:
old new delta stats name 2.15 ± 5% 2.00 ± 2% -7% (p=0.001, n=10+ 8) tabl_digg.skp
main.go
: This is the entry point of the application.
- It defines and parses the command-line flags (-alpha, -sort, -iqrr, -all, -test).
- It calls loadFileByName to load and parse the input JSON files.
- It builds a samplestats.Config based on the provided flags.
- It calls samplestats.Analyze to perform the statistical comparison.
- It calls formatRows to format the results for display.
- It uses text/tabwriter to print the formatted output to the console.
actualMain(stdout io.Writer)
: Contains the main logic, allowing stdout
to be replaced for testing.
loadFileByName(filename string) parser.SamplesSet: Reads a nanobench JSON file, parses it, and extracts the performance samples. It leverages perf/go/ingest/format and perf/go/ingest/parser.
formatRows(config samplestats.Config, rows []samplestats.Row) []string: Takes the analysis results and formats them into a slice of strings, ready for tabular display. It intelligently includes relevant parameter keys in the output.
main_test.go
: Contains unit tests for nanostat
.
The tests verify that nanostat produces the expected output for various command-line flag combinations and input files.
They use golden files (testdata/*.golden) to compare actual output against expected output.
TestMain_DifferentFlags_ChangeOutput(t *testing.T): The main test function that sets up different test cases.
check(t *testing.T, name string, args ...string): A helper function that runs nanostat with specified arguments, captures its output, and compares it against a corresponding golden file.
README.md
: Provides user-facing documentation on how to install and use nanostat
, including examples and descriptions of command-line options.
Makefile
: Contains targets for building, testing, and regenerating test data (golden files). The regenerate-testdata
target is crucial for updating the golden files when the tool's output format or logic changes.
BUILD.bazel
: Defines how to build and test the nanostat
binary and its library using the Bazel build system. It lists dependencies on other Skia modules, such as:
//go/paramtools
: Used in formatRows
to work with parameter sets from benchmark results.//perf/go/ingest/format
: Used for parsing the legacy nanobench JSON format.//perf/go/ingest/parser
: Used to extract sample data from the parsed format.//perf/go/samplestats
: Provides the core statistical analysis functions (samplestats.Analyze
, samplestats.Order
, samplestats.Test
).perf/go/samplestats
: nanostat
heavily relies on this module for the actual statistical computations. This promotes code reuse and separation of concerns, keeping nanostat
focused on command-line parsing, file I/O, and output formatting.perf/go/ingest/format
and perf/go/ingest/parser
: These modules handle the complexities of interpreting the nanobench JSON structure, abstracting this detail away from nanostat
's main logic.-alpha
, -iqrr
, -all
, -sort
, -test
). This flexibility allows users to tailor the analysis to their specific needs. For example, the -iqrr
flag allows for more robust analysis by removing potential outlier data points that could skew results. The -test
flag allows users to choose between parametric (T-test) and non-parametric (U-test) statistical tests, depending on the assumptions they are willing to make about their data's distribution.
The use of text/tabwriter provides a clean, aligned, and easy-to-read output format, which is essential for quickly scanning and understanding the performance changes.
The golden-file approach in main_test.go is a good practice for testing command-line tools. It makes it easy to verify that changes to the code don't unintentionally alter the output format or the results of the analysis. The Makefile target regenerate-testdata simplifies updating these files when intended changes occur.
The /pages
module is responsible for defining the HTML structure and initial JavaScript and CSS for all the user-facing pages of the Skia Performance application. Each page represents a distinct view or functionality within the application, such as viewing alerts, exploring performance data, or managing regressions.
The core design philosophy is to keep the HTML files minimal and delegate the rendering and complex logic to custom HTML elements (Skia Elements). This promotes modularity and reusability of UI components.
Key Components and Responsibilities:
HTML Files (e.g., alerts.html, newindex.html):
- They define the basic document structure (<head>, <body>).
- They instantiate the perf-scaffold-sk custom element. This element acts as a common layout wrapper for all pages, providing consistent navigation, header, footer, and potentially other shared UI elements.
- Inside perf-scaffold-sk, they embed the primary custom element specific to that page's functionality (e.g., <alerts-page-sk>, <explore-sk>).
- They use Go template directives like {%- template "googleanalytics" . -%} and {% .Nonce %} for server-side rendering of common snippets and security nonces.
- A window.perf = {%.context %}; script tag is used to pass initial data or configuration from the server (Go backend) to the client-side JavaScript. This context likely contains information needed by the page-specific custom element to initialize itself.
TypeScript Files (e.g., alerts.ts, newindex.ts):
- They import <perf-scaffold-sk> and the page-specific custom element (e.g., ../modules/alerts-page-sk), which registers both with the browser.
SCSS Files (e.g., alerts.scss, newindex.scss):
- They typically contain @import 'body';, which means they inherit base body styles from body.scss.
- If a page needs styles beyond what body.scss provides, those styles would be defined here.
body.scss:
- Defines base styles for the <body> element, such as removing default margins and padding. This ensures a consistent baseline across all pages.
BUILD.bazel:
- Each page is declared with the sk_page rule from //infra-sk:index.bzl. Key attributes include:
  - html_file: The entry HTML file.
  - ts_entry_point: The entry TypeScript file.
  - scss_entry_point: The entry SCSS file.
  - sk_element_deps: A list of dependencies on other modules that provide the custom HTML elements used by the page. This is crucial for ensuring that elements like perf-scaffold-sk and page-specific elements (e.g., alerts-page-sk) are compiled and available.
  - sass_deps: Dependencies for SCSS, typically including :body_sass_lib which refers to the body.scss file.
  - Other attributes include assets_serving_path, nonce, and production_sourcemap.
Workflow for a Page Request:
A user requests a page URL (e.g., /alerts).
The Go backend routes the request and processes the corresponding template (e.g., alerts.html).
Template processing injects the {% .context %}, the {% .Nonce %}, and other templates like “googleanalytics” and “cookieconsent”.
User Request ----> Go Backend ----> Template Processing (alerts.html + context) ----> HTML Response (URL Routing) (Injects window.perf data, nonce)
When the browser parses the returned HTML and encounters <script src="alerts.js"></script>
(or the equivalent generated by the build system), it fetches and executes alerts.ts
.
alerts.ts
imports ../modules/perf-scaffold-sk
and ../modules/alerts-page-sk
. This registers these custom elements with the browser. Browser Receives HTML -> Parses HTML -> Encounters <script> for alerts.ts | -> Fetches and Executes alerts.ts | -> import '../modules/perf-scaffold-sk'; -> import '../modules/alerts-page-sk'; (Custom elements are now defined)
The browser then upgrades the custom elements already present in the HTML (<perf-scaffold-sk> and <alerts-page-sk>). The JavaScript logic within these custom elements takes over, potentially fetching more data via AJAX using the initial window.perf
context if needed, and populating the page content. Custom Elements Registered -> Browser renders <perf-scaffold-sk> and <alerts-page-sk> | -> JavaScript within these elements executes (e.g., reads window.perf, makes AJAX calls, builds UI)
The page's SCSS entry point (e.g., alerts.scss
) is also linked in the HTML (via the build system), and its styles (including those from body.scss
) are applied.
This structure allows for a clean separation of concerns: the HTML files define structure, the TypeScript entry points register the custom elements that supply behavior, and the SCSS files handle styling.
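A page entry file following this pattern is tiny; a sketch of what alerts.ts conceptually contains (illustrative; the real file may differ slightly):

```ts
// Importing the modules registers the custom elements with the browser;
// the HTML served by the Go backend can then simply use the tags.
import '../modules/perf-scaffold-sk';
import '../modules/alerts-page-sk';
```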
The help.html
page is slightly different as it directly embeds more static content (help text and examples) within its HTML structure using Go templating ({% range ... %}
). However, it still utilizes the perf-scaffold-sk
for consistent page layout and imports its JavaScript for any scaffold-related functionalities.
The newindex.html
and multiexplore.html
pages additionally include a div
with id="sidebar_help"
within the perf-scaffold-sk
. This suggests that the perf-scaffold-sk
might have a designated area or slot where page-specific help content can be injected, or that the page-specific JavaScript (explore-sk.ts
or explore-multi-sk.ts
) might dynamically populate or interact with this sidebar content.
(/res)
The /res
module serves as a centralized repository for static assets required by the application. Its primary purpose is to provide a consistent and organized location for resources such as images, icons, and potentially other static files that are part of the user interface or overall application branding. By co-locating these assets, the module simplifies resource management, facilitates easier updates, and ensures that all parts of the application can reliably access necessary visual or static elements.
The decision to have a dedicated /res
module stems from the need to separate static content from dynamic code. This separation offers several benefits:
The internal structure of /res
is designed to categorize different types of assets. For instance, images are placed within a dedicated img
subdirectory. This categorization aids in discoverability and allows for type-specific processing or handling if needed in the future.
/res/img
(Submodule/Directory): Placing images in a dedicated subdirectory of /res
keeps the root of the resource module clean and allows for specific image-related build optimizations or management strategies. For example, image compression tools or sprite generation scripts could target this directory specifically.
/res/img/favicon.ico: The .ico
format is the traditional and most widely supported format for favicons, ensuring compatibility across different browsers and platforms. Placing it directly in the img
directory makes it easily discoverable by build tools and web servers, which often look for favicon.ico
in standard locations. Its presence here ensures that the application has a visual identifier in browser contexts.
A typical workflow involving the /res
module might look like this:
Asset Creation/Acquisition: A designer creates a new icon or a new version of the application logo.
Designer Developer | | [New Image Asset] --> [Receives Asset]
Asset Placement: The developer places the new image file (e.g., new_icon.png
) into the appropriate subdirectory within /res
, likely /res/img/
.
Developer | [Places new_icon.png into /res/img/]
Referencing the Asset: Application code (e.g., HTML, CSS, JavaScript) that needs to display this icon will reference it using a path relative to how the assets are served.
Application Code (e.g., HTML) | <img src="/path/to/res/img/new_icon.png">
(Note: The exact /path/to/
depends on how the web server or build system exposes the /res
directory.)
Build Process: During the application build, files from the /res
module are typically copied to a public-facing directory in the build output.
Build System | [Reads /res/img/new_icon.png] --> [Copies to /public_output/img/new_icon.png]
Client Request: When a user accesses the application, their browser requests the asset. User's Browser Web Server | | [Requests /public_output/img/new_icon.png] ----> [Serves new_icon.png] | | [Displays new_icon.png] <------------------------+
This workflow highlights how the /res
module acts as the source of truth for static assets, which are then processed and served to the end-user. The favicon.ico
follows a similar, often more implicit, path as browsers automatically request it from standard locations.
The samplevariance
module is a command-line tool designed to analyze the variance of benchmark samples, specifically those generated by nanobench and stored in Google Cloud Storage (GCS). Nanobench typically produces multiple samples (e.g., 10) for each benchmark execution. This tool facilitates the examination of these samples across a large corpus of historical benchmark runs.
The primary motivation for this tool is to identify benchmarks exhibiting high variance in their results. High variance can indicate instability in the benchmark itself, the underlying system, or the measurement process. By calculating statistics like the ratio of the median to the minimum value for each set of samples, samplevariance
helps pinpoint traces that warrant further investigation.
The core workflow involves:
- Listing the benchmark result files under the given GCS bucket and prefix.
- Processing each file in a pool of worker goroutines, computing per-trace statistics into a sampleInfo struct.
- Collecting the sampleInfo
Key components and their responsibilities:
main.go
: This is the entry point of the application and orchestrates the entire process.
main()
: Drives the overall workflow: initialization, fetching filenames, processing samples, sorting, and writing the output.initialize()
: Handles command-line argument parsing. It sets up the GCS client, determines the input GCS path (defaulting to yesterday‘s data if not specified), parses the trace filter query, and configures the output writer (stdout or a specified file). The choice to default to yesterday’s data provides a convenient way to monitor recent benchmark stability without requiring explicit date specification.filenamesFromBucketAndObjectPrefix()
: Interacts with GCS to list all object names (filenames) under the specified bucket and prefix. It uses GCS client library features to efficiently retrieve only the names, minimizing data transfer.samplesFromFilenames()
: Manages the concurrent processing of benchmark files. It creates a channel (gcsFilenameChannel
) to distribute filenames to a pool of worker goroutines (workerPoolSize
). An errgroup
is used to manage these goroutines and propagate any errors. A mutex protects the shared samples
slice where results from workers are aggregated. This concurrent design is crucial for performance when dealing with a large number of benchmark files.traceInfoFromFilename()
: This function is executed by each worker goroutine. It takes a single GCS filename, reads the corresponding object from the bucket, parses the JSON content using format.ParseLegacyFormat
(from perf/go/ingest/format
) and parser.GetSamplesFromLegacyFormat
(from perf/go/ingest/parser
). For each trace that matches the traceFilter
(a query.Query
object from go/query
), it sorts the sample values, calculates the median (using stats.Sample.Quantile
from go-moremath/stats
) and minimum, and then computes their ratio. The use of established libraries for parsing and statistical calculation ensures correctness and leverages existing, tested code.writeCSV()
: Formats the processed sampleInfo
data into CSV format and writes it to the designated output writer. It includes a header row and then iterates through the sampleInfo
slice, writing each entry. It also handles the --top
flag to limit the number of output rows.sampleInfo
: A simple struct to hold the calculated statistics (trace ID, median, min, ratio) for a single benchmark trace's samples.sampleInfoSlice
: A helper type that implements sort.Interface
to allow sorting sampleInfo
slices by the ratio
field in descending order. This is key to presenting the most variant traces first.main_test.go
: Contains unit tests for the writeCSV
function. These tests verify that the CSV output is correctly formatted under different conditions, such as when writing all samples, a limited number of top samples, or when the number of samples is less than the requested top N. This ensures the output formatting logic is robust.
The design decision to use a worker pool (workerPoolSize
) for processing files in parallel significantly speeds up the analysis, especially when dealing with numerous benchmark result files often found in GCS. The use of golang.org/x/sync/errgroup
simplifies error handling in concurrent operations. Filtering capabilities (via the --filter
flag and go/query
) allow users to narrow down the analysis to specific subsets of benchmarks, making the tool more flexible and targeted. The output as a CSV file makes it easy to import the results into spreadsheets or other data analysis tools for further examination.
The /scripts
module provides tooling to support the data ingestion pipeline for Skia Perf. The primary focus is on automating the process of transferring processed data to the designated cloud storage location for further analysis and visualization within the Skia performance monitoring system.
The key responsibility of this module is to ensure reliable and timely delivery of performance data. This is achieved by interacting with Google Cloud Storage (GCS) using the gsutil
command-line tool.
The main component within this module is the upload_extracted_json_files.sh
script.
upload_extracted_json_files.sh
This shell script is responsible for uploading JSON files, which are assumed to be the output of a preceding data extraction or processing phase, to a specific Google Cloud Storage bucket (gs://skia-perf/nano-json-v1/
).
Design Rationale and Implementation Details:
gsutil
? gsutil
is the standard command-line tool for interacting with Google Cloud Storage. It provides robust features for uploading, downloading, and managing data in GCS buckets.-m
(parallel uploads)? The -m
flag in gsutil cp
enables parallel uploads. This is a crucial performance optimization, especially when dealing with a potentially large number of JSON files. By uploading multiple files concurrently, the overall time taken for the transfer is significantly reduced.cp -r
(recursive copy)? The -r
flag ensures that the entire directory structure under downloads/
is replicated in the destination GCS path. This is important for maintaining the organization of the data and potentially for downstream processing that might rely on the file paths.gs://skia-perf/nano-json-v1/$(date -u --date +1hour +%Y/%m/%d/%H)
)?gs://skia-perf/nano-json-v1/
: This is the base path in the GCS bucket designated for “nano” format JSON files, version 1. This structured naming helps in organizing different types and versions of data within the bucket.$(date -u --date +1hour +%Y/%m/%d/%H)
: This part dynamically generates a timestamped subdirectory structure.date -u
: Ensures the date is in UTC, providing a consistent timezone regardless of where the script is run.--date +1hour
: This is a deliberate choice to place the data into the next hour's ingestion slot. This likely provides a buffer, ensuring that all data generated within a given hour is reliably captured and processed for that hour, even if the script runs slightly before or after the hour boundary. It helps prevent data from being missed or attributed to the wrong time window due to minor timing discrepancies in script execution.+%Y/%m/%d/%H
: Formats the date and time into a hierarchical path (e.g., 2023/10/27/15
). This organization is beneficial for:Workflow:
The script executes a simple, linear workflow:
downloads/
directory in the current working directory as the source of JSON files. [Local Filesystem] | ./downloads/ (contains *.json files)
YYYY/MM/DD/HH
. date command ---> YYYY/MM/DD/HH (e.g., 2023/10/27/15) | Target GCS Path: gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH/
gsutil
to recursively copy all contents from downloads/
to the generated GCS path, utilizing parallel uploads for efficiency. ./downloads/* ---(gsutil -m cp -r)---> gs://skia-perf/nano-json-v1/YYYY/MM/DD/HH/
This script assumes that the downloads/
directory exists in the location where the script is executed and contains the JSON files ready for upload. It also presumes that the user running the script has the necessary gsutil
tool installed and configured with appropriate permissions to write to the specified GCS bucket.
The /secrets
module is responsible for managing the creation and configuration of secrets required for various Skia Perf services to operate. These secrets primarily involve Google Cloud service accounts and OAuth credentials for email sending. The scripts in this module automate the setup of these credentials, ensuring that services have the necessary permissions to interact with Google Cloud APIs and other resources.
The design philosophy emphasizes secure and automated credential management. Instead of manual creation and configuration of secrets, these scripts provide a repeatable and version-controlled way to provision them. This reduces the risk of human error and ensures that services are configured with the principle of least privilege. For instance, service accounts are granted only the specific roles they need to perform their tasks.
1. Service Account Creation Scripts:
create-flutter-perf-service-account.sh
: This script provisions a Google Cloud service account specifically for the Flutter Perf instance. It leverages a common script (../../kube/secrets/add-service-account.sh
) to handle the underlying gcloud
commands.
add-service-account.sh
script, passing in parameters like the project ID, the desired service account name (“flutter-perf-service-account”), a descriptive display name, and the necessary IAM roles (roles/pubsub.editor
, roles/cloudtrace.agent
).create-perf-cockroachdb-backup-service-account.sh
: This script creates a dedicated service account for the Perf CockroachDB backup cronjob.
../../kube/secrets/add-service-account.sh
. It specifies the service account name (“perf-cockroachdb-backup”) and the roles/storage.objectAdmin
role, which grants permissions to manage objects in Cloud Storage buckets.create-perf-ingest-sa.sh
: This script is responsible for creating the perf-ingest
service account. This account is used by the Perf ingestion service, which processes and stores performance data.
gs://skia-perf
, gs://cluster-telemetry-perf
). A dedicated service account with these precise permissions is crucial for security and operational clarity. It also leverages Workload Identity, a more secure way for Kubernetes workloads to access Google Cloud services.../kube/config.sh
) and utility functions (../bash/ramdisk.sh
) for environment setup.perf-ingest
) using gcloud iam service-accounts create
.roles/pubsub.editor
: To publish messages to Pub/Sub.roles/cloudtrace.agent
: To send trace data.default/perf-ingest
in the skia-public
namespace) to the Google Cloud service account. This allows pods running as perf-ingest
in Kubernetes to impersonate the perf-ingest
Google Cloud service account without needing to mount service account key files directly. Kubernetes Pod (default/perf-ingest) ----> Impersonates ----> Google Cloud SA (perf-ingest@skia-public.iam.gserviceaccount.com) | +----> Accesses GCP Resources (Pub/Sub, Cloud Trace, GCS)
objectViewer
permissions on specific GCS buckets using gsutil iam ch
.perf-ingest.json
).perf-ingest
from this key file using kubectl create secret generic
. This secret can then be used by deployments that might not be able to use Workload Identity directly or for other specific use cases./tmp/ramdisk
) to avoid leaving sensitive key files on persistent storage.create-perf-sa.sh
: This script creates the primary skia-perf
service account. This is a general-purpose service account for the main Perf application.
gs://skia-perf
bucket. Similar to perf-ingest
, this service account uses Workload Identity for enhanced security when running within Kubernetes.create-perf-ingest-sa.sh
:skia-perf
service account.roles/cloudtrace.agent
and roles/pubsub.editor
.default/skia-perf
) to the skia-perf
Google Cloud service account.objectViewer
on the gs://skia-perf
GCS bucket.skia-perf
.2. Email Secrets Creation:
create-email-secrets.sh
: This script facilitates the creation of Kubernetes secrets necessary for Perf to send emails via Gmail. This typically involves an OAuth 2.0 flow.alertserver@skia.org
).alertserver-skia-org
).client_secret.json
file (obtained from the Google Cloud Console after enabling the Gmail API and creating OAuth 2.0 client credentials) to /tmp/ramdisk
.three_legged_flow
Go program (which must be built and installed separately from ../go/email/three_legged_flow
). This program initiates the OAuth 2.0 three-legged authentication flow. User Action: Run three_legged_flow --> Browser opens for Google Auth --> User authenticates as specified email | v three_legged_flow generates client_token.json
client_token.json
(containing the authorization token and refresh token) is generated in /tmp/ramdisk
, the script uses kubectl create secret generic
to create a Kubernetes secret named perf-${EMAIL}-secrets
. This secret contains both client_secret.json
and client_token.json
.client_token.json
file from the local filesystem because it contains a sensitive refresh token. The source of truth for this token becomes the Kubernetes secret./tmp/ramdisk
ensures that sensitive downloaded and generated files are stored in memory and are less likely to be inadvertently persisted.The common pattern across these scripts is the use of gcloud
for Google Cloud resource management and kubectl
for interacting with Kubernetes to store the secrets. The use of a ramdisk for temporary storage of sensitive files like service account keys and OAuth tokens is a security best practice. Workload Identity is preferred for service accounts running in GKE, reducing the need to manage and distribute service account key files.