blob: b1520710b8c7595c5a27ea97823482589fbab69a [file] [view]
# Triage
The new perf needs a new triaging page, one that tracks regressions per
commit, and allows multiple different queries, such as "source_type=skp" to be
run.
## Requirements
Right now alerting is run only on the last 50 commits as one monolithic query
for skps only. The new triaging page should:
- Run clustering with a window of +/- 5 commits on either side for each
commit.
- Only steps that occur at the selected commit will count as a Regression.
- Low and high clusters are stored per commit, per query. (As opposed to by
cluster only in the old system).
- Both low and high cluster regressions are stored, if present.
- Each query and low/high cluster needs to be triaged.
## Design
The UI will look roughly like this:
+----------------------------------+------------------------------------------+
| Commit | Queries |
+-------------------------------------------------------+---------------------+
| | source_type=skp | source_type=svg |
| | sub_result=min_ms | sub_result=min_ms |
| +--------+--------------------+------------+
| | High | Low | High | Low |
| +--------+--------------------+------------+
| 5166651 Fix FreePageCount alert. | | | | |
| 669f2da [task scheduler] Get the | | | | |
| a6ebcd2 Make Status categorize I | | | | |
| 501bc48 [CQ Watcher] Log when a | | | | ? |
| 1b31bb5 [CT] Reduce number of pa | | | | |
| ea80add Add ability to specify c | | | | |
| c844730 Fix alert queries due to | | | | |
+----------------------------------+--------+-----------+--------+------------+
- In the above sketch:
- '?' means an untriaged regression was found.
- '✓' means this change is acceptable.
- '✗' means bug.
- ' ' means sufficient data to cluster, but no regressions found.
- One column for each query, two sub-cols for low and high regressions.
- Queries can be added/removed semi-easily (flags or metadata).
- The queries need to be stored with per-commit data, since they may change
over time, i.e. we may add new queries or stop running old queries.
- Each non-empty query/commit/(low|high) cell has:
- Status:
- untriaged - Analysis found regression, but not yet triaged.
- negative - This is a regression.
- positive - An expected change in behavior, or a noisy cluster.
- A text comment.
- Each low/high cell pops up a cluster-summary2-sk that allows inspecting
the centroid of the regression, the members of the cluster, and triaging
the cluster.
The data stored for each commit of the triage page analysis will be:
// map[query]Regression.
map[string]Regression
type Regression struct {
Low *cluster2.ClusterSummary, // Can be nil.
High *cluster2.ClusterSummary, // Can be nil.
Frame dataframe.FrameResponse,
LowStatus TriageStatus
HighStatus TriageStatus
}
type TriageStatus struct {
Status string
Message string
}
Status is "untriaged", "positive", or "negative".
The alerting analysis will run the following analysis continuously:
The alert system will define a range [last 50 commits].
For each commit:
For each query:
* Do clustering.
* Save results into database for regressions that show up
for that commit.
## UI Model
The map[string]Regression is delivered to the browser as the following
struct to make it easier to use Polymer repeat templates:
{
header: [ "query1", "query2", "query3", ...],
table: [
{ id: cid1, cols: [ Regression, Regression, Regression, ...], },
{ id: cid2, cols: [ Regression, null, Regression, ...], },
{ id: cid3, cols: [ Regression, Regression, Regression, ...], },
]
}
Note that the list of queries is the union of the current queries being run, and
all the queries that appear in all the map[string]Regression's.
The UI for Regression must be able to handle null for a value, which signifies
a cell for which no data exists.