blob: bb93588f8f2926b610f2214a1e88b9ad404d4389 [file] [log] [blame] [view]
Triage
======
The new perf needs a new triaging page, one that tracks regressions per
commit, and allows multiple different queries, such as "source_type=skp" to be
run.
Requirements
------------
Right now alerting is run only on the last 50 commits as one monolithic query
for skps only. The new triaging page should:
* Run clustering with a window of +/- 5 commits on either side for each
commit.
* Only steps that occur at the selected commit will count as a Regression.
* Low and high clusters are stored per commit, per query. (As opposed to by
cluster only in the old system).
* Both low and high cluster regressions are stored, if present.
* Each query and low/high cluster needs to be triaged.
Design
------
The UI will look roughly like this:
+----------------------------------+------------------------------------------+
| Commit | Queries |
+-------------------------------------------------------+---------------------+
| | source_type=skp | source_type=svg |
| | sub_result=min_ms | sub_result=min_ms |
| +--------+--------------------+------------+
| | High | Low | High | Low |
| +--------+--------------------+------------+
| 5166651 Fix FreePageCount alert. | | | | |
| 669f2da [task scheduler] Get the | | | | |
| a6ebcd2 Make Status categorize I | | | | |
| 501bc48 [CQ Watcher] Log when a | | | | ? |
| 1b31bb5 [CT] Reduce number of pa | | | | |
| ea80add Add ability to specify c | | | | |
| c844730 Fix alert queries due to | | | | |
+----------------------------------+--------+-----------+--------+------------+
* In the above sketch:
* '?' means an untriaged regression was found.
* '✓' means this change is acceptable.
* '✗' means bug.
* ' ' means sufficient data to cluster, but no regressions found.
* One column for each query, two sub-cols for low and high regressions.
* Queries can be added/removed semi-easily (flags or metadata).
* The queries need to be stored with per-commit data, since they may change
over time, i.e. we may add new queries or stop running old queries.
* Each non-empty query/commit/(low|high) cell has:
* Status:
* untriaged - Analysis found regression, but not yet triaged.
* negative - This is a regression.
* positive - An expected change in behavior, or a noisy cluster.
* A text comment.
* Each low/high cell pops up a cluster-summary2-sk that allows inspecting
the centroid of the regression, the members of the cluster, and triaging
the cluster.
The data stored for each commit of the triage page analysis will be:
// map[query]Regression.
map[string]Regression
type Regression struct {
Low *cluster2.ClusterSummary, // Can be nil.
High *cluster2.ClusterSummary, // Can be nil.
Frame dataframe.FrameResponse,
LowStatus TriageStatus
HighStatus TriageStatus
}
type TriageStatus struct {
Status string
Message string
}
Status is "untriaged", "positive", or "negative".
The alerting analysis will run the following analysis continuously:
The alert system will define a range [last 50 commits].
For each commit:
For each query:
* Do clustering.
* Save results into database for regressions that show up
for that commit.
UI Model
--------
The map[string]Regression is delivered to the browser as the following
struct to make it easier to use Polymer repeat templates:
{
header: [ "query1", "query2", "query3", ...],
table: [
{ id: cid1, cols: [ Regression, Regression, Regression, ...], },
{ id: cid2, cols: [ Regression, null, Regression, ...], },
{ id: cid3, cols: [ Regression, Regression, Regression, ...], },
]
}
Note that the list of queries is the union of the current queries being run, and
all the queries that appear in all the map[string]Regression's.
The UI for Regression must be able to handle null for a value, which signifies
a cell for which no data exists.