perf/DESIGN.md - buildbot - Git at Google

 DESIGN
 ======

 Overview
 --------
 Provides interactive dashboard for Skia performance data.

 Code Locations
 --------------

 The code for the server along with VM instance setup scripts is kept in:

   * https://skia.googlesource.com/buildbot/+/master/perf/


 Architecture
 ------------

 This is the general flow of data for the Skia performance application.
 The frontend is available at http://skiaperf.com.

 ```

                +-------------+
                |             |
                |   Ingress   |
                |             |
                |             |
                |             |
                +-------------+
                           ^
                           |
               GKE Instance| skia-perf
                           |
                        ---+
                        |
             +----------+-------------+
             |        Perf (Go)       |
             +------------------------+
               ^    ^
               |    |
               |    |
               |    | +--------------------+
               |    | | Perf Ingester (Go) |
               |    | +--+-----------------+
               |    |    |       ^
               |    |    |       |
               v    |    |       |
     +---------+-+  |    | +-----+----+
     | Datastore |  |    | | Google   |
     |           |  |    | | Storage  |
     +-----------+  |    | +----------+
                    |    v
                  +-+--------+
                  |   Tile   |
                  |   Store  |
                  +----------+

 ```

 Perf is a Go application that serves the HTML, CSS, JS and the JSON representations
 that the JS needs. It loads test results in the form of 'tiles' from the Tile Store.
 It combines that data with data about commits and annotations from Google Datastore
 and serves that the UI.

 The Perf Ingester is a separate application that periodically queries for fresh
 data from Google Storage and then writes Traces into the Tile Store. The Tile Store
 is currently implemented on top of Google BigTable.

 Users
 -----

 Users must be logged in to access some content or to make some changes in the
 application, such as changing the status of perf alerts. User authentication
 is handled through OAuth 2.0, in this case specifically tied to the Google
 implementation. Once the OAuth 2.0 permission grant is complete then the users
 email is used as an identifer. The authentication is not stored on the server,
 instead it is stored as a cookie in the browser and verified when
 authentication is needed.

 There are two APIs, one in Go and another in Javascript that are used to
 access the current user and their logged in status:

 In Go the login.LoggedInAs(), see go/login/login.go.

 In Javascript the interface is sk.Login which is a Promise, see
 res/imp/login.html.

 Monitoring
 ----------

 Monitoring of the application is done via Graphite at https://grafana2.skia.org.
 Both system and application level metrics are monitored.


 Clustering
 ----------

 The clustering is done by using k-means clustering over normalized Traces. The
 Traces are normalized by filling in missing data points so that there is a
 data point for every commit, and then scaling the data to have a mean of 0.0
 and a standard deviation of 1.0. See the docs for ctrace.NewFullTrace().

 The distance metric used is Euclidean distance between the traces.

 After clustering is complete we calculate some metrics for each cluster by
 curve fitting a step function to the centroid. We record the location of the
 step, the size of the step, and the least squares error of the curve fit. From
 that data we calculate the "Regression" value, which measures how much like a
 step function the centroid is, and is calculated by:

   Regression = StepSize / LeastSquaresError.


 The better the fit the larger the Regression, because LSE gets smaller
 with a better fit. The higher the Step Size the larger the Regression.

 A cluster is considered "Interesting" if the Regression value is large enough.
 The current cutoff for Interestingness is:

   |Regression| > 150

 Where negative Regression values mean possible regressions, and positive
 values mean possible performance improvement.

 Alerting
 --------

 A dashboard is needed to report clusters that look "Interesting", i.e. could
 either be performance regressions, improvements, or other anomalies. The
 current k-means clustering and calculating the Regression statistic for each
 cluster does a good job of indicating when something Interesting has happened,
 but a more structured system is needed that:

   * Runs the clustering on a periodic basis.
   * Allows flagging of interesting clusters as either ignorable or a bug.
   * Finds clusters that are the same from run to run.

 The last step, finding clusters that are the same, will be done by
 fingerprinting, i.e. use the first 20 traces of each cluster will be used as a
 fingerprint for a cluster. That is, if a new cluster has some (or even one) of
 the same traces as the first 20 traces in an existing cluster, then they are
 the same cluster. Note that we use the first 20 because traces are stored
 sorted on how close they are to the centroid for the cluster.

 Algorithm:
   Run clustering and pick out the "Interesting" clusters.
   Compare all the Interestin clusters to all the existing relevant clusters,
     where "relevant" clusters are ones whose Hash/timestamp of the step
     exists in the current tile.
   Start with an empty "list".
   For each cluster:
     For each relevant existing cluster:
       Take the top 20 keys from the existing cluster and count how many appear
       in the cluster.
     If there are no matches then this is a new cluster, add it to the "list".
     If there are matches, possibly to multiple existing clusters, find the
     existing cluster with the most matches.
       Take the better of the two clusters (old/new) based on the better
       Regression score, i.e. larger |Regression|, and update that in the "list".
   Save all the clusters in the "list" back to the db.

 This algorithm should keep already triaged clusters in their triaged
 state while adding new unique clusters as they appear.

 Example
 ~~~~~~~

 Let's say we have three existing clusters with the following trace ids:

     C[1], C[2], C[3,4]

 And we run clustering and get the followin four new clusters:

     N[1], N[3], N[4], N[5]

 In the end we should end up with the following clusters:

     C[1] or N[1]
     C[2]
     C[3,4] or N[3] or N[4]
     N[5]

 Where the 'or' chooses the cluster with the higher |Regression| value.

 Each unique cluster that's found will be stored in the datastore. The schema
 will be:

     CREATE TABLE clusters (
       id         INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
       ts         TIMESTAMP    NOT NULL,
       hash       TEXT         NOT NULL,
       regression FLOAT        NOT NULL,
       cluster    MEDIUMTEXT   NOT NULL,
       status     TEXT         NOT NULL,
       message    TEXT         NOT NULL
     );

 Where:
   'cluster' is the JSON serialized ClusterSummary struct.
   'ts' is the timestamp of the step in the step function.
   'status' is "New" for a new cluster, "Ignore", or "Bug".
   'hash' is the git hash at the step point.
   'message' is either a note on why this cluster is ignored, or a bug #.

 Note that only the id may remain stable over time. If a new cluster is found
 that matches the fingerprint of an exisiting cluster, but has a higher
 regression value, than the new cluster values will be written into the
 'clusters' table, including the ts, hash, and regression values.

 ~~~~~~~

 Trace IDs
 ---------

 Normal Trace IDs are of the form:

     ,key=value,key2=value2,

 See go/query for more details on structured keys.

 There are two other forms of trace ids:

   * Formula traces - A formula trace contains a formula to be evaluated which
     may generate either a single Formula trace that is added to the plot, such
     as ave(), or it may generate multiple calculated traces that are added to
     the plot, such as norm(). Note that formula traces are stored in shortcuts
     and added to plots even if it contains no data.

     Formula traces have IDs that begin with @. For example:

       norm(filter("config=8888"))

         or

       norm(filter("#54"))

 Installation
 ------------
 See the README file.

 Ingestion
 ---------

 Ingestion is now event driven, using PubSub events from GCS as files
 are written. The naming convention for those PubSub topics is:

     <app name>-<function>-<instance>

 For example, for Perf ingestion of Skia data the topic will be:

     perf-ingestion-skia
	DESIGN
	======

	Overview
	--------
	Provides interactive dashboard for Skia performance data.

	Code Locations
	--------------

	The code for the server along with VM instance setup scripts is kept in:

	* https://skia.googlesource.com/buildbot/+/master/perf/


	Architecture
	------------

	This is the general flow of data for the Skia performance application.
	The frontend is available at http://skiaperf.com.

	```

	+-------------+
	\| \|
	\| Ingress \|
	\| \|
	\| \|
	\| \|
	+-------------+
	^
	\|
	GKE Instance\| skia-perf
	\|
	---+
	\|
	+----------+-------------+
	\| Perf (Go) \|
	+------------------------+
	^ ^
	\| \|
	\| \|
	\| \| +--------------------+
	\| \| \| Perf Ingester (Go) \|
	\| \| +--+-----------------+
	\| \| \| ^
	\| \| \| \|
	v \| \| \|
	+---------+-+ \| \| +-----+----+
	\| Datastore \| \| \| \| Google \|
	\| \| \| \| \| Storage \|
	+-----------+ \| \| +----------+
	\| v
	+-+--------+
	\| Tile \|
	\| Store \|
	+----------+

	```

	Perf is a Go application that serves the HTML, CSS, JS and the JSON representations
	that the JS needs. It loads test results in the form of 'tiles' from the Tile Store.
	It combines that data with data about commits and annotations from Google Datastore
	and serves that the UI.

	The Perf Ingester is a separate application that periodically queries for fresh
	data from Google Storage and then writes Traces into the Tile Store. The Tile Store
	is currently implemented on top of Google BigTable.

	Users
	-----

	Users must be logged in to access some content or to make some changes in the
	application, such as changing the status of perf alerts. User authentication
	is handled through OAuth 2.0, in this case specifically tied to the Google
	implementation. Once the OAuth 2.0 permission grant is complete then the users
	email is used as an identifer. The authentication is not stored on the server,
	instead it is stored as a cookie in the browser and verified when
	authentication is needed.

	There are two APIs, one in Go and another in Javascript that are used to
	access the current user and their logged in status:

	In Go the login.LoggedInAs(), see go/login/login.go.

	In Javascript the interface is sk.Login which is a Promise, see
	res/imp/login.html.

	Monitoring
	----------

	Monitoring of the application is done via Graphite at https://grafana2.skia.org.
	Both system and application level metrics are monitored.


	Clustering
	----------

	The clustering is done by using k-means clustering over normalized Traces. The
	Traces are normalized by filling in missing data points so that there is a
	data point for every commit, and then scaling the data to have a mean of 0.0
	and a standard deviation of 1.0. See the docs for ctrace.NewFullTrace().

	The distance metric used is Euclidean distance between the traces.

	After clustering is complete we calculate some metrics for each cluster by
	curve fitting a step function to the centroid. We record the location of the
	step, the size of the step, and the least squares error of the curve fit. From
	that data we calculate the "Regression" value, which measures how much like a
	step function the centroid is, and is calculated by:

	Regression = StepSize / LeastSquaresError.


	The better the fit the larger the Regression, because LSE gets smaller
	with a better fit. The higher the Step Size the larger the Regression.

	A cluster is considered "Interesting" if the Regression value is large enough.
	The current cutoff for Interestingness is:

	\|Regression\| > 150

	Where negative Regression values mean possible regressions, and positive
	values mean possible performance improvement.

	Alerting
	--------

	A dashboard is needed to report clusters that look "Interesting", i.e. could
	either be performance regressions, improvements, or other anomalies. The
	current k-means clustering and calculating the Regression statistic for each
	cluster does a good job of indicating when something Interesting has happened,
	but a more structured system is needed that:

	* Runs the clustering on a periodic basis.
	* Allows flagging of interesting clusters as either ignorable or a bug.
	* Finds clusters that are the same from run to run.

	The last step, finding clusters that are the same, will be done by
	fingerprinting, i.e. use the first 20 traces of each cluster will be used as a
	fingerprint for a cluster. That is, if a new cluster has some (or even one) of
	the same traces as the first 20 traces in an existing cluster, then they are
	the same cluster. Note that we use the first 20 because traces are stored
	sorted on how close they are to the centroid for the cluster.

	Algorithm:
	Run clustering and pick out the "Interesting" clusters.
	Compare all the Interestin clusters to all the existing relevant clusters,
	where "relevant" clusters are ones whose Hash/timestamp of the step
	exists in the current tile.
	Start with an empty "list".
	For each cluster:
	For each relevant existing cluster:
	Take the top 20 keys from the existing cluster and count how many appear
	in the cluster.
	If there are no matches then this is a new cluster, add it to the "list".
	If there are matches, possibly to multiple existing clusters, find the
	existing cluster with the most matches.
	Take the better of the two clusters (old/new) based on the better
	Regression score, i.e. larger \|Regression\|, and update that in the "list".
	Save all the clusters in the "list" back to the db.

	This algorithm should keep already triaged clusters in their triaged
	state while adding new unique clusters as they appear.

	Example
	~~~~~~~

	Let's say we have three existing clusters with the following trace ids:

	C[1], C[2], C[3,4]

	And we run clustering and get the followin four new clusters:

	N[1], N[3], N[4], N[5]

	In the end we should end up with the following clusters:

	C[1] or N[1]
	C[2]
	C[3,4] or N[3] or N[4]
	N[5]

	Where the 'or' chooses the cluster with the higher \|Regression\| value.

	Each unique cluster that's found will be stored in the datastore. The schema
	will be:

	CREATE TABLE clusters (
	id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
	ts TIMESTAMP NOT NULL,
	hash TEXT NOT NULL,
	regression FLOAT NOT NULL,
	cluster MEDIUMTEXT NOT NULL,
	status TEXT NOT NULL,
	message TEXT NOT NULL
	);

	Where:
	'cluster' is the JSON serialized ClusterSummary struct.
	'ts' is the timestamp of the step in the step function.
	'status' is "New" for a new cluster, "Ignore", or "Bug".
	'hash' is the git hash at the step point.
	'message' is either a note on why this cluster is ignored, or a bug #.

	Note that only the id may remain stable over time. If a new cluster is found
	that matches the fingerprint of an exisiting cluster, but has a higher
	regression value, than the new cluster values will be written into the
	'clusters' table, including the ts, hash, and regression values.

	~~~~~~~

	Trace IDs
	---------

	Normal Trace IDs are of the form:

	,key=value,key2=value2,

	See go/query for more details on structured keys.

	There are two other forms of trace ids:

	* Formula traces - A formula trace contains a formula to be evaluated which
	may generate either a single Formula trace that is added to the plot, such
	as ave(), or it may generate multiple calculated traces that are added to
	the plot, such as norm(). Note that formula traces are stored in shortcuts
	and added to plots even if it contains no data.

	Formula traces have IDs that begin with @. For example:

	norm(filter("config=8888"))

	or

	norm(filter("#54"))

	Installation
	------------
	See the README file.

	Ingestion
	---------

	Ingestion is now event driven, using PubSub events from GCS as files
	are written. The naming convention for those PubSub topics is:

	<app name>-<function>-<instance>

	For example, for Perf ingestion of Skia data the topic will be:

	perf-ingestion-skia