
 DESIGN
 ======


 Overview
 --------
 Provides an interactive dashboard for Skia performance data.

 Code Locations
 --------------

 The code for the server along with VM instance setup scripts is kept in:

   * https://skia.googlesource.com/buildbot/+/master/perf/server


 Architecture
 ------------

 This is the general flow of data for the Skia performance application.
 The frontend is available at http://skiaperf.com.


                +-------------+
                |             |
                |   Browser   |
                |             |
                |             |
                |             |
                +----------^--+
                           |
           +--------------------+----+-----+
           |   GCE Instance|skia+perf+b    |
           |               |               |
           |   +-----------+----------+    |
           |   |     Squid3           |    |
           |   |                      |    |
           |   +--------^-------------+    |
           |            |                  |
           | +----------+-------------+    |
           | |        Perf (Go)       |    |
           | | ^    ^                 |    |
           | +------------------------+    |
           |   |    |                      |
           |   |    |                      |
           |   |    | +------------------+ |
           |   |    | |Tile Pipeline (Go)| |
           |   |    | |            ^     | |
           |   |    | +--+---------------+ |
           |   |    |    |         |       |
           +-------------------------------+
               |    |    |         |
     +---------+-+  |    | +-------+--+
     |   MySQL   |  |    | | BigQuery |
     |           |  |    | |          |
     |           |  |    | |          |
     |           |  |    | |          |
     |           |  |    | |          |
     |           |  |    | |          |
     +-----------+  |    | +----------+
                    |    |
                  +-+----v---+
                  |   Tile   |
                  |   Repo   |
                  |          |
                  |          |
                  |          |
                  |          |
                  +----------+

 Perf is a Go application that serves the HTML, CSS, JS and the JSON representations
 that the JS needs. It loads test results in the form of 'tiles' from the Tile Repo.
 It combines that data with data about commits and annotations from the MySQL database
 and serves the result to the UI.

 The Tile Pipeline is a separate application that periodically queries for fresh
 data from BigQuery and then writes Tiles into the Tile Repo. Note that when
 ingestion moves out of prod and into the same server we can do the tile updates
 immediately after ingestion is done.

 The Tile Repo will be represented internally as an interface. The first
 implementation will be files on the local disk, in a directory tree that
 contains gzipped JSON files called tiles. Note that we may alternatively use Go
 native gob encoded files and just transform them into JSON when serving them to
 the UI.
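
 As a rough illustration of the interface idea above, here is a minimal sketch
 in Go. The names Tile, TileStore, EncodeTile, DecodeTile and memStore, and the
 fields of Tile, are all assumptions made for this example; the actual tile
 schema and interface are not specified in this document. The in-memory store
 stands in for the planned on-disk implementation so the example runs without
 touching the file system.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// Tile is a hypothetical in-memory form of one tile. The real schema is not
// defined in this document.
type Tile struct {
	Scale  int                  `json:"scale"`
	Number int                  `json:"number"`
	Traces map[string][]float64 `json:"traces"`
}

// TileStore is an assumed shape for the Tile Repo interface.
type TileStore interface {
	Get(dataset string, scale, number int) (*Tile, error)
	Put(dataset string, t *Tile) error
}

// EncodeTile serializes a tile as gzipped JSON.
func EncodeTile(t *Tile) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(t); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data into buf.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// DecodeTile reverses EncodeTile.
func DecodeTile(data []byte) (*Tile, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var t Tile
	if err := json.NewDecoder(zr).Decode(&t); err != nil {
		return nil, err
	}
	return &t, nil
}

// memStore keeps encoded tiles in a map keyed by their repo-relative path.
type memStore map[string][]byte

// Compile-time check that memStore satisfies TileStore.
var _ TileStore = memStore{}

func tileKey(dataset string, scale, number int) string {
	return fmt.Sprintf("%s/%d/%d.gz", dataset, scale, number)
}

func (m memStore) Put(dataset string, t *Tile) error {
	data, err := EncodeTile(t)
	if err != nil {
		return err
	}
	m[tileKey(dataset, t.Scale, t.Number)] = data
	return nil
}

func (m memStore) Get(dataset string, scale, number int) (*Tile, error) {
	key := tileKey(dataset, scale, number)
	data, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("tile %s not found", key)
	}
	return DecodeTile(data)
}

func main() {
	store := memStore{}
	tile := &Tile{Scale: 0, Number: 0, Traces: map[string][]float64{"example_trace": {1.5, 1.7}}}
	if err := store.Put("skps", tile); err != nil {
		panic(err)
	}
	got, err := store.Get("skps", 0, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got.Traces["example_trace"]), "points read back")
}
```

 Keeping the encoding behind EncodeTile and DecodeTile means a switch from
 gzipped JSON to gob would not change the callers of the store.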

 Each tile contains exactly 16 points of every trace for a dataset.  The one
 exception is the last.gz tile, which may contain fewer than 16 points; see
 below for an explanation of that.  The Tile Repo directory structure is:

     $TILE_REPO_ROOT/<dataset>/<scale>/<tilenumber>.gz

 Where:

   * dataset = {skps|micro}
   * scale = 0..5 The scale factor of 4^N, so points in the /0/ directory
                  represent 1:1 with test results, while tiles in the /1/
                  directory have every fourth commit with data, and /2/
                  has every 16th commit with data.
   * tilenumber = The number of the tile, at the given scale, starting at BOT
                  (Beginning of Time).

 So the data in:

     /skps/0/0.gz

 contains the data for the first 16 commits from the BOT that have test data.


     /micro/3/6.gz

 contains 16 points per trace, sampled from every 64th commit that has data, and
 is the 7th tile in that order. Each point falls between commit
 6 * 64 * 16 = 6144 and commit 7 * 64 * 16 = 7168, i.e. it is a slice of 16
 commits that represents a range of 1024 commits.
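
 The arithmetic above can be sketched in Go as follows. The function names are
 assumptions made for this example, and commit indices here count only commits
 that have test data, starting from 0 at the BOT.

```go
package main

import "fmt"

// pointsPerTile is the number of points a complete tile holds per trace.
const pointsPerTile = 16

// CommitsPerPoint returns 4^scale, the sampling stride at the given scale.
func CommitsPerPoint(scale int) int {
	return 1 << (2 * uint(scale))
}

// TileCommitRange returns the half-open range [begin, end) of commit indices
// covered by the given tile.
func TileCommitRange(scale, tileNumber int) (begin, end int) {
	span := pointsPerTile * CommitsPerPoint(scale)
	return tileNumber * span, (tileNumber + 1) * span
}

// TilePath builds the tile's path relative to $TILE_REPO_ROOT.
func TilePath(dataset string, scale, tileNumber int) string {
	return fmt.Sprintf("/%s/%d/%d.gz", dataset, scale, tileNumber)
}

func main() {
	begin, end := TileCommitRange(3, 6)
	// Prints: /micro/3/6.gz 6144 7168
	fmt.Println(TilePath("micro", 3, 6), begin, end)
}
```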

 A manifest file will be available that gives the commit timestamp ranges for
 each tile.

 When navigating the UI users can select the tiles they are looking at (<, >)
 and also change the scaling factor that they are looking at (+,-).

 Each /dataset/scale directory also contains a file, last.gz, that contains the
 most recent data, from 1 to 16 points. The Tile Pipeline process will update
 the last.gz files in each directory and write new tile files as new data
 arrives. last.gz will contain the ID of the tile that appears just before it,
 so that the UI can request /skps/0/last.gz and know how to proceed from
 there.

 URL Structure
 -------------

 The URL structure for retrieving Datasets is TBD.


 Navigating
 ----------

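 For each point, if the user wants to zoom out, add 1 to the scale and divide
 the tile number by four, rounding down, since a tile at each scale covers four
 times as many commits as a tile at the scale below it. To zoom in, subtract 1
 from the scale and multiply the tile number by four.  To move forwards or
 backwards in time add 1 to or subtract 1 from the tile number. The actual UI
 mechanisms for navigating around traces are TBD; this is just a description of
 how the tiles are arranged.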
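
 A minimal sketch of this tile arithmetic in Go follows. It assumes the 4^N
 scale factor described under Architecture, where one tile at scale N+1 is a
 sampling of four tiles at scale N (as with /1/0.gz and /0/0.gz through
 /0/3.gz), so moving between scales changes the tile number by a factor of
 four. The TileRef type and its methods are hypothetical names for this
 example, and zooming in is assumed to land on the first of the four child
 tiles.

```go
package main

import "fmt"

const (
	minScale       = 0
	maxScale       = 5 // scale = 0..5
	tilesPerParent = 4 // one tile at scale N+1 spans four tiles at scale N
)

// TileRef identifies a tile within one dataset.
type TileRef struct {
	Scale  int
	Number int
}

// ZoomOut moves one scale up. It reports false at the coarsest scale.
func (t TileRef) ZoomOut() (TileRef, bool) {
	if t.Scale >= maxScale {
		return t, false
	}
	return TileRef{Scale: t.Scale + 1, Number: t.Number / tilesPerParent}, true
}

// ZoomIn moves one scale down, to the first of the four child tiles.
// It reports false at the finest scale.
func (t TileRef) ZoomIn() (TileRef, bool) {
	if t.Scale <= minScale {
		return t, false
	}
	return TileRef{Scale: t.Scale - 1, Number: t.Number * tilesPerParent}, true
}

// Step moves forwards (delta > 0) or backwards (delta < 0) in time.
// It reports false if the move would go before the Beginning of Time.
func (t TileRef) Step(delta int) (TileRef, bool) {
	n := t.Number + delta
	if n < 0 {
		return t, false
	}
	return TileRef{Scale: t.Scale, Number: n}, true
}

func main() {
	start := TileRef{Scale: 0, Number: 27}
	out, _ := start.ZoomOut()
	back, _ := out.ZoomIn()
	fmt.Println(start, out, back)
}
```

 Note that zooming out and back in does not necessarily return to the starting
 tile, because the division rounds down.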


 Tile Pipeline Algorithm
 -----------------------
 Start at /0/last.gz and find the previous tilenumber.gz, open that and find
 the last githash. Query for all data newer than that githash.  Group into tiles
 of 16 githashes. Put the remainder (or the last 16 if the remainder is 0) into
 /0/last.gz.

 Now do the same for /1/, but the data comes from /0/ and not from BigQuery.
 I.e.  /1/0.gz is just a sampling of /0/0.gz, /0/1.gz, /0/2.gz and /0/3.gz.
 At each level only rewrite last.gz and write out any new complete tiles as
 they are filled.
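
 The grouping and sampling steps above might look like the following sketch in
 Go. It makes two assumptions that this document does not settle: when the
 remainder is 0, the final 16 githashes go only into last.gz and are not also
 written as a numbered tile, and sampling for the next scale keeps the first of
 every four points. The function names are hypothetical.

```go
package main

import "fmt"

const (
	tileSize     = 16 // githashes per complete tile
	sampleStride = 4  // points at scale N per point at scale N+1
)

// SplitIntoTiles groups githashes, ordered oldest first, into complete tiles
// plus the contents of last.gz. last holds the remainder, or the final 16
// githashes when the remainder is 0 (assumption: those 16 are not also
// returned as a complete tile).
func SplitIntoTiles(hashes []string) (complete [][]string, last []string) {
	if len(hashes) == 0 {
		return nil, nil
	}
	lastLen := len(hashes) % tileSize
	if lastLen == 0 {
		lastLen = tileSize
	}
	cut := len(hashes) - lastLen // always a multiple of tileSize
	for i := 0; i < cut; i += tileSize {
		complete = append(complete, hashes[i:i+tileSize])
	}
	return complete, hashes[cut:]
}

// Downsample keeps every fourth githash, starting with the first, to produce
// the input for the next scale up.
func Downsample(hashes []string) []string {
	var out []string
	for i := 0; i < len(hashes); i += sampleStride {
		out = append(out, hashes[i])
	}
	return out
}

// fakeHashes returns n placeholder githashes for demonstration.
func fakeHashes(n int) []string {
	hashes := make([]string, n)
	for i := range hashes {
		hashes[i] = fmt.Sprintf("hash%04d", i)
	}
	return hashes
}

func main() {
	scale0 := fakeHashes(35)
	complete, last := SplitIntoTiles(scale0)
	fmt.Println(len(complete), "complete tiles,", len(last), "points in last.gz")

	// The next scale is built from the scale 0 data, not from BigQuery.
	scale1 := Downsample(scale0)
	fmt.Println(len(scale1), "points available for scale 1")
}
```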


 Perf Stats Database
 -------------------

 The data for the performance metrics are kept in the BigQuery tables stored
 in the google.com:chrome-skia project. Note that this is a different project
 from where the data is accessed, which is by VM instances running under
 the google.com:skia-buildbots project. For this to work the service account
 email of the VM needs to be added to the permissions group of the
 google.com:chrome-skia project. If this isn't done then the BigQuery access
 will fail with a 403 error.


 Logs
 ----

 We use https://github.com/golang/glog for logging, which puts Google style
 Error, Warning and Info logs in /tmp on the server under the 'perf' account.


 Debugging Tips
 --------------

 Starting the application is done via /etc/init.d/perf which does the
 backgrounding itself via start-stop-daemon, which means that if the app
 crashes when first starting then nothing will make it to the logs. To debug
 the cause in that case edit /etc/init.d/perf and remove the --background
 flag and then run:

   $ sudo /etc/init.d/perf start

 And you should get stdout and stderr output.

 Monitoring
 ----------

 Monitoring of the application is done via Graphite at http://skiamonitor.com.
 Both system and application level metrics are monitored.


 Annotations Database
 --------------------

 A Cloud SQL (a cloud version of MySQL) database is used to keep information on
 Skia git revisions and their corresponding annotations. The database will be
 updated when users add/edit/delete annotations via the dashboard UI.

 All passwords for MySQL are stored in valentine (search "skia perf").

 To connect to the database from an authorized network (including skia-perf GCE):

     $ mysql -h 173.194.104.24 -u root -p

 Initial setup of the database, the users, and the tables:

     CREATE DATABASE skia;
     USE skia;
     CREATE USER 'readonly'@'%' IDENTIFIED BY <password in valentine>;
     GRANT SELECT ON *.* TO 'readonly'@'%';
     CREATE USER 'readwrite'@'%' IDENTIFIED BY <password in valentine>;
     GRANT SELECT, DELETE, UPDATE, INSERT ON *.* TO 'readwrite'@'%';

     -- Table for storing annotations.
     CREATE TABLE notes (
       id     INT       NOT NULL AUTO_INCREMENT PRIMARY KEY,
       type   TINYINT,
       author TEXT,
       notes  TEXT      NOT NULL
     );

     -- Table for storing git revision information.
     CREATE TABLE githash (
       githash   VARCHAR(40)   NOT NULL PRIMARY KEY,
       ts        TIMESTAMP     NOT NULL,
       gitnumber INT           NOT NULL,
       author    TEXT          NOT NULL,
       message   TEXT          NOT NULL
     );

     -- Table for mapping revisions and annotations. This supports many-to-many
     -- mapping.
     CREATE TABLE githashnotes (
       githash VARCHAR(40)  NOT NULL,
       ts      TIMESTAMP    NOT NULL,
       id      INT          NOT NULL,

       FOREIGN KEY (githash) REFERENCES githash(githash),
       FOREIGN KEY (id) REFERENCES notes(id)
     );

     CREATE TABLE shortcuts (
       id      INT             NOT NULL AUTO_INCREMENT PRIMARY KEY,
       traces  MEDIUMTEXT      NOT NULL
     );

 Common queries that the dashboard will use:

     INSERT INTO notes (type, author, notes) VALUES (1, 'bsalomon', 'Alert!');

     SELECT LAST_INSERT_ID();

     INSERT INTO githashnotes (githash, ts, id) VALUES (<githash>, <githash_ts>, <last_insert_id>);

 The above set of commands will usually be used together to add new annotations
 and associate them with corresponding git commits. The commands below remove an
 annotation and its associations with any commit.

     DELETE FROM githashnotes WHERE id = <id_to_delete>;

     DELETE FROM notes WHERE id = <id_to_delete>;

 Since the data size is relatively small, the dashboard server can keep a copy of
 all recent commit info (e.g., for constructing a "blamelist"), annotations, and
 their many-to-many relationship for use in the context.

 The password for the database will be stored in the instance metadata. To see the
 current password stored in metadata and the fingerprint:

     gcutil --project=google.com:skia-buildbots getinstance [skia-perf GCE instance]

 To set the mysql password that perf is to use:

     gcutil --project=google.com:skia-buildbots setinstancemetadata [skia-perf GCE instance] --metadata=readonly:[password-from-valentine] --metadata=readwrite:[password-from-valentine] --fingerprint=[the metadata fingerprint]


 Startup and config
 ------------------
 The server is started and stopped via:

     sudo /etc/init.d/perf [start|stop|restart]

 But sysv init only handles starting and stopping a program once, so we use
 Monit to monitor the application and restart it if it crashes. The config
 is in:

     /etc/monit/conf.d/perf

 Installation
 ------------
 See the README file.