DESIGN
======
Overview
--------
Provides an interactive dashboard for Skia performance data.
Code Locations
--------------
The code for the server along with VM instance setup scripts is kept in:
* https://skia.googlesource.com/buildbot/+/master/perf/server
Architecture
------------
This is the general flow of data for the Skia performance application.
The frontend is available at http://skiaperf.com.
                                         
             +-------------+             
             |             |             
             |   Browser   |             
             |             |             
             |             |             
             |             |             
             +----------^--+             
                        |                
        +---------------+---------------+
        |   GCE Instance|skia-perf-b    |
        |               |               |
        |   +-----------+----------+    |
        |   |     Squid3           |    |
        |   |                      |    |
        |   +--------^-------------+    |
        |            |                  |
        | +----------+-------------+    |
        | |    Perf (Go)           |    |
        | | ^    ^                 |    |
        | +------------------------+    |
        |   |    |                      |
        |   |    |                      |
        |   |    | +------------------+ |
        |   |    | |Tile Pipeline (Go)| |
        |   |    | |            ^     | |
        |   |    | +--+---------------+ |
        |   |    |    |         |       |
        +-------------------------------+
            |    |    |         |        
  +---------+-+  |    | +-------+--+     
  |   MySQL   |  |    | | BigQuery |     
  |           |  |    | |          |     
  |           |  |    | |          |     
  |           |  |    | |          |     
  |           |  |    | |          |     
  |           |  |    | |          |     
  +-----------+  |    | +----------+     
                 |    |                  
               +-+----v---+              
               |   Tile   |              
               |   Repo   |              
               |          |              
               |          |              
               |          |              
               |          |              
               +----------+              
                                         
Perf is a Go application that serves the HTML, CSS, JS and the JSON representations
that the JS needs. It loads test results in the form of 'tiles' from the Tile Repo.
It combines that data with data about commits and annotations from the MySQL database
and serves the result to the UI.
The Tile Pipeline is a separate application that periodically queries for fresh
data from BigQuery and then writes Tiles into the Tile Repo. Note that when
ingestion moves out of prod and into the same server we can do the tile updates
immediately after ingestion is done.
Tile Repo will be represented internally as an interface. The first
implementation will be as files on the local disk, with a directory tree that
contains gzipped JSON files called tiles. Note that we may alternatively use Go
native gob encoded files and just transform them into JSON when serving them to
the UI.
Each tile contains exactly 16 points of every trace for a dataset. The one
exception is the last.gz tile, which may contain fewer than 16 points; see
below for an explanation of that. The Tile Repo directory structure is:
$TILE_REPO_ROOT/<dataset>/<scale>/<tilenumber>.gz
Where:
* dataset = {skps|micro}
* scale = 0..5 The scale N selects a sampling factor of 4^N, so tiles in the /0/
directory map 1:1 to test results, while tiles in the /1/
directory have every fourth commit with data, and tiles in /2/
have every 16th commit with data.
* tilenumber = The number of the tile, at the given scale, starting at BOT
(Beginning of Time).
So the data in:
/skps/0/0.gz
contains the data for the first 16 commits from the BOT that have test data.
/micro/3/6.gz
contains 16 points per trace, sampled from every 64th commit that has data, and
is the 7th tile at that scale. Each point falls between commit
6 * 64 * 16 = 6144 and commit 7 * 64 * 16 = 7168, i.e. it is a slice of 16
commits that represents a range of 1024 commits.
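The arithmetic in the example above can be checked with a short Go sketch. The
names `pointsPerTile`, `stride` and `commitRange` are illustrative and are not
taken from the Perf source. Commit indices are assumed to count only commits
that have test data, starting at 0 for BOT.

```go
package main

// pointsPerTile is the number of points each complete tile holds per trace.
const pointsPerTile = 16

// stride returns 4^scale: how many commits with data lie between two
// adjacent points of a tile at the given scale.
func stride(scale int) int {
	s := 1
	for i := 0; i < scale; i++ {
		s *= 4
	}
	return s
}

// commitRange returns the half-open range [begin, end) of commit indices
// that the tile <scale>/<tileNumber>.gz covers.
func commitRange(scale, tileNumber int) (begin, end int) {
	span := stride(scale) * pointsPerTile
	return tileNumber * span, (tileNumber + 1) * span
}
```

For /micro/3/6.gz, `commitRange(3, 6)` returns 6144 and 7168, matching the
figures above, and for /skps/0/0.gz, `commitRange(0, 0)` returns 0 and 16.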
A manifest file will be available that gives the commit timestamp ranges for
each tile.
When navigating the UI users can select the tiles they are looking at (<, >)
and also change the scaling factor that they are looking at (+,-).
Each /dataset/scale directory also contains a file, last.gz, that contains the
most recent data, from 1 to 16 points. The Tile Pipeline process will update
the 'last.gz' files in each directory and write new tile files as new data
arrives. last.gz will contain the id of the tile that appears just before it,
so that the UI can request /skps/0/last.gz and know how to proceed from
there.
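One way the Tile Repo interface and its first on-disk implementation might look
is sketched below. This is not the actual Perf code: the `Tile` struct, the
`TileRepo` interface, the `diskRepo` type and the JSON field name are all
assumptions made for illustration. Only the directory layout and the gzipped
JSON encoding come from the description above.

```go
package main

import (
	"compress/gzip"
	"encoding/json"
	"os"
	"path/filepath"
	"strconv"
)

// Tile is a hypothetical in-memory tile: for each trace key, the points
// that fall inside the tile.
type Tile struct {
	Traces map[string][]float64 `json:"traces"`
}

// TileRepo is a hypothetical interface that hides how tiles are stored.
// name is a tile number such as "6", or "last" for last.gz.
type TileRepo interface {
	Get(dataset string, scale int, name string) (*Tile, error)
	Put(dataset string, scale int, name string, t *Tile) error
}

// diskRepo stores every tile as a gzipped JSON file under root.
type diskRepo struct{ root string }

// path builds <root>/<dataset>/<scale>/<name>.gz.
func (r diskRepo) path(dataset string, scale int, name string) string {
	return filepath.Join(r.root, dataset, strconv.Itoa(scale), name+".gz")
}

// Put writes the tile, creating the directory tree if needed.
func (r diskRepo) Put(dataset string, scale int, name string, t *Tile) error {
	p := r.path(dataset, scale, name)
	if err := os.MkdirAll(filepath.Dir(p), 0755); err != nil {
		return err
	}
	f, err := os.Create(p)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := gzip.NewWriter(f)
	if err := json.NewEncoder(zw).Encode(t); err != nil {
		return err
	}
	return zw.Close() // Close flushes the remaining compressed data.
}

// Get reads and decodes one tile.
func (r diskRepo) Get(dataset string, scale int, name string) (*Tile, error) {
	f, err := os.Open(r.path(dataset, scale, name))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	t := &Tile{}
	if err := json.NewDecoder(zr).Decode(t); err != nil {
		return nil, err
	}
	return t, nil
}
```

Keeping callers on the `TileRepo` interface would let a gob-encoded or remote
implementation replace `diskRepo` later without touching the serving code.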
URL Structure
-------------
The URL structure for retrieving Datasets is TBD.
Navigating
----------
For each point, if the user wants to zoom out, add 1 to the scale and divide
tilenumber by four, since each scale step samples four times as many commits.
Do the opposite to zoom in. To move forwards or backwards in time, add 1 to or
subtract 1 from the tile number. The actual UI mechanisms for navigating around
traces are TBD; this is just a description of how the tiles are arranged.
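The tile arithmetic for navigation might be sketched as follows. `TileAddr` and
its methods are hypothetical names. The sketch assumes each zoom step changes
the tile number by a factor of four, matching the 4^N scale factor, so that
/1/0.gz covers /0/0.gz through /0/3.gz. It also assumes that zooming in lands
on the first of the four finer tiles.

```go
package main

// maxScale mirrors the documented scale range of 0..5.
const maxScale = 5

// TileAddr is a hypothetical (scale, tile number) pair within one dataset.
type TileAddr struct {
	Scale  int
	Number int
}

// ZoomOut moves to the coarser tile that covers this one.
func (a TileAddr) ZoomOut() TileAddr {
	if a.Scale >= maxScale {
		return a // already at the coarsest scale
	}
	return TileAddr{Scale: a.Scale + 1, Number: a.Number / 4}
}

// ZoomIn moves to the first of the four finer tiles covered by this one.
func (a TileAddr) ZoomIn() TileAddr {
	if a.Scale <= 0 {
		return a // already at the finest scale
	}
	return TileAddr{Scale: a.Scale - 1, Number: a.Number * 4}
}

// Next moves one tile forward in time.
func (a TileAddr) Next() TileAddr {
	return TileAddr{Scale: a.Scale, Number: a.Number + 1}
}

// Prev moves one tile back in time, stopping at tile 0 (BOT).
func (a TileAddr) Prev() TileAddr {
	if a.Number == 0 {
		return a
	}
	return TileAddr{Scale: a.Scale, Number: a.Number - 1}
}
```

A real implementation would also need to stop `Next` at last.gz, which this
sketch does not model.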
Tile Pipeline Algorithm
-----------------------
Start at /0/last.gz and find the previous tilenumber.gz, open that and find
the last githash. Query for all data newer than that githash. Group into tiles
of 16 githashes. Put the remainder (or the last 16 if the remainder is 0) into
/0/last.gz.
Now do the same for /1/, but the data comes from /0/ and not from BigQuery.
I.e. /1/0.gz is just a sampling of /0/0.gz, /0/1.gz, /0/2.gz and /0/3.gz.
At each level only rewrite last.gz and write out any new complete tiles as
they are filled.
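The grouping and sampling steps above could look roughly like this in Go. The
function names are illustrative. Two details are assumptions, not statements
from this design: when the remainder is 0 the final group is returned both as a
complete tile and as the contents of last.gz, and the sample taken from each
group of four is its first element.

```go
package main

// tileSize is the number of githashes in a complete tile.
const tileSize = 16

// groupIntoTiles splits githashes (oldest first) into complete tiles and the
// contents of last.gz: the remainder, or the final 16 if the remainder is 0.
func groupIntoTiles(hashes []string) (full [][]string, last []string) {
	for len(hashes) >= tileSize {
		full = append(full, hashes[:tileSize])
		hashes = hashes[tileSize:]
	}
	if len(hashes) > 0 {
		return full, hashes
	}
	if len(full) > 0 {
		return full, full[len(full)-1]
	}
	return nil, nil
}

// sampleEveryFourth keeps every fourth entry, which is how the data for
// scale N+1 is derived from the data at scale N.
func sampleEveryFourth(points []string) []string {
	var out []string
	for i := 0; i < len(points); i += 4 {
		out = append(out, points[i])
	}
	return out
}
```

Sampling the 64 githashes held in /0/0.gz through /0/3.gz this way yields the
16 githashes of /1/0.gz.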
Perf Stats Database
-------------------
The data for the performance metrics are kept in the BigQuery tables stored
in the google.com:chrome-skia project. Note that this is a different project
from where the data is accessed, which is by VM instances running under
the google.com:skia-buildbots project. For this to work the service account
email of the VM needs to be added to the permissions group of the
google.com:chrome-skia project. If this isn't done then the BigQuery access
will fail with a 403 error.
Logs
----
We use https://github.com/golang/glog for logging, which puts Google-style
Error, Warning and Info logs in /tmp on the server under the 'perf' account.
Debugging Tips
--------------
Starting the application is done via /etc/init.d/perf which does the
backgrounding itself via start-stop-daemon, which means that if the app
crashes when first starting then nothing will make it to the logs. To debug
the cause in that case edit /etc/init.d/perf and remove the --background
flag and then run:
$ sudo /etc/init.d/perf start
And you should get stdout and stderr output.
Monitoring
----------
Monitoring of the application is done via Graphite at http://skiamonitor.com.
Both system and application level metrics are monitored.
Annotations Database
--------------------
A Cloud SQL (a cloud version of MySQL) database is used to keep information on
Skia git revisions and their corresponding annotations. The database will be
updated when users add/edit/delete annotations via the dashboard UI.
All passwords for MySQL are stored in valentine (search "skia perf").
To connect to the database from an authorized network (including skia-perf GCE):
$ mysql -h 173.194.104.24 -u root -p
Initial setup of the database, the users, and the tables:
CREATE DATABASE skia;
USE skia;
CREATE USER 'readonly'@'%' IDENTIFIED BY <password in valentine>;
GRANT SELECT ON *.* TO 'readonly'@'%';
CREATE USER 'readwrite'@'%' IDENTIFIED BY <password in valentine>;
GRANT SELECT, DELETE, UPDATE, INSERT ON *.* TO 'readwrite'@'%';
-- Table for storing annotations.
CREATE TABLE notes (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
type TINYINT,
author TEXT,
notes TEXT NOT NULL
);
-- Table for storing git revision information.
CREATE TABLE githash (
githash VARCHAR(40) NOT NULL PRIMARY KEY,
ts TIMESTAMP NOT NULL,
gitnumber INT NOT NULL,
author TEXT NOT NULL,
message TEXT NOT NULL
);
-- Table for mapping revisions and annotations. This supports many-to-many
-- mapping.
CREATE TABLE githashnotes (
githash VARCHAR(40) NOT NULL,
ts TIMESTAMP NOT NULL,
id INT NOT NULL,
FOREIGN KEY (githash) REFERENCES githash(githash),
FOREIGN KEY (id) REFERENCES notes(id)
);
CREATE TABLE shortcuts (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
traces MEDIUMTEXT NOT NULL
);
Common queries that the dashboard will use:
INSERT INTO notes (type, author, notes) VALUES (1, 'bsalomon', 'Alert!');
SELECT LAST_INSERT_ID();
INSERT INTO githashnotes (githash, ts, id) VALUES (<githash>, <githash_ts>, <last_insert_id>);
The above set of commands will usually be used together to add new annotations
and associate them with corresponding git commits. The commands below remove an
annotation and its associations with any commit.
DELETE FROM githashnotes WHERE id = <id_to_delete>;
DELETE FROM notes WHERE id = <id_to_delete>;
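The two command sequences above might be issued from Go along the following
lines. This is a sketch, not the dashboard's actual code: `stmt`,
`addAnnotationStmts`, `deleteAnnotationStmts` and `runInTx` are hypothetical
names. The githashnotes insert supplies a githash value on the assumption that
the NOT NULL column requires one, and it calls MySQL's LAST_INSERT_ID() inside
the statement instead of fetching the id separately. A MySQL driver for
database/sql would need to be imported separately.

```go
package main

import "database/sql"

// stmt is one parameterized SQL statement.
type stmt struct {
	Query string
	Args  []interface{}
}

// addAnnotationStmts returns the statements that create a note and link it
// to a commit. ts is the commit timestamp, e.g. "2014-06-01 12:00:00".
func addAnnotationStmts(noteType int, author, note, githash, ts string) []stmt {
	return []stmt{
		{"INSERT INTO notes (type, author, notes) VALUES (?, ?, ?)",
			[]interface{}{noteType, author, note}},
		// LAST_INSERT_ID() is per connection, so both statements must
		// run on the same connection, which a transaction guarantees.
		{"INSERT INTO githashnotes (githash, ts, id) VALUES (?, ?, LAST_INSERT_ID())",
			[]interface{}{githash, ts}},
	}
}

// deleteAnnotationStmts removes the associations first so the foreign key
// on githashnotes.id is never left dangling.
func deleteAnnotationStmts(id int64) []stmt {
	return []stmt{
		{"DELETE FROM githashnotes WHERE id = ?", []interface{}{id}},
		{"DELETE FROM notes WHERE id = ?", []interface{}{id}},
	}
}

// runInTx executes the statements in order inside a single transaction and
// rolls back if any of them fails.
func runInTx(db *sql.DB, stmts []stmt) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	for _, s := range stmts {
		if _, err := tx.Exec(s.Query, s.Args...); err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}
```

Using placeholders rather than string concatenation keeps user-supplied
annotation text out of the SQL itself.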
Since the data size is relatively small, the dashboard server can keep a copy of
all recent commit info (e.g., for constructing a "blamelist"), annotations, and
their many-to-many relationship for use in the context.
The password for the database will be stored in the instance metadata. To see
the current password stored in metadata and the fingerprint:
gcutil --project=google.com:skia-buildbots getinstance [skia-perf GCE instance]
To set the mysql password that perf is to use:
gcutil --project=google.com:skia-buildbots setinstancemetadata [skia-perf GCE instance] --metadata=readonly:[password-from-valentine] --metadata=readwrite:[password-from-valentine] --fingerprint=[the metadata fingerprint]
Startup and config
------------------
The server is started and stopped via:
sudo /etc/init.d/perf [start|stop|restart]
But sysv init only handles starting and stopping a program once, so we use
Monit to monitor the application and restart it if it crashes. The config
is in:
/etc/monit/conf.d/perf
Installation
------------
See the README file.