Cluster Telemetry Deployment & Maintenance Guide

The target audience for this document includes Google employees on the Skia infra team. Other Googlers can request access to the tools and resources mentioned here by contacting skiabot@. Access is not available to non-Googlers, but they could create their own setup if desired.

Intro

A general overview of CT is available here.

Code locations

Frontend server code lives in:

  • go/ctfe/...
  • res/...
  • templates/...
  • elements.html
  • ../package.json
  • sys/...

To build the frontend server, use make ctfe. To create a new Docker build, use make ctfe_release. To deploy the latest Docker build on the Kubernetes ctfe service, use make ctfe_push.
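
For reference, a typical frontend release flow simply chains these targets together. This is a sketch, assuming you are in the ct/ directory and have permission to push to the ctfe service:

# Build the frontend server, then build and deploy a new Docker image.
make ctfe
make ctfe_release
make ctfe_push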

Master code lives in:

  • go/master_scripts/...
  • go/poller/...
  • go/frontend/...
  • py/...

To build the master binaries, use make master_scripts. To create a new Docker build, use make ctmaster_release. To deploy the latest Docker build on the Kubernetes ct-master service, use make ctmaster_push.
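
Similarly, a sketch of the master release flow, assuming the ct/ directory and access to the ct-master service:

# Build the master binaries, then build and deploy a new Docker image.
make master_scripts
make ctmaster_release
make ctmaster_push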

Worker code lives in go/worker_scripts/...

To build the worker binaries, use make worker_scripts.

Prober config is in probers.json5. Alerts config is in ../promk/prometheus/alerts_public.yml.

Running locally

Frontend

To start a local server, run:

make ctfe_debug && ctfe --local=true \
  --port=:8000 \
  --host=<your hostname>.cnc.corp.google.com \
  --namespace=cluster-telemetry-staging

You can then access the server at localhost:8000 or <your hostname>.cnc.corp.google.com:8000.

The --host argument is optional; it allows others to log in to your server. The first time you log in you will get an error; follow the instructions on the error page. The --host value is also included in metric names.

To test prober config changes, edit the config in probers.json5 to point to localhost:8000, then run make && prober --use_metadata=false from ../prober/.
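
Putting that together, a minimal sketch of the local prober loop, assuming probers.json5 has already been edited to point at localhost:8000:

# Rebuild and run the prober locally with metadata disabled, as described above.
cd ../prober
make && prober --use_metadata=false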

To check metrics from a locally running server or prober, use Prometheus.
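
As a quick alternative to running a full Prometheus instance, you can scrape the metrics endpoint directly. The port below is only a placeholder; check the server's startup logs for the port it actually serves metrics on:

# Fetch the raw Prometheus metrics; the port here is hypothetical, not a
# documented default.
curl http://localhost:20000/metrics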

Master

The master poller and master scripts require a /b directory containing various repos and files. To run the master poller in dry-run mode (not very useful), run:

make poller && poller --local=true \
  --dry_run

To run master scripts locally, you may want to modify the code to skip steps or exit early, e.g.:

diff --git a/ct/go/master_scripts/build_chromium/main.go b/ct/go/master_scripts/build_chromium/main.go
index 1c5c273..34ceb3a 100644
--- a/ct/go/master_scripts/build_chromium/main.go
+++ b/ct/go/master_scripts/build_chromium/main.go
@@ -73,6 +73,11 @@ func main() {
        // Ensure webapp is updated and completion email is sent even if task fails.
        defer updateWebappTask()
        defer sendEmail(emailsArr)
+       if 1 == 1 {
+               time.Sleep(10 * time.Second)
+               taskCompletedSuccessfully = true
+               return
+       }
        // Cleanup tmp files after the run.
        defer util.CleanTmpDir()
        // Finish with glog flush and how long the task took.

Master scripts include capture_archives_on_workers, create_pagesets_on_workers, and run_chromium_perf_on_workers.

You can run the poller as:

make poller && poller --local=true

Workers

The Makefile has examples of running the worker scripts locally.

TODO(benjaminwagner): Add local flag and kill e2e_tests from Makefile.

Running in prod

Frontend

The CTFE production datastore instance is here. The staging datastore instance is here.

The frontend runs on a Google Cloud Kubernetes service named ctfe. Its Dockerfile is in ctfe/Dockerfile.

To build the frontend server, use make ctfe. To create a new Docker build, use make ctfe_release. To deploy the latest Docker build on the Kubernetes ctfe service, use make ctfe_push. You can then see the updated frontend at ct.skia.org.

To access ctfe directly, use kubectl exec -it $(kubectl get pod --selector="app=ctfe" -o jsonpath='{.items[0].metadata.name}') bash.

Frontend logs are available here.
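
If you prefer the command line, a hedged alternative is to stream logs straight from the pod, reusing the app=ctfe selector from the kubectl exec example above:

# Follow the most recent log lines from the ctfe pod.
kubectl logs -f --tail=100 -l app=ctfe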

Master

The poller and master scripts run on a Google Cloud Kubernetes service named ct-master. Its Dockerfile is in ct-master/Dockerfile.

To push a new build to the poller safely, check that the CTFE task queue is empty and then run make ctmaster_release && make ctmaster_push.

To access ct-master directly, use kubectl exec -it $(kubectl get pod --selector="app=ct-master" -o jsonpath='{.items[0].metadata.name}') bash.

Poller logs are available here.

Workers

Worker scripts are part of the Docker build when ctmaster_release is run. However, from time to time, it may be necessary to perform maintenance tasks on all worker machines. In this case, the run_on_swarming_bots script can be used to update all CT bots.

Other maintenance

Updating pagesets

TODO(rmistry): Where do CSV files come from, where to put in GS.

Access to Golo

CT's Golo bots are visible here.

To log in to Golo bots, see go/chrome-infra-build-access.