# Cluster Telemetry Deployment & Maintenance Guide
The target audience for this document includes Google employees on the Skia
infra team. Other Googlers can request access to the tools and resources
mentioned here by contacting skiabot@. Access is not available to non-Googlers,
but you could create your own setup if desired.
## Intro
General overview of CT is available [here](https://skia.org/dev/testing/ct).
## Code locations
Frontend server code lives in:
- go/ctfe/...
- res/...
- templates/...
- elements.html
- ../package.json
- sys/...
To build the frontend server, use `make ctfe`. To create a new docker build,
use `make ctfe_release`.
To deploy the latest docker build on the kubernetes ctfe service, use
`make ctfe_push`.
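Put together, a typical frontend release looks like this (a sketch; the commands are assumed to be run from the CT directory of the buildbot checkout):

```
# Build the frontend server binary.
make ctfe
# Create a new docker image for the frontend.
make ctfe_release
# Deploy that image to the kubernetes ctfe service.
make ctfe_push
```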
Master code lives in:
- go/master_scripts/...
- go/poller/...
- go/frontend/...
- py/...
To build the master binaries, use `make master_scripts`. To create a new docker
build, use `make ctmaster_release`.
To deploy the latest docker build on the kubernetes ct-master service, use
`make ctmaster_push`.
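The master release flow mirrors the frontend one (sketch; run from the same directory):

```
make master_scripts    # build the master binaries
make ctmaster_release  # create a new docker image
make ctmaster_push     # deploy it to the ct-master service
```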
Worker code lives in go/worker_scripts/...
To build the worker binaries, use `make worker_scripts`.
Prober config is in probers.json5. Alerts config is in
../promk/prometheus/alerts_public.yml.
## Running locally
### Frontend
To start a local server, run:
```
make ctfe_debug && ctfe --local=true \
--port=:8000 \
--host=<your hostname>.cnc.corp.google.com \
--namespace=cluster-telemetry-staging
```
You can then access the server at [localhost:8000](http://localhost:8000/) or
<your hostname\>.cnc.corp.google.com:8000.
The `host` argument is optional and allows others to log in to your
server. Initially you will get an error when logging in; follow the instructions
on the error page. The `host` argument will also be included in metrics names.
To test prober config changes, edit the config in probers.json5 to
point to localhost:8000, then run `make && prober --use_metadata=false` from ../prober/.
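For example (sketch; assumes `../prober/` is the prober directory relative to this one, as the path above implies):

```
# After pointing the ctfe probe in probers.json5 at localhost:8000:
cd ../prober
make && prober --use_metadata=false
```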
To check metrics from a locally running server or prober, use Prometheus.
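One way to do this is to point a local Prometheus at the process. The sketch below is an assumption-heavy example: the job name is made up and the target port is a guess, so check the running binary's flags for the metrics port it actually exposes.

```
# Run a local Prometheus against the locally running server/prober (sketch).
# The target port below is an assumption; use the binary's real metrics port.
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: 'ct-local'
    static_configs:
      - targets: ['localhost:20000']
EOF
prometheus --config.file=prometheus.yml
```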
### Master
The master poller and master scripts require a /b directory containing various
repos and files. To run the master poller in dry-run mode (not very useful), run:
```
make poller && poller --local=true \
--dry_run
```
To run master scripts locally, you may want to modify the code to skip steps or
exit early, e.g.:
```
diff --git a/ct/go/master_scripts/build_chromium/main.go b/ct/go/master_scripts/build_chromium/main.go
index 1c5c273..34ceb3a 100644
--- a/ct/go/master_scripts/build_chromium/main.go
+++ b/ct/go/master_scripts/build_chromium/main.go
@@ -73,6 +73,11 @@ func main() {
// Ensure webapp is updated and completion email is sent even if task fails.
defer updateWebappTask()
defer sendEmail(emailsArr)
+ if 1 == 1 {
+ time.Sleep(10 * time.Second)
+ taskCompletedSuccessfully = true
+ return
+ }
// Cleanup tmp files after the run.
defer util.CleanTmpDir()
// Finish with glog flush and how long the task took.
```
Master scripts include:

- `capture_archives_on_workers`
- `create_pagesets_on_workers`
- `run_chromium_perf_on_workers`
You can run the poller as:
```
make poller && poller --local=true
```
### Workers
The Makefile has examples of running the worker scripts locally.
TODO(benjaminwagner): Add local flag and kill e2e_tests from Makefile.
## Running in prod
### Frontend
The CTFE production datastore instance is
[here](https://console.cloud.google.com/datastore/entities;kind=ChromiumPerfTasks;ns=cluster-telemetry/query/kind?project=skia-public).
The staging datastore instance is
[here](https://console.cloud.google.com/datastore/entities;kind=ChromiumPerfTasks;ns=cluster-telemetry-staging/query/kind?project=skia-public).
The frontend runs on a Google Cloud Kubernetes service named
[ctfe](https://console.cloud.google.com/kubernetes/service/us-central1-a/skia-public/default/ctfe?project=skia-public&organizationId=433637338589).
Its dockerfile is in ctfe/Dockerfile.
To build the frontend server, use `make ctfe`. To create a new docker build,
use `make ctfe_release`.
To deploy the latest docker build on the kubernetes ctfe service, use
`make ctfe_push`. You can then see the updated frontend at
[ct.skia.org](https://ct.skia.org/).
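As a quick post-push sanity check, a plain HTTP request is enough (sketch; no dedicated health endpoint is assumed):

```
# Confirm the frontend responds after the push.
curl -sI https://ct.skia.org/ | head -n 1
```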
To access ctfe directly, use `kubectl exec -it $(kubectl get pod
--selector="app=ctfe" -o jsonpath='{.items[0].metadata.name}') -- bash`.
Frontend logs are available [here](https://console.cloud.google.com/logs/viewer?project=skia-public&advancedFilter=logName%3D%22projects%2Fskia-public%2Flogs%2Fctfe%22).
### Master
The poller and master scripts run on a Google Cloud Kubernetes service named
[ct-master](https://console.cloud.google.com/kubernetes/service/us-central1-a/skia-public/default/ct-master?project=skia-public&organizationId=433637338589).
Its dockerfile is in ct-master/Dockerfile.
To push a new build to the poller safely, check that the CTFE task queue is
empty and then run `make ctmaster_release && make ctmaster_push`.
To access ct-master directly, use `kubectl exec -it $(kubectl get pod
--selector="app=ct-master" -o jsonpath='{.items[0].metadata.name}') -- bash`.
Poller logs are available [here](https://console.cloud.google.com/logs/viewer?project=skia-public&advancedFilter=logName%3D%22projects%2Fskia-public%2Flogs%2Fct-master%22).
### Workers
Worker scripts are part of the docker build when `ctmaster_release` is run.
However, from time to time, it may be necessary to perform maintenance
tasks on all worker machines. In this case, the
[run_on_swarming_bots](https://skia.googlesource.com/buildbot/+show/master/scripts/run_on_swarming_bots/)
script can be used to update all
[CT bots](https://chrome-swarming.appspot.com/botlist?c=id&c=os&c=task&c=status&f=pool%3ACT&l=1000&s=id%3Aasc).
## Other maintenance
### Updating pagesets
TODO(rmistry): Where do CSV files come from, where to put in GS.
## Access to Golo
CT's Golo bots are visible [here](https://chrome-swarming.appspot.com/botlist?c=id&c=task&c=os&c=status&d=asc&f=pool%3ACT&k=zone&s=id).
To log in to Golo bots, see [go/chrome-infra-build-access](http://go/chrome-infra-build-access).