blob: e8d806da4a60775fae3cffa6381a00455c7ac39c [file] [view]
# Infra Team Launch Checklist
## Assumptions
Your service:
- Is checked in to the `go.skia.org/infra` Git repo.
- Is an HTTP server written in Go with a front end in WebComponents (legacy apps
have Polymer)
- Will run in a Docker Container.
- Will use a `*.skia.org` domain.
- Does not handle personal data (additional steps will be required).
- Is not intended to be used by the public at large (additional steps will be
required).
[JSFiddle](https://github.com/google/skia-buildbot/tree/master/jsfiddle) is a
recent service that was launched that demonstrates the above. See go/ for the
server code and modules/ and pages/ for the front end.
## Coding
Use `go.skia.org/infra/go/sklog` for logging.
Add flags to your main package like:
port = flag.String("port", ":8002", "HTTP service port (e.g., ':8002')")
local = flag.Bool("local", false, "Running locally if true. As opposed to in production.")
promPort = flag.String("prom_port", ":20000", "Metrics service address (e.g., ':10110')")
resourcesDir = flag.String("resources_dir", "./dist", "The directory to find HTML, JS, and CSS files. If blank the current directory will be used.")
Call `common.InitWithMust([opt], [opt])` in your main function.
Use `go.skia.org/infra/go/login` paired with
`../../../infra-sk/modules/login.ts` (Legacy Polymer apps use
`res/imp/login.html`) and/or `go.skia.org/infra/go/webhook` for authentication.
When using OAuth, see the secrets section below for including client secrets.
Wrap your `http.Handler` (many services use
[chi.NewRouter()](https://github.com/go-chi/chi) with
`go.skia.org/infra/go/httputils.LoggingGzipRequestResponse` to provide
monitoring and logging of HTTP requests and responses. Then, wrap it in
`go.skia.org/infra/go/httputils.HealthzAndHTTPS` to add an unlogged /healthz
endpoint for use with GKE health monitoring and various HTTPS configuration.
Use `go.skia.org/infra/go/httputils.DefaultClientConfig` for HTTP clients, which
provides several features:
- ensures requests time out within a reasonable limit
- tracks how much load we place on external services
- optionally adds authentication to requests
- optionally adds automatic retries with exponential backoff
- optionally treats non-2xx responses as errors
Write your code with security in mind:
- Make sure your service is listed in the
[team security scan](http://go/skia-infra-scan).
- Follow the [team security guidelines](http://go/skia-infra-sec) when
developing your applicaation.
If you add any critical TODOs while you're coding, file a blocking bug for the
issue.
If your application requires Puppeteer tests, it should be explicitly opted in
by making any necessary changes to //puppeteer-tests/docker/run-tests.sh.
## Makefiles
It is customary to have the following commands in a Makefile for the service.
- `build` : Build a development version of the front end.
- `serve` : Run the demo pages of the front end in a "watch" mode. This command
is for primary development of front end pages.
- `core` : Build the server components.
- `watch` : Build a development version of the front end in watch mode. This
command is for running the server, but also making changes to the front end.
- `release` : Build a Docker container with the front end and backend parts in
it (see below).
## Docker
Running apps in Docker makes deployment and local testing much much easier. It
additionally allows integration with GKE.
Add an entry to the project BUILD.bazel file (e.g. `jsfiddle/Dockerfile`) to
build the Docker image for your service using the `skia_app_container` macro.
When choosing a base image, default to `//kube/basealpine:basealpine` unless
you need a bunch of CIPD packages, in which case `//kube/base-cipd:base-cipd`
can be used.
It is customary to include an ENTRYPOINT and CMD with sensible defaults for the
app. It's also a best practice to run the app as USER skia unless root is
absolutely needed.
Putting all the above together, a bare-bones example would look something
like:
skia_app_container(
name = "my_app_container",
dirs = {
"/usr/local/bin": [
[
"//my-app/go/my-app:my-app",
"0755",
],
],
},
entrypoint = "/usr/local/bin/my-app",
repository = "skia-public/my-app",
)
## Secrets and Service Accounts
If your app needs access to a GCS bucket or other similar things, it is
recommended you create a new service account for your app. See below for linking
it into the container.
Use an existing `create-sa.sh` script (e.g. `create-jsfiddle-sa.sh`) and tweak
the name, committing it into the app's root directory. Run this once to create
the service account and create the secrets in GKE.
## Authentication
Almost all applications should use
[google.DefaultTokenSource()](https://pkg.go.dev/golang.org/x/oauth2/google#DefaultTokenSource)
to create an
[oauth2.TokenSource](https://pkg.go.dev/golang.org/x/oauth2#TokenSource) to be
used for authenticated access to APIs and resources.
The call to
[google.DefaultTokenSource()](https://pkg.go.dev/golang.org/x/oauth2/google#DefaultTokenSource)
will follow the search algorithm in
[FindDefaultCredentialsWithParams](https://pkg.go.dev/golang.org/x/oauth2/google#FindDefaultCredentialsWithParams).
To run applications locally authenticated as yourself you can run:
```
gcloud auth application-default login
```
Which will place credentials at:
```
$HOME/.config/gcloud/application_default_credentials.json
```
that will be picked up by the application.
If you wish to override that behavior and use a different set of credentials
then set the Environment Variable `GOOGLE_APPLICATION_CREDENTIALS` that points
to a different file, such as a `key.json` file for a specific service account.
When running in kubernetes
[google.DefaultTokenSource()](https://pkg.go.dev/golang.org/x/oauth2/google#DefaultTokenSource)
will pick up credentials from GCP metadata or
[workload identity](http://go/skia-workload-identity).
## Using Git
Use of the Git binary itself is strongly discouraged unless it is unavoidable.
Please consider an alternative:
- go/gitiles provides an API for retrieving commit information, file contents,
git log, etc, via HTTP for repos hosted on Googlesource.
- go/gitstore provides a low-level interface for retrieving commit metadata by
time or index. This data is stored in BigTable and is ingested by the
`gitsync` app, which also sends PubSub messages for low-latency updates.
- go/vcsinfo/bt_vcs provides a similar interface for retrieving metadata but
adds caching and packages Gitiles into a common API.
- go/git/repograph provides a complete in-memory graph of a repository for fast
traversal. It loads data via go/gitstore and can be automatically updated via
PubSub.
- go/gerrit provides access to the Gerrit API, including uploading and
committing changes to repos which use Gerrit.
The following are valid reasons to use the Git binary itself:
- You need to do more complex write operations, eg. merges.
- You need a full local checkout of some code, eg. to compile and run tests, or
to run a script. Note that you can use go/gitiles to download a standalone
script, so a full checkout should not be necessary unless your use case
requires a large or changing set of files.
If you do need Git for your app, use the `base-cipd` Docker image, which
includes a pinned version of Git (as well as other tools). Do not install Git
via the package manager in your Docker image.
## Launching
- Write/update design doc so that others understand how to use, maintain, and
improve your service. `DESIGN.md` typically has high level design structures
(e.g. where is data stored, how do the pieces of software interact, etc).
`PROD.md` has an overview of the alerts and any other notes for maintaining
the service.
- Do some back-of-the-envelope calculations to make sure your service can handle
the expected load. If your service is latency-sensitive, measure the latency
under load.
- Test on browsers that your users will be using, at least Chrome on desktop and
ideally Chrome on Android.
- Create an `app.yaml` in
[k8s-config](https://skia.googlesource.com/k8s-config/+show/master/) This
controls how your app will be run in GKE. See
[these docs](https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/)
for more on the schema. Commit this, then watch the cluster to ensure that the
service is auto-deployed.
- Metrics are customarily made available at port 20000. To configure metrics
scraping the port should be named 'prom'. See [go/skia-infra-metrics](http://go/skia-infra-metrics)
for more details.
```yaml
ports:
- containerPort: 20000
name: prom
```
- Clusters run with [Cluster
Autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler),
which means that every pod should have the following annotation:
```yaml
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
```
If you need finer grained control over how your pods are started and stopped
that can be done by defining a
[PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/).
CockroachDB defines a PodDisruptionBudget and is a good example of such a
budget.
- Metrics will be available on
[prom2.skia.org](https://prom2.skia.org/).
- The metrics will be labeled `app=<foo>` where `foo` is the first argument to
`common.InitWithMust`.
- If you have secrets (like a service account), bind it to the deployment by
adding the following to `app.yaml`:
```yaml
spec:
automountServiceAccountToken: false
...
containers:
- name: my-container
...
volumeMounts:
- name: my-app-sa
mountPath: /var/secrets/google
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
value: /var/secrets/google/key.json
...
volumes:
- name: my-app-sa
secret:
secretName: my-app
```
- If you use OAuth, it should be configured to use the \*.skia.org cookie (the
default). Additionally, you will need to mount the secrets to use with your
login.Init\* code:
```yaml
spec:
...
containers:
- name: my-container
...
volumeMounts:
- name: skia-org-legacy-login-secrets
mountPath: /etc/skia.org/
...
volumes:
- name: skia-org-legacy-login-secrets
secret:
secretName: skia-org-legacy-login-secrets
```
- It is possible to test your service/config without making it publicly visible.
- Wait for your `app.yaml` to be auto-deployed.
- Identify a pod name, `kubectl get pods | grep [my-app]` where my-app is the
name of the new service.
- Forward a local port (e.g. 8083) to a port on the pod (e.g. the HTTP port
8000): `kubectl port-forward my-app-7bf542629-jujzm 8083:8000`
- Navigate a browser to <http://localhost:8083> to see your service.
- If you have simple routing needs, to make your service visible to the public
add a
[`skia.org.domain` annotation to your Service YAML](https://skia.googlesource.com/buildbot/+doc/refs/heads/master/skfe/README.md)
with the domain name and deploy your updated yaml with `kubectl apply`.
If your routing is more complicated you can skip the YAML annotation and write
the routing rules directly into `infra/skfe/k8s/default.conf`.
Either way you then push a new version of nginx-skia-org:
```
cd infra/skfe
make k8s_push
```
And watch that the new instances start running:
```
watch kubectl get pods -lapp=nginx-skia-org
```
- Add prober rules to `probers.json` in your application directory.
- Ideally, probe all public HTML pages and all nullipotent JSON endpoints. You
can write functions in `prober/go/prober/main.go` to check the response body
if desired.
- Add additional stats gathering to your program using
`go.skia.org/infra/go/metrics2`, e.g. to ensure liveness/heartbeat of any
background processes.
- Add alert rules to
[alerts_public](https://skia.googlesource.com/buildbot/+show/master/promk/prometheus/alerts_public.yml).
The alerts may link to a production manual, `PROD.md`, checked into the
application source directory. Examples:
- All prober rules.
- Additional stats from metrics2. Legacy apps have their alert rules in
`prometheus/sys/alert.rules`
- Some
[general metrics](https://skia.googlesource.com/buildbot/+show/master/promk/prometheus/alerts_general.yml)
apply to all apps and may not need to be added explicitly for your
application, such as: - Too many goroutines. - Free disk space on the instance
and any attached disks. - This is also for alerts that apply to skia-public
and skia-corp projects.
- Check your alert rules by running `make validate` in `promk/` (Legacy apps
should run that commaind in `prometheus/`).
- Then, after landing your valid alerts, run
`make push_config && make push_config_corp` in `promk/` (Again, legacy apps
should do `make push` in `prometheus/`).
- Tell people about your new service.
- Be prepared for bug reports. :-)
# Continuous Deployment
Some apps are set up to be continuously re-built and re-deployed on every commit
of Skia or Skia Infra. To do that, see
[docker_pushes_watcher/README.md](./docker_pushes_watcher/README.md).
# CockroachDB
When standing up a new CockroachDB cluster the following extra steps
need to be done:
1. Modify the YML file to add ephemeral-storage to the resource requests:
```
resources:
requests:
cpu: '2'
memory: '32Gi'
ephemeral-storage: 100M
```
2. Update `managed-prometheus-pod-monitoring.yml` because CockroachDB hosts metrics at a different path than the expected default of `/metrics`.
```
# CockroachDB hosts metrics at a different path.
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
name: perf-cockroachdb-cluster-pod-monitoring
spec:
selector:
matchLabels:
app: perf-cockroachdb
endpoints:
- port: prom
path: _status/vars
interval: 15s
targetLabels:
fromPod:
- from: app
- from: appgroup
```
3. Rename the HTTP port from `http` to `prom` so it gets scraped:
```
ports:
- containerPort: 26257
name: grpc
- containerPort: 8080
name: prom
```