# Infra Team Launch Checklist
## Assumptions
Your service:
- Is checked in to the `go.skia.org/infra` Git repo.
- Is an HTTP server written in Go with a front end in WebComponents (legacy apps
have Polymer)
- Will run in a Docker Container.
- Will use a `*.skia.org` domain.
- Does not handle personal data (if it does, additional steps will be
required).
- Is not intended to be used by the public at large (if it is, additional steps
will be required).
[JSFiddle](https://github.com/google/skia-buildbot/tree/master/jsfiddle) is a
recently launched service that demonstrates the above. See `go/` for the
server code and `modules/` and `pages/` for the front end.
## Coding
Use `go.skia.org/infra/go/sklog` for logging.
Add flags to your main package like:
    port = flag.String("port", ":8002", "HTTP service port (e.g., ':8002')")
    local = flag.Bool("local", false, "Running locally if true. As opposed to in production.")
    promPort = flag.String("prom_port", ":20000", "Metrics service address (e.g., ':10110')")
    resourcesDir = flag.String("resources_dir", "./dist", "The directory to find HTML, JS, and CSS files. If blank the current directory will be used.")
Call `common.InitWithMust([opt], [opt])` in your main function.
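As a rough illustration of how the flags above fit together, here is a minimal sketch that uses only the standard `flag` package. The `appFlags` struct and `parseFlags` helper are hypothetical names introduced for this example; the flags are registered on a private `FlagSet` (rather than as package-level variables) only so the parsing can be exercised with any argument list. The `common.InitWithMust` call is left as a comment because its options depend on your service.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// appFlags is a hypothetical container for the customary service flags.
type appFlags struct {
	port         string
	local        bool
	promPort     string
	resourcesDir string
}

// parseFlags registers the customary flags on a fresh FlagSet and parses args.
func parseFlags(args []string) (appFlags, error) {
	var f appFlags
	fs := flag.NewFlagSet("my_app_name", flag.ContinueOnError)
	fs.StringVar(&f.port, "port", ":8002", "HTTP service port (e.g., ':8002')")
	fs.BoolVar(&f.local, "local", false, "Running locally if true. As opposed to in production.")
	fs.StringVar(&f.promPort, "prom_port", ":20000", "Metrics service address (e.g., ':10110')")
	fs.StringVar(&f.resourcesDir, "resources_dir", "./dist", "The directory to find HTML, JS, and CSS files.")
	err := fs.Parse(args)
	return f, err
}

func main() {
	f, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "flag error:", err)
		os.Exit(2)
	}
	// A real service would call common.InitWithMust(...) here before serving.
	fmt.Printf("port=%s local=%v prom_port=%s resources_dir=%s\n",
		f.port, f.local, f.promPort, f.resourcesDir)
}
```

Running the binary with `--port=:8000 --local` overrides those two values and leaves the other flags at their defaults.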
Use `go.skia.org/infra/go/login` paired with
`../../../infra-sk/modules/login.ts` (Legacy Polymer apps use
`res/imp/login.html`) and/or `go.skia.org/infra/go/webhook` for authentication.
When using OAuth, see the secrets section below for including client secrets.
Wrap your `http.Handler` (many services use
[mux.NewRouter()](https://github.com/gorilla/mux)) with
`go.skia.org/infra/go/httputils.LoggingGzipRequestResponse` to provide
monitoring and logging of HTTP requests and responses. Then, wrap it in
`go.skia.org/infra/go/httputils.HealthzAndHTTPS` to add an unlogged /healthz
endpoint for use with GKE health monitoring and various HTTPS configuration.
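The wrapping order described above can be sketched with the standard library alone. This is not the `httputils` implementation: `withLogging`, `withHealthz`, `buildHandler`, `demoHandler`, and `statusFor` are hypothetical stand-ins that only show why the health check ends up unlogged, namely that the outer wrapper answers `/healthz` before the logging wrapper ever sees the request.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// withLogging is a stand-in for a logging wrapper: it logs each request and
// then delegates to the wrapped handler.
func withLogging(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("%s %s", r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

// withHealthz answers /healthz itself, so those requests never reach the
// handlers it wraps (including the logging wrapper).
func withHealthz(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/healthz" {
			w.WriteHeader(http.StatusOK)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// buildHandler applies the wrappers in the order described in the text:
// logging first, then the health check on the outside.
func buildHandler(app http.Handler) http.Handler {
	return withHealthz(withLogging(app))
}

// demoHandler builds a tiny app with a single /hello route and wraps it.
func demoHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	return buildHandler(mux)
}

// statusFor issues an in-memory GET against h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := demoHandler()
	fmt.Println(statusFor(h, "/healthz"), statusFor(h, "/hello"), statusFor(h, "/missing"))
}
```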
Use `go.skia.org/infra/go/httputils.DefaultClientConfig` for HTTP clients, which
provides several features:
- ensures requests time out within a reasonable limit
- tracks how much load we place on external services
- optionally adds authentication to requests
- optionally adds automatic retries with exponential backoff
- optionally treats non-2xx responses as errors
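To make a few of these features concrete, the sketch below shows a request timeout, retries with exponential backoff, and non-2xx responses treated as errors, using only `net/http`. It is an illustration of the behaviour, not the `httputils` code; `getWithRetry`, `flakyTransport`, and `demoRetry` are hypothetical names, and the stub transport exists only so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// getWithRetry issues a GET up to `attempts` times. Transport errors and
// non-2xx responses count as failures; the wait doubles after each retry.
func getWithRetry(c *http.Client, url string, attempts int, initial time.Duration) (*http.Response, error) {
	lastErr := errors.New("no attempts made")
	delay := initial
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(delay)
			delay *= 2
		}
		resp, err := c.Get(url)
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			return resp, nil
		}
		resp.Body.Close()
		lastErr = fmt.Errorf("GET %s: status %d", url, resp.StatusCode)
	}
	return nil, lastErr
}

// flakyTransport is a test double that returns 503 for the first `failures`
// requests and 200 afterwards.
type flakyTransport struct {
	failures int
	calls    int
}

func (t *flakyTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	t.calls++
	code := http.StatusOK
	if t.calls <= t.failures {
		code = http.StatusServiceUnavailable
	}
	return &http.Response{StatusCode: code, Header: make(http.Header), Body: http.NoBody, Request: r}, nil
}

// demoRetry reports how many requests were sent and whether the call succeeded.
func demoRetry(failures, attempts int) (int, error) {
	tr := &flakyTransport{failures: failures}
	client := &http.Client{Transport: tr, Timeout: 5 * time.Second}
	resp, err := getWithRetry(client, "http://example.invalid/", attempts, time.Millisecond)
	if err != nil {
		return tr.calls, err
	}
	resp.Body.Close()
	return tr.calls, nil
}

func main() {
	calls, err := demoRetry(2, 5)
	fmt.Println(calls, err)
}
```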
Write your code with security in mind:
- Make sure your service is listed in the
[team security scan](http://go/skia-infra-scan).
- Follow the [team security guidelines](http://go/skia-infra-sec) when
developing your application.
If you add any critical TODOs while you're coding, file a blocking bug for the
issue.
If your application requires Puppeteer tests, it should be explicitly opted in
by making any necessary changes to //puppeteer-tests/docker/run-tests.sh.
## Makefiles
It is customary to have the following commands in a Makefile for the service.
- `build` : Build a development version of the front end.
- `serve` : Run the demo pages of the front end in a "watch" mode. This command
is for primary development of front end pages.
- `core` : Build the server components.
- `watch` : Build a development version of the front end in watch mode. This
command is for running the server, but also making changes to the front end.
- `release` : Build a Docker container with the front end and backend parts in
it (see below).
- `push` : Depends on release, and then pushes to GKE using `pushk`.
## Docker
Running apps in Docker makes deployment and local testing much easier. It
additionally allows integration with GKE. Some legacy apps are not yet run in
Docker, but it is the goal to have everything on GKE+Docker.
Create a Dockerfile for your app in the root of the project folder (e.g.
`jsfiddle/Dockerfile`). If there are multiple services, put them in a named
folder (e.g. `fiddlek/fiddle/Dockerfile`, `fiddlek/fiddler/Dockerfile`).
When choosing a base image, consider our light wrappers, found in `kube/*`. For
example, `kube/basealpine/Dockerfile` can be used by putting
`FROM gcr.io/skia-public/basealpine:3.9` as the first line of your Dockerfile.
We have a helper script for 'installing' an app into a Docker container,
`bash/docker_build.sh`. A call to this script is customarily put in a bash
script which is called by `make release`. See `jsfiddle/build_release` for an
example. To integrate docker_build.sh into the actual container, add a
`COPY . /` to copy the executable(s) and HTML/JS/CSS from the build context into
the container. Legacy apps have a similar set-up, but for building a Debian
package instead of a container.
It is customary to include an ENTRYPOINT and CMD with sensible defaults for the
app. It's also a best practice to run the app as USER skia unless root is
absolutely needed.
Putting all the above together, a bare-bones Dockerfile would look something
like:
    FROM gcr.io/skia-public/basealpine:3.9
    COPY . /
    USER skia
    ENTRYPOINT ["/usr/local/bin/my_app_name"]
    CMD ["--port=:8000", "--resources_dir=/usr/local/share/my_app_name/"]
## Secrets and Service Accounts
If your app needs access to a GCS bucket or similar resources, it is
recommended you create a new service account for your app. See below for linking
it into the container.
Use an existing `create-sa.sh` script (e.g. `create-jsfiddle-sa.sh`) and tweak
the name, committing it into the app's root directory. Run this once to create
the service account and create the secrets in GKE.
## Authentication
Almost all applications should use
[google.DefaultTokenSource()](https://pkg.go.dev/golang.org/x/oauth2/google#DefaultTokenSource)
to create an
[oauth2.TokenSource](https://pkg.go.dev/golang.org/x/oauth2#TokenSource) to be
used for authenticated access to APIs and resources.
The call to
[google.DefaultTokenSource()](https://pkg.go.dev/golang.org/x/oauth2/google#DefaultTokenSource)
will follow the search algorithm in
[FindDefaultCredentialsWithParams](https://pkg.go.dev/golang.org/x/oauth2/google#FindDefaultCredentialsWithParams).
To run applications locally authenticated as yourself you can run:
```
gcloud auth application-default login
```
This places credentials at:
```
$HOME/.config/gcloud/application_default_credentials.json
```
where the application will pick them up.
If you wish to override that behavior and use a different set of credentials,
set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to point
to a different file, such as a `key.json` file for a specific service account.
When running in Kubernetes,
[google.DefaultTokenSource()](https://pkg.go.dev/golang.org/x/oauth2/google#DefaultTokenSource)
will pick up credentials from GCP metadata or
[workload identity](http://go/skia-workload-identity).
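When debugging which identity an app ends up with, it can help to mirror the lookup order described above. The sketch below is a simplified approximation of that order, not the library's code: `credentialSource` is a hypothetical helper, it only reports which source would be consulted first, and it does not load or validate any credentials.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// credentialSource reports which credential source a default lookup would
// consult first, following the order described in this section.
// getenv and exists are injected so the logic can be checked without
// touching the real environment or file system.
func credentialSource(getenv func(string) string, exists func(string) bool, home string) string {
	if p := getenv("GOOGLE_APPLICATION_CREDENTIALS"); p != "" {
		return "env file: " + p
	}
	wellKnown := filepath.Join(home, ".config", "gcloud", "application_default_credentials.json")
	if exists(wellKnown) {
		return "gcloud file: " + wellKnown
	}
	return "metadata server or workload identity"
}

func main() {
	home, _ := os.UserHomeDir()
	fileExists := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	fmt.Println(credentialSource(os.Getenv, fileExists, home))
}
```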
## Using Git
Use of the Git binary itself is strongly discouraged unless it is unavoidable.
Please consider an alternative:
- go/gitiles provides an API for retrieving commit information, file contents,
git log, etc, via HTTP for repos hosted on Googlesource.
- go/gitstore provides a low-level interface for retrieving commit metadata by
time or index. This data is stored in BigTable and is ingested by the
`gitsync` app, which also sends PubSub messages for low-latency updates.
- go/vcsinfo/bt_vcs provides a similar interface for retrieving metadata but
adds caching and packages Gitiles into a common API.
- go/git/repograph provides a complete in-memory graph of a repository for fast
traversal. It loads data via go/gitstore and can be automatically updated via
PubSub.
- go/gerrit provides access to the Gerrit API, including uploading and
committing changes to repos which use Gerrit.
The following are valid reasons to use the Git binary itself:
- You need to do more complex write operations, e.g. merges.
- You need a full local checkout of some code, e.g. to compile and run tests, or
to run a script. Note that you can use go/gitiles to download a standalone
script, so a full checkout should not be necessary unless your use case
requires a large or changing set of files.
If you do need Git for your app, use the `base-cipd` Docker image, which
includes a pinned version of Git (as well as other tools). Do not install Git
via the package manager in your Docker image.
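For the "download a standalone script" case mentioned above, the request that a Gitiles client makes looks roughly like the following sketch. It assumes Gitiles' `?format=TEXT` behaviour of returning file contents base64-encoded; in a real service you would use the `go/gitiles` package instead. `gitilesFileURL`, `fetchFile`, `cannedTransport`, and `demoFetch` are hypothetical names, the repo URL is a placeholder, and the stub transport is there only so the example runs offline.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// gitilesFileURL builds the URL for one file at one ref, asking for TEXT output.
func gitilesFileURL(repoURL, ref, path string) string {
	return fmt.Sprintf("%s/+/%s/%s?format=TEXT", strings.TrimSuffix(repoURL, "/"), ref, path)
}

// fetchFile downloads a single file and decodes the base64 response body.
func fetchFile(c *http.Client, repoURL, ref, path string) ([]byte, error) {
	resp, err := c.Get(gitilesFileURL(repoURL, ref, path))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %s: status %d", path, resp.StatusCode)
	}
	return io.ReadAll(base64.NewDecoder(base64.StdEncoding, resp.Body))
}

// cannedTransport is a test double that answers every request with the
// base64 encoding of content, mimicking a format=TEXT response.
type cannedTransport struct{ content string }

func (t cannedTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	body := base64.StdEncoding.EncodeToString([]byte(t.content))
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(body)),
		Request:    r,
	}, nil
}

// demoFetch runs fetchFile against the stub transport.
func demoFetch(content string) (string, error) {
	c := &http.Client{Transport: cannedTransport{content: content}}
	b, err := fetchFile(c, "https://example.googlesource.com/repo", "refs/heads/main", "run.sh")
	return string(b), err
}

func main() {
	out, err := demoFetch("echo hi\n")
	fmt.Printf("%q %v\n", out, err)
}
```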
## Launching
- Write or update a design doc so that others understand how to use, maintain, and
improve your service. `DESIGN.md` typically has high level design structures
(e.g. where is data stored, how do the pieces of software interact, etc).
`PROD.md` has an overview of the alerts and any other notes for maintaining
the service.
- Do some back-of-the-envelope calculations to make sure your service can handle
the expected load. If your service is latency-sensitive, measure the latency
under load.
- Test on browsers that your users will be using, at least Chrome on desktop and
ideally Chrome on Android.
- Create an `app.yaml` in
[k8s-config](https://skia.googlesource.com/k8s-config/+show/master/). This
controls how your app will be run in GKE. See
[these docs](https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/)
for more on the schema. Commit this, then run `pushk appname` to make the
configuration active.
- Metrics are customarily made available at port 20000. To configure metrics
scraping, the port should be named 'prom'. See [go/skia-infra-metrics](http://go/skia-infra-metrics)
for more details.
```yaml
ports:
  - containerPort: 20000
    name: prom
```
- Clusters run with [Cluster
Autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler),
which means that every pod should have the following annotation:
```yaml
annotations:
  cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
```
If you need finer-grained control over how your pods are started and stopped,
define a
[PodDisruptionBudget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/).
CockroachDB defines a PodDisruptionBudget and is a good example of such a
budget.
- Metrics will be available on
[prom2.skia.org](https://prom2.skia.org/).
- The metrics will be labeled `app=<foo>` where `foo` is the first argument to
`common.InitWithMust`.
- If you have secrets (like a service account), bind them to the deployment by
adding the following to `app.yaml`:
```yaml
spec:
  automountServiceAccountToken: false
  ...
  containers:
    - name: my-container
      ...
      volumeMounts:
        - name: my-app-sa
          mountPath: /var/secrets/google
      env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json
  ...
  volumes:
    - name: my-app-sa
      secret:
        secretName: my-app
```
- If you use OAuth, it should be configured to use the \*.skia.org cookie (the
default). Additionally, you will need to mount the secrets to use with your
login.Init\* code:
```yaml
spec:
  ...
  containers:
    - name: my-container
      ...
      volumeMounts:
        - name: skia-org-legacy-login-secrets
          mountPath: /etc/skia.org/
  ...
  volumes:
    - name: skia-org-legacy-login-secrets
      secret:
        secretName: skia-org-legacy-login-secrets
```
- It is possible to test your service/config without making it publicly visible.
- Deploy your `app.yaml` either with `pushk` or `kubectl apply -f app.yaml`
- Identify a pod name with `kubectl get pods | grep [my-app]`, where `my-app` is
the name of the new service.
- Forward a local port (e.g. 8083) to a port on the pod (e.g. the HTTP port
8000): `kubectl port-forward my-app-7bf542629-jujzm 8083:8000`
- Navigate a browser to <http://localhost:8083> to see your service.
- If you have simple routing needs, to make your service visible to the public
add a
[`skia.org.domain` annotation to your Service YAML](https://skia.googlesource.com/buildbot/+doc/refs/heads/master/skfe/README.md)
with the domain name and deploy your updated yaml with `kubectl apply`.
If your routing is more complicated you can skip the YAML annotation and write
the routing rules directly into `infra/skfe/k8s/default.conf`.
Either way you then push a new version of nginx-skia-org:
```
cd infra/skfe
make k8s_push
```
And watch that the new instances start running:
```
watch kubectl get pods -lapp=nginx-skia-org
```
- Add prober rules to `probers.json` in your application directory.
- Ideally, probe all public HTML pages and all nullipotent JSON endpoints. You
can write functions in `prober/go/prober/main.go` to check the response body
if desired.
- Add additional stats gathering to your program using
`go.skia.org/infra/go/metrics2`, e.g. to ensure liveness/heartbeat of any
background processes.
- Add alert rules to
[alerts_public](https://skia.googlesource.com/buildbot/+show/master/promk/prometheus/alerts_public.yml).
The alerts may link to a production manual, `PROD.md`, checked into the
application source directory. Examples:
- All prober rules.
- Additional stats from metrics2. Legacy apps have their alert rules in
`prometheus/sys/alert.rules`
- Some
[general metrics](https://skia.googlesource.com/buildbot/+show/master/promk/prometheus/alerts_general.yml)
apply to all apps and may not need to be added explicitly for your
application, such as:
  - Too many goroutines.
  - Free disk space on the instance and any attached disks.
- This is also for alerts that apply to skia-public and skia-corp projects.
- Check your alert rules by running `make validate` in `promk/` (Legacy apps
should run that command in `prometheus/`).
- Then, after landing your valid alerts, run
`make push_config && make push_config_corp` in `promk/` (Again, legacy apps
should do `make push` in `prometheus/`).
- Tell people about your new service.
- Be prepared for bug reports. :-)
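For the liveness/heartbeat item in the checklist above, the idea is that a background process resets a timer each time it completes a cycle, and an alert fires when the time since the last reset grows too large. The sketch below illustrates that idea with the standard library only; it is not the `metrics2` API, and `liveness`, `newLiveness`, and `demoLiveness` are hypothetical names. Times are passed in explicitly so the behaviour can be checked without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// liveness records when a background process last reported progress.
type liveness struct {
	mu   sync.Mutex
	last time.Time
}

// newLiveness starts the clock at now.
func newLiveness(now time.Time) *liveness {
	return &liveness{last: now}
}

// Reset marks the process as alive at the given time.
func (l *liveness) Reset(now time.Time) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.last = now
}

// SecondsSince returns how long ago the last Reset happened. This is the
// value an alert rule would compare against a threshold.
func (l *liveness) SecondsSince(now time.Time) float64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return now.Sub(l.last).Seconds()
}

// demoLiveness simulates a reset at t+30s and a reading at t+90s.
func demoLiveness() float64 {
	start := time.Now()
	lv := newLiveness(start)
	lv.Reset(start.Add(30 * time.Second))
	return lv.SecondsSince(start.Add(90 * time.Second))
}

func main() {
	fmt.Println(demoLiveness())
}
```

In a real service the background loop would call `Reset` after each successful iteration, and the alert threshold would be a small multiple of the loop period.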
# Continuous Deployment
Some apps are set up to be continuously re-built and re-deployed on every commit
of Skia or Skia Infra. To do that, see
[docker_pushes_watcher/README.md](./docker_pushes_watcher/README.md).