Infra Team Launch Checklist

Assumptions

Your service:

  • Is checked in to the go.skia.org/infra Git repo.
  • Is an HTTP server written in Go with a front end in WebComponents (legacy apps use Polymer).
  • Will run in a Docker Container.
  • Will use a *.skia.org domain.
  • Does not handle personal data (additional steps will be required).
  • Is not intended to be used by the public at large (additional steps will be required).

JSFiddle is a recently launched service that demonstrates the above. See go/ for the server code and modules/ and pages/ for the front end.

Coding

Use go.skia.org/infra/go/sklog for logging.
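
For example (a fragment; the messages and err are illustrative, and sklog exposes the familiar glog-style Infof/Errorf/Fatal calls):

sklog.Infof("Listening on port %s", *port)
sklog.Errorf("Failed to load config: %s", err) // err is illustrative.
sklog.Fatal(http.ListenAndServe(*port, nil))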

Add flags to your main package like:

port         = flag.String("port", ":8002", "HTTP service port (e.g., ':8002')")
local        = flag.Bool("local", false, "Running locally if true. As opposed to in production.")
promPort     = flag.String("prom_port", ":20000", "Metrics service address (e.g., ':20000')")
resourcesDir = flag.String("resources_dir", "./dist", "The directory to find HTML, JS, and CSS files. If blank the current directory will be used.")

Call common.InitWithMust(appName, [opt], [opt]) in your main function.
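
A minimal main function might look like this (a sketch, assuming the flags above and the common.PrometheusOpt option; "my_app_name" is a placeholder):

func main() {
    // InitWithMust parses flags and initializes logging and metrics.
    common.InitWithMust(
        "my_app_name", // Becomes the app label on your metrics; see Launching below.
        common.PrometheusOpt(promPort),
    )
    // Wire up handlers and start serving; see the next section.
}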

Use go.skia.org/infra/go/login paired with ../../../infra-sk/modules/login.ts (Legacy Polymer apps use res/imp/login.html) and/or go.skia.org/infra/go/webhook for authentication. When using OAuth, see the secrets section below for including client secrets.

Wrap your http.Handler (many services use chi.NewRouter()) with go.skia.org/infra/go/httputils.LoggingGzipRequestResponse to provide monitoring and logging of HTTP requests and responses. Then wrap it in go.skia.org/infra/go/httputils.HealthzAndHTTPS to add an unlogged /healthz endpoint for use with GKE health monitoring and various HTTPS configuration.
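
Putting that together (a sketch; indexHandler is a placeholder for your own handler):

r := chi.NewRouter()
r.HandleFunc("/", indexHandler)
var h http.Handler = httputils.LoggingGzipRequestResponse(r)
h = httputils.HealthzAndHTTPS(h)
http.Handle("/", h)
sklog.Fatal(http.ListenAndServe(*port, nil))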

Use go.skia.org/infra/go/httputils.DefaultClientConfig for HTTP clients, which provides several features:

  • ensures requests time out within a reasonable limit
  • tracks how much load we place on external services
  • optionally adds authentication to requests
  • optionally adds automatic retries with exponential backoff
  • optionally treats non-2xx responses as errors
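
For example, a client that treats non-2xx responses as errors (a sketch; the URL is illustrative):

client := httputils.DefaultClientConfig().With2xxOnly().Client()
resp, err := client.Get("https://example.com/api/status") // Check err and close resp.Body as usual.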

Write your code with security in mind.

If you add any critical TODOs while you're coding, file a blocking bug for the issue.

If your application requires Puppeteer tests, explicitly opt in by making the necessary changes to //puppeteer-tests/docker/run-tests.sh.

Makefiles

It is customary to have the following targets in a Makefile for the service.

  • build : Build a development version of the front end.
  • serve : Run the demo pages of the front end in a “watch” mode. This command is for primary development of front end pages.
  • core : Build the server components.
  • watch : Build a development version of the front end in watch mode. This command is for running the server, but also making changes to the front end.
  • release : Build a Docker container with the front end and backend parts in it (see below).
  • push : Depends on release, and then pushes to GKE using pushk.

Docker

Running apps in Docker makes deployment and local testing much easier, and it allows integration with GKE. Some legacy apps do not yet run in Docker, but the goal is to have everything on GKE+Docker.

Create a Dockerfile for your app in the root of the project folder (e.g. jsfiddle/Dockerfile). If there are multiple services, put them in a named folder (e.g. fiddlek/fiddle/Dockerfile, fiddlek/fiddler/Dockerfile).

When choosing a base image, consider our light wrappers found in kube/*. For example, kube/basealpine/Dockerfile can be used by putting FROM gcr.io/skia-public/basealpine:3.9 as the first line of your Dockerfile.

We have a helper script for ‘installing’ an app into a Docker container: bash/docker_build.sh. A call to this script is customarily placed in a bash script that is invoked by make release; see jsfiddle/build_release for an example. To integrate the output of docker_build.sh into the actual container, add a COPY . / line that copies the executable(s) and HTML/JS/CSS from the build context into the container. Legacy apps have a similar set-up, but build a Debian package instead of a container.

It is customary to include an ENTRYPOINT and CMD with sensible defaults for the app. It's also a best practice to run the app as USER skia unless root is absolutely needed.

Putting all the above together, a bare-bones Dockerfile would look something like:

FROM gcr.io/skia-public/basealpine:3.9

COPY . /

USER skia

ENTRYPOINT ["/usr/local/bin/my_app_name"]
CMD ["--port=:8000", "--resources_dir=/usr/local/share/my_app_name/"]

Secrets and Service Accounts

If your app needs access to a GCS bucket or similar resources, it is recommended that you create a new service account for your app. See below for linking it into the container.

Copy an existing create-sa.sh script (e.g. create-jsfiddle-sa.sh), tweak the name, and commit it in the app's root directory. Run it once to create the service account and the corresponding secrets in GKE.

Authentication

Almost all applications should use google.DefaultTokenSource() to create an oauth2.TokenSource to be used for authenticated access to APIs and resources.

The call to google.DefaultTokenSource() will follow the search algorithm in FindDefaultCredentialsWithParams.
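
For example (a sketch; google here is golang.org/x/oauth2/google, and the scope is illustrative):

ctx := context.Background()
ts, err := google.DefaultTokenSource(ctx, "https://www.googleapis.com/auth/userinfo.email")
if err != nil {
    sklog.Fatal(err)
}
client := httputils.DefaultClientConfig().WithTokenSource(ts).With2xxOnly().Client()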

To run applications locally authenticated as yourself you can run:

gcloud auth application-default login

This will place credentials at:

$HOME/.config/gcloud/application_default_credentials.json

that will be picked up by the application.

If you wish to override that behavior and use a different set of credentials, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to a different file, such as a key.json file for a specific service account.
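
For example (the path is illustrative):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json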

When running in Kubernetes, google.DefaultTokenSource() will pick up credentials from the GCP metadata server or from Workload Identity.

Using Git

Use of the Git binary itself is strongly discouraged unless it is unavoidable. Please consider an alternative:

  • go/gitiles provides an API for retrieving commit information, file contents, git log, etc., via HTTP for repos hosted on Googlesource (see the sketch after this list).
  • go/gitstore provides a low-level interface for retrieving commit metadata by time or index. This data is stored in BigTable and is ingested by the gitsync app, which also sends PubSub messages for low-latency updates.
  • go/vcsinfo/bt_vcs provides a similar interface for retrieving metadata but adds caching and packages Gitiles into a common API.
  • go/git/repograph provides a complete in-memory graph of a repository for fast traversal. It loads data via go/gitstore and can be automatically updated via PubSub.
  • go/gerrit provides access to the Gerrit API, including uploading and committing changes to repos which use Gerrit.
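
For example, reading a file with go/gitiles instead of a local checkout might look like this (a sketch, assuming the NewRepo and ReadFileAtRef helpers and an authenticated *http.Client built as described above; the repo URL, path, and ref are illustrative):

repo := gitiles.NewRepo("https://skia.googlesource.com/skia", client)
contents, err := repo.ReadFileAtRef(ctx, "README.md", "main")
if err != nil {
    sklog.Fatal(err)
}
sklog.Infof("Fetched %d bytes.", len(contents))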

The following are valid reasons to use the Git binary itself:

  • You need to do more complex write operations, e.g. merges.
  • You need a full local checkout of some code, e.g. to compile and run tests, or to run a script. Note that you can use go/gitiles to download a standalone script, so a full checkout should not be necessary unless your use case requires a large or changing set of files.

If you do need Git for your app, use the base-cipd Docker image, which includes a pinned version of Git (as well as other tools). Do not install Git via the package manager in your Docker image.

Launching

  • Write or update the design doc so that others understand how to use, maintain, and improve your service. DESIGN.md typically covers the high-level design (e.g. where data is stored, how the pieces of software interact, etc.). PROD.md has an overview of the alerts and any other notes for maintaining the service.
  • Do some back-of-the-envelope calculations to make sure your service can handle the expected load. If your service is latency-sensitive, measure the latency under load.
  • Test on browsers that your users will be using, at least Chrome on desktop and ideally Chrome on Android.
  • Create an app.yaml in k8s-config. This controls how your app will be run in GKE. See these docs for more on the schema. Commit this, then run pushk appname to make the configuration active.
  • Metrics are customarily made available at port 20000. To configure metrics scraping, the port must be named ‘prom’. See go/skia-infra-metrics for more details.
ports:
  - containerPort: 20000
    name: prom
  • Clusters run with Cluster Autoscaler, which means that every pod should have the following annotation:
annotations:
  cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'

If you need finer-grained control over how your pods are started and stopped, define a PodDisruptionBudget. CockroachDB defines a PodDisruptionBudget and is a good example of such a budget.
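
A minimal PodDisruptionBudget might look like this (a sketch; the names and the minAvailable value are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app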

  • Metrics will be available on prom2.skia.org.
  • The metrics will be labeled app=<foo> where foo is the first argument to common.InitWithMust.
  • If you have secrets (like a service account key), bind them to the deployment by adding the following to app.yaml:
spec:
  automountServiceAccountToken: false
  ...
  containers:
    - name: my-container
      ...
      volumeMounts:
        - name: my-app-sa
          mountPath: /var/secrets/google
      env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json
      ...
  volumes:
    - name: my-app-sa
      secret:
        secretName: my-app
  • If you use OAuth, it should be configured to use the *.skia.org cookie (the default). Additionally, you will need to mount the secrets to use with your login.Init* code:
spec:
  ...
  containers:
    - name: my-container
      ...
      volumeMounts:
        - name: skia-org-legacy-login-secrets
          mountPath: /etc/skia.org/
      ...
  volumes:
    - name: skia-org-legacy-login-secrets
      secret:
        secretName: skia-org-legacy-login-secrets
  • It is possible to test your service/config without making it publicly visible.

    • Deploy your app.yaml either with pushk or kubectl apply -f app.yaml
    • Identify a pod name, kubectl get pods | grep [my-app] where my-app is the name of the new service.
    • Forward a local port (e.g. 8083) to a port on the pod (e.g. the HTTP port 8000): kubectl port-forward my-app-7bf542629-jujzm 8083:8000
    • Navigate a browser to http://localhost:8083 to see your service.
  • To make your service visible to the public, if you have simple routing needs, add a skia.org.domain annotation to your Service YAML with the domain name and deploy your updated YAML with kubectl apply.

    If your routing is more complicated you can skip the YAML annotation and write the routing rules directly into infra/skfe/k8s/default.conf.

    Either way you then push a new version of nginx-skia-org:

    cd infra/skfe
    make k8s_push
    

    And watch that the new instances start running:

    watch kubectl get pods -lapp=nginx-skia-org
    
  • Add prober rules to probers.json in your application directory.

    • Ideally, probe all public HTML pages and all nullipotent JSON endpoints. You can write functions in prober/go/prober/main.go to check the response body if desired.
  • Add additional stats gathering to your program using go.skia.org/infra/go/metrics2, e.g. to ensure liveness/heartbeat of any background processes.
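
    For example, a background loop might export a liveness metric like this (a sketch; the metric name and doWork are illustrative):

    liveness := metrics2.NewLiveness("my_background_process")
    for {
        doWork()
        liveness.Reset() // Records that the loop is still making progress.
    }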

  • Add alert rules to alerts_public. The alerts may link to a production manual, PROD.md, checked into the application source directory. Examples:

    • All prober rules.
    • Additional stats from metrics2. Legacy apps have their alert rules in prometheus/sys/alert.rules
  • Some general metrics apply to all apps and may not need to be added explicitly for your application, such as:

    • Too many goroutines.
    • Free disk space on the instance and any attached disks.

    This also covers alerts that apply to the skia-public and skia-corp projects as a whole.

  • Check your alert rules by running make validate in promk/ (Legacy apps should run that command in prometheus/).

  • Then, after landing your valid alerts, run make push_config && make push_config_corp in promk/ (Again, legacy apps should do make push in prometheus/).

  • Tell people about your new service.

  • Be prepared for bug reports. :-)

Continuous Deployment

Some apps are set up to be continuously re-built and re-deployed on every commit of Skia or Skia Infra. To do that, see docker_pushes_watcher/README.md.

CockroachDB

When standing up a new CockroachDB cluster the following extra steps need to be done:

  1. Modify the YAML file to add ephemeral-storage to the resource requests:
resources:
  requests:
    cpu: '2'
    memory: '32Gi'
    ephemeral-storage: 100M
  2. Update managed-prometheus-pod-monitoring.yml because CockroachDB hosts metrics at a different path than the expected default of /metrics.
# CockroachDB hosts metrics at a different path.
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: perf-cockroachdb-cluster-pod-monitoring
spec:
  selector:
    matchLabels:
      app: perf-cockroachdb
  endpoints:
    - port: prom
      path: /_status/vars
      interval: 15s
  targetLabels:
    fromPod:
      - from: app
      - from: appgroup
  3. Rename the HTTP port from http to prom so it gets scraped:
ports:
  - containerPort: 26257
    name: grpc
  - containerPort: 8080
    name: prom