GitSync Production Manual

General Metrics

The following dashboard is for the skia-public instances: https://grafana2.skia.org/d/Onp7_5FWk/gitsync

The following dashboard is for the skia-corp instances: https://skia-mon.corp.goog/d/Wi0Yu5FZk/gitsync

Some things to look for:

  • Do goroutine count or memory usage increase continuously (e.g. a leak)?
  • Have any repos taken more than a few seconds to sync?
  • Is there an elevated error rate?

General Logs

Logs for GitSync instances in skia-public/skia-corp are in the usual GKE container grouping, for example: https://console.cloud.google.com/logs/viewer?project=skia-public&resource=container&logName=projects%2Fskia-public%2Flogs%2Fgitsync2

Alerts

Each item below should be the target of a link from the corresponding alert.

GitSyncStalled

This alert means we haven't successfully synced a repo in over 5 minutes. This could be due to a failure to communicate with the Gitiles server, or to a problem with GitSync itself. Check the logs for details.

Key metrics: liveness_last_successful_git_sync_s
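As a rough illustration of the condition behind this alert, the Go sketch below assumes that liveness_last_successful_git_sync_s reports the number of seconds elapsed since the last successful sync, and flags a stall once that value exceeds the 5-minute window. The function isStalled is hypothetical and is not taken from the actual alert rule.

```go
package main

import (
	"fmt"
	"time"
)

// stallThreshold mirrors the 5-minute window described for GitSyncStalled.
const stallThreshold = 5 * time.Minute

// isStalled is a hypothetical helper. It assumes livenessSeconds is the value
// of liveness_last_successful_git_sync_s, i.e. seconds since the last
// successful sync, and reports whether that age is over the threshold.
func isStalled(livenessSeconds float64) bool {
	age := time.Duration(livenessSeconds * float64(time.Second))
	return age > stallThreshold
}

func main() {
	fmt.Println(isStalled(120)) // prints false: synced 2 minutes ago
	fmt.Println(isStalled(301)) // prints true: just over 5 minutes
}
```

A value that keeps growing on the dashboard, rather than periodically dropping back toward zero, is the visual signature of the same condition.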

GitSyncErrorRate

The log error rate is elevated. There are a number of possible causes; check the logs and verify that things are working as expected.

Key metrics: rate(num_log_lines{level="ERROR",app=~"gitsync.*"}[30m])