GitSync Production Manual
=========================
General Metrics
===============
The following dashboard is for the skia-public instances:
<https://grafana2.skia.org/d/Onp7_5FWk/gitsync>
The following dashboard is for the skia-corp instances:
<https://skia-mon.corp.goog/d/Wi0Yu5FZk/gitsync>
Some things to look for:
- Do goroutines or memory increase continuously (e.g., leaks)?
- Have any repos taken more than a few seconds to sync?
- Is there an elevated error rate?
General Logs
============
Logs for GitSync instances in skia-public/skia-corp are in the usual
GKE container grouping, for example:
<https://console.cloud.google.com/logs/viewer?project=skia-public&resource=container&logName=projects%2Fskia-public%2Flogs%2Fgitsync2>
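
The example URL above follows a regular pattern: the project name appears as a query parameter and again inside the URL-encoded log name. The following is a minimal sketch of building such a link for another instance. The helper name `logs_url` is hypothetical, and log names other than `gitsync2` in `skia-public` are assumptions that should be checked against the actual deployment.

```python
from urllib.parse import urlencode

LOGS_VIEWER = "https://console.cloud.google.com/logs/viewer"


def logs_url(project: str, log_name: str) -> str:
    """Build a Logs Viewer URL for one container log in a GCP project.

    Hypothetical helper: it only mirrors the shape of the example URL above.
    """
    params = {
        "project": project,
        "resource": "container",
        # urlencode percent-encodes the slashes, giving projects%2F...%2Flogs%2F...
        "logName": f"projects/{project}/logs/{log_name}",
    }
    return f"{LOGS_VIEWER}?{urlencode(params)}"


# Reproduces the skia-public example link.
print(logs_url("skia-public", "gitsync2"))
```

Passing `"skia-corp"` as the project yields the equivalent link for the corp instances, assuming the same log name is used there.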
Alerts
======
Items below here should include target links from alerts.
GitSyncStalled
--------------
This alert means we haven't successfully synced a repo in over 5 minutes. This
could be due to failure to communicate with the Gitiles server, or because of a
problem with GitSync itself. Check the logs for details.
Key metrics: liveness_last_successful_git_sync_s
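
As an illustration of the alert condition, the sketch below flags repos whose last successful sync is older than 5 minutes. It assumes the metric is reported in seconds (as the `_s` suffix suggests) and that there is one sample per repo; the function name, repo names, and values are hypothetical.

```python
STALL_THRESHOLD_S = 5 * 60  # "over 5 minutes", per the alert description


def stalled_repos(seconds_since_sync: dict[str, float]) -> list[str]:
    """Return the repos whose last successful sync is older than the threshold."""
    return sorted(
        repo
        for repo, age_s in seconds_since_sync.items()
        if age_s > STALL_THRESHOLD_S
    )


# Hypothetical samples of liveness_last_successful_git_sync_s, keyed by repo.
samples = {"repo-a": 12.0, "repo-b": 431.0}
print(stalled_repos(samples))  # ['repo-b']
```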
GitSyncErrorRate
----------------
The log error rate is elevated. There are a number of possible causes; check the
logs and verify that things are working as expected.
Key metrics: rate(num_log_lines{level="ERROR",app=~"gitsync.\*"}[30m])
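
To interpret the value of this metric, it may help to recall that `rate(...[30m])` is a per-second average over the 30-minute window. The sketch below is a simplified approximation of that calculation; Prometheus additionally handles counter resets and extrapolates to the window boundaries, which this ignores. The function name and the sample counts are hypothetical.

```python
WINDOW_S = 30 * 60  # the [30m] range in the query above


def per_second_rate(first_count: float, last_count: float, window_s: float) -> float:
    """Approximate a counter's average per-second increase over a window."""
    return (last_count - first_count) / window_s


# Hypothetical counter values at the start and end of the window.
print(per_second_rate(1200, 1290, WINDOW_S))
```

With these numbers, 90 new ERROR lines in 30 minutes corresponds to a rate of 0.05 lines per second.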