
Monitoring (Grains, Prober and Alert Server)

The monitoring server runs InfluxDB to accept and manage timeseries data and uses Grafana to build dashboards for that data. InfluxDB has a module that makes it compatible with Graphite/Carbon, which we used to store timeseries data before switching to InfluxDB. Our servers still upload metrics through the Graphite/Carbon API, so you'll see references to Graphite or Carbon here and there.

This server also hosts the prober, which monitors the uptime of our servers and feeds the results of those probes into InfluxDB.

Alert Server periodically queries InfluxDB and generates alerts based on rules defined in monitoring/alerts.cfg.

Logs for all applications are served from skiamonitor.com:10115, which is accessible from internal IPs only.

Full Server Setup

Do once

$ ./vm_create_instance.sh
$ ./vm_setup_instance.sh

Make sure /etc/monit/monitrc contains ‘set daemon 2’ so that monit polls every 2 seconds.
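If you prefer to script that change, here is a minimal sketch using GNU sed. It assumes monitrc already has an uncommented ‘set daemon’ line (commented-out lines are left alone), and ‘set_monit_interval’ is a hypothetical helper name. ‘monit -t’ checks the control file's syntax before ‘monit reload’ picks up the new interval.

```shell
#!/bin/bash
# Sketch: set monit's polling interval (in seconds) in a monitrc file.
# Assumes GNU sed and an existing, uncommented "set daemon <n>" line;
# lines starting with "#" do not match and are left untouched.
# set_monit_interval is a hypothetical helper, not part of monit.
set_monit_interval() {
  local interval="$1"
  local rc="${2:-/etc/monit/monitrc}"
  # Keep the "set daemon " prefix (and anything after the number),
  # replacing only the numeric interval.
  sed -i -E "s/^([[:space:]]*set[[:space:]]+daemon[[:space:]]+)[0-9]+/\1${interval}/" "$rc"
}

# Usage (as root): set_monit_interval 2 && monit -t && monit reload
```

Only the number is rewritten, so options such as ‘with start delay’ on the same line survive the edit.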

Log in to InfluxDB on port 10117 and create the ‘graphite’ and ‘grafana’ databases. Also set the username and password to the values stored in valentine.
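The databases can also be created from a shell instead of the web UI. This is a sketch under the assumption that the server runs an InfluxDB 0.8-era release, whose HTTP API creates a database with POST /db and a JSON body; check the installed version before relying on it. INFLUX_URL, INFLUX_USER and INFLUX_PASS are hypothetical variables, and their defaults (port 8086, root/root) are InfluxDB 0.8's stock settings, not this server's; the real credentials are in valentine.

```shell
#!/bin/bash
# Sketch: create InfluxDB databases over the 0.8-era HTTP API (POST /db).
# INFLUX_URL, INFLUX_USER and INFLUX_PASS are hypothetical variables; the
# defaults are InfluxDB 0.8's stock API port and root credentials, not
# this server's settings (those come from valentine).
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"
INFLUX_USER="${INFLUX_USER:-root}"
INFLUX_PASS="${INFLUX_PASS:-root}"

# JSON body for a create-database request.
influx_db_body() {
  printf '{"name": "%s"}' "$1"
}

# Create one database; -f makes curl exit non-zero on HTTP errors.
influx_create_db() {
  curl -sSf -X POST "${INFLUX_URL}/db?u=${INFLUX_USER}&p=${INFLUX_PASS}" \
    -d "$(influx_db_body "$1")"
}

# Usage: for db in graphite grafana; do influx_create_db "$db"; done
```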

Once that is done, set the metadata for the instance using cloud.google.com/console, as described below:

Prober

The prober requires one piece of metadata: the API Key for making requests to the project hosting API. The API Key value can be found here:

https://console.developers.google.com/project/31977622648/apiui/credential

Set that as the value for the metadata key:

apikey
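Metadata can also be set from the command line. This sketch uses ‘gcloud compute instances add-metadata’, from the gcloud tool that superseded gcutil. The instance and project defaults are taken from the SSH note in this README; ZONE is a hypothetical variable the caller must supply (the real zone is presumably among the constants in vm_config.sh), and GCLOUD is a hypothetical override useful for dry runs.

```shell
#!/bin/bash
# Sketch: set custom metadata on the monitoring instance with gcloud.
# INSTANCE and PROJECT default to the names in this README's SSH note.
# ZONE is a hypothetical variable and must be supplied by the caller;
# GCLOUD is a hypothetical override (e.g. GCLOUD=echo for a dry run).
INSTANCE="${INSTANCE:-skia-monitoring}"
PROJECT="${PROJECT:-google.com:skia-buildbots}"

# Join key=value arguments with commas, the format --metadata expects.
# Values that themselves contain commas would be split incorrectly.
join_metadata() {
  local IFS=,
  printf '%s\n' "$*"
}

# Add the given key=value pairs to the instance, overwriting existing keys.
set_metadata() {
  ${GCLOUD:-gcloud} compute instances add-metadata "$INSTANCE" \
    --project="$PROJECT" \
    --zone="${ZONE:?ZONE must be set}" \
    --metadata="$(join_metadata "$@")"
}

# Usage: ZONE=<zone> set_metadata apikey=<API key from the console>
```

Note that secrets passed this way end up in your shell history, so the console may still be the better choice for the credential values.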

Grains

Grains is the Grafana/InfluxDB proxy and needs the following metadata values set:

cookiesalt
client_id
client_secret
influxdb_name
influxdb_password
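After setting the values, a quick check from inside the instance can confirm that nothing is missing. This sketch assumes the standard Compute Engine metadata server, which lists custom attribute names one per line when the attributes/ directory is fetched with the ‘Metadata-Flavor: Google’ header; ‘missing_keys’ is a hypothetical helper. The same call works for the AlertServer keys by passing that list instead.

```shell
#!/bin/bash
# Sketch: report which required metadata keys are missing, run on the VM.
# Fetching the attributes/ directory (non-recursively) from the metadata
# server returns one custom attribute name per line.
ATTR_URL="http://metadata.google.internal/computeMetadata/v1/instance/attributes/"

# Print each required key (given as arguments) that is absent from the
# newline-separated key list read on stdin. missing_keys is hypothetical.
missing_keys() {
  local present key
  present=$(cat)
  for key in "$@"; do
    # -x: whole-line match, -F: literal string, so "client_id" does not
    # match "gmail_clientid" or similar names.
    grep -qxF -- "$key" <<< "$present" || echo "$key"
  done
}

# Usage on the instance:
#   curl -s -H "Metadata-Flavor: Google" "$ATTR_URL" |
#     missing_keys cookiesalt client_id client_secret influxdb_name influxdb_password
```

An empty result means every listed key is present; it does not check that the values themselves are correct.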

The client_id and client_secret come from here:

https://console.developers.google.com/project/31977622648/apiui/credential

Look for the Client ID that has a Redirect URI for skiamonitor.com.

For ‘cookiesalt’ and the InfluxDB values, search for ‘skiamonitor’ in valentine.

AlertServer

AlertServer periodically queries InfluxDB and generates alerts based on rules in the alerts.cfg file. It needs the following metadata values:

cookiesalt
client_id
client_secret
influxdb_name
influxdb_password
gmail_clientid
gmail_clientsecret
gmail_cached_token

The client_id and client_secret come from here:

https://console.developers.google.com/project/31977622648/apiui/credential

Look for the Client ID that has a Redirect URI for skiamonitor.com.

For ‘cookiesalt’ and the InfluxDB values, search for ‘skiamonitor’ in valentine.

The gmail_clientid and gmail_clientsecret come from here:

https://console.developers.google.com/project/31977622648/apiui/credential

Look for the section titled “Client ID for native application”.

The gmail_cached_token can be generated by running the server and clicking the authorization link while signed in as skia.buildbots@gmail.com.

Do on update

$ ./vm_push_update.sh

Notes

To SSH into the instance:

gcutil --project=google.com:skia-buildbots ssh --ssh_user=default skia-monitoring

The constants used by the vm_XXX.sh scripts are defined in vm_config.sh; modify them there if needed.