The goal of this service is to export metrics that report when the keys of service accounts in Skia's cloud projects will expire, so that alerts can be based on them.
Items below here should include target links from alerts.
This alert signifies that the specified service account's key is expiring within 30 days.
Create a new key to replace it, or delete it if it is no longer required.
To update chrome-swarming-bots and skolo-jumphost service accounts together, you can do:
(obtain breakglass, instructions are in http://go/skia-infra-iac-handbook)
$ cd skolo
$ ./refresh-skolo-swarming-bot-service-account-keys.sh
$ ./refresh-jumphost-service-account.sh
$ cd ansible
$ ansible-playbook ./switchboard/build_and_release_metadata_server_ansible.yml
(wait for the CL generated by the above to land)
$ ansible-playbook ./switchboard/jumphosts.yml
If running one of these scripts fails with:
ERROR: (gcloud.beta.iam.service-accounts.keys.create) FAILED_PRECONDITION: Precondition check failed.
Then the service account has too many keys (the limit is 10), and you will need to delete old, expired keys before creating a new one.
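To check how close a service account is to the key limit before rerunning the script, a small helper along these lines may be useful. It is a sketch, not part of the refresh scripts: the function name count_user_keys is hypothetical, and it assumes the limit of 10 applies to user-managed keys, which is why it passes --managed-by=user.

```shell
# Hypothetical helper: print the number of user-managed keys on a service account.
# Usage: count_user_keys SERVICE_ACCOUNT_EMAIL
count_user_keys() {
  # List one key resource name per line, then count the non-empty lines.
  gcloud iam service-accounts keys list \
    --iam-account="$1" \
    --managed-by=user \
    --format="value(name)" | grep -c .
}
```

If the printed count is 10, delete an old key before creating a new one.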
To confirm that all the metadata servers have restarted you can run:
ansible jumphosts -a "ps aux" | grep metadata
Once new keys have been created, you will need to delete the old ones to make the alerts go away. It's generally a good idea to verify that the old key is no longer being used via the service account key metrics in the Cloud Console, which are under IAM & Admin > Service Accounts > (account) > Metrics. It might take several hours to see the switch to the new key.
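Old keys can also be removed from the command line instead of the Cloud Console. The helpers below are a minimal sketch with hypothetical names (list_sa_keys, delete_sa_key); they wrap the standard gcloud key commands, and the key ID and service account email are supplied by the caller.

```shell
# Hypothetical helper: list the keys on a service account so that the
# old key can be identified by its creation and expiration times.
# Usage: list_sa_keys SERVICE_ACCOUNT_EMAIL
list_sa_keys() {
  gcloud iam service-accounts keys list --iam-account="$1"
}

# Hypothetical helper: delete a single key from a service account.
# gcloud asks for confirmation before deleting.
# Usage: delete_sa_key KEY_ID SERVICE_ACCOUNT_EMAIL
delete_sa_key() {
  gcloud iam service-accounts keys delete "$1" --iam-account="$2"
}
```

Run list_sa_keys first, confirm in the key metrics that the old key is idle, and only then call delete_sa_key with its ID.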
Key metrics: sa_key_expiration_s
This alert signifies that the specified service account's key has expired.
Delete the expired key from the Cloud Console (pantheon) if it is no longer required.
Key metrics: sa_key_expiration_s
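The two alerts above can be read as thresholds on sa_key_expiration_s. The sketch below assumes the metric reports the number of seconds remaining until the key expires (so zero or negative means expired); this is an assumption based on the metric name, not something confirmed here, and the function key_status is hypothetical. The exact comparison used by the real alert rules may also differ at the boundary.

```shell
# Assumption: sa_key_expiration_s is the number of seconds until expiry.
THIRTY_DAYS_S=$((30 * 24 * 60 * 60))  # 2592000 seconds

# Hypothetical helper: classify a key by its remaining lifetime in seconds.
# Prints "expired" at or below zero, "expiring" below 30 days, otherwise "ok".
# Usage: key_status REMAINING_SECONDS
key_status() {
  local remaining_s="$1"
  if [ "$remaining_s" -le 0 ]; then
    echo expired
  elif [ "$remaining_s" -lt "$THIRTY_DAYS_S" ]; then
    echo expiring
  else
    echo ok
  fi
}
```

For example, a key with 86400 seconds (one day) left would be classified as expiring.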