compute_engine_scripts/buildbots/README - buildbot - Git at Google

 This directory contains scripts to automate the creation, setup and deletion of
 Skia's GCE buildbot machines.


 Directory Contents
 ==================

 - vm_config.sh
 Instantiates constants that are used by the below scripts.

 - vm_firewalls_create.sh
 Creates the firewalls required for the Skia buildbots. Allows access from Google
 machines to port 10115. Allows access to ports 10116 and 10117 to everyone.

 - vm_create_masters.sh
 Creates the master instances in the specified $ZONE_TAG.

 - vm_create_slaves.sh
 Creates the slaves instances in the specified $ZONE_TAG.

 - vm_delete_masters.sh
 Deletes the master instances from the specified $ZONE_TAG.

 - vm_delete_slaves.sh
 Deletes the slaves instances from the specified $ZONE_TAG.

 - vm_setup_bugdroid.sh
 Takes in the hostname of a GCE instance and sets up bugdroid on that instance.

 - vm_setup_image.sh
 Installs necessary packages on a vanilla GCE instance. This script setups up the
 base image that is used by all Skia GCE instances.

 - vm_setup_masters.sh
 Sets up the base image with packages and directories necessary for the buildbot
 masters.

 - vm_setup_slaves.sh
 Sets up the base image with packages and directories necessary for the buildbot
 slaves.

 - vm_setup_utils.sh
 Utility functions used by vm_setup_master.sh and vm_setup_slaves.sh.

 - vm_status.sh
 Outputs the quota and all instances of google.com:skia-buildbots.


 How to create a Master Root Disk
 ================================

 Create it with (changing names and zones as required):
 gcutil --project=google.com:skia-buildbots adddisk private-master-root-a \
     --source_image=skia-buildbot-image --size_gb=10 --zone=us-central1-a \
     --service_version=v1beta16


 Migrating instances before PCRs
 ===============================

 The instances should be moved using the scripts in this directory a few weeks
 before a PCR is scheduled. We are allowed instances in the us-central1-a and
 us-central1-b zones and will alternate between zone tags 'a' and 'b' during
 PCRs.

 The scripts should be executed in this order:
 * ZONE_TAG=$NEW_ZONE ./vm_create_masters.sh
 * ZONE_TAG=$OLD_ZONE ./vm_delete_slaves.sh
 * ZONE_TAG=$NEW_ZONE ./vm_create_slaves.sh
 * ZONE_TAG=$NEW_ZONE ./vm_setup_slaves.sh
 * ZONE_TAG=$NEW_ZONE ./vm_copy_buildbot_history.sh  (Takes about 15-30 mins)
 * ZONE_TAG=$OLD_ZONE ./vm_delete_masters.sh  (Do NOT delete the persistent boot disk when prompted to)
 * Run vm_setup_bugdroid.sh to setup bugdroid on a particular GCE instance
 * ZONE_TAG=$OLD_ZONE ./vm_delete_rebaseline_servers.sh  (DO delete the persistent boot disk when prompted to)
 * ZONE_TAG=$NEW_ZONE ./vm_create_rebaseline_servers.sh
 * ZONE_TAG=$NEW_ZONE ./vm_setup_rebaseline_servers.sh

 After the above scripts are run, update gce_compile_bots_zone in
 global_variables.json to point to the $NEW_ZONE.


 Handling Master VM crashes
 ==========================

 On rare occasions Google Compute Engine VMs crash. If the master crashes then
 please run the following (this section will be removed after the next PCR
 migration because we will then have the ability to automatically recover from
 crashes):
 * ZONE_TAG=a bash vm_delete_masters.sh  (Do NOT delete the persistent boot disk when prompted to)
 * ZONE_TAG=a bash vm_create_masters.sh

 For more details, see https://code.google.com/p/skia/issues/detail?id=1676
 ('Make the buildbot master self-recover if it crashes')


 Handling Slave VM crashes
 =========================

 On rare occasions Google Compute Engine VMs crash. We do not currently have
 a way for slaves to automatically come up because they do not have root
 persistent disks. You will have to recreate the specific slave(s) that crashed.
 Please run the following:
 * Edit vm_config.sh and leave only the VM_SLAVE_NAMES and SLAVE_IP_ADDRESSES you
   want to recreate.
 * ZONE_TAG=a bash vm_delete_slaves.sh
 * ZONE_TAG=a bash vm_create_slaves.sh
 * ZONE_TAG=a bash vm_setup_slaves.sh

 For more details, see https://code.google.com/p/skia/issues/detail?id=1678
 ('make GCE buildbot slaves self-recover on boot')
	This directory contains scripts to automate the creation, setup and deletion of
	Skia's GCE buildbot machines.


	Directory Contents
	==================

	- vm_config.sh
	Instantiates constants that are used by the below scripts.

	- vm_firewalls_create.sh
	Creates the firewalls required for the Skia buildbots. Allows access from Google
	machines to port 10115. Allows access to ports 10116 and 10117 to everyone.

	- vm_create_masters.sh
	Creates the master instances in the specified $ZONE_TAG.

	- vm_create_slaves.sh
	Creates the slaves instances in the specified $ZONE_TAG.

	- vm_delete_masters.sh
	Deletes the master instances from the specified $ZONE_TAG.

	- vm_delete_slaves.sh
	Deletes the slaves instances from the specified $ZONE_TAG.

	- vm_setup_bugdroid.sh
	Takes in the hostname of a GCE instance and sets up bugdroid on that instance.

	- vm_setup_image.sh
	Installs necessary packages on a vanilla GCE instance. This script setups up the
	base image that is used by all Skia GCE instances.

	- vm_setup_masters.sh
	Sets up the base image with packages and directories necessary for the buildbot
	masters.

	- vm_setup_slaves.sh
	Sets up the base image with packages and directories necessary for the buildbot
	slaves.

	- vm_setup_utils.sh
	Utility functions used by vm_setup_master.sh and vm_setup_slaves.sh.

	- vm_status.sh
	Outputs the quota and all instances of google.com:skia-buildbots.


	How to create a Master Root Disk
	================================

	Create it with (changing names and zones as required):
	gcutil --project=google.com:skia-buildbots adddisk private-master-root-a \
	--source_image=skia-buildbot-image --size_gb=10 --zone=us-central1-a \
	--service_version=v1beta16


	Migrating instances before PCRs
	===============================

	The instances should be moved using the scripts in this directory a few weeks
	before a PCR is scheduled. We are allowed instances in the us-central1-a and
	us-central1-b zones and will alternate between zone tags 'a' and 'b' during
	PCRs.

	The scripts should be executed in this order:
	* ZONE_TAG=$NEW_ZONE ./vm_create_masters.sh
	* ZONE_TAG=$OLD_ZONE ./vm_delete_slaves.sh
	* ZONE_TAG=$NEW_ZONE ./vm_create_slaves.sh
	* ZONE_TAG=$NEW_ZONE ./vm_setup_slaves.sh
	* ZONE_TAG=$NEW_ZONE ./vm_copy_buildbot_history.sh (Takes about 15-30 mins)
	* ZONE_TAG=$OLD_ZONE ./vm_delete_masters.sh (Do NOT delete the persistent boot disk when prompted to)
	* Run vm_setup_bugdroid.sh to setup bugdroid on a particular GCE instance
	* ZONE_TAG=$OLD_ZONE ./vm_delete_rebaseline_servers.sh (DO delete the persistent boot disk when prompted to)
	* ZONE_TAG=$NEW_ZONE ./vm_create_rebaseline_servers.sh
	* ZONE_TAG=$NEW_ZONE ./vm_setup_rebaseline_servers.sh

	After the above scripts are run, update gce_compile_bots_zone in
	global_variables.json to point to the $NEW_ZONE.


	Handling Master VM crashes
	==========================

	On rare occasions Google Compute Engine VMs crash. If the master crashes then
	please run the following (this section will be removed after the next PCR
	migration because we will then have the ability to automatically recover from
	crashes):
	* ZONE_TAG=a bash vm_delete_masters.sh (Do NOT delete the persistent boot disk when prompted to)
	* ZONE_TAG=a bash vm_create_masters.sh

	For more details, see https://code.google.com/p/skia/issues/detail?id=1676
	('Make the buildbot master self-recover if it crashes')


	Handling Slave VM crashes
	=========================

	On rare occasions Google Compute Engine VMs crash. We do not currently have
	a way for slaves to automatically come up because they do not have root
	persistent disks. You will have to recreate the specific slave(s) that crashed.
	Please run the following:
	* Edit vm_config.sh and leave only the VM_SLAVE_NAMES and SLAVE_IP_ADDRESSES you
	want to recreate.
	* ZONE_TAG=a bash vm_delete_slaves.sh
	* ZONE_TAG=a bash vm_create_slaves.sh
	* ZONE_TAG=a bash vm_setup_slaves.sh

	For more details, see https://code.google.com/p/skia/issues/detail?id=1678
	('make GCE buildbot slaves self-recover on boot')