Storing TryJobs and TryJobResults in Firestore

We need to persist the TryJob data we ingest.

See https://docs.google.com/document/d/1d0tOhgx51QOGiSXqTxiwNSlgm1pYHTUSBK3agysX6Iw/edit for more context.

Schema

We should have three Firestore Collections (i.e. tables), one each for TryJob, TryJobResult, and Params. Params gets its own table so that a parameter map shared by many results is stored only once, which avoids duplicated data and reduces the bandwidth needed for requests.

TryJob
	# ID will be SystemID + CISystem
	SystemID     string  # The id of the TryJob in, for example, BuildBucket
	ChangeListID string
	PatchSetID   string
	CRSystem     string  # "github", "gerrit", etc.
	CISystem     string  # "buildbucket", etc.
	DisplayName  string  # Human-readable name of the job.
	Updated      time.Time
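
A minimal Go sketch of the TryJob record and its document ID. It assumes the "System" in the ID comment means CISystem and that the two parts are joined with a hypothetical "_" separator; neither detail is fixed by this doc. Including the CI system keeps IDs unique even if two CI systems hand out the same SystemID. The sample values in main are made up.

```go
package main

import (
	"fmt"
	"time"
)

// TryJob mirrors the TryJob schema above.
type TryJob struct {
	SystemID     string // ID of the TryJob in its CI system, e.g. BuildBucket
	ChangeListID string
	PatchSetID   string
	CRSystem     string // "github", "gerrit", etc.
	CISystem     string // "buildbucket", etc.
	DisplayName  string
	Updated      time.Time
}

// DocumentID joins SystemID and CISystem with a hypothetical "_" separator.
// The split stays unambiguous as long as CISystem names never contain "_".
func (t TryJob) DocumentID() string {
	return t.SystemID + "_" + t.CISystem
}

func main() {
	tj := TryJob{SystemID: "8912345", CISystem: "buildbucket", Updated: time.Now()}
	fmt.Println(tj.DocumentID()) // prints "8912345_buildbucket"
}
```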

TryJobResult
	# ID will be autogenerated
	TryJobID         string  # The SystemID of the TryJob that generated this.
	ChangeListID     string
	PatchSetID       string
	CRSystem         string  # "github", "gerrit", etc.
	CISystem         string  # "buildbucket", etc.
	Digest           string
	ResultParams     map[string]string
	GroupParamsHash  string  # hex-encoded hash linking to the Params table
	OptionsHash      string  # hex-encoded hash linking to the Params table
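
A sketch of how a TryJobResult could be built when ingesting a TryJob. Each result repeats the TryJob's ChangeListID, PatchSetID, CRSystem and CISystem, which lets us query results by PatchSetID without first reading the TryJob collection. The jobKeys type and newResult function are hypothetical helpers for illustration, not part of the schema.

```go
package main

import "fmt"

// TryJobResult mirrors the schema above.
type TryJobResult struct {
	TryJobID        string // SystemID of the TryJob that produced this result
	ChangeListID    string
	PatchSetID      string
	CRSystem        string
	CISystem        string
	Digest          string
	ResultParams    map[string]string
	GroupParamsHash string // ID of a Params document
	OptionsHash     string // ID of a Params document
}

// jobKeys holds the TryJob fields every result repeats (hypothetical helper).
type jobKeys struct {
	SystemID, ChangeListID, PatchSetID, CRSystem, CISystem string
}

// newResult copies the TryJob's identifying fields onto a new result so a
// PatchSetID query on TryJobResult needs no lookup in the TryJob collection.
func newResult(job jobKeys, digest, groupHash, optionsHash string, resultParams map[string]string) TryJobResult {
	return TryJobResult{
		TryJobID:        job.SystemID,
		ChangeListID:    job.ChangeListID,
		PatchSetID:      job.PatchSetID,
		CRSystem:        job.CRSystem,
		CISystem:        job.CISystem,
		Digest:          digest,
		ResultParams:    resultParams,
		GroupParamsHash: groupHash,
		OptionsHash:     optionsHash,
	}
}

func main() {
	job := jobKeys{"8912345", "cl_1", "ps_2", "gerrit", "buildbucket"}
	r := newResult(job, "1a2b", "g1", "o1", map[string]string{"name": "circle"})
	fmt.Println(r.TryJobID, r.PatchSetID, r.Digest) // prints "8912345 ps_2 1a2b"
}
```

Since the result stores both TryJobID and CISystem, the owning TryJob's document ID can be rebuilt from a result alone.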

Params
	# ID will be the hex-encoded SHA-256 hash of Map
	Map    map[string]string
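
Hashing a map needs a canonical byte encoding, because Go map iteration order is random. A minimal sketch, assuming we serialize with encoding/json (which writes map keys in sorted order) before taking the SHA-256; the exact serialization is our assumption, and paramsID is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// paramsID returns the Params document ID for m: the hex-encoded SHA-256 of
// its JSON encoding. encoding/json sorts map keys, so equal maps always
// produce identical bytes and therefore identical IDs.
func paramsID(m map[string]string) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	id, err := paramsID(map[string]string{"os": "Android", "model": "Pixel"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(id), id) // 64 hex characters
}
```

Because the ID is derived from the map's contents, writing the same Params twice targets the same document, so duplicates collapse on their own.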

Indexing

We should exempt the following fields from indexing to save index space. https://cloud.google.com/firestore/docs/query-data/indexing#exemptions

  • TryJobResult.ResultParams
  • TryJobResult.GroupParamsHash
  • TryJobResult.OptionsHash
  • Params.Map

Currently, all queries can be handled via Firestore's index merging https://firebase.google.com/docs/firestore/query-data/index-overview#taking_advantage_of_index_merging because we chain “==” filters together.

Usage

We'll either look up TryJobs by ID or search for them by PatchSetID.

Reading TryJobResults and Params is a bit more involved: we'll fetch all TryJobResults for a PatchSetID, then look up the Params referenced by each GroupParamsHash and OptionsHash. We will fetch the TryJobResults in parallel, sharded by Digest, because digests are essentially random and therefore spread evenly across shards. If needed, we could also fetch the Params in parallel or cache them.
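
A sketch of the sharded read in Go, with Firestore replaced by a hypothetical in-memory fetch function so it runs standalone. Shards are half-open ranges keyed on a digest's first hex character (an assumed scheme; any split of the hex space works). After merging the shards we collect the distinct GroupParamsHash and OptionsHash values so each Params document is fetched only once. In real Firestore, a range filter on Digest combined with an equality filter on PatchSetID may need a composite index rather than index merging.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

const hexDigits = "0123456789abcdef"

// digestRange is a half-open range [Start, End) of hex-encoded digests.
// An empty End means the range has no upper bound.
type digestRange struct {
	Start, End string
}

// result keeps only the TryJobResult fields this sketch needs.
type result struct {
	Digest, GroupParamsHash, OptionsHash string
}

// shardDigests splits the hex digest space into n ranges on the first hex
// character. n is clamped to [1, 16].
func shardDigests(n int) []digestRange {
	if n < 1 {
		n = 1
	}
	if n > 16 {
		n = 16
	}
	ranges := make([]digestRange, 0, n)
	for i := 0; i < n; i++ {
		r := digestRange{Start: string(hexDigits[i*16/n])}
		if i < n-1 {
			r.End = string(hexDigits[(i+1)*16/n])
		}
		ranges = append(ranges, r)
	}
	return ranges
}

// fetchAll runs one fetch per shard concurrently and concatenates the results
// in shard order, returning the first error found. fetch stands in for a
// query like PatchSetID == ps, Digest >= r.Start, Digest < r.End.
func fetchAll(ranges []digestRange, fetch func(digestRange) ([]result, error)) ([]result, error) {
	parts := make([][]result, len(ranges))
	errs := make([]error, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r digestRange) {
			defer wg.Done()
			parts[i], errs[i] = fetch(r)
		}(i, r)
	}
	wg.Wait()
	var all []result
	for i := range parts {
		if errs[i] != nil {
			return nil, errs[i]
		}
		all = append(all, parts[i]...)
	}
	return all, nil
}

// uniqueParamHashes returns the sorted, distinct Params IDs referenced by rs.
func uniqueParamHashes(rs []result) []string {
	seen := map[string]bool{}
	for _, r := range rs {
		seen[r.GroupParamsHash] = true
		seen[r.OptionsHash] = true
	}
	out := make([]string, 0, len(seen))
	for h := range seen {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Toy data standing in for one PatchSet's results.
	data := []result{{"1a2b", "g1", "o1"}, {"9f00", "g1", "o2"}, {"c3d4", "g2", "o1"}}
	fetch := func(r digestRange) ([]result, error) {
		var out []result
		for _, d := range data {
			if d.Digest >= r.Start && (r.End == "" || d.Digest < r.End) {
				out = append(out, d)
			}
		}
		return out, nil
	}
	rs, err := fetchAll(shardDigests(4), fetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rs), uniqueParamHashes(rs)) // prints "3 [g1 g2 o1 o2]"
}
```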

Growth Opportunities

We could extend search to find only the results produced by a given TryJob. Additionally, we could search for all TryJobs on a given CL. These queries may require additional composite indices.