Storing TryJobs and TryJobResults in Firestore

We need to keep TryJob data that we have ingested.

See https://docs.google.com/document/d/1d0tOhgx51QOGiSXqTxiwNSlgm1pYHTUSBK3agysX6Iw/edit for more context.

Schema

We should have three Firestore Collections (i.e. tables): one each for TryJob, TryJobResult, and Params. Params is its own table to avoid duplicating data and to reduce the bandwidth of requests.

TryJob
	# ID will be SystemID + System
	SystemID     string  # The id of the TryJob in, for example, BuildBucket
	ChangeListID string
	PatchSetID   string
	CRSystem     string  # "github", "gerrit", etc
	CISystem     string  # "buildbucket", etc
	DisplayName  string  # Human readable name of job.
	Updated      time.Time

TryJobResult
	# ID will be autogenerated
	TryJobID         string  # The SystemID of the TryJob that generated this.
	ChangeListID     string
	PatchSetID       string
	CRSystem         string  # "github", "gerrit", etc
	CISystem         string  # "buildbucket", etc
	Digest           string
	ResultParams     map[string]string
	GroupParamsHash  string # hex-encoded hash linking to Params Table
	OptionsHash      string # hex-encoded hash linking to Params Table

Params
	# ID will be hex encoded sha256 hash of Map
	Map    map[string]string
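
As a rough illustration of the Params ID, here is a minimal sketch of hashing a map to a hex-encoded SHA-256 string. The design above only says the ID is a hash of Map. The serialization used here (sorted keys, length-prefixed keys and values) and the name paramsID are assumptions made for this example, chosen so that equal maps always produce the same ID regardless of iteration order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// paramsID returns the hex-encoded SHA-256 hash of m.
// Assumption: keys are visited in sorted order and every key and value is
// length-prefixed, so the result does not depend on Go's map iteration order
// and {"ab": "c"} cannot collide with {"a": "bc"}.
func paramsID(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// hash.Hash implements io.Writer, so Fprintf can write into it.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(m[k]), m[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := paramsID(map[string]string{"os": "Android", "gpu": "Adreno"})
	b := paramsID(map[string]string{"gpu": "Adreno", "os": "Android"})
	fmt.Println(a == b, len(a)) // prints: true 64
}
```

Because the ID is derived purely from the content, two TryJobResults with identical group params or options would point at the same Params document, which is what lets the table avoid duplicated data.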

Indexing

We should mark the following fields as no-index to save index space (see https://cloud.google.com/firestore/docs/query-data/indexing#exemptions):

  • TryJobResult.ResultParams
  • TryJobResult.GroupParamsHash
  • TryJobResult.OptionsHash
  • Params.Map

We need the following complex indices:

Collection ID | Fields

tjstore_result | clid: ASC crs: ASC psid: ASC digest: ASC
tjstore_result | clid: ASC crs: ASC psid: ASC ts: ASC

The first is for fetching results in a sharded fashion. The second is for fetching results after a certain time.

Usage

We'll either look up TryJobs by ID or search for them by PatchSetID.
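
Since the schema defines the TryJob ID as SystemID + System, a lookup by ID only needs to rebuild that key. The sketch below shows one possible way to do so. It assumes that "System" refers to the CI system name and that the two parts are joined with "_"; neither detail is specified above, and tryJobDocID is a hypothetical helper.

```go
package main

import "fmt"

// tryJobDocID builds a document ID for a TryJob.
// Assumptions: ciSystem is the CISystem field (e.g. "buildbucket") and "_"
// is the separator; the design only states "SystemID + System".
func tryJobDocID(systemID, ciSystem string) string {
	return systemID + "_" + ciSystem
}

func main() {
	fmt.Println(tryJobDocID("8901234567", "buildbucket")) // prints: 8901234567_buildbucket
}
```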

TryJobResults and Params will be a bit more involved: we'll fetch all TryJobResults by PatchSetID and then look up the Params entry referenced by each GroupParamsHash and OptionsHash. We will fetch the TryJobResults in parallel, sharded by Digest. We shard on Digest because its values are essentially random and evenly distributed. If needed, we could fetch the Params in parallel or cache them.
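
The sharded fetch could look roughly like the sketch below. It assumes Digest is a lowercase hex string, splits the digest space into half-open ranges on the first two hex characters, and runs one query per range in its own goroutine. The names shardBounds, fetchSharded, and fetchFn are hypothetical; fetchFn stands in for a Firestore range query on the digest field and is backed by an in-memory slice here so the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
)

// shardBounds splits the space of lowercase hex digests into n half-open
// ranges [start, end), keyed on the first two hex characters.
// An empty end means "no upper bound". n is expected to be in [1, 256].
func shardBounds(n int) [][2]string {
	bounds := make([][2]string, 0, n)
	for i := 0; i < n; i++ {
		start := fmt.Sprintf("%02x", i*256/n)
		end := ""
		if i < n-1 {
			end = fmt.Sprintf("%02x", (i+1)*256/n)
		}
		bounds = append(bounds, [2]string{start, end})
	}
	return bounds
}

// fetchFn is a stand-in for one range query over the digest field.
type fetchFn func(start, end string) []string

// fetchSharded runs one fetch per shard concurrently and concatenates the
// results in shard order.
func fetchSharded(n int, fetch fetchFn) []string {
	bounds := shardBounds(n)
	results := make([][]string, len(bounds))
	var wg sync.WaitGroup
	for i, b := range bounds {
		wg.Add(1)
		go func(i int, start, end string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = fetch(start, end)
		}(i, b[0], b[1])
	}
	wg.Wait()

	var all []string
	for _, r := range results {
		all = append(all, r...)
	}
	return all
}

func main() {
	stored := []string{"0a1f", "4b22", "9c03", "ffee", "3fff"}
	// In-memory replacement for the real query, for illustration only.
	fake := func(start, end string) []string {
		var out []string
		for _, d := range stored {
			if d >= start && (end == "" || d < end) {
				out = append(out, d)
			}
		}
		return out
	}
	fmt.Println(fetchSharded(4, fake)) // prints: [0a1f 3fff 4b22 9c03 ffee]
}
```

With evenly distributed digests, each of the n ranges should return about the same number of results, so no single query dominates the total fetch time.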

Growth Opportunities

We could extend searching to fetch only the results produced by a given TryJob. Additionally, we could search for all TryJobs on a given CL. These queries may require additional composite indices.