We need to keep TryJob data that we have ingested.
See https://docs.google.com/document/d/1d0tOhgx51QOGiSXqTxiwNSlgm1pYHTUSBK3agysX6Iw/edit for more context.
We should have three Firestore Collections (i.e. tables): one each for TryJob, TryJobResult, and Params. Params is its own table to avoid duplicating data and to reduce the bandwidth needed for requests.
    TryJob
        # ID will be SystemID + System
        SystemID     string     # The id of the TryJob in, for example, BuildBucket
        ChangeListID string
        PatchSetID   string
        CRSystem     string     # "github", "gerrit", etc
        CISystem     string     # "buildbucket", etc
        DisplayName  string     # Human readable name of job.
        Updated      time.Time

    TryJobResult
        # ID will be autogenerated
        TryJobID        string  # The SystemID of the TryJob that generated this.
        ChangeListID    string
        PatchSetID      string
        CRSystem        string  # "github", "gerrit", etc
        CISystem        string  # "buildbucket", etc
        Digest          string
        ResultParams    map[string]string
        GroupParamsHash string  # hex-encoded hash linking to Params Table
        OptionsHash     string  # hex-encoded hash linking to Params Table

    Params
        # ID will be hex encoded sha256 hash of Map
        Map map[string]string
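The schema above could be expressed in Go roughly as follows. This is a minimal sketch, not the actual implementation: the helper names `tryJobDocID` and `paramsDocID` are hypothetical, joining SystemID and CISystem with an underscore is only one reading of "SystemID + System", and the sorted, length-prefixed serialization used before hashing is an assumption (the schema only says the ID is a hex-encoded sha256 hash of Map).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"time"
)

// TryJob mirrors the TryJob collection.
type TryJob struct {
	SystemID     string // The id of the TryJob in, for example, BuildBucket
	ChangeListID string
	PatchSetID   string
	CRSystem     string // "github", "gerrit", etc
	CISystem     string // "buildbucket", etc
	DisplayName  string // Human readable name of job.
	Updated      time.Time
}

// TryJobResult mirrors the TryJobResult collection.
type TryJobResult struct {
	TryJobID        string // The SystemID of the TryJob that generated this.
	ChangeListID    string
	PatchSetID      string
	CRSystem        string
	CISystem        string
	Digest          string
	ResultParams    map[string]string
	GroupParamsHash string // hex-encoded hash linking to Params
	OptionsHash     string // hex-encoded hash linking to Params
}

// Params mirrors the Params collection.
type Params struct {
	Map map[string]string
}

// tryJobDocID builds a document ID for a TryJob.
// Assumption: "SystemID + System" means SystemID and CISystem joined by "_".
func tryJobDocID(tj TryJob) string {
	return tj.SystemID + "_" + tj.CISystem
}

// paramsDocID returns the hex-encoded sha256 hash of m.
// Assumption: keys are sorted and each key and value is length-prefixed so
// that equal maps always hash the same and distinct maps do not collide by
// concatenation.
func paramsDocID(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(m[k]), m[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// The param keys and values here are made up for illustration.
	a := paramsDocID(map[string]string{"os": "Android", "gpu": "Adreno"})
	b := paramsDocID(map[string]string{"gpu": "Adreno", "os": "Android"})
	fmt.Println(a == b, len(a))
	fmt.Println(tryJobDocID(TryJob{SystemID: "8901", CISystem: "buildbucket"}))
}
```

Because the Params ID is derived from the map contents, two TryJobResults with identical group params or options end up pointing at the same Params document, which is what removes the duplication.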
We should mark the following fields as no-index, to save some index space. https://cloud.google.com/firestore/docs/query-data/indexing#exemptions
We need the following complex indices:
    tjstore_result | clid: ASC crs: ASC psid: ASC digest: ASC
    tjstore_result | clid: ASC crs: ASC psid: ASC ts: ASC
The first is for fetching results in a sharded fashion. The second is for fetching results after a certain time.
We'll be either looking up TryJobs by id or searching by PatchSetID.
TryJobResults and Params will be a bit more involved: we'll fetch all TryJobResults by PatchSetID and then look up every GroupParamsHash and OptionsHash they reference in Params. We will fetch the TryJobResults in parallel, sharded by Digest. We shard based on Digest because that data is essentially random and evenly distributed. If needed, we could try fetching the Params in parallel or caching them.
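The sharded fetch described above might look something like the following sketch. It assumes digests are lowercase hex strings, so the key space can be split into lexicographic ranges on the first two characters. The names `shard`, `fetchFn`, `fetchSharded`, `uniqueParamHashes`, and `inMemoryFetch` are all hypothetical; `fetchFn` stands in for a Firestore range query on digest for a given CL and PatchSet, and an in-memory fake takes its place here so the example runs without a database.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// result is a trimmed-down stand-in for TryJobResult.
type result struct {
	Digest          string
	GroupParamsHash string
	OptionsHash     string
}

// shard is a half-open digest range [Start, End). An empty End means
// "no upper bound".
type shard struct{ Start, End string }

// digestShards splits the lowercase-hex digest space into n ranges
// based on the first two hex characters (so n is clamped to 1..256).
func digestShards(n int) []shard {
	if n < 1 {
		n = 1
	}
	if n > 256 {
		n = 256
	}
	shards := make([]shard, 0, n)
	for i := 0; i < n; i++ {
		s := shard{Start: fmt.Sprintf("%02x", i*256/n)}
		if i < n-1 {
			s.End = fmt.Sprintf("%02x", (i+1)*256/n)
		}
		shards = append(shards, s)
	}
	return shards
}

// fetchFn is a hypothetical stand-in for one range query on digest.
type fetchFn func(s shard) ([]result, error)

// fetchSharded runs one fetch per shard concurrently and concatenates the
// results in shard order. The first error encountered (by shard order) wins.
func fetchSharded(n int, fetch fetchFn) ([]result, error) {
	shards := digestShards(n)
	perShard := make([][]result, len(shards))
	errs := make([]error, len(shards))
	var wg sync.WaitGroup
	for i, s := range shards {
		wg.Add(1)
		go func(i int, s shard) {
			defer wg.Done()
			perShard[i], errs[i] = fetch(s) // each goroutine writes its own slot
		}(i, s)
	}
	wg.Wait()
	var all []result
	for i := range shards {
		if errs[i] != nil {
			return nil, errs[i]
		}
		all = append(all, perShard[i]...)
	}
	return all, nil
}

// uniqueParamHashes collects the distinct Params IDs referenced by rs,
// so each Params document only needs to be fetched once.
func uniqueParamHashes(rs []result) []string {
	seen := map[string]bool{}
	for _, r := range rs {
		seen[r.GroupParamsHash] = true
		seen[r.OptionsHash] = true
	}
	out := make([]string, 0, len(seen))
	for h := range seen {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

// inMemoryFetch fakes the database by filtering a slice by digest range.
func inMemoryFetch(stored []result) fetchFn {
	return func(s shard) ([]result, error) {
		var out []result
		for _, r := range stored {
			if r.Digest >= s.Start && (s.End == "" || r.Digest < s.End) {
				out = append(out, r)
			}
		}
		return out, nil
	}
}

func main() {
	// Shortened, made-up digests and hashes for illustration.
	stored := []result{{"0a", "g1", "o1"}, {"7f", "g1", "o2"}, {"ff", "g2", "o1"}}
	rs, err := fetchSharded(4, inMemoryFetch(stored))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rs), uniqueParamHashes(rs))
}
```

Deduplicating the hashes before looking up Params keeps the number of Params reads proportional to the number of distinct parameter sets rather than the number of results.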
We could extend searching to return just the results produced by a given TryJob. Additionally, we could search for all TryJobs on a given CL. These queries may require additional composite indices.