# CockroachDB

Many applications use CockroachDB (CDB) as their datastore. Here is a list of
utilities built to support working with CDB.

## CDB Single Node

For desktop testing you can install CDB and run a local instance in single-node
mode. Download from:

> https://www.cockroachlabs.com/docs/releases/

Make sure to download the version that corresponds to the one used in
production, which can be checked by looking at the YAML file in the
[`k8s-config`](https://skia.googlesource.com/k8s-config/) repo.

    cd /tmp; cockroach start-single-node --insecure --listen-addr=127.0.0.1

Note that this will start and print out useful information, such as the location
of the admin dashboard (webui), and also how to connect to the running instance
using the `cockroach` command-line application (RPC client flags):

```
CockroachDB node starting at 2023-10-04 15:10:35.315566725 +0000 UTC m=+1.911432965 (took 1.6s)
build:               CCL v22.2.3 @ 2023/01/23 19:11:57 (go1.19.1)
webui:               http://127.0.0.1:8080
sql:                 postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable
sql (JDBC):          jdbc:postgresql://127.0.0.1:26257/defaultdb?sslmode=disable&user=root
RPC client flags:    cockroach <client cmd> --host=127.0.0.1:26257 --insecure
logs:                /tmp/cockroach-data/logs
temp dir:            /tmp/cockroach-data/cockroach-temp2650087369
external I/O path:   /tmp/cockroach-data/extern
store[0]:            path=/tmp/cockroach-data
storage engine:      pebble
clusterID:           dbef3f5f-ef2c-48ef-a39b-5cdbeb219e70
status:              initialized new cluster
nodeID:              1
```

## Generating SQL Schema

The schema for the database is defined in Go code. This makes the source of
truth a documented Go struct, which is more flexible than having the SQL
itself be the source of truth. For example, see:

> https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/tables.go

```
package sql

//go:generate bazelisk run --config=mayberemote //:go -- run ./tosql

...

// Tables represents the full schema of the SQL database.
type Tables struct {
	Alerts      []alertschema.AlertSchema
	Commits     []gitschema.Commit
	ParamSets   []traceschema.ParamSetsSchema
	Postings    []traceschema.PostingsSchema
	Regressions []regressionschema.RegressionSchema
	Shortcuts   []shortcutschema.ShortcutSchema
	SourceFiles []traceschema.SourceFilesSchema
	TraceValues []traceschema.TraceValuesSchema
}
```

Each SQL table is defined by a public member of the `Tables` struct.

Let's look at the `gitschema.Commit` type:

```
// Commit represents a single commit stored in the database.
//
// JSON annotations make it serialize like the legacy cid.CommitDetail.
type Commit struct {
	CommitNumber types.CommitNumber `sql:"commit_number INT PRIMARY KEY"`
	GitHash      string             `sql:"git_hash TEXT UNIQUE NOT NULL"`
	Timestamp    int64              `sql:"commit_time INT"` // Unix timestamp, seconds from the epoch.
	Author       string             `sql:"author TEXT"`
	Subject      string             `sql:"subject TEXT"`
}
```

Note the struct tags, which define how each field is stored in the database.
For example, `CommitNumber` is stored as an `INT` and also used as the
`PRIMARY KEY`.

If you need to add other information to the schema, such as secondary indexes,
add them as struct tags on private members of the struct. In `TraceValuesSchema`
below you can see how a secondary index was added via the `bySourceFileIndex`
struct tag:

```
// TraceValuesSchema describes the SQL schema of the TraceValues table.
type TraceValuesSchema struct {
	TraceID           []byte   `sql:"trace_id BYTES"`
	CommitNumber      int64    `sql:"commit_number INT"`
	Value             float32  `sql:"val REAL"`
	SourceFileID      int64    `sql:"source_file_id INT"`
	primaryKey        struct{} `sql:"PRIMARY KEY (trace_id, commit_number)"`
	bySourceFileIndex struct{} `sql:"INDEX by_source_file_id (source_file_id, trace_id)"`
}
```

Looking back at the `tables.go` file:

```
package sql

//go:generate bazelisk run --config=mayberemote //:go -- run ./tosql

...

// Tables represents the full schema of the SQL database.
type Tables struct {
	Alerts      []alertschema.AlertSchema
	Commits     []gitschema.Commit
	ParamSets   []traceschema.ParamSetsSchema
	Postings    []traceschema.PostingsSchema
	Regressions []regressionschema.RegressionSchema
	Shortcuts   []shortcutschema.ShortcutSchema
	SourceFiles []traceschema.SourceFilesSchema
	TraceValues []traceschema.TraceValuesSchema
}
```

You can see the `go:generate` command at the top of the file. This generates a
new Go file that contains the SQL schema as a Go string, and also provides the
column names of each table:

```
package sql

// Generated by //go/sql/exporter/
// DO NOT EDIT

const Schema = `CREATE TABLE IF NOT EXISTS Alerts (
  id INT PRIMARY KEY DEFAULT unique_rowid(),
  alert TEXT,
  config_state INT DEFAULT 0,
  last_modified INT
);
...
CREATE TABLE IF NOT EXISTS TraceValues (
  trace_id BYTES,
  commit_number INT,
  val REAL,
  source_file_id INT,
  PRIMARY KEY (trace_id, commit_number),
  INDEX by_source_file_id (source_file_id, trace_id)
);
`

var Alerts = []string{
	"id",
	"alert",
	"config_state",
	"last_modified",
}

...

var TraceValues = []string{
	"trace_id",
	"commit_number",
	"val",
	"source_file_id",
}
```

The full file can be found at:

> https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/schema.go

## Validating SQL Schema

The above section shows how we can generate the SQL commands to create all the
tables in the database. We also need the application to check the schema on
startup, to make sure the database has all the tables and columns the current
version of the application expects.

To do that we need to export information about the tables, in this case the
column names, their types, and any extra indexes.

> https://pkg.go.dev/go.skia.org/infra/go/sql/schema#Description

The description can be exported to a JSON file:

> https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/exportschema/main.go

Then the JSON file that describes the table structure can be embedded into the
application:

```
// Package expectedschema contains the schema the database is expected to have.
package expectedschema

import (
	"embed" // Enable go:embed.
	"encoding/json"

	"go.skia.org/infra/go/skerr"
	"go.skia.org/infra/go/sql/schema"
)

// FS is a filesystem with the schema.json file.
//
//go:embed schema.json
var FS embed.FS

// Load returns the deserialized schema.Description stored in the schema.json file.
func Load() (schema.Description, error) {
	var ret schema.Description
	b, err := FS.ReadFile("schema.json")
	if err != nil {
		return ret, skerr.Wrap(err)
	}

	err = json.Unmarshal(b, &ret)
	if err != nil {
		return ret, skerr.Wrap(err)
	}
	return ret, nil
}
```

Finally, the expected schema can be compared to the schema of the database
running in production:

```
// Confirm the database has the right schema.
expectedSchema, err := expectedschema.Load()
if err != nil {
	return nil, skerr.Wrap(err)
}

actual, err := schema.GetDescription(singletonPool, sql.Tables{})
if err != nil {
	return nil, skerr.Wrap(err)
}
if diff := assertdeep.Diff(expectedSchema, *actual); diff != "" {
	return nil, skerr.Fmt("Schema needs to be updated: %s.", diff)
}
```

> https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/builders/builders.go#93
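
Stripped of the helper packages, the check reduces to a deep comparison of two
descriptions. A stdlib-only sketch, again using an assumed, simplified
`Description` shape:

```go
package main

import (
	"fmt"
	"reflect"
)

// Description is a simplified, assumed stand-in for schema.Description.
type Description struct {
	ColumnNameAndType map[string]string
}

// checkSchema mirrors the startup check above: it fails if the live database
// schema differs from the one the binary was built against.
func checkSchema(expected, actual Description) error {
	if !reflect.DeepEqual(expected, actual) {
		return fmt.Errorf("schema needs to be updated: expected %v, got %v", expected, actual)
	}
	return nil
}

func main() {
	live := Description{ColumnNameAndType: map[string]string{"commits.git_hash": "text def: nullable:NO"}}
	built := Description{ColumnNameAndType: map[string]string{"commits.git_hash": "text def: nullable:NO"}}
	fmt.Println(checkSchema(built, live)) // <nil>
}
```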

## Schema Changes

This system does not automatically generate schema migrations; generated
migrations would be incomplete and error prone. Migrations must be written by
hand, but the schema description and comparison functions above can be used to
test them.
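
A hand-written migration is typically a short list of idempotent statements
kept next to the schema. The statements below are hypothetical, purely to
illustrate the form such a migration takes:

```
ALTER TABLE Description ADD COLUMN IF NOT EXISTS notes TEXT;
CREATE INDEX IF NOT EXISTS by_notes ON Description (notes);
```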

For an example of how to do this see `Test_LiveToNextSchemaMigration` in

> https://skia.googlesource.com/buildbot/+/refs/heads/main/machine/go/machine/store/cdb/cdb_test.go#365

The test below loads the previous version of the schema, applies the
migrations, and then compares the resulting schema to the current schema:

```
func Test_LiveToNextSchemaMigration(t *testing.T) {
	ctx := context.Background()
	db := cdbtest.NewCockroachDBForTests(t, "desc")

	expectedSchema := GetSchema(t, db)

	_, err := db.Exec(ctx, "DROP TABLE IF EXISTS Description")
	require.NoError(t, err)
	_, err = db.Exec(ctx, "DROP TABLE IF EXISTS TaskResult")
	require.NoError(t, err)

	_, err = db.Exec(ctx, LiveSchema)
	require.NoError(t, err)

	_, err = db.Exec(ctx, FromLiveToNext)
	require.NoError(t, err)

	migratedSchema := GetSchema(t, db)

	assertdeep.Equal(t, expectedSchema, migratedSchema)

	// Test the test, make sure at least one known column is present.
	require.Equal(t, "text def: nullable:NO", migratedSchema.ColumnNameAndType["description.machine_id"])
}
```

## Utilities for Writing Queries

The [sqlutil](https://pkg.go.dev/go.skia.org/infra/go/sql/sqlutil) package has
utilities that work with the generated column names to make writing some SQL
queries easier.

For example, they can simplify writing the insert statement for a table with
many columns. In the following code, presume that `Description` contains the
column names of the Description table:

```
queryTemplate := fmt.Sprintf(`INSERT INTO Description (%s) VALUES %s`,
	strings.Join(Description, ","), sqlutil.ValuesPlaceholders(len(Description), 1))
```
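
Based on its use above, `ValuesPlaceholders` expands into numbered placeholder
groups. The following is a sketch of that behavior, an illustration rather than
the library's actual source:

```go
package main

import (
	"fmt"
	"strings"
)

// valuesPlaceholders builds placeholder groups like "($1,$2,$3)" for the
// given number of columns and rows, numbering the placeholders sequentially.
func valuesPlaceholders(cols, rows int) string {
	groups := make([]string, 0, rows)
	n := 1
	for r := 0; r < rows; r++ {
		ph := make([]string, 0, cols)
		for c := 0; c < cols; c++ {
			ph = append(ph, fmt.Sprintf("$%d", n))
			n++
		}
		groups = append(groups, "("+strings.Join(ph, ",")+")")
	}
	return strings.Join(groups, ",")
}

func main() {
	fmt.Println(valuesPlaceholders(3, 2)) // ($1,$2,$3),($4,$5,$6)
}
```

So for a three-column table inserted one row at a time, the template above
would expand to `INSERT INTO Description (a,b,c) VALUES ($1,$2,$3)`.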