
Commit 4f38642

vsai12 and claude authored
feat: Terraform support for OpenSearch/document-DB objectSchema masking (#190)
* feat: add ObjectSchema JSON canonicalization helpers

  Parse/Marshal/Normalize functions route user JSON through the v1pb.ObjectSchema proto for type-aware validation, then canonicalize via encoding/json with sorted map keys so plans don't flap on formatting differences. Unit tests cover whitespace/key-order equivalence, invalid proto rejection, malformed JSON rejection, empty input, and deterministic output.

  Part of BYT-9296.

* feat: add object_schema_json attribute to catalog table block

  Surface the TableCatalog.object_schema oneof variant to HCL as a JSON string. StateFunc normalizes via the ObjectSchema proto so state matches what the server stores. ValidateDiagFunc rejects malformed or structurally invalid JSON at plan time. columns becomes Optional+Computed since a table using object_schema_json has no columns to list; mutual exclusivity is enforced in Task 6's convertToV1TableCatalog rewrite.

  Part of BYT-9296.

* style: use any for new StateFunc/ValidateDiagFunc signatures

  Per CLAUDE.md Go conventions, use `any` in new code instead of `interface{}`. Only fixes the two signatures introduced in the previous commit; pre-existing interface{} usage elsewhere in the file is out of scope for PR2.

* feat: route object_schema_json through TableCatalog oneof on write

  convertToV1TableCatalog now picks the ObjectSchema variant when object_schema_json is set, and the Columns variant otherwise. The two are mutually exclusive: setting both returns a clear error at apply time. Also softens the columns type assertion (it now tolerates absent/nil sets after Task 5 relaxed columns to Optional) and modernizes the parameter type from interface{} to any per CLAUDE.md.

  Part of BYT-9296.

* feat: surface ObjectSchema in flattened catalog state

  flattenDatabaseCatalog now populates object_schema_json when the server returns an ObjectSchema variant, using the shared canonicalizer from Task 4. Also nil-guards table.GetColumns(), since the ObjectSchema variant leaves the Columns oneof field nil and the old code dereferenced it blindly. Closes the read half of the round-trip; a subsequent terraform plan on an unchanged config now shows no diff.

  Part of BYT-9296.

* test: acceptance coverage for object_schema_json round-trip

  Create -> re-apply (no-op) -> update -> destroy. The PlanOnly re-apply step catches canonicalization regressions that would otherwise surface to users as spurious plan drift. Also seeds the mock with `test-db-objschema` so the new test can find the database it manages, mirroring the pre-existing seeds for `test-database` and `test-database-labels`.

  Part of BYT-9296.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: example and generated docs for object_schema_json

  A commented-out example shows the OpenSearch masking pattern, using jsonencode for readability. Users substitute their semantic-type UUIDs from the Bytebase UI. Regenerated resource + data source docs via tfplugindocs.

  Part of BYT-9296.

* docs: fix mislabeled nested schema heading in data source

  terraform-plugin-docs v0.13.0 has a known bug for read-only "Set of Object" nested schemas: it names the heading after the alphabetically last sibling attribute instead of the attribute whose nested schema is being documented. Here the heading rendered as `catalog.schemas.tables.object_schema_json` even though the block beneath lists columns' attributes (classification, labels, name, semantic_type). The anchor ID was already correct (--columns); only the visible heading text needed fixing. Leaving this as a hand edit rather than upgrading tfplugindocs, which would churn whitespace across all 44 generated docs and is out of scope for PR2.

  Part of BYT-9296.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
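The nil-guard mentioned in the commit message relies on a property of Go protobuf code: getters generated by protoc-gen-go return a zero value when called on a nil receiver, whereas direct field access through a nil pointer panics. The sketch below illustrates that pattern with hand-written, hypothetical types (`TableCatalog`, `ColumnsList`, `ColumnCatalog`) and a hypothetical helper `columnNames`; they only imitate generated code and are not the actual v1pb types or the provider's `flattenDatabaseCatalog`.

```go
package main

import "fmt"

// Hypothetical stand-ins for generated proto types. In the real v1pb
// package the two table variants live in a oneof; here Columns is simply
// nil when the object-schema variant is used.
type ColumnCatalog struct{ Name string }

type ColumnsList struct{ Columns []*ColumnCatalog }

type TableCatalog struct {
	Name             string
	Columns          *ColumnsList // nil for the object-schema variant
	ObjectSchemaJSON string
}

// Like protoc-gen-go getters, these tolerate nil receivers.
func (x *TableCatalog) GetColumns() *ColumnsList {
	if x != nil {
		return x.Columns
	}
	return nil
}

func (x *ColumnsList) GetColumns() []*ColumnCatalog {
	if x != nil {
		return x.Columns
	}
	return nil
}

// columnNames collects column names without dereferencing a nil wrapper.
// Writing table.GetColumns().Columns instead would panic for the
// object-schema variant, because GetColumns() returns nil there.
func columnNames(table *TableCatalog) []string {
	var names []string
	for _, c := range table.GetColumns().GetColumns() {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	relational := &TableCatalog{
		Name:    "users",
		Columns: &ColumnsList{Columns: []*ColumnCatalog{{Name: "id"}, {Name: "email"}}},
	}
	document := &TableCatalog{Name: "users_index", ObjectSchemaJSON: `{"type":"OBJECT"}`}

	fmt.Println(columnNames(relational))    // [id email]
	fmt.Println(len(columnNames(document))) // 0
}
```

Chaining getters keeps the read path safe for both variants without an explicit `if table.Columns != nil` check at every level.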
1 parent 51e52a0 commit 4f38642

10 files changed

Lines changed: 612 additions & 23 deletions


docs/data-sources/database.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -52,9 +52,10 @@ Read-Only:
 - `classification` (String)
 - `columns` (Set of Object) (see [below for nested schema](#nestedobjatt--catalog--schemas--tables--columns))
 - `name` (String)
+- `object_schema_json` (String)
 
 <a id="nestedobjatt--catalog--schemas--tables--columns"></a>
-### Nested Schema for `catalog.schemas.tables.name`
+### Nested Schema for `catalog.schemas.tables.columns`
 
 Read-Only:
 
```
docs/resources/database.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -55,12 +55,13 @@ Optional:
 
 Required:
 
-- `columns` (Block Set, Min: 1) (see [below for nested schema](#nestedblock--catalog--schemas--tables--columns))
 - `name` (String)
 
 Optional:
 
 - `classification` (String) The classification id
+- `columns` (Block Set) (see [below for nested schema](#nestedblock--catalog--schemas--tables--columns))
+- `object_schema_json` (String) JSON-encoded ObjectSchema for document-oriented databases (e.g. OpenSearch, Elasticsearch). Mutually exclusive with `columns` on the same table. The JSON must match the v1.ObjectSchema proto shape; see the Bytebase API docs.
 
 <a id="nestedblock--catalog--schemas--tables--columns"></a>
 ### Nested Schema for `catalog.schemas.tables.columns`
```

examples/database/main.tf

Lines changed: 36 additions & 0 deletions
```diff
@@ -27,3 +27,39 @@ data "bytebase_database_list" "all" {
 output "all_databases" {
   value = data.bytebase_database_list.all
 }
+
+# Example: OpenSearch / document-DB nested masking via object_schema_json.
+# The JSON must match the v1.ObjectSchema proto shape.
+# Replace <uuid-from-ui> with real semantic type IDs from the Bytebase
+# UI at Settings -> Data Masking -> Semantic Types.
+#
+# resource "bytebase_database" "opensearch_users" {
+#   name = "instances/opensearch-cluster/databases/node-1"
+#   project = "projects/sample-project"
+#   environment = "environments/test"
+#
+#   catalog {
+#     schemas {
+#       name = ""
+#       tables {
+#         name = "users_index"
+#         object_schema_json = jsonencode({
+#           type = "OBJECT"
+#           structKind = {
+#             properties = {
+#               email = { type = "STRING", semanticType = "<uuid-from-ui>" }
+#               contact = {
+#                 type = "OBJECT"
+#                 structKind = {
+#                   properties = {
+#                     phone = { type = "STRING", semanticType = "<uuid-from-ui>" }
+#                   }
+#                 }
+#               }
+#             }
+#           }
+#         })
+#       }
+#     }
+#   }
+# }
```
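To check which fields a nested `object_schema_json` value would mask, it can help to flatten it into dotted paths. The helper below, `maskedPaths`, is hypothetical and not part of the provider; it assumes only the JSON shape used in the example above (nesting through `structKind.properties`, with a `semanticType` string on leaf fields) and decodes with `encoding/json` rather than the v1pb types, so it performs no proto validation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// maskedPaths walks an ObjectSchema-shaped JSON document and returns the
// dotted path of every property that carries a non-empty semanticType.
// Assumption: nesting is expressed only via structKind.properties.
func maskedPaths(raw string) ([]string, error) {
	var root map[string]any
	if err := json.Unmarshal([]byte(raw), &root); err != nil {
		return nil, err
	}
	var paths []string
	var walk func(node map[string]any, prefix string)
	walk = func(node map[string]any, prefix string) {
		if st, ok := node["semanticType"].(string); ok && st != "" && prefix != "" {
			paths = append(paths, prefix)
		}
		// Missing keys yield nil maps, and ranging over a nil map is a no-op.
		kind, _ := node["structKind"].(map[string]any)
		props, _ := kind["properties"].(map[string]any)
		for name, child := range props {
			childNode, ok := child.(map[string]any)
			if !ok {
				continue
			}
			path := name
			if prefix != "" {
				path = prefix + "." + name
			}
			walk(childNode, path)
		}
	}
	walk(root, "")
	sort.Strings(paths) // map iteration order is random; sort for stable output
	return paths, nil
}

func main() {
	// Same shape as the commented-out HCL example, with placeholder IDs.
	doc := `{"type":"OBJECT","structKind":{"properties":{
		"email":{"type":"STRING","semanticType":"uuid-1"},
		"contact":{"type":"OBJECT","structKind":{"properties":{
			"phone":{"type":"STRING","semanticType":"uuid-2"}}}}}}}`
	paths, err := maskedPaths(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths) // [contact.phone email]
}
```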

provider/data_source_database.go

Lines changed: 5 additions & 0 deletions
```diff
@@ -75,6 +75,11 @@ func dataSourceDatabase() *schema.Resource {
 		Computed: true,
 		Description: "The classification id",
 	},
+	"object_schema_json": {
+		Type: schema.TypeString,
+		Computed: true,
+		Description: "JSON-encoded ObjectSchema for document-oriented databases.",
+	},
 	"columns": {
 		Computed: true,
 		Type: schema.TypeSet,
```

provider/internal/mock_client.go

Lines changed: 12 additions & 0 deletions
```diff
@@ -191,12 +191,21 @@ func (c *mockClient) CreateInstance(_ context.Context, instanceID string, instan
 		},
 	}
 
+	testDbObjSchema := &v1pb.Database{
+		Name: fmt.Sprintf("%s/%stest-db-objschema", ins.Name, DatabaseIDPrefix),
+		State: v1pb.State_ACTIVE,
+		Labels: map[string]string{
+			"bb.environment": envID,
+		},
+	}
+
 	mu.Lock()
 	defer mu.Unlock()
 	c.instanceMap[ins.Name] = ins
 	c.databaseMap[defaultDb.Name] = defaultDb
 	c.databaseMap[testDb.Name] = testDb
 	c.databaseMap[testDbLabels.Name] = testDbLabels
+	c.databaseMap[testDbObjSchema.Name] = testDbObjSchema
 
 	// Also create empty catalogs for the databases
 	c.databaseCatalogMap[defaultDb.Name] = &v1pb.DatabaseCatalog{
@@ -208,6 +217,9 @@ func (c *mockClient) CreateInstance(_ context.Context, instanceID string, instan
 	c.databaseCatalogMap[testDbLabels.Name] = &v1pb.DatabaseCatalog{
 		Name: testDbLabels.Name,
 	}
+	c.databaseCatalogMap[testDbObjSchema.Name] = &v1pb.DatabaseCatalog{
+		Name: testDbObjSchema.Name,
+	}
 	return ins, nil
 }
 
```
provider/internal/object_schema.go

Lines changed: 97 additions & 0 deletions
```diff
@@ -0,0 +1,97 @@
+package internal
+
+import (
+	"encoding/json"
+	"sort"
+
+	"github.com/pkg/errors"
+	"google.golang.org/protobuf/encoding/protojson"
+
+	v1pb "buf.build/gen/go/bytebase/bytebase/protocolbuffers/go/v1"
+)
+
+// ParseObjectSchemaJSON unmarshals the user-provided JSON string into a
+// v1pb.ObjectSchema. Returns an error with a user-friendly message if the
+// JSON is malformed or does not match the proto shape.
+func ParseObjectSchemaJSON(raw string) (*v1pb.ObjectSchema, error) {
+	if raw == "" {
+		return nil, nil
+	}
+	var schema v1pb.ObjectSchema
+	opts := protojson.UnmarshalOptions{DiscardUnknown: false}
+	if err := opts.Unmarshal([]byte(raw), &schema); err != nil {
+		return nil, errors.Wrap(err, "invalid object_schema_json")
+	}
+	return &schema, nil
+}
+
+// MarshalObjectSchemaToJSON serializes an ObjectSchema into a canonical
+// JSON string. We route through encoding/json after protojson so map keys
+// are sorted and whitespace is stripped — protojson itself does not
+// guarantee deterministic map-key order.
+func MarshalObjectSchemaToJSON(schema *v1pb.ObjectSchema) (string, error) {
+	if schema == nil {
+		return "", nil
+	}
+	pj, err := protojson.MarshalOptions{UseProtoNames: false}.Marshal(schema)
+	if err != nil {
+		return "", errors.Wrap(err, "protojson marshal")
+	}
+	return canonicalizeJSON(pj)
+}
+
+// NormalizeObjectSchemaJSON is the round-trip: parse user JSON through the
+// proto type (type-aware validation) and emit the canonical form. Used by
+// StateFunc so the same value stored in state matches what we'd read back
+// from the server.
+func NormalizeObjectSchemaJSON(raw string) (string, error) {
+	schema, err := ParseObjectSchemaJSON(raw)
+	if err != nil {
+		return "", err
+	}
+	return MarshalObjectSchemaToJSON(schema)
+}
+
+// canonicalizeJSON reparses raw JSON into a generic value and re-marshals
+// with sorted map keys. encoding/json emits map keys in sorted order when
+// marshaling map[string]any — we exploit that.
+func canonicalizeJSON(raw []byte) (string, error) {
+	var v any
+	if err := json.Unmarshal(raw, &v); err != nil {
+		return "", errors.Wrap(err, "canonicalize: reparse")
+	}
+	v = sortMapKeys(v)
+	out, err := json.Marshal(v)
+	if err != nil {
+		return "", errors.Wrap(err, "canonicalize: remarshal")
+	}
+	return string(out), nil
+}
+
+// sortMapKeys walks the decoded value and replaces any map[string]any with
+// a value whose marshaling order is deterministic. encoding/json already
+// sorts map[string]any keys, but we still need to descend into nested
+// arrays and maps to cover the full tree.
+func sortMapKeys(v any) any {
+	switch t := v.(type) {
+	case map[string]any:
+		keys := make([]string, 0, len(t))
+		for k := range t {
+			keys = append(keys, k)
+		}
+		sort.Strings(keys)
+		out := make(map[string]any, len(t))
+		for _, k := range keys {
+			out[k] = sortMapKeys(t[k])
+		}
+		return out
+	case []any:
+		out := make([]any, len(t))
+		for i, e := range t {
+			out[i] = sortMapKeys(e)
+		}
+		return out
+	default:
+		return v
+	}
+}
```
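The `sortMapKeys` comment says nested maps and arrays need an explicit pass, but `encoding/json` documents that map keys are sorted when marshaling, and it applies the same encoder recursively, so nested objects (including objects inside arrays) come out sorted as well. Rebuilding a Go map also cannot impose an order, since map iteration order is unspecified. The standalone check below (standard library only; `reencode` is a hypothetical helper, not part of the provider) suggests that a plain decode and re-encode already yields sorted keys at every depth and strips insignificant whitespace, so the extra pass appears to act as a deep copy rather than a requirement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reencode decodes raw JSON into a generic value and marshals it again,
// with no manual key sorting: encoding/json sorts map keys itself and
// emits no insignificant whitespace.
func reencode(raw string) (string, error) {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	in := `{
		"z": 1,
		"a": {"y": [{"d": true, "b": null}], "c": "x"}
	}`
	out, err := reencode(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"a":{"c":"x","y":[{"b":null,"d":true}]},"z":1}

	again, _ := reencode(out)
	fmt.Println(again == out) // true: re-encoding the canonical form is a no-op
}
```

The idempotence shown in the last line is the property the acceptance test's PlanOnly re-apply step relies on: once a value has been normalized into state, normalizing it again must not change it.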
Lines changed: 104 additions & 0 deletions
```diff
@@ -0,0 +1,104 @@
+package internal
+
+import (
+	"strings"
+	"testing"
+
+	v1pb "buf.build/gen/go/bytebase/bytebase/protocolbuffers/go/v1"
+)
+
+func TestNormalizeObjectSchemaJSON_ProducesStableOutput(t *testing.T) {
+	// Two inputs with different whitespace and key order should normalize
+	// to the same canonical JSON.
+	inputA := `{"type":"OBJECT","structKind":{"properties":{"email":{"type":"STRING","semanticType":"abc"}}}}`
+	inputB := `{
+		"structKind": {
+			"properties": {
+				"email": { "semanticType": "abc", "type": "STRING" }
+			}
+		},
+		"type": "OBJECT"
+	}`
+
+	gotA, err := NormalizeObjectSchemaJSON(inputA)
+	if err != nil {
+		t.Fatalf("normalize A: %v", err)
+	}
+	gotB, err := NormalizeObjectSchemaJSON(inputB)
+	if err != nil {
+		t.Fatalf("normalize B: %v", err)
+	}
+	if gotA != gotB {
+		t.Errorf("canonical forms differ:\nA=%s\nB=%s", gotA, gotB)
+	}
+}
+
+func TestNormalizeObjectSchemaJSON_RejectsInvalidProto(t *testing.T) {
+	_, err := NormalizeObjectSchemaJSON(`{"type":"NOT_A_REAL_TYPE"}`)
+	if err == nil {
+		t.Fatal("expected error for invalid enum value, got nil")
+	}
+}
+
+func TestNormalizeObjectSchemaJSON_RejectsInvalidJSON(t *testing.T) {
+	_, err := NormalizeObjectSchemaJSON(`not json at all`)
+	if err == nil {
+		t.Fatal("expected error for malformed JSON, got nil")
+	}
+}
+
+func TestNormalizeObjectSchemaJSON_EmptyString(t *testing.T) {
+	got, err := NormalizeObjectSchemaJSON("")
+	if err != nil {
+		t.Fatalf("empty: %v", err)
+	}
+	if got != "" {
+		t.Errorf("expected empty output for empty input, got %q", got)
+	}
+}
+
+func TestParseObjectSchemaJSON_RoundTripsThroughProto(t *testing.T) {
+	input := `{"type":"OBJECT","structKind":{"properties":{"x":{"type":"STRING","semanticType":"s"}}}}`
+	schema, err := ParseObjectSchemaJSON(input)
+	if err != nil {
+		t.Fatalf("parse: %v", err)
+	}
+	if schema.GetType() != v1pb.ObjectSchema_OBJECT {
+		t.Errorf("expected OBJECT, got %v", schema.GetType())
+	}
+	props := schema.GetStructKind().GetProperties()
+	if props["x"].GetSemanticType() != "s" {
+		t.Errorf("expected semanticType s, got %q", props["x"].GetSemanticType())
+	}
+}
+
+func TestMarshalObjectSchemaToJSON_DeterministicOrder(t *testing.T) {
+	// Build a proto with multiple map keys and verify marshal is
+	// byte-identical across calls AND that keys are sorted.
+	schema := &v1pb.ObjectSchema{
+		Type: v1pb.ObjectSchema_OBJECT,
+		Kind: &v1pb.ObjectSchema_StructKind_{
+			StructKind: &v1pb.ObjectSchema_StructKind{
+				Properties: map[string]*v1pb.ObjectSchema{
+					"zeta":  {Type: v1pb.ObjectSchema_STRING, SemanticType: "z"},
+					"alpha": {Type: v1pb.ObjectSchema_STRING, SemanticType: "a"},
+				},
+			},
+		},
+	}
+	a, err := MarshalObjectSchemaToJSON(schema)
+	if err != nil {
+		t.Fatalf("marshal a: %v", err)
+	}
+	b, err := MarshalObjectSchemaToJSON(schema)
+	if err != nil {
+		t.Fatalf("marshal b: %v", err)
+	}
+	if a != b {
+		t.Errorf("marshal not deterministic: %q vs %q", a, b)
+	}
+	// Alpha must appear before zeta since we sort map keys.
+	if strings.Index(a, "alpha") > strings.Index(a, "zeta") {
+		t.Errorf("expected sorted map keys in output, got %s", a)
+	}
+}
```

0 commit comments
