Conversation
Instead of showing full .create-merge table lines, show only added/changed/live-only columns. Uses ColumnDiffHelper to compare column dictionaries and render a concise diff. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When multiple clusters have the same changes, show them once with a combined header instead of repeating the full diff for each cluster. Uses a canonical fingerprint (markdown + comments + validity) to detect identical diffs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves the generated diff markdown by (1) showing more focused, column-level diffs for table .create-merge changes and (2) grouping clusters that have identical diff output into a single combined markdown section.
Changes:
- Added cluster grouping logic in markdown generation by computing a per-cluster fingerprint and merging identical outputs.
- Introduced
ColumnDiffHelperand updatedScriptCompareChangeto render column-level diffs forCreateMergeTablechanges (fallback to line diff otherwise). - Added unit tests for column diff rendering and cluster fingerprint grouping.
Show a summary per file
| File | Description |
|---|---|
| KustoSchemaTools/KustoSchemaHandler.cs | Computes per-cluster fingerprints and groups identical cluster diffs into a combined markdown header/output. |
| KustoSchemaTools/Changes/ScriptCompareChange.cs | Adds a column-diff rendering path for CreateMergeTable and refactors line-diff rendering into a helper method. |
| KustoSchemaTools/Changes/ColumnDiffHelper.cs | New helper that produces a concise column-level diff (added/type-changed + note for live-only columns). |
| KustoSchemaTools.Tests/Changes/ColumnDiffHelperTests.cs | Unit tests validating column diff output for added/type-changed/removed/no-change cases. |
| KustoSchemaTools.Tests/Changes/ClusterGroupingTests.cs | Unit tests validating cluster fingerprint equality/inequality behavior across scenarios. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 4
| foreach (var comment in comments.OrderBy(c => c.Kind).ThenBy(c => c.Text)) | ||
| { | ||
| sb.Append($"comment:{comment.Kind}:{comment.FailsRollout}:{comment.Text};"); | ||
| } |
There was a problem hiding this comment.
BuildClusterFingerprint constructs the grouping key by concatenating raw comment.Text (and later change.Markdown) with :/; delimiters. If any comment/markdown contains these delimiter sequences (e.g. ";change:"), two different diffs can produce the same fingerprint and get incorrectly grouped. Consider generating the fingerprint from a structured representation (e.g., JSON) and hashing it (SHA256), or length-prefix/escape the fields before concatenation.
| foreach (var change in changes) | ||
| { | ||
| sb.Append($"change:{change.Markdown};"); | ||
| } |
There was a problem hiding this comment.
BuildClusterFingerprint appends full change.Markdown into the fingerprint string. For large diffs and/or many clusters this can create very large strings and increase memory/CPU usage during grouping. A more efficient approach is to hash each change/comment payload (or stream into a hasher) and group on the fixed-size hash.
| .Where(c => !oldColumns.ContainsKey(c.Key)) | ||
| .ToList(); | ||
|
|
||
| var typeChanged = newColumns | ||
| .Where(c => oldColumns.ContainsKey(c.Key) && !string.Equals(oldColumns[c.Key], c.Value, StringComparison.OrdinalIgnoreCase)) | ||
| .Select(c => new { Name = c.Key, OldType = oldColumns[c.Key], NewType = c.Value }) | ||
| .ToList(); | ||
|
|
||
| var removedFromYaml = oldColumns | ||
| .Where(c => !newColumns.ContainsKey(c.Key)) |
There was a problem hiding this comment.
BuildColumnDiff enumerates newColumns/oldColumns directly when rendering added, typeChanged, and removedFromYaml. Since these are Dictionary instances, enumeration order can vary depending on how the dictionaries were constructed, which can make the rendered diff (and cluster grouping fingerprint) non-deterministic. Sorting by column name before rendering (and sorting removed columns) would make the output stable across runs/clusters.
| .Where(c => !oldColumns.ContainsKey(c.Key)) | |
| .ToList(); | |
| var typeChanged = newColumns | |
| .Where(c => oldColumns.ContainsKey(c.Key) && !string.Equals(oldColumns[c.Key], c.Value, StringComparison.OrdinalIgnoreCase)) | |
| .Select(c => new { Name = c.Key, OldType = oldColumns[c.Key], NewType = c.Value }) | |
| .ToList(); | |
| var removedFromYaml = oldColumns | |
| .Where(c => !newColumns.ContainsKey(c.Key)) | |
| .Where(c => !oldColumns.ContainsKey(c.Key)) | |
| .OrderBy(c => c.Key, StringComparer.Ordinal) | |
| .ToList(); | |
| var typeChanged = newColumns | |
| .Where(c => oldColumns.ContainsKey(c.Key) && !string.Equals(oldColumns[c.Key], c.Value, StringComparison.OrdinalIgnoreCase)) | |
| .OrderBy(c => c.Key, StringComparer.Ordinal) | |
| .Select(c => new { Name = c.Key, OldType = oldColumns[c.Key], NewType = c.Value }) | |
| .ToList(); | |
| var removedFromYaml = oldColumns | |
| .Where(c => !newColumns.ContainsKey(c.Key)) | |
| .OrderBy(c => c.Key, StringComparer.Ordinal) |
| int last = 0; | ||
| for (int i = 0; i < relevantLines.Count; i++) | ||
| { | ||
| var b = i - 1 > 0 ? relevantLines[i - 1] : null; |
There was a problem hiding this comment.
In RenderLineDiff, the previous-line lookup uses i - 1 > 0, which skips index 0 when i == 1. This makes the neighbor-context logic inconsistent (line 0 is never considered as the previous line for line 1). Use i > 0 (or i - 1 >= 0) for the boundary check.
| var b = i - 1 > 0 ? relevantLines[i - 1] : null; | |
| var b = i > 0 ? relevantLines[i - 1] : null; |
Do only changed columns for tables and join changes for same database/table and different clusters
sample output:
https://github.com/github/warehouse-config/pull/4231#issuecomment-4236107408