perf: replace byte-by-byte takeByte() with block reads in xml.zig and json.zig #135

@vmvarela

Description

Both `src/xml.zig` and `src/json.zig` read stdin byte-by-byte using `takeByte()` in a tight loop before parsing:

```zig
while (true) {
    const byte = reader.takeByte() catch |err| switch (err) {
        error.EndOfStream => break,
        error.ReadFailed => fatal(...),
    };
    buf.append(allocator, byte) catch fatal(...);
}
```

This calls the reader dispatch logic once per byte. For a 10 MB file that is ~10 M iterations instead of ~2 500 with 4 KB block reads. For typical sql-pipe inputs (KBs to a few MBs piped from other processes) the overhead is unnoticeable, but it scales poorly for large files.

The pattern appears in three functions in `xml.zig` (`loadXmlInput`, `getXmlColumnNames`, `summarizeXml`) and in `json.zig`.
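
One possible shape for the replacement, keeping the same `fatal(...)` error handling as the loop above. The bulk-read method name used here (`readSliceShort`) is an assumption and should be checked against the Zig 0.16 `std.Io.Reader` API, per the Notes below:

```zig
var chunk: [4096]u8 = undefined;
while (true) {
    // Read up to one 4 KB block per reader dispatch instead of one byte.
    const n = reader.readSliceShort(&chunk) catch fatal(...);
    if (n == 0) break; // end of stream
    buf.appendSlice(allocator, chunk[0..n]) catch fatal(...);
}
```

This keeps the incremental-append structure of the existing code (useful when input size is unknown), while cutting the per-byte dispatch down to one call per block.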

Acceptance Criteria

  • Replace the `takeByte` loop with a block-read loop (e.g. `reader.readAll` or chunked `takeBytesMax`) in all affected functions
  • Apply consistently to both `xml.zig` and `json.zig`
  • All existing tests continue to pass
  • Benchmark or note observed improvement on a representative large file (≥ 10 MB)

Notes

Check the `std.Io.Reader` API available in Zig 0.16 for the appropriate bulk-read method before implementing.

Metadata

Assignees

No one assigned

Labels

  • priority:low (Nice to have, do when possible)
  • size:s (Small — 1 to 4 hours)
  • tech-debt (Technical debt — address proactively)
  • type:chore (Maintenance, refactoring, tooling)
