Rewrite output pipeline; add paginated bnz list by schuay · Pull Request #2 · victorgomes/bug

schuay · 2026-05-15T10:52:18Z

Rewrite output pipeline as flat-tree HTML to Turndown markdown
Tighten focused output after triage
bnz list: paginate, parse rich rows, filter --since

The previous extractor read element.innerText from the rendered page and ran anchored regexes to recover structure. That approach could silently drop content when Buganizer's UI changed, and it lost link targets, heading levels, and code-block fidelity along the way.

The new pipeline is three layers:

1. Flatten: an in-page recursive walker descends light and shadow DOM, substitutes <slot> elements with assignedNodes({flatten:true}), and emits HTML mirroring the rendered flat tree.
2. Convert: Node-side Turndown (with the GFM plugin) turns that HTML into markdown. Custom rules handle Polymer-specific compaction: drop action buttons / icon controls / empty avatars, flatten inline links, collapse the issue-metadata sidebar to one bullet per field, drop per-field-change time-of-day stamps, drop system-generated history events that have no body.
3. Render: bug.js adds a synthesized header and appendix (attachments, downloaded reproducers).

What remains of the structured-parsing layer are tiny targeted extractors for things the CLI itself acts on: the testcase-key URL inside an issue, the [Command line] flags on a CF page, attachment URLs collected during the walk, reproducer-download endpoints.

Other changes bundled in:

- Split into lib/url, lib/dom, lib/browser, lib/cache, lib/render.
- bug list <query>: run a Buganizer search and emit the results.
- Disk cache at ~/.config/bug-cli/cache/ with a 5-minute TTL. --refresh busts a single fetch; --no-cache disables.
- --download-original (cf testcases) and --download-attachments[=DIR] (any issue) using the authenticated request context.
- A single Chromium launch handles multiple targets in one invocation (bug 1 2 3, bug cf a b c).
- Default output is focused (page chrome dropped, sidebar compacted, empty fields suppressed). Pass --full to disable the filter.
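For readers unfamiliar with the flat tree, the Flatten layer can be pictured with the minimal sketch below. It is an illustration, not the walker from this PR: the names flatten, escapeHtml, and VOID_TAGS are hypothetical, and it assumes only standard DOM properties (nodeType, localName, attributes, childNodes, shadowRoot, assignedNodes).

```javascript
// Hypothetical sketch of a flat-tree walker; not the code from this PR.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
// Small subset of void elements, which must not get a closing tag.
const VOID_TAGS = new Set(['br', 'hr', 'img', 'input']);

function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function flatten(node) {
  if (node.nodeType === TEXT_NODE) return escapeHtml(node.nodeValue);
  if (node.nodeType !== ELEMENT_NODE) return ''; // comments and other node types are skipped

  const tag = node.localName;

  // A <slot> is replaced by the nodes distributed into it. With
  // {flatten: true}, nested slots are resolved, and the slot's fallback
  // content is returned when nothing is assigned.
  if (tag === 'slot') {
    return Array.from(node.assignedNodes({ flatten: true })).map(flatten).join('');
  }

  const attrs = Array.from(node.attributes || [])
    .map((a) => ` ${a.name}="${escapeHtml(a.value).replace(/"/g, '&quot;')}"`)
    .join('');
  if (VOID_TAGS.has(tag)) return `<${tag}${attrs}>`;

  // For a shadow host, the rendered children are the shadow tree's children;
  // light-DOM children only appear where a <slot> pulls them in.
  // Note: node.shadowRoot exposes open shadow roots only.
  const root = node.shadowRoot || node;
  const inner = Array.from(root.childNodes).map(flatten).join('');
  return `<${tag}${attrs}>${inner}</${tag}>`;
}
```

Because link targets survive as href attributes and headings keep their tag names, a converter such as Turndown has the structure that innerText discarded.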

Five fixes surfaced while triaging a batch of open issues:

- Drop the redundant page-rendered <h2>Issue <id></h2> at the top of issue-details-wrapper (Turndown rule), and drop b-issue-id-picker / issue-chip-indicators / b-access-limits-chip from the walker. These were echoing the issue id and an empty 'Visibility' label that our synthesized header / sidebar already cover.
- Synthesize a 'Status: X . Type: Y . Priority: Z . Severity: W' summary line just under the URL. Buganizer doesn't render a chip for Status=New, so the status would otherwise only appear at the bottom of the page. Extracted via regex from the compacted sidebar.
- Pad b-formatted-date-time with a leading space so the comment header no longer renders '[victorgomes#2](url)2026-05-15 04:27' (link jammed against timestamp with no separator).
- Structure attachment listings: <b-attachment-viewer> renders as '- **filename** -- size -- [View](url) [Download](url)' instead of paragraph flow.
- Introduce a qsa() helper to iterate domino's NodeList (Turndown's Node-side HTML parser); its NodeList lacks Symbol.iterator, so for...of silently iterated zero times. Earlier rules that used for...of on querySelectorAll happened to fall through to acceptable defaults; the new attachment rule blew up outright.
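A helper along the lines of qsa() can be sketched as follows. This is a guess at its shape rather than the PR's implementation; the signature (root, selector) is an assumption. The point is that indexed access via .length works on any array-like NodeList, whether or not it implements Symbol.iterator.

```javascript
// Hypothetical sketch; the qsa() in this PR may differ.
function qsa(root, selector) {
  const list = root.querySelectorAll(selector);
  const out = [];
  // Copy by index instead of relying on the iterator protocol.
  for (let i = 0; i < list.length; i++) out.push(list[i]);
  return out;
}
```

A Turndown rule can then write `for (const link of qsa(node, 'a[href]'))` and get a real array regardless of which DOM implementation parsed the HTML.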

Before this, bnz list returned only the first page of search results and parsed bare issue links from it. The new pipeline walks Buganizer's pagination and parses the rich result table:

- dumpPaginated() clicks the 'Go to next page' button until disabled (Buganizer ignores URL pagination parameters). --max-pages=N caps at N pages of 50 (default 30).
- extractSearchRows() parses the markdown table Turndown emits: priority, type, title, assignee, status, 7d-views, id, modified. extractSearchHits is kept as a fallback for pages that don't render the table (empty results, error stubs).
- --since=<dur|date> filters hits by their LAST MODIFIED column. Duration syntax accepts h/d/w/m (e.g. 7d, 1w); anything else falls through to Date.parse, so ISO dates work too.
- listCmd dedups by issue id across pages and reports pagesFetched and filteredOut in both the markdown summary line and json output.
- renderListMarkdown emits a markdown table when hits have structured fields, falling back to the bullet list when only id/title/url are available.
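The --since handling described above might look roughly like the sketch below. The function names parseSince and filterSince are hypothetical, and two points are assumptions not stated in the description: that "m" means a 30-day month (rather than minutes), and that each hit carries a modified string that Date.parse understands.

```javascript
// Hypothetical sketch of --since parsing; unit sizes are assumptions.
const HOUR_MS = 60 * 60 * 1000;
const UNIT_MS = {
  h: HOUR_MS,
  d: 24 * HOUR_MS,
  w: 7 * 24 * HOUR_MS,
  m: 30 * 24 * HOUR_MS, // assumption: "m" is a 30-day month
};

// Returns the cutoff as epoch milliseconds, or NaN if the value is unparseable.
function parseSince(value, now = Date.now()) {
  const match = /^(\d+)([hdwm])$/.exec(value);
  if (match) return now - Number(match[1]) * UNIT_MS[match[2]];
  // Anything that is not a duration falls through to Date.parse (e.g. ISO dates).
  return Date.parse(value);
}

// Keeps hits modified at or after the cutoff.
// Assumption: hit.modified is a Date.parse-able string.
function filterSince(hits, cutoffMs) {
  return hits.filter((hit) => Date.parse(hit.modified) >= cutoffMs);
}
```

Returning NaN for unparseable input lets the caller reject a bad --since value explicitly; note that comparing against NaN in filterSince would drop every hit, so the check belongs before filtering.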

schuay added 3 commits May 15, 2026 12:35

schuay wants to merge 3 commits into victorgomes:main from schuay:feat-output-rewrite
