Rewrite output pipeline; add paginated bnz list #2
Open
schuay wants to merge 3 commits into
schuay (Contributor) commented on May 15, 2026
- Rewrite output pipeline as flat-tree HTML to Turndown markdown
- Tighten focused output after triage
- bnz list: paginate, parse rich rows, filter --since
The previous extractor read element.innerText from the rendered page and
ran anchored regexes to recover structure. That approach could silently
drop content when Buganizer's UI changed, and it lost link targets,
heading levels, and code-block fidelity along the way.
The new pipeline is three layers:
1. Flatten: an in-page recursive walker descends light + shadow DOM,
substitutes <slot> elements with assignedNodes({flatten:true}), and
emits HTML mirroring the rendered flat tree.
2. Convert: Node-side Turndown (with the GFM plugin) turns that HTML
into markdown. Custom rules handle Polymer-specific compaction:
drop action buttons / icon controls / empty avatars, flatten
inline links, collapse the issue-metadata sidebar to one bullet
per field, drop per-field-change time-of-day stamps, drop
system-generated history events that have no body.
3. Render: bug.js adds a synthesized header and appendix
(attachments, downloaded reproducers).
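The flatten step can be illustrated with a minimal sketch. This is not the walker from the PR: the function names are hypothetical, attributes are omitted for brevity, and it accepts any object shaped like a DOM node so it can run outside a browser. It only shows the core idea of preferring the shadow tree and substituting each <slot> with its assigned nodes.

```javascript
// Hypothetical flat-tree walker; names and output shape are assumptions.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function flatten(node) {
  if (node.nodeType === TEXT_NODE) return escapeText(node.nodeValue);
  if (node.nodeType !== ELEMENT_NODE) return ''; // comments etc. are skipped

  const tag = node.localName;

  // A <slot> contributes no markup of its own: it is replaced by the nodes
  // distributed into it, with nested slots resolved by { flatten: true }.
  if (tag === 'slot') {
    return Array.from(node.assignedNodes({ flatten: true })).map(flatten).join('');
  }

  // When a shadow root exists, its tree is what gets rendered; light-DOM
  // children only appear where a <slot> pulls them in.
  const children = node.shadowRoot ? node.shadowRoot.childNodes : node.childNodes;
  const inner = Array.from(children).map(flatten).join('');
  return `<${tag}>${inner}</${tag}>`; // attributes omitted in this sketch
}
```

In a browser context the same function could be called on a real element; the resulting string is what a Node-side converter would then receive.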
What remains of the structured-parsing layer are tiny targeted
extractors for things the CLI itself acts on: the testcase-key URL
inside an issue, the [Command line] flags on a CF page, attachment
URLs collected during the walk, reproducer-download endpoints.
Other changes bundled in:
- Split into lib/url, lib/dom, lib/browser, lib/cache, lib/render.
- bug list <query>: run a Buganizer search and emit the results.
- Disk cache at ~/.config/bug-cli/cache/ with a 5-minute TTL.
--refresh busts a single fetch; --no-cache disables.
- --download-original (cf testcases) and --download-attachments[=DIR]
(any issue) using the authenticated request context.
- A single Chromium launch handles multiple targets in one invocation
(bug 1 2 3, bug cf a b c).
- Default output is focused (page chrome dropped, sidebar compacted,
empty fields suppressed). Pass --full to disable the filter.
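The disk cache bullet above could be realized roughly as follows. This is a sketch under assumptions, not the code in lib/cache: the file layout (one JSON file per hashed key), the entry shape, and the function names are all hypothetical. Only the directory, the 5-minute TTL, and the meaning of --refresh and --no-cache come from the description.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

const DEFAULT_DIR = path.join(os.homedir(), '.config', 'bug-cli', 'cache');
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL from the description

// Assumption: one file per key, named by the SHA-256 of the key.
function cachePath(dir, key) {
  const name = crypto.createHash('sha256').update(key).digest('hex');
  return path.join(dir, `${name}.json`);
}

function readCache(key, { dir = DEFAULT_DIR, ttlMs = TTL_MS, now = Date.now() } = {}) {
  try {
    const entry = JSON.parse(fs.readFileSync(cachePath(dir, key), 'utf8'));
    return now - entry.savedAt <= ttlMs ? entry.value : null;
  } catch {
    return null; // a missing or unreadable entry counts as a miss
  }
}

function writeCache(key, value, { dir = DEFAULT_DIR, now = Date.now() } = {}) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(cachePath(dir, key), JSON.stringify({ savedAt: now, value }));
}

// refresh skips the read but still stores the fresh result;
// noCache bypasses the cache in both directions.
async function cachedFetch(key, fetcher, { refresh = false, noCache = false, ...opts } = {}) {
  if (noCache) return fetcher();
  if (!refresh) {
    const hit = readCache(key, opts);
    if (hit !== null) return hit;
  }
  const value = await fetcher();
  writeCache(key, value, opts);
  return value;
}
```

Passing `now` and `dir` explicitly keeps expiry testable without waiting on a real clock or touching the user's config directory.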
Five fixes surfaced while triaging a batch of open issues:
- Drop the redundant page-rendered <h2>Issue <id></h2> at the top of issue-details-wrapper (Turndown rule), and drop b-issue-id-picker / issue-chip-indicators / b-access-limits-chip from the walker. These were echoing the issue id and an empty 'Visibility' label that our synthesized header / sidebar already cover.
- Synthesize a 'Status: X . Type: Y . Priority: Z . Severity: W' summary line just under the URL. Buganizer doesn't render a chip for Status=New, so the status would otherwise only appear at the bottom of the page. Extracted via regex from the compacted sidebar.
- Pad b-formatted-date-time with a leading space so the comment header no longer renders '[victorgomes#2](url)2026-05-15 04:27' (link jammed against timestamp with no separator).
- Structure attachment listings: <b-attachment-viewer> renders as '- **filename** -- size -- [View](url) [Download](url)' instead of paragraph flow.
- Introduce a qsa() helper to iterate domino's NodeList (Turndown's Node-side HTML parser); its NodeList lacks Symbol.iterator, so for...of silently iterated zero times. Earlier rules that used for...of on querySelectorAll happened to fall through to acceptable defaults; the new attachment rule blew up outright.
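A helper along the lines of qsa() might look like the sketch below. The signature is an assumption (the PR only names the helper); the point is that copying by index relies on nothing but `length` and indexed access, so it works whether or not the list implements Symbol.iterator.

```javascript
// Hypothetical shape for qsa(): query, then copy the NodeList-like result
// into a real array by index. Falls back to item(i) for lists that expose
// only that accessor.
function qsa(root, selector) {
  const list = root.querySelectorAll(selector);
  const out = [];
  for (let i = 0; i < list.length; i++) {
    out.push(typeof list.item === 'function' ? list.item(i) : list[i]);
  }
  return out;
}
```

Because the result is a plain array, callers can use for...of, map, and filter without depending on the parser's NodeList implementation.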
Before this, bnz list returned only the first page of search results and parsed bare issue links from it. The new pipeline walks Buganizer's pagination and parses the rich result table:
- dumpPaginated() clicks the 'Go to next page' button until disabled (Buganizer ignores URL pagination parameters). --max-pages=N caps at N pages of 50 (default 30).
- extractSearchRows() parses the markdown table Turndown emits: priority, type, title, assignee, status, 7d-views, id, modified. extractSearchHits is kept as a fallback for pages that don't render the table (empty results, error stubs).
- --since=<dur|date> filters hits by their LAST MODIFIED column. Duration syntax accepts h/d/w/m (e.g. 7d, 1w); anything else falls through to Date.parse, so ISO dates work too.
- listCmd dedups by issue id across pages and reports pagesFetched and filteredOut in both the markdown summary line and json output.
- renderListMarkdown emits a markdown table when hits have structured fields, falling back to the bullet list when only id/title/url are available.
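The --since handling and the cross-page dedup described above can be sketched as below. The function names, the hit shape ({ id, modified }), and the reading of "m" as a 30-day month are assumptions made for illustration; the sketch also assumes the modified value is something Date.parse accepts, which the real LAST MODIFIED column may not be without normalization.

```javascript
const UNIT_MS = {
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
  w: 7 * 24 * 60 * 60 * 1000,
  m: 30 * 24 * 60 * 60 * 1000, // assumption: "m" is treated as a 30-day month
};

// Returns the cutoff as epoch milliseconds, or NaN if the value is unusable.
function parseSince(value, now = Date.now()) {
  const match = /^(\d+)([hdwm])$/.exec(value.trim());
  if (match) return now - Number(match[1]) * UNIT_MS[match[2]];
  return Date.parse(value); // anything else falls through to Date.parse
}

// Keeps hits modified at or after the cutoff and counts the rest.
function filterSince(hits, cutoff) {
  const kept = hits.filter((hit) => Date.parse(hit.modified) >= cutoff);
  return { kept, filteredOut: hits.length - kept.length };
}

// Drops later duplicates of an issue id, preserving first-seen order.
function dedupById(hits) {
  const seen = new Set();
  return hits.filter((hit) => {
    if (seen.has(hit.id)) return false;
    seen.add(hit.id);
    return true;
  });
}
```

Taking `now` as a parameter makes relative durations deterministic in tests; a caller would typically reject a NaN cutoff before filtering.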