This document tracks the experiment to remove server-side dependencies from the OneNote Web Clipper's content processing pipeline and replace them with client-side alternatives. The goal is a fully self-contained browser extension that does not rely on the OneNote augmentation/screenshot server APIs.
- Endpoint: `onenote.com/onaugmentation/clipperextract/v1.0/`
- Purpose: Server-side article/recipe/product extraction using ML models
- Replacement: Mozilla Readability (`@mozilla/readability`, Apache 2.0 license)
- Status: Complete
- Endpoint: `onenote.com/onaugmentation/clipperDomEnhancer/v1.0/`
- Purpose: Server-side Puppeteer rendering of page DOM into full-page screenshots
- Replacement: Client-side renderer window with scroll-capture and canvas stitching
- Status: Functional, with known issues (see below)
- `augmentationHelper.ts` — Rewrote `augmentPage()` to use `new Readability(doc).parse()` locally instead of POSTing to the server API
- Removed the `makeAugmentationRequest()` method entirely
- Removed imports: `HttpWithRetries`, `OneNoteApiUtils`, `Settings`, `Constants` (URL refs)
- Added metadata mapping: Readability's `title`, `excerpt`, `byline`, `siteName`, and `publishedTime` are stored in `PageMetadata`
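As a sketch of the metadata-mapping step: the input shape below mirrors Readability's `parse()` result, while the `PageMetadata` interface is a simplified stand-in for the real WebClipper type, not its actual definition.

```typescript
// Fields Readability's parse() returns (subset relevant to metadata mapping).
interface ReadabilityResult {
	title: string | null;
	excerpt: string | null;
	byline: string | null;
	siteName: string | null;
	publishedTime: string | null;
	content: string | null; // cleaned article HTML
}

// Simplified stand-in for the WebClipper's PageMetadata type.
interface PageMetadata {
	[key: string]: string;
}

function toPageMetadata(article: ReadabilityResult): PageMetadata {
	const metadata: PageMetadata = {};
	// Only copy fields Readability actually populated, so null/empty
	// values never reach the saved page metadata.
	if (article.title) { metadata.title = article.title; }
	if (article.excerpt) { metadata.excerpt = article.excerpt; }
	if (article.byline) { metadata.byline = article.byline; }
	if (article.siteName) { metadata.siteName = article.siteName; }
	if (article.publishedTime) { metadata.publishedTime = article.publishedTime; }
	return metadata;
}
```

In `augmentPage()`, the object returned by `new Readability(doc).parse()` would feed a mapping like this.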
- Apache 2.0 license (compatible with WebClipper's MIT license; repo already has Apache 2.0 deps like pdfjs-dist)
- Well-maintained by Mozilla, used in Firefox Reader View
- Produces clean article HTML similar to what the server API returned
- `clipper.tsx` — Removed the `UrlUtils.onWhitelistedDomain()` check that gated augmentation mode; FullPage is now the default clip mode
- `constants.ts` — Removed the `augmentationApiUrl` constant
- `readability.d.ts` (new) — TypeScript type declarations for `@mozilla/readability`
- `package.json` — Added the `@mozilla/readability` dependency
- `augmentationHelper_tests.ts` — Updated tests for the new local implementation
The server-side approach used Puppeteer to render sanitized HTML and produce a full-page screenshot. The client-side replacement mirrors this:
- Store HTML in `chrome.storage.session` — The page's HTML content, base URL, and localized status text are written to session storage (avoids the JSON serialization bottleneck with large payloads)
- Open a renderer popup window — An extension page (`renderer.html`) is opened at the same position/size as the user's browser with `focused: true`. Width is capped at 1280px. Zoom is forced to 100% via `chrome.tabs.setZoom`. The title bar shows localized "Clipping Page" status text
- Port-based communication — The renderer page connects to the service worker via `chrome.runtime.connect({ name: "renderer" })`. Commands (`loadContent`, `scroll`) are exchanged over this port
- Renderer loads content — Reads HTML from `chrome.storage.session`, strips `<script>` tags, and preserves `<style>`, `<link rel="stylesheet">`, and `<meta>` tags. Rewrites relative URLs (images, stylesheets, srcset) to absolute using `new URL(relative, baseUrl)` (CSP blocks `<base href>` on extension pages). Fetches external stylesheets via `fetch()` and inlines them as `<style>` blocks. Renders content inside an iframe for CSS isolation. Injects `[hidden]{display:none!important}` to enforce the HTML `hidden` attribute. Neutralizes fixed/sticky positioning with `!important` after stylesheets load. User interaction is blocked by a transparent overlay div (`#interaction-shield`) plus keyboard/wheel JS listeners
- Scroll-capture with incremental stitching — The service worker tells the renderer to scroll to each viewport position, waits 500ms (Chrome's `MAX_CAPTURE_VISIBLE_TAB_CALLS_PER_SECOND` rate limit = 2/sec), then calls `captureVisibleTab()` to take a lossless PNG screenshot. Each capture is sent back to the renderer via the port for immediate drawing onto a hidden canvas (`display:none`, so it is invisible to `captureVisibleTab`). Scroll stall detection stops capture when `scrollY` stops changing. Canvas height is capped at the pre-conversion `contentHeight` or 16,384px
- Finalize to Blob — When capture completes, the renderer trims the canvas to the actual content height, converts it to JPEG 90% via `canvas.toBlob()`, and stores the single final data URL in `chrome.storage.session`. The helper reads this single image and converts it to a Blob via `fetch(dataUrl).then(r => r.blob())`
- Binary MIME part upload — The Blob is sent as a binary MIME part in the multipart Graph API request (`<img src="name:FullPageImageXXXX" />`), eliminating the ~33% base64 encoding overhead
- Cleanup — The renderer window is closed and session storage is cleaned up (input keys by the worker, output key by the helper)
The implementation went through many iterations:
| Attempt | Approach | Why It Failed |
|---|---|---|
| 1 | Scroll user's actual page + captureVisibleTab | Clipper UI visible in captures; visible scrolling was jarring; scrollbars in screenshots |
| 2 | Blob URL popup | Chrome blocks executeScript on blob URLs |
| 3 | about:blank popup | Not scriptable in Chrome |
| 4 | Extension page + executeScript | scripting.executeScript blocked on extension pages in MV3 |
| 5 | Extension page + port messaging | Works, but large HTML/base64 data broke JSON serialization |
| 6 | Port + chrome.storage.session | Works for all data sizes |
| 7 | Single tall window capture | OS constrains window height to screen dimensions |
| 8 | Off-screen window (left: -9999) | Chrome doesn't paint off-screen windows; blank captures |
| 9 | Renderer behind user window (focused: false + refocus) | Occluded window not painted; stalls until user exposes it |
| 10 | Full-screen overlay during capture | Flashing between overlay and page content; overlay appeared in screenshots |
| Final | Focused renderer window with title bar status + binary Blob output | Current implementation |
The original implementation used base64-encoded data URLs embedded inline in the ONML body. This was switched to binary MIME parts:
| Aspect | Old (base64 data URL) | New (binary MIME part) |
|---|---|---|
| Encoding | `canvas.toDataURL()` → base64 string | `canvas.toBlob()` → binary Blob |
| In ONML | `<img src="data:image/jpeg;base64,..." />` | `<img src="name:FullPageImageXXXX" />` |
| Multipart | All content in Presentation part | Separate binary MIME part |
| Size overhead | ~33% from base64 encoding | None (raw binary) |
| Preview | Data URL in `<img>` | `URL.createObjectURL(blob)` |
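The ~33% figure follows directly from base64's encoding rule, which emits 4 output characters for every 3 input bytes:

```typescript
// base64 emits 4 characters per 3-byte group (the final group is padded).
function base64Length(byteLength: number): number {
	return Math.ceil(byteLength / 3) * 4;
}

// Relative overhead versus raw binary — approaches 1/3 for large payloads.
function base64Overhead(byteLength: number): number {
	return base64Length(byteLength) / byteLength - 1;
}
```

A 1.5MB JPEG therefore becomes a ~2MB string before the `data:image/jpeg;base64,` prefix is even counted.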
The onenoteapi library's `TypedFormData.asBlob()` already handles mixed string/ArrayBuffer content via the Blob constructor. We push the image Blob directly to `page.dataParts`, following the same pattern as `addAttachment()`.
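The upload path can be illustrated with a minimal multipart sketch. This is not the onenoteapi implementation — the boundary, part name, and headers here are illustrative — but it shows the mechanism `TypedFormData.asBlob()` relies on: the `Blob` constructor concatenates string and binary segments without re-encoding.

```typescript
// Build a multipart body mixing an HTML Presentation part with a raw
// binary image part. Boundary and part name are hypothetical examples.
function buildMultipartBody(presentationHtml: string, image: Blob, boundary: string): Blob {
	const imagePartName = "FullPageImage0001"; // matches the <img src="name:..."> reference in ONML
	const parts: (string | Blob)[] = [
		`--${boundary}\r\n` +
		`Content-Disposition: form-data; name="Presentation"\r\n` +
		`Content-Type: text/html\r\n\r\n` +
		presentationHtml + "\r\n",
		`--${boundary}\r\n` +
		`Content-Disposition: form-data; name="${imagePartName}"\r\n` +
		`Content-Type: image/jpeg\r\n\r\n`,
		image, // raw binary segment: no base64 expansion
		`\r\n--${boundary}--\r\n`,
	];
	// The Blob constructor stitches string and Blob segments together as-is.
	return new Blob(parts);
}
```

Because the image travels as a raw `Blob` segment, the body grows by only the boundary/header bytes rather than the ~33% base64 tax.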
| File | Change |
|---|---|
| `src/renderer.html` | NEW — Extension page for offscreen rendering |
| `src/scripts/renderer.ts` | NEW — Renderer script: port communication, reads HTML from storage, iframe CSS isolation, incremental canvas stitching, interaction shield |
| `src/scripts/contentCapture/fullPageScreenshotHelper.ts` | REWRITTEN — Reads single final JPEG from session storage, converts to Blob |
| `src/scripts/extensions/webExtensionBase/webExtensionWorker.ts` | Added `takeFullPageScreenshot()` — renderer window creation, scroll-capture loop, sends captures to renderer for stitching |
| `src/scripts/extensions/extensionWorkerBase.ts` | Added abstract `takeFullPageScreenshot()` method + registered function key |
| `src/scripts/extensions/safari/safariWorker.ts` | Fallback: single viewport capture via `takeTabScreenshot()` |
| `src/scripts/extensions/bookmarklet/inlineWorker.ts` | Fallback: throws not-implemented |
| `src/scripts/extensions/chrome/manifest.json` | Added `storage` permission; `renderer.html` as web-accessible resource |
| `src/scripts/constants.ts` | Removed `fullPageScreenshotUrl`; added `takeFullPageScreenshot` function key |
| `src/scripts/saveToOneNote/oneNoteSaveableFactory.ts` | FullPage mode sends binary MIME part instead of base64 data URL |
| `src/scripts/clipperUI/components/previewViewer/fullPagePreview.tsx` | Preview uses `URL.createObjectURL(blob)` for image display |
| `src/scripts/clipperUI/clipper.tsx` | Passes `rawUrl` to `getFullPageScreenshot()` for base URL resolution |
| `gulpfile.js` | Added `bundleRenderer` task to build pipeline |
Note: This diagram shows the original flow via clipper.tsx. As of V3 (self-contained sign-in), clipper.tsx is no longer injected. See `docs/unified-window-plan.md` for the current V3 flow diagram, where the worker opens the renderer directly and `contentCaptureInject.ts` replaces the clipper.tsx → fullPageScreenshotHelper.ts chain.
[Legacy flow — kept for reference]

```
clipper.tsx                  extensionWorkerBase.ts          webExtensionWorker.ts
     |                                |                               |
     |-- getFullPageScreenshot() --> |                               |
     |    (stores HTML + URL +       |                               |
     |     statusText + CSS cache    |                               |
     |     in session storage)       |                               |
     |                               |-- takeFullPageScreenshot() -->|
     |                               |                               |-- windows.create(renderer.html)
    ...                             ...                             ...
[Save to OneNote]                    |                               |
     |-- push Blob to page.dataParts |                               |
     |-- multipart request with      |                               |
     |    binary MIME part           |                               |
```
```
User clicks extension button
  → webExtensionWorker.openRendererWindow()
  → checks isUserLoggedIn via offscreen/localStorage
  → injects contentCaptureInject.js into original tab (if signed in)
  → opens renderer.html as popup window

contentCaptureInject.ts (content script on original tab)
  → clones DOM, inlines hidden elements, neutralizes sticky/fixed (!important),
    flattens shadow DOM, converts canvas→img, adds base tag, image sizes,
    removes unwanted items, resolves lazy images
  → chrome.runtime.sendMessage(JSON.stringify({html, baseUrl, title, url}))

Worker receives contentCaptureComplete
  → stores HTML/title/URL in chrome.storage.session
  → sends "loadContent" to renderer via port

Renderer loads content into iframe, sends "dimensions"
  → Worker runs scroll-capture loop (scroll → captureVisibleTab → drawCapture)
  → Renderer stitches on hidden canvas, finalizes to JPEG 95%
  → Stores fullPageFinalImage in session storage

User clicks Clip
  → Renderer sends "save" via port
  → Worker refreshes token (auth.updateUserInfoData)
  → Worker reads JPEG from session storage, builds multipart form
  → Worker POSTs to OneNote API, returns saveResult
```
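The port messages named in this flow can be modeled as a discriminated union. This is a hypothetical TypeScript sketch; the real message shapes in `renderer.ts`/`webExtensionWorker.ts` may differ.

```typescript
// Hypothetical worker → renderer commands, one per step in the capture loop.
type WorkerToRenderer =
	| { command: "loadContent" }                              // read HTML from session storage and render
	| { command: "scroll"; y: number }                        // scroll the content to viewport offset y
	| { command: "drawCapture"; dataUrl: string; y: number }; // stitch one PNG capture at offset y

// Hypothetical renderer → worker events.
type RendererToWorker =
	| { event: "dimensions"; contentHeight: number; viewportHeight: number }
	| { event: "save" };

// A dispatcher over the union narrows each variant's payload.
function describeCommand(message: WorkerToRenderer): string {
	if (message.command === "scroll") { return `scroll to ${message.y}`; }
	if (message.command === "drawCapture") { return `draw capture at ${message.y}`; }
	return "load content from storage";
}
```

Typing both directions this way lets the compiler catch a malformed payload before it crosses the port, where a mismatch would otherwise surface only as a silent stall.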
| Decision | Rationale |
|---|---|
| `@mozilla/readability` for article extraction | Apache 2.0, used by Firefox Reader View, well-maintained |
| FullPage as default clip mode | Augmentation is no longer server-gated; FullPage is more universally useful |
| Renderer popup window (not direct page scroll) | Avoids visible scrolling, scrollbar artifacts, and clipper UI in screenshots |
| `chrome.storage.session` as data bus | Port/communicator JSON serialization chokes on multi-MB base64 data; session storage is used for HTML input and the single final JPEG output |
| Port-based messaging (`runtime.connect`) | `scripting.executeScript` is blocked on extension pages in MV3 |
| PNG capture, JPEG 90% final output, 1280px width cap | PNG captures are lossless; single JPEG encode at finalize; no double compression; 1280px balances fidelity vs. size |
| Renderer-side incremental stitching | Each PNG capture is sent to the renderer via port (~1-3MB each, within port limits) and drawn onto the hidden canvas immediately; avoids storing N captures in session storage (was 4-8MB, now ~1-2MB for the single final JPEG) |
| Scroll stall detection | Stops capturing when `scrollY` doesn't change between captures; handles inflated `scrollHeight` from fixed→absolute conversion |
| Content height cropping | Measures `scrollHeight` before position conversions; uses the pre-conversion height to cap the canvas, trimming blank space |
| Interaction shield overlay | Transparent div at max z-index blocks all mouse/touch/pointer events; keyboard/wheel blocked via JS listeners |
| Binary Blob output (not base64) | Eliminates ~33% base64 encoding overhead; sent as a binary MIME part per the Graph API multipart spec |
| Title bar for status (not overlay) | Overlay caused flashing during capture; the title bar is never part of captured content |
| `focused: true` for renderer window | `captureVisibleTab` requires the window to be painted; unfocused/occluded windows produce blank captures |
| Localized status text via storage | The renderer can't access the clipper's Localization module; status text is passed through `chrome.storage.session` |
- DOM cleaning: `contentCaptureInject.ts` runs the master DomUtils-compatible pipeline plus enhancements as self-contained inline functions (no imports): clone → inline hidden elements (computed display:none) → neutralize positioning (sticky→relative, fixed→absolute, all with !important) → flatten shadow DOM → canvas→image → base tag → image sizes → remove unwanted items (base64 binary styles [data:application], clipper elements + local-ref iframes, scripts/noscript, srcset, non-http/https link hrefs) → full DOCTYPE serialization → lazy image resolution (data-src → src)
- CSS delivery: The renderer fetches external stylesheets directly via `fetch()` and inlines them as `<style>` blocks — no CSSOM caching or session storage intermediary. Extension pages have `host_permissions: <all_urls>`
- Iframe isolation: The renderer uses `<iframe id="content-frame">` — page CSS and renderer styles never conflict. Injects `[hidden]{display:none!important}` to enforce the HTML `hidden` attribute against CSS overrides
- Position neutralization: `sticky → relative !important`, `fixed → absolute !important` at both capture time (contentCaptureInject.ts) and render time (renderer.ts backup). Prevents sticky-element repetition in stitched multi-viewport captures
- Content height cropping: `scrollHeight` is measured before position conversions to avoid inflated canvas height
- Port safety: All `port.postMessage` calls are wrapped in `safeSend()` to handle disconnected-port errors (e.g., from devtools inspection)
- Article mode ONML cleanup: Readability output is cleaned via `cleanArticleHtml()` before caching — strips ONML-unsupported elements (video, audio, canvas, svg, etc.) and removes all `style`/`class` attributes. The same cleaned HTML is used for both preview and save, matching the old `toOnml()` pipeline. Preview styled with Segoe UI 11pt, 624px max-width (OneNote page width from `@OneNotePageWidth`)
- Files: `contentCaptureInject.ts`, `renderer.ts`, `renderer.html`, `webExtensionWorker.ts`
- Remaining: Right-edge clipping on zero-margin pages; video/streaming iframe embeds show broken players (cross-origin, same as server-side Puppeteer); CSR/shadow DOM sites (e.g., MSN.com on the FAST framework) produce empty captures — `cloneNode(true)` cannot copy shadow roots per the DOM spec, a pre-existing limitation shared with server-side Puppeteer (`--disable-javascript`)
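The `safeSend()` idea above can be sketched in a few lines, assuming only a minimal `postMessage`-bearing port object: `postMessage` throws once the port has disconnected, so every send is wrapped.

```typescript
// Wrap port.postMessage so a disconnected port (e.g. the renderer window was
// closed, or devtools detached the port) degrades to a boolean failure
// instead of an uncaught exception in the service worker.
function safeSend(
	port: { postMessage(message: unknown): void } | undefined,
	message: unknown
): boolean {
	if (!port) { return false; }
	try {
		port.postMessage(message);
		return true;
	} catch {
		// e.g. "Attempting to use a disconnected port object"
		return false;
	}
}
```

Callers can then treat a `false` return as "stop the capture loop" rather than letting one dropped port tear down the worker.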
- `captureVisibleTab` requires the window to be painted — occluded/off-screen windows produce blank captures
- Current approach: the renderer opens focused at the same position/size as the user's browser and stays open for mode switching/editing/save
- Anti-maximize: the `chrome.windows` API reverts maximized state; a resize handler snaps back to the original dimensions
- Future improvement: CDP via `chrome.debugger` would allow invisible capture but requires the `debugger` permission (shows a warning banner)
- PNG captures sent individually to the renderer via port (no session storage for intermediates)
- Canvas height capped at the pre-conversion `contentHeight` or 16,384px (Chrome canvas dimension limit)
- Scroll stall detection stops capture when the page can't scroll further
- Final JPEG 95% stored in session storage (~1-2MB); session storage only holds HTML + metadata + final image
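The capture plan implied by these constraints can be sketched as a pure function; the real loop also stops early on scroll stall, which this sketch omits.

```typescript
// Chrome caps canvas dimensions at 16,384px per side.
const MAX_CANVAS_HEIGHT = 16384;

// One scroll stop per viewport over the capped content height. The final
// viewport may extend past the content; trimming at finalize handles that.
function computeScrollOffsets(contentHeight: number, viewportHeight: number): number[] {
	const cappedHeight = Math.min(contentHeight, MAX_CANVAS_HEIGHT);
	const offsets: number[] = [];
	for (let y = 0; y < cappedHeight; y += viewportHeight) {
		offsets.push(y);
	}
	return offsets;
}
```

For a 2,500px page in a 1,000px viewport this yields three stops; a 40,000px page is truncated to the stops that fit under the 16,384px cap, which is why very long pages lose their bottom.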
- Very long pages get bottom truncated; OneNote API would likely reject larger images anyway
- Renderer width capped at 1280px; wider content reflows via CSS overrides (`max-width: 100%` on images/tables, `pre-wrap` on code)
- Some layouts with explicit pixel widths may still clip
- A CDP approach would allow `captureBeyondViewport` for true full-width capture
- Tests use PhantomJS (deprecated, ES5-only) and were already failing before these changes
- Playwright migration discussed but deferred
- `augmentationHelper_tests.ts` and `fullPagePreview_tests.tsx` updated for the new interfaces
- `Uncaught TypeError: Cannot read properties of undefined (reading 'logFailure')` — communicator error handler fires before `Clipper.logger` is initialized
- `Could not establish connection. Receiving end does not exist.` — MV3 service worker lifecycle issue; port/message sent after worker suspension
The original architecture (described above) required injecting `clipperInject.ts` into the page to show a Mithril-based sign-in sidebar and initiate content capture via `clipper.tsx` → `fullPageScreenshotHelper.ts`. This failed on pages with a strict Content Security Policy (CSP blocks iframe injection).
V3 eliminates this dependency entirely:
- The worker opens the renderer window directly on button click (no `clipperInject.ts`)
- The renderer handles sign-in via an MSA/OrgId overlay + OAuth popup
- Content capture via standalone `contentCaptureInject.ts` injected by `scripting.executeScript` (CSP-immune)
- Save with token refresh, telemetry, region capture, article/bookmark modes — all in the renderer
- The old sidebar (clipperInject.ts → clipper.tsx → Mithril) is dead code
See docs/unified-window-plan.md for the complete V3 architecture, flow diagram, and verification checklist.
```shell
npm install
npm run build   # Compiles TS, bundles, exports to /target
```

Load the extension from `target/edge/OneNoteWebClipper/edgeextension/manifest/extension/` (Edge) or `target/chrome` (Chrome).
Gulp tasks: `bundleRenderer` (`renderer.ts`), `bundleRegionOverlay` (`regionOverlay.ts`), `bundleContentCaptureInject` (`contentCaptureInject.ts`) — all compiled and deployed to the Chrome/Edge targets.