
ShengjieJin/pdftrim-for-llm



PDFTrim for LLM

Trim the back matter before it eats your context window.

Clean the paper. Keep the argument. Save the context window.


Supports Zotero 7 and 8 · License: AGPL-3.0-or-later


🔥 News

  • v0.1.0: first public release, with Zotero 7/8 support, a split/export workflow, sidebar actions, and one-click cleanup of generated PDFs.

✨ Motivation

Most papers are much longer than the part we actually want to hand to an LLM.

When we send a paper to an LLM, we usually want the core argument, method, experiment design, and takeaways. But the PDF often also contains long References, appendices, prompts, pseudocode, ablations, and implementation details. Those sections are valuable for close reading, but they can also eat a surprising share of the context window.

That creates a familiar workflow tax:

  • more tokens spent on material you may not need right now
  • more noise in summaries and extraction tasks
  • less room left for the actual paper

PDFTrim for LLM was built for that exact moment inside Zotero: you are reading a paper, you want a clean PDF for an LLM, and you do not want to manually export, crop, rename, and re-attach files every time.

The idea was strongly inspired by llm-for-zotero and Vibero, which made the LLM-in-Zotero workflow feel very real and very useful. PDFTrim for LLM focuses on one small but painful bottleneck in that workflow: trimming token-heavy back matter before the paper goes into the model.

This project is built on top of zotero-plugin-template.

🧩 What It Does

  • Detects where the References section starts.
  • Infers where the appendix begins.
  • Lets you review and edit the detected page numbers.
  • Exports LLM-friendly PDFs next to the original file.
  • Keeps the original PDF untouched.
  • Attaches generated PDFs back to the same Zotero item.
  • Lets you open or delete generated outputs from the sidebar.
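The README does not describe how detection works internally, so the following is only a minimal TypeScript sketch of one way such a heuristic could work, not the plugin's actual logic. It assumes the text of each page has already been extracted into a string array (extraction itself is out of scope). The `detectSplitPages` function and both heading patterns are hypothetical; real papers (multi-column layouts, "Supplementary Material" headings, numbered appendix titles) would need more robust rules.

```typescript
// Hypothetical sketch, not the plugin's actual detector.
// Input: extracted text of each page (index 0 = page 1). Output: 1-based page numbers or null.

interface SplitGuess {
  referencesStart: number | null;
  appendixStart: number | null;
}

// A line containing only a References/Bibliography heading, optionally numbered ("7 References", "7. References").
const REFERENCES_HEADING = /^\s*(?:\d+\.?\s+)?(?:references|bibliography)\s*$/im;

// A line that begins with "Appendix" or "Appendices" (e.g. "Appendix A Proofs").
const APPENDIX_HEADING = /^\s*(?:appendix|appendices)\b/im;

function detectSplitPages(pages: string[]): SplitGuess {
  // References: the first page with a standalone References heading.
  let referencesStart: number | null = null;
  for (let i = 0; i < pages.length; i++) {
    if (REFERENCES_HEADING.test(pages[i])) {
      referencesStart = i + 1;
      break;
    }
  }

  // Appendix: the first later page with a line starting with "Appendix".
  // Searching only after the References page keeps body text from matching;
  // if no References page was found, every page is searched.
  let appendixStart: number | null = null;
  const firstIndex = referencesStart ?? 0; // 0-based index of the page right after referencesStart
  for (let i = firstIndex; i < pages.length; i++) {
    if (APPENDIX_HEADING.test(pages[i])) {
      appendixStart = i + 1;
      break;
    }
  }

  return { referencesStart, appendixStart };
}
```

Returning `null` instead of guessing leaves the decision to the user, which fits the advice below to review every detected page before exporting.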

⚠️ Detection Accuracy

The detected split positions are heuristic and are not always exact.

Before exporting, you should always:

  • review the detected References start page
  • review the detected Appendix start page
  • use Jump to verify the pages manually
  • adjust the page numbers if needed

The plugin is designed to reduce repetitive work, not to replace human confirmation.
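Since the page numbers may be edited by hand, a tool like this needs to sanity-check them before splitting. The sketch below shows one plausible set of checks; it assumes 1-based page numbers and that the appendix, when present, starts after the References page. The `validateSplitPages` name and its rules are illustrative, not the plugin's actual validation.

```typescript
// Hypothetical validation of the page numbers a user confirms before export.
// Returns a list of problems; an empty list means the numbers look consistent.
function validateSplitPages(
  pageCount: number,
  referencesStart: number,
  appendixStart: number | null,
): string[] {
  const problems: string[] = [];
  const isPage = (p: number): boolean => Number.isInteger(p) && p >= 1 && p <= pageCount;

  if (!isPage(referencesStart)) {
    problems.push(`References start ${referencesStart} is not a page in 1..${pageCount}`);
  } else if (referencesStart === 1) {
    // Assumed rule: the main text must contain at least one page.
    problems.push("References cannot start on page 1: the main text would be empty");
  }

  if (appendixStart !== null) {
    if (!isPage(appendixStart)) {
      problems.push(`Appendix start ${appendixStart} is not a page in 1..${pageCount}`);
    } else if (appendixStart <= referencesStart) {
      // Assumed rule: page-level splitting leaves no References pages otherwise.
      problems.push("Appendix must start after the References start page");
    }
  }
  return problems;
}
```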

🖼️ Screenshots

The plugin automatically detects the References and Appendix sections in a PDF. Click Jump to navigate directly to the detected page. If the result is not accurate enough, you can adjust it manually.
Click Split PDF to split the PDF in one step. Click Open PDF to open the generated split PDF directly. After finishing your LLM workflow with the split main-content PDF, click Delete Generated PDFs to remove all generated PDF files in one click.
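The README does not say how Delete Generated PDFs recognizes the plugin's own outputs. One plausible approach, assuming outputs are identified only by the `-main`, `-reference`, and `-appendix` suffixes listed under Output, is a filename filter like the hypothetical `findGeneratedPdfs` below. A real implementation would more likely also confirm that each file is an attachment the plugin created, rather than trusting names alone.

```typescript
// Hypothetical: select generated files among an item's attachment filenames,
// based only on the naming scheme "<stem>-main.pdf", "<stem>-reference.pdf", "<stem>-appendix.pdf".
const GENERATED_SUFFIXES = ["main", "reference", "appendix"];

// Escape regex metacharacters so a stem like "paper (v2)" is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findGeneratedPdfs(sourceFileName: string, attachmentNames: string[]): string[] {
  // "paper.pdf" -> "paper". The source file has no suffix, so it can never be selected.
  const stem = sourceFileName.replace(/\.pdf$/i, "");
  const pattern = new RegExp(
    `^${escapeRegExp(stem)}-(?:${GENERATED_SUFFIXES.join("|")})\\.pdf$`,
    "i",
  );
  return attachmentNames.filter((name) => pattern.test(name));
}
```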

📦 Installation

  1. Download the latest .xpi from GitHub Releases.
  2. Open Zotero.
  3. Go to Tools -> Plugins.
  4. Drag the .xpi file into the Plugins window.
  5. Restart Zotero if needed.

The release asset is expected to look like:

pdf-trim-for-llm.xpi

🚀 Usage

  1. Open a PDF in the Zotero reader.
  2. Open PDFTrim for LLM in the right sidebar.
  3. Wait for automatic detection.
  4. Review:
    • References start page
    • Appendix start page
  5. Use Jump to confirm the detected pages.
  6. Adjust the page numbers if needed.
  7. Choose a mode:
    • Extract Main Text
    • Split into Sections
  8. Click Split PDF.
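The steps above do not spell out how the two modes map onto page ranges. A minimal sketch, assuming the main text runs from page 1 to the page before References, References run up to the page before the appendix (or to the end if there is none), and the appendix runs to the last page, all 1-based and inclusive. `planSplit` and the mode strings are hypothetical names, and the page numbers are assumed to be already checked.

```typescript
// Hypothetical mapping from the chosen mode to inclusive, 1-based page ranges.
type Mode = "extract-main" | "split-sections";

interface PageRange {
  label: "main" | "reference" | "appendix";
  first: number;
  last: number;
}

function planSplit(
  mode: Mode,
  pageCount: number,
  referencesStart: number,
  appendixStart: number | null,
): PageRange[] {
  // Main text: everything before the References page.
  const main: PageRange = { label: "main", first: 1, last: referencesStart - 1 };
  if (mode === "extract-main") return [main];

  // Without an appendix, References are assumed to run to the end of the document.
  const referencesEnd = appendixStart === null ? pageCount : appendixStart - 1;
  const ranges: PageRange[] = [main, { label: "reference", first: referencesStart, last: referencesEnd }];
  if (appendixStart !== null) {
    ranges.push({ label: "appendix", first: appendixStart, last: pageCount });
  }
  return ranges;
}
```

For a 20-page paper with References on page 15 and the appendix on page 18, Split into Sections would give pages 1 to 14, 15 to 17, and 18 to 20 under these assumptions.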

Generated PDFs are saved next to the source PDF and attached to the same Zotero item.

📄 Output

Given:

paper.pdf

Extract Main Text creates:

paper-main.pdf

Split into Sections creates:

paper-main.pdf
paper-reference.pdf
paper-appendix.pdf

The source PDF is never modified.
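The naming scheme above is simple enough to express as a function. This sketch assumes the suffix replaces only the trailing `.pdf` extension and that outputs go in the same folder as the source, as described under Usage. `outputPathFor` is a hypothetical name, and paths are handled with plain string operations so the example stays self-contained.

```typescript
// Hypothetical helper: "<dir>/<stem>.pdf" -> "<dir>/<stem>-<part>.pdf", keeping the same folder.
type Part = "main" | "reference" | "appendix";

function outputPathFor(sourcePath: string, part: Part): string {
  // Split at the last "/" or "\" so both POSIX and Windows paths keep their directory.
  const cut = Math.max(sourcePath.lastIndexOf("/"), sourcePath.lastIndexOf("\\"));
  const dir = sourcePath.slice(0, cut + 1); // includes the trailing separator, or "" if none
  const fileName = sourcePath.slice(cut + 1);
  const stem = fileName.replace(/\.pdf$/i, "");
  return `${dir}${stem}-${part}.pdf`;
}
```

Because the source path is only read, never written, this naming step is consistent with the source PDF staying untouched.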

🛠️ Development

Requirements:

  • Node.js LTS
  • Git
  • Zotero 7 or Zotero 8

Start development:

npm install
npm start

Production build:

npm run build

Build outputs are generated in:

.scaffold/build/

🧱 Repository Setup

This repository is already configured for:

  • GitHub Releases
  • update.json / update-beta.json
  • Zotero .xpi packaging

⚖️ License

This project is released under AGPL-3.0-or-later.

🙏 Acknowledgements

About

An open-source Zotero plugin for vibe reading and LLM-assisted paper reading. Detect references/appendix pages and export cleaner PDFs for AI workflows.
