import { Authors, Badges } from '@/components/utils'

# LitBench: A Graph-Centric Large Language Model Benchmarking Tool For Literature Tasks

<Authors
  authors="Andreas Varvarigos, Yale University; Ali Maatouk, Yale University; Jiasheng Zhang, Yale University; Ngoc Bui, Yale University; Jialin Chen, Yale University; Leandros Tassiulas, Yale University; Rex Ying, Yale University"
/>

<Badges
  venue="SIGKDD 2026"
  github="https://github.com/varvarigos/LitBench"
  arxiv="https://cdn.jsdelivr.net/npm/simple-icons@v9/icons/arxiv.svg"
  pdf="https://cdn.jsdelivr.net/npm/simple-icons@v9/icons/arxiv.svg"
/>

## 1. Introduction

Large Language Models (LLMs) have become the de facto framework for literature-related tasks such as summarization, citation recommendation, and question answering. However, general-purpose LLMs struggle to act as **domain-specific literature agents**, as they fail to reason over structured relationships between papers, concepts, and citations.

Existing benchmarks either lack rich textual structure (e.g., citation sentences, related work, introductions) or ignore the **graph structure** that naturally connects scientific knowledge.

**LitBench** introduces a **graph-centric benchmarking framework** that enables automated curation of domain-specific literature subgraphs and rigorous evaluation across a comprehensive suite of literature tasks.

---

## 2. Overall Pipeline

LitBench follows an end-to-end, automated pipeline for constructing domain-specific literature benchmarks and training specialized LLMs.

The framework begins with **arXiv-sourced metadata**, from which we crawl the corresponding LaTeX sources and extract structured textual content using a **custom LaTeX parser**. This step recovers rich section-level information, including titles, abstracts, introductions, related work sections, and aligned citation sentences.
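
LitBench's parser itself is not reproduced here; as a rough sketch of the idea (the regexes, function names, and sentence-splitting heuristic are our assumptions, not the actual implementation), section-level extraction from a LaTeX source might look like:

```python
import re

# Hypothetical minimal extractor; LitBench's custom parser is more robust.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")

def extract_sections(latex: str) -> dict[str, str]:
    """Split a LaTeX source into {section title: body} pairs."""
    parts = SECTION_RE.split(latex)
    # parts = [preamble, title_1, body_1, title_2, body_2, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}

def citation_sentences(body: str) -> list[str]:
    """Keep only sentences that contain a \\cite command."""
    sentences = re.split(r"(?<=[.!?])\s+", body)
    return [s for s in sentences if "\\cite" in s]
```

The citation sentences recovered here are what later become edge attributes in the citation graph.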

Each paper is then annotated with a **hierarchical set of natural-language topics** at multiple levels of abstraction using a large language model. These topic annotations are embedded and stored in a database.
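
As an illustration only (the prompt wording, the level names, and the caller-supplied `llm` and `embed` callables are assumptions), the annotation step could be wired up like this:

```python
from dataclasses import dataclass

@dataclass
class AnnotatedPaper:
    paper_id: str
    topics: dict[str, list[str]]        # abstraction level -> topic strings
    embeddings: dict[str, list[float]]  # topic string -> embedding vector

# Hypothetical prompt; three topics at each of three levels gives nine per paper.
PROMPT = (
    "List topics for the paper below at three abstraction levels "
    "(broad field, subfield, specific technique), one line per level, "
    "three topics per line separated by '; '.\n"
    "Title: {title}\nAbstract: {abstract}"
)

def annotate(paper: dict, llm, embed) -> AnnotatedPaper:
    """llm and embed are caller-supplied callables; the model choice is open."""
    reply = llm(PROMPT.format(**paper))  # paper assumed to have title/abstract/id
    levels = ["broad", "subfield", "specific"]
    topics = {lvl: line.split("; ") for lvl, line in zip(levels, reply.splitlines())}
    vectors = {t: embed(t) for ts in topics.values() for t in ts}
    return AnnotatedPaper(paper["id"], topics, vectors)
```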

Given a **user-specified domain query** (e.g., *Quantum Physics*), a topic-based retriever matches the query against the topic embeddings to identify the most relevant papers. The retrieved papers are used to construct a **domain-specific citation sub-network**, where nodes represent papers with rich textual attributes and edges represent citations with associated citation sentences.
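
A minimal sketch of these two steps, reusing the `AnnotatedPaper` objects from the previous snippet (the max-cosine scoring rule and the `(citing, cited, sentence)` triple format are assumptions):

```python
import numpy as np
import networkx as nx

def retrieve(query_vec, papers: list, k: int = 1000) -> list:
    """Rank papers by the best cosine similarity between query and any topic."""
    q = np.asarray(query_vec, dtype=float)
    q /= np.linalg.norm(q)

    def score(p) -> float:
        embs = np.asarray(list(p.embeddings.values()), dtype=float)
        embs /= np.linalg.norm(embs, axis=1, keepdims=True)
        return float((embs @ q).max())

    return sorted(papers, key=score, reverse=True)[:k]

def build_subgraph(selected: list, citations: list) -> nx.DiGraph:
    """Nodes carry textual attributes; edges carry aligned citation sentences."""
    ids = {p.paper_id for p in selected}
    g = nx.DiGraph()
    for p in selected:
        g.add_node(p.paper_id, topics=p.topics)
    for src, dst, sentence in citations:  # (citing, cited, sentence) triples
        if src in ids and dst in ids:
            g.add_edge(src, dst, citation_sentence=sentence)
    return g
```

The default `k = 1000` mirrors the observation later in the post that a subgraph of roughly 1k papers suffices, but it is a tunable choice, not a fixed part of the method.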

Finally, the resulting citation graph is transformed into **instruction-tuning and benchmarking datasets**, covering both node-level and edge-level literature tasks. These datasets enable LLMs to be **trained and evaluated** as domain-specific literature agents across a comprehensive suite of literature-related tasks.
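
For concreteness, here are two illustrative task templates (one node-level, one edge-level) derived from the graph; the actual LitBench suite covers a broader set of tasks:

```python
def graph_to_instructions(g) -> list[dict]:
    """Emit instruction-tuning examples from a citation graph."""
    examples = []
    # Node-level template: describe a paper via its specific topics.
    for node, attrs in g.nodes(data=True):
        examples.append({
            "instruction": f"Describe the main topics of paper {node}.",
            "output": "; ".join(attrs["topics"]["specific"]),
        })
    # Edge-level template: produce the sentence in which one paper cites another.
    for src, dst, attrs in g.edges(data=True):
        examples.append({
            "instruction": f"Write the sentence in which paper {src} cites paper {dst}.",
            "output": attrs["citation_sentence"],
        })
    return examples
```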

---

## 3. Method Overview

LitBench treats **literature understanding as a graph learning problem**.

The framework consists of four key stages (an end-to-end sketch follows the list):

- **Concept Curation:** Each paper is annotated with nine natural-language concepts spanning three abstraction levels.
- **Concept-Based Retrieval:** User queries are matched against concept embeddings rather than titles or abstracts.
- **Graph Construction:** Nodes contain rich textual attributes, while edges contain aligned citation sentences.
- **Multi-Instruction Internalization:** The graph is converted into instruction-tuning and benchmarking datasets.
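
Putting the stages together, a hypothetical end-to-end driver (reusing the illustrative helpers sketched in Section 2; `load_citations` is an assumed data loader, not part of LitBench's documented API) could read:

```python
def build_benchmark(query: str, arxiv_metadata: list, llm, embed) -> list:
    """End-to-end sketch wiring the four stages together."""
    papers = [annotate(p, llm, embed) for p in arxiv_metadata]  # 1. concept curation
    selected = retrieve(embed(query), papers)                   # 2. concept-based retrieval
    graph = build_subgraph(selected, load_citations())          # 3. graph construction
    return graph_to_instructions(graph)                         # 4. multi-instruction internalization
```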

---

## 4. LitBench Interface

LitBench provides an interactive GUI that allows users to:

- specify arbitrary research domains,
- automatically construct domain-specific citation graphs,
- train domain-specific LLMs,
- evaluate models across multiple literature tasks.

---

## 5. Experiments

We evaluate LitBench across three domains:

- Quantitative Biology
- Robotics
- Quantum Physics

Evaluated models range from **1B to 8B parameters**, with comparisons against **GPT-4o** and **DeepSeek-R1**.

### 5.1. Generative Tasks

### 5.2. Predictive Tasks

---

## 6. Conclusion

- Domain-specific LLMs fine-tuned on LitBench outperform larger general-purpose models on most literature tasks.
- Performance gains are strongest for **graph-intensive tasks** such as citation prediction and related work generation.
- Only a small subgraph (~1k papers) is sufficient to internalize domain-specific knowledge.
- Strong performance is achieved **without continual pretraining**.