
Commit 4959bfa

VinciGit00 and claude committed
feat: add monitor_activity tool, realign env vars with scrapegraph-py #84
Track the latest push on scrapegraph-py PR #84:

- Base URL: `https://api.scrapegraphai.com/v2` -> `/api/v2` (matches py env.py)
- Env var: `SGAI_TIMEOUT_S` -> `SGAI_TIMEOUT` (default 120s). `SGAI_TIMEOUT_S` is kept as a legacy alias and still honored.
- New tool: `monitor_activity` (GET /monitor/:id/activity) with limit/cursor pagination, mirroring `sgai.monitor.activity()` in the Python/JS SDKs. Returns tick history (`id`, `createdAt`, `status`, `changed`, `elapsedMs`, `diffs`) plus `nextCursor` for paging.
- README.md, server.json, .agent docs: update base URL, env var names, and tool listing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3c82ca7 commit 4959bfa

5 files changed

Lines changed: 66 additions & 22 deletions
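The limit/cursor pagination the commit message describes can be exercised with a small client-side helper. This is a hedged sketch, not SDK code: the `fetch` callable stands in for whatever issues GET /monitor/:id/activity, and the `ticks` response key is an assumption (the commit names the tick fields and `nextCursor`, but not the list key):

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_monitor_activity(
    fetch: Callable[[str, Optional[int], Optional[str]], Dict[str, Any]],
    monitor_id: str,
    limit: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every tick for a monitor by following nextCursor until exhausted.

    fetch(monitor_id, limit, cursor) is a placeholder for the real client
    call; the "ticks" envelope key is assumed, not confirmed by the commit.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch(monitor_id, limit, cursor)
        yield from page.get("ticks", [])
        cursor = page.get("nextCursor")
        if not cursor:  # no more pages
            break
```

With a paginating backend, this flattens all pages into one stream of ticks, so callers never handle cursors directly.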


.agent/README.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -379,7 +379,8 @@ npx @modelcontextprotocol/inspector scrapegraph-mcp
 ## 📅 Changelog

 ### April 2026
-- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84)): base `https://api.scrapegraphai.com/v2`, `SGAI-APIKEY` header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools. Env vars aligned with SDK: `SGAI_API_URL`, `SGAI_TIMEOUT_S`.
+- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84)): base `https://api.scrapegraphai.com/api/v2`, `SGAI-APIKEY` header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools. Env vars aligned with SDK: `SGAI_API_URL`, `SGAI_TIMEOUT` (legacy alias `SGAI_TIMEOUT_S` still honored).
+- ✅ Added `monitor_activity` tool for paginated tick history (GET /monitor/:id/activity), mirroring `sgai.monitor.activity()` in scrapegraph-py v2.

 ### January 2026
 - ✅ Added `time_range` parameter to SearchScraper for filtering results by recency (v1-era; **ignored on API v2**)
````

.agent/system/project_architecture.md

Lines changed: 5 additions & 4 deletions
````diff
@@ -130,7 +130,7 @@ AI Assistant (Claude/Cursor)
 ↓ (stdio via MCP)
 FastMCP Server (this project)
 ↓ (HTTPS API calls)
-ScrapeGraphAI API (default https://api.scrapegraphai.com/v2)
+ScrapeGraphAI API (default https://api.scrapegraphai.com/api/v2)
 ↓ (web scraping)
 Target Websites
 ```
@@ -141,7 +141,7 @@ The server follows a simple, single-file architecture:

 **`ScapeGraphClient` Class:**
 - HTTP client wrapper for ScrapeGraphAI API v2 ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84))
-- Base URL: `https://api.scrapegraphai.com/v2` (override with env `SGAI_API_URL`)
+- Base URL: `https://api.scrapegraphai.com/api/v2` (override with env `SGAI_API_URL`)
 - Auth: `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0` (matches scrapegraph-py v2)
 - v2 methods include `scrape_v2`, `extract`, `search_api`, `crawl_*`, `monitor_*`, `credits`, `history`, plus compatibility wrappers used by MCP tools

@@ -188,7 +188,7 @@ The server follows a simple, single-file architecture:

 The server exposes many `@mcp.tool()` handlers (see repository `README.md` for the full table). The detailed subsections below still use **v1-style endpoint names** in several places; treat them as illustrative and prefer the v2 mapping in **API Integration**.

-**v2 tool names:** `markdownify`, `scrape`, `smartscraper`, `searchscraper`, `smartcrawler_initiate`, `smartcrawler_fetch_results`, `crawl_stop`, `crawl_resume`, `credits`, `sgai_history`, `monitor_create`, `monitor_list`, `monitor_get`, `monitor_pause`, `monitor_resume`, `monitor_delete`.
+**v2 tool names:** `markdownify`, `scrape`, `smartscraper`, `searchscraper`, `smartcrawler_initiate`, `smartcrawler_fetch_results`, `crawl_stop`, `crawl_resume`, `credits`, `sgai_history`, `monitor_create`, `monitor_list`, `monitor_get`, `monitor_pause`, `monitor_resume`, `monitor_delete`, `monitor_activity`.

 ### 1. `markdownify(website_url: str)`

@@ -391,7 +391,7 @@ If status is "completed":

 ### ScrapeGraphAI API

-**Base URL:** `https://api.scrapegraphai.com/v2` (configurable via `SGAI_API_URL`)
+**Base URL:** `https://api.scrapegraphai.com/api/v2` (configurable via `SGAI_API_URL`)

 **Authentication:**
 - Headers: `SGAI-APIKEY: <key>` (matches scrapegraph-py v2 wire format)
@@ -414,6 +414,7 @@ If status is "completed":
 | `/monitor/{id}` | GET, DELETE | `monitor_get`, `monitor_delete` |
 | `/monitor/{id}/pause` | POST | `monitor_pause` |
 | `/monitor/{id}/resume` | POST | `monitor_resume` |
+| `/monitor/{id}/activity` | GET | `monitor_activity` |

 **Request Format:**
 ```json
````

README.md

Lines changed: 6 additions & 5 deletions
````diff
@@ -28,22 +28,22 @@ A production-ready [Model Context Protocol](https://modelcontextprotocol.io/intr
 ## API v2

-This MCP server targets **ScrapeGraph API v2** (`https://api.scrapegraphai.com/v2`), aligned 1:1 with
+This MCP server targets **ScrapeGraph API v2** (`https://api.scrapegraphai.com/api/v2`), aligned 1:1 with
 [scrapegraph-py PR #84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84). Auth uses the
 `SGAI-APIKEY` header. Environment variables mirror the Python SDK:

-- **`SGAI_API_URL`** — override the base URL (default `https://api.scrapegraphai.com/v2`)
-- **`SGAI_TIMEOUT_S`** — request timeout in seconds (default `120`)
+- **`SGAI_API_URL`** — override the base URL (default `https://api.scrapegraphai.com/api/v2`)
+- **`SGAI_TIMEOUT`** — request timeout in seconds (default `120`)
 - **`SGAI_API_KEY`** — API key (can also be passed via MCP `scrapegraphApiKey` or `X-API-Key` header)

-> `SCRAPEGRAPH_API_BASE_URL` is still honored as a legacy alias for `SGAI_API_URL`.
+> Legacy aliases (still honored): `SCRAPEGRAPH_API_BASE_URL` for `SGAI_API_URL`, `SGAI_TIMEOUT_S` for `SGAI_TIMEOUT`.

 ## Key Features

 - **Scrape & extract**: `markdownify` / `scrape` (POST /scrape), `smartscraper` (POST /extract, URL only)
 - **Search**: `searchscraper` (POST /search; `num_results` clamped 3–20)
 - **Crawl**: Async multi-page crawl in **markdown** or **html** only; `crawl_stop` / `crawl_resume`
-- **Monitors**: Scheduled jobs via `monitor_create`, `monitor_list`, `monitor_get`, pause/resume/delete
+- **Monitors**: Scheduled jobs via `monitor_create`, `monitor_list`, `monitor_get`, pause/resume/delete, `monitor_activity` (paginated tick history)
 - **Account**: `credits`, `sgai_history`
 - **Easy integration**: Claude Desktop, Cursor, Smithery, HTTP transport
 - **Developer docs**: `.agent/` folder
@@ -83,6 +83,7 @@ That's it! The server is now available to your AI assistant.
 | `credits` | GET /credits |
 | `sgai_history` | GET /history |
 | `monitor_create`, `monitor_list`, `monitor_get`, `monitor_pause`, `monitor_resume`, `monitor_delete` | /monitor API |
+| `monitor_activity` | GET /monitor/:id/activity (paginated tick history: `id`, `createdAt`, `status`, `changed`, `elapsedMs`, `diffs`) |

 **Removed vs older MCP releases:** `sitemap`, `agentic_scrapper`, `markdownify_status`, `smartscraper_status` (no v2 endpoints).
````
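The wire format the README describes can be sketched as two small helpers: base-URL resolution with the legacy alias, and the two auth headers. This is a sketch, not the server's actual code; the precedence order (`SGAI_API_URL` before `SCRAPEGRAPH_API_BASE_URL`) is an assumption based on the "legacy alias" wording:

```python
import os
from typing import Dict, Mapping, Optional

# Default base URL as documented in the README after this commit.
DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/api/v2"


def v2_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the API base URL: SGAI_API_URL, then the legacy alias, then default."""
    env = os.environ if env is None else env
    return (
        env.get("SGAI_API_URL")
        or env.get("SCRAPEGRAPH_API_BASE_URL")  # legacy alias, still honored
        or DEFAULT_API_BASE_URL
    )


def v2_headers(api_key: str) -> Dict[str, str]:
    """Auth headers named in the docs: SGAI-APIKEY plus the SDK version tag."""
    return {"SGAI-APIKEY": api_key, "X-SDK-Version": "scrapegraph-mcp@2.0.0"}
```

Passing an explicit `env` mapping keeps the resolution logic testable without mutating the process environment.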

server.json

Lines changed: 2 additions & 2 deletions
````diff
@@ -24,7 +24,7 @@
       "name": "SGAI_API_KEY"
     },
     {
-      "description": "Override API base URL (default https://api.scrapegraphai.com/v2)",
+      "description": "Override API base URL (default https://api.scrapegraphai.com/api/v2)",
       "isRequired": false,
       "format": "string",
       "isSecret": false,
@@ -35,7 +35,7 @@
       "isRequired": false,
       "format": "string",
       "isSecret": false,
-      "name": "SGAI_TIMEOUT_S"
+      "name": "SGAI_TIMEOUT"
     }
   ]
 }
````

src/scrapegraph_mcp/server.py

Lines changed: 51 additions & 10 deletions
````diff
@@ -24,9 +24,10 @@
 Removed on v2 (no API equivalent): sitemap, agentic_scrapper, markdownify_status, smartscraper_status.

 Environment variables (match scrapegraph-py v2):
-- SGAI_API_URL (default https://api.scrapegraphai.com/v2) — base URL override
-- SGAI_TIMEOUT_S (default 120) — request timeout in seconds
+- SGAI_API_URL (default https://api.scrapegraphai.com/api/v2) — base URL override
+- SGAI_TIMEOUT (default 120) — request timeout in seconds
 - SCRAPEGRAPH_API_BASE_URL — legacy alias for SGAI_API_URL (still honored)
+- SGAI_TIMEOUT_S — legacy alias for SGAI_TIMEOUT (still honored)

 ## Parameter Validation and Error Handling
````
````diff
@@ -87,8 +88,8 @@
 logger = logging.getLogger(__name__)

 MCP_SERVER_VERSION = "2.0.0"
-# Matches scrapegraph-py v2 (env.py): https://api.scrapegraphai.com/v2
-DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/v2"
+# Matches scrapegraph-py v2 (env.py): https://api.scrapegraphai.com/api/v2
+DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/api/v2"


 def _api_base_url() -> str:
````
````diff
@@ -101,8 +102,8 @@ def _api_base_url() -> str:


 def _api_timeout_s() -> float:
-    # SGAI_TIMEOUT_S mirrors scrapegraph-py v2 (default 120s).
-    val = os.environ.get("SGAI_TIMEOUT_S")
+    # SGAI_TIMEOUT mirrors scrapegraph-py v2 (default 120s); SGAI_TIMEOUT_S is a legacy alias.
+    val = os.environ.get("SGAI_TIMEOUT") or os.environ.get("SGAI_TIMEOUT_S")
     try:
         return float(val) if val else 120.0
     except ValueError:
````
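The timeout precedence in this hunk can be reproduced as a standalone, testable function. One assumption: the hunk cuts off inside the `except ValueError:` branch, so falling back to the 120-second default on non-numeric input is inferred, not shown in the diff:

```python
import os
from typing import Mapping, Optional


def api_timeout_s(env: Optional[Mapping[str, str]] = None) -> float:
    """Request timeout: SGAI_TIMEOUT wins, SGAI_TIMEOUT_S is the legacy alias, default 120s."""
    env = os.environ if env is None else env
    val = env.get("SGAI_TIMEOUT") or env.get("SGAI_TIMEOUT_S")
    try:
        return float(val) if val else 120.0
    except ValueError:
        # Assumed behavior: ignore unparsable values and use the default.
        return 120.0
```

Note that `or` also makes an empty `SGAI_TIMEOUT` fall through to the alias, matching the diff's expression.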
````diff
@@ -545,6 +546,20 @@ def monitor_resume(self, monitor_id: str) -> Dict[str, Any]:
     def monitor_delete(self, monitor_id: str) -> Dict[str, Any]:
         return self._request("DELETE", f"/monitor/{monitor_id}")

+    def monitor_activity(
+        self,
+        monitor_id: str,
+        limit: Optional[int] = None,
+        cursor: Optional[str] = None,
+    ) -> Dict[str, Any]:
+        """GET /monitor/:id/activity — paginated tick history."""
+        params: Dict[str, Any] = {}
+        if limit is not None:
+            params["limit"] = limit
+        if cursor is not None:
+            params["cursor"] = cursor
+        return self._request("GET", f"/monitor/{monitor_id}/activity", params=params or None)
+
     def close(self) -> None:
         """Close the HTTP client."""
         self.client.close()
````
````diff
@@ -647,7 +662,7 @@ def web_scraping_guide() -> str:
 1. Use **markdownify** or **scrape** before **smartscraper** when you only need readable text.
 2. Multi-page **AI** extraction: run **smartscraper** per URL, or use **monitor_create** on a schedule.
 3. Poll **smartcrawler_fetch_results** until the crawl finishes.
-4. Override API host with env **SGAI_API_URL** if needed (default `https://api.scrapegraphai.com/v2`).
+4. Override API host with env **SGAI_API_URL** if needed (default `https://api.scrapegraphai.com/api/v2`).
 """
````
````diff
@@ -697,7 +712,7 @@ def quick_start_examples() -> str:
     limit: 10
 ```

-Auth: `SGAI_API_KEY` or MCP `scrapegraphApiKey`. Optional: `SGAI_API_URL`, `SGAI_TIMEOUT_S`.
+Auth: `SGAI_API_KEY` or MCP `scrapegraphApiKey`. Optional: `SGAI_API_URL`, `SGAI_TIMEOUT` (legacy: `SGAI_TIMEOUT_S`).
 """
````
````diff
@@ -712,11 +727,11 @@ def api_status() -> str:
     return """# ScapeGraph API Status (MCP v2)

 - **MCP package version**: 2.0.0 (matches [scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84) API surface)
-- **Default API base**: `https://api.scrapegraphai.com/v2` (override with `SGAI_API_URL`)
+- **Default API base**: `https://api.scrapegraphai.com/api/v2` (override with `SGAI_API_URL`)
 - **Auth headers**: `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0`

 ## Tools
-markdownify, scrape, smartscraper, searchscraper, smartcrawler_initiate, smartcrawler_fetch_results, crawl_stop, crawl_resume, generate_schema, credits, sgai_history, monitor_create, monitor_list, monitor_get, monitor_pause, monitor_resume, monitor_delete
+markdownify, scrape, smartscraper, searchscraper, smartcrawler_initiate, smartcrawler_fetch_results, crawl_stop, crawl_resume, generate_schema, credits, sgai_history, monitor_create, monitor_list, monitor_get, monitor_pause, monitor_resume, monitor_delete, monitor_activity

 ## Removed vs legacy MCP
 sitemap, agentic_scrapper, markdownify_status, smartscraper_status — not available on API v2.
````
````diff
@@ -2008,6 +2023,32 @@ def monitor_delete(monitor_id: str, ctx: Context) -> Dict[str, Any]:
         return {"error": str(e)}


+@mcp.tool(annotations={"readOnlyHint": True, "destructiveHint": False, "idempotentHint": True})
+def monitor_activity(
+    monitor_id: str,
+    ctx: Context,
+    limit: Optional[int] = None,
+    cursor: Optional[str] = None,
+) -> Dict[str, Any]:
+    """Poll per-run tick history for a monitor (API v2 GET /monitor/:id/activity).
+
+    Returns the ticks produced on each scheduled run (`id`, `createdAt`, `status`,
+    `changed`, `elapsedMs`, `diffs`) plus `nextCursor` when more results are
+    available. Mirrors `sgai.monitor.activity()` in scrapegraph-py v2.
+
+    Args:
+        monitor_id: ID of the monitor (cronId returned by monitor_create).
+        limit: Page size, 1–100. Default 20 (server-side).
+        cursor: Opaque pagination cursor returned as `nextCursor` by a prior call.
+    """
+    try:
+        api_key = get_api_key(ctx)
+        client = ScapeGraphClient(api_key)
+        return client.monitor_activity(monitor_id, limit=limit, cursor=cursor)
+    except Exception as e:
+        return {"error": str(e)}
+
+
 # Add tool for basic scrape
 @mcp.tool(annotations={"readOnlyHint": True, "destructiveHint": False, "idempotentHint": True})
 def scrape(
````
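Downstream of the new tool, a caller will often want only the runs where the monitored page actually changed. A hedged sketch over the tick fields named in the tool docstring (`id`, `status`, `changed`, `elapsedMs`); the `ticks` envelope key is an assumption, not confirmed by the diff:

```python
from typing import Any, Dict, List


def changed_ticks(activity: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Filter a monitor_activity response page down to change-detecting runs."""
    return [tick for tick in activity.get("ticks", []) if tick.get("changed")]
```

Combined with cursor paging, this turns raw tick history into a change log for the monitored URL.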
