| Field | Value |
|---|---|
| name | Home Security AI Benchmark |
| description | LLM & VLM evaluation suite for home security AI applications |
| version | 2.1.0 |
| category | analysis |
| runtime | node |
| entry | scripts/run-benchmark.cjs |
| install | npm |
| requirements | |
Comprehensive benchmark suite evaluating LLM and VLM models on 143 tests across 16 suites — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.
Requires `npm install`. This skill ships a `package.json` with dependencies (e.g. the `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.
Entry script: `scripts/run-benchmark.cjs`

```shell
node scripts/run-benchmark.cjs --help
```

When spawned by Aegis, all configuration is injected via environment variables. The benchmark discovers your LLM gateway and VLM server automatically, generates an HTML report, and opens it when complete.
```shell
# LLM-only (VLM tests skipped)
node scripts/run-benchmark.cjs

# With VLM tests (base URL without /v1 suffix)
node scripts/run-benchmark.cjs --vlm http://localhost:5405

# Custom LLM gateway
node scripts/run-benchmark.cjs --gateway http://localhost:5407

# Skip report auto-open
node scripts/run-benchmark.cjs --no-open

# Benchmark with MiniMax Cloud API
AEGIS_LLM_API_TYPE=minimax MINIMAX_API_KEY=your-key \
  node scripts/run-benchmark.cjs

# MiniMax with a specific model
AEGIS_LLM_API_TYPE=minimax MINIMAX_API_KEY=your-key AEGIS_LLM_MODEL=MiniMax-M2.7-highspeed \
  node scripts/run-benchmark.cjs
```

| Variable | Default | Description |
|---|---|---|
| `AEGIS_GATEWAY_URL` | `http://localhost:5407` | LLM gateway (OpenAI-compatible) |
| `AEGIS_LLM_URL` | — | Direct llama-server LLM endpoint |
| `AEGIS_LLM_API_TYPE` | `openai` | LLM provider type (`builtin`, `openai`, `minimax`) |
| `AEGIS_LLM_MODEL` | — | LLM model name |
| `AEGIS_LLM_API_KEY` | — | API key for cloud LLM providers |
| `AEGIS_LLM_BASE_URL` | — | Cloud provider base URL (e.g. `https://api.openai.com/v1`) |
| `MINIMAX_API_KEY` | — | MiniMax API key (fallback when `AEGIS_LLM_API_KEY` is not set) |
| `AEGIS_VLM_URL` | (disabled) | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |
Note: URLs should be base URLs (e.g. `http://localhost:5405`). The benchmark appends `/v1/chat/completions` automatically. Including a `/v1` suffix is also accepted — it will be stripped to avoid double-pathing.
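The URL normalization described in the note can be sketched as follows. This is a hypothetical helper mirroring the documented behavior, not the benchmark's actual code:

```javascript
// Strip trailing slashes and an optional /v1 suffix, then append the
// chat-completions path, so both base-URL forms resolve identically.
function chatCompletionsUrl(baseUrl) {
  const normalized = baseUrl.replace(/\/+$/, "").replace(/\/v1$/, "");
  return `${normalized}/v1/chat/completions`;
}

console.log(chatCompletionsUrl("http://localhost:5405"));
console.log(chatCompletionsUrl("http://localhost:5405/v1"));
// both print: http://localhost:5405/v1/chat/completions
```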
This skill includes a `config.yaml` that defines user-configurable parameters. Aegis parses this at install time and renders a config panel in the UI. Values are delivered via `AEGIS_SKILL_PARAMS`.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | select | `llm` | Which suites to run: `llm` (96 tests), `vlm` (47 tests), or `full` (143 tests) |
| `llmProvider` | select | `builtin` | LLM provider: `builtin` (local), `openai`, or `minimax` |
| `minimaxModel` | select | `MiniMax-M2.7` | MiniMax model to benchmark (requires `llmProvider=minimax`) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in browser |
Platform parameters like `AEGIS_GATEWAY_URL` and `AEGIS_VLM_URL` are auto-injected by Aegis — they are not in `config.yaml`. See Aegis Skill Platform Parameters for the full platform contract.
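A skill receiving these values might parse `AEGIS_SKILL_PARAMS` like the sketch below. The defaults are taken from the parameter table above; the benchmark's real parsing may differ:

```javascript
// Defaults matching the config.yaml parameter table above.
const defaults = { mode: "llm", llmProvider: "builtin", noOpen: false };

function readSkillParams(env = process.env) {
  let params = {};
  try {
    // Aegis delivers the config-panel values as a JSON object.
    params = JSON.parse(env.AEGIS_SKILL_PARAMS || "{}");
  } catch {
    // On malformed JSON, fall back to defaults rather than crashing.
  }
  return { ...defaults, ...params };
}

console.log(readSkillParams({ AEGIS_SKILL_PARAMS: '{"mode":"full"}' }));
// → { mode: 'full', llmProvider: 'builtin', noOpen: false }
```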
| Argument | Default | Description |
|---|---|---|
| `--gateway URL` | `http://localhost:5407` | LLM gateway |
| `--vlm URL` | (disabled) | VLM server base URL |
| `--out DIR` | `~/.aegis-ai/benchmarks` | Results directory |
| `--report` | (auto in skill mode) | Force report generation |
| `--no-open` | — | Don't auto-open report in browser |
```shell
AEGIS_GATEWAY_URL=http://localhost:5407
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=home-security-benchmark
AEGIS_SKILL_PARAMS={}
```
```json
{"event": "ready", "model": "Qwen3.5-4B-Q4_1", "system": "Apple M3"}
{"event": "suite_start", "suite": "Context Preprocessing"}
{"event": "test_result", "suite": "...", "test": "...", "status": "pass", "timeMs": 123}
{"event": "suite_end", "suite": "...", "passed": 4, "failed": 0}
{"event": "complete", "passed": 126, "total": 131, "timeMs": 322000, "reportPath": "/path/to/report.html"}
```

Human-readable output goes to stderr (visible in the Aegis console tab).
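A consumer of this event stream can parse one JSON object per stdout line and summarize the run from the final `complete` event. This is a hypothetical reader, not part of the skill:

```javascript
// Parse newline-delimited JSON events and report the final pass rate.
function summarize(ndjson) {
  const events = ndjson
    .split("\n")
    .filter((line) => line.trim())
    .map((line) => JSON.parse(line));
  const done = events.find((e) => e.event === "complete");
  if (!done) return null; // run did not finish
  return { passed: done.passed, total: done.total, passRate: done.passed / done.total };
}

const stream = [
  '{"event": "ready", "model": "Qwen3.5-4B-Q4_1", "system": "Apple M3"}',
  '{"event": "complete", "passed": 126, "total": 131, "timeMs": 322000, "reportPath": "/path/to/report.html"}',
].join("\n");

console.log(summarize(stream)); // passRate ≈ 0.962
```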
| Suite | Tests | Domain |
|---|---|---|
| Context Preprocessing | 6 | Conversation dedup accuracy |
| Topic Classification | 4 | Topic extraction & change detection |
| Knowledge Distillation | 5 | Fact extraction, slug matching |
| Event Deduplication | 8 | Security event classification |
| Tool Use | 16 | Tool selection & parameter extraction |
| Chat & JSON Compliance | 11 | Persona, memory, structured output |
| Security Classification | 12 | Threat level assessment |
| Narrative Synthesis | 4 | Multi-camera event summarization |
| Prompt Injection Resistance | 4 | Adversarial prompt defense |
| Multi-Turn Reasoning | 4 | Context resolution over turns |
| Error Recovery & Edge Cases | 4 | Graceful failure handling |
| Privacy & Compliance | 3 | PII handling, consent |
| Alert Routing & Subscription | 5 | Channel targeting, schedule CRUD |
| Knowledge Injection to Dialog | 5 | KI-personalized responses |
| VLM-to-Alert Triage | 5 | Urgency classification from VLM |
| VLM Scene Analysis | 47 | Frame entity detection & description (outdoor + indoor safety) |
Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cross-model comparison is auto-generated and opened in the browser after each run.
- Node.js ≥ 18
- `npm install` (for the `openai` SDK dependency)
- Running LLM server (llama-server, OpenAI API, MiniMax Cloud API, or any OpenAI-compatible endpoint)
- Optional: Running VLM server for scene analysis tests (47 tests)
| Provider | API Type | Models | Notes |
|---|---|---|---|
| Local (llama-server) | `builtin` | Any GGUF model | Default — runs on your hardware |
| OpenAI | `openai` | GPT-5.4, etc. | Requires `AEGIS_LLM_API_KEY` |
| MiniMax | `minimax` | MiniMax-M2.7, M2.7-highspeed, M2.5, M2.5-highspeed | Auto-configured base URL, temperature clamped to [0, 1] |
| Any OpenAI-compatible | — | — | Set `AEGIS_LLM_BASE_URL` + `AEGIS_LLM_API_KEY` |
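The temperature clamping noted for MiniMax can be sketched in one line; this is a hypothetical helper showing the documented [0, 1] clamp, not the benchmark's actual request code:

```javascript
// Clamp a sampling temperature into the provider's accepted [0, 1] range.
function clampTemperature(temp, min = 0, max = 1) {
  return Math.min(max, Math.max(min, temp));
}

console.log(clampTemperature(1.4)); // 1
console.log(clampTemperature(0.7)); // 0.7
```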