A Multi-Agent System for Cross-Checking Phishing URLs
Cross-Check is an advanced phishing detection framework powered by Large Language Models (LLMs). Built using Google's Agent Development Kit (ADK) and Mesop, it implements a "debate" mechanism where multiple specialized AI agents analyze a website from different perspectives before reaching a consensus on its legitimacy.
Traditional phishing detection often relies on single-point analysis. Cross-Check mitigates the risk of AI hallucinations and improves accuracy by employing a Multi-Agent Debate Framework.
Instead of asking one model "Is this phishing?", Cross-Check convenes a panel of experts:
- URL Analyst: Examines domain patterns, typosquatting, and TLDs.
- HTML Structure Analyst: Inspects code for hidden elements, obfuscated scripts, and form exploits.
- Content Semantic Analyst: Analyzes visible text for urgency, social engineering, and manipulative language.
- Brand Impersonation Analyst: Detects mismatches between brand identity and the actual URL/content.
These agents debate their findings under the supervision of a Moderator, and a Judge delivers the final verdict.
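The division of labor can be sketched as a minimal panel. This is purely illustrative: the real specialists are LLM-backed agents, and the toy heuristics below (hyphen counts, urgency phrases, a small brand list) are placeholder assumptions, not the project's actual checks.

```python
from urllib.parse import urlparse

# Illustrative stand-ins for the four LLM specialists; each returns a
# coarse verdict for its own perspective only.
def url_analyst(url: str, html: str, text: str) -> str:
    host = urlparse(url).hostname or ""
    # Toy heuristic: many hyphens or an abuse-prone TLD looks suspicious.
    return "suspicious" if host.count("-") > 2 or host.endswith(".zip") else "clean"

def html_analyst(url: str, html: str, text: str) -> str:
    # Toy heuristic: a hidden element next to a form is a red flag.
    return "suspicious" if "display:none" in html and "<form" in html else "clean"

def content_analyst(url: str, html: str, text: str) -> str:
    urgency = ("act now", "verify immediately", "account suspended")
    return "suspicious" if any(p in text.lower() for p in urgency) else "clean"

def brand_analyst(url: str, html: str, text: str) -> str:
    host = urlparse(url).hostname or ""
    brands = ("paypal", "apple", "microsoft")  # assumed sample brand list
    mentioned = [b for b in brands if b in text.lower()]
    # Brand named in the content but absent from the host = mismatch.
    return "suspicious" if mentioned and not any(b in host for b in mentioned) else "clean"

PANEL = [url_analyst, html_analyst, content_analyst, brand_analyst]

def panel_votes(url: str, html: str, text: str) -> dict:
    """Collect each specialist's independent verdict."""
    return {fn.__name__: fn(url, html, text) for fn in PANEL}
```

Each specialist sees the same inputs but votes only on its own slice of the evidence; the debate machinery described below reconciles these votes.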
See Cross-Check in action:
- Legitimate URL → Analysis of a safe website
- Phishing URL → Detection of a phishing attempt
- Invalid URL → Handling of invalid URLs
- Rate Limit → Graceful handling of API limits
- Python 3.13+
- uv (for fast Python package management)
- API keys for your LLM provider (e.g., Groq, OpenAI)

1. Clone the repository:

   ```sh
   git clone https://github.com/vksundararajan/cross-check.git
   cd cross-check
   ```

2. Install dependencies:

   ```sh
   make install
   ```

3. Environment Setup: Rename the example environment file and add your API keys.

   ```sh
   mv .env.example .env
   # Edit .env and add your GROQ_API_KEY or relevant model keys
   ```
You can run the application using the provided Makefile or via Docker.
Using Make: To see all available commands (including tests, evaluation, and dev server), simply run:

```sh
make help
```

To start the web UI immediately:

```sh
make serve
```

Using Docker:

```sh
docker build -t cross-check .
docker run -p 7860:7860 -e GROQ_API_KEY=$GROQ_API_KEY cross-check
```

- Google ADK Integration: Scalable and modular agent orchestration.
- Mesop UI: A clean, Python-native web interface.
- Model Agnostic: Uses LiteLLM to route requests to models like Llama 3, GPT-4, or Gemini.
- Debate Capability: Implements multi-round reasoning to reduce false positives.
- Robust Evaluation: Integrated Pytest suite for benchmarking and unit testing.
The system utilizes a sequential pipeline governed by a debate loop.
Cross-Check operates on a sophisticated SequentialAgent architecture powered by Google ADK. The pipeline simulates a panel of cybersecurity experts debating the legitimacy of a website.
The system processes a request in three distinct stages:
Agent: `UrlPreProcessor` – Before any AI analysis occurs, this custom Python agent executes deterministic validation:
- Validation: Verifies the URL format and reachability.
- Extraction: Scrapes the target website, cleaning the raw HTML and extracting visible text.
- Context Injection: Places the sanitized data into the session state, ensuring all subsequent agents analyze the exact same snapshot of the site.
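The pre-processing stage can be sketched with the standard library alone. This is a simplified illustration: the function and class names are assumptions, and the real `UrlPreProcessor` also probes reachability over the network, which is omitted here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

def validate_url(url: str) -> bool:
    """Format check only; the real agent also verifies reachability."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def preprocess(url: str, html: str) -> dict:
    """Build the shared snapshot that all downstream agents analyze."""
    if not validate_url(url):
        raise ValueError(f"invalid URL: {url}")
    return {"url": url, "html": html, "visible_text": extract_visible_text(html)}
```

Because every specialist reads from this one snapshot, the agents cannot diverge by fetching the page at different moments.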
Agent: `LoopAgent` – containing a `ParallelAgent` and a Moderator
This is the core reasoning engine. Instead of a single pass, the system enters an iterative cycle:
- Parallel Analysis: Four specialist agents (`UrlAnalyst`, `HtmlAnalyst`, `ContentAnalyst`, `BrandAnalyst`) analyze the website simultaneously. Each focuses solely on its domain (e.g., the URL analyst looks for typosquatting, while the HTML analyst looks for obfuscated scripts).
- Moderator Review: The `ModeratorAgent` aggregates the specialists' outputs and evaluates whether a consensus exists.
- Dynamic Flow:
  - If the team agrees, the Moderator calls the `exit_loop` tool to break the cycle.
  - If there is disagreement (e.g., the URL looks fine but the content is suspicious), the Moderator triggers another round, forcing agents to re-evaluate based on peer feedback.
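The debate cycle can be sketched as a plain Python loop. The round cap and the unanimity rule below are assumptions for illustration; in the actual system the iteration limit and consensus logic live in the ADK `LoopAgent` and the `ModeratorAgent`'s prompt.

```python
MAX_ROUNDS = 3  # assumed iteration cap, stands in for the LoopAgent's limit

def run_debate(specialists, moderator, context):
    """specialists: callables(context, feedback) -> verdict string.
    moderator: callable(verdicts) -> (consensus_reached, feedback)."""
    feedback = None
    history = []
    for _ in range(MAX_ROUNDS):
        # In ADK these calls run concurrently under a ParallelAgent.
        verdicts = [agent(context, feedback) for agent in specialists]
        history.append(verdicts)
        consensus, feedback = moderator(verdicts)
        if consensus:
            break  # the real Moderator invokes the exit_loop tool here
    return history

def simple_moderator(verdicts):
    """Toy consensus rule: everyone must agree exactly."""
    consensus = len(set(verdicts)) == 1
    feedback = None if consensus else f"disagreement: {verdicts}"
    return consensus, feedback
```

On disagreement, the moderator's feedback is fed back into the next round, which is what lets specialists revise their positions in light of peer findings.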
Agent: `JudgementAgent` – Once the debate concludes (either via consensus or reaching the maximum iteration limit), the Judge reviews the entire conversation history. It weighs the final arguments from all specialists and delivers the authoritative PHISHING or LEGITIMATE verdict.
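As a rough mental model, the judgement step resembles a weighted tally over the specialists' final positions. The weights and threshold below are invented for illustration; the real `JudgementAgent` is an LLM reasoning over the full debate transcript, not a numeric scorer.

```python
# Illustrative per-specialist weights (assumptions, not the project's values).
WEIGHTS = {"url": 0.3, "html": 0.25, "content": 0.2, "brand": 0.25}

def judge(final_verdicts: dict) -> str:
    """final_verdicts maps specialist name -> 'phishing' or 'legitimate'."""
    score = sum(WEIGHTS[name] for name, v in final_verdicts.items() if v == "phishing")
    return "PHISHING" if score >= 0.5 else "LEGITIMATE"
```

A weighting like this captures why a single suspicious signal (e.g., urgent wording alone) need not flip the verdict, while agreement between two heavier signals does.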
```
cross-check/
├── .env.example              # API key template file
├── .github/
│   └── workflows/
│       └── tests.yml         # CI test automation workflow
├── .gitignore
├── .python-version
├── .vscode/
│   └── launch.json           # VS Code debugger config
├── CITATION.cff              # Academic citation metadata
├── Dockerfile                # Container build instructions
├── LICENSE
├── Makefile                  # Project command shortcuts
├── README.md
├── app/
│   ├── config.py             # Debug and logging settings
│   ├── events.py             # UI event handlers
│   ├── main.py               # Mesop app entry point
│   ├── state.py              # UI state management
│   └── styles.py             # Component styling rules
├── engine/
│   ├── __init__.py
│   ├── agent.py              # Multi-agent pipeline definition
│   ├── config.yaml           # Agent prompts and models
│   ├── interface.py          # Runner and streaming API
│   ├── schemas.py            # Pydantic output schemas
│   └── utils.py              # URL fetching and parsing
├── docs/
│   ├── invalid.mov           # Invalid URL demo video
│   ├── legitimate.mov        # Legitimate site demo video
│   ├── phishing.mov          # Phishing detection demo video
│   ├── rate-limit.mov        # Rate limit demo video
│   └── workflow.svg          # Architecture diagram
├── eval/
│   ├── data/
│   │   ├── legitimate.evalset.json  # Legitimate eval dataset
│   │   ├── phishing.evalset.json    # Phishing eval dataset
│   │   └── test_config.json         # Evaluation config
│   └── test_eval.py          # Agent evaluation tests
├── pyproject.toml
├── tests/
│   ├── test_agents.py        # Agent unit tests
│   └── test_utils.py         # Utility function tests
└── uv.lock
```
Unit tests run automatically on every push via GitHub Actions. View the workflow status badge at the top of this README.
Viewing Coverage Reports from GitHub:
- Go to Actions → click on a workflow run
- Download the `coverage-report` artifact
- Extract and serve locally:

  ```sh
  cd coverage-report
  python -m http.server 8000
  ```

- Open http://localhost:8000/index.html in your browser
Full Coverage (including integration tests):
Integration tests require LLM API keys. Run locally with:
```sh
make coverage
```

PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection. Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah. arXiv:2506.15656 [cs.CR].
This project is licensed under the MIT License - see the LICENSE file for details.