Cross-Check

Tests

A Multi-Agent System for Cross-Checking Phishing URLs.

Cross-Check is an advanced phishing detection framework powered by Large Language Models (LLMs). Built using Google's Agent Development Kit (ADK) and Mesop, it implements a "debate" mechanism where multiple specialized AI agents analyze a website from different perspectives before reaching a consensus on its legitimacy.

Overview

Traditional phishing detection often relies on single-point analysis. Cross-Check mitigates the risk of AI hallucinations and improves accuracy by employing a Multi-Agent Debate Framework.

Instead of asking one model "Is this phishing?", Cross-Check convenes a panel of experts:

  1. URL Analyst: Examines domain patterns, typosquatting, and TLDs.
  2. HTML Structure Analyst: Inspects code for hidden elements, obfuscated scripts, and form exploits.
  3. Content Semantic Analyst: Analyzes visible text for urgency, social engineering, and manipulative language.
  4. Brand Impersonation Analyst: Detects mismatches between brand identity and the actual URL/content.

These agents debate their findings under the supervision of a Moderator, and a Judge delivers the final verdict.
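The panel-and-debate flow above can be sketched in plain Python. This is a conceptual illustration only; the real pipeline wires LLM-backed Google ADK agents together, and every function, field, and heuristic below is hypothetical:

```python
# Conceptual sketch of the debate flow (plain Python, no LLMs).
# The real system uses Google ADK agents; all names here are illustrative.

def url_analyst(site):      # looks for typosquatting in the domain
    return "PHISHING" if "paypa1" in site["url"] else "LEGITIMATE"

def html_analyst(site):     # looks for hidden or obfuscated markup
    return "PHISHING" if "hidden" in site["html"] else "LEGITIMATE"

def content_analyst(site):  # looks for urgent, manipulative language
    return "PHISHING" if "act now" in site["text"].lower() else "LEGITIMATE"

def brand_analyst(site):    # checks brand identity against the URL
    return "PHISHING" if site["brand"] not in site["url"] else "LEGITIMATE"

ANALYSTS = [url_analyst, html_analyst, content_analyst, brand_analyst]

def debate(site, max_rounds=3):
    """Run analysts in rounds; exit early on consensus, else majority wins."""
    for _ in range(max_rounds):
        votes = [analyst(site) for analyst in ANALYSTS]
        if len(set(votes)) == 1:   # moderator: consensus, break the loop
            break
    return max(set(votes), key=votes.count)   # judge: final verdict

suspect = {"url": "https://paypa1-login.example", "html": "<html></html>",
           "text": "Act now to verify your account!", "brand": "paypal"}
print(debate(suspect))   # -> PHISHING
```

In the actual system each analyst is an LLM agent with its own prompt, the moderator is itself an agent that decides whether to call an exit tool, and the judge reads the full conversation history rather than counting votes.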

Demo

See Cross-Check in action:

🚀 Getting Started

Prerequisites

  • Python 3.13+
  • uv (for fast Python package management)
  • API Keys for your LLM provider (e.g., Groq, OpenAI)

Installation

  1. Clone the repository:

    git clone https://github.com/vksundararajan/cross-check.git
    cd cross-check
  2. Install dependencies:

    make install
  3. Environment Setup: Rename the example environment file and add your API keys.

    mv .env.example .env
    # Edit .env and add your GROQ_API_KEY or relevant model keys

Running the Application

You can run the application using the provided Makefile or via Docker.

Using Make: To see all available commands (including tests, evaluation, and dev server), simply run:

make help

To start the web UI immediately:

make serve

Using Docker:

docker build -t cross-check .
docker run -p 7860:7860 -e GROQ_API_KEY=$GROQ_API_KEY cross-check

✨ Features

  • Google ADK Integration: Scalable and modular agent orchestration.
  • Mesop UI: A clean, Python-native web interface.
  • Model Agnostic: Uses LiteLLM to route requests to models like Llama 3, GPT-4, or Gemini.
  • Debate Capability: Implements multi-round reasoning to reduce false positives.
  • Robust Evaluation: Integrated Pytest suite for benchmarking and unit testing.

๐Ÿ—๏ธ Architecture & Workflow

The system utilizes a sequential pipeline governed by a debate loop.

Workflow

🤖 The Agentic Pipeline

Cross-Check is built on a SequentialAgent architecture powered by Google ADK. The pipeline simulates a panel of cybersecurity experts debating the legitimacy of a website.

The system processes a request in three distinct stages:

1. Ingestion & Preprocessing

Agent: UrlPreProcessor – Before any AI analysis occurs, this custom Python agent performs deterministic pre-processing:

  • Validation: Verifies the URL format and reachability.
  • Extraction: Scrapes the target website, cleaning the raw HTML and extracting visible text.
  • Context Injection: Places the sanitized data into the session state, ensuring all subsequent agents analyze the exact same snapshot of the site.
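The validation and extraction steps above can be approximated with the standard library alone. This is a minimal sketch under assumed behaviour (the project's UrlPreProcessor and engine/utils.py are more thorough, and the reachability check is omitted here); `validate_url` and `TextExtractor` are illustrative names:

```python
# Minimal sketch of deterministic pre-processing: URL validation plus
# visible-text extraction. Illustrative only; no network access is made.
from html.parser import HTMLParser
from urllib.parse import urlparse

def validate_url(url: str) -> bool:
    """Accept only well-formed http(s) URLs with a hostname."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

print(extract_text("<p>Verify <b>now</b></p><script>x()</script>"))
# -> Verify now
```

Putting the sanitized result into shared session state is what guarantees all four specialists argue about the exact same snapshot of the site.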

2. The Debate Loop

Agent: LoopAgent – containing a ParallelAgent and a Moderator. This is the core reasoning engine: instead of a single pass, the system enters an iterative cycle:

  • Parallel Analysis: Four specialist agents (UrlAnalyst, HtmlAnalyst, ContentAnalyst, BrandAnalyst) analyze the website simultaneously. Each focuses solely on its domain (e.g., the URL analyst looks for typosquatting, while the HTML analyst looks for obfuscated scripts).
  • Moderator Review: The ModeratorAgent aggregates the specialists' outputs. It evaluates if a consensus exists.
  • Dynamic Flow:
    • If the team agrees, the Moderator calls the exit_loop tool to break the cycle.
    • If there is disagreement (e.g., URL looks fine but Content is suspicious), the Moderator triggers another round, forcing agents to re-evaluate based on peer feedback.
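The moderator's decision can be pictured as a small consensus check. The real ModeratorAgent is an LLM that calls the `exit_loop` tool; the function name, signature, and return shape below are assumptions:

```python
# Hedged sketch of the moderator's consensus check. Illustrative only;
# the actual ModeratorAgent reasons over free-text specialist reports.

def moderator_review(findings: dict[str, str]) -> tuple[bool, str]:
    """Return (exit_loop, feedback); exit when all specialists agree."""
    verdicts = set(findings.values())
    if len(verdicts) == 1:
        return True, f"Consensus reached: {verdicts.pop()}"
    dissent = "; ".join(f"{agent} says {v}" for agent, v in findings.items())
    return False, "Disagreement, re-evaluate with peer feedback: " + dissent

round_one = {"UrlAnalyst": "LEGITIMATE", "HtmlAnalyst": "PHISHING",
             "ContentAnalyst": "PHISHING", "BrandAnalyst": "PHISHING"}
done, feedback = moderator_review(round_one)
print(done)   # -> False, so the loop runs another round
```

When `exit_loop` is not called, the feedback string is what each specialist sees at the start of the next round, which is how peer pressure enters the debate.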

3. Final Judgment

Agent: JudgementAgent – Once the debate concludes (either via consensus or reaching the maximum iteration limit), the Judge reviews the entire conversation history. It weighs the final arguments from all specialists and delivers the authoritative PHISHING or LEGITIMATE verdict.
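The verdict the Judge emits can be pictured as a small structured record. The project's real output schemas live in engine/schemas.py (Pydantic); the field names below are assumptions, shown with a stdlib dataclass:

```python
# Illustrative verdict structure; the real schemas are Pydantic models in
# engine/schemas.py. All field names here are assumed for illustration.
from dataclasses import dataclass, field

@dataclass
class Verdict:
    label: str                       # "PHISHING" or "LEGITIMATE"
    confidence: float                # judge's certainty, 0.0 to 1.0
    reasons: list[str] = field(default_factory=list)

    def __post_init__(self):
        if self.label not in ("PHISHING", "LEGITIMATE"):
            raise ValueError(f"invalid label: {self.label}")

verdict = Verdict("PHISHING", 0.92,
                  ["typosquatted domain", "urgent call-to-action text"])
print(verdict.label)   # -> PHISHING
```

Constraining the final answer to a schema like this is what lets the UI and the evaluation suite consume the verdict programmatically instead of parsing free text.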

๐Ÿ“ Project Structure

cross-check/
├── .env.example                     # API key template file
├── .github/
│   └── workflows/
│       └── tests.yml                # CI test automation workflow
├── .gitignore
├── .python-version
├── .vscode/
│   └── launch.json                  # VS Code debugger config
├── CITATION.cff                     # Academic citation metadata
├── Dockerfile                       # Container build instructions
├── LICENSE
├── Makefile                         # Project command shortcuts
├── README.md
├── app/
│   ├── config.py                    # Debug and logging settings
│   ├── events.py                    # UI event handlers
│   ├── main.py                      # Mesop app entry point
│   ├── state.py                     # UI state management
│   └── styles.py                    # Component styling rules
├── engine/
│   ├── __init__.py
│   ├── agent.py                     # Multi-agent pipeline definition
│   ├── config.yaml                  # Agent prompts and models
│   ├── interface.py                 # Runner and streaming API
│   ├── schemas.py                   # Pydantic output schemas
│   └── utils.py                     # URL fetching and parsing
├── docs/
│   ├── invalid.mov                  # Invalid URL demo video
│   ├── legitimate.mov               # Legitimate site demo video
│   ├── phishing.mov                 # Phishing detection demo video
│   ├── rate-limit.mov               # Rate limit demo video
│   └── workflow.svg                 # Architecture diagram
├── eval/
│   ├── data/
│   │   ├── legitimate.evalset.json  # Legitimate eval dataset
│   │   ├── phishing.evalset.json    # Phishing eval dataset
│   │   └── test_config.json         # Evaluation config
│   └── test_eval.py                 # Agent evaluation tests
├── pyproject.toml
├── tests/
│   ├── test_agents.py               # Agent unit tests
│   └── test_utils.py                # Utility function tests
└── uv.lock

🧪 Testing

Unit tests run automatically on every push via GitHub Actions. View the workflow status badge at the top of this README.

Viewing Coverage Reports from GitHub:

  1. Go to Actions → click on a workflow run
  2. Download the coverage-report artifact
  3. Extract and serve locally:
    cd coverage-report
    python -m http.server 8000
  4. Open http://localhost:8000/index.html in your browser

Full Coverage (including integration tests):

Integration tests require LLM API keys. Run locally with:

make coverage

📚 Reference

PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection
Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah
arXiv:2506.15656 [cs.CR]

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
