
Architecture

DHANUSH G edited this page Mar 4, 2026 · 1 revision

🧩 System Architecture

Back to Home | Setup-Guide | API-Reference


🗺️ High-Level Architecture

The platform follows a 3-tier architecture with a clear separation between the AI/data layer, the API layer, and the presentation layer.

┌────────────────────────────────────────────────────┐
│              PRESENTATION LAYER                    │
│   Next.js 14 Dashboard (localhost:3000)            │
│   ├── Recharts (2D graphs: traffic, anomalies)     │
│   ├── 3D Threat Globe (React Three Fiber)          │
│   └── 3D Network Topology (nodes & edges)          │
└─────────────────────────▲──────────────────────────┘
                          │ REST API (JSON)
┌─────────────────────────▼──────────────────────────┐
│               API LAYER                            │
│   FastAPI Backend (localhost:8000)                 │
│   ├── GET  /          → Health Check               │
│   ├── POST /logs/     → Log Ingestion              │
│   ├── GET  /logs/     → Log Retrieval + Pagination │
│   └── POST /predict/  → AI Anomaly Score           │
└─────────────────────────┬──────────────────────────┘
                          │
               ┌──────────┼──────────┐
               ▼                     ▼
┌───────────────────┐  ┌───────────────────┐
│    DATA LAYER     │  │     AI LAYER      │
│ SQLite (via ORM)  │  │ Isolation Forest  │
│ SQLAlchemy models │  │ (.pkl model file) │
│ Log records       │  │ Scikit-learn      │
└───────────────────┘  └───────────────────┘

🔄 Data Flow

Log Ingestion Flow

Client/Agent
    │
    │  POST /logs/  {source_ip, dest_ip, protocol, bytes, event_type, details}
    ▼
FastAPI Router
    │
    ├── Pydantic Schema Validation
    │
    ├── SQLAlchemy → SQLite (persist raw log)
    │
    └── Return saved log record (JSON)
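The validate-then-persist flow above can be sketched end to end. The real backend validates with Pydantic models and persists via SQLAlchemy; this standard-library sketch substitutes a plain field check and raw `sqlite3` so it runs anywhere. Column names follow the `logs` table documented on this page; the `ingest_log` helper is illustrative, not the project's actual function.

```python
import sqlite3

# Fields required by the schema (details is optional).
REQUIRED = ("source_ip", "destination_ip", "protocol", "bytes_transferred", "event_type")

def ingest_log(conn: sqlite3.Connection, log: dict) -> dict:
    """Validate a raw log dict, persist it, and return the saved record."""
    missing = [f for f in REQUIRED if f not in log]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    cur = conn.execute(
        "INSERT INTO logs (source_ip, destination_ip, protocol,"
        " bytes_transferred, event_type, details) VALUES (?, ?, ?, ?, ?, ?)",
        [log[f] for f in REQUIRED] + [log.get("details")],
    )
    conn.commit()
    return {**log, "id": cur.lastrowid}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " source_ip VARCHAR NOT NULL, destination_ip VARCHAR NOT NULL,"
    " protocol VARCHAR NOT NULL, bytes_transferred INTEGER NOT NULL,"
    " event_type VARCHAR NOT NULL, details TEXT,"
    " timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)"
)
saved = ingest_log(conn, {
    "source_ip": "10.0.0.5", "destination_ip": "192.168.1.2",
    "protocol": "TCP", "bytes_transferred": 4096, "event_type": "normal",
})
print(saved["id"])  # first inserted row gets id 1
```

FastAPI adds the HTTP routing and automatic 422 responses on validation failure; the persistence logic is the same shape.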

Anomaly Detection Flow

Client
    │
    │  POST /predict/  {feature vector}
    ▼
FastAPI Router
    │
    ├── Load Isolation Forest model (.pkl)
    │
    ├── Feature extraction (NumPy/Pandas)
    │
    ├── model.predict() → anomaly score
    │
    ├── Classify: Normal / Suspicious / Critical
    │
    └── Return {score, label, confidence}

🧠 AI Model: Isolation Forest

Why Isolation Forest?

| Property | Benefit |
|----------|---------|
| Unsupervised | No labeled attack data needed |
| Handles high-dimensional data | Works with IPs, ports, bytes, timing |
| Scales well | Faster than LOF for large log volumes |
| Zero-day friendly | Detects unknown/novel attack patterns |
| Low false-positive rate | Tuned contamination parameter |

How it Works

  1. Training: train_model.py generates synthetic logs (generated_logs.csv) simulating both normal and anomalous traffic patterns
  2. Feature Engineering: Numeric features (bytes transferred, port numbers, protocol encoding) are extracted
  3. Model Fitting: IsolationForest(contamination=0.05) is trained on the dataset
  4. Serialization: Model saved to ai-model/isolation_forest_model.pkl via joblib
  5. Inference: On each /predict/ call, the model scores the input and returns a classification
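Steps 1–5 can be condensed into a runnable sketch. The training data is generated inline here rather than loaded from generated_logs.csv, and the feature columns (bytes, port, protocol code) are illustrative stand-ins for the project's exact feature set; the scikit-learn and joblib calls are the standard ones for this workflow.

```python
import numpy as np
import joblib
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Steps 1-2: synthetic "normal" traffic as [bytes_transferred, port, protocol_code]
normal = np.column_stack([
    rng.normal(500, 50, 500),      # typical payload sizes
    rng.choice([80, 443], 500),    # common ports
    rng.integers(0, 2, 500),       # protocol encoding (0 = TCP, 1 = UDP)
])

# Step 3: fit with the contamination rate quoted above
model = IsolationForest(contamination=0.05, random_state=0).fit(normal)

# Step 4: serialize and reload (the project stores isolation_forest_model.pkl)
joblib.dump(model, "isolation_forest_demo.pkl")
model = joblib.load("isolation_forest_demo.pkl")

# Step 5: inference; decision_function returns the anomaly score
# (lower = more anomalous, negative = flagged as an outlier)
typical = np.array([[510, 443, 0]])
exfil = np.array([[1_000_000, 31337, 1]])  # huge transfer on an odd port
s_normal = model.decision_function(typical)[0]
s_attack = model.decision_function(exfil)[0]
assert s_attack < s_normal
```

`predict()` collapses the same score to +1 (inlier) or -1 (outlier); the API keeps the raw score so it can apply the graded thresholds below.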

Threat Classification Thresholds

| Score Range | Classification | Action |
|-------------|----------------|--------|
| score > -0.1 | Normal | Log and continue |
| -0.3 < score ≤ -0.1 | Suspicious | Flag for review |
| score ≤ -0.3 | Critical | Immediate alert |
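These thresholds translate directly into code; a minimal sketch (the function name `classify` is illustrative, not the backend's actual identifier):

```python
def classify(score: float) -> str:
    """Map an Isolation Forest anomaly score to a threat level."""
    if score > -0.1:
        return "Normal"
    if score > -0.3:      # i.e. -0.3 < score <= -0.1
        return "Suspicious"
    return "Critical"     # score <= -0.3

print(classify(0.05), classify(-0.2), classify(-0.5))
# Normal Suspicious Critical
```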

🗄️ Database Schema

Log Entry Table

CREATE TABLE logs (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    source_ip        VARCHAR NOT NULL,
    destination_ip   VARCHAR NOT NULL,
    protocol         VARCHAR NOT NULL,
    bytes_transferred INTEGER NOT NULL,
    event_type       VARCHAR NOT NULL,   -- 'normal' | 'suspicious' | 'critical'
    details          TEXT,
    timestamp        DATETIME DEFAULT CURRENT_TIMESTAMP
);

Scalability Note

SQLite is used for local development. For production, replace the DATABASE_URL with a PostgreSQL connection string; SQLAlchemy handles the transition seamlessly.
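The swap amounts to one line of configuration. This sketch assumes the backend reads `DATABASE_URL` from the environment (the variable name comes from the note above; the default path and example URL are illustrative placeholders):

```python
import os

# Default to the local SQLite file; production deployments override this
# with a PostgreSQL URL, e.g. (placeholder values):
#   export DATABASE_URL="postgresql://user:password@db-host:5432/logs"
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./logs.db")
```

The same string is then passed to SQLAlchemy's `create_engine`, so no model or query code changes.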


🌐 Frontend Components

| Component | Technology | Purpose |
|-----------|------------|---------|
| Traffic Charts | Recharts (LineChart, BarChart) | Visualize log volume and traffic over time |
| Threat Pie Chart | Recharts (PieChart) | Distribution of Normal / Suspicious / Critical |
| 3D Threat Globe | React Three Fiber + drei | Global geographic threat origin map |
| Network Topology | React Three Fiber | Real-time node-edge graph of connections |
| Log Table | Next.js + Tailwind | Paginated, searchable raw log viewer |
| Alert Banner | Lucide + Tailwind | Live critical event notifications |

🔄 CI/CD Pipeline

The pipeline triggers on pushes and pull requests to main, then checks out the code, sets up Python 3.10, installs backend dependencies, and runs the test suite with PYTHONPATH set for backend module resolution:

# .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install backend dependencies and run tests
        run: |
          pip install -r requirements.txt
          PYTHONPATH=. pytest backend/tests/

🚀 Future Architecture Extensions

  • WebSockets: Replace REST polling with ws:// streams for real-time push alerts
  • Celery + Redis: Async task queue for background model retraining
  • Kafka / RabbitMQ: Message broker for high-throughput log ingestion
  • Docker Compose: Orchestrate backend, frontend, and DB as containers
  • Autoencoder Model: Deep learning replacement for Isolation Forest for richer embeddings
  • PostgreSQL: Production-grade database with full-text search

Back to Home | Next: Setup-Guide