
Tri-Yantra-Technologies/Profanity-And-Offensiveness-Detection-Nepali


NepSense - Profanity & Offensiveness Detection for Nepali Language 🇳🇵🛡️


NepSense Demo Interface


📄 Abstract

NepSense is an NLP tool for detecting profanity, offensiveness, and speaker gender in Nepali text. By combining deep learning models with Nepali linguistic context, it aims to close the gap between users seeking safe online spaces and platforms that need content moderation.

Offensive and profane content is increasingly common on Nepali social media and is distressing to users. One reason is the lack of proper tools and mechanisms for handling profane and offensive text in Nepali. In this work, we develop a Bi-LSTM (Bidirectional Long Short-Term Memory) model to classify profane and offensive comments.

Using Multilingual BERT embeddings alongside custom vocabulary embeddings, NepSense captures contextual meaning rather than relying on simple keyword matching. Previous Nepali-language studies have focused mainly on sentiment analysis and offensiveness detection; this study treats profanity detection and offensiveness detection as two distinct tasks.

📖 Based on peer-reviewed research published at ICON 2024 (21st International Conference on Natural Language Processing)

📄 Read the Full Paper on ACL Anthology →


🚀 Key Features

1. 🛡️ Triple Detection System

  • Profanity Detection: Identifies vulgar, obscene, and swear words in Nepali text.
  • Offensiveness Detection: Detects disrespectful, insulting, or harmful content.
  • Gender Identification: Predicts the gender of the speaker based on text patterns.

2. 🔤 Hybrid Input Support

  • Devanagari Script: Native Nepali text (नेपाली)
  • Romanized Nepali: Nepali written in Latin script, converted to Devanagari automatically
  • Automatic Transliteration: AI4Bharat XlitEngine converts Romanized input to Devanagari
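Routing between the two scripts can be sketched as follows. This is a minimal sketch, not the project's code: `is_devanagari` and `normalize_input` are hypothetical names, the 50% Devanagari threshold is an assumption, and the `transliterate` callable is a placeholder for whatever wrapper the backend puts around AI4Bharat XlitEngine (that library's API is not shown here).

```python
import re
from typing import Callable

# Devanagari Unicode block: U+0900 to U+097F
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def is_devanagari(text: str, min_ratio: float = 0.5) -> bool:
    """Return True if at least `min_ratio` of the letters are Devanagari (assumed threshold)."""
    # Vowel signs such as U+093F are combining marks, not "alpha", so match the block explicitly
    letters = [ch for ch in text if ch.isalpha() or DEVANAGARI.match(ch)]
    if not letters:
        return False
    deva = sum(1 for ch in letters if DEVANAGARI.match(ch))
    return deva / len(letters) >= min_ratio


def normalize_input(text: str, transliterate: Callable[[str], str]) -> str:
    """Pass Devanagari through unchanged; send Romanized Nepali to `transliterate`."""
    text = text.strip()
    if is_devanagari(text):
        return text
    return transliterate(text)
```

Injecting the transliterator as a callable keeps the routing logic testable without loading the transliteration model.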

3. 🧬 State-of-the-Art Architecture

  • Bi-LSTM Models: Capture bidirectional context for accurate classification
  • BERT Embeddings: Multilingual BERT (bert-base-multilingual-cased) for semantic understanding
  • Multi-Output Model: Single model predicting both profanity levels and speaker gender

4. ⚡ Production-Ready API

  • FastAPI Backend: High-performance async API with automatic docs
  • Rate Limiting: Prevent abuse with configurable request limits
  • Feedback Loop: Users can submit corrections to improve models
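Configurable rate limiting is often implemented as a per-client sliding window. The sketch below is an in-memory, single-process illustration under that assumption; it is not necessarily what `rate_limit.py` does. The class name is hypothetical, and the defaults simply mirror the `RATE_LIMIT_REQUESTS=100` and `RATE_LIMIT_WINDOW=60` example values (the window is assumed to be in seconds).

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within `window_seconds` (hypothetical helper)."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the current window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller would typically answer HTTP 429
        hits.append(now)
        return True
```

State lives in process memory, so multiple Uvicorn workers would each keep their own counts; a shared store would be needed for a global limit.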

5. 🎨 Premium Web Interface

  • Three.js Animations: Immersive 3D background effects
  • Glassmorphism Design: Modern, elegant UI with blur effects
  • Framer Motion: Smooth micro-animations and transitions

🛠 Technology Stack

Frontend (Client-Side)

Category | Technology
--- | ---
Framework | Next.js 16 (App Router)
Language | TypeScript (strict type safety)
UI Library | React 19
Styling | TailwindCSS v4
Animations | Framer Motion, Three.js
Icons | Lucide React
HTTP Client | Axios

Backend (Server-Side)

Category | Technology
--- | ---
Framework | FastAPI (Python 3.9+)
Server | Uvicorn (ASGI)
Validation | Pydantic v2
Serialization | Joblib

Artificial Intelligence & ML

Category | Technology
--- | ---
Deep Learning | TensorFlow/Keras
Transformers | PyTorch + Hugging Face Transformers
BERT Model | bert-base-multilingual-cased
Transliteration | AI4Bharat XlitEngine
Language Detection | langdetect
Data Processing | NumPy, scikit-learn

📂 Project Structure

Profanity-And-Offensiveness-Detection-Nepali/
├── backend/                    # Python FastAPI Application
│   ├── app/                    # Core Application
│   │   ├── core/               # Configuration & Settings
│   │   ├── main.py             # FastAPI App & Routes
│   │   ├── models.py           # Model Manager & Inference
│   │   ├── schemas.py          # Pydantic Models
│   │   └── rate_limit.py       # Rate Limiting Logic
│   ├── pkl/                    # Trained Models & Tokenizers
│   │   ├── Binomial_LSTM_Profane.h5
│   │   ├── Binomial_LSTM_Offensive.keras
│   │   ├── Mutlilabel_LSTM_Offensive_Profane.h5
│   │   └── Multi_Model_Multi_Output.h5
│   ├── run_backend.py          # Server Entry Point
│   ├── requirements.txt        # Python Dependencies
│   └── Dockerfile              # Container Configuration
├── frontend/                   # Next.js TypeScript Application
│   ├── src/
│   │   ├── app/                # App Router Pages
│   │   │   ├── page.tsx        # Landing Page
│   │   │   ├── demo/           # Demo Interface
│   │   │   ├── examples/       # Example Texts
│   │   │   └── authors/        # Research Team
│   │   ├── components/         # Reusable UI Components
│   │   │   ├── navbar.tsx
│   │   │   ├── three-background.tsx
│   │   │   └── ui/             # Design System
│   │   └── lib/                # Utilities
│   ├── package.json            # Node Dependencies
│   └── next.config.ts          # Next.js Configuration
├── image.png                   # Demo Screenshot
└── README.md                   # You are here

🧠 Model Information

Available Models

Model | File | Size | Accuracy | Description
--- | --- | --- | --- | ---
Profane Binary | Binomial_LSTM_Profane.h5 | 22 MB | 87.8% | Binary: Profane vs Non-Profane
Offensive Binary | Binomial_LSTM_Offensive.keras | 8 MB | 85%+ | Binary: Offensive vs Non-Offensive
Multilabel | Mutlilabel_LSTM_Offensive_Profane.h5 | 42 MB | 83%+ | 3-Class: Clean/Offensive/Profane
Multi-Output | Multi_Model_Multi_Output.h5 | 96 MB | - | Gender + Profanity combined
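The table does not say how raw model outputs become labels. A minimal sketch, assuming each binary model ends in a single sigmoid unit thresholded at 0.5 and the multilabel model ends in a 3-way softmax ordered Clean/Offensive/Profane (both assumptions; the helper names are hypothetical):

```python
from typing import Sequence, Tuple

MULTICLASS_LABELS = ("Clean", "Offensive", "Profane")  # assumed output order


def decode_binary(prob: float, positive: str, negative: str,
                  threshold: float = 0.5) -> Tuple[str, float]:
    """Map one sigmoid probability to (label, confidence of that label)."""
    if prob >= threshold:
        return positive, prob
    return negative, 1.0 - prob


def decode_multiclass(probs: Sequence[float],
                      labels: Sequence[str] = MULTICLASS_LABELS) -> Tuple[str, float]:
    """Pick the arg-max class of a softmax vector."""
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

For example, `decode_binary(0.2, "Profane", "Non-Profane")` reports "Non-Profane" with confidence 0.8, which matches the label/confidence shape of the API response.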

Technical Specifications

  • Max Sequence Length: 500 tokens
  • Padding Strategy: Post-padding
  • Tokenizers: Keras Tokenizer (saved as .pkl)
  • BERT Model: bert-base-multilingual-cased (768-dim embeddings)
  • N-gram Size: 2 (bigrams for BERT embeddings)
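The sequence-length and n-gram settings above can be illustrated in plain Python. This sketch mirrors what Keras `pad_sequences(..., maxlen=500, padding="post")` does for padding, but the truncation side is an assumption: the spec does not state it, and Keras defaults to `truncating="pre"` (keeping the last tokens), whereas this sketch keeps the first 500. How the bigrams are fed to BERT is not specified, so only bigram construction is shown; both function names are hypothetical.

```python
from typing import List, Sequence

MAX_LEN = 500  # "Max Sequence Length" from the specification


def pad_post(ids: Sequence[int], maxlen: int = MAX_LEN, value: int = 0) -> List[int]:
    """Truncate to `maxlen` (keeping the first tokens), then right-pad with `value`."""
    ids = list(ids[:maxlen])
    return ids + [value] * (maxlen - len(ids))


def word_ngrams(tokens: Sequence[str], n: int = 2) -> List[str]:
    """Join each run of `n` consecutive tokens with a space (bigrams when n=2)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Post-padding places real tokens at the start of the sequence, which is why it matters whether the LSTM layers mask the padding value.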

⚡ Quick Start Guide

Prerequisites

Requirement | Version
--- | ---
Node.js | v18+ (LTS recommended)
Python | v3.9+
Git | Latest

1. Clone Repository

git clone https://github.com/Tri-Yantra-Technologies/Profanity-And-Offensiveness-Detection-Nepali.git
cd Profanity-And-Offensiveness-Detection-Nepali

2. Backend Setup

cd backend

# Create Virtual Environment
python -m venv venv

# Activate Virtual Environment
# Windows:
.\venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install Dependencies
pip install -r requirements.txt

# Run Server
python run_backend.py

✅ Backend running at: http://localhost:8000

📚 API Docs: http://localhost:8000/api/docs

3. Frontend Setup

# Open new terminal
cd frontend

# Install Dependencies
npm install

# Run Development Server
npm run dev

✅ Frontend running at: http://localhost:3000

4. Environment Variables

Create .env files:

Backend (backend/.env):

FRONTEND_ORIGIN=http://localhost:3000
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=60
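A minimal sketch of how these backend variables might be read, assuming they are plain environment variables and that the example values above serve as fallbacks. The real `app/core` configuration may work differently (for instance via a settings library that also loads `.env`); `Settings` and `load_settings` are hypothetical names, and this sketch does not read the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    frontend_origin: str
    rate_limit_requests: int
    rate_limit_window: int  # assumed to be seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings, falling back to the documented example values."""
    return Settings(
        frontend_origin=env.get("FRONTEND_ORIGIN", "http://localhost:3000"),
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "100")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )
```

Passing the mapping explicitly makes the loader easy to test without touching the real process environment.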

Frontend (frontend/.env):

NEXT_PUBLIC_API_URL=http://localhost:8000

📚 API Reference

Core Endpoints

Method | Endpoint | Description
--- | --- | ---
POST | /predict | Analyze text for profanity/offensiveness
POST | /predict/gender | Predict speaker gender
POST | /analyze | Run all models simultaneously
GET | /health | Health check
GET | /models | List available models
GET | /meta | API metadata
POST | /feedback | Submit correction feedback

Example Request

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "तिमीलाई यो कुरा थाहा छ?", "model_type": "profane_binary"}'
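The same request can be sent from Python with only the standard library. This is a sketch, not an official client: the function names are hypothetical, the base URL matches `NEXT_PUBLIC_API_URL`, and `profane_binary` is the only `model_type` value documented here (other accepted values are not listed).

```python
import json
import urllib.request

API_URL = "http://localhost:8000"


def build_predict_request(text: str, model_type: str = "profane_binary",
                          base_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a UTF-8 JSON body."""
    body = json.dumps({"text": text, "model_type": model_type},
                      ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, model_type: str = "profane_binary",
            base_url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request to a running backend and decode the JSON response."""
    req = build_predict_request(text, model_type, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `predict("तिमीलाई यो कुरा थाहा छ?")` should return a dictionary shaped like the example response.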

Example Response

{
  "profanity": {
    "label": "Non-Profane",
    "confidence": 0.9823
  },
  "offensiveness": {
    "label": "Non-Offensive",
    "confidence": 0.9456
  },
  "latency_ms": 42.5,
  "model_used": "profane_binary"
}
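On the client side, the response fields can be parsed into typed objects. A minimal sketch based only on the fields in the example above; treating `profanity` and `offensiveness` as optional and the `is_flagged` moderation rule are assumptions, and all names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelScore:
    label: str
    confidence: float


@dataclass
class PredictResult:
    profanity: Optional[LabelScore]
    offensiveness: Optional[LabelScore]
    latency_ms: float
    model_used: str


def _score(obj: Optional[dict]) -> Optional[LabelScore]:
    return LabelScore(obj["label"], float(obj["confidence"])) if obj else None


def parse_predict_response(raw: str) -> PredictResult:
    """Parse a /predict JSON body; missing heads become None (assumption)."""
    data = json.loads(raw)
    return PredictResult(
        profanity=_score(data.get("profanity")),
        offensiveness=_score(data.get("offensiveness")),
        latency_ms=float(data["latency_ms"]),
        model_used=data["model_used"],
    )


def is_flagged(result: PredictResult, min_confidence: float = 0.5) -> bool:
    """Hypothetical rule: flag when any head returns a positive label confidently."""
    positives = {"Profane", "Offensive"}
    return any(s is not None and s.label in positives and s.confidence >= min_confidence
               for s in (result.profanity, result.offensiveness))
```

Applied to the example response, both heads are negative ("Non-Profane", "Non-Offensive"), so `is_flagged` returns False.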

📄 Research Paper

Profanity and Offensiveness Detection in Nepali Language Using Bi-directional LSTM Models

Abiral Adhikari, Prashant Manandhar, Reewaj Khanal, Samir Wagle, Praveen Acharya, Bal Krishna Bal

Proceedings of the 21st International Conference on Natural Language Processing (ICON), December 2024 • Chennai, India • NLP Association of India (NLPAI)

📖 Read Full Paper →

Citation (BibTeX)

@inproceedings{adhikari-etal-2024-profanity,
    title = "Profanity and Offensiveness Detection in {N}epali Language Using Bi-directional {LSTM} Models",
    author = "Adhikari, Abiral and Manandhar, Prashant and Khanal, Reewaj and Wagle, Samir and Acharya, Praveen and Bal, Bal Krishna",
    editor = "Lalitha Devi, Sobha and Arora, Karunesh",
    booktitle = "Proceedings of the 21st International Conference on Natural Language Processing (ICON)",
    month = dec,
    year = "2024",
    address = "AU-KBC Research Centre, Chennai, India",
    publisher = "NLP Association of India (NLPAI)",
    url = "https://aclanthology.org/2024.icon-1.60/",
    pages = "515--521"
}

👥 Research Team

Name | Profile
--- | ---
Abiral Adhikari | ResearchGate
Prashant Manandhar | Website
Reewaj Khanal | Website
Samir Wagle | Website
Praveen Acharya | ACL Anthology
Bal Krishna Bal | ACL Anthology

🤝 Contributing

Contributions are welcome! Please follow the standard "Fork-and-Pull" workflow:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Ensure you lint your code and test thoroughly before submitting a PR.


📝 License

This project is licensed under the MIT License: free for personal, academic, and commercial use.


🙏 Acknowledgments


Empowering Safer Nepali Digital Spaces 🇳🇵

📄 Paper • 🐛 Report Bug • 💡 Request Feature
