NepSense is an NLP tool that detects profanity, offensiveness, and speaker gender in Nepali text. It combines deep learning with Nepali linguistic context to serve both users who want safer online spaces and platforms that need content moderation.
Offensive and profane content is on the rise on Nepali social media, to the distress of users. This is partly because the Nepali language lacks proper tools and mechanisms for handling profane and offensive text. In this work, we develop a Bi-LSTM (Bidirectional Long Short-Term Memory) model for classifying profane and offensive comments.
By combining Multilingual BERT embeddings with custom vocabulary embeddings, NepSense captures contextual meaning beyond simple keyword matching. Whereas previous studies on Nepali focus mainly on sentiment and offensiveness detection, our study treats profanity detection and offensiveness detection as two distinct tasks.
Based on peer-reviewed research published at ICON 2024 (21st International Conference on Natural Language Processing)
- Profanity Detection: Identifies vulgar, obscene, and swear words in Nepali text.
- Offensiveness Detection: Detects disrespectful, insulting, or harmful content.
- Gender Identification: Predicts the gender of the speaker based on text patterns.
- Devanagari Script: Native Nepali text (नेपाली)
- Romanized Nepali: Latin-script transliteration (automatically converted to Devanagari)
- Automatic Transliteration: AI4Bharat XlitEngine converts Romanized input to Devanagari
- Bi-LSTM Models: Capture bidirectional context for accurate classification
- BERT Embeddings: Multilingual BERT (`bert-base-multilingual-cased`) for semantic understanding
- Multi-Output Model: Single model predicting both profanity levels and speaker gender
- FastAPI Backend: High-performance async API with automatic docs
- Rate Limiting: Prevent abuse with configurable request limits
- Feedback Loop: Users can submit corrections to improve models
- Three.js Animations: Immersive 3D background effects
- Glassmorphism Design: Modern, elegant UI with blur effects
- Framer Motion: Smooth micro-animations and transitions
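As a rough illustration of the Devanagari and Romanized input handling listed above, the sketch below decides whether a text looks Romanized by counting characters in the Devanagari Unicode block (U+0900 to U+097F) against ASCII letters. The function names `count_scripts` and `is_romanized` are hypothetical, and this is not the project's actual routing logic, which relies on langdetect and the AI4Bharat XlitEngine; the sketch calls neither library.

```python
from typing import Tuple


def count_scripts(text: str) -> Tuple[int, int]:
    """Return (devanagari_count, latin_count) for the given text."""
    # Devanagari block: U+0900..U+097F (includes vowel signs and the danda).
    deva = sum(1 for ch in text if "\u0900" <= ch <= "\u097f")
    # ASCII letters only; digits, spaces, and punctuation are ignored.
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return deva, latin


def is_romanized(text: str) -> bool:
    """Assumed rule: treat input as Romanized when Latin letters outnumber Devanagari."""
    deva, latin = count_scripts(text)
    return latin > deva


if __name__ == "__main__":
    for sample in ["नेपाली", "timi kasto chau"]:
        print(sample, "-> transliterate first" if is_romanized(sample) else "-> use as-is")
```

A majority rule like this is deliberately simple; mixed-script comments would need a per-word decision instead.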
| Category | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript (Strict type safety) |
| UI Library | React 19 |
| Styling | TailwindCSS v4 |
| Animations | Framer Motion, Three.js |
| Icons | Lucide React |
| HTTP Client | Axios |
| Category | Technology |
|---|---|
| Framework | FastAPI (Python 3.9+) |
| Server | Uvicorn (ASGI) |
| Validation | Pydantic v2 |
| Serialization | Joblib |
| Category | Technology |
|---|---|
| Deep Learning | TensorFlow/Keras |
| Transformers | PyTorch + HuggingFace Transformers |
| BERT Model | bert-base-multilingual-cased |
| Transliteration | AI4Bharat XlitEngine |
| Language Detection | langdetect |
| Data Processing | NumPy, scikit-learn |
Profanity-And-Offensiveness-Detection-Nepali/
├── backend/                        # Python FastAPI Application
│   ├── app/                        # Core Application
│   │   ├── core/                   # Configuration & Settings
│   │   ├── main.py                 # FastAPI App & Routes
│   │   ├── models.py               # Model Manager & Inference
│   │   ├── schemas.py              # Pydantic Models
│   │   └── rate_limit.py           # Rate Limiting Logic
│   ├── pkl/                        # Trained Models & Tokenizers
│   │   ├── Binomial_LSTM_Profane.h5
│   │   ├── Binomial_LSTM_Offensive.keras
│   │   ├── Mutlilabel_LSTM_Offensive_Profane.h5
│   │   └── Multi_Model_Multi_Output.h5
│   ├── run_backend.py              # Server Entry Point
│   ├── requirements.txt            # Python Dependencies
│   └── Dockerfile                  # Container Configuration
├── frontend/                       # Next.js TypeScript Application
│   ├── src/
│   │   ├── app/                    # App Router Pages
│   │   │   ├── page.tsx            # Landing Page
│   │   │   ├── demo/               # Demo Interface
│   │   │   ├── examples/           # Example Texts
│   │   │   └── authors/            # Research Team
│   │   ├── components/             # Reusable UI Components
│   │   │   ├── navbar.tsx
│   │   │   ├── three-background.tsx
│   │   │   └── ui/                 # Design System
│   │   └── lib/                    # Utilities
│   ├── package.json                # Node Dependencies
│   └── next.config.ts              # Next.js Configuration
├── image.png                       # Demo Screenshot
└── README.md                       # You are here
| Model | File | Size | Accuracy | Description |
|---|---|---|---|---|
| Profane Binary | `Binomial_LSTM_Profane.h5` | 22 MB | 87.8% | Binary: Profane vs Non-Profane |
| Offensive Binary | `Binomial_LSTM_Offensive.keras` | 8 MB | 85%+ | Binary: Offensive vs Non-Offensive |
| Multilabel | `Mutlilabel_LSTM_Offensive_Profane.h5` | 42 MB | 83%+ | 3-Class: Clean/Offensive/Profane |
| Multi-Output | `Multi_Model_Multi_Output.h5` | 96 MB | - | Gender + Profanity combined |
- Max Sequence Length: 500 tokens
- Padding Strategy: Post-padding
- Tokenizers: Keras Tokenizer (saved as `.pkl`)
- BERT Model: `bert-base-multilingual-cased` (768-dim embeddings)
- N-gram Size: 2 (bigrams for BERT embeddings)
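To make the sequence settings above concrete, here is a minimal pure-Python sketch of post-padding to 500 tokens and of bigram construction. The helper names `post_pad` and `bigrams` are hypothetical, the pad value of 0 is an assumption, and so is the choice to keep the first 500 tokens when truncating (the settings above do not say which side is trimmed). How the bigrams are fed to BERT is also not specified here, so the sketch only shows how adjacent tokens pair up.

```python
from typing import List

MAX_LEN = 500  # max sequence length stated in the configuration above


def post_pad(seq: List[int], max_len: int = MAX_LEN, pad_value: int = 0) -> List[int]:
    """Keep at most max_len ids, then append pad_value on the right (post-padding)."""
    trimmed = seq[:max_len]  # assumption: drop tokens from the end when too long
    return trimmed + [pad_value] * (max_len - len(trimmed))


def bigrams(tokens: List[str]) -> List[str]:
    """Join each pair of adjacent tokens into one bigram string (n-gram size 2)."""
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


if __name__ == "__main__":
    print(post_pad([5, 3, 9], max_len=5))
    print(bigrams(["yo", "kura", "thaha", "cha"]))
```

In Keras, `pad_sequences(sequences, maxlen=500, padding="post")` performs the padding step; note that its `truncating` argument defaults to `"pre"`, which differs from the assumption made in this sketch.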
| Requirement | Version |
|---|---|
| Node.js | v18+ (LTS recommended) |
| Python | v3.9+ |
| Git | Latest |
git clone https://github.com/Tri-Yantra-Technologies/Profanity-And-Offensiveness-Detection-Nepali.git
cd Profanity-And-Offensiveness-Detection-Nepali
cd backend
# Create Virtual Environment
python -m venv venv
# Activate Virtual Environment
# Windows:
.\venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
# Install Dependencies
pip install -r requirements.txt
# Run Server
python run_backend.py
Backend running at: http://localhost:8000
API Docs: http://localhost:8000/api/docs
# Open new terminal
cd frontend
# Install Dependencies
npm install
# Run Development Server
npm run dev
Frontend running at: http://localhost:3000
Create .env files:
Backend (backend/.env):
FRONTEND_ORIGIN=http://localhost:3000
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=60
Frontend (frontend/.env):
NEXT_PUBLIC_API_URL=http://localhost:8000

| Method | Endpoint | Description |
|---|---|---|
| POST | `/predict` | Analyze text for profanity/offensiveness |
| POST | `/predict/gender` | Predict speaker gender |
| POST | `/analyze` | Run all models simultaneously |
| GET | `/health` | Health check |
| GET | `/models` | List available models |
| GET | `/meta` | API metadata |
| POST | `/feedback` | Submit correction feedback |
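
Requests to these endpoints are subject to the `RATE_LIMIT_REQUESTS` and `RATE_LIMIT_WINDOW` settings from the backend `.env`. The sketch below shows one way such a limit could be enforced, using a sliding window per client. It is an illustration only, not the contents of `rate_limit.py`: the class name `SlidingWindowLimiter` is hypothetical, and reading `RATE_LIMIT_WINDOW` as a number of seconds is an assumption.

```python
import os
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

# Defaults mirror the example .env values; the unit (seconds) is assumed.
MAX_REQUESTS = int(os.environ.get("RATE_LIMIT_REQUESTS", "100"))
WINDOW_SECONDS = int(os.environ.get("RATE_LIMIT_WINDOW", "60"))


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int = MAX_REQUESTS,
                 window_seconds: int = WINDOW_SECONDS) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
    print([limiter.allow("203.0.113.7") for _ in range(3)])
```

An API would typically call `allow()` with the client address before running inference and answer with HTTP 429 when it returns `False`.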
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "तिमीलाई यो कुरा थाहा छ?", "model_type": "profane_binary"}'

{
  "profanity": {
    "label": "Non-Profane",
    "confidence": 0.9823
  },
  "offensiveness": {
    "label": "Non-Offensive",
    "confidence": 0.9456
  },
  "latency_ms": 42.5,
  "model_used": "profane_binary"
}

Profanity and Offensiveness Detection in Nepali Language Using Bi-directional LSTM Models
Abiral Adhikari, Prashant Manandhar, Reewaj Khanal, Samir Wagle, Praveen Acharya, Bal Krishna Bal
Proceedings of the 21st International Conference on Natural Language Processing (ICON) • December 2024 • Chennai, India • NLP Association of India (NLPAI)
Read the full paper: https://aclanthology.org/2024.icon-1.60/
@inproceedings{adhikari-etal-2024-profanity,
title = "Profanity and Offensiveness Detection in {N}epali Language Using Bi-directional {LSTM} Models",
author = "Adhikari, Abiral and Manandhar, Prashant and Khanal, Reewaj and Wagle, Samir and Acharya, Praveen and Bal, Bal Krishna",
editor = "Lalitha Devi, Sobha and Arora, Karunesh",
booktitle = "Proceedings of the 21st International Conference on Natural Language Processing (ICON)",
month = dec,
year = "2024",
address = "AU-KBC Research Centre, Chennai, India",
publisher = "NLP Association of India (NLPAI)",
url = "https://aclanthology.org/2024.icon-1.60/",
pages = "515--521"
}

| Name | Profile |
|---|---|
| Abiral Adhikari | ResearchGate |
| Prashant Manandhar | Website |
| Reewaj Khanal | Website |
| Samir Wagle | Website |
| Praveen Acharya | ACL Anthology |
| Bal Krishna Bal | ACL Anthology |
Contributions are welcome! Please follow the standard "Fork-and-Pull" workflow:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Ensure you lint your code and test thoroughly before submitting a PR.
This project is licensed under the MIT License and is free for personal, academic, and commercial use.
- AI4Bharat: Transliteration engine
- Hugging Face: BERT model hosting
- NLP Association of India: ICON 2024 organizers
Empowering Safer Nepali Digital Spaces 🇳🇵
