Nishit Bohra nishitbohra

Nishit Bohra M

AI/ML Engineer · RAG Systems · Deep Learning Research

M.Tech AI/ML · Symbiosis Institute of Technology, Pune
Production AI @ John Deere · Q1 Springer Author · Gold Medalist B.Tech (Rank 1, CGPA 9.37)

About

I build AI systems that work in production - not just in notebooks.

At John Deere, I engineered an enterprise RAG system over 60,000+ technical manuals, saving 300+ engineering hours across 5 teams and cutting LLM inference costs by 20%. I extended it with a CLIP-based multimodal pipeline for diagram and image retrieval.

In parallel, I published a Q1 Springer journal paper on heterogeneous graph neural networks for wind power forecasting across 200+ turbines - research that directly addresses grid stability and renewable energy integration.

My work spans: large-scale retrieval systems · multimodal ML · GNN-based spatio-temporal modeling · medical image segmentation · low-resource NLP.

Currently open to AI/ML Engineering roles in Pune (Baner · Balewadi · Hinjewadi) - available 2026.

Technical Skills

Domain	Stack
ML & Deep Learning	PyTorch · TensorFlow · Scikit-learn · Hugging Face · Keras
RAG & LLMs	LangChain · FAISS · OpenSearch · OpenAI API · Prompt Engineering · Embedding Generation
Computer Vision	OpenCV · YOLOv8 · U-Net · CLIP · Multimodal ML · Medical Imaging
NLP	Transformers · XLM-RoBERTa · GNNs · Sentiment Analysis · Cross-lingual Transfer
Infrastructure & MLOps	Databricks · FastAPI · Flask · Streamlit · Google Cloud · MLflow · REST APIs
Languages	Python · SQL · R

Featured Projects

🔍 Enterprise RAG System - John Deere (Production · 2025)

Semantic search and retrieval over 60K+ engineering manuals for cross-functional teams

Problem: Engineers manually searched 60K+ technical PDFs with no unified interface - creating bottlenecks across 5 teams and costing hundreds of hours per quarter.

Approach: Built a hybrid retrieval pipeline - Hugging Face embeddings → dual-index (FAISS dense + OpenSearch sparse) → LLM reranking → response generation. Extended with a CLIP-based multimodal pipeline for cross-modal text-image search over diagrams and schematics.
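A minimal sketch of how a dense and a sparse result list can be merged before reranking. It uses Reciprocal Rank Fusion, which is an assumption for illustration; the production fusion step is not described here. The document IDs and the `reciprocal_rank_fusion` helper are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from each index.
dense_hits = ["manual_17", "manual_03", "manual_42"]   # e.g. from a FAISS index
sparse_hits = ["manual_03", "manual_99", "manual_17"]  # e.g. from OpenSearch BM25

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, so the fused list is a reasonable candidate set to hand to an LLM reranker.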

Result: 300+ engineering hours saved · 20% LLM cost reduction via response caching · ~30% faster retrieval · 100K+ documents processed on Databricks
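The response caching behind the cost reduction can be illustrated with a small sketch. It assumes an in-memory LRU cache keyed on a normalized query string; the `ResponseCache` class, the normalization rule, and the `fake_llm` stand-in are all hypothetical and not the production design.

```python
import hashlib
from collections import OrderedDict


class ResponseCache:
    """In-memory LRU cache for generated answers, keyed on the normalized query."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Lowercase and collapse whitespace so trivially different
        # phrasings of the same query share one cache entry.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, query, generate):
        key = self._key(query)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = generate(query)  # the expensive LLM call happens only on a miss
        self._store[key] = answer
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return answer


def fake_llm(query):
    # Hypothetical stand-in for a paid LLM call.
    return f"answer to: {query}"


cache = ResponseCache(max_entries=2)
first = cache.get_or_compute("How do I reset the hydraulic pump?", fake_llm)
second = cache.get_or_compute("how do I reset the   hydraulic pump?", fake_llm)
print(cache.hits, cache.misses)
```

The second call differs only in case and spacing, so it is served from the cache without a second model call.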

PyTorch LangChain FAISS OpenSearch Databricks OpenAI Hugging Face

🌊 GangaFlow - River Pollution Detection (IEEE ICoICC 2025)

Real-time aerial pollution detection from drone imagery using a unified multi-model CV pipeline

Problem: Manual Ganga river monitoring is expensive, inconsistent, and cannot scale. Drone surveys produce terabytes of imagery with no automated analysis pipeline.

Approach: Unified detect-then-segment architecture - YOLOv8 for object detection → U-Net for spatial segmentation → multi-class severity classification → automated report generation.
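The detect-then-segment flow can be sketched with stub models. In this sketch the detector and segmenter are plain hypothetical functions standing in for YOLOv8 and U-Net, images are nested lists of grayscale values, and the `coverage` field is an illustrative quantity rather than the published severity measure.

```python
def crop(image, box):
    """Return the sub-image inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def detect_then_segment(image, detector, segmenter):
    """Run the detector on the full frame, then segment each detected region."""
    results = []
    for box, label in detector(image):
        mask = segmenter(crop(image, box))  # binary mask for this region only
        total = sum(len(row) for row in mask)
        covered = sum(sum(row) for row in mask)
        results.append({
            "box": box,
            "label": label,
            "coverage": covered / total if total else 0.0,
        })
    return results


def stub_detector(image):
    # Hypothetical detector output: one box with a class label.
    return [((1, 1, 3, 3), "debris")]


def stub_segmenter(patch):
    # Hypothetical segmenter: mark bright pixels as polluted.
    return [[1 if pixel > 128 else 0 for pixel in row] for row in patch]


image = [
    [0, 0, 0, 0],
    [0, 200, 210, 0],
    [0, 90, 220, 0],
    [0, 0, 0, 0],
]
results = detect_then_segment(image, stub_detector, stub_segmenter)
print(results)
```

Segmenting only inside detected boxes keeps the pixel-level model focused on regions the detector already flagged.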

Result: >90% detection accuracy · ~20% improvement in localization precision · Published at IEEE ICoICC 2025

Python YOLOv8 U-Net AlexNet OpenCV

⚡ Heterogeneous Graph-KAN - Wind Power Forecasting (Q1 Springer · 2026)

Spatio-temporal forecasting across 200+ wind turbines using a novel GNN + KAN architecture

Problem: Standard time-series models ignore spatial dependencies - wake effects, proximity correlations, turbine interactions - that are critical for accurate farm-level forecasting.

Approach: Heterogeneous graph construction (wake, proximity, correlation edges) → GNN spatial encoding → Kolmogorov-Arnold Networks for interpretable temporal modeling → multi-horizon prediction.
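A minimal sketch of typed edge construction for two of the three edge types above. The distance radius, the correlation threshold, the coordinates, and the power series are all hypothetical values chosen for illustration. Wake edges are omitted because they need wind-direction data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_typed_edges(coords, power, radius_m=500.0, min_corr=0.8):
    """Build undirected edges per type between every turbine pair."""
    edges = {"proximity": [], "correlation": []}
    ids = sorted(coords)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Proximity edge: turbines within radius_m of each other.
            if math.dist(coords[a], coords[b]) <= radius_m:
                edges["proximity"].append((a, b))
            # Correlation edge: strongly co-varying power output.
            if pearson(power[a], power[b]) >= min_corr:
                edges["correlation"].append((a, b))
    return edges


# Hypothetical turbine positions (metres) and power readings.
coords = {"T1": (0.0, 0.0), "T2": (300.0, 400.0), "T3": (2000.0, 0.0)}
power = {
    "T1": [1.0, 2.0, 3.0, 4.0],
    "T2": [4.0, 3.0, 2.0, 1.0],
    "T3": [2.0, 4.0, 6.0, 8.0],
}
edges = build_typed_edges(coords, power)
print(edges)
```

Keeping each edge type in its own list lets a heterogeneous GNN learn separate message-passing weights per relation.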

Result: Outperforms standard GNN baselines · Published in Smart Grids and Sustainable Energy, Springer (Q1)

PyTorch GNNs KAN Attention Mechanisms Time-Series Forecasting

💬 Marathi Sentiment Analysis - Hybrid XLM-R + CNN

High-accuracy sentiment classification for a morphologically complex low-resource language (83M speakers)

Problem: Standard transformers underperform on Marathi because subword tokenization is poorly matched to the language's morphologically rich structure.

Approach: Hybrid architecture - XLM-RoBERTa fine-tuning (cross-lingual transfer from 100 languages) + CNN feature extraction → feature fusion for both contextual and local representations.

Result: >80% accuracy · +3–5% F1-score improvement over transformer baseline · 60K+ sentence dataset

Hugging Face XLM-RoBERTa PyTorch CNN Scikit-learn

🔧 Mild Steel Degradation Detection (First Runner-up · Ninja Hack 2K25)

Automated corrosion detection and severity classification for industrial quality control

Problem: Manual inspection of mild steel is subjective and misses early-stage corrosion - a safety and cost risk at scale.

Approach: Multi-stage CV pipeline - CLAHE preprocessing → SSIM-based structural analysis → adaptive thresholding → morphological operations → CNN classification → automated Streamlit report.
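The SSIM step can be illustrated with a single-window version of the standard formula, using the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2. This is a simplified sketch: image libraries typically compute SSIM over sliding local windows, and the reference and sample patches below are hypothetical.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    Images are nested lists of pixel intensities. Returns 1.0 for identical
    images and lower values as structure diverges.
    """
    a = [pixel for row in img_a for pixel in row]
    b = [pixel for row in img_b for pixel in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Hypothetical 2x2 patches: a clean reference and a sample with dark pitting.
reference = [[200, 200], [200, 200]]
sample = [[200, 120], [90, 200]]

same_score = global_ssim(reference, reference)
sample_score = global_ssim(reference, sample)
print(same_score, sample_score)
```

A low score against a clean reference patch marks a region as structurally different, which makes it a candidate for the later thresholding and classification stages.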

Result: >90% classification accuracy · ~40% reduction in manual inspection effort

Python OpenCV SSIM CLAHE Streamlit

Research & Publications

Venue	Title	Year
Q1 · Springer	Heterogeneous Graph Kolmogorov-Arnold Networks for Spatio-Temporal Wind Power Forecasting - Smart Grids and Sustainable Energy	2026
IEEE	GangaFlow: A Multi-Model Deep Learning Framework for Real-Time River Pollution Detection Using Drone Imagery - ICoICC 2025	2025
Scopus	Natural Language Processing, Large Language Models, and Multimodal AI Systems - Wiley-Scrivener	2025
Scopus + WoS	Intelligent Horizons: AI and the Evolution to 6G Networks - River Publishers	2025
Patent Filed	LEARN-UP: An Interactive Game-Based Learning Application - App No. 202441042008 A	2024

📎 View all on Google Scholar →

Portfolio · LinkedIn · Google Scholar · Email

Open to AI/ML Engineering roles · Pune · Available 2026
