nishitbohra/README.md

Nishit Bohra M

AI/ML Engineer  ·  RAG Systems  ·  Deep Learning Research

     

M.Tech AI/ML · Symbiosis Institute of Technology, Pune
Production AI @ John Deere  ·  Q1 Springer Author  ·  Gold Medalist B.Tech (Rank 1, CGPA 9.37)


About

I build AI systems that work in production - not just in notebooks.

At John Deere, I engineered an enterprise RAG system over 60,000+ technical manuals, saving 300+ engineering hours across 5 teams and cutting LLM inference costs by 20%. I extended it with a CLIP-based multimodal pipeline for diagram and image retrieval.

Simultaneously, I published a Q1 Springer journal paper on heterogeneous graph neural networks for wind power forecasting across 200+ turbines - research that directly addresses grid stability and renewable energy integration.

My work spans: large-scale retrieval systems · multimodal ML · GNN-based spatio-temporal modeling · medical image segmentation · low-resource NLP.

Currently open to AI/ML Engineering roles in Pune (Baner · Balewadi · Hinjewadi) - available 2026.


Technical Skills

Domain | Stack
ML & Deep Learning | PyTorch · TensorFlow · Scikit-learn · Hugging Face · Keras
RAG & LLMs | LangChain · FAISS · OpenSearch · OpenAI API · Prompt Engineering · Embedding Generation
Computer Vision | OpenCV · YOLOv8 · U-Net · CLIP · Multimodal ML · Medical Imaging
NLP | Transformers · XLM-RoBERTa · GNNs · Sentiment Analysis · Cross-lingual Transfer
Infrastructure & MLOps | Databricks · FastAPI · Flask · Streamlit · Google Cloud · MLflow · REST APIs
Languages | Python · SQL · R

Featured Projects

🔍 Enterprise RAG System - John Deere (Production · 2025)

Semantic search and retrieval over 60K+ engineering manuals for cross-functional teams

Problem: Engineers manually searched 60K+ technical PDFs with no unified interface - creating bottlenecks across 5 teams and costing hundreds of hours per quarter.

Approach: Built a hybrid retrieval pipeline - Hugging Face embeddings → dual-index (FAISS dense + OpenSearch sparse) → LLM reranking → response generation. Extended with a CLIP-based multimodal pipeline for cross-modal text-image search over diagrams and schematics.
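
How the dense (FAISS) and sparse (OpenSearch) result lists are merged before reranking is not stated here. One common option is reciprocal rank fusion, and the sketch below assumes that choice. The function name, the document IDs, and the constant `k=60` are illustrative, not taken from the production system.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc IDs; each doc scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["manual_12", "manual_07", "manual_31"]   # hypothetical FAISS results
sparse_hits = ["manual_07", "manual_44", "manual_12"]  # hypothetical OpenSearch results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that appear in both lists rise to the top, which gives the LLM reranker a shorter and cleaner candidate set.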

Result: 300+ engineering hours saved · 20% LLM cost reduction via response caching · ~30% faster retrieval · 100K+ documents processed on Databricks
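
The cost reduction above is attributed to response caching. A minimal sketch of one way such a cache could work, assuming exact-match lookup on a normalized query string; `ResponseCache` and `fake_llm` are hypothetical names, and the real system may key on something else (for example, embeddings).

```python
import hashlib

class ResponseCache:
    """Exact-match cache keyed on a hash of the normalized query text."""

    def __init__(self, generate):
        self._generate = generate   # callable that performs the paid LLM call
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Lowercase and collapse whitespace so trivial variants share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def answer(self, query):
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._generate(query)
        self._store[key] = response
        return response

def fake_llm(query):
    # Hypothetical stand-in for the real LLM call.
    return f"answer to: {query}"

cache = ResponseCache(fake_llm)
first = cache.answer("How do I reset the hydraulic pump?")
second = cache.answer("  how do I reset the   hydraulic pump? ")
```

The second call differs only in case and spacing, so it is served from the cache without invoking the model.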

PyTorch LangChain FAISS OpenSearch Databricks OpenAI Hugging Face


🌊 GangaFlow - River Pollution Detection (IEEE ICoICC 2025)

Real-time aerial pollution detection from drone imagery using a unified multi-model CV pipeline

Problem: Manual Ganga river monitoring is expensive, inconsistent, and cannot scale. Drone surveys produce terabytes of imagery with no automated analysis pipeline.

Approach: Unified detect-then-segment architecture - YOLOv8 for object detection → U-Net for spatial segmentation → multi-class severity classification → automated report generation.
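
The hand-off between the detection and segmentation stages can be sketched as follows. This assumes the segmenter runs on padded crops around confident detections, which is one plausible reading of "detect-then-segment"; the function names, the confidence threshold, and the padding are illustrative.

```python
def padded_crop(box, image_w, image_h, pad_px=10):
    """Expand an (x1, y1, x2, y2) box by pad_px pixels, clipped to the image."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - pad_px),
        max(0, y1 - pad_px),
        min(image_w, x2 + pad_px),
        min(image_h, y2 + pad_px),
    )

def regions_for_segmentation(detections, image_w, image_h, min_conf=0.5):
    """Keep confident detections and return the crops to pass to the segmenter."""
    return [
        padded_crop(d["box"], image_w, image_h)
        for d in detections
        if d["conf"] >= min_conf
    ]

# Hypothetical detector output for a 1000x600 drone frame.
detections = [
    {"box": (100, 200, 300, 400), "conf": 0.91},
    {"box": (0, 0, 50, 50), "conf": 0.30},        # below threshold, dropped
    {"box": (900, 500, 1000, 600), "conf": 0.75}, # padding clipped at the border
]
crops = regions_for_segmentation(detections, image_w=1000, image_h=600)
```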

Result: >90% detection accuracy · ~20% improvement in localization precision · Published at IEEE ICoICC 2025

Python YOLOv8 U-Net AlexNet OpenCV


⚡ Heterogeneous Graph-KAN - Wind Power Forecasting (Q1 Springer · 2026)

Spatio-temporal forecasting across 200+ wind turbines using a novel GNN + KAN architecture

Problem: Standard time-series models ignore spatial dependencies - wake effects, proximity correlations, turbine interactions - that are critical for accurate farm-level forecasting.

Approach: Heterogeneous graph construction (wake, proximity, correlation edges) → GNN spatial encoding → Kolmogorov-Arnold Networks for interpretable temporal modeling → multi-horizon prediction.
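
As a rough illustration of the graph-construction step, the sketch below builds two of the three edge types from turbine coordinates and power histories. The thresholds, turbine IDs, and data are made up, and wake edges are omitted because they depend on wind direction and a wake model that are not described here.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def build_typed_edges(positions, power, max_dist=600.0, min_corr=0.9):
    """Return undirected edges grouped by type: 'proximity' and 'correlation'."""
    edges = {"proximity": [], "correlation": []}
    for i, j in combinations(sorted(positions), 2):
        if math.dist(positions[i], positions[j]) <= max_dist:
            edges["proximity"].append((i, j))
        if pearson(power[i], power[j]) >= min_corr:
            edges["correlation"].append((i, j))
    return edges

# Hypothetical turbine coordinates (metres) and power readings.
positions = {"T1": (0.0, 0.0), "T2": (300.0, 400.0), "T3": (2000.0, 0.0)}
power = {
    "T1": [1.0, 2.0, 3.0, 4.0],
    "T2": [4.0, 3.0, 2.0, 1.0],
    "T3": [2.0, 4.0, 6.0, 8.0],
}
edges = build_typed_edges(positions, power)
```

Here T1 and T2 are neighbours but anti-correlated, while T1 and T3 are far apart but move together, so each pair ends up in a different edge set. Keeping the edge types separate is what makes the graph heterogeneous.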

Result: Outperforms standard GNN baselines · Published in Smart Grids and Sustainable Energy, Springer (Q1)

PyTorch GNNs KAN Attention Mechanisms Time-Series Forecasting


💬 Marathi Sentiment Analysis - Hybrid XLM-R + CNN

High-accuracy sentiment classification for a morphologically complex low-resource language (83M speakers)

Problem: Standard transformers underperform on Marathi because subword tokenization fits poorly with the language's rich morphology.

Approach: Hybrid architecture - XLM-RoBERTa fine-tuning (cross-lingual transfer from 100 languages) + CNN feature extraction → feature fusion for both contextual and local representations.
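
The fusion step can be pictured as concatenating a sentence-level vector with max-pooled convolution features over the token vectors. The toy sketch below uses plain Python lists and hand-picked filter weights purely for illustration; the actual model would use learned PyTorch layers, and all names and values here are assumptions.

```python
def conv1d_max(token_vectors, kernel):
    """Slide `kernel` (one weight vector per position) over the tokens and
    return the largest activation, i.e. one max-pooled CNN feature."""
    width = len(kernel)
    activations = []
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        activations.append(sum(
            w * x
            for weights, vec in zip(kernel, window)
            for w, x in zip(weights, vec)
        ))
    return max(activations)

def fuse(sentence_vector, token_vectors, kernels):
    """Concatenate the contextual vector with one local feature per kernel."""
    local = [conv1d_max(token_vectors, k) for k in kernels]
    return list(sentence_vector) + local

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in token embeddings
sentence_vec = [0.5, -0.5]                     # stand-in pooled transformer output
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # width-2 (bigram) filter
    [[1.0, 1.0]],              # width-1 (unigram) filter
]
fused = fuse(sentence_vec, tokens, kernels)
```

The fused vector carries both the contextual representation and the local n-gram features, and would feed the final classification layer.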

Result: >80% accuracy · +3–5% F1-score improvement over transformer baseline · 60K+ sentence dataset

Hugging Face XLM-RoBERTa PyTorch CNN Scikit-learn


🔧 Mild Steel Degradation Detection (First Runner-up · Ninja Hack 2K25)

Automated corrosion detection and severity classification for industrial quality control

Problem: Manual inspection of mild steel is subjective and misses early-stage corrosion - a safety and cost risk at scale.

Approach: Multi-stage CV pipeline - CLAHE preprocessing → SSIM-based structural analysis → adaptive thresholding → morphological operations → CNN classification → automated Streamlit report.
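
To make the SSIM stage concrete, here is a simplified sketch that computes the index once over a whole patch. Library implementations typically average SSIM over sliding windows; the pixel values, function name, and the single-window simplification are assumptions for illustration.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM computed once over two equal-length lists of grayscale pixel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

reference = [120, 122, 119, 121, 120, 123]  # hypothetical clean-steel patch
corroded = [120, 60, 119, 40, 120, 55]      # hypothetical degraded patch
same = global_ssim(reference, reference)
changed = global_ssim(reference, corroded)
```

An identical patch scores 1, and the score drops as structure diverges from the reference, which is the signal a later thresholding step can act on.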

Result: >90% classification accuracy · ~40% reduction in manual inspection effort

Python OpenCV SSIM CLAHE Streamlit


Research & Publications

Venue | Title | Year
Q1 · Springer | Heterogeneous Graph Kolmogorov-Arnold Networks for Spatio-Temporal Wind Power Forecasting - Smart Grids and Sustainable Energy | 2026
IEEE | GangaFlow: A Multi-Model Deep Learning Framework for Real-Time River Pollution Detection Using Drone Imagery - ICoICC 2025 | 2025
Scopus | Natural Language Processing, Large Language Models, and Multimodal AI Systems - Wiley-Scrivener | 2025
Scopus + WoS | Intelligent Horizons: AI and the Evolution to 6G Networks - River Publishers | 2025
Patent Filed | LEARN-UP: An Interactive Game-Based Learning Application - App No. 202441042008 A | 2024

📎 View all on Google Scholar →



Portfolio  ·  LinkedIn  ·  Google Scholar  ·  Email

Open to AI/ML Engineering roles · Pune · Available 2026

Pinned

  1. Saara--The-digital-Assistant Public

    This project provides facilities for users to operate their computer systems through commands given to the system. The software recognizes the user's voice and acts accordi…

    Python

  2. Resume_Parser Public

    A resume parser is an AI-powered tool that extracts key information from resumes, simplifying the recruitment process by quickly identifying top candidates and organizing data into a usable format.

    Jupyter Notebook

  3. Eye-Detection-System Public

    An eye detection system using Python is a computer vision application that uses image processing techniques to detect eyes in an image or video stream. The system uses a pre-trained classifier to i…

    Jupyter Notebook 3