LexiScan Auto: Intelligent Document Processing

LexiScan Auto is an AI-powered tool that extracts key entities and information from legal contracts and other documents using Natural Language Processing (NLP) and Optical Character Recognition (OCR).
🚀 Features
OCR Integration: Uses Tesseract OCR to read text from images and scanned PDFs.
NLP Extraction: Extracts legal entities, dates, and contract terms using spaCy.
Flask API: A lightweight backend to handle document uploads and processing.
Docker Ready: Includes a Dockerfile for containerized deployment.
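
The OCR and NLP features above could be chained roughly as follows. This is a minimal sketch, not the project's actual code: the function names `ocr_pdf`, `group_entities`, and `extract_entities` are hypothetical, and the spaCy model `en_core_web_sm` is an assumption, since the README does not say which model is used.

```python
from collections import defaultdict


def ocr_pdf(pdf_path, dpi=300):
    """Render each PDF page to an image and OCR it with Tesseract."""
    # Imported lazily so the module still loads when the OCR stack is absent.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def group_entities(pairs):
    """Group (text, label) pairs by label, skipping blanks and duplicates."""
    grouped = defaultdict(list)
    for text, label in pairs:
        cleaned = " ".join(text.split())  # collapse OCR whitespace noise
        if cleaned and cleaned not in grouped[label]:
            grouped[label].append(cleaned)
    return dict(grouped)


def extract_entities(text, model_name="en_core_web_sm"):
    """Run spaCy named-entity recognition; the model name is an assumption."""
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return group_entities((ent.text, ent.label_) for ent in doc.ents)
```

With the dependencies installed, `extract_entities(ocr_pdf("contract.pdf"))` would return a dictionary keyed by entity label (here `contract.pdf` is a placeholder file name). A general-purpose model only emits generic labels such as `ORG` or `DATE`, so contract-specific terms would likely need custom rules or a trained model.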
🛠️ Tech Stack
Language: Python 3.x
Framework: Flask
Libraries: spaCy, pytesseract, pdf2image, Flask-CORS
Tools: Tesseract OCR engine
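
Because the stack mixes Python packages with an external OCR engine, a quick environment check can save debugging time. The sketch below is an illustration only: `missing_dependencies` is a hypothetical helper, and the `pdftoppm` entry reflects the fact that pdf2image relies on Poppler's command-line tools, which this README does not mention.

```python
import importlib.util
import shutil

# Import names of the packages listed in the tech stack.
REQUIRED_MODULES = ["flask", "flask_cors", "pytesseract", "pdf2image", "spacy"]
# External programs: the Tesseract engine, plus Poppler's pdftoppm for pdf2image.
REQUIRED_BINARIES = ["tesseract", "pdftoppm"]


def missing_dependencies(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return (missing_modules, missing_binaries) for the current environment."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_binaries = [b for b in binaries if shutil.which(b) is None]
    return missing_modules, missing_binaries


if __name__ == "__main__":
    modules, binaries = missing_dependencies()
    print("Missing Python packages:", modules or "none")
    print("Missing executables:", binaries or "none")
```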
📥 Installation & Setup
1. Manual Execution (Recommended for Low Disk Space)

If you are running the project without Docker, install the dependencies and start the API from the api directory:

    pip install flask flask-cors pytesseract pdf2image spacy
    cd api
    python app.py

2. Docker Execution

To build and run using Docker:

    docker build -t lexiscan-auto .
    docker run -p 5000:5000 lexiscan-auto

📋 API Endpoints
GET / - Health check to see if the API is running.
POST /upload - Upload a document (PDF/Image) to extract entities.
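
A client for these endpoints could look like the sketch below, which uses only the standard library. It makes assumptions the README does not confirm: the multipart form field is taken to be named `file`, and the response body is returned as raw text because its format is not documented here. The helper names are hypothetical.

```python
import os
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:5000"  # address given in this README


def build_multipart(field_name, filename, payload, content_type="application/pdf"):
    """Encode one file as multipart/form-data; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def health_check():
    """Call GET / and return the HTTP status code."""
    with urllib.request.urlopen(f"{BASE_URL}/") as response:
        return response.status


def upload_document(path, field_name="file"):
    """POST a document to /upload; the field name 'file' is an assumption."""
    with open(path, "rb") as handle:
        body, header = build_multipart(field_name, os.path.basename(path), handle.read())
    request = urllib.request.Request(
        f"{BASE_URL}/upload",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

If the server expects a different field name, pass it as `field_name`; the handler in `app.py` is the authority on that.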
📸 Project Status
The API currently runs locally at http://127.0.0.1:5000.