
praveen4101/lexiscan-auto


LexiScan Auto: Intelligent Document Processing

LexiScan Auto is an AI-powered tool that extracts key entities and information from legal contracts and documents using OCR and Natural Language Processing (NLP).

🚀 Features

OCR Integration: Uses Tesseract OCR to read text from images and scanned PDFs.

NLP Extraction: Extracts legal entities, dates, and contract terms using spaCy.

Flask API: A lightweight backend to handle document uploads and processing.

Docker Ready: Includes a Dockerfile for containerized deployment.
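The OCR and NLP features can be chained into a two-stage pipeline: render each page to an image, OCR it with Tesseract, then run spaCy's named-entity recognizer over the text. The sketch below is a minimal illustration under stated assumptions, not the project's actual code. The function names, the accepted image extensions, the 300 DPI setting, and the spaCy model name en_core_web_sm (installed with python -m spacy download en_core_web_sm) are all hypothetical choices. The heavy libraries are imported inside the functions, so the routing and grouping helpers run without them.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical set of image formats accepted for OCR (not taken from the project).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def classify_upload(filename: str) -> str:
    """Return "pdf" or "image" based on the file extension, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")


def ocr_document(path: str, dpi: int = 300) -> str:
    """Stage 1: OCR every page with Tesseract and join the page texts."""
    # Imported lazily so the helpers in this sketch work without these packages.
    import pytesseract
    from pdf2image import convert_from_path
    from PIL import Image

    if classify_upload(path) == "pdf":
        # One PIL image per PDF page; pdf2image needs Poppler installed.
        pages = convert_from_path(path, dpi=dpi)
    else:
        pages = [Image.open(path)]
    return "\n\n".join(pytesseract.image_to_string(page) for page in pages)


def group_entities(pairs):
    """Group (text, label) pairs by label, dropping duplicates but keeping order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


def extract_entities(text: str, model: str = "en_core_web_sm") -> dict:
    """Stage 2: run spaCy NER over the OCR text (assumes the model is installed)."""
    import spacy

    nlp = spacy.load(model)
    doc = nlp(text)
    return group_entities((ent.text, ent.label_) for ent in doc.ents)
```

Calling extract_entities(ocr_document("contract.pdf")) would return a dictionary keyed by spaCy labels such as ORG, DATE or MONEY. Loading the model on every call is slow; a long-running API would more likely load it once at startup and reuse it.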

🛠️ Tech Stack

Language: Python 3.x

Framework: Flask

Libraries: spaCy, pytesseract, pdf2image, Flask-CORS

Tools: Tesseract OCR engine
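Several of these libraries are wrappers around native tools, so a quick pre-flight check can catch a missing engine before the API starts. This is a hypothetical helper, not part of the repository. It assumes the Poppler executables pdftoppm and pdfinfo are what pdf2image relies on, and it only checks that modules and executables can be found, not that they work.

```python
import importlib.util
import shutil

# Import names of the pip packages listed in the tech stack.
PYTHON_MODULES = ("flask", "flask_cors", "pytesseract", "pdf2image", "spacy")
# Native executables: Tesseract itself, plus Poppler tools used by pdf2image (assumption).
EXECUTABLES = ("tesseract", "pdftoppm", "pdfinfo")


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES) -> dict:
    """Return modules that cannot be found and executables that are not on PATH."""
    return {
        # find_spec returns None for an uninstalled top-level module, without importing it.
        "modules": [name for name in modules if importlib.util.find_spec(name) is None],
        # shutil.which returns None when the executable is not on PATH.
        "executables": [name for name in executables if shutil.which(name) is None],
    }


if __name__ == "__main__":
    report = missing_dependencies()
    if report["modules"] or report["executables"]:
        print("Missing:", report)
    else:
        print("All OCR/NLP dependencies found.")
```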

📥 Installation & Setup

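1. Manual Execution (recommended when disk space is limited)

If you are running the project without Docker, install the dependencies:

```bash
pip install flask flask-cors pytesseract pdf2image spacy
```

pytesseract and pdf2image are only Python wrappers: the Tesseract OCR engine must be installed separately, and pdf2image also needs the Poppler utilities to convert PDFs into images.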

Run the API:

```bash
cd api
python app.py
```

2. Docker Execution

To build and run the project using Docker:

```bash
docker build -t lexiscan-auto .
docker run -p 5000:5000 lexiscan-auto
```

📋 API Endpoints

GET / - Health check to confirm that the API is running.

POST /upload - Upload a document (PDF/Image) to extract entities.
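A client can call both endpoints with only the Python standard library. This sketch assumes the document is sent as multipart/form-data under a form field named file and that /upload answers with JSON; neither detail is stated in this README, so check api/app.py for the actual field name and response format. BASE_URL and UPLOAD_FIELD are hypothetical names.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:5000"
UPLOAD_FIELD = "file"  # hypothetical form field name; confirm against api/app.py


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def health_check(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """GET / and return True on HTTP 200.

    urlopen raises URLError if the server is unreachable and HTTPError on error statuses.
    """
    with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
        return resp.status == 200


def upload(path: str, base_url: str = BASE_URL, timeout: float = 120.0):
    """POST a PDF or image to /upload and decode the (assumed) JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart(UPLOAD_FIELD, file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        base_url + "/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, health_check() followed by upload("contract.pdf") exercises both endpoints. The long default timeout on upload reflects that OCR on multi-page scans can take a while.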

📸 Project Status

The API is working; when started locally, it serves requests at http://127.0.0.1:5000.

About

An AI-powered Intelligent Document Processing (IDP) system that automates entity extraction from legal contracts using OCR (Tesseract) and NLP (spaCy).
