A local lead-intelligence app for CSV ingestion, embedding, clustering, t-SNE projection, and semantic search.
- Backend: FastAPI, SQLAlchemy, PostgreSQL + pgvector, Gemini APIs
- Frontend: Next.js, React Query, Plotly
- Infra: Docker Compose (Postgres)
- Docker Desktop (running)
- Python 3.12+
- Node.js 18+
- npm
- Copy env files:

  ```bash
  cp backend/.env.example backend/.env
  cp frontend/.env.local.example frontend/.env.local
  ```

- Edit `backend/.env` and set:
  - `GOOGLE_AI_API_KEY` (required)
  - `DATABASE_URL` (default works with this repo)
- Start everything:

  ```bash
  ./start.sh
  ```

- Open:
  - Frontend: http://localhost:3000
  - Backend health: http://localhost:8000/health

Stop with Ctrl+C.
- Starts PostgreSQL (`docker compose up -d db`)
- Waits for DB process and auth readiness
- Applies `schema.sql`
- Installs backend/frontend dependencies if needed
- Starts backend on `:8000` and frontend on `:3000`
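The readiness wait can be sketched in Python as a plain TCP poll (a simplified stand-in for what `start.sh` does; the real script also checks auth readiness, and `wait_for_port` is a name invented here):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server process is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not up yet; retry until the deadline
    return False
```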
- `POST /ingest` with no file uses `assets/Leads.csv`
- A new ingest replaces existing data (`TRUNCATE leads RESTART IDENTITY`)
- `max_rows` defaults to `50`
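The row cap can be pictured with a small sketch, assuming the backend parses the upload with the stdlib `csv` module (`read_leads` is a hypothetical helper; only the `max_rows` semantics come from the API above):

```python
import csv
import io

def read_leads(csv_text: str, max_rows: int = 50) -> list:
    """Parse CSV text and return at most max_rows rows as dicts."""
    if max_rows < 1:
        raise ValueError("max_rows must be >= 1")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append(row)
        if len(rows) >= max_rows:
            break  # stop reading once the cap is hit
    return rows
```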
- `GET /health`
- `POST /ingest`
- `GET /ingest/{job_id}/status`
- `POST /ingest/{job_id}/cancel`
- `POST /ingest/{job_id}/retry`
- `GET /ingest/{job_id}/validation-report.csv`
Input:

- `file` (optional `.csv`, multipart)
- `max_rows` (optional, `>= 1`, form or query)

Job statuses: `queued`, `running`, `completed`, `completed_with_errors`, `failed`, `cancelled`
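A client-side poller over these statuses might look like this (a sketch: `poll_until_done` and `fetch_status` are hypothetical names; only the status values come from the API):

```python
import time

# The four statuses after which polling GET /ingest/{job_id}/status can stop.
TERMINAL = {"completed", "completed_with_errors", "failed", "cancelled"}

def poll_until_done(fetch_status, interval: float = 0.0, max_polls: int = 100) -> str:
    """Call fetch_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(interval)  # queued/running: wait and poll again
    raise TimeoutError("job did not reach a terminal status")
```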
`POST /cluster`

Important controls (all optional, defaults applied):

- `algorithm`: `kmeans`, `mini_batch_kmeans`, `agglomerative`, `dbscan`, `optics`, `birch`, `gaussian_mixture`, `hdbscan`
- `distance_metric`: `euclidean`, `manhattan`, `cosine` (algorithm-dependent)
- `n_clusters`, `auto_tune_k`, `k_min`, `k_max`, `auto_tune_objective`
- `outlier_policy`: `keep`, `drop`, `nearest`
- `normalize_embeddings`, `pca_components`, `random_state`, `lock_random_state`
- Density/hierarchical/mixture-specific params: `min_cluster_size`, `min_samples`, `eps`, `max_eps`, `linkage`, `covariance_type`, `birch_threshold`, `birch_branching_factor`
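To make the controls concrete, here is a hedged sketch of a request model for them; the field names and allowed algorithms come from the list above, but the default values shown are illustrative, not taken from the backend:

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_ALGORITHMS = {"kmeans", "mini_batch_kmeans", "agglomerative", "dbscan",
                      "optics", "birch", "gaussian_mixture", "hdbscan"}

@dataclass
class ClusterRequest:
    # Defaults here are guesses for illustration, not the API's actual defaults.
    algorithm: str = "kmeans"
    distance_metric: str = "euclidean"
    n_clusters: Optional[int] = 8
    auto_tune_k: bool = False
    k_min: int = 2
    k_max: int = 20
    outlier_policy: str = "keep"
    normalize_embeddings: bool = True
    random_state: Optional[int] = 42

    def __post_init__(self):
        if self.algorithm not in ALLOWED_ALGORITHMS:
            raise ValueError(f"unknown algorithm: {self.algorithm}")
        if self.outlier_policy not in {"keep", "drop", "nearest"}:
            raise ValueError(f"unknown outlier_policy: {self.outlier_policy}")
```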
`GET /tsne`

Query params:

- `recompute` (default `false`)
- `perplexity` (used when recomputing)
- `n_iter` (used when recomputing)
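The `recompute` flag's semantics can be sketched as a cache-or-rebuild decision (a simplified model; `get_tsne` and `compute` are stand-ins, not the backend's actual functions):

```python
_cache = None  # module-level cache of the last projection

def get_tsne(compute, recompute: bool = False,
             perplexity: float = 30.0, n_iter: int = 1000) -> list:
    """Return cached points unless recompute is True; compute(perplexity, n_iter)
    is a stand-in for the actual t-SNE run."""
    global _cache
    if _cache is None or recompute:
        # perplexity and n_iter only take effect on this branch,
        # matching the "used when recomputing" note above.
        _cache = {"points": compute(perplexity, n_iter),
                  "params": {"perplexity": perplexity, "n_iter": n_iter}}
    return _cache["points"]
```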
`GET /search`

Query params:

- `q` (required)
- `limit` (default `10`)
- `threshold` (default `0.3`)
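The effect of `limit` and `threshold` can be illustrated with a small cosine-similarity filter (a sketch in plain Python; the real backend does this query against pgvector):

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, leads, limit: int = 10, threshold: float = 0.3):
    """Return up to `limit` (lead, score) pairs with similarity >= threshold,
    best matches first. `leads` is an iterable of (lead, embedding) pairs."""
    scored = [(lead, cosine(query_vec, vec)) for lead, vec in leads]
    scored = [(lead, s) for lead, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```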
```bash
# Ingest default assets/Leads.csv
curl -X POST http://localhost:8000/ingest

# Ingest uploaded CSV
curl -X POST http://localhost:8000/ingest \
  -F "file=@/absolute/path/to/leads.csv" \
  -F "max_rows=50"

# Cluster
curl -X POST http://localhost:8000/cluster \
  -H "Content-Type: application/json" \
  -d '{"algorithm":"kmeans","n_clusters":8}'

# Recompute t-SNE
curl "http://localhost:8000/tsne?recompute=true&perplexity=30&n_iter=1000"

# Semantic search
curl "http://localhost:8000/search?q=enterprise%20lead&limit=10&threshold=0.3"
```

The frontend defaults to http://localhost:8000.
Override via `frontend/.env.local`:

```
NEXT_PUBLIC_API_BASE_URL=http://your-host:8000
```

To regenerate API types:

```bash
cd frontend
npm run generate:types
```

`failed to connect to the docker API at unix:///var/run/docker.sock`
- Start Docker Desktop
- Confirm with `docker ps`
- Re-run `./start.sh`
Examples:

- `role "postgres" does not exist`
- `Could not authenticate with DATABASE_URL user ...`

Fix:

```bash
docker compose down -v
./start.sh
```

Then keep `backend/.env` aligned with:

```
DATABASE_URL=postgresql+asyncpg://postgres:password@127.0.0.1:5433/leads_db
```

- Re-apply the schema:

  ```bash
  cat schema.sql | docker compose exec -T db psql -U postgres -d leads_db
  ```

- Set a valid `GOOGLE_AI_API_KEY`
- Ensure `GOOGLE_TEXT_MODEL` is available to your key
- Restart the backend
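When chasing the auth mismatches above, a quick sanity check of `DATABASE_URL` against the values this repo's compose setup expects can help (a sketch; `check_db_url` is a name invented here, and `urlsplit` handles the `postgresql+asyncpg` scheme fine):

```python
from urllib.parse import urlsplit

def check_db_url(url: str) -> list:
    """Return a list of mismatches versus the repo's expected DB settings."""
    parts = urlsplit(url)
    problems = []
    if parts.username != "postgres":
        problems.append(f"unexpected user: {parts.username}")
    if parts.port != 5433:
        problems.append(f"unexpected port: {parts.port}")
    if parts.path != "/leads_db":
        problems.append(f"unexpected database: {parts.path.lstrip('/')}")
    return problems
```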