Yengeshwaran/DataScraping_Pipeline

Unicorn Intelligence System

An automated, AI-powered pipeline for tracking, analyzing, and generating narrative reports on unicorn companies.

🚀 Overview

The Unicorn Intelligence System is a production-hardened data pipeline designed to:

  1. Scrape the latest unicorn company data from Crunchbase.
  2. Maintain a persistent Excel database (unicorn_companies.xlsx) with intelligent incremental updates.
  3. Generate comprehensive, narrative-driven company profiles using advanced AI models (OpenRouter, OpenAI, Gemini).
  4. Enrich reports with real-time web data via Tavily and Serper to ensure freshness and accuracy for critical metrics like Valuation and Funding.
  5. Validate all outputs with strict logic to prevent hallucinations and ensure data integrity.

🏗️ Architecture

The system operates in a modular pipeline:

  1. Scraper Module (scraper.py): Fetches basic metadata (Company, Country, Valuation, Investors) and updates the master Excel sheet.
  2. Generator Module (ai_story_generator.py):
    • Orchestrator: Reads from Excel, manages flow.
    • AI Engine: Supports multiple providers (openai, openrouter, mock).
    • Enrichment Layer:
      • Tavily: Fills missing narrative gaps ("Not mentioned" -> Search).
      • Serper: Verifies and updates numeric data (Valuation, Funding) with strict date-based logic.
    • Validation: Enforces template structure and data completeness.
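
The modular flow above can be sketched as follows. This is a minimal, illustrative outline only; the function names (enrich, validate, run_pipeline) are stand-ins, not the repo's actual API, and the web lookups are replaced with placeholders:

```python
# Illustrative sketch of the pipeline: orchestrate -> enrich -> validate.
# All names here are hypothetical; the real logic lives in ai_story_generator.py.

def enrich(profile: dict) -> dict:
    """Fill narrative gaps, standing in for the Tavily/Serper enrichment layer."""
    enriched = dict(profile)
    for field, value in profile.items():
        if value == "Not mentioned":
            enriched[field] = f"<searched: {field}>"  # placeholder for a web lookup
    return enriched

def validate(profile: dict, required: list[str]) -> bool:
    """Reject profiles that still contain placeholders or miss required sections."""
    return all(profile.get(f) not in (None, "", "Not mentioned") for f in required)

def run_pipeline(rows: list[dict], required: list[str]) -> list[dict]:
    reports = []
    for row in rows:                          # orchestrator: rows read from Excel
        profile = enrich(row)                 # enrichment layer
        if validate(profile, required):       # validation gate
            reports.append(profile)
    return reports
```

A profile that fails validation is simply dropped here; the real pipeline regenerates it instead.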

✨ Features

  • Multi-Model Support: Seamlessly switch between OpenAI (GPT-4o), Google Gemini, and OpenRouter models via .env.
  • Intelligent Freshness: Automatically verifies valuation and funding numbers against Google Search results (Serper) and updates them only if a newer date is confirmed.
  • Strict Validation: Rejects and regenerates reports that contain placeholders or miss required sections.
  • Cost-Safe Verification: Includes a verify pipeline mode that simulates the entire logic flow without incurring API costs.
  • Robust Error Handling: Automatic retries, rate limiting (10s delay for OpenAI), and graceful failure recovery.
  • Clean Output: Generates structured .txt reports in the stories/ directory.
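
The "Intelligent Freshness" rule above boils down to a date comparison: a stored number is replaced only when the search result carries a strictly newer date. A minimal sketch of that rule (illustrative only; the repo's actual Serper logic may differ):

```python
# Hedged sketch of the date-based freshness check: newer evidence wins,
# otherwise the stored value is kept unchanged.
from datetime import date

def maybe_update(current_value: str, current_date: date,
                 search_value: str, search_date: date) -> tuple[str, date]:
    if search_date > current_date:
        return search_value, search_date   # a confirmed newer date: update
    return current_value, current_date     # otherwise keep the stored number
```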

🛠️ Installation

Prerequisites

  • Python 3.10+
  • An API key for your chosen AI provider (OpenAI, OpenRouter, or Gemini)
  • Tavily API Key (for enrichment)
  • Serper API Key (for freshness checks)

Setup

  1. Clone the repository

    git clone https://github.com/yourusername/unicorn-intelligence.git
    cd unicorn-intelligence
  2. Create a virtual environment

    python -m venv venv
    # Windows
    .\venv\Scripts\activate
    # Mac/Linux
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure the environment. Copy the example environment file and add your keys:

    cp .env.example .env

    Then edit .env:

    # 1. Pipeline Mode
    PIPELINE_MODE=execute # or verify (cost-free simulation)
    
    # 2. AI Provider
    AI_MODE=openrouter # or openai, gemini, mock
    
    # 3. Keys
    OPENROUTER_API_KEY=sk-or-...
    OPENAI_API_KEY=sk-...
    GEMINI_API_KEY=AIza...
    
    # 4. Enrichment (Tavily & Serper)
    TAVILY_API_KEY_1=tvly-...
    SERPER_API_KEY=...
    
    # 5. Filters
    TARGET_COUNTRY=India
    MAX_COMPANIES=2
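
The settings above are read from the environment at startup. A minimal sketch of how such a loader might look (assumes python-dotenv or pre-exported variables; the helper name load_config is hypothetical):

```python
# Illustrative config loader; defaults mirror the .env example above.
import os

def load_config() -> dict:
    return {
        "pipeline_mode": os.getenv("PIPELINE_MODE", "verify"),  # default to cost-free mode
        "ai_mode": os.getenv("AI_MODE", "mock"),
        "target_country": os.getenv("TARGET_COUNTRY", "India"),
        "max_companies": int(os.getenv("MAX_COMPANIES", "2")),
    }
```

Defaulting to verify/mock keeps a misconfigured run from incurring API costs.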

🏃 Usage

1. Scrape Data

Initialize or update the company database:

python scraper.py

Output: unicorn_companies.xlsx

2. Generate Reports

Run the main AI pipeline:

python main.py

Output: Structured text files in the stories/ directory.

3. Verify Logic (Cost-Free)

Run a simulation to verify logic without API calls:

# In .env: PIPELINE_MODE=verify
python main.py
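
Conceptually, verify mode runs the same orchestration but swaps real API calls for mock responses. A hedged sketch of that gate (the function name and mock format are illustrative, not the repo's actual implementation):

```python
# Illustrative verify-mode gate: mock output instead of a billed API call.
def generate(prompt: str, mode: str) -> str:
    if mode == "verify":
        return f"[MOCK REPORT for prompt: {prompt[:30]}]"  # no API cost
    raise NotImplementedError("execute mode would call the configured provider")
```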

📂 Output Structure

unicorn/
├── stories/                 # Generated Reports
│   ├── Unacademy_openai_gpt-4o.txt
│   └── Razorpay_openai_gpt-4o_enriched.txt
├── unicorn_companies.xlsx   # Master Database
├── scraper.py               # Data Collection
├── main.py                  # Pipeline Entry Point
├── ai_story_generator.py    # Core Logic
└── requirements.txt         # Dependencies

🔒 Security Notes

  • API Keys: Never commit .env. It is added to .gitignore.
  • Validation: The system automatically masks API keys in logs (api_usage.log).
  • Sanitization: Verification mode uses sanitized mock templates to prevent data leaks or cost overruns during testing.
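
Key masking of the kind described above is typically a regex substitution applied before a message is logged. A sketch under that assumption (the key-prefix pattern is inferred from the .env example; the repo's actual masking may differ):

```python
# Illustrative masking of provider key prefixes (sk-or-, sk-, tvly-, AIza)
# before a message reaches api_usage.log.
import re

KEY_PATTERN = re.compile(r"(sk-or-|sk-|tvly-|AIza)[A-Za-z0-9_\-]+")

def mask_keys(message: str) -> str:
    return KEY_PATTERN.sub(lambda m: m.group(1) + "***", message)
```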

📜 License

MIT License. See LICENSE for details.
