# AutoSQL

Self-optimizing SQL query pipeline. Inspired by karpathy/autoresearch.

LLM rewrites query → run it → measure (speed + correctness) → keep if better → repeat

One file. Zero fluff.


## The Problem

Slow SQL queries are everywhere. Correlated subqueries, redundant scans, missing pre-aggregations — they kill app performance and nobody catches them until prod is on fire.

AutoSQL fixes this automatically:

  1. You give it a query
  2. It runs the query, records the baseline time and result hash
  3. An LLM rewrites the query trying to make it faster
  4. It runs the new query, checks the result is identical, measures the speedup
  5. Keeps the best version, feeds the history back into the next iteration
  6. Repeats until the budget is spent
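A minimal sketch of that loop, with a `rewrite_fn` callback standing in for the LLM call (the function and variable names here are illustrative, not AutoSQL's actual internals):

```python
import hashlib
import sqlite3
import time

def run_query(db_path: str, sql: str) -> tuple[float, str]:
    """Run a query; return (elapsed ms, hash of the sorted result rows)."""
    conn = sqlite3.connect(db_path)
    try:
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        elapsed_ms = (time.perf_counter() - start) * 1000
    finally:
        conn.close()
    digest = hashlib.sha256(repr(sorted(rows)).encode()).hexdigest()
    return elapsed_ms, digest

def optimize(db_path, query, rewrite_fn, iterations=10):
    """Keep the fastest candidate whose result hash matches the baseline."""
    baseline_ms, baseline_hash = run_query(db_path, query)   # step 2
    best_sql, best_ms, history = query, baseline_ms, []
    for i in range(iterations):                              # step 6
        candidate = rewrite_fn(best_sql, history)            # step 3 (LLM)
        ms, digest = run_query(db_path, candidate)           # step 4
        ok = digest == baseline_hash
        history.append({"iteration": i + 1, "time_ms": ms, "correct": ok})
        if ok and ms < best_ms:                              # step 5
            best_sql, best_ms = candidate, ms
    return best_sql, baseline_ms, best_ms
```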

## Quickstart

```shell
# 1. Install dependencies
pip install openai            # covers Groq, OpenAI, Ollama, Together, OpenRouter
pip install anthropic         # only needed for Claude models

# 2. Set the API key for your chosen provider
export GROQ_API_KEY=gsk_...         # Groq  (free at https://console.groq.com)
export OPENAI_API_KEY=sk-...        # OpenAI
export ANTHROPIC_API_KEY=sk-ant-... # Anthropic
# Ollama needs no key — just have it running locally

# 3. Optimize your query
python AutoSQL.py --query slow.sql --db myapp.sqlite

# OpenAI
python AutoSQL.py --query slow.sql --db myapp.sqlite --model gpt-4o --provider openai

# Anthropic / Claude
python AutoSQL.py --query slow.sql --db myapp.sqlite --model claude-opus-4-5 --provider anthropic

# Ollama (local)
python AutoSQL.py --query slow.sql --db myapp.sqlite --model llama3 --provider ollama

# Any OpenAI-compatible endpoint
python AutoSQL.py --query slow.sql --db myapp.sqlite --model my-model --base-url http://localhost:8080/v1

# More iterations or a different Groq model
python AutoSQL.py --query "SELECT * FROM ..." --db myapp.sqlite --iterations 15
python AutoSQL.py --query slow.sql --db myapp.sqlite --model llama-3.1-8b-instant
```

## Example output

```
──────────────────────────────────────────────────────────────
  AutoSQL — Self-optimizing Query Pipeline
  Model    : llama-3.3-70b-versatile
  Provider : groq
──────────────────────────────────────────────────────────────

  Measuring baseline … 1842.3 ms

  [01/10] Generating … ✓  312.4 ms  (5.89× faster)  ← NEW BEST
  [02/10] Generating … ✓  298.1 ms  (6.18× faster)  ← NEW BEST
  [03/10] Generating … ✗  wrong result
  [04/10] Generating … ✓  291.7 ms  (6.32× faster)  ← NEW BEST
  ...

──────────────────────────────────────────────────────────────
  Final Report
──────────────────────────────────────────────────────────────
  Baseline : 1842.3 ms
  Best     : 291.7 ms  (6.32× faster)
```

## What it optimizes

AutoSQL tackles the classic anti-patterns that kill query performance:

| Anti-pattern | What AutoSQL does |
|---|---|
| Correlated subquery in `WHERE` | Replaces with pre-aggregated `JOIN` / CTE |
| Correlated subquery in `SELECT` | Replaces with pre-aggregated `JOIN` |
| Multiple passes over the same table | Collapses into a single scan |
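For instance, the first anti-pattern: a correlated subquery in `WHERE` re-runs once per outer row, while the pre-aggregated join computes the averages once. A hand-written illustration over an invented schema (not AutoSQL's actual output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders(id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 1, 50), (2, 1, 150), (3, 2, 30), (4, 2, 200);
""")

# Anti-pattern: the subquery executes again for every row of the outer scan.
slow = """
SELECT id FROM orders o
WHERE amount > (SELECT AVG(amount) FROM orders i
                WHERE i.customer_id = o.customer_id)
"""

# Rewrite: aggregate once, then join -- a single pass over the averages.
fast = """
SELECT o.id
FROM orders o
JOIN (SELECT customer_id, AVG(amount) AS avg_amount
      FROM orders GROUP BY customer_id) a
  ON a.customer_id = o.customer_id
WHERE o.amount > a.avg_amount
"""

# Same result set, different plan.
assert sorted(conn.execute(slow).fetchall()) == sorted(conn.execute(fast).fetchall())
```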

## Correctness guarantee

AutoSQL never accepts a faster query that returns different results.
Every candidate is run and its output hashed. Only an exact match against the baseline hash counts as correct.
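A sketch of that check, assuming rows are sorted before hashing so that equivalent plans returning the same rows in a different order still match (whether AutoSQL normalizes ordering this way is an implementation detail):

```python
import hashlib
import sqlite3

def result_hash(conn: sqlite3.Connection, sql: str) -> str:
    # Sorting makes the hash order-insensitive; for queries where ORDER BY
    # is part of the contract you would hash the rows as returned instead.
    rows = sorted(conn.execute(sql).fetchall())
    return hashlib.sha256(repr(rows).encode()).hexdigest()
```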


## Outputs

After the run, `autosql_log.json` contains every attempt:

```json
{
  "baseline_ms": 1842.3,
  "best_ms": 291.7,
  "speedup": 6.32,
  "iterations": [
    { "iteration": 1, "time_ms": 312.4, "speedup": 5.89, "correct": true, "query": "..." },
    ...
  ]
}
```
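Because the log is plain JSON, post-run analysis is easy, e.g. filtering to the correct attempts (field names taken from the sample above; the log is inlined here rather than read from `autosql_log.json`):

```python
import json

log = json.loads("""
{
  "baseline_ms": 1842.3,
  "best_ms": 291.7,
  "speedup": 6.32,
  "iterations": [
    {"iteration": 1, "time_ms": 312.4, "speedup": 5.89, "correct": true},
    {"iteration": 3, "time_ms": 544.0, "speedup": 3.39, "correct": false}
  ]
}
""")

correct = [it for it in log["iterations"] if it["correct"]]
print(f"{log['speedup']}x best speedup across {len(correct)} correct attempt(s)")
```

In a real run you would read the file instead: `json.load(open("autosql_log.json"))`.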

## Options

| Flag | Default | Description |
|---|---|---|
| `--query` | (required) | SQL string or `.sql` file path |
| `--db` | (required) | SQLite database path |
| `--iterations` | `10` | How many rewrites to attempt |
| `--model` | `llama-3.3-70b-versatile` | LLM model to use |
| `--provider` | (auto-detect) | `groq`, `openai`, `anthropic`, `ollama`, `together`, `openrouter` |
| `--base-url` | (none) | Custom OpenAI-compatible endpoint URL |
| `--min-speedup` | `0` (disabled) | Stop early once this speedup multiplier is reached (e.g. `3.0`) |
| `--timeout` | `0` (unlimited) | Per-execution timeout in milliseconds; aborts runaway queries |
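With SQLite, one way to implement a per-execution timeout like `--timeout` is a progress handler that interrupts the statement once a deadline passes (a sketch; AutoSQL's internals may differ):

```python
import sqlite3
import time

def run_with_timeout(conn: sqlite3.Connection, sql: str, timeout_ms: float):
    """Return the query's rows, or None if it exceeded timeout_ms."""
    deadline = time.perf_counter() + timeout_ms / 1000
    # The handler runs every N SQLite virtual-machine instructions;
    # returning a nonzero value aborts the statement.
    conn.set_progress_handler(
        lambda: 1 if time.perf_counter() > deadline else 0, 10000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:  # "interrupted"
        return None
    finally:
        conn.set_progress_handler(None, 0)
```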

## Extending

AutoSQL intentionally stays minimal. Easy extension points:

- Other databases — swap `sqlite3` for `psycopg2`, `duckdb`, etc. Only `run_query()` changes.
- Other metrics — add memory, I/O reads, or query plan cost alongside wall-clock time.
- Indexes — let the LLM propose `CREATE INDEX` statements alongside the query rewrite.
- Batch mode — feed a folder of `.sql` files, get a report for each.
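The first point is small because execution already goes through one choke point; anything DB-API 2.0 compatible drops in (the `duckdb` line is illustrative and commented out, since only `sqlite3` ships with the stdlib):

```python
import sqlite3
import time

def run_query(connect, sql):
    """Time a query against any DB-API 2.0 connection factory."""
    conn = connect()
    try:
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql)
        rows = cur.fetchall()
        return rows, (time.perf_counter() - start) * 1000
    finally:
        conn.close()

rows, ms = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
# import duckdb
# rows, ms = run_query(lambda: duckdb.connect("myapp.duckdb"), "SELECT 1 + 1")
```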

## Requirements

- Python 3.10+
- `openai` (`pip install openai`) — covers Groq, OpenAI, Ollama, Together, OpenRouter
- `anthropic` (`pip install anthropic`) — only for Claude/Anthropic models
- At least one API key set as an environment variable (see Quickstart above)
- A SQLite database to optimize against

## Supported providers

| Provider | `--provider` | Key env var | Notes |
|---|---|---|---|
| Groq | `groq` | `GROQ_API_KEY` | Default; free tier available |
| OpenAI | `openai` | `OPENAI_API_KEY` | GPT-4o, o3, etc. |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` | Claude models; needs `pip install anthropic` |
| Ollama | `ollama` | (none) | Runs locally; no key required |
| Together AI | `together` | `TOGETHER_API_KEY` | |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | |
| Any OpenAI-compatible | `openai` | `OPENAI_API_KEY` | Pass `--base-url` to override endpoint |
