| 1 | +# CodeGraph FastAPI Backend - Architecture Analysis |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +CodeGraph currently exposes a **FastAPI** backend from `api.index:app` and serves the built React UI from the same process when `app/dist` exists. |
| 6 | + |
| 7 | +The framework-specific HTTP code is concentrated in `api/index.py`. Most of the backend domain modules (`graph.py`, `project.py`, `analyzers/`, `git_utils/`, `info.py`, `llm.py`) are reusable Python components that do not depend on FastAPI. |
| 8 | + |
| 9 | +## 1. Backend Layout |
| 10 | + |
| 11 | +``` |
| 12 | +api/ |
| 13 | +├── __init__.py # Public package exports |
| 14 | +├── index.py # FastAPI app, auth dependencies, routes, SPA serving |
| 15 | +├── graph.py # FalkorDB graph access (sync + async helpers) |
| 16 | +├── llm.py # GraphRAG + LiteLLM chat integration |
| 17 | +├── info.py # Repository metadata stored in Redis/FalkorDB |
| 18 | +├── project.py # Clone/local repo analysis orchestration |
| 19 | +├── auto_complete.py # Prefix search helper |
| 20 | +├── prompts.py # Chat/Cypher prompt templates |
| 21 | +│ |
| 22 | +├── analyzers/ |
| 23 | +│ ├── analyzer.py # Abstract analyzer base class |
| 24 | +│ ├── source_analyzer.py # File scanning + analyzer dispatch |
| 25 | +│ ├── python/analyzer.py # Python analyzer |
| 26 | +│ ├── java/analyzer.py # Java analyzer |
| 27 | +│ ├── csharp/analyzer.py # C# analyzer |
| 28 | +│ └── c/analyzer.py # Present in tree, but not registered |
| 29 | +│ |
| 30 | +├── entities/ # Entity/File wrappers and encoders |
| 31 | +├── git_utils/ # Git history graph and repo utilities |
| 32 | +└── code_coverage/ # Coverage helpers |
| 33 | +``` |
| 34 | + |
| 35 | +## 2. HTTP Layer (`api/index.py`) |
| 36 | + |
| 37 | +### 2.1 Application and routing |
| 38 | + |
| 39 | +- The backend app is `FastAPI()`. |
| 40 | +- All API routes are mounted under `/api/...`. |
| 41 | +- A catch-all route serves static files from `app/dist` and falls back to `index.html` for the React SPA. |
| 42 | + |
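The catch-all behavior described above can be sketched without any web framework. This is a minimal illustration, not the code in `api/index.py`: the helper name `resolve_spa_path` is hypothetical, and the containment check against the `dist` root is an assumption about how a safe fallback would be written.

```python
from pathlib import Path


def resolve_spa_path(dist_dir: Path, request_path: str) -> Path:
    """Pick the file to serve for a non-API request (hypothetical helper)."""
    root = dist_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Serve the file only if it exists and stays inside the dist directory.
    if candidate.is_file() and candidate.is_relative_to(root):
        return candidate
    # Anything else is treated as a client-side route and gets the SPA shell.
    return root / "index.html"
```

A request for an existing asset returns that asset, while unknown paths such as `/some/client/route` resolve to `index.html` so the React router can handle them.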
| 43 | +### 2.2 Authentication dependencies |
| 44 | + |
| 45 | +`api/index.py` defines two FastAPI dependencies: |
| 46 | + |
| 47 | +- `public_or_auth`: used by read-only endpoints. If `CODE_GRAPH_PUBLIC=1`, the request is allowed without auth; otherwise it checks the `Authorization` header against `SECRET_TOKEN`. |
| 48 | +- `token_required`: used by mutating endpoints and always checks the `Authorization` header against `SECRET_TOKEN`. |
| 49 | + |
| 50 | +Note that when `SECRET_TOKEN` is unset, the current `_verify_token()` helper accepts requests that carry no `Authorization` header. |
| 51 | + |
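The decision logic of the two dependencies can be sketched as plain functions. This is one reading of the rules above, not the actual implementation: the function signatures are hypothetical, the header is assumed to be compared verbatim (no `Bearer` prefix handling), and the use of a constant-time comparison is an assumption.

```python
import hmac
from typing import Optional


def verify_token(authorization: Optional[str], secret_token: Optional[str]) -> bool:
    """Return True when the request should be accepted (illustrative only)."""
    if secret_token is None:
        # Assumed reading: with no secret configured, only requests
        # that omit the Authorization header are accepted.
        return authorization is None
    if authorization is None:
        return False
    # Assumption: the raw header value is compared to the secret.
    return hmac.compare_digest(authorization.encode(), secret_token.encode())


def public_or_auth(authorization: Optional[str],
                   secret_token: Optional[str],
                   code_graph_public: Optional[str]) -> bool:
    """Read endpoints: public mode bypasses the token check."""
    if code_graph_public == "1":
        return True
    return verify_token(authorization, secret_token)


def token_required(authorization: Optional[str],
                   secret_token: Optional[str]) -> bool:
    """Mutating endpoints: the token check always applies."""
    return verify_token(authorization, secret_token)
```

In the real handlers these values would come from the request headers and the process environment, and a failed check would presumably translate into an HTTP error response.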
| 52 | +### 2.3 Request models |
| 53 | + |
| 54 | +The API uses Pydantic request models for POST bodies, including: |
| 55 | + |
| 56 | +- `RepoRequest` |
| 57 | +- `NeighborsRequest` |
| 58 | +- `AutoCompleteRequest` |
| 59 | +- `FindPathsRequest` |
| 60 | +- `ChatRequest` |
| 61 | +- `AnalyzeFolderRequest` |
| 62 | +- `AnalyzeRepoRequest` |
| 63 | +- `SwitchCommitRequest` |
| 64 | + |
| 65 | +### 2.4 Endpoint inventory |
| 66 | + |
| 67 | +**Read endpoints** (`public_or_auth`): |
| 68 | + |
| 69 | +- `GET /api/graph_entities` |
| 70 | +- `POST /api/get_neighbors` |
| 71 | +- `POST /api/auto_complete` |
| 72 | +- `GET /api/list_repos` |
| 73 | +- `POST /api/repo_info` |
| 74 | +- `POST /api/find_paths` |
| 75 | +- `POST /api/chat` |
| 76 | +- `POST /api/list_commits` |
| 77 | + |
| 78 | +**Mutating endpoints** (`token_required`): |
| 79 | + |
| 80 | +- `POST /api/analyze_folder` |
| 81 | +- `POST /api/analyze_repo` |
| 82 | +- `POST /api/switch_commit` |
| 83 | + |
| 84 | +### 2.5 Async behavior |
| 85 | + |
| 86 | +The FastAPI handlers are `async def`, but several heavy operations are still blocking and are moved off the event loop with `asyncio.get_running_loop().run_in_executor(...)`: |
| 87 | + |
| 88 | +- local folder analysis |
| 89 | +- repository clone + analysis |
| 90 | +- LLM chat work |
| 91 | +- commit switching |
| 92 | + |
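The offloading pattern named above can be shown with the standard library alone. In this sketch `blocking_analysis` and `analyze_folder_handler` are hypothetical stand-ins for the real handlers, and the `time.sleep` call simulates the heavy work.

```python
import asyncio
import time


def blocking_analysis(path: str) -> dict:
    """Stand-in for blocking work such as folder analysis."""
    time.sleep(0.1)
    return {"status": "success", "path": path}


async def analyze_folder_handler(path: str) -> dict:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, blocking_analysis, path)


async def main() -> tuple:
    ticks = 0

    async def ticker() -> None:
        # Keeps counting while the blocking call runs in a worker thread.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    result = await analyze_folder_handler("/tmp/repo")
    task.cancel()
    return result, ticks
```

Running `asyncio.run(main())` returns the handler result together with a non-zero tick count, which shows that the event loop stayed responsive while the blocking function executed.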
| 93 | +## 3. Domain Modules |
| 94 | + |
| 95 | +### 3.1 `graph.py` |
| 96 | + |
| 97 | +`Graph` is the core FalkorDB interface used for code-graph mutations and queries. It also exposes helpers such as: |
| 98 | + |
| 99 | +- `get_sub_graph()` |
| 100 | +- `get_neighbors()` |
| 101 | +- `add_entity()` |
| 102 | +- `connect_entities()` |
| 103 | +- `find_paths()` |
| 104 | +- `stats()` |
| 105 | +- backlog helpers used during git-history processing |
| 106 | + |
| 107 | +Async route handlers use `AsyncGraphQuery` and `async_get_repos()` for non-blocking access patterns. |
| 108 | + |
| 109 | +### 3.2 `project.py` |
| 110 | + |
| 111 | +`Project` represents either: |
| 112 | + |
| 113 | +- a cloned git repository via `Project.from_git_repository(url)`, or |
| 114 | +- a local repository via `Project.from_local_repository(path)`. |
| 115 | + |
| 116 | +Its two main orchestration steps are: |
| 117 | + |
| 118 | +- `analyze_sources(ignore)` |
| 119 | +- `process_git_history(ignore)` |
| 120 | + |
| 121 | +### 3.3 `analyzers/source_analyzer.py` |
| 122 | + |
| 123 | +`SourceAnalyzer` walks the repository tree, picks a registered analyzer by file extension, and builds the code graph. |
| 124 | + |
| 125 | +Registered analyzers in the current code: |
| 126 | + |
| 127 | +- `.py` -> `PythonAnalyzer` |
| 128 | +- `.java` -> `JavaAnalyzer` |
| 129 | +- `.cs` -> `CSharpAnalyzer` |
| 130 | + |
| 131 | +The C analyzer source exists, but `.c` and `.h` registrations are commented out. |
| 132 | + |
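Extension-based dispatch of this kind is commonly written as a dictionary lookup. The sketch below is an assumption about the shape of the registry, not a copy of `source_analyzer.py`: the analyzer classes are empty placeholders and `pick_analyzer` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


class PythonAnalyzer: ...
class JavaAnalyzer: ...
class CSharpAnalyzer: ...


# Placeholder registry keyed by file extension.
ANALYZERS = {
    ".py": PythonAnalyzer(),
    ".java": JavaAnalyzer(),
    ".cs": CSharpAnalyzer(),
    # ".c" and ".h" are intentionally absent, mirroring the
    # commented-out C registrations described above.
}


def pick_analyzer(path: Path) -> Optional[object]:
    """Return the analyzer registered for the file's extension, if any."""
    return ANALYZERS.get(path.suffix)


def supported_files(paths: list) -> list:
    """Keep only the paths that have a registered analyzer."""
    return [p for p in paths if pick_analyzer(p) is not None]
```

With this layout, enabling another language amounts to adding one entry to the registry.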
| 133 | +### 3.4 `git_utils/` |
| 134 | + |
| 135 | +Git history is modeled as a separate FalkorDB graph per repository (for example `{repo_name}_git`). |
| 136 | + |
| 137 | +Key pieces: |
| 138 | + |
| 139 | +- `GitGraph` / `AsyncGitGraph` |
| 140 | +- `build_commit_graph(...)` |
| 141 | +- `switch_commit(...)` |
| 142 | +- helper functions for diff classification and ignore checks |
| 143 | + |
| 144 | +### 3.5 `info.py` |
| 145 | + |
| 146 | +Repository metadata is stored through a Redis-compatible client that uses the FalkorDB connection settings. Stored fields include: |
| 147 | + |
| 148 | +- `repo_url` |
| 149 | +- `commit` |
| 150 | + |
| 151 | +### 3.6 `llm.py` |
| 152 | + |
| 153 | +Chat requests use GraphRAG-SDK with LiteLLM: |
| 154 | + |
| 155 | +- default `MODEL_NAME` is `gemini/gemini-flash-lite-latest` |
| 156 | +- the backend creates a `KnowledgeGraph` bound to the repository graph |
| 157 | +- `ask()` offloads the synchronous chat session call to a worker thread |
| 158 | + |
| 159 | +## 4. Runtime and Environment |
| 160 | + |
| 161 | +### 4.1 Local development |
| 162 | + |
| 163 | +Typical backend dev command: |
| 164 | + |
| 165 | +```bash |
| 166 | +uv run uvicorn api.index:app --host 127.0.0.1 --port 5000 --reload |
| 167 | +``` |
| 168 | + |
| 169 | +Typical frontend dev command: |
| 170 | + |
| 171 | +```bash |
| 172 | +cd app && npm run dev |
| 173 | +``` |
| 174 | + |
| 175 | +`app/vite.config.ts` proxies `/api` requests to `http://127.0.0.1:5000` during frontend development. |
| 176 | + |
| 177 | +### 4.2 Production/container startup |
| 178 | + |
| 179 | +The checked-in production entrypoints use Uvicorn, not Flask: |
| 180 | + |
| 181 | +- `make run-prod` |
| 182 | +- `start.sh` |
| 183 | +- Docker image entrypoint (`/start.sh`) |
| 184 | + |
| 185 | +### 4.3 Important environment variables |
| 186 | + |
| 187 | +- `FALKORDB_HOST` |
| 188 | +- `FALKORDB_PORT` |
| 189 | +- `FALKORDB_USERNAME` |
| 190 | +- `FALKORDB_PASSWORD` |
| 191 | +- `SECRET_TOKEN` |
| 192 | +- `CODE_GRAPH_PUBLIC` |
| 193 | +- `ALLOWED_ANALYSIS_DIR` |
| 194 | +- `MODEL_NAME` |
| 195 | +- provider-specific LiteLLM credential(s), such as `GEMINI_API_KEY` for the default model |
| 196 | + |
| 197 | +## 5. Storage Model |
| 198 | + |
| 199 | +### 5.1 Code graph |
| 200 | + |
| 201 | +The main repository graph lives in FalkorDB and contains entities such as: |
| 202 | + |
| 203 | +- `File` |
| 204 | +- `Class` |
| 205 | +- `Function` |
| 206 | +- `Interface` |
| 207 | + |
| 208 | +Relationships include: |
| 209 | + |
| 210 | +- `DEFINES` |
| 211 | +- `CALLS` |
| 212 | +- `EXTENDS` |
| 213 | +- `IMPLEMENTS` |
| 214 | + |
| 215 | +### 5.2 Git graph |
| 216 | + |
| 217 | +Commit history is stored in a second graph named `{repo_name}_git`, with commit metadata and parent/child edges. |
| 218 | + |
| 219 | +### 5.3 Repository metadata |
| 220 | + |
| 221 | +Repository URL and current commit are stored in Redis-style hashes keyed as `{repo_name}_info`. |
| 222 | + |
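The naming conventions in this section can be summarized with a few helpers. The function names are hypothetical, and a plain dictionary stands in for the Redis-style hash store; only the `_git` and `_info` suffixes and the two field names come from the text above.

```python
def git_graph_name(repo_name: str) -> str:
    """Name of the graph that holds commit history for a repository."""
    return f"{repo_name}_git"


def info_key(repo_name: str) -> str:
    """Key of the hash that holds repository metadata."""
    return f"{repo_name}_info"


def save_repo_info(store: dict, repo_name: str, repo_url: str, commit: str) -> None:
    """Write metadata into an in-memory stand-in for the hash store."""
    store[info_key(repo_name)] = {"repo_url": repo_url, "commit": commit}
```

For a repository named `demo`, the code graph, git graph, and metadata hash would then be addressed as `demo`, `demo_git`, and `demo_info` respectively, assuming the code graph is named after the repository.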
| 223 | +## 6. Request Flows |
| 224 | + |
| 225 | +### 6.1 `POST /api/analyze_repo` |
| 226 | + |
| 227 | +1. FastAPI validates the request body with `AnalyzeRepoRequest`. |
| 228 | +2. `token_required` checks the `Authorization` header. |
| 229 | +3. `Project.from_git_repository()` clones the repo locally. |
| 230 | +4. `analyze_sources()` builds the code graph. |
| 231 | +5. `process_git_history()` builds the repository's git graph. |
| 232 | +6. The endpoint returns `{"status": "success"}`. |
| 233 | + |
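The ordering of steps 3 to 6 can be sketched with a stub in place of the real `Project` class. `StubProject` and `analyze_repo` are hypothetical names, and the stub only records which methods were called; cloning and graph building are not performed.

```python
class StubProject:
    """Records calls instead of cloning or analyzing anything."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.calls = []

    @classmethod
    def from_git_repository(cls, url: str) -> "StubProject":
        project = cls(url)
        project.calls.append("clone")
        return project

    def analyze_sources(self, ignore: list) -> None:
        self.calls.append("analyze_sources")

    def process_git_history(self, ignore: list) -> None:
        self.calls.append("process_git_history")


def analyze_repo(url: str, ignore: list, project_cls=StubProject):
    """Run the orchestration steps in the documented order."""
    project = project_cls.from_git_repository(url)
    project.analyze_sources(ignore)
    project.process_git_history(ignore)
    return {"status": "success"}, project
```

In the actual endpoint this sequence would run inside the executor described in section 2.5, after request validation and the token check have passed.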
| 234 | +### 6.2 `POST /api/chat` |
| 235 | + |
| 236 | +1. FastAPI validates `repo` and `msg`. |
| 237 | +2. `public_or_auth` enforces auth/public rules. |
| 238 | +3. `ask()` creates a GraphRAG chat session for the repository graph. |
| 239 | +4. LiteLLM generates Cypher and a natural-language response. |
| 240 | +5. The endpoint returns `{"status": "success", "response": ...}`. |
| 241 | + |
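From the client side, a chat call is a JSON `POST` carrying `repo` and `msg`. The sketch below only builds the request object. The helper name is hypothetical, the base URL reuses the development address from section 4.1, and sending the token as the raw `Authorization` value is an assumption about the expected header format.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(repo: str, msg: str, token: Optional[str] = None,
                       base_url: str = "http://127.0.0.1:5000") -> urllib.request.Request:
    """Build (but do not send) a POST /api/chat request."""
    body = json.dumps({"repo": repo, "msg": msg}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the server compares this value to SECRET_TOKEN as-is.
        headers["Authorization"] = token
    return urllib.request.Request(f"{base_url}/api/chat", data=body,
                                  headers=headers, method="POST")
```

Against a running backend, passing the result to `urllib.request.urlopen` would send the request; the response body is expected to follow the `{"status": "success", "response": ...}` shape from step 5.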
| 242 | +## 7. Key Takeaways |
| 243 | + |
| 244 | +- The backend is now FastAPI + Uvicorn, not Flask. |
| 245 | +- All public API paths are under `/api/...`. |
| 246 | +- The React app can be served by the backend from `app/dist`. |
| 247 | +- Most backend logic remains framework-agnostic and reusable. |
| 248 | +- Supported analyzers are currently Python, Java, and C#. |