| 1 | +# Indexing, Querying, and Prompt Tuning in GraphRAG for .NET |
| 2 | + |
| 3 | +GraphRAG for .NET keeps feature parity with the Python reference project described in the [Microsoft Research blog](https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/) and the [GraphRAG paper](https://arxiv.org/pdf/2404.16130). This document explains how the .NET workflows map to the concepts documented on [microsoft.github.io/graphrag](https://microsoft.github.io/graphrag/), highlights the supported query modes, and shows how to customise prompts via manual or auto tuning outputs. |
| 4 | + |
| 5 | +## Indexing Architecture |
| 6 | + |
| 7 | +- **Workflow parity.** Each indexing stage matches the Python pipeline and the [default data flow](https://microsoft.github.io/graphrag/index/default_dataflow/): |
| 8 | + - `load_input_documents` → `create_base_text_units` → `summarize_descriptions` |
| 9 | + - `extract_graph` persists `entities` and `relationships` |
| 10 | + - `create_communities` produces `communities` |
| 11 | + - `community_summaries` writes `community_reports` |
| 12 | + - `extract_covariates` stores `covariates` |
| 13 | +- **Storage schema.** Tables share the column layout described under [index outputs](https://microsoft.github.io/graphrag/index/outputs/). The new strongly-typed records (`CommunityRecord`, `CovariateRecord`, etc.) mirror the JSON representation used by the Python implementation. |
| 14 | +- **Cluster configuration.** `GraphRagConfig.ClusterGraph` exposes the same settings as the Python `cluster_graph` section, including largest-component filtering and deterministic seeding. |
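
To make "largest-component filtering" concrete, here is a minimal sketch of the idea: keep only the nodes in the biggest connected component before clustering. It is an illustration, not the library's implementation; `GraphFilters` and `LargestComponent` are hypothetical names, and the toy edge list is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy undirected graph: A-B-C form one component, D-E another.
var edges = new List<(string Source, string Target)>
{
    ("A", "B"), ("B", "C"), ("D", "E"),
};

HashSet<string> largest = GraphFilters.LargestComponent(edges);
Console.WriteLine(string.Join(", ", largest.OrderBy(n => n, StringComparer.Ordinal)));
// prints: A, B, C

// Hypothetical helper, not part of the GraphRAG for .NET API.
static class GraphFilters
{
    // Returns the node set of the largest connected component (breadth-first search).
    // On a tie, the component discovered first is kept.
    public static HashSet<string> LargestComponent(IEnumerable<(string Source, string Target)> edges)
    {
        // Build an undirected adjacency list.
        var adjacency = new Dictionary<string, List<string>>();
        void Link(string from, string to)
        {
            if (!adjacency.TryGetValue(from, out var neighbours))
            {
                adjacency[from] = neighbours = new List<string>();
            }
            neighbours.Add(to);
        }
        foreach (var (source, target) in edges)
        {
            Link(source, target);
            Link(target, source);
        }

        var visited = new HashSet<string>();
        var best = new HashSet<string>();
        foreach (string start in adjacency.Keys)
        {
            if (!visited.Add(start)) continue; // already part of an earlier component

            var component = new HashSet<string> { start };
            var queue = new Queue<string>();
            queue.Enqueue(start);
            while (queue.Count > 0)
            {
                foreach (string next in adjacency[queue.Dequeue()])
                {
                    if (visited.Add(next))
                    {
                        component.Add(next);
                        queue.Enqueue(next);
                    }
                }
            }
            if (component.Count > best.Count) best = component;
        }
        return best;
    }
}
```

Dropping small disconnected fragments this way tends to give the community detection step a single well-connected graph to work on; deterministic seeding is a separate setting and is not shown here.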
| 15 | + |
| 16 | +## Query Capabilities |
| 17 | + |
| 18 | +The query layer ports the orchestrators documented in the [GraphRAG query overview](https://microsoft.github.io/graphrag/query/overview/): |
| 19 | + |
| 20 | +- **Global search** ([docs](https://microsoft.github.io/graphrag/query/global_search/)) draws on community reports to answer questions about the corpus as a whole. |
| 21 | +- **Local search** ([docs](https://microsoft.github.io/graphrag/query/local_search/)) anchors on the entities most relevant to the query and their graph neighbourhood when you need focused context. |
| 22 | +- **DRIFT search** ([docs](https://microsoft.github.io/graphrag/query/drift_search/)) brings community information into a local-search query, broadening its starting point and refining the answer through follow-up questions. |
| 23 | +- **Question generation** ([docs](https://microsoft.github.io/graphrag/query/question_generation/)) produces follow-up questions to extend an investigation. |
| 24 | + |
| 25 | +Every orchestrator consumes the same indexed tables as the Python project, so the .NET stack interoperates with the bring-your-own-graph (BYOG) scenarios described in the [index architecture guide](https://microsoft.github.io/graphrag/index/architecture/). |
| 26 | + |
| 27 | +## Prompt Tuning |
| 28 | + |
| 29 | +Manual and auto prompt tuning are both available without code changes: |
| 30 | + |
| 31 | +1. **Manual overrides** follow the rules from [manual prompt tuning](https://microsoft.github.io/graphrag/prompt_tuning/manual_prompt_tuning/). |
| 32 | + - Place custom templates under a directory referenced by `GraphRagConfig.PromptTuning.Manual.Directory` and set `Enabled = true`. |
| 33 | + - Filenames follow the stage key pattern `section/workflow/kind.txt` (see table below). |
| 34 | +2. **Auto tuning** integrates the outputs documented in [auto prompt tuning](https://microsoft.github.io/graphrag/prompt_tuning/auto_prompt_tuning/). |
| 35 | + - Point `GraphRagConfig.PromptTuning.Auto.Directory` at the folder containing the generated prompts and set `Enabled = true`. |
| 36 | +   - Resolution order for every stage: the runtime uses an explicit path from the workflow config first, then a manual override, then an auto-tuned file, and finally the built-in default in `prompts/`. |
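
The precedence described above can be sketched as a first-match lookup over candidate paths. This is a minimal illustration under stated assumptions, not the library's loader: `PromptResolver` and its parameters are hypothetical names, and the sketch assumes a stage key maps directly to a relative file path under each tuning directory.

```csharp
#nullable enable
using System;
using System.IO;

// Demo setup: write manual and auto-tuned copies of the same stage key into a temp folder.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
string manualDir = Path.Combine(root, "manual");
string autoDir = Path.Combine(root, "auto");
const string stageKey = "index/extract_graph/system.txt";

foreach (string dir in new[] { manualDir, autoDir })
{
    string file = Path.Combine(dir, "index", "extract_graph", "system.txt");
    Directory.CreateDirectory(Path.GetDirectoryName(file)!);
    File.WriteAllText(file, "custom system prompt");
}

// No explicit path is configured (null), so the manual override wins over the auto-tuned file.
string? resolved = PromptResolver.Resolve(stageKey, null, manualDir, autoDir);
Console.WriteLine(resolved ?? "(built-in default)");

Directory.Delete(root, recursive: true);

// Hypothetical helper, not part of the GraphRAG for .NET API.
static class PromptResolver
{
    // Returns the first existing candidate in precedence order:
    // explicit path, manual override, auto-tuned file.
    // A null result means the caller should use the built-in default.
    public static string? Resolve(string stageKey, string? explicitPath, string? manualDir, string? autoDir)
    {
        // Stage keys use '/' separators; map them to the platform separator.
        string relative = stageKey.Replace('/', Path.DirectorySeparatorChar);

        string?[] candidates =
        {
            explicitPath,
            manualDir is null ? null : Path.Combine(manualDir, relative),
            autoDir is null ? null : Path.Combine(autoDir, relative),
        };

        foreach (string? candidate in candidates)
        {
            if (candidate is not null && File.Exists(candidate))
            {
                return candidate;
            }
        }
        return null;
    }
}
```

Returning `null` instead of a default path keeps the sketch independent of where the built-in prompts are stored.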
| 37 | + |
| 38 | +### Stage Keys and Placeholders |
| 39 | + |
| 40 | +| Workflow | Stage key | Purpose | Supported placeholders | |
| 41 | +|----------|-----------|---------|------------------------| |
| 42 | +| `extract_graph` (system) | `index/extract_graph/system.txt` | System prompt that instructs the extractor. | _N/A_ | |
| 43 | +| `extract_graph` (user) | `index/extract_graph/user.txt` | User prompt template for individual text units. | `{{max_entities}}`, `{{text}}` | |
| 44 | +| `community_summaries` (system) | `index/community_reports/system.txt` | System guidance for cluster summarisation. | _N/A_ | |
| 45 | +| `community_summaries` (user) | `index/community_reports/user.txt` | User prompt template for entity lists. | `{{max_length}}`, `{{entities}}` | |
| 46 | + |
| 47 | +Placeholders are replaced at runtime with values drawn from workflow configuration: |
| 48 | + |
| 49 | +- `{{max_entities}}` → `ExtractGraphConfig.EntityTypes.Count + 5` (minimum 1) |
| 50 | +- `{{text}}` → the original text unit content |
| 51 | +- `{{max_length}}` → `CommunityReportsConfig.MaxLength` |
| 52 | +- `{{entities}}` → bullet list of entity titles and descriptions |
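
The substitution rules above amount to token replacement over the template text. The sketch below shows one way to do it in a single pass; `PromptTemplate.Render`, the sample template, and the sample entity types are hypothetical and only illustrate the documented placeholder syntax, not the library's actual renderer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical inputs standing in for workflow configuration and a text unit.
var entityTypes = new List<string> { "person", "organization", "location" };
string template = "Extract up to {{max_entities}} entities from:\n{{text}}";

var values = new Dictionary<string, string>
{
    // Mirrors the documented rule: EntityTypes.Count + 5, with a floor of 1.
    ["max_entities"] = Math.Max(1, entityTypes.Count + 5).ToString(),
    ["text"] = "Ada Lovelace worked with Charles Babbage.",
};

Console.WriteLine(PromptTemplate.Render(template, values));

// Hypothetical helper, not part of the GraphRAG for .NET API.
static class PromptTemplate
{
    // Replaces each {{name}} token in one pass; unknown tokens are left untouched.
    public static string Render(string template, IReadOnlyDictionary<string, string> values)
    {
        return Regex.Replace(
            template,
            @"\{\{(\w+)\}\}",
            match => values.TryGetValue(match.Groups[1].Value, out var value) ? value : match.Value);
    }
}
```

A single regex pass means a substituted value that happens to contain `{{...}}` (for example inside document text) is not expanded a second time.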
| 53 | + |
| 54 | +If a template is omitted, the runtime falls back to the built-in prompts stored under `prompts/` and bundled with the repository. |
| 55 | + |
| 56 | +## Integration Tests |
| 57 | + |
| 58 | +`tests/ManagedCode.GraphRag.Tests/Integration/CommunitySummariesIntegrationTests.cs` exercises the new prompt loader end-to-end using the file-backed pipeline storage. Combined with the existing Aspire-powered suites, the tests demonstrate how indexing, community detection, and summarisation behave with tuned prompts while remaining faithful to the [GraphRAG BYOG guidance](https://microsoft.github.io/graphrag/index/byog/). |
| 59 | + |
| 60 | +## Further Reading |
| 61 | + |
| 62 | +- [GraphRAG prompt tuning overview](https://microsoft.github.io/graphrag/prompt_tuning/overview/) |
| 63 | +- [GraphRAG index methods](https://microsoft.github.io/graphrag/index/methods/) |
| 64 | +- [GraphRAG query overview](https://microsoft.github.io/graphrag/query/overview/) |
| 65 | +- [GraphRAG default dataflow](https://microsoft.github.io/graphrag/index/default_dataflow/) |
| 66 | + |
| 67 | +These resources underpin the .NET implementation and provide broader context for customising or extending the library. |