Commit c8fef6b

porting
1 parent 6939e42 commit c8fef6b

67 files changed

Lines changed: 2385 additions & 248 deletions

Directory.Packages.props

Lines changed: 2 additions & 1 deletion
@@ -5,9 +5,10 @@
     <PackageVersion Include="Microsoft.Azure.Cosmos" Version="3.54.0" />
     <PackageVersion Include="Microsoft.Extensions.DependencyInjection" Version="8.0.0" />
     <PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="8.0.0" />
+    <PackageVersion Include="Microsoft.ML.Tokenizers" Version="1.0.2" />
+    <PackageVersion Include="Microsoft.ML.Tokenizers.Data.O200kBase" Version="1.0.2" />
     <PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.10.0" />
     <PackageVersion Include="Neo4j.Driver" Version="5.21.0" />
-    <PackageVersion Include="Newtonsoft.Json" Version="13.0.3" />
     <PackageVersion Include="Npgsql" Version="8.0.3" />
     <PackageVersion Include="DotNet.ReproducibleBuilds" Version="1.2.39" />
     <PackageVersion Include="xunit" Version="2.6.6" />

GraphRag.slnx

Lines changed: 4 additions & 2 deletions
@@ -9,6 +9,8 @@
     <Project Path="src/ManagedCode.GraphRag.Neo4j/ManagedCode.GraphRag.Neo4j.csproj" />
     <Project Path="src/ManagedCode.GraphRag.Postgres/ManagedCode.GraphRag.Postgres.csproj" />
     <Project Path="src/ManagedCode.GraphRag/ManagedCode.GraphRag.csproj" />
-    <Project Path="tests/GraphRag.Tests.Integration/GraphRag.Tests.Integration.csproj" />
   </Folder>
-</Solution>
+  <Folder Name="/tests/">
+    <Project Path="tests\ManagedCode.GraphRag.Tests.Integration\ManagedCode.GraphRag.Tests.Integration.csproj" />
+  </Folder>
+</Solution>

docs/dotnet-port-plan.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# GraphRAG .NET Porting Plan
+
+This working note documents the mapping between the Python implementation that lives in `submodules/graphrag-python` and the forthcoming .NET port. It exists purely as a checklist for the migration effort and will be removed once parity has been achieved.
+
+## High-Level Architecture
+
+- **Configuration**: `GraphRagConfig` and companion models will be introduced under `GraphRag.Config`. They mirror the Pydantic models (`graphrag.config.models`) and keep JSON/YAML compatibility with the original schema.
+- **Indexing Pipeline**: `GraphRag.Indexing` provides:
+  - `PipelineBuilder`, `PipelineRunContext`, `PipelineRunResult`, `WorkflowDelegate`.
+  - Workflow implementations translated from `graphrag.index.workflows.*`.
+  - Operation helpers from `graphrag.index.operations.*` rewritten against .NET primitives (`List<T>`, `ImmutableArray<T>`, `DataFrame` where necessary).
+- **Query Pipeline**: `GraphRag.Query` mirrors `graphrag.query.*` with orchestrators for question generation, context assembly, and answer synthesis.
+- **Storage**: `GraphRag.Storage` offers a provider model equivalent to `PipelineStorage` (file, memory, Blob, Cosmos). A JSON-backed table serializer is in place while the Parquet implementation is ported.
+- **Language Models & Tokenizers**: `GraphRag.LanguageModel` wraps Azure OpenAI/LiteLLM equivalents. Configuration, retry, and rate limiting concepts are ported.
+- **Vector Stores**: `GraphRag.VectorStores` brings adapters for local FAISS-like embeddings, Azure Cognitive Search, and Postgres pgvector matching the Python `vector_stores`.
+- **Callbacks & Telemetry**: `GraphRag.Callbacks` contains workflow lifecycle hooks, tracing, and instrumentation mirroring `WorkflowCallbacks`.
+
+## Data Model Mapping
+
+| Python Table | Python Module | .NET Type | Notes |
+|--------------|---------------|-----------|-------|
+| `documents` | `index/workflows/create_final_documents.py` | `DocumentRecord` | Stored as Parquet; includes metadata dictionary. |
+| `text_units` | `index/workflows/create_base_text_units.py` | `TextUnitRecord` | Chunk metadata + document ids. |
+| `entities` | `index/workflows/extract_graph.py` | `EntityRecord` | Already partially ported; will be extended with raw view support. |
+| `relationships` | `index/workflows/extract_graph.py` | `RelationshipRecord` | Already present; to be aligned with Python schema. |
+| `communities` | `index/workflows/create_communities.py` | `CommunityRecord` | Requires Louvain modularity implementation. |
+| `community_reports` | `index/workflows/create_community_reports.py` | `CommunityReportRecord` | Needs summarization prompts and structured output. |
+| `covariates` | `index/workflows/extract_covariates.py` | `CovariateRecord` | Includes temporal fields, subject/object ids. |
+
+## Testing Strategy
+
+- Translate Python unit/integration suites under `submodules/graphrag-python/tests`.
+- Use xUnit with Aspire-powered fixtures (Neo4j, Postgres, Cosmos emulator) to run end-to-end indexing + query scenarios.
+- For LLM-dependent steps, rely on configurable providers with live credentials; tests skip only when mandatory environment variables are absent.
+- Golden datasets from `tests/fixtures` are copied into `.NET` test resources to validate data transformations.
+
+## Immediate TODOs
+
+1. Implement configuration model layer (`GraphRag.Config`).
+2. Port pipeline runtime (`GraphRag.Indexing.Runtime`) including callback chain, run loop, benchmarking.
+3. Recreate storage adapters (File, Memory) and Parquet serializer.
+4. Start translating workflows beginning with ingestion (`load_input_documents`, `create_base_text_units`, `create_final_documents`).
+5. Migrate vector store + embedding interfaces and integrate into indexing pipeline.
+6. Recreate query orchestrator and evaluation pipelines.
+7. Port tests iteratively, ensuring coverage parity with Python.
+
+> This file is intentionally temporary; it guides the phased port while the codebase is in flux.

src/ManagedCode.GraphRag.CosmosDb/ManagedCode.GraphRag.CosmosDb.csproj

Lines changed: 3 additions & 1 deletion
@@ -10,9 +10,11 @@
   <ItemGroup>
     <ProjectReference Include="../ManagedCode.GraphRag/ManagedCode.GraphRag.csproj" />
   </ItemGroup>
+  <PropertyGroup>
+    <AzureCosmosDisableNewtonsoftJsonCheck>true</AzureCosmosDisableNewtonsoftJsonCheck>
+  </PropertyGroup>
   <ItemGroup>
     <PackageReference Include="Microsoft.Azure.Cosmos" />
     <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
-    <PackageReference Include="Newtonsoft.Json" />
   </ItemGroup>
 </Project>

src/ManagedCode.GraphRag.CosmosDb/ServiceCollectionExtensions.cs

Lines changed: 14 additions & 1 deletion
@@ -1,4 +1,5 @@
 using System;
+using System.Text.Json;
 using GraphRag.Graphs;
 using Microsoft.Azure.Cosmos;
 using Microsoft.Extensions.DependencyInjection;
@@ -18,7 +19,19 @@ public static IServiceCollection AddCosmosGraphStore(this IServiceCollection ser
         configure(options);
 
         services.AddKeyedSingleton<CosmosGraphStoreOptions>(key, (_, _) => options);
-        services.AddKeyedSingleton<CosmosClient>(key, (_, _) => new CosmosClient(options.ConnectionString));
+        services.AddKeyedSingleton<CosmosClient>(key, (_, _) =>
+        {
+            var serializerOptions = new JsonSerializerOptions(JsonSerializerDefaults.Web)
+            {
+                PropertyNamingPolicy = JsonNamingPolicy.CamelCase
+            };
+            var cosmosOptions = new CosmosClientOptions
+            {
+                Serializer = new SystemTextJsonCosmosSerializer(serializerOptions)
+            };
+
+            return new CosmosClient(options.ConnectionString, cosmosOptions);
+        });
         services.AddKeyedSingleton<CosmosGraphStore>(key, (sp, serviceKey) =>
         {
             var client = sp.GetRequiredKeyedService<CosmosClient>(serviceKey);
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+using System.IO;
+using System.Text;
+using System.Text.Json;
+using Microsoft.Azure.Cosmos;
+
+namespace GraphRag.Storage.Cosmos;
+
+internal sealed class SystemTextJsonCosmosSerializer : CosmosSerializer
+{
+    private static readonly JsonSerializerOptions DefaultOptions = new(JsonSerializerDefaults.Web)
+    {
+        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
+        WriteIndented = false
+    };
+
+    private readonly JsonSerializerOptions _options;
+
+    public SystemTextJsonCosmosSerializer(JsonSerializerOptions? options = null)
+    {
+        _options = options ?? DefaultOptions;
+    }
+
+    public override T FromStream<T>(Stream stream)
+    {
+        if (stream is null)
+        {
+            throw new ArgumentNullException(nameof(stream));
+        }
+
+        if (typeof(T) == typeof(Stream))
+        {
+            return (T)(object)stream;
+        }
+
+        if (stream.CanRead && stream.Length == 0)
+        {
+            return default!;
+        }
+
+        return JsonSerializer.Deserialize<T>(stream, _options)!;
+    }
+
+    public override Stream ToStream<T>(T input)
+    {
+        var stream = new MemoryStream();
+        if (input is null)
+        {
+            return stream;
+        }
+
+        using var writer = new Utf8JsonWriter(stream, new JsonWriterOptions { SkipValidation = false, Indented = false });
+        JsonSerializer.Serialize(writer, input, _options);
+        writer.Flush();
+        stream.Position = 0;
+        return stream;
+    }
+}
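
The stream round trip that the serializer above performs can be tried without a Cosmos account. The following is a minimal sketch, not the committed class: `StreamRoundTrip` and `EntityDoc` are hypothetical names introduced only for illustration, and the code assumes .NET 6 or later and uses nothing beyond `System.Text.Json`. It mirrors the same conventions (Web defaults, an empty stream for a `null` input, and `default` for an empty stream).

```csharp
#nullable enable
using System;
using System.IO;
using System.Text.Json;

// Serialize to a stream, peek at the JSON text, then rewind and read it back.
Stream payload = StreamRoundTrip.ToStream(new EntityDoc("e1", "Alice"));
string json = new StreamReader(payload).ReadToEnd();
Console.WriteLine(json); // {"id":"e1","displayName":"Alice"}

payload.Position = 0;
EntityDoc? copy = StreamRoundTrip.FromStream<EntityDoc>(payload);
Console.WriteLine(copy == new EntityDoc("e1", "Alice")); // True (records compare by value)

// A null input yields an empty stream, which reads back as default (null here).
Stream empty = StreamRoundTrip.ToStream<EntityDoc?>(null);
Console.WriteLine(StreamRoundTrip.FromStream<EntityDoc>(empty) is null); // True

// Hypothetical document type, used only in this sketch.
public sealed record EntityDoc(string Id, string DisplayName);

// Hypothetical helper that imitates the ToStream/FromStream pair of the diff.
public static class StreamRoundTrip
{
    // JsonSerializerDefaults.Web already selects camelCase property names
    // and case-insensitive matching when reading.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static Stream ToStream<T>(T input)
    {
        var stream = new MemoryStream();
        if (input is null)
        {
            return stream; // empty stream stands in for "no document"
        }

        JsonSerializer.Serialize(stream, input, Options);
        stream.Position = 0; // rewind so the caller can read from the start
        return stream;
    }

    public static T? FromStream<T>(Stream stream)
    {
        ArgumentNullException.ThrowIfNull(stream);
        if (stream.CanSeek && stream.Length == 0)
        {
            return default;
        }

        return JsonSerializer.Deserialize<T>(stream, Options);
    }
}
```

Because `JsonSerializerDefaults.Web` already applies camelCase naming, setting `PropertyNamingPolicy = JsonNamingPolicy.CamelCase` explicitly, as the commit does, is redundant but harmless. The sketch guards `Length` with `CanSeek` because `Stream.Length` throws on non-seekable streams; whether the streams handed over by the Cosmos SDK are always seekable is not something this example establishes.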
