Vector storage #212
Pull Request Overview
This PR adds end-to-end vector store support to DataPusher Plus, embedding resource data via a local SentenceTransformer model and querying via ChromaDB and OpenRouter.
- Introduces a new `DataPusherVectorStore` class for embedding, querying, and managing vector data.
- Integrates vector embedding into the upload job pipeline with optional temporal coverage extraction.
- Adds configuration settings and a helper to check embedding status.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ckanext/datapusher_plus/vector_store.py | New module implementing vector store integration |
| ckanext/datapusher_plus/jobs.py | Hooks embedding into the datapusher job pipeline |
| ckanext/datapusher_plus/helpers.py | Adds helper to query embedding status |
| ckanext/datapusher_plus/config.py | Adds configuration flags and defaults for vector store |
Comments suppressed due to low confidence (2)
ckanext/datapusher_plus/jobs.py:1599
- The new vector store embedding workflow in the job pipeline lacks corresponding unit or integration tests. Consider adding tests to cover `DataPusherVectorStore.embed_resource` and the job integration path.
if conf.ENABLE_VECTOR_STORE and VECTOR_STORE_AVAILABLE:
ckanext/datapusher_plus/jobs.py:1638
- The function `parsedate` is not imported in this module, causing a NameError at runtime. Add the appropriate import (e.g., `from dateutil.parser import parse as parsedate`).
min_year = parsedate(str(min_date)).year
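A minimal sketch of the suggested fix, for illustration only. The helper name `year_range` and the ISO-only fallback are assumptions and are not part of the PR; only the `dateutil` import and the `parsedate(str(min_date)).year` expression come from the comment above.

```python
from datetime import datetime

try:
    # Import suggested in the review comment.
    from dateutil.parser import parse as parsedate
except ImportError:
    # Assumption: fall back to the standard library when python-dateutil is
    # not installed. This fallback only handles ISO 8601 strings.
    def parsedate(value):
        return datetime.fromisoformat(value)


def year_range(min_date, max_date):
    """Return (min_year, max_year) for two date-like values.

    Hypothetical helper: the values are stringified first, mirroring the
    parsedate(str(min_date)).year expression quoted from jobs.py.
    """
    min_year = parsedate(str(min_date)).year
    max_year = parsedate(str(max_date)).year
    return min_year, max_year
```

With the import in place, the quoted line no longer depends on a name that is undefined in the module.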
"ckanext.datapusher_plus.embedding_device", "cpu"
)
# OpenRouter API Key
OPENROUTER_API_KEY = tk.config.get(
A default OpenRouter API key is hard-coded in source. This poses a security risk; consider loading it exclusively from a secure environment variable.
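One way to address this, sketched under assumptions: read the key only from the environment and fail loudly when it is missing, with no default in source. The environment variable name `OPENROUTER_API_KEY` and the helper `load_openrouter_api_key` are hypothetical and do not appear in the PR.

```python
import os


def load_openrouter_api_key(environ=None):
    """Return the OpenRouter API key from the environment.

    Hypothetical helper. There is deliberately no fallback value, so a
    missing key raises an error instead of silently using a shared secret.
    """
    env = os.environ if environ is None else environ
    # Assumed variable name; adjust to whatever the deployment uses.
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; refusing to use a built-in default."
        )
    return key
```

Passing a mapping as `environ` keeps the helper easy to test without touching the real process environment.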
ef4f36d to 62962b5
Thanks @minhajuddin2510 for this work. Before this can land, a few concerns to flag for a follow-up version:
Description. The PR body is empty and the change adds a new
Rebase needed. The PR modifies
Could you close + reopen with a description + rebase against current
No description provided.