new SKILL: generate ingest-sources script #9
## Step 3: Write `tmp/ingest_{REF}_sources.py`

Fill in all values from Steps 1–2 and write the script to `tmp/ingest_{REF}_sources.py`.
Every variable below must contain a real value — never write placeholder text to the file.
We may want to add a `scripts/ingest_sources.py` reference file and have the skill copy it and adapt it as needed. That might be cleaner than putting the entire script into the skill.
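A minimal sketch of the reference-file approach, assuming a hypothetical template stored at `scripts/ingest_sources.py` that uses `$name`-style placeholders. `string.Template.substitute` raises `KeyError` for any placeholder left unfilled, so the same mechanism would also enforce the "never write placeholder text" rule from Step 3. The placeholder names and the `render_ingest_script` helper are illustrative, not the skill's actual variables.

```python
from pathlib import Path
from string import Template

# Hypothetical template text; in practice it would be read from
# scripts/ingest_sources.py with Path(...).read_text().
TEMPLATE_TEXT = '''\
DATA_FILE = "$data_file"
NAME_COL = "$name_col"
REF_COL = "$ref_col"
SAVE_DB = False
'''


def render_ingest_script(template_text: str, values: dict[str, str], out_path: Path) -> Path:
    """Fill every placeholder and write the script; fail loudly if any value is missing."""
    # substitute() (unlike safe_substitute()) raises KeyError on a missing value,
    # so no placeholder text can reach the output file.
    script = Template(template_text).substitute(values)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(script)
    return out_path
```

For example, `render_ingest_script(TEMPLATE_TEXT, {"data_file": "table.fits", "name_col": "name", "ref_col": "ref"}, Path("tmp/ingest_Burg24_sources.py"))` would write a filled-in copy, while omitting any key raises before anything is written.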
To confirm for an unknown DB, check the schema with:
```python
print(db.metadata.tables["Sources"].columns.keys())
```
There is a small possibility that the user wants a different name for their Sources table (e.g., `sources` or even `galaxies`). We might need to support this at some point.
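Supporting a differently named table could start with a case-insensitive lookup over the table names, followed by picking the coordinate columns from whichever naming convention the schema uses. This is a sketch: the two conventions come from this skill's Key Behaviors section (`ra_deg`/`dec_deg`/`epoch_year` for astrodb-template-db, `ra`/`dec`/`epoch` for SIMPLE-db), and `resolve_sources_table` and `pick_coordinate_columns` are hypothetical helpers. With astrodbkit, the inputs would presumably come from `db.metadata.tables.keys()` and `db.metadata.tables[name].columns.keys()`.

```python
# Known coordinate-column conventions (taken from this skill's Key Behaviors section).
CONVENTIONS = {
    "astrodb-template-db": {"ra": "ra_deg", "dec": "dec_deg", "epoch": "epoch_year"},
    "SIMPLE-db": {"ra": "ra", "dec": "dec", "epoch": "epoch"},
}


def resolve_sources_table(table_names, preferred="Sources"):
    """Return the actual table name matching `preferred`, ignoring case."""
    matches = [name for name in table_names if name.lower() == preferred.lower()]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one table named {preferred!r}, found {matches}")
    return matches[0]


def pick_coordinate_columns(column_names):
    """Return the first convention whose ra/dec/epoch columns all exist in the table."""
    present = set(column_names)
    for label, cols in CONVENTIONS.items():
        if set(cols.values()) <= present:
            return label, cols
    raise ValueError(f"No known RA/Dec/epoch convention matches columns: {sorted(present)}")
```

Failing with an explicit error here, rather than guessing, would avoid the "all rows silently skip" case described under Key Behaviors.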
@@ -0,0 +1,222 @@
---
name: ingest-sources
description: "Generate and run a Python script that ingests sources (astronomical objects) into an AstroDB Sources table from a data table. Use this skill when the user says: ingest sources, ingest objects, add new sources to the database, add objects to SIMPLE, or provides a FITS/CSV/ECSV file and wants to populate the Sources table. Works standalone or as the step after match-schema."
You may want to remove SIMPLE from the description; maybe use "add objects to the database" instead.
kelle left a comment
I made some small comments, but my major feedback is to set up "evals" using the skill-creator skill and the FITS file you're using as data.
## Prerequisites

1. **Database**: JSON data files + `database.toml` (astrodb-template-db layout).
   If absent, run the `create-astrodb` skill first.
This is an understatement. I'm not quite sure what advice to give if the TOML file is missing.
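One option for the missing-TOML case is a pre-flight check that stops before any script is generated and states exactly what is absent. A minimal sketch, assuming the layout named in the prerequisite (a `database.toml` plus JSON data files); `check_database_layout` is hypothetical, and placing the JSON files under `data/` is an assumption based on the script's `save_database(directory="data/")` call.

```python
from pathlib import Path


def check_database_layout(root):
    """Return a list of human-readable problems; an empty list means the layout looks usable."""
    root = Path(root)
    problems = []
    if not (root / "database.toml").is_file():
        problems.append(
            f"No database.toml in {root}. Run the create-astrodb skill, "
            "or point the script at the directory that holds the TOML file."
        )
    # Assumed location of the JSON data files (matches save_database(directory="data/")).
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(data_dir.glob("*.json")):
        problems.append(f"No JSON data files found in {data_dir}.")
    return problems
```

Returning every problem at once, rather than raising on the first, would let the skill relay a single actionable message to the user.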
3. **Data table**: FITS, CSV, ECSV, or any astropy-readable format, with at minimum
   a source name column and a discovery reference column.
4. **Publications populated**: every reference value must already exist in `Publications`.
   If not, tell the user to run `ingest_publication` first.
Do we want to make ingesting the publications part of this skill? It seems like it should be easy enough.
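Whether or not publication ingestion moves into this skill, the script could check up front which references are missing from `Publications` and report them all at once, instead of failing row by row. A sketch using plain Python sets; how the known publication names are fetched from the database is not shown, and `find_missing_references` is hypothetical.

```python
def find_missing_references(row_references, known_publications):
    """Return the sorted, de-duplicated references that are not yet in Publications."""
    known = set(known_publications)
    # Blank or None values are skipped here; rows with no reference at all are a separate problem.
    wanted = {ref.strip() for ref in row_references if ref and ref.strip()}
    return sorted(wanted - known)
```

A non-empty result could either stop the run with a "run `ingest_publication` first" message or, if the skill takes on publication ingestion, drive that step.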
| Other reference | No |

After confirmation, read the first value of the reference column — use it as `{REF}`
to name the output script (e.g. `Burg24`).
I would rather prompt the user or use the name of the input_file.
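Either option in the comment above is simple to implement. A sketch that uses an explicit label when the user supplies one and otherwise falls back to the input file's stem; `script_path_for` is hypothetical, and the `tmp/ingest_{...}_sources.py` pattern is taken from Step 3.

```python
import re
from pathlib import Path


def script_path_for(input_file, user_label=None, out_dir="tmp"):
    """Build tmp/ingest_<label>_sources.py from a user-supplied label or the input file name."""
    label = user_label or Path(input_file).stem
    # Replace anything that is not safe in a Python file name with underscores.
    label = re.sub(r"[^A-Za-z0-9_]+", "_", label).strip("_")
    if not label:
        raise ValueError("Could not derive a script label; ask the user for one.")
    return Path(out_dir) / f"ingest_{label}_sources.py"
```

Unlike the first reference value, the file stem does not depend on the table's contents, so it stays stable even when rows cite several publications.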
SCHEMA_PATH = "tests/astrodb-template-db"
DB_NAME = "tests/astrodb-template-tests"
Suggested change: delete these two lines. They shouldn't be needed; all of this is now set in the TOML file.
base_path=SCHEMA_PATH,
db_name=DB_NAME,
Suggested change: delete these two lines.
    db.save_database(directory="data/")
    logger.info("Database saved to data/")
else:
    logger.info("Dry run complete — NOT saved. Set SAVE_DB = True to persist.")
Suggested change:
`logger.info("Dry run complete — NOT saved. Set SAVE_DB = True to persist.")`
becomes
`logger.info("Dry run complete — NOT saved. Set SAVE_DB = True to write the database to JSON files.")`
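The `SAVE_DB` toggle amounts to a dry-run guard at the end of the script. A sketch with a log message along the lines of the suggestion; `finish` is hypothetical, and the only database method it relies on is the `save_database(directory=...)` call already used in the script.

```python
import logging

logger = logging.getLogger("ingest_sources")


def finish(db, save_db, directory="data/"):
    """Write the database to JSON files only when save_db is True; return whether it saved."""
    if save_db:
        db.save_database(directory=directory)
        logger.info("Database saved to %s", directory)
        return True
    logger.info("Dry run complete; NOT saved. Set SAVE_DB = True to write the database to JSON files.")
    return False
```

Keeping the default at `SAVE_DB = False` means a first run only reports what would happen, which pairs well with reviewing the generated script before persisting anything.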
## Key Behaviors

1. **Missing RA/Dec**: if `RA_COL = None`, `ingest_source` queries SIMBAD automatically.
   If SIMBAD has no match, that row is skipped with a warning.
2. **Duplicate sources**: if a source already exists, `ingest_source` adds the new name
   as an alternate in `Names` — it does not re-insert into `Sources`.
3. **Missing reference**: `reference` must already be in `Publications` or ingestion fails.
   Remind the user to run `ingest_publication` first.
4. **Unicode dashes**: handled automatically by `ingest_source`
   (en dash, em dash, minus sign, figure dash → `-`).
5. **DB schema column names**: defaults (`ra_deg`/`dec_deg`/`epoch_year`) match
   astrodb-template-db. SIMPLE-db uses `ra`/`dec`/`epoch`. Wrong values cause all rows
   to silently skip — always confirm the target DB in Step 2B.
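The dash handling in behavior 4 amounts to a character-for-character replacement. This is an illustrative sketch of that idea, not `ingest_source`'s actual code; `normalize_dashes` is hypothetical.

```python
# En dash (U+2013), em dash (U+2014), minus sign (U+2212), figure dash (U+2012) -> ASCII "-".
_DASHES = {"\u2013": "-", "\u2014": "-", "\u2212": "-", "\u2012": "-"}
_DASH_TABLE = str.maketrans(_DASHES)


def normalize_dashes(name: str) -> str:
    """Replace the Unicode dash variants listed in Key Behaviors with an ASCII hyphen."""
    return name.translate(_DASH_TABLE)
```

Normalizing before lookup matters because a designation typed with a minus sign and the same designation with a hyphen would otherwise compare as different names.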
This seems duplicated in the reference file.
Adds a new Claude SKILL that generates a Python ingest script for adding sources from a data table into an AstroDB database.
Closes: #8