
new SKILL: generate ingest-sources script #9

Draft
ying2212 wants to merge 3 commits into astrodbtoolkit:main from ying2212:ingest-source-skill
Conversation

@ying2212
Collaborator

Adds a new Claude SKILL that generates a Python ingest script for adding sources from a data table to an AstroDB database.

Closes: #8

@ying2212 marked this pull request as draft May 12, 2026 18:18
@kelle
Contributor

kelle commented May 12, 2026


Add the `astrodb` prefix to the skill name.

Comment on lines +98 to +101
## Step 3: Write `tmp/ingest_{REF}_sources.py`

Fill in all values from Steps 1–2 and write the script to `tmp/ingest_{REF}_sources.py`.
Every variable below must contain a real value — never write placeholder text to the file.
Contributor

We may want to add a `scripts/ingest_sources.py` reference file and have the skill copy that file and adapt it. That might be cleaner than putting the entire script into the skill.


To confirm for an unknown DB, check the schema with:
```python
print(db.metadata.tables["Sources"].columns.keys())
```
Contributor

There is a small possibility that the user wants a different name for their Sources table (e.g., `sources` or even `galaxies`). We might need to support this at some point.
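
One way this could be supported is to look the table name up instead of hard-coding `"Sources"`. The following is only a rough sketch: `resolve_sources_table` and `example_tables` are hypothetical names, and a plain dict stands in for `db.metadata.tables`, which the snippet above indexes like a mapping.

```python
def resolve_sources_table(tables, preferred=("Sources", "sources")):
    """Return the first name in `preferred` that exists in `tables`.

    `tables` is any mapping of table name -> table object (hypothetical
    stand-in for how `db.metadata.tables` is indexed in the skill).
    """
    for name in preferred:
        if name in tables:
            return name
    raise KeyError(f"None of {preferred} found; available tables: {sorted(tables)}")


# Stand-in for db.metadata.tables: a plain dict mapping names to column lists.
example_tables = {
    "galaxies": ["source", "ra_deg", "dec_deg"],
    "Publications": ["reference"],
}

table_name = resolve_sources_table(
    example_tables, preferred=("Sources", "sources", "galaxies")
)
print(table_name)
```

Raising on a miss, rather than falling back silently, keeps a wrong table name from going unnoticed.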

@@ -0,0 +1,222 @@
---
name: ingest-sources
description: "Generate and run a Python script that ingests sources (astronomical objects) into an AstroDB Sources table from a data table. Use this skill when the user says: ingest sources, ingest objects, add new sources to the database, add objects to SIMPLE, or provides a FITS/CSV/ECSV file and wants to populate the Sources table. Works standalone or as the step after match-schema."
Contributor

You may want to remove SIMPLE from the description; maybe use "add objects to the database" instead.

Contributor

@kelle left a comment

I made some small comments, but my major feedback is to set up "evals" using the skill-creator skill and the FITS file you're using as data.

## Prerequisites

1. **Database**: JSON data files + `database.toml` (astrodb-template-db layout).
If absent, run the `create-astrodb` skill first.
Contributor

This is an understatement. I'm not quite sure what advice to give if the TOML file is missing.

3. **Data table**: FITS, CSV, ECSV, or any astropy-readable format, with at minimum
a source name column and a discovery reference column.
4. **Publications populated**: every reference value must already exist in `Publications`.
If not, tell the user to run `ingest_publication` first.
Contributor

Do we want to make ingesting the publications part of this skill? It seems like it should be easy enough.

| Other reference | No | |

After confirmation, read the first value of the reference column — use it as `{REF}`
to name the output script (e.g. `Burg24`).
Contributor

I would rather prompt the user or use the name of the input file.
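
As a rough sketch of that suggestion, the script name could be derived from an explicit label supplied by the user, falling back to the input file's stem. `script_path_for` is a hypothetical helper, not part of the skill. The `tmp/ingest_{REF}_sources.py` pattern is taken from Step 3 of the skill.

```python
from pathlib import Path


def script_path_for(input_file, ref=None):
    """Build the output script path from a user-supplied label or the file stem.

    Hypothetical helper: `ref` is whatever the user typed when prompted;
    if it is empty or None, the input file name (without extension) is used.
    """
    label = ref if ref else Path(input_file).stem
    # Replace anything that is not alphanumeric, '-' or '_' so the name is filesystem-safe.
    safe = "".join(ch if ch.isalnum() or ch in "-_" else "_" for ch in label)
    return Path("tmp") / f"ingest_{safe}_sources.py"


print(script_path_for("data/Burg24_table1.fits").as_posix())
print(script_path_for("data/table1.csv", ref="Burg24").as_posix())
```

This avoids tying the file name to the first value of the reference column, which may not be representative when a table mixes several references.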

Comment on lines +117 to +118
SCHEMA_PATH = "tests/astrodb-template-db"
DB_NAME = "tests/astrodb-template-tests"
Contributor

Suggested change
SCHEMA_PATH = "tests/astrodb-template-db"
DB_NAME = "tests/astrodb-template-tests"

These shouldn't be needed. All of this is now set in the TOML file.

Comment on lines +123 to +124
base_path=SCHEMA_PATH,
db_name=DB_NAME,
Contributor

Suggested change
base_path=SCHEMA_PATH,
db_name=DB_NAME,

db.save_database(directory="data/")
logger.info("Database saved to data/")
else:
logger.info("Dry run complete — NOT saved. Set SAVE_DB = True to persist.")
Contributor

Suggested change
logger.info("Dry run complete — NOT saved. Set SAVE_DB = True to persist.")
logger.info("Dry run complete — NOT saved. Set SAVE_DB = True to write the database to JSON files.")

Comment on lines +210 to +222
## Key Behaviors

1. **Missing RA/Dec**: if `RA_COL = None`, `ingest_source` queries SIMBAD automatically.
If SIMBAD has no match, that row is skipped with a warning.
2. **Duplicate sources**: if a source already exists, `ingest_source` adds the new name
as an alternate in `Names` — it does not re-insert into `Sources`.
3. **Missing reference**: `reference` must already be in `Publications` or ingestion fails.
Remind the user to run `ingest_publication` first.
4. **Unicode dashes**: handled automatically by `ingest_source`
(en dash, em dash, minus sign, figure dash → `-`).
5. **DB schema column names**: defaults (`ra_deg`/`dec_deg`/`epoch_year`) match
astrodb-template-db. SIMPLE-db uses `ra`/`dec`/`epoch`. Wrong values cause all rows
to silently skip — always confirm the target DB in Step 2B.
Contributor

This seems duplicated in the reference file.
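
For readers unfamiliar with item 4 in the quoted list, the dash handling it describes amounts to mapping a few Unicode characters to an ASCII hyphen. The sketch below is illustrative only and is not the code inside `ingest_source`. `normalize_dashes` and `DASH_MAP` are hypothetical names, and the source name in the example is invented.

```python
# Map the four dash-like characters named in the list to an ASCII hyphen.
DASH_MAP = str.maketrans({
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus sign
})


def normalize_dashes(name):
    """Return `name` with Unicode dash variants replaced by '-'."""
    return name.translate(DASH_MAP)


# Made-up source name containing an en dash.
print(normalize_dashes("2MASS J0415\u20130935"))
```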

Development

Successfully merging this pull request may close these issues.

New SKILL: ingest sources

3 participants