feat: Introduce Phase 1 of AntimonySerializer for LLM integration #305
dyrpsf wants to merge 5 commits into sbmlteam:master
draeger
left a comment
I think it would be beneficial to have many individual methods to convert SBML components to Antimony without having to serialize the entire model (or SBML document) at once.
A use case I have in mind is back-and-forth translation between text boxes, similar to https://arxiv.org/pdf/2309.03344.
Maybe later this could also be used for the implementation of Eclipse or IntelliJ plugins, similar to https://doi.org/10.1093/bioinformatics/btad753.
I don't think we need to go all the way yet, but it can be helpful to have these possible applications in mind when taking the first steps.
|
This is really interesting! I agree with @draeger that your best bet is to have individual elements know how to export themselves to Antimony, instead of having a single ur-function that handles everything. There's nuance with initial species values that isn't captured yet (and is technically wrong, since Antimony assumes species are in concentrations):
You might also want to check now whether 'boundaryCondition' is set. If so, write '$[id]' instead of just '[id]'. The upshot is the following several options: (And of course, an initial value might not be set at all.) |
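The conversion options referred to above did not survive in this thread, so the following is only a minimal sketch of how such a mapping could look. It assumes that an Antimony species symbol denotes a concentration unless the species has hasOnlySubstanceUnits set, and that an initial amount is therefore divided by the compartment (or an initial concentration multiplied by it) to match. The class SpeciesInitSketch, the enum InitKind, and the method initLine are hypothetical names for illustration and are not part of JSBML or of this PR.

```java
// Illustrative sketch only: plain parameters stand in for JSBML's Species.
// All names here are hypothetical and not taken from the PR.
final class SpeciesInitSketch {

    /** Which initial value, if any, the SBML species carries. */
    enum InitKind { NONE, AMOUNT, CONCENTRATION }

    static String initLine(String id, String compartmentId,
                           boolean hasOnlySubstanceUnits, boolean boundaryCondition,
                           InitKind kind, double value) {
        if (kind == InitKind.NONE) {
            return ""; // no initial value set: emit nothing
        }
        // Assumed convention from the review comment: '$' marks a boundary species.
        String lhs = (boundaryCondition ? "$" : "") + id;
        String rhs = Double.toString(value);
        if (!hasOnlySubstanceUnits && kind == InitKind.AMOUNT) {
            // Symbol is a concentration, value is an amount: divide by the compartment.
            rhs += " / " + compartmentId;
        } else if (hasOnlySubstanceUnits && kind == InitKind.CONCENTRATION) {
            // Symbol is an amount, value is a concentration: multiply by the compartment.
            rhs += " * " + compartmentId;
        }
        return lhs + " = " + rhs + ";";
    }

    public static void main(String[] args) {
        System.out.println(initLine("S1", "C", false, false, InitKind.CONCENTRATION, 2.5));
        System.out.println(initLine("S1", "C", false, false, InitKind.AMOUNT, 5.0));
        System.out.println(initLine("S1", "C", true, true, InitKind.CONCENTRATION, 2.5));
    }
}
```

The two remaining combinations (concentration for a concentration symbol, amount for a substance-only symbol) pass the value through unchanged, which keeps the unit handling in one place.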
Extracted SBML component serialization into individual methods and added a generic SBase router (laying groundwork for IDE plugins). Implemented strict Species initialization logic to handle Antimony's inherent concentration assumptions, substanceOnly mapping, and boundaryCondition prefixing.
|
This is brilliant feedback from both of you!
@draeger - Thank you for linking those papers. I completely see your vision now: for a bidirectional UI plugin to remain responsive, it needs to instantly serialize the specific component the user clicked on without the immense overhead of traversing the entire model.
@luciansmith - Thank you for jumping in with that detailed conversion matrix! You are completely right about the nuance of Antimony assuming concentrations natively, and ensuring the math maps correctly is critical.
I've just pushed an update that addresses both of these architectural requirements:
Having this granular API and accurate concentration math in place now is definitely the perfect foundation before we tackle the advanced rules and reactions! |
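As a rough illustration of the granular API idea (per-component methods plus a generic router), here is a self-contained sketch. It does not use JSBML: Element, Compartment, and Species are hypothetical stand-ins for SBase and its subclasses, and returning an empty string for unsupported elements is an assumption rather than the PR's actual behavior.

```java
// Illustrative sketch only: tiny stand-in types instead of JSBML's SBase hierarchy.
final class RouterSketch {

    /** Hypothetical stand-in for SBase. */
    interface Element { }

    static final class Compartment implements Element {
        final String id;
        final double size;
        Compartment(String id, double size) { this.id = id; this.size = size; }
    }

    static final class Species implements Element {
        final String id;
        final String compartmentId;
        Species(String id, String compartmentId) { this.id = id; this.compartmentId = compartmentId; }
    }

    /** Per-component method: one compartment becomes one Antimony line. */
    static String toAntimony(Compartment c) {
        return "compartment " + c.id + " = " + c.size + ";";
    }

    /** Per-component method: one species becomes one Antimony line. */
    static String toAntimony(Species s) {
        return "species " + s.id + " in " + s.compartmentId + ";";
    }

    /** Generic router: dispatches on the runtime type so callers can pass any element. */
    static String toAntimony(Element e) {
        if (e instanceof Compartment) {
            return toAntimony((Compartment) e);
        }
        if (e instanceof Species) {
            return toAntimony((Species) e);
        }
        return ""; // assumption: unsupported elements are silently skipped
    }

    public static void main(String[] args) {
        Element clicked = new Species("S1", "C"); // e.g. the element selected in an editor
        System.out.println(toAntimony(clicked));
    }
}
```

A caller such as an editor plugin only needs the router overload, while tests can target each per-component method directly.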
Corrected a minor asterisk typo in the SBase router Javadoc and restored the Phase 2 TODO comments that were temporarily displaced during the modular refactoring.
|
(Just pushed a quick follow-up commit to fix a minor Javadoc formatting typo and restore the Phase 2 |
Thank you very much for your kind words and approval, @draeger! Getting this architectural foundation right was definitely the best way to kick things off. I'm looking forward to tackling the advanced rules and reactions in the next phase! |
|
@draeger, @luciansmith - Following up on the Phase 1 approval, I've just pushed the Phase 2 implementation targeting Reactions and Kinetic Laws. Since this branch was already approved for the foundational structure, I figured it made sense to attach this logical next step here before merging, rather than opening a fragmented PR.
Phase 2 Updates:
With both Species math and Reaction parsing fully implemented, this pipeline is in fantastic shape for downstream LLM/UI consumption. Let me know if you'd like any tweaks to the ASTNode formatting! |
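The Phase 2 update list is not preserved here, so the following is only a sketch of what serializing one reaction to an Antimony line might involve. It assumes the kinetic law has already been rendered to an infix string (standing in for the ASTNode formatting mentioned above) and that "->" is used for reversible and "=>" for irreversible reactions. ReactionLineSketch and its methods are hypothetical names, not code from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative sketch only: maps of species id -> stoichiometry stand in for
// JSBML's SpeciesReference lists.
final class ReactionLineSketch {

    /** Renders a stoichiometry without a trailing ".0" when it is a whole number. */
    static String formatStoich(double n) {
        return n == Math.rint(n) ? Long.toString((long) n) : Double.toString(n);
    }

    /** Joins one side of the reaction, omitting a stoichiometry of 1. */
    static String side(Map<String, Double> refs) {
        StringJoiner joiner = new StringJoiner(" + ");
        for (Map.Entry<String, Double> e : refs.entrySet()) {
            double n = e.getValue();
            joiner.add(n == 1.0 ? e.getKey() : formatStoich(n) + " " + e.getKey());
        }
        return joiner.toString();
    }

    static String reactionLine(String id, Map<String, Double> reactants,
                               Map<String, Double> products, boolean reversible,
                               String kineticLaw) {
        StringBuilder sb = new StringBuilder(id).append(": ");
        String left = side(reactants);
        String right = side(products);
        if (!left.isEmpty()) {
            sb.append(left).append(' ');
        }
        sb.append(reversible ? "->" : "=>"); // assumed arrow convention
        if (!right.isEmpty()) {
            sb.append(' ').append(right);
        }
        sb.append(';');
        if (kineticLaw != null && !kineticLaw.isEmpty()) {
            sb.append(' ').append(kineticLaw).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> reactants = new LinkedHashMap<>();
        reactants.put("S1", 1.0);
        reactants.put("S2", 2.0);
        Map<String, Double> products = new LinkedHashMap<>();
        products.put("S3", 1.0);
        System.out.println(reactionLine("J0", reactants, products, true, "k1*S1*S2"));
    }
}
```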
|
@draeger, @luciansmith - Phase 3 is now complete and pushed!
Phase 3 Updates:
This completes the full serialization pipeline outlined in the original TODOs. Ready for review whenever you both have time! |
|
One more advanced feature: named stoichiometries. This doesn't have to go into this PR, but should be on your list somewhere. There are also a lot of extra options for events: delays, priorities, persistence, and t0=false. |
|
@luciansmith Thanks for the review and the heads-up on those edge cases! I've officially added named stoichiometries and the advanced event options (delays, priorities, persistence, t0) to my immediate to-do list. Since the core serialization pipeline (Phase 1-3: Species, Compartments, Reactions, base Rules, and Events) is now fully implemented and passing all tests, I'll plan to tackle those advanced features in a dedicated follow-up PR to keep this one scoped. Let me know if you are comfortable approving this branch as the foundation, or if you'd prefer I squeeze those advanced event/stoichiometry features in here before we merge! |
Description
Following up on discussions regarding the token-efficiency of the Antimony scripting language and potential use-cases for bidirectional IDE plugins (e.g., VSCode-Antimony), this PR introduces the Phase 1 scaffolding for a native JSBML-to-Antimony serializer (AntimonySerializer.java).
By bypassing the XML structure entirely, this utility acts as a highly optimized pipeline for feeding SBML data into Large Language Models and text-editor UIs.
Current Implementation (Phase 1)
This initial PR sets up the architecture and successfully translates the foundational structural elements into valid Antimony syntax:
- Individual serialization methods for each component (toAntimony(Species), toAntimony(Compartment)).
- A generic toAntimony(SBase element) router to act as a dynamic hook for IDE plugins, allowing selective serialization without traversing the entire Model.
- Species initialization logic that uses the hasOnlySubstanceUnits and boundaryCondition flags to map SBML initial amounts/concentrations into Antimony's native concentration assumptions (multiplying or dividing by the compartment ID as necessary).
Future Scope
I have left a placeholder TODO in the code for the more complex serialization logic (Reactions, Events, Algebraic/Rate Rules, and FBC constraints). I plan to build out this advanced mapping logic comprehensively during the upcoming gsoc-sysbio-llm-tools coding period.
Testing
- Added AntimonySerializerTest.java to ensure robust coverage of Phase 1 parsing logic, including the SBase dynamic router and the concentration math matrix.
- jsbml-core continues to compile cleanly.