Skip to content

feat: Introduce Phase 1 of AntimonySerializer for LLM integration#305

Open
dyrpsf wants to merge 5 commits intosbmlteam:masterfrom
dyrpsf:feature-antimony-serializer-base
Open

feat: Introduce Phase 1 of AntimonySerializer for LLM integration#305
dyrpsf wants to merge 5 commits intosbmlteam:masterfrom
dyrpsf:feature-antimony-serializer-base

Conversation

@dyrpsf
Copy link
Copy Markdown
Contributor

@dyrpsf dyrpsf commented Apr 27, 2026

Description

Following up on discussions regarding the token-efficiency of the Antimony scripting language and potential use-cases for bidirectional IDE plugins (e.g., VSCode-Antimony), this PR introduces the Phase 1 scaffolding for a native JSBML-to-Antimony serializer (AntimonySerializer.java).

By bypassing the XML structure entirely, this utility acts as a highly optimized pipeline for feeding SBML data into Large Language Models and text-editor UIs.

Current Implementation (Phase 1)

This initial PR sets up the architecture and successfully translates the foundational structural elements into valid Antimony syntax:

  • Modular Extraction: Serialization logic is broken into standalone methods (toAntimony(Species), toAntimony(Compartment)).
  • Generic Router: Implemented toAntimony(SBase element) to act as a dynamic hook for IDE plugins, allowing selective serialization without traversing the entire Model.
  • Strict Species Initialization: Implements rigorous checking for hasOnlySubstanceUnits and boundaryCondition flags to dynamically map SBML initial amounts/concentrations into Antimony's native concentration assumptions (dynamically multiplying or dividing by the compartment ID as necessary).

Future Scope

I have left a placeholder TODO in the code for the more complex serialization logic (Reactions, Events, Algebraic/Rate Rules, and FBC constraints). I plan to build out this advanced mapping logic comprehensively during the upcoming gsoc-sysbio-llm-tools coding period.

Testing

  • Added AntimonySerializerTest.java to ensure robust coverage of Phase 1 parsing logic, including the SBase dynamic router and the concentration math matrix.
  • Verified that jsbml-core continues to compile cleanly.

Copy link
Copy Markdown
Member

@draeger draeger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be beneficial to have many individual methods to convert SBML components to antimonoy without having to serialize the entire model (or SBML document) at once.

A use case I have in mind is back-and forth translation between text boxes, similar to https://arxiv.org/pdf/2309.03344.

Maybe later this could also be used for the implementation of Eclipse or IntelliJ plugins, similar to https://doi.org/10.1093/bioinformatics/btad753.

I don't think we need to go all the way yet, but it can be helpful to have these possible applications in mind when taking the first steps.

@luciansmith
Copy link
Copy Markdown
Member

This is really interesting! I agree with @draeger that your best bet is to have individual elements know how to export themselves to Antimony, instead of having a single ur-function that handles everything. There's nuance with initial species values that isn't captured yet (and is technically wrong, since Antimony assumes species are in concentrations):

  • If hasOnlySubstanceUnits is true:
    • Export 'substanceOnly species' instead of just 'species'
    • Then check whether initialAmount or initialConcentration is set:
      • If initialAmount is set, write 'id = [initial amount]'
      • If initialConcentration is set, write 'id = [initialConcentration] * [compartment id]'
  • If hasOnlySubstanceUnits is false:
    • Export 'species'
    • Then check whether initialAmount or initialConcentration is set:
      • If initialAmount is set, write 'id = [initial amount] / [compartment id]'
      • If initialConcentration is set, write 'id = [initialConcentration]'

You might also want to go ahead and check now whether 'boundaryCondition' is set. If so, write '$[id]' instead of just '[id]'

The upshot is the following several options:
species S1 = 3 //hOSU=false, initialConcentration, boundary=false
species S1 = 3/C //hOSU=false, initialAmount, boundary=false
substanceOnly species S1 = 3 //hOSU=true, initialAmount, boundary=false
substanceOnly species S1 = 3* C //hOSU=true, initialConcentration, boundary=false
species $S1 = 3 //hOSU=false, initialConcentration, boundary=true
species $S1 = 3/C //hOSU=false, initialAmount, boundary=true
substanceOnly species $S1 = 3 //hOSU=true, initialAmount, boundary=true
substanceOnly species $S1 = 3* C //hOSU=true, initialConcentration, boundary=true

(And of course, an initial value might not be set at all.)

Extracted SBML component serialization into individual methods and added a generic SBase router (laying groundwork for IDE plugins). Implemented strict Species initialization logic to handle Antimony's inherent concentration assumptions, substanceOnly mapping, and boundaryCondition prefixing.
@dyrpsf
Copy link
Copy Markdown
Contributor Author

dyrpsf commented Apr 30, 2026

This is brilliant feedback from both of you!

@draeger - Thank you for linking those papers. I completely see your vision now: for a bidirectional UI plugin to remain responsive, it needs to instantly serialize the specific component the user clicked on without the immense overhead of traversing the entire Model on every keystroke.

@luciansmith - Thank you for jumping in with that detailed conversion matrix! You are completely right about the nuance of Antimony assuming concentrations natively, and ensuring the math maps correctly is critical.

I've just pushed an update that addresses both of these architectural requirements:

  1. Modular Extraction & SBase Router: I broke the logic into standalone, overloaded methods (toAntimony(Species), toAntimony(Compartment)) and implemented a generic toAntimony(SBase element) router. This acts as the direct, dynamic hook for future IDE plugins.
  2. Species Concentration Math: I updated the toAntimony(Species) method to implement the exact logic matrix Lucian provided. It now dynamically injects the * [compId] or / [compId] math based on the hasOnlySubstanceUnits flag, and successfully prepends the $ for boundary conditions.
  3. Updated Tests: Added comprehensive JUnit coverage to verify both the SBase dynamic routing and the advanced Species initialization math.

Having this granular API and accurate concentration math in place now is definitely the perfect foundation before we tackle the advanced rules and reactions!

Corrected a minor asterisk typo in the SBase router Javadoc and restored the Phase 2 TODO comments that were temporarily displaced during the modular refactoring.
@dyrpsf
Copy link
Copy Markdown
Contributor Author

dyrpsf commented Apr 30, 2026

(Just pushed a quick follow-up commit to fix a minor Javadoc formatting typo and restore the Phase 2 TODO comments that accidentally got displaced during the SBase router refactor. The core serialization logic remains exactly as described above!)

@dyrpsf dyrpsf requested a review from draeger April 30, 2026 03:22
Copy link
Copy Markdown
Member

@draeger draeger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!

@dyrpsf
Copy link
Copy Markdown
Contributor Author

dyrpsf commented Apr 30, 2026

Great start!

Thank you very much for your kind words and approval, @draeger ! Getting this architectural foundation right was definitely the best way to kick things off. I'm looking forward to tackling the advanced rules and reactions in the next phase!

@dyrpsf
Copy link
Copy Markdown
Contributor Author

dyrpsf commented May 5, 2026

@draeger , @luciansmith - Following up on the Phase 1 approval, I've just pushed the Phase 2 implementation targeting Reactions and Kinetic Laws.

Since this branch was already approved for the foundational structure, I figured it made sense to attach this logical next step here before merging, rather than opening a fragmented PR.

Phase 2 Updates:

  • Reaction Parsing: Fully implemented toAntimony(Reaction r). It accurately maps reactants, products, and stoichiometry (automatically formatting 2.0 to 2 for cleaner Antimony syntax).
  • Reversibility: Accurately maps the -> (reversible) and => (irreversible) operators based on the SBML reversible flag.
  • Kinetic Laws: Leveraged JSBML's native ASTNode.formulaToString() to seamlessly parse and append kinetic math strings to the reactions.
  • Testing: Expanded AntimonySerializerTest.java to cover basic reactions, complex stoichiometry, kinetic laws, and dynamic SBase routing. All 12 tests are passing cleanly (BUILD SUCCESS via Maven).

With both Species math and Reaction parsing fully implemented, this pipeline is in fantastic shape for downstream LLM/UI consumption. Let me know if you'd like any tweaks to the ASTNode formatting!

@dyrpsf
Copy link
Copy Markdown
Contributor Author

dyrpsf commented May 6, 2026

@draeger , @luciansmith - Phase 3 is now complete and pushed!

Phase 3 Updates:

  • Rules: Added support for AssignmentRule (:=), RateRule (' =), and AlgebraicRule (0 =).
  • Events: Fully implemented Event serialization, including triggers (at (math):) and multiple event assignments (var = math, ...).
  • Robust Testing: AntimonySerializerTest.java now covers all rule types and complex event assignments, safely mapping JSBML ASTNodes to Antimony syntax.

This completes the full serialization pipeline outlined in the original TODOs. The toAntimony(SBase) router can now dynamically serve Species, Compartments, Reactions, Rules, and Events directly to downstream UI plugins.

Ready for review whenever you both have time!

@luciansmith
Copy link
Copy Markdown
Member

One more advanced feature: named stoichiometries:

S1 + n S2 -> S3; k1*S1*S2^n

n = 3

Doesn't have to go into this PR, but should be on your list somewhere.

There's also a lot of extra options for events: delays, priorities, persistence, and t0=false.

@dyrpsf
Copy link
Copy Markdown
Contributor Author

dyrpsf commented May 7, 2026

@luciansmith Thanks for the review and the heads-up on those edge cases!

I've officially added named stoichiometries and the advanced event options (delays, priorities, persistence, t0) to my immediate to-do list.

Since the core serialization pipeline (Phase 1-3: Species, Compartments, Reactions, base Rules, and Events) is now fully implemented and passing all tests, I'll plan to tackle those advanced features in a dedicated follow-up PR to keep this one scoped.

Let me know if you are comfortable approving this branch as the foundation, or if you'd prefer I squeeze those advanced event/stoichiometry features in here before we merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants