Commit d927a10

Release 3.8.0
Major mapping and benchmarking overhaul. Update SMSD to 6.9.0 and clean up the adapter layer. Improve chemistry robustness, candidate selection, identity handling, and benchmark reporting.
1 parent 6cef3cb commit d927a10

29 files changed

Lines changed: 2946 additions & 604 deletions

.mvn/jvm.config

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+--sun-misc-unsafe-memory-access=allow
+--enable-native-access=ALL-UNNAMED

README.md

Lines changed: 18 additions & 14 deletions
@@ -3,19 +3,23 @@
 Introduction
 ============
 
-`Reaction Decoder Tool (RDT) v3.7.0`
+`Reaction Decoder Tool (RDT) v3.8.0`
 --------------------------------------
 
 **Toolkit-agnostic reaction mapping engine** with CDK adapter. Deterministic, no training data required.
 
 ### Golden Dataset Benchmark (Lin et al. 2022, 1,851 reactions)
 
-| Tool | Exact Match | Atom Accuracy | Training Data | Deterministic |
-|------|-------------|---------------|---------------|---------------|
-| **RDT v3.7.0** | **82.0%** | **96.4%** | **None** | **Yes** |
-| RXNMapper | 83.74% | - | Unsupervised | No |
-| RDTool (published) | 76.18% | - | None | Yes |
-| ChemAxon | 70.45% | - | Proprietary | Yes |
+Current benchmark reporting now separates strict atom numbering from chemically equivalent mappings.
+
+| Tool | Mol-Map Exact | Atom-Map Exact | Atom-Map Chemically Equivalent | Bond-Change Exact | Deterministic |
+|------|---------------|----------------|--------------------------------|-------------------|---------------|
+| **RDT v3.8.0** | **75.6%** | **23.2%** | **99.2%** | **99.2%** | **Yes** |
+| RXNMapper | - | 83.74% | - | - | No |
+| RDTool (published) | - | 76.18% | - | - | Yes |
+| ChemAxon | - | 70.45% | - | - | Yes |
+
+Detailed benchmark snapshots are in `reports/golden-benchmark-report.md`.
 
 *Reference: Lin A et al. Molecular Informatics 41(4):e2100138, 2022. DOI: [10.1002/minf.202100138](https://doi.org/10.1002/minf.202100138)*
 
@@ -122,7 +126,7 @@ The package namespace has changed from `uk.ac.ebi` to `com.bioinceptionlabs` in
 <!-- Old (v2.x) -->
 <groupId>uk.ac.ebi.rdt</groupId>
 
-<!-- New (v3.7.0+) -->
+<!-- New (v3.8.0+) -->
 <groupId>com.bioinceptionlabs</groupId>
 ```
 
@@ -163,7 +167,7 @@ Performance
 | Test suite | 164 tests, 100% pass |
 | Test time | ~120s (4x faster than v2.x) |
 | Codebase | 68 files (reduced from 345) |
-| Dependencies | SMSD 6.7.0, CDK 2.12 (lightweight) |
+| Dependencies | SMSD 6.9.0, CDK 2.12 (lightweight) |
 | Deterministic | Yes (no ML training needed) |
 
 How to Cite RDT?
@@ -196,7 +200,7 @@ Sub-commands
 `AAM using SMILES`
 
 ```
-java -jar rdt-3.7.0-jar-with-dependencies.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -c -j AAM -f TEXT
+java -jar rdt-3.8.0-jar-with-dependencies.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -c -j AAM -f TEXT
 ```
 
 `Perform AAM` for Transporters
@@ -205,14 +209,14 @@ Sub-commands
 `AAM using SMILES` (accept mapping with no bond changes -b)
 
 ```
-java -jar rdt-3.7.0-jar-with-dependencies.jar -Q SMI -q "O=C(O)C(N)CC(=O)N.O=C(O)C(N)CS>>C(N)(CC(=O)N)C(=O)O.O=C(O)C(N)CS" -b -g -c -j AAM -f TEXT
+java -jar rdt-3.8.0-jar-with-dependencies.jar -Q SMI -q "O=C(O)C(N)CC(=O)N.O=C(O)C(N)CS>>C(N)(CC(=O)N)C(=O)O.O=C(O)C(N)CS" -b -g -c -j AAM -f TEXT
 ```
 
 `Annotate Reaction using SMILES`
 ---------------------------------
 
 ```
-java -jar rdt-3.7.0-jar-with-dependencies.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -c -j ANNOTATE -f XML
+java -jar rdt-3.8.0-jar-with-dependencies.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -c -j ANNOTATE -f XML
 ```
 
 
@@ -222,12 +226,12 @@ Sub-commands
 `Compare Reactions using SMILES with precomputed AAM mappings`
 
 ```
-java -jar rdt-3.7.0-jar-with-dependencies.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH -u
+java -jar rdt-3.8.0-jar-with-dependencies.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH -u
 ```
 
 
 `Compare Reactions using RXN files`
 
 ```
-java -jar rdt-3.7.0-jar-with-dependencies.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH
+java -jar rdt-3.8.0-jar-with-dependencies.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH
 ```

pom.xml

Lines changed: 6 additions & 3 deletions
@@ -4,7 +4,7 @@
 <groupId>com.bioinceptionlabs</groupId>
 <artifactId>rdt</artifactId>
 <description>Reaction Decoder Tool</description>
-<version>3.7.0</version>
+<version>3.8.0</version>
 <packaging>jar</packaging>
 <properties>
 <jdk.version>21</jdk.version>
@@ -185,7 +185,7 @@
 <dependency>
 <groupId>com.bioinceptionlabs</groupId>
 <artifactId>smsd</artifactId>
-<version>6.7.0</version>
+<version>6.9.0</version>
 </dependency>
 
 <!-- https://mvnrepository.com/artifact/commons-cli/commons-cli -->
@@ -275,7 +275,10 @@
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-surefire-plugin</artifactId>
 <version>3.5.5</version>
+<configuration>
+<argLine>--sun-misc-unsafe-memory-access=allow --enable-native-access=ALL-UNNAMED</argLine>
+</configuration>
 </plugin>
 </plugins>
 </build>
-</project>
+</project>

reports/golden-benchmark-report.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# Golden Benchmark Report
+
+Release: RDT v3.8.0
+
+Date: 2026-04-02
+
+Dataset: Lin et al. 2022 golden dataset
+
+Commands:
+
+```bash
+mvn -q -Dtest=GoldenDatasetBenchmarkTest -Dgolden.max=20 test
+mvn -q -Dtest=GoldenDatasetBenchmarkTest -Dgolden.max=100 test
+mvn -q -Dtest=GoldenDatasetBenchmarkTest -Dgolden.max=250 test
+```
+
+## Metric definitions
+
+- `Mapping success`: mapper returned a solution without hard failure
+- `Mol-map exact`: exact equality of induced reactant-molecule to product-molecule relation set
+- `Atom-map exact`: exact reactant-atom to product-atom match against the gold file
+- `Atom-map chemically equivalent`: same bond-change set as gold, even if atom numbering differs
+- `Bond-change exact`: exact equality of the full bond-change set
+- `Bond-change count exact`: exact equality of total bond-change count
+- `Bond-change type exact`: exact equality of `FORM`/`BREAK`/`ORDER` counts
+- `Reaction-center exact`: exact equality of the changed-atom set
+- `Reaction-center atoms`: atom-level reaction-center accuracy
+- `True chemistry miss`: bond-change set differs from gold
+- `Speed`: reactions per second for the measured slice
+
+## Results
+
+| Slice | Mapping success | Mol-map exact | Atom-map exact | Atom-map chemically equivalent | Bond-change exact | Bond-change count exact | Bond-change type exact | Reaction-center exact | Reaction-center atoms | True chemistry miss | Speed |
+|------|-----------------|---------------|----------------|-------------------------------|-------------------|-------------------------|------------------------|-----------------------|----------------------|---------------------|-------|
+| `20` | `20/20 (100.0%)` | `11/20 (55.0%)` | `2/20 (10.0%)` | `20/20 (100.0%)` | `20/20 (100.0%)` | `20/20 (100.0%)` | `20/20 (100.0%)` | `20/20 (100.0%)` | `870/870 (100.0%)` | `0/20 (0.0%)` | `1.7 rxn/sec` |
+| `100` | `100/100 (100.0%)` | `71/100 (71.0%)` | `27/100 (27.0%)` | `100/100 (100.0%)` | `100/100 (100.0%)` | `100/100 (100.0%)` | `100/100 (100.0%)` | `100/100 (100.0%)` | `4509/4509 (100.0%)` | `0/100 (0.0%)` | `2.6 rxn/sec` |
+| `250` | `250/250 (100.0%)` | `189/250 (75.6%)` | `58/250 (23.2%)` | `248/250 (99.2%)` | `248/250 (99.2%)` | `248/250 (99.2%)` | `248/250 (99.2%)` | `248/250 (99.2%)` | `11747/11769 (99.8%)` | `2/250 (0.8%)` | `2.0 rxn/sec` |
+
+## Interpretation
+
+- The current branch is strong on chemistry correctness.
+- The main benchmark penalty is strict atom numbering, not wrong reaction chemistry.
+- On the first `100` reactions there were `0` true chemistry misses.
+- On the `250` slice there were `2` true chemistry misses and `190` alternate valid maps.
+- Reaction-center quality is effectively saturated on the measured slices.
+- Mol-map exact is much higher than atom-map exact, which is consistent with symmetry-equivalent atom labeling inside otherwise correct component mappings.
+
+## Practical conclusion
+
+The current benchmark should be read as:
+
+- high chemistry correctness
+- moderate molecule-level exactness
+- low strict atom-number exactness
+- low throughput relative to the long-term target
+
+The next optimization target should be strict atom-map canonicalization under symmetry, not bond-change chemistry.
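
Review note: the 250-reaction row of the report can be cross-checked with a few lines of arithmetic. The sketch below is not part of the commit. It uses only counts quoted in the report, and it assumes (this is an inference, not something the report states) that "alternate valid maps" means reactions that are chemically equivalent to gold but not numbered identically. The class and method names are hypothetical.

```java
import java.util.Locale;

public class BenchmarkSliceCheck {

    /** Formats hits/total in the report's style, e.g. "189/250 (75.6%)". */
    static String ratio(int hits, int total) {
        return String.format(Locale.ROOT, "%d/%d (%.1f%%)", hits, total, 100.0 * hits / total);
    }

    /**
     * Assumed definition: maps that reproduce the gold bond-change set
     * but differ from gold in strict atom numbering.
     */
    static int alternateValidMaps(int chemicallyEquivalent, int atomMapExact) {
        return chemicallyEquivalent - atomMapExact;
    }

    public static void main(String[] args) {
        int total = 250;               // slice size from the report
        int molMapExact = 189;         // "Mol-map exact"
        int atomMapExact = 58;         // "Atom-map exact"
        int chemicallyEquivalent = 248; // "Atom-map chemically equivalent"

        System.out.println("Mol-map exact:          " + ratio(molMapExact, total));
        System.out.println("Atom-map exact:         " + ratio(atomMapExact, total));
        System.out.println("Chemically equivalent:  " + ratio(chemicallyEquivalent, total));
        System.out.println("True chemistry miss:    " + ratio(total - chemicallyEquivalent, total));
        System.out.println("Alternate valid maps:   "
                + alternateValidMaps(chemicallyEquivalent, atomMapExact));
    }
}
```

Under that assumed definition the counts are mutually consistent: 248 - 58 gives the 190 alternate valid maps quoted in the Interpretation section, and 250 - 248 gives the 2 true chemistry misses.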

src/main/java/com/bioinceptionlabs/reactionblast/cdk/CDKToolkit.java

Lines changed: 8 additions & 4 deletions
@@ -39,11 +39,12 @@
 import org.openscience.cdk.tools.CDKHydrogenAdder;
 import org.openscience.cdk.tools.manipulator.AtomContainerManipulator;
 import org.openscience.smsd.AtomAtomMapping;
-import org.openscience.smsd.Isomorphism;
-import org.openscience.smsd.Substructure;
 import org.openscience.smsd.AtomBondMatcher;
+import org.openscience.smsd.BaseMapping;
 import org.openscience.smsd.MoleculeInitializer;
 
+import com.bioinceptionlabs.reactionblast.mapping.ReactionMappingEngine;
+import com.bioinceptionlabs.reactionblast.mapping.SmsdReactionMappingEngine;
 import com.bioinceptionlabs.reactionblast.model.AtomNode;
 import com.bioinceptionlabs.reactionblast.model.BondEdge;
 import com.bioinceptionlabs.reactionblast.model.ChemToolkit;
@@ -64,6 +65,9 @@
 */
 public class CDKToolkit implements ChemToolkit {
 
+private static final ReactionMappingEngine MAPPING_ENGINE
+= SmsdReactionMappingEngine.getInstance();
+
 private final SmilesParser smilesParser;
 private final SmilesGenerator canonicalSmilesGen;
 private final SmilesGenerator mappedSmilesGen;
@@ -159,7 +163,7 @@ public boolean isSubstructure(MolecularGraph query, MolecularGraph target) {
 try {
 IAtomContainer q = unwrap(query);
 IAtomContainer t = unwrap(target);
-Substructure sub = new Substructure(q, t,
+BaseMapping sub = MAPPING_ENGINE.findSubstructure(q, t,
 AtomBondMatcher.atomMatcher(true, true),
 AtomBondMatcher.bondMatcher(true, true), true);
 return sub.isSubgraph();
@@ -175,7 +179,7 @@ public Map<AtomNode, AtomNode> findMCS(MolecularGraph mol1, MolecularGraph mol2)
 IAtomContainer ac2 = unwrap(mol2);
 MoleculeInitializer.initializeMolecule(ac1);
 MoleculeInitializer.initializeMolecule(ac2);
-Isomorphism iso = new Isomorphism(ac1, ac2,
+BaseMapping iso = MAPPING_ENGINE.findMcs(ac1, ac2,
 org.openscience.smsd.BaseMapping.Algorithm.VFLibMCS,
 AtomBondMatcher.atomMatcher(false, false),
 AtomBondMatcher.bondMatcher(false, false));
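
Review note: the two hunks above replace direct construction of `Substructure` and `Isomorphism` with calls through a shared `MAPPING_ENGINE`. The shape of that refactor can be sketched in isolation. Everything below is a toy with stand-in types: `Mapping`, `MappingEngine`, `ToyEngine`, and `Toolkit` are hypothetical names, strings stand in for molecules, and substring containment stands in for real substructure search. It does not reproduce the project's `ReactionMappingEngine` interface or any SMSD API.

```java
public class EngineFacadeSketch {

    /** Stand-in for a mapping result (hypothetical; not SMSD's BaseMapping). */
    interface Mapping {
        boolean isSubgraph();
    }

    /** Hypothetical engine interface: callers depend on this, not on a concrete matcher. */
    interface MappingEngine {
        Mapping findSubstructure(String query, String target);
    }

    /** Toy singleton engine; "substructure" here is plain substring containment. */
    static final class ToyEngine implements MappingEngine {
        private static final ToyEngine INSTANCE = new ToyEngine();

        private ToyEngine() {
        }

        static ToyEngine getInstance() {
            return INSTANCE;
        }

        @Override
        public Mapping findSubstructure(String query, String target) {
            boolean hit = target.contains(query);
            return () -> hit; // Mapping has one abstract method, so a lambda suffices
        }
    }

    /** Caller that holds the engine as a shared constant instead of constructing matchers. */
    static final class Toolkit {
        private static final MappingEngine ENGINE = ToyEngine.getInstance();

        boolean isSubstructure(String query, String target) {
            return ENGINE.findSubstructure(query, target).isSubgraph();
        }
    }

    public static void main(String[] args) {
        Toolkit toolkit = new Toolkit();
        System.out.println(toolkit.isSubstructure("CC", "CCO"));
        System.out.println(toolkit.isSubstructure("N", "CCO"));
    }
}
```

The design point is that the caller only sees the engine interface and a result type, so the backing library version can change (here, SMSD 6.7.0 to 6.9.0) without touching call sites beyond the engine implementation.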
