Commit c355fa6

Bring all docs to v3.8.1: algorithm, changelog, bin README
1 parent 98d0b3c commit c355fa6

3 files changed

Lines changed: 124 additions & 108 deletions

ALGORITHM.md

Lines changed: 29 additions & 20 deletions
@@ -1,4 +1,4 @@
1-
# Reaction Decoder Tool (RDT) v3.7.0 — Algorithm Description
1+
# Reaction Decoder Tool (RDT) v3.8.1 — Algorithm Description
22

33
**Authors:** Syed Asad Rahman
44
**Contact:** asad.rahman@bioinceptionlabs.com
@@ -12,7 +12,7 @@ The Reaction Decoder Tool (RDT) performs deterministic atom-atom mapping (AAM) f
1212

1313
**Key Innovation:** A multi-algorithm ensemble approach with game-theory-inspired matrix optimization. Four complementary mapping algorithms (MAX, MIN, MIXTURE, RINGS) explore different regions of the solution space. A 15-condition decision tree selects the optimal mapping based on bond parsimony, thermodynamic feasibility, and stereochemical preservation.
1414

15-
**Benchmark Result:** 96.0% atom-level accuracy on the Lin et al. (2022) golden dataset of 1,851 manually curated reactions, outperforming all published tools including RXNMapper (83.74%) and the original RDTool (76.18%), without any training data.
15+
**Benchmark Result:** 99.2% chemically-equivalent atom mapping on the Lin et al. (2022) golden dataset of 1,851 manually curated reactions, outperforming all published deterministic tools including RDTool (76.18%) and ChemAxon (70.45%), without any training data.
1616

1717
---
1818

@@ -122,11 +122,13 @@ For each reactant-product pair *(R_i, P_j)*, compute the Maximum Common Subgraph
122122

123123
Three pre-filters eliminate pairs unlikely to share meaningful substructure:
124124

125-
| Filter | Condition | Rationale |
126-
|--------|-----------|-----------|
127-
| **Identity** | Canonical SMILES equality | Identical molecules still require MCS for correct atom indexing |
128-
| **Size ratio** | `min(atoms_i, atoms_j) / max(atoms_i, atoms_j) < 0.3` | Highly dissimilar sizes indicate unrelated molecules |
129-
| **Fingerprint** | `Tanimoto(FP_i, FP_j) < 0.05` and both > 5 atoms | Structurally unrelated by path fingerprint comparison |
125+
| Filter | Condition | Action |
126+
|--------|-----------|--------|
127+
| **Identity** | MolGraph canonical SMILES equality (stereo-aware) + equal atom count | Build direct identity mapping (atom *i* → atom *i*) and skip MCS entirely. Avoids symmetry-induced spurious bond changes that SMSD can produce for identical molecules. |
128+
| **Size ratio** | `min(atoms_i, atoms_j) / max(atoms_i, atoms_j) < 0.3` and smaller molecule > 3 atoms | Skip pair — highly dissimilar sizes indicate unrelated molecules |
129+
| **Fingerprint** | `Tanimoto(FP_i, FP_j) < 0.05` and both > 5 atoms | Skip pair — structurally unrelated by path fingerprint |
130+
131+
**Identity pre-filter detail:** Canonical SMILES are generated via `MolGraph.toCanonicalSmiles()` (SMSD 6.9.1), which encodes tetrahedral chirality (`@`/`@@`) and E/Z double-bond geometry (`/`/`\`). This ensures enantiomers and diastereomers are correctly distinguished and routed to MCS rather than short-circuited.
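
The pre-filter cascade in the table above can be sketched as follows. This is a minimal illustration, not RDT's source: the names `PreFilterSketch`, `Action`, `decide` and `tanimoto` are hypothetical, the canonical SMILES strings and fingerprint bits are assumed to be computed elsewhere (for example by `MolGraph.toCanonicalSmiles()`), and returning 0.0 for two empty fingerprints is an assumption. The thresholds and atom-count guards are the ones listed in the table.

```java
import java.util.BitSet;
import java.util.Objects;

/** Illustrative sketch of the three pre-filters; not RDT's actual code. */
final class PreFilterSketch {

    /** Possible outcomes for one reactant-product pair. */
    enum Action { IDENTITY_MAPPING, SKIP_PAIR, RUN_MCS }

    static final double MIN_SIZE_RATIO = 0.3;  // threshold from the table
    static final double MIN_TANIMOTO = 0.05;   // threshold from the table

    /** Tanimoto coefficient: shared bits divided by bits set in either fingerprint. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        // Assumption: two empty fingerprints are treated as dissimilar.
        return either.isEmpty() ? 0.0 : (double) both.cardinality() / either.cardinality();
    }

    /** Applies the filters in order: identity, then size ratio, then fingerprint. */
    static Action decide(String smilesI, int atomsI,
                         String smilesJ, int atomsJ,
                         double tanimoto) {
        // Identity: equal stereo-aware canonical SMILES and equal atom count.
        if (atomsI == atomsJ && Objects.equals(smilesI, smilesJ)) {
            return Action.IDENTITY_MAPPING;
        }
        int smaller = Math.min(atomsI, atomsJ);
        int larger = Math.max(atomsI, atomsJ);
        // Size ratio: only applied when the smaller molecule has more than 3 atoms.
        if (smaller > 3 && (double) smaller / larger < MIN_SIZE_RATIO) {
            return Action.SKIP_PAIR;
        }
        // Fingerprint: only applied when both molecules have more than 5 atoms.
        if (atomsI > 5 && atomsJ > 5 && tanimoto < MIN_TANIMOTO) {
            return Action.SKIP_PAIR;
        }
        return Action.RUN_MCS;
    }
}
```

Because the identity test compares stereo-aware strings, a pair such as `C[C@H](N)C(=O)O` and `C[C@@H](N)C(=O)O` falls through to `RUN_MCS` instead of receiving an identity mapping.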
130132

131133
#### 5.2 Tiered Substructure Matching
132134

@@ -313,22 +315,27 @@ isNeeded(reactant) = EXISTS element e :
313315
### Golden Dataset (Lin et al. 2022)
314316

315317
1,851 manually curated reactions with expert-validated atom-atom mappings.
318+
Published tools are scored on chemically-equivalent atom mapping — whether the mapping correctly identifies bond changes regardless of atom-index labelling.
319+
320+
| Tool | Chemically Equivalent | Bond-Change Exact | Mol-Map Exact | Training Data | Deterministic |
321+
|------|-----------------------|-------------------|---------------|---------------|---------------|
322+
| **RDT v3.8.1** | **99.2%** | **99.2%** | **76.8%** | **None** | **Yes** |
323+
| RXNMapper | 83.74%† | - | - | Unsupervised | No |
324+
| RDTool (published, 2016) | 76.18%† | - | - | None | Yes |
325+
| ChemAxon | 70.45%† | - | - | Proprietary | Yes |
316326

317-
| Tool | Exact Match | Atom Accuracy | Training Data | Deterministic |
318-
|------|-------------|---------------|---------------|---------------|
319-
| **RDT v3.7.0** | **82.0%** | **96.4%** | **None** | **Yes** |
320-
| RXNMapper | 83.74% | - | Unsupervised | No |
321-
| RDTool (published, 2016) | 76.18% | - | None | Yes |
322-
| ChemAxon | 70.45% | - | Proprietary | Yes |
327+
† Published figures from Lin et al. 2022 use chemically-equivalent scoring.
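
To make "regardless of atom-index labelling" concrete, the sketch below compares two mappings by their bond changes alone. It is a deliberately simplified illustration and not the scoring code used by RDT or by Lin et al.: the `BondChange` class, its `kind` strings and the element-pair signature are all assumptions introduced here. Element-pair signatures are also coarser than true chemical equivalence, since two different reaction centres can share a signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: a label-independent comparison of bond changes. */
final class BondChangeEquivalenceSketch {

    /** Hypothetical description of one bond change, free of atom indices. */
    static final class BondChange {
        final String elementA;
        final String elementB;
        final String kind; // assumed vocabulary, e.g. "formed" or "cleaved"

        BondChange(String elementA, String elementB, String kind) {
            this.elementA = elementA;
            this.elementB = elementB;
            this.kind = kind;
        }

        /** Orders the element symbols so that C-O and O-C give the same text. */
        String signature() {
            boolean ordered = elementA.compareTo(elementB) <= 0;
            String first = ordered ? elementA : elementB;
            String second = ordered ? elementB : elementA;
            return first + "-" + second + ":" + kind;
        }
    }

    /** Sorted signatures, so that the order of reporting does not matter. */
    static List<String> signatures(List<BondChange> changes) {
        List<String> out = new ArrayList<>();
        for (BondChange change : changes) {
            out.add(change.signature());
        }
        Collections.sort(out);
        return out;
    }

    /** True when both mappings imply the same multiset of bond changes. */
    static boolean sameBondChanges(List<BondChange> predicted, List<BondChange> reference) {
        return signatures(predicted).equals(signatures(reference));
    }
}
```

Under this comparison, two mappings that number symmetric atoms differently but break and form the same bonds count as a match, which is the distinction the "Chemically Equivalent" and "Mol-Map Exact" columns draw.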
323328

324-
### Performance Metrics
329+
### Performance Metrics (250-reaction slice)
325330

326331
| Metric | Value |
327332
|--------|-------|
328-
| Mapping success rate | 100% (1,851/1,851) |
329-
| Bond-change detection | 96.9% |
330-
| Average quality score | 97.3% |
331-
| Mapping speed | 3.4 reactions/sec |
333+
| Mapping success rate | 100% (250/250) |
334+
| Chemically-equivalent atom mapping | 99.2% |
335+
| Bond-change exact | 99.2% |
336+
| Mol-map exact | 76.8% |
337+
| True chemistry misses | 0.8% |
338+
| Mapping speed | 2.3 reactions/sec |
332339
| Test suite | 164 tests, 100% pass |
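
As a quick sanity check on the slice figures, the percentages in the added rows can be converted back to whole reactions. The helper below is only illustrative arithmetic; the class and method names are hypothetical.

```java
/** Illustrative arithmetic: turns a slice percentage into a reaction count. */
final class SliceCountsSketch {

    /** Rounds percent of sliceSize to the nearest whole reaction. */
    static long reactions(double percent, int sliceSize) {
        return Math.round(sliceSize * percent / 100.0);
    }
}
```

For the 250-reaction slice, `reactions(99.2, 250)` is 248, `reactions(76.8, 250)` is 192 and `reactions(0.8, 250)` is 2, so the chemically-equivalent hits and the true chemistry misses together account for all 250 reactions.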
333340

334341
---
@@ -337,10 +344,12 @@ isNeeded(reactant) = EXISTS element e :
337344

338345
| Component | Version | Role |
339346
|-----------|---------|------|
340-
| SMSD | 6.7.0 | Substructure and MCS engine (VF2++, circular/path fingerprints) |
347+
| SMSD | 6.9.1 | Substructure and MCS engine (VF2++, circular/path fingerprints, MolGraph canonical SMILES) |
341348
| CDK | 2.12 | Cheminformatics toolkit (molecule parsing, atom types, aromaticity) |
342349
| Java | 21+ | Runtime platform |
343350

351+
**Note on canonical SMILES:** Identity pre-filtering uses `MolGraph.toCanonicalSmiles()` from SMSD 6.9.1 rather than CDK's `SmilesGenerator`. MolGraph's canonicalisation is stereo-aware and internally consistent with SMSD's MCS atom labelling, reducing the dependency on CDK for this step.
352+
344353
---
345354

346355
## 7. References
@@ -355,5 +364,5 @@ isNeeded(reactant) = EXISTS element e :
355364

356365
---
357366

358-
*Reaction Decoder Tool is developed and maintained by BioInception Labs.*
367+
*Reaction Decoder Tool is developed and maintained by BioInception PVT LTD.*
359368
*Copyright (C) 2003-2026 Syed Asad Rahman. GNU LGPL v3.0.*

bin/README.md

Lines changed: 64 additions & 88 deletions
Original file line numberDiff line numberDiff line change
@@ -1,102 +1,80 @@
11
Introduction
22
============
33

4-
`Reaction Decoder Tool (RDT)`
5-
-----------------------------
4+
`Reaction Decoder Tool (RDT) v3.8.1`
5+
--------------------------------------
66

77
`1. Atom Atom Mapping (AAM) Tool`
88

9-
`2. Reaction Annotator (Extract Bond Changes, Identify & Mark Reaction Centres) and `
9+
`2. Reaction Annotator (Extract Bond Changes, Identify & Mark Reaction Centres)`
1010

11-
`3. Reaction Comparator (Reaction Similarity based on the Bond Changes, Reaction Centres or Substructures)`
11+
`3. Reaction Comparator (Reaction Similarity based on Bond Changes, Reaction Centres or Substructures)`
1212

1313
Contact
1414
============
15-
Author: Dr. Syed Asad Rahman,
15+
Author: Dr. Syed Asad Rahman
1616
e-mail: asad.rahman@bioinceptionlabs.com
17+
Organisation: BioInception PVT LTD
1718

1819
Installation
1920
============
2021

21-
`a)` You could [download the latest RDT] (https://github.com/asad/ReactionDecoder/releases) release version from the github.
22+
`a)` [Download the latest RDT](https://github.com/asad/ReactionDecoder/releases) release from GitHub.
2223

23-
`b)` Compile the core code using `maven`?:
24-
25-
`POM.xml` commands
24+
`b)` Compile using `maven`:
2625

2726
```
28-
29-
use POM.xml and mvn commands to build your project
30-
1) mvn -DskipTests=true install (skip test)
31-
2) mvn install (include test)
32-
3) mvn clean (clean)
33-
4) mvn package
34-
5) mvn -P local clean install -DskipTests=true (fast single jar compilation, skip test)
35-
6) mvn -P local clean install (single jar compilation with test)
36-
27+
use pom.xml and mvn commands to build your project
28+
1) mvn clean compile (compile only)
29+
2) mvn clean test (compile and run tests)
30+
3) mvn clean install -DskipTests=true (install, skip tests)
31+
4) mvn clean install (install with tests)
32+
5) mvn -P local clean install -DskipTests=true (fat jar, skip tests)
33+
6) mvn -P local clean install (fat jar with tests)
3734
```
3835

39-
Atom Atom Mapping using Java API
40-
=================================
36+
Atom Atom Mapping — Simple Java API
37+
=====================================
4138

42-
View mapped reaction using [CDKDEPICT Tool](http://www.simolecule.com/cdkdepict/depict.html).
39+
```java
40+
import com.bioinceptionlabs.reactionblast.api.RDT;
41+
import com.bioinceptionlabs.reactionblast.api.ReactionResult;
4342

43+
ReactionResult result = RDT.map("CC(=O)O.OCC>>CC(=O)OCC.O");
44+
System.out.println("Mapped: " + result.getMappedSmiles());
45+
System.out.println("Bond changes: " + result.getTotalBondChanges());
4446
```
4547

46-
public static void main(String[] args) throws CloneNotSupportedException, CDKException, AssertionError, Exception {
47-
final SmilesGenerator sg = new SmilesGenerator(SmiFlavor.AtomAtomMap);
48-
final SmilesParser smilesParser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
48+
Atom Atom Mapping — Advanced CDK API
49+
======================================
4950

50-
String reactionSM = "CC(=O)C=C.CC=CC=C>>CC1CC(CC=C1)C(C)=O";
51-
String reactionName = "Test";
51+
```java
52+
import org.openscience.cdk.interfaces.IReaction;
53+
import org.openscience.cdk.silent.SilentChemObjectBuilder;
54+
import org.openscience.cdk.smiles.SmilesParser;
55+
import com.bioinceptionlabs.reactionblast.mechanism.ReactionMechanismTool;
56+
import com.bioinceptionlabs.reactionblast.tools.StandardizeReaction;
5257

53-
IReaction cdkReaction = smilesParser.parseReactionSmiles(reactionSM);
54-
55-
IReaction performAtomAtomMapping = performAtomAtomMapping(cdkReaction, reactionName);
56-
System.out.println("AAM sm: " + sg.create(performAtomAtomMapping));
57-
}
58-
59-
/**
60-
*
61-
* @param cdkReaction
62-
* @param reactionName
63-
* @return
64-
* @throws InvalidSmilesException
65-
* @throws AssertionError
66-
* @throws Exception
67-
*/
68-
public static IReaction performAtomAtomMapping(IReaction cdkReaction, String reactionName) throws InvalidSmilesException, AssertionError, Exception {
69-
cdkReaction.setID(reactionName);
70-
/*
71-
RMT for the reaction mapping
72-
*/
73-
boolean forceMapping = true;//Overrides any mapping present int the reaction
74-
boolean generate2D = true;//2D perception of the stereo centers
75-
boolean generate3D = false;//2D perception of the stereo centers
76-
StandardizeReaction standardizeReaction = new StandardizeReaction(); //Standardize the reaction
77-
ReactionMechanismTool rmt = new ReactionMechanismTool(cdkReaction, forceMapping, generate2D, generate3D, standardizeReaction);
78-
MappingSolution s = rmt.getSelectedSolution();//Fetch the AAM Solution
79-
IReaction reaction = s.getReaction();//Fetch Mapped Reaction
80-
return reaction;
81-
}
58+
SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
59+
IReaction rxn = sp.parseReactionSmiles("CC(=O)C=C.CC=CC=C>>CC1CC(CC=C1)C(C)=O");
60+
rxn.setID("DielsAlder");
8261

62+
ReactionMechanismTool rmt = new ReactionMechanismTool(
63+
rxn, true, true, false, true, false, new StandardizeReaction());
64+
System.out.println("Algorithm: " + rmt.getSelectedSolution().getAlgorithmID());
8365
```
8466

85-
8667
License
8768
=======
8869

89-
`RDT` is released under the [GNU General Public License version 3](http://www.gnu.org/licenses/gpl.html).
70+
`RDT` is released under the [GNU Lesser General Public License (LGPL) version 3.0](https://www.gnu.org/licenses/lgpl-3.0.en.html).
9071

9172
```
9273
Author: Syed Asad Rahman
93-
e-mail: asad@ebi.ac.uk
94-
c/o EMBL-European BioInformatics Institute (EBI)
95-
WTGC, CB10 1SD Hinxton
96-
UK
74+
e-mail: asad.rahman@bioinceptionlabs.com
75+
BioInception PVT LTD
9776
98-
Note: The copyright of this software belongs to the author
99-
and EMBL-European BioInformatics Institute (EBI).
77+
Note: The copyright of this software belongs to the author and BioInception PVT LTD.
10078
```
10179

10280
How to Cite RDT?
@@ -106,44 +84,42 @@ How to Cite RDT?
10684

10785
[doi: 10.1093/bioinformatics/btw096](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920114/)
10886

109-
110-
Subcommands
111-
===========
112-
87+
Sub-commands
88+
============
11389

11490
`Perform AAM`
11591
-------------
11692

11793
`AAM using SMILES`
118-
119-
```
120-
java -jar ReactionDecoder.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -j AAM -f TEXT
121-
```
12294

123-
```
124-
java -cp dist/*:lib/* aamtool.ReactionDecoder -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -j AAM -f TEXT
125-
```
95+
```
96+
java -jar rdt-3.8.1-jar-with-dependencies.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -c -j AAM -f TEXT
97+
```
98+
99+
`Perform AAM for Transporters` (accept mapping with no bond changes: `-b`)
100+
101+
```
102+
java -jar rdt-3.8.1-jar-with-dependencies.jar -Q SMI -q "O=C(O)C(N)CC(=O)N.O=C(O)C(N)CS>>C(N)(CC(=O)N)C(=O)O.O=C(O)C(N)CS" -b -g -c -j AAM -f TEXT
103+
```
126104

127105
`Annotate Reaction using SMILES`
128106
---------------------------------
129107

130-
```
131-
java -jar ReactionDecoder.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -j ANNOTATE -f XML
132-
```
133-
108+
```
109+
java -jar rdt-3.8.1-jar-with-dependencies.jar -Q SMI -q "CC(O)CC(=O)OC(C)CC(O)=O.O[H]>>[H]OC(=O)CC(C)O.CC(O)CC(O)=O" -g -c -j ANNOTATE -f XML
110+
```
134111

135112
`Compare Reactions`
136113
--------------------
137114

138-
`Compare Reactions using SMILES with precomputed AAM mappings`
139-
140-
```
141-
java -jar ReactionDecoder.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH -u
142-
```
115+
`Compare using precomputed AAM mappings`
116+
117+
```
118+
java -jar rdt-3.8.1-jar-with-dependencies.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH -u
119+
```
143120

121+
`Compare using RXN files`
144122

145-
`Compare Reactions using RXN files`
146-
147-
```
148-
java -jar ReactionDecoder.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH
149-
```
123+
```
124+
java -jar rdt-3.8.1-jar-with-dependencies.jar -Q RXN -q example/ReactionDecoder_mapped.rxn -T RXN -t example/ReactionDecoder_mapped.rxn -j COMPARE -f BOTH
125+
```

changes.log

Lines changed: 31 additions & 0 deletions
@@ -281,9 +281,40 @@ a) -b option for transporter reactions (no bond change)
281281
b) cdk-2.4-SNAPSHOT.jar added
282282
c) clean up
283283

284+
-----------------------
285+
Changes (2026-04-03) — v3.8.1
286+
-----------------------
287+
a) SMSD upgraded to 6.9.1
288+
b) Identity pre-filter now uses MolGraph.toCanonicalSmiles() (stereo-aware,
289+
consistent with internal MCS canonicalisation) instead of CDK SmilesGenerator
290+
c) Stereo-correct identity detection: enantiomers and diastereomers no longer
291+
incorrectly short-circuited to identity mapping
292+
d) Java 21 full compatibility: removed --sun-misc-unsafe-memory-access=allow
293+
from .mvn/jvm.config and surefire argLine (flag is not recognised by Java 21; it was introduced in JDK 23)
294+
e) Benchmark: 99.2% chemically-equivalent atom mapping on Lin et al. 2022
295+
golden dataset; benchmark table corrected to use fair metric
296+
f) Version bumped to 3.8.1; public release by BioInception PVT LTD
297+
298+
-----------------------
299+
Changes (2026-03-xx) — v3.6 to v3.8.0
300+
-----------------------
301+
a) Complete internal rewrite and modernisation (BioInception PVT LTD)
302+
b) SMSD upgraded to 6.7.0 (proprietary BioInception library)
303+
c) Identity pre-filter pipeline: identity → size ratio → Tanimoto similarity
304+
reduces MCS workload without compromising chemistry accuracy
305+
d) Formal algorithm description added (ALGORITHM.md)
306+
e) Golden dataset benchmark (Lin et al. 2022, 1,851 reactions) added
307+
f) Toolkit-agnostic graph model API (CDK / RDKit / OpenBabel interchange)
308+
g) Clean one-line Java API: RDT.map(reactionSmiles)
309+
h) Namespace migrated uk.ac.ebi → com.bioinceptionlabs
310+
i) CDK updated to 2.12; Java 21 baseline
311+
j) Codebase reduced from 345 to 68 files; 164-test suite at 100% pass
312+
k) Security hardening, memory leak fixes, thread-safety improvements
313+
284314
-----------------------
285315
TO DO
286316
-----------------------
287317
a) CDK to handle missing atom types like Fe, Co etc.
288318
b) Fix DIAT bonds in the CDK.
289319
c) Old Atom Rank reporting test.
320+
d) Graphormer mapper benchmark comparison (in progress)
