Skip to content

Commit b472f67

Browse files
committed
Fix MCS deadlock: poll() timeouts, future cancellation, SMSD 6.10.1
Root cause: CompletionService.take() blocks forever when MCS workers hang inside the SMSD engine. Replaced with poll(timeout) at three levels: - GraphMatcher: 15s per-pair poll + 60s overall matcher budget - CallableAtomMappingTool: 120s per-algorithm poll - Phase 1 inline .call() moved to executor with timeout Orphaned futures are now cancelled (interrupt=true) after timeout to free the shared static MAPPING_EXECUTOR pool. GraphMatcher finally block uses shutdownNow() to interrupt stuck local MCS threads. Also: SMSD upgraded to 6.10.1; energy filter NPE guard added in SmsdReactionMappingEngine.applyDefaultFilters().
1 parent 63e4900 commit b472f67

6 files changed

Lines changed: 91 additions & 21 deletions

File tree

ALGORITHM.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -171,7 +171,7 @@ For each reactant-product pair *(R_i, P_j)*, compute a Maximum Common Subgraph (
171171
φ_{ij} := {(a_k, a_k) : k = 1…|A(R_i)|} (direct 1:1 mapping)
172172
skip MCS
173173

174-
Canonical SMILES are generated by `MolGraph.toCanonicalSmiles()` (SMSD 6.9.1), which encodes tetrahedral chirality (`@`/`@@`) and E/Z geometry (`/`/`\`). This is essential: using a stereo-unaware generator would incorrectly short-circuit enantiomers (e.g. (R)-lactic acid ≡ (S)-lactic acid) to a spurious identity mapping.
174+
Canonical SMILES are generated by `MolGraph.toCanonicalSmiles()` (SMSD 6.10.1), which encodes tetrahedral chirality (`@`/`@@`) and E/Z geometry (`/`/`\`). This is essential: using a stereo-unaware generator would incorrectly short-circuit enantiomers (e.g. (R)-lactic acid ≡ (S)-lactic acid) to a spurious identity mapping.
175175

176176
**Stage 2 — Size ratio filter:**
177177

@@ -393,7 +393,7 @@ RINGS resolves the majority of reactions via the funnel at a 2-4x computational
393393

394394
| Component | Version | Role |
395395
|-----------|---------|------|
396-
| SMSD | 6.9.1 | MCS engine: VF2++ subgraph isomorphism, circular/path fingerprints, MolGraph canonical SMILES (stereo-aware) |
396+
| SMSD | 6.10.1 | MCS engine: VF2++ subgraph isomorphism, circular/path fingerprints, MolGraph canonical SMILES (stereo-aware) |
397397
| CDK | 2.12 | Molecule I/O, atom typing, aromaticity perception, ring finding |
398398
| Java | 21+ | Platform |
399399

changes.log

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -284,7 +284,14 @@ c) clean up
284284
-----------------------
285285
Changes (2026-04-03) — v3.9.0
286286
-----------------------
287-
a) Identity shortcut stereo + multiplicity fix: isIdentityReaction() now uses
287+
a) SMSD upgraded to 6.10.1 with energy filter NPE guard.
288+
b) Deadlock fix: GraphMatcher and CallableAtomMappingTool now use
289+
CompletionService.poll(timeout) instead of take() (which blocks forever),
290+
with per-pair budget (15s), overall matcher budget (60s), and algorithm
291+
worker timeout (120s). Orphaned futures are cancelled on timeout to free
292+
the shared executor pool. Phase 1 inline call moved to executor with
293+
timeout to prevent main-thread deadlocks.
294+
c) Identity shortcut stereo + multiplicity fix: isIdentityReaction() now uses
288295
SmiFlavor.Canonical|Stereo (E/Z and R/S are distinguished) and a sorted
289296
List instead of a TreeSet (stoichiometric multiplicity is preserved).
290297
Previously, F/C=C/F>>F/C=C\F was incorrectly classified as a transporter

pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -178,7 +178,7 @@
178178
<dependency>
179179
<groupId>com.bioinceptionlabs</groupId>
180180
<artifactId>smsd</artifactId>
181-
<version>6.9.1</version>
181+
<version>6.10.1</version>
182182
</dependency>
183183

184184
<!-- https://mvnrepository.com/artifact/commons-cli/commons-cli -->

src/main/java/com/bioinceptionlabs/reactionblast/mapping/CallableAtomMappingTool.java

Lines changed: 33 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,8 @@ public class CallableAtomMappingTool implements Serializable {
5959
private final static ILoggingTool LOGGER
6060
= createLoggingTool(CallableAtomMappingTool.class);
6161
private static final long serialVersionUID = 0x29e2adb1716b13eL;
62+
/** Hard timeout per algorithm worker. Covers all MCS pairs + matrix selection. */
63+
private static final long ALGORITHM_TIMEOUT_MS = 120_000L; // 2 minutes
6264
private static final int MAPPING_PARALLELISM
6365
= Math.max(2, Math.min(3, Runtime.getRuntime().availableProcessors()));
6466
private static final ExecutorService MAPPING_EXECUTOR
@@ -146,10 +148,13 @@ private void generateAtomAtomMapping(
146148
boolean hasRings = hasRingSystems(standardizedReaction);
147149
IMappingAlgorithm firstPass = (checkComplex && hasRings) ? RINGS : MIN;
148150
if (totalMolecules <= 5) {
151+
java.util.concurrent.Future<Reactor> phase1Future = null;
149152
try {
150-
Reactor firstPassResult = new MappingThread(
153+
phase1Future = MAPPING_EXECUTOR.submit(new MappingThread(
151154
"IMappingAlgorithm." + firstPass.name(),
152-
standardizedReaction, firstPass, removeHydrogen).call();
155+
standardizedReaction, firstPass, removeHydrogen));
156+
Reactor firstPassResult = phase1Future.get(
157+
ALGORITHM_TIMEOUT_MS, java.util.concurrent.TimeUnit.MILLISECONDS);
153158
putSolution(firstPass, firstPassResult);
154159

155160
if (isMappingAcceptable(firstPassResult)) {
@@ -158,6 +163,11 @@ private void generateAtomAtomMapping(
158163
return;
159164
}
160165
LOGGER.debug(firstPass + " mapping insufficient — running remaining algorithms");
166+
} catch (java.util.concurrent.TimeoutException e) {
167+
LOGGER.warn(firstPass + " phase timed out after " + ALGORITHM_TIMEOUT_MS + "ms");
168+
if (phase1Future != null) {
169+
phase1Future.cancel(true);
170+
}
161171
} catch (InterruptedException | ExecutionException e) {
162172
LOGGER.debug(firstPass + " phase failed: " + e.getMessage());
163173
LOGGER.error(e);
@@ -184,24 +194,31 @@ private void generateAtomAtomMapping(
184194
remaining = new IMappingAlgorithm[]{MIN, MAX, MIXTURE, RINGS};
185195
}
186196

197+
java.util.List<java.util.concurrent.Future<Reactor>> submittedFutures = new java.util.ArrayList<>();
187198
try {
188199
CompletionService<Reactor> cs = new ExecutorCompletionService<>(MAPPING_EXECUTOR);
189200
int jobCounter = 0;
190201
for (IMappingAlgorithm algo : remaining) {
191202
LOGGER.debug("Submitting " + algo.description());
192-
cs.submit(new MappingThread("IMappingAlgorithm." + algo.name(),
193-
standardizedReaction, algo, removeHydrogen));
203+
submittedFutures.add(cs.submit(new MappingThread("IMappingAlgorithm." + algo.name(),
204+
standardizedReaction, algo, removeHydrogen)));
194205
jobCounter++;
195206
}
207+
int collected = 0;
196208
for (int i = 0; i < jobCounter; i++) {
197209
try {
198-
Reactor chosen = cs.take().get();
210+
java.util.concurrent.Future<Reactor> future =
211+
cs.poll(ALGORITHM_TIMEOUT_MS, java.util.concurrent.TimeUnit.MILLISECONDS);
212+
if (future == null) {
213+
LOGGER.warn("Algorithm poll timed out after " + ALGORITHM_TIMEOUT_MS
214+
+ "ms — " + (jobCounter - collected) + " remaining algorithms skipped");
215+
break;
216+
}
217+
collected++;
218+
Reactor chosen = future.get(); // already complete
199219
putSolution(chosen.getAlgorithm(), chosen);
200220
} catch (ExecutionException e) {
201-
// One algorithm worker failed — log and continue collecting the rest.
202-
// The lower MCS layer already handles per-pair failures; this catch
203-
// prevents a single bad algorithm from silently dropping all remaining
204-
// successful results.
221+
collected++;
205222
LOGGER.debug("Algorithm worker failed: " + e.getCause());
206223
LOGGER.error(e);
207224
} catch (InterruptedException e) {
@@ -210,6 +227,13 @@ private void generateAtomAtomMapping(
210227
break;
211228
}
212229
}
230+
// Cancel any orphaned workers still running in the shared pool
231+
// so they don't starve future mapping requests
232+
for (java.util.concurrent.Future<Reactor> f : submittedFutures) {
233+
if (!f.isDone()) {
234+
f.cancel(true);
235+
}
236+
}
213237
LOGGER.debug("======DONE CallableAtomMappingTool=======");
214238
} catch (Exception e) {
215239
LOGGER.debug("ERROR: in AtomMappingTool: " + e.getMessage());

src/main/java/com/bioinceptionlabs/reactionblast/mapping/GraphMatcher.java

Lines changed: 36 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,12 @@ public class GraphMatcher extends Debugger {
7878
private static final int SINGLE_SUBGRAPH_MATCH = 1;
7979
private static final long SUBGRAPH_TIMEOUT_MS = 5_000L;
8080
private static final long MCS_TIMEOUT_MS = 10_000L;
81+
/** Hard timeout per poll() call waiting for the next completed MCS pair. */
82+
private static final long MCS_POLL_TIMEOUT_MS = 15_000L;
83+
/** Overall wall-clock budget for the entire matcher() call. */
84+
private static final long MATCHER_BUDGET_MS = 60_000L;
85+
/** Hard timeout for the executor shutdown after collection. */
86+
private static final long MATCHER_SHUTDOWN_TIMEOUT_MS = 2_000L;
8187

8288
/**
8389
* @author Syed Asad Rahman <asad.rahman@bioinceptionlabs.com>
@@ -449,22 +455,45 @@ public static Collection<MCSSolution> matcher(Holder mh) throws Exception {
449455

450456
LOGGER.debug("submited " + taskCounter + " jobs");
451457
Collection<MCSSolution> threadedUniqueMCSSolutions = new ArrayList<>();
458+
int collected = 0;
459+
long matcherDeadline = currentTimeMillis() + MATCHER_BUDGET_MS;
452460
for (int count = 0; count < taskCounter; count++) {
453461
try {
454-
MCSSolution isomorphism = callablesQueue.take().get();
455-
threadedUniqueMCSSolutions.add(isomorphism);
462+
long remaining = matcherDeadline - currentTimeMillis();
463+
if (remaining <= 0) {
464+
LOGGER.warn("Matcher budget (" + MATCHER_BUDGET_MS
465+
+ "ms) exhausted — " + (taskCounter - collected)
466+
+ " remaining pairs skipped");
467+
break;
468+
}
469+
long pollMs = Math.min(remaining, MCS_POLL_TIMEOUT_MS);
470+
java.util.concurrent.Future<MCSSolution> future =
471+
callablesQueue.poll(pollMs, TimeUnit.MILLISECONDS);
472+
if (future == null) {
473+
LOGGER.warn("MCS poll timed out after " + pollMs
474+
+ "ms — " + (taskCounter - collected)
475+
+ " remaining pairs will be skipped");
476+
break;
477+
}
478+
MCSSolution isomorphism = future.get(); // already complete
479+
if (isomorphism != null) {
480+
threadedUniqueMCSSolutions.add(isomorphism);
481+
}
482+
collected++;
456483
} catch (ExecutionException ex) {
484+
collected++;
457485
Throwable cause = ex.getCause() != null ? ex.getCause() : ex;
458486
LOGGER.error(SEVERE, "MCS worker failed", cause);
459487
}
460488
}
461489
// Add directly-constructed identity mappings (bypassed MCSThread)
462490
threadedUniqueMCSSolutions.addAll(directMCSSolutions);
463-
// This will make the executor accept no new threads
464-
// and finish all existing threads in the queue
491+
// Shut down the local executor; interrupt any stuck threads
465492
executor.shutdown();
466-
// Wait until all threads are finished
467-
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
493+
if (!executor.awaitTermination(MATCHER_SHUTDOWN_TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
494+
LOGGER.warn("MCS executor did not shut down cleanly — forcing shutdown");
495+
executor.shutdownNow();
496+
}
468497

469498
LOGGER.debug("==Gathering MCS solution from the Thread==");
470499
long replayedMappings = 0;
@@ -502,7 +531,7 @@ public static Collection<MCSSolution> matcher(Holder mh) throws Exception {
502531
LOGGER.error(SEVERE, null, ex);
503532
} finally {
504533
if (executor != null) {
505-
executor.shutdown();
534+
executor.shutdownNow();
506535
}
507536
}
508537
return unmodifiableCollection(mcsSolutions);

src/main/java/com/bioinceptionlabs/reactionblast/mapping/SmsdReactionMappingEngine.java

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -105,7 +105,17 @@ public BaseMapping findSubstructure(IQueryAtomContainer query,
105105
@Override
106106
public void applyDefaultFilters(BaseMapping mapping) {
107107
if (mapping != null) {
108-
mapping.setChemFilters(true, true, true);
108+
try {
109+
mapping.setChemFilters(true, true, true);
110+
} catch (NullPointerException ex) {
111+
// SMSD 6.10.1 energy filter can NPE on certain molecule pairs
112+
// where bond-energy lookup returns null. Fall back to stereo+fragment only.
113+
try {
114+
mapping.setChemFilters(true, true, false);
115+
} catch (Exception fallback) {
116+
// last resort — no filters at all
117+
}
118+
}
109119
}
110120
}
111121
}

0 commit comments

Comments
 (0)