Skip to content

Don't persist inbound committed onions in prod#4599

Open
valentinewallace wants to merge 1 commit into
lightningdevkit:mainfrom
valentinewallace:2026-05-no-persist-onions-yet
Open

Don't persist inbound committed onions in prod#4599
valentinewallace wants to merge 1 commit into
lightningdevkit:mainfrom
valentinewallace:2026-05-no-persist-onions-yet

Conversation

@valentinewallace
Copy link
Copy Markdown
Contributor

A few PRs ago, we started persisting inbound committed HTLC onions in Channels. These onions were persisted to lay groundwork for reconstructing the ChannelManager's pending HTLC maps from them in a future version. However, we've since discovered a different direction where we can instead reconstruct these same maps using persistent monitor events, which may mean that we don't need to persist these onions.

Since persisting a bunch of onions on every manager write is a big commitment that we're not fully confident in yet, switch it off for now until we either confirm the monitor events direction and can delete all this onion persisting code OR realize that we definitely do need it.

@valentinewallace valentinewallace added this to the 0.3 milestone May 5, 2026
@valentinewallace valentinewallace self-assigned this May 5, 2026
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label May 5, 2026
@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented May 5, 2026

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Comment thread lightning/src/ln/channel.rs
@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

ldk-claude-review-bot commented May 5, 2026

I've now thoroughly reviewed every changed file and every hunk in the diff. My prior review analysis was correct and comprehensive. The key points:

  1. Production write path: inbound_committed_update_adds stays empty (push gated by #[cfg(test)]), so optional_vec skips TLV 75 entirely.
  2. Production read path: TLV 75 absent → inbound_committed_update_adds_opt stays None (manual init at line 16372) → if let Some(...) at line 16554 is not entered → committed HTLCs keep Legacy state.
  3. Production reconstruct: None.is_some_and(...)false → reconstruction never attempted.
  4. Production legacy read: None.map_or(true, ...)true → legacy forward_htlcs always read.
  5. Write/read symmetry: SERIALIZATION_VERSION remains 1, forward_htlcs always written and always read.
  6. Test paths: Onions still written in tests, reconstruction randomly enabled for coverage.

My prior inline comment was a false positive that I already corrected in the prior top-level comment. No new issues found.

No issues found.

The prior inline comment about optional_vec initialization was already confirmed as a false positive in the prior top-level review. The manual ReadableArgs impl for FundedChannel explicitly initializes inbound_committed_update_adds_opt to None (not Some(Vec::new())), so the if let Some(update_adds) guard correctly skips the block when TLV 75 is absent.

All code paths (production write, production read, test write, test read, reconstruction gating, legacy map reading) are internally consistent and correctly handle the #[cfg(test)] gating and the Option<u8> version constant.

@ldk-reviews-bot ldk-reviews-bot requested a review from wpaulino May 5, 2026 22:08
@valentinewallace valentinewallace force-pushed the 2026-05-no-persist-onions-yet branch from c50518a to d68dbf8 Compare May 5, 2026 22:11
@valentinewallace valentinewallace removed the request for review from wpaulino May 5, 2026 22:16
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.42%. Comparing base (b64efcd) to head (b82b6a4).
⚠️ Report is 86 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channel.rs 20.00% 4 Missing ⚠️
lightning/src/ln/channelmanager.rs 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4599      +/-   ##
==========================================
+ Coverage   86.22%   86.42%   +0.19%     
==========================================
  Files         159      158       -1     
  Lines      109170   109294     +124     
  Branches   109170   109294     +124     
==========================================
+ Hits        94136    94453     +317     
+ Misses      12424    12290     -134     
+ Partials     2610     2551      -59     
Flag Coverage Δ
fuzzing-fake-hashes 5.08% <0.00%> (?)
fuzzing-real-hashes 22.77% <18.18%> (?)
tests 86.15% <54.54%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider rolling back all the behavioral parts of #4227 rather than only gating writing in prod. With this PR, prod stops writing the inbound onion data, but the version/reconstruction machinery remains. That leaves SERIALIZATION_VERSION == 2 effectively reserved for “reconstruct HTLC maps from channel data.” If any released version already interprets version 2 that way, we cannot safely reuse version 2 for an unrelated serialization bump later. Older readers would start reconstructing.

The standalone cleanups/refactors from the original PR that are useful on their own can be kept ofc. If we decide to pick up the inbound-onion reconstruction path later, reverting the revert should be straightforward, and in the meantime we avoid carrying extra confusing dormant code in an already subtle area.

@valentinewallace
Copy link
Copy Markdown
Contributor Author

I think we should consider rolling back all the behavioral parts of #4227 rather than only gating writing in prod. With this PR, prod stops writing the inbound onion data, but the version/reconstruction machinery remains. That leaves SERIALIZATION_VERSION == 2 effectively reserved for “reconstruct HTLC maps from channel data.” If any released version already interprets version 2 that way, we cannot safely reuse version 2 for an unrelated serialization bump later. Older readers would start reconstructing.

The standalone cleanups/refactors from the original PR that are useful on their own can be kept ofc. If we decide to pick up the inbound-onion reconstruction path later, reverting the revert should be straightforward, and in the meantime we avoid carrying extra confusing dormant code in an already subtle area.

As mentioned in the PR description and offline, the monitor events approach is still in early enough stages that I'm not really confident at all that we don't need the dormant code. As long as we don't persist anything in prod, we always retain the option to cleanly remove the it later. So my preference would be to make at least a bit more progress on monitor events first, and I'm happy to prioritize removing the dormant code if/as soon as it makes more sense. Also, even if we do decide to go in that direction ~immediately, I still would prefer to land this PR first to minimize 0.3 blockage.

W/r/t reserving the serialization version, similar to previous discussions I don't see it as a huge issue since we've never bumped it in the past, but I'm also fine with hardcoding it to u8::max for now to sidestep those concerns. Or maybe go with the TLV flag approach, I forget if there was some reason not to do that.

Copy link
Copy Markdown
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood that it is not clear yet that the dormant code is definitely unnecessary. My point was more that revert now, then revert-the-revert later if we decide to pick this path back up, seems like a reasonable option too. But I do not feel strongly enough to block on that if you would rather keep the code around for now and remove it as soon as the monitor-events direction is clearer.

In that case, I do think we should fully disable reconstruction for now. I know we do not currently expect unrelated manager serialization version bumps, but while prod intentionally is not writing the inbound onion data, reserving version 2 for reconstruction feels like a needless constraint.

@valentinewallace valentinewallace force-pushed the 2026-05-no-persist-onions-yet branch from d68dbf8 to 5e3e347 Compare May 15, 2026 15:53
@valentinewallace
Copy link
Copy Markdown
Contributor Author

I understood that it is not clear yet that the dormant code is definitely unnecessary. My point was more that revert now, then revert-the-revert later if we decide to pick this path back up, seems like a reasonable option too.

Understood, that is definitely an option.

Pushed an update that doesn't commit us to any manager serialization bump as discussed offline.

A few PRs ago, we started persisting inbound committed HTLC onions in Channels.
These onions were persisted to lay groundwork for reconstructing the
ChannelManager's pending HTLC maps from them in a future version. However,
we've since discovered a different direction where we can instead reconstruct
these same maps using persistent monitor events, which may mean that we don't
need to persist these onions.

Since persisting a bunch of onions on every manager write is a big commitment
that we're not fully confident in yet, switch it off for now until we either
confirm the monitor events direction and can delete all this onion persisting
code OR realize that we definitely do need it.
@valentinewallace valentinewallace force-pushed the 2026-05-no-persist-onions-yet branch from 5e3e347 to b82b6a4 Compare May 15, 2026 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

weekly goal Someone wants to land this this week

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants