Skip to content

Attributable failures#9888

Open
GeorgeTsagk wants to merge 10 commits into
lightningnetwork:masterfrom
GeorgeTsagk:attr-failures
Open

Attributable failures#9888
GeorgeTsagk wants to merge 10 commits into
lightningnetwork:masterfrom
GeorgeTsagk:attr-failures

Conversation

@GeorgeTsagk
Copy link
Copy Markdown
Collaborator

@GeorgeTsagk GeorgeTsagk commented Jun 2, 2025

Description

Adds support for Attributable failures (feature 36/37). This changes the error encryption and the data encoding for the update_fail_htlc message. We now include the attribution structure in the extra data of the peer message, which helps with assigning blame across the route and also extracting the reported HTLC hold times for each node across the route.

For now we default to using non-strict attribution decryption, which means that we're basically preserving the old behavior in order to give time to nodes to upgrade. Otherwise this could lead to premature blame of our direct peers, possibly blacklisting all of our channels early in the payment lifecycle.

This is based on the lightning-onion PR which is also updated to reflect the latest state of attributable failures Bolt PR.

Testing

Current unit/integration tests seem to be passing, which validates that we are preserving the old behavior.

Todo:

  • Add integration tests that somehow validate the attribution data (maybe via the hold times)
  • Carry out an interoperability test with LDK/Acinq

Even though the todo list above is important to consider this PR ready to merge, this PR at current state is still good for a first round of review.

@GeorgeTsagk GeorgeTsagk self-assigned this Jun 2, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@saubyk saubyk added this to the v0.20.0 milestone Jun 2, 2025
@saubyk saubyk added this to lnd v0.20 Jun 2, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Jun 2, 2025
@GeorgeTsagk
Copy link
Copy Markdown
Collaborator Author

Bumped lightning-onion and rebased on master

@GeorgeTsagk
Copy link
Copy Markdown
Collaborator Author

The linter failure is only about the replace in the go.mod, which will have to stay until the dependent lightning-onion PR is merged

go.mod:226:1: replacement are not allowed: github.com/lightningnetwork/lightning-onion (gomoddirectives)
replace github.com/lightningnetwork/lightning-onion => github.com/joostjager/lightning-onion v0.0.0-20250630141312-2898b9c46c4e

@saubyk saubyk modified the milestones: v0.20.0, v0.21.0 Sep 5, 2025
@saubyk saubyk removed this from lnd v0.20 Sep 5, 2025
@ziggie1984
Copy link
Copy Markdown
Collaborator

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for attributable failures, which is a significant and complex change to the error handling and encryption logic. The core refactoring to separate shared secret extraction from error encrypter creation is well-executed and improves the design.

I've found a few issues related to the implementation:

  • An incorrect unit calculation for HTLC hold times, which deviates from the BOLT specification.
  • A logic error in handling failures for introduction nodes in a blinded path, where the wrong message type is created and attribution data is discarded.
  • A minor opportunity for code simplification in the new logging logic.

Overall, the changes are well-structured, but the identified issues, particularly the correctness ones, should be addressed. The PR description is clear about the work-in-progress nature, and this review should help in finalizing the implementation.

Comment thread htlcswitch/hop/error_encryptor.go
Comment thread htlcswitch/link.go
Comment thread htlcswitch/failure.go Outdated
@GeorgeTsagk
Copy link
Copy Markdown
Collaborator Author

Rebased
Reminder: lint fails because of the temporary replace directive in go.mod that points to the lightning-onion PR

@GeorgeTsagk
Copy link
Copy Markdown
Collaborator Author

Rebased

@GeorgeTsagk
Copy link
Copy Markdown
Collaborator Author

Added an itest to verify reported hold times from a multi hop error

@saubyk saubyk modified the milestones: v0.21.0, v0.22.0 Apr 9, 2026
@saubyk saubyk added this to lnd v0.22 Apr 9, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in lnd v0.22 Apr 9, 2026
@saubyk saubyk moved this from Backlog to In progress in lnd v0.22 Apr 9, 2026
@lightninglabs-deploy
Copy link
Copy Markdown
Collaborator

@GeorgeTsagk, remember to re-request review from reviewers when ready

@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label May 20, 2026
@github-actions
Copy link
Copy Markdown

🔴 PR Severity: CRITICAL

Automated classification | 23 files | ~886 lines changed

🔴 Critical (16 files)
  • htlcswitch/circuit.go - HTLC circuit lifecycle in forwarding state machine
  • htlcswitch/circuit_map.go - In-memory and persistent circuit state tracking
  • htlcswitch/failure.go - HTLC failure message handling and attribution
  • htlcswitch/hop/error_encryptor.go - Error encryption/onion wrapping (+188/-43 lines)
  • htlcswitch/hop/iterator.go - Hop payload iterator for onion decoding
  • htlcswitch/interceptable_switch.go - HTLC interception layer
  • htlcswitch/link.go - Channel link state machine (core forwarding logic)
  • htlcswitch/mailbox.go - Packet mailbox for the HTLC switch
  • htlcswitch/mock.go - Mock implementations for htlcswitch package
  • htlcswitch/switch.go - Core HTLC switch coordination
  • htlcswitch/test_utils.go - Test utilities in htlcswitch package
  • lnwallet/channel.go - Channel commitment transaction management
  • lnwallet/payment_descriptor.go - Payment descriptor for commitment state
  • lnwire/attr_data.go - New Lightning wire protocol attribute data type (added)
  • peer/brontide.go - Encrypted peer connection handling (Noise protocol)
  • server.go - Core server coordination
🟠 High (3 files)
  • lnrpc/lightning.proto - RPC/API definition changes (new fields)
  • lnrpc/routerrpc/router_backend.go - Router RPC backend
  • routing/payment_lifecycle.go - Payment routing lifecycle
🟡 Medium (2 files)
  • payments/db/kv_store.go - Payment KV store (+42 lines)
  • payments/db/payment.go - Payment data model

Analysis

This PR introduces attributable payment failures - a mechanism for identifying which hop along a payment route caused a failure. The changes span multiple critical subsystems simultaneously:

  • htlcswitch/: The bulk of changes touch the core HTLC forwarding engine. hop/error_encryptor.go receives a major rework to support per-hop attribution data in onion-encrypted failure messages.
  • lnwire/attr_data.go: Adds a new wire protocol data type for attribution - a protocol-level change.
  • lnwallet/channel.go + payment_descriptor.go: Channel commitment state modified to propagate attribution data through the commitment transaction lifecycle.
  • peer/brontide.go: Peer message handling updated to support the new attribution wire format.

Severity bump triggers (all three fired; CRITICAL is already the maximum):

  1. 23 non-test/non-generated files changed (threshold: >20)
  2. ~886 lines changed in non-test/non-generated files (threshold: >500)
  3. Multiple distinct critical packages touched: htlcswitch, lnwallet, lnwire, peer, server.go

Changes to the HTLC error encryption path and wire protocol are security-sensitive - incorrect attribution data could leak routing information or cause hard-to-diagnose payment failures.


To override, add a severity-override-{critical,high,medium,low} label.
<!-- pr-severity-bot -->

@GeorgeTsagk GeorgeTsagk force-pushed the attr-failures branch 2 times, most recently from be35c47 to 517ee50 Compare May 20, 2026 12:33
@GeorgeTsagk
Copy link
Copy Markdown
Collaborator Author

GeorgeTsagk commented May 20, 2026

Rebased on master, changed attribution data TLV to the final one (101 -> 1)

GeorgeTsagk and others added 10 commits May 20, 2026 19:21
Co-authored-by: Joost Jager <joost.jager@gmail.com>
Bump lightning-onion dependency to include attributable error support.
Preparation for the instantiation of an attributable error encrypter.
This gets rid of the sphinx encrypter instantiation in OnionProcessor.
This would otherwise be problematic when an attributable encrypter
would need to be created there without having access to the error
structure.

Co-authored-by: Joost Jager <joost.jager@gmail.com>
We add some simple helpers for the purpose of parsing the attribution
data from and to the wire format.
When propagating a failure backwards along a route we want to persist
the received ExtraData of the downstream link so that any handling can
occur on our node before returning the updated ExtraData to our upstream
link.
We now call into the new encryption/decryption methods and provide the
received attribution data as an argument. The result is then also
returned to our upstream peer.
Add test coverage for the new attributable failures feature:
- lnwire/attr_data_test.go: round-trip, TLV type, empty/nil cases
- hop/error_encryptor_test.go: encode/decode, backward compat, hold
  time, reextract, introduction/relaying encrypter round-trips
- htlcswitch/attributable_failure_test.go: end-to-end encrypt/decrypt
  with attribution data, backward compat without attr data, hold times
Add a hold_times field to the Failure proto message so that callers of
SendPaymentV2 can see per-hop hold times reported via attributable
errors. The hold times are persisted in the KV store (backward
compatible via trailing optional read) and populated in both the
database and streaming RPC paths.

Changes:
- lnrpc/lightning.proto: add repeated uint32 hold_times = 10
- payments/db: add HoldTimes to HTLCFailInfo + KV serialization
- routing: copy HoldTimes from ForwardingError to HTLCFailInfo
- lnrpc/routerrpc: populate HoldTimes in both marshallHtlcFailure
  and marshallError
Add an integration test that sets up Alice -> Bob -> Carol, triggers a
payment failure at Carol (unknown payment hash), and verifies that
Alice receives hold times in the RPC failure response. The test checks:
- hold_times has one entry per route hop (1:1 correspondence)
- failure source index matches Carol (index 2)
- each hold time is within a reasonable bound
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

severity-critical Requires expert review - security/consensus critical

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

4 participants