Skip to content

fix: split concatenated HCI events instead of dropping them#925

Open
yannpoupon wants to merge 2 commits into
google:mainfrom
yannpoupon:fix/usb-split-concatenated-hci-events
Open

fix: split concatenated HCI events instead of dropping them#925
yannpoupon wants to merge 2 commits into
google:mainfrom
yannpoupon:fix/usb-split-concatenated-hci-events

Conversation

@yannpoupon
Copy link
Copy Markdown

Fix: USB transport silently drops HCI events packed in the same USB transfer

Summary

When a Bluetooth USB dongle delivers two HCI events in a single USB transfer, Bumble's UsbTransport.dequeue() calls sink.on_packet() once with the full buffer. HCI_Event.from_bytes then parses only the first event, logs a warning about surplus bytes, and silently discards the rest. If the discarded bytes happen to be an HCI_COMMAND_COMPLETE_EVENT, the corresponding send_sync_command future never resolves, its 10-second timeout fires, and the caller raises TimeoutError.

Observed failure

Some tests fails sporadically (~1 out of 200 test) with a TimeoutError during the BLE connection setup.

Pytest log except

2026-05-18 11:59:32.221 [   DEBUG][bumble.host:host.py (send_hci_packet):660] ### HOST -> CONTROLLER: HCI_LE_SET_SCAN_ENABLE_COMMAND:
  le_scan_enable:    1
  filter_duplicates: 0
2026-05-18 11:59:32.225 [   DEBUG][bumble.host:host.py (on_hci_packet):997] ### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_LE_SET_SCAN_ENABLE_COMMAND
  return_parameters:
    status: SUCCESS
2026-05-18 11:59:32.232 [   DEBUG][bumble.host:host.py (on_hci_packet):997] ### CONTROLLER -> HOST: HCI_LE_ADVERTISING_REPORT_EVENT:
  reports[0]:
    event_type:   ADV_IND
    address_type: PUBLIC_DEVICE
    address:      00:04:63:92:0B:2B/P
    data:         020105030202fe170964323163376537652d353262302d313166312d396262
    rssi:         -39
2026-05-18 11:59:32.234 [   DEBUG][bumble.host:host.py (on_hci_packet):997] ### CONTROLLER -> HOST: HCI_LE_ADVERTISING_REPORT_EVENT:
  reports[0]:
    event_type:   SCAN_RSP
    address_type: PUBLIC_DEVICE
    address:      00:04:63:92:0B:2B/P
    data:         09ffa60210eb01000001
    rssi:         -39
2026-05-18 11:59:32.236 [   DEBUG][bumble.host:host.py (send_hci_packet):660] ### HOST -> CONTROLLER: HCI_LE_SET_SCAN_ENABLE_COMMAND:
  le_scan_enable:    0
  filter_duplicates: 0
2026-05-18 11:59:32.240 [ WARNING][bumble.hci:hci.py (from_bytes):6298] truncating parameters (packet=043e1e020100015c8f4bb68b761202011a0bff4c0010063f1d8fae5dd8020a00c00e04010c2000): from 36 to 30
2026-05-18 11:59:32.241 [   DEBUG][bumble.host:host.py (on_hci_packet):997] ### CONTROLLER -> HOST: HCI_LE_ADVERTISING_REPORT_EVENT:
  reports[0]:
    event_type:   ADV_IND
    address_type: RANDOM_DEVICE
    address:      76:8B:B6:4B:8F:5C
    data:         02011a0bff4c0010063f1d8fae5dd8020a00
    rssi:         -64

... 10 seconds 

Traceback (most recent call last):
  File "/home/developer/.local/share/uv/python/cpython-3.14.0-linux-aarch64-gnu/lib/python3.14/asyncio/tasks.py", line 488, in wait_for
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/developer/brc3-application/test/.venv/lib/python3.14/site-packages/bumble/host.py", line 685, in _send_command
    response = await asyncio.wait_for(
               ^^^^^^^^^^^^^^^^^^^^^^^
        self.pending_response, timeout=response_timeout
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/developer/.local/share/uv/python/cpython-3.14.0-linux-aarch64-gnu/lib/python3.14/asyncio/tasks.py", line 487, in wait_for
    async with timeouts.timeout(timeout):
               ~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/home/developer/.local/share/uv/python/cpython-3.14.0-linux-aarch64-gnu/lib/python3.14/asyncio/timeouts.py", line 114, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
2026-05-18 11:59:42.246 [ WARNING][bumble.device:device.py (send_sync_command):2794] !!! Command HCI_LE_SET_SCAN_ENABLE_COMMAND timed out

Analysis

# --- Initiates BLE scanning ---

2026-05-18 11:59:32.221 [DEBUG] ### HOST -> CONTROLLER: HCI_LE_SET_SCAN_ENABLE_COMMAND:
  le_scan_enable:    1
  filter_duplicates: 0

# --- Real component device found in advertisement ---

2026-05-18 11:59:32.232 [DEBUG] ### CONTROLLER -> HOST: HCI_LE_ADVERTISING_REPORT_EVENT:
  reports[0]:
    event_type:   ADV_IND
    address:      00:04:63:92:0B:2B/P          ← device
    data:         ...  ← UUID in advertisement

# --- connect() calls stop_scanning → sends HCI_LE_SET_SCAN_ENABLE_COMMAND(0) ---

2026-05-18 11:59:32.236 [DEBUG] ### HOST -> CONTROLLER: HCI_LE_SET_SCAN_ENABLE_COMMAND:
  le_scan_enable:    0
  filter_duplicates: 0

# --- THE PROBLEM: dongle packs a last ADV_IND + the scan-disable complete
#     into ONE USB interrupt transfer; Bumble parses only the ADV_IND ---

2026-05-18 11:59:32.240 [WARNING] truncating parameters
    (packet=043e1e020100015c8f4bb68b761202011a0bff4c0010063f1d8fae5dd8020a00c00e04010c2000):
    from 36 to 30

#   Decoded:
#   ┌──────────────────────────────────────────────────────────────────┐
#   │ 04 3e 1e 02 01 00 01 5c 8f 4b b6 8b 76 12 02 01 1a 0b ff 4c 00  │
#   │ 10 06 3f 1d 8f ae 5d d8 02 0a 00 c0                              │
#   │   └── HCI_LE_ADVERTISING_REPORT_EVENT (30 param bytes, iPhone)   │
#   │                                                                  │
#   │ 0e 04 01 0c 20 00                ← DISCARDED                     │
#   │ │   │  │  └─────── opcode 0x200C = HCI_LE_SET_SCAN_ENABLE        │
#   │ │   │  └────────── num_hci_command_packets = 1                   │
#   │ │   └───────────── param_len = 4                                 │
#   │ └───────────────── HCI_COMMAND_COMPLETE_EVENT (0x0e)             │
#   └──────────────────────────────────────────────────────────────────┘

# --- send_sync_command never receives the completion; 10 s later: ---

2026-05-18 11:59:42.246 [WARNING] !!! Command HCI_LE_SET_SCAN_ENABLE_COMMAND timed out

# --- asyncio.timeout() raises TimeoutError → test fails ---

Why it is sporadic

The race condition requires a third-party BLE device to be advertising nearby at the precise moment the scan-disable command is being processed. The dongle's firmware batches the last incoming ADV_IND together with the outgoing command-complete response into one USB interrupt transfer. When no such device is in range or the timing is slower is he two HCI events arrive in separate USB transfers and are parsed correctly.

Pytest log for a working example

2026-05-18 12:05:46.764 [   DEBUG][bumble.host:host.py (on_hci_packet):997] ### CONTROLLER -> HOST: HCI_LE_ADVERTISING_REPORT_EVENT:
  reports[0]:
    event_type:   SCAN_RSP
    address_type: PUBLIC_DEVICE
    address:      A4:0D:BC:8D:09:6A/P
    data:         09ffa60210eb01050001
    rssi:         -37
2026-05-18 12:05:46.766 [   DEBUG][bumble.host:host.py (send_hci_packet):660] ### HOST -> CONTROLLER: HCI_LE_SET_SCAN_ENABLE_COMMAND:
  le_scan_enable:    0
  filter_duplicates: 0
2026-05-18 12:05:46.771 [   DEBUG][bumble.host:host.py (on_hci_packet):997] ### CONTROLLER -> HOST: HCI_COMMAND_COMPLETE_EVENT:
  num_hci_command_packets: 1
  command_opcode:          HCI_LE_SET_SCAN_ENABLE_COMMAND
  return_parameters:
    status: SUCCESS

Root cause

UsbTransport.InSource.transfer_callback enqueues the full USB buffer as a single item, prepended with the HCI packet-type byte:

# bumble/transport/usb.py  –  transfer_callback (before fix)
packet = (
    bytes([packet_type])
    + transfer.getBuffer()[: transfer.getActualLength()]
)
self.loop.call_soon_threadsafe(self.queue.put_nowait, packet)

dequeue() then passes the whole blob to sink.on_packet() in one call:

# bumble/transport/usb.py  –  dequeue (before fix)
async def dequeue(self):
    while not self.closed:
        ...
        if self.sink:
            try:
                self.sink.on_packet(packet)   # ← entire USB transfer, may hold N events

HCI_Event.from_bytes (in hci.py) is called on the full blob. It reads
param_len from byte offset 2, parses those bytes, and if there is anything
left it logs a warning and returns—leaving the tail bytes unconsumed and lost:

# bumble/hci.py  –  HCI_Event.from_bytes (unchanged)
if len(packet) > 3 + parameters_length:
    logger.warning(
        f'truncating parameters (packet={packet.hex()}): '
        f'from {len(packet) - 3} to {parameters_length}'
    )
parameters = packet[3 : 3 + parameters_length]   # tail bytes silently dropped

Fix

Split the USB buffer into individual HCI event frames inside dequeue() and call sink.on_packet() once per frame. Non-event (ACL) transfers are passed through unchanged.

# bumble/transport/usb.py  –  dequeue (after fix)
async def dequeue(self):
    while not self.closed:
        try:
            packet = await self.queue.get()
        except asyncio.CancelledError:
            return
        if self.sink:
            try:
                packet_type = packet[0]
                if packet_type == hci.HCI_EVENT_PACKET:
                    # A single USB interrupt transfer may contain multiple HCI
                    # events concatenated. Walk the buffer and dispatch each
                    # complete event individually so that no packet is silently
                    # discarded (e.g. a command-complete piggy-backed behind an
                    # advertising report in the same transfer).
                    offset = 1  # skip the prepended packet_type byte
                    while offset < len(packet):
                        if offset + 2 > len(packet):
                            break  # incomplete header
                        param_len = packet[offset + 1]
                        end = offset + 2 + param_len
                        if end > len(packet):
                            break  # incomplete packet
                        self.sink.on_packet(
                            bytes([packet_type]) + packet[offset:end]
                        )
                        offset = end
                else:
                    self.sink.on_packet(packet)
            except Exception:
                logger.exception(
                    color('!!! Exception in sink.on_packet', 'red')
                )

Before vs. after — the 36-byte USB transfer

Byte range Content Before fix After fix
[1:33] HCI_LE_ADVERTISING_REPORT_EVENT (iPhone) dispatched dispatched
[33:39] HCI_COMMAND_COMPLETE_EVENT for scan-disable discarded dispatched

With the fix, send_sync_command receives the completion immediately, the connect() coroutine proceeds to HCI_LE_CREATE_CONNECTION_COMMAND, and the BLE connection is established successfully.

Affected file

bumble/transport/usb.pyUsbTransport.InSource.dequeue()

@barbibulle
Copy link
Copy Markdown
Collaborator

Thanks for the very detailed report. You're right that the original assumption that each USB transfer would only contain a single complete packet doesn't always hold, so what you're proposing here is right.
I suspect that it isn't necessarily limited to the HCI events in interrupt transfers, and could also happen for HCI ACL data and other types in bulk transfers, as well as SCO transfers.
Since I have a pending PR that improves the USB transport implementation in order to support SCO packets (which, incidentally, is subject to exactly that issue of HCI packets that can span multiple transfers), I suggest that I enhance my PR a bit and also add "accumulators" to all USB transfer types, which would solve your issue here, and also support the other packet types. I'll add you as a reviewer so you can take a look and try it out before we merge it.

@yannpoupon
Copy link
Copy Markdown
Author

Thanks for the very detailed report. You're right that the original assumption that each USB transfer would only contain a single complete packet doesn't always hold, so what you're proposing here is right. I suspect that it isn't necessarily limited to the HCI events in interrupt transfers, and could also happen for HCI ACL data and other types in bulk transfers, as well as SCO transfers. Since I have a pending PR that improves the USB transport implementation in order to support SCO packets (which, incidentally, is subject to exactly that issue of HCI packets that can span multiple transfers), I suggest that I enhance my PR a bit and also add "accumulators" to all USB transfer types, which would solve your issue here, and also support the other packet types. I'll add you as a reviewer so you can take a look and try it out before we merge it.

Yes that sounds good for me if you add a fix for this issue in your PR and I could test it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants