Skip to content

asyncio: fix SSL connections by using native TLS transport#884

Open
roydahan wants to merge 1 commit into
scylladb:masterfrom
roydahan:fix-asyncio-ssl
Open

asyncio: fix SSL connections by using native TLS transport#884
roydahan wants to merge 1 commit into
scylladb:masterfrom
roydahan:fix-asyncio-ssl

Conversation

@roydahan
Copy link
Copy Markdown
Collaborator

Summary

  • Replace SimpleStrategy with NetworkTopologyStrategy across the codebase (ScyllaDB dropped SimpleStrategy support)
  • Fix AsyncioConnection SSL support by using asyncio's native TLS transport (loop.create_connection(ssl=)) instead of ssl.SSLContext.wrap_socket(), which Python 3.8+ rejects in asyncio's sock_sendall/sock_recv
  • This resolves the "protocol version 21 not supported" errors when connecting to TLS-requiring ScyllaDB clusters (the TLS Alert byte 0x15 was misread as protocol version 21)

Details

SimpleStrategy → NetworkTopologyStrategy

All CQL CREATE KEYSPACE statements, Python strategy objects, and test assertions updated. The SimpleStrategy class definition in cassandra/metadata.py is preserved for backward compatibility.

AsyncioConnection SSL Fix

Problem: Python 3.8+ explicitly rejects ssl.SSLSocket in asyncio.loop.sock_sendall() and sock_recv() with TypeError("ssl.SSLSocket is not supported"). The base Connection._connect_socket() wraps the socket with ssl.SSLContext.wrap_socket(), producing an SSLSocket that asyncio refuses.

Solution: Override _connect_socket() in AsyncioConnection to skip SSL wrapping, keeping a plain TCP socket. After the socket connects (preserving shard-aware port binding), upgrade to TLS asynchronously via loop.create_connection(sock=, ssl=). A new _AsyncioProtocol class bridges asyncio's transport/protocol API back to Connection.process_io_buffer() for reading. Writes use transport.write() for SSL connections. Non-SSL connections are unchanged.

Edge cases handled: SSL handshake failure properly unblocks the write coroutine and defuncts the connection.

Fixes #330

@roydahan roydahan force-pushed the fix-asyncio-ssl branch 2 times, most recently from c4b4b48 to 99e7245 Compare May 11, 2026 18:38
@dkropachev
Copy link
Copy Markdown
Collaborator

@roydahan , please rebase

@roydahan
Copy link
Copy Markdown
Collaborator Author

Rebased.
Last run of CI passed successfully includeing the asyncio failed tests.

@roydahan
Copy link
Copy Markdown
Collaborator Author

Can anyone review this PR?

Copy link
Copy Markdown
Collaborator

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the right direction for asyncio TLS: using asyncio's native transport/protocol layer avoids the ssl.SSLSocket incompatibility with sock_recv() / sock_sendall().

I think it would be worth tightening the design before merge. Right now TLS and plain CQL use different I/O models, and TLS writes do not appear to account for asyncio transport backpressure. My main suggestion would be to consider using one transport/protocol path for both TLS and non-TLS asyncio connections in this PR.

The reason I think it belongs here is that the hard part is not only "make TLS work"; it is introducing asyncio transports into this reactor correctly. Once the reactor has transport lifecycle, protocol callbacks, write flow control, close handling, and tests for those pieces, keeping plain CQL on the old socket-coroutine path leaves two separate implementations of the same connection state machine. That makes future behavior harder to reason about and increases the chance that fixes land in one path but not the other.

CI already gives broad connectivity coverage through the asyncio integration matrix, including an SSL scenario via TestSslThroughNlb, so I would focus new tests on the new transport mechanics and compatibility edge cases rather than adding another generic SSL integration test.

self._read_watcher = asyncio.run_coroutine_threadsafe(
self.handle_read(), loop=self._loop
)
if self._using_ssl:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use the transport/protocol path for both TLS and non-TLS connections here? With the current split, asyncio has two different I/O models in the same reactor: TLS uses asyncio.Protocol/transport, while plain CQL stays on sock_recv()/sock_sendall().

I think unifying them in this PR may be cleaner because the work needed for TLS is mostly the same work needed for a correct transport-based reactor: setup, read callbacks, write flow control, close handling, and tests. If those pieces are implemented only for TLS now, we end up with two connection state machines to maintain and a higher chance of future fixes applying to one path but not the other.

A shared create_connection(sock=..., ssl=self.ssl_context or None) path could make lifecycle, reads, writes, and flow control consistent across both modes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion for a follow-up, but I'd rather keep it out of scope for this PR. The non-SSL sock_recv/sock_sendall path has been working in production for years — rewriting it to use transport/protocol introduces risk with no immediate benefit. The two paths are isolated, so future fixes to one won't accidentally break the other. Happy to do this as a separate PR if we want to modernize the reactor.

await self._loop.sock_sendall(self._socket, next_msg)
if self._transport:
# SSL: use asyncio transport (handles TLS transparently)
self._transport.write(next_msg)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we account for transport backpressure here? transport.write() queues into asyncio's transport buffer and returns immediately, unlike await sock_sendall(). Under sustained writes, this loop can drain _write_queue faster than the socket can actually send.

One possible shape is to have the protocol expose a write-ready event:

def pause_writing(self):
    self.write_ready.clear()

def resume_writing(self):
    self.write_ready.set()

and then wait before writing:

await self._protocol.write_ready.wait()
self._transport.write(next_msg)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Implemented pause_writing()/resume_writing() on _AsyncioProtocol with a write_ready Event. The write loop now awaits write_ready.wait() before each transport.write(). Also unblocks from connection_lost() to prevent shutdown hangs.

'does not implement .finish()')


class _AsyncioProtocol(asyncio.Protocol):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this protocol owns the transport callbacks, I think it would be useful for it to also own flow-control state. Implementing pause_writing() / resume_writing() here would let the writer respect asyncio's high/low watermarks. It may also be worth unblocking any paused writer from connection_lost() so shutdown cannot leave the write coroutine waiting indefinitely.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — added pause_writing()/resume_writing() with a write_ready Event, and connection_lost() sets it to prevent the writer from hanging on shutdown.

Comment thread cassandra/io/asyncioreactor.py Outdated
self._transport = None
self._protocol = _AsyncioProtocol(self) if self.ssl_context else None
self._using_ssl = bool(self.ssl_context)
self._ssl_ready = asyncio.Event() if self.ssl_context else None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small compatibility point: the repo still supports Python 3.9, and the existing code passes loop=self._loop to Queue and Lock for that runtime. Should this new Event follow the same pattern so it is bound to the reactor loop rather than the caller thread's loop?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Moved _ssl_ready and _protocol creation after loop_args is computed, passing it to both asyncio.Event(**loop_args) and _AsyncioProtocol(self, loop_args) (which uses it for its internal write_ready Event).

)
self._send_options_message()

def _connect_socket(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This override looks like it accidentally skips the base class sockopts handling. It would be good to preserve Cluster(sockopts=...) for asyncio connections:

if self.sockopts:
    for args in self.sockopts:
        self._socket.setsockopt(*args)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added if self.sockopts: block at the end of _connect_socket(), matching the base class.

Comment thread cassandra/io/asyncioreactor.py Outdated
server_hostname = None
if self.ssl_options:
server_hostname = self.ssl_options.get("server_hostname", None)
if not server_hostname:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may change behavior for an explicit ssl_options={"server_hostname": ""}. For asyncio create_connection(sock=..., ssl=...), server_hostname="" is the documented way to bypass hostname/SNI handling when no host argument is provided. Could we distinguish a missing value from an explicitly empty one, for example with is None or a sentinel?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Changed to if server_hostname is None: to preserve explicit empty string for suppressing SNI.

Python 3.8+ rejects ssl.SSLSocket in asyncio's sock_sendall/sock_recv
with TypeError. This caused the driver to fail connecting to ScyllaDB
clusters requiring TLS, manifesting as 'protocol version 21 not
supported' errors (0x15 = TLS Alert byte misread as protocol version).

Fix by using asyncio's native TLS transport (loop.create_connection
with ssl= parameter) instead of wrapping sockets with
ssl.SSLContext.wrap_socket(). This preserves shard-aware port binding
done during _initiate_connection().

Add _AsyncioProtocol to bridge asyncio's transport/protocol API back
to Connection.process_io_buffer() for SSL data reads. Non-SSL
connections continue using the existing sock_recv path.

Fixes scylladb#330
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Asyncio backend: SSL doesn't work

2 participants