@haikoschol

This PR changes how multistream-select messages received on an opening outbound substream are handled on webrtc connections. For details please see the issue description.

The PR builds on top of #441 for the str0m update. But looking at it again I realized that it also made changes to multistream-select handling. I'll look into what those are and how they relate to my changes. In any case, I believe my changes bring the implementation closer to compliance with the multistream-select spec and enable interoperability with smoldot.

fixes #464

@haikoschol haikoschol force-pushed the haiko-webrtc-outbound-multistream-nego-fix branch from 93e6151 to 85fae21 Compare November 5, 2025 13:43
protocol: protocol.to_string(),
});

// self.rtc.channel(channel_id).unwrap().set_buffered_amount_low_threshold(1024);
Author

I've removed this line since it is unrelated to this PR and should be addressed later when working on backpressure.

event = self.handles.next() => match event {
None => unreachable!(),
Some((channel_id, None | Some(SubstreamEvent::Close))) => {
Some((_, None)) => {}
Author

This is a superficial fix for another unrelated issue which I haven't investigated yet. Without this change, a "channel closed" event is fired repeatedly after a while. I believe this is caused by SubstreamHandle::tx being dropped, presumably when the substream(/channel?) is closed. The fact that the event continues firing suggests that SubstreamHandle::rx continues to be polled. Maybe the handle simply needs to be removed from self.handles?

I've also seen this happen when briefly testing with a rust-libp2p dialer, so this is not related to smoldot interop.

Author

I just tested again without this change and now I don't get repeating "channel closed" messages in the logs. Presumably this is due to a change somewhere else that I can't pin down right now. In any case, it was unrelated to multistream-select negotiation.


self.rtc.direct_api().close_data_channel(channel_id);
self.channels.insert(channel_id, ChannelState::Closing);
self.handles.remove(&channel_id);
Collaborator

Wouldn't this lead to memory leaks due to keeping the channel ids around? 🤔

Author

You are right. I don't remember what exactly I had in mind with that change. Probably expecting that Event::ChannelClose would be emitted eventually and Connection::on_channel_closed() called, which does the full cleanup. But that's not exactly how it works.

The way it was before is incorrect as well, but that's out of scope for this PR since it is related to signaling of substream closure via the WebRTC protobuf frame.
Anyway, I'll revert this change and we'll revisit it in a follow-up PR.

@haikoschol
Author

haikoschol commented Nov 13, 2025

@lexnv I've made some more changes and think this PR is now in a pretty good state.

I had to change a few tests, but hopefully only their implementation, not their meaning.

If I haven't missed anything, all changes should now be limited to the webrtc-specific code.

I've tested the following with a litep2p listener based on this PR:

With a smoldot-ish dialer running in Chrome:

  • connection establishment
  • multistream-select negotiation on outbound substreams
  • multistream-select negotiation on inbound substreams
  • outbound ping
  • inbound ping

With a rust-libp2p dialer using webrtc.rs 0.13.0:

  • connection establishment
  • multistream-select negotiation on inbound substreams
  • perf behaviour

map_err(|_| error::NegotiationError::ParseError(ParseError::InvalidData))?;

let payload = tail[..len].to_vec();

Collaborator

nit: It would be good if we could break up the code so that we provide better trace logging:

let message = Message::decode(payload.into());

tracing::trace!(
    target: LOG_TARGET,
    ?message,
    "Decoded message while registering response",
);

let mut protocols = match message {
    ..
};

Let me know if this makes sense or if you think it will pollute the logs too much.

Author

I think this is fine, especially as a trace.

Comment on lines 398 to 399
let (len, tail) = unsigned_varint::decode::usize(remaining)
    .map_err(|_| error::NegotiationError::ParseError(ParseError::InvalidData))?;
Collaborator

Does this mean we are expecting multiple [ len_prefix ++ Message ] to arrive from the register_response of the WebRTC?

Who would normally exhibit this behavior (libp2p or smoldot)?

bytes.put_u8((proto.as_ref().len() + 1) as u8); // + 1 for \n
let _ = Message::Protocol(proto).encode(&mut bytes).unwrap();

let expected_message = bytes.freeze().to_vec();
Collaborator

dq: This slightly changes the meaning of the messages.

Before we were expecting:

[Protocols(/multistream/1.0.0), Protocol(/13371338/proto/1)]

Now we are expecting:

// \n added
[Header(/multistream/1.0.0 /\n),

// \n counted but not checked
Protocol(/13371338/proto/1 [])

Hmm, one of those representations can't be spec compliant, right?

Also, are we ignoring malformed messages?

  • we decode the frame size correctly
  • but we don't then check whether the last character is \n, or whether enough bytes were provided

Author

@haikoschol haikoschol Nov 13, 2025

Right now the test ensures that the wire format that WebRtcDialerState::propose() produces is what we expect. Before it actually ensured that whatever propose() produces can be decoded with Message::decode(). So I did change the semantics of the test after all. But I would argue that the new meaning is more useful. We could also add decoding and assert we get the expected strings back, but due to the changes made, this can no longer be done by just calling Message::decode().
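
For illustration, here is a minimal sketch of the wire format the test now pins down: each multistream-select message is an unsigned-varint length prefix (counting the trailing `\n`), the payload, and `\n`. The names `put_uvarint`, `put_message` and `encode_proposal` are hypothetical helpers written for this example, not litep2p's API, and the code is not what `WebRtcDialerState::propose()` actually does internally.

```rust
/// Multistream-select header line, without the trailing newline.
const MULTISTREAM_HEADER: &[u8] = b"/multistream/1.0.0";

/// Append `value` as an unsigned varint (7 payload bits per byte,
/// high bit set on every byte except the last).
fn put_uvarint(out: &mut Vec<u8>, mut value: usize) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Append one message: varint length (counting the `\n`), payload, `\n`.
fn put_message(out: &mut Vec<u8>, payload: &[u8]) {
    put_uvarint(out, payload.len() + 1); // + 1 for \n
    out.extend_from_slice(payload);
    out.push(b'\n');
}

/// Hypothetical stand-in for a dialer proposal: header, then one protocol.
fn encode_proposal(protocol: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    put_message(&mut out, MULTISTREAM_HEADER);
    put_message(&mut out, protocol);
    out
}
```

For names shorter than 127 bytes the varint prefix is a single byte, which is why a plain `put_u8(len + 1)` in the test produces the same bytes.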

Collaborator

@lexnv lexnv left a comment

Great job so far! Feels like we are getting closer to figuring out the multistream select 🙏

My questions revolve around the spec correctness of the implementation, or if we basically implemented a tiny subset in the past? And mainly, which implementation is causing us to extensively parse the frames (libp2p or smoldot or both?) 🤔

- add logging calls
- clean up imports
- add constant for Protocol::try_from(&b"/multistream/1.0.0"[..])
@haikoschol
Author

And mainly, which implementation is causing us to extensively parse the frames (libp2p or smoldot or both?) 🤔

The litep2p implementation is causing us to extensively parse the frames. :D This is because we always deal with the entire content of a WebRTC protobuf frame, which may contain one or more multistream messages.

In rust-libp2p the WebRTC implementation uses MessageIO and LengthDelimited in a layer below the multistream-select logic. The multistream-select logic gets fed from a stream that yields one multistream_select::protocol::Message at a time, even if the header and the selected protocol are in the same WebRTC protobuf frame.
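
As a rough sketch of what "parsing the entire frame content" involves, the function below splits one buffer into its length-prefixed multistream messages. It is an assumption-laden illustration, not the code in this PR: `FrameError`, `read_uvarint` and `split_messages` are hypothetical names, the varint reader is hand-rolled instead of using the `unsigned_varint` crate, and prefixes longer than four bytes are simply rejected.

```rust
#[derive(Debug, PartialEq)]
enum FrameError {
    /// The varint length prefix is missing, incomplete, or longer than 4 bytes.
    InvalidLength,
    /// The prefix announces more bytes than the buffer still holds.
    Truncated,
    /// The message does not end with `\n`.
    MissingNewline,
}

/// Decode an unsigned varint, returning the value and the remaining bytes.
fn read_uvarint(buf: &[u8]) -> Result<(usize, &[u8]), FrameError> {
    let mut value: usize = 0;
    // Only the first four bytes are considered in this sketch.
    for (i, &byte) in buf.iter().enumerate().take(4) {
        value |= usize::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok((value, &buf[i + 1..]));
        }
    }
    Err(FrameError::InvalidLength)
}

/// Split a buffer into message payloads with the trailing `\n` stripped.
/// Truncated or unterminated messages are reported as errors.
fn split_messages(mut frame: &[u8]) -> Result<Vec<&[u8]>, FrameError> {
    let mut messages = Vec::new();
    while !frame.is_empty() {
        let (len, tail) = read_uvarint(frame)?;
        if len > tail.len() {
            return Err(FrameError::Truncated);
        }
        let (message, rest) = tail.split_at(len);
        match message.split_last() {
            Some((&b'\n', payload)) => messages.push(payload),
            _ => return Err(FrameError::MissingNewline),
        }
        frame = rest;
    }
    Ok(messages)
}
```

Fed a buffer holding the header and the selected protocol back to back, this returns both payloads in order, which is the case a stream-based reader like the one in rust-libp2p never sees as a single unit.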

I hope this makes some sense. I'll go into detail in our call today.

})?;

if len > tail.len() {
break;
Collaborator

nit: I think this would miss a few protocols, maybe return an error and add a debug log?

Author

@haikoschol haikoschol Nov 19, 2025

I just tried this and it caused the test header_line_missing() to fail. This is because that test used Message::Protocols to encode two protocol names (without a header) to use as a negotiation response. This isn't spec compliant in two ways that are irrelevant to what is being tested, so I changed the test to send one protocol name.
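
To make the expectation behind a test like header_line_missing() concrete, here is a hedged sketch of a dialer-side check on already-split messages: the first message must be the multistream header, and the second is either the proposed protocol or `na`. `Outcome`, `ResponseError` and `check_response` are hypothetical names for this example and do not mirror litep2p's types.

```rust
const HEADER: &[u8] = b"/multistream/1.0.0";
const NA: &[u8] = b"na";

#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    /// The listener echoed the proposed protocol.
    Accepted(&'a [u8]),
    /// The listener answered `na`.
    Rejected,
}

#[derive(Debug, PartialEq)]
enum ResponseError {
    MissingHeader,
    MissingProtocol,
    UnexpectedProtocol,
}

/// Check a listener response given as message payloads without `\n`.
fn check_response<'a>(
    messages: &[&'a [u8]],
    proposed: &[u8],
) -> Result<Outcome<'a>, ResponseError> {
    let (first, rest) = messages
        .split_first()
        .ok_or(ResponseError::MissingHeader)?;
    if *first != HEADER {
        return Err(ResponseError::MissingHeader);
    }
    match rest.first() {
        // A real implementation might wait for more data here instead.
        None => Err(ResponseError::MissingProtocol),
        Some(&msg) if msg == NA => Ok(Outcome::Rejected),
        Some(&msg) if msg == proposed => Ok(Outcome::Accepted(msg)),
        Some(_) => Err(ResponseError::UnexpectedProtocol),
    }
}
```

A response consisting of a single protocol name with no header fails with `MissingHeader`, which is the property the simplified test still exercises.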

@lexnv lexnv requested a review from dmitry-markin November 18, 2025 12:50
Collaborator

@lexnv lexnv left a comment

The logic looks good to me!

Great job on getting str0m updated and figuring out the multistream select implementation 🙏

Development

Successfully merging this pull request may close these issues.

webrtc: multistream-select negotiation not fully spec compliant
