Skip to content

Conversation

@nathanielford
Copy link
Collaborator

Motivation

Currently the Channel object is calling into the PersistentChannel internals to do things like provision an active channel and check state. The division of where work and error handling is managed becomes muddied in this situation, which complicates handling errors in ways that aren't unwrap.

This was also motivated by work to implement the always-failing PersistentChannel PR, but it seemed like it could be broken off into it's own PR.

Solution

Moving the actual mechanics into the PersistentChannel object, which owns channel config, and allows Channel to be a lightweight handles for users to interact with. Specifically state probing and channel creation are now in PersistentChannel.

PersistentChannel calls are wrapped in Result objects that are translated at the Channel level to the appropriate public-facing API objects. get_or_create_active_channel() just becomes get_active_channel() and passes the 'create' flag, which controls whether a new channel is created if it does not already exist.

The external semantics should stay the same as they have been.

This does not yet address the error that can surface in call(), because it's not yet clear what the error reporting story is.

@nathanielford nathanielford force-pushed the refactor/PersistentChannel branch from 51ee8a7 to ab8eb2d Compare December 15, 2025 22:46
@nathanielford nathanielford marked this pull request as ready for review December 16, 2025 00:17
@nathanielford nathanielford force-pushed the refactor/PersistentChannel branch from eca5d2b to ab8eb2d Compare December 16, 2025 19:54
@nathanielford nathanielford force-pushed the refactor/PersistentChannel branch from ab8eb2d to aa24f63 Compare December 16, 2025 19:55
@nathanielford nathanielford self-assigned this Dec 16, 2025
if let Some(s) = ac.connectivity_state.cur() {
return s;
let state = self.inner.state(connect);
match state {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we want the two state() methods to have the same return value so this could simply delegate?

let mut s = self
.active_channel
.lock()
.map_err(|_| "Could not get channel lock.".to_string())?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can stay as unwrap(). IIUC locks can only fail in pretty spectacular ways (someone holding the lock panicked). In that case we might as well panic too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this, but ultimately decided to leave it as unwrap() on the theory that we really do want lock poisoning to cause an all-stop, and fix whatever causes the poisoning rather than allow apps to recover from it. I'm only vaguely uncomfortable with leaving the unwrap in there.

Comment on lines 247 to 249
s.as_ref()
.cloned()
.ok_or_else(|| "Could not clone channel".into())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By here you know s is Some, so you can do s.unwrap().clone() I think.

Or if you do a if let Some(ac) = s else {} instead of if s.is_none() then it will already be unwrapped.

&self,
connect: bool,
) -> Result<Arc<ActiveChannel>, Box<dyn std::error::Error + Sync + Send>> {
let mut s = self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name this ac or active_channel? I'm not sure what s represents.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

self.runtime.clone(),
));
} else {
return Err("No active channel.".into());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can probably return just an Option?

Also maybe we shouldn't combine these two functionalities into one function. They're a little disjoint.

  • Anyone asking for the active channel itself will always want connect=true.
  • Only when requesting the current state, exiting idle mode is optional.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without needing to be clear that the error is a lock poisoning, the option makes the most sense, so I changed it to this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also removed the bool flag after going back and forth with various options. After being satisfied that I could arrange the deck chairs any which way settled on the current version. LMK what you think!

@nathanielford nathanielford force-pushed the refactor/PersistentChannel branch from 603cd75 to a1cca14 Compare January 8, 2026 23:00
@nathanielford nathanielford force-pushed the refactor/PersistentChannel branch from a1cca14 to 861d86f Compare January 8, 2026 23:04
Copy link
Collaborator

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely +100 on the motivation behind this change -- the ActiveChannel should be the thing fiddling with its own state, not the PersistentChannel. A few minor nits, but otherwise LGTM, thanks.

// Channels begin idle so new is a simple constructor. Required parameters
// are not in ChannelOptions.
// Channels begin idle so new does not automatically connect.
// ChannelOptions are only non-required parameters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
// ChannelOptions are only non-required parameters.
// ChannelOptions contains only optional parameters.

let active_channel = if connect {
self.get_active_channel()
} else {
match self.locked_active_channel().as_ref().cloned() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_ref().cloned() -> clone() ? (And below)

// Internal use only to get the locked active channel. If this panics it means the lock is
// poisoned and we should also panic.
fn locked_active_channel(&self) -> std::sync::MutexGuard<'_, Option<Arc<ActiveChannel>>> {
self.active_channel.lock().unwrap()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure having a method for this one liner is helpful, and in fact makes the call sites more opaque which is probably a bad thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I thought it was worth it was to try and put the unsafe unwrap in one place. But I'll revert it, since I think it's probably not that big a deal.

Comment on lines +157 to +158
/// Returns the current state of the channel. Any errors translate into a
/// TransientFailure state.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the "translate" comment is about. Does that mean there could be an error while simply retrieving the state? I don't think this is possible or appropriate...this method should just tell me the current state of the channel, period.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants