-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Separate channel creation and state probe, and error handling thereof. #2465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Separate channel creation and state probe, and error handling thereof. #2465
Conversation
51ee8a7 to
ab8eb2d
Compare
eca5d2b to
ab8eb2d
Compare
ab8eb2d to
aa24f63
Compare
grpc/src/client/channel.rs
Outdated
| if let Some(s) = ac.connectivity_state.cur() { | ||
| return s; | ||
| let state = self.inner.state(connect); | ||
| match state { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we want the two state() methods to have the same return value so this could simply delegate?
grpc/src/client/channel.rs
Outdated
| let mut s = self | ||
| .active_channel | ||
| .lock() | ||
| .map_err(|_| "Could not get channel lock.".to_string())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can stay as unwrap(). IIUC locks can only fail in pretty spectacular ways (someone holding the lock panicked). In that case we might as well panic too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on this, but ultimately decided to leave it as unwrap() on the theory that we really do want lock poisoning to cause an all-stop, and fix whatever causes the poisoning rather than allow apps to recover from it. I'm only vaguely uncomfortable with leaving the unwrap in there.
grpc/src/client/channel.rs
Outdated
| s.as_ref() | ||
| .cloned() | ||
| .ok_or_else(|| "Could not clone channel".into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By here you know s is Some, so you can do s.unwrap().clone() I think.
Or if you do a if let Some(ac) = s else {} instead of if s.is_none() then it will already be unwrapped.
grpc/src/client/channel.rs
Outdated
| &self, | ||
| connect: bool, | ||
| ) -> Result<Arc<ActiveChannel>, Box<dyn std::error::Error + Sync + Send>> { | ||
| let mut s = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe name this ac or active_channel? I'm not sure what s represents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
grpc/src/client/channel.rs
Outdated
| self.runtime.clone(), | ||
| )); | ||
| } else { | ||
| return Err("No active channel.".into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can probably return just an Option?
Also maybe we shouldn't combine these two functionalities into one function. They're a little disjoint.
- Anyone asking for the active channel itself will always want connect=true.
- Only when requesting the current state, exiting idle mode is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without needing to be clear that the error is a lock poisoning, the option makes the most sense, so I changed it to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also removed the bool flag after going back and forth with various options. After being satisfied that I could arrange the deck chairs any which way settled on the current version. LMK what you think!
…ake the decision to avoid this only in state().
603cd75 to
a1cca14
Compare
a1cca14 to
861d86f
Compare
dfawley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely +100 on the motivation behind this change -- the ActiveChannel should be the thing fiddling with its own state, not the PersistentChannel. A few minor nits, but otherwise LGTM, thanks.
| // Channels begin idle so new is a simple constructor. Required parameters | ||
| // are not in ChannelOptions. | ||
| // Channels begin idle so new does not automatically connect. | ||
| // ChannelOptions are only non-required parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
| // ChannelOptions are only non-required parameters. | |
| // ChannelOptions contains only optional parameters. |
| let active_channel = if connect { | ||
| self.get_active_channel() | ||
| } else { | ||
| match self.locked_active_channel().as_ref().cloned() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as_ref().cloned() -> clone() ? (And below)
| // Internal use only to get the locked active channel. If this panics it means the lock is | ||
| // poisoned and we should also panic. | ||
| fn locked_active_channel(&self) -> std::sync::MutexGuard<'_, Option<Arc<ActiveChannel>>> { | ||
| self.active_channel.lock().unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure having a method for this one liner is helpful, and in fact makes the call sites more opaque which is probably a bad thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I thought it was worth it was to try and put the unsafe unwrap in one place. But I'll revert it, since I think it's probably not that big a deal.
| /// Returns the current state of the channel. Any errors translate into a | ||
| /// TransientFailure state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the "translate" comment is about. Does that mean there could be an error while simply retrieving the state? I don't think this is possible or appropriate...this method should just tell me the current state of the channel, period.
Motivation
Currently the
Channelobject is calling into thePersistentChannelinternals to do things like provision an active channel and check state. The division of where work and error handling is managed becomes muddied in this situation, which complicates handling errors in ways that aren'tunwrap.This was also motivated by work to implement the always-failing PersistentChannel PR, but it seemed like it could be broken off into it's own PR.
Solution
Moving the actual mechanics into the
PersistentChannelobject, which owns channel config, and allowsChannelto be a lightweight handles for users to interact with. Specifically state probing and channel creation are now inPersistentChannel.PersistentChannelcalls are wrapped inResultobjects that are translated at theChannellevel to the appropriate public-facing API objects.get_or_create_active_channel()just becomesget_active_channel()and passes the 'create' flag, which controls whether a new channel is created if it does not already exist.The external semantics should stay the same as they have been.
This does not yet address the error that can surface in
call(), because it's not yet clear what the error reporting story is.