Conversation
ynezz
left a comment
There was a problem hiding this comment.
Overall LGTM, but the commits needs to be reworked and cleaned up.
Its hard to review currently, because some of the Git commits are usually mixing several topics, the changes are not described or the description is not sufficient enough, thus making the proper review time consuming, impossible.
Additionally it looks like some changes like aggressive_all in 85f87fb are later reworked in 07a7af9 to aggressiveness so making the number of commits/changes unnecessarily bigger, could be squashed etc.
@ynezz Many thanks for the detailed review. I worked mostly on my own in my projects so I am happy with everything that could be done better with Git. :-) Would it fine if the single commits could not be compiled as there are some dependencies in between for a better review? |
There is https://openwrt.org/submitting-patches#in-depth_process_of_preparing_and_submitting_code_to_openwrt basically based on https://www.kernel.org/doc/html/latest/process/submitting-patches.html, so once you start following this, you should be good for most, if not all, open source projects contributions.
Usually the Git commit itself should stand on its own, tackle single topic/logical change only, so it should compile. For some reason its not stated explicitly in the OpenWrt guide, but in the kernel's there is:
|
7665172 to
56f9686
Compare
|
@ynezz : Please take a look. I rebased the commits and separated it by function and also as it can "live" on its own. |
|
I'm really looking forward to trying out usteer this year (for the first time). You don't have to apply the following changes, I suggested. It's up to you. First commit message Second commit message Third commit (overall fine, I adjusted some things slightly) Fourth commit |
|
@ynezz Is there still something that prevents this PR from getting merged? If not, can you do that, or do we need to ping someone? |
KanjiMonster
left a comment
There was a problem hiding this comment.
Can't really say if code itself is good, but a few nitpicks.
Also AFAIU roaming is something the STA does and not the AP, so this isn't about roaming more aggressive but steering more aggressive, right? The naming and messages should be updated.
Also some of the lines are far too long and should be broken up.
| } | ||
| blobmsg_add_u8(&b, "abridged", abridged); | ||
| blobmsg_add_u32(&b, "validity_period", validity_period); | ||
| blobmsg_add_u32(&b, "mbo_reason", 5); |
There was a problem hiding this comment.
This definitely needs to be a #define so it is obvious what this means and where it comes from.
There was a problem hiding this comment.
Please clarify this? This is an additional roaming information for the AP which is not necessary to have it optionally.
There was a problem hiding this comment.
I mean something that explains to someone looking at the code what 5 means without having to look up the standard. E.g.
#define MBO_REASON_I_DONT_LIKE_YOU 5
at an appropriate place and then use that define here. Something like hostapd does.
There was a problem hiding this comment.
choosing "I don't like you" could be misinterpreted by NilsRo and taken personally. Next time maybe choose something neutral instead for an example ;)
There was a problem hiding this comment.
Definitely wasn't intended that way ;-)
There was a problem hiding this comment.
Got it.
I would leave it like it is as it should be enhanced later. It is documented in commit message. The reason code is "better AP found" but there are also other codes like AP overload, load balancing,... which brings more attention of the STA. But I have to made some deeper changes to determine the reason of the steering so I like to do it if this version is merged. That is the reason also it is added as static line.
There was a problem hiding this comment.
Commit messages should not be required to understand what code does. Please do add a define for it, or at least add a comment explaining what that value means and where it comes from (everywhere where you used it).
| config.remote_update_interval = 1000; | ||
| config.initial_connect_delay = 0; | ||
| config.remote_node_timeout = 10; | ||
| config.aggressiveness = 3; |
There was a problem hiding this comment.
The default should be (IMHO) whatever is the current behavior. The new behavior should be opt-in, not opt-out.
Also maybe use an enum or defines for the different levels so its obvious what they mean?
There was a problem hiding this comment.
In my opinion it would be better to switch to a new default to have a better roaming for everybody. According the test results in the forum roaming behave much better with the new default which is aligned to DAWNs implementation. Passive roaming is too friendly for most devices.
Anyway, I can also see a benefit to stay at the old default that nothing is changed after an upgrade but the default should be the best behavior for most cases... So open for a discussion what would be the best for the start...could be also changed later if we stay on the old behavior for the beginning.
There was a problem hiding this comment.
I agree that it makes more sense to actually activate the new feature instead of leaving it deactivated. Usteer is not a default package anyway and people installing it expect it to actually make clients roam. This PR adds a new feature to improve roaming for certain clients like Intel wifi cards.
That sounds like a sensible new default to me. Leaving it deactivated leads to worse experience overall which this pr is trying to improve upon by implementing the required standard/messages for said clients.
If we don't activate a useful new part of this package then certain people won't know about it and never activate it since it's only opt-in.
Yes, it's a new kind of default and it's good to bring up that this might change how usteer works/handles certain situations (good feedback by @KanjiMonster).
But knowing how bad certain clients handle roaming on their own, I pledge for the new default.
Looking forward to hear how other people feel about this :)
There was a problem hiding this comment.
I totally agree with @Djfe. This should come with new default., the release could have a breaking change note with some settings attached to be used by the people that want to keep the old behavior.
|
|
||
| if (si->bss_transition) { | ||
| si->roam_transition_request_validity_end = current_time + 10000; | ||
| validity_period = 10000 / usteer_local_node_get_beacon_interval(ln); /* ~ 10 seconds */ |
There was a problem hiding this comment.
Is that comment supposed to be at the line above? And where do the 10 seconds come from? They are not explained anywhere.
There was a problem hiding this comment.
No, its for the line. The validity period is calculated by beacon interval so is not exactly 10s.
There was a problem hiding this comment.
Then what does the the 10000 mean? Because if you what you get when you divide it by a number (that is presumably > 1) is 10 seconds, 10000 is something larger than 10 seconds, right?
There was a problem hiding this comment.
10000ms have to be split up by beacons (e.g. every 150ms). The interval could be set in hostap config. So if the result of the division has to be rounded the beacon interval count does not exactly reflecting 10s anymore.
There was a problem hiding this comment.
But that means that validity_period is the number of beacons sent out during 10 seconds, right?
Also it still does not explain where the 10 seconds value come from. How was this chosen? Should this be configurable?
There was a problem hiding this comment.
Yes and good question.
This is the common standard and also fixed value within e.g. DAWN. Its related to disassociation imminent timer, that if the disassociation is planned within 30s every 10s a new BSS_TM_REQ is send like a timer and to update the neighbor list if the STA moved to another place. Could make sense for experts (it is available in Cisco Controllers) to change it but not relevant for common usage in my opinion. Not everything should be configurable without need and testing. This can be done in another iteration but with this PR I do not plan to add more functionality.
| si->kick_time = current_time + config.roam_kick_delay; | ||
| usteer_roam_set_state(si, ROAM_TRIGGER_IDLE, &ev); | ||
| if (!candidate->node->rrm_nr) | ||
| MSG(VERBOSE, "Candidates node rrm nr not returned from hostapd. Neighbor list empty!"); |
There was a problem hiding this comment.
What's the meaning of this check / message? Is the user supposed to do something about it?
There was a problem hiding this comment.
This is an bug in hostap daemon that it sometimes does not return the rrm numbers which breaks supported roaming. Log messages describes the the error and not the solution as this could be different in many cases even if it are not caused by the software itself. Should be documented in OpenWrt Wiki if published but anyhow the solution must be searched in hostap. Mostly fixed with a restart of the deamon or sometimes on its own.
There was a problem hiding this comment.
If it's a bug that breaks roaming does it mean it prevents STAs from connecting to a different SSID?
If it breaks stuff, then it should be probably INFO and not VERBOSE, and the message should say something of that effect. As a user, I wouldn't have any idea what rrm nr means or if that is important, so it would just be noise and confusing.
Also if it breaks stuff, shouldn't usteer then try to handle that? It doesn't make sense continuing if this won't work, right?
There was a problem hiding this comment.
Also if it breaks stuff, shouldn't usteer then try to handle that? It doesn't make sense continuing if this won't work, right?
To me the bug @NilsRo found in hostapd sounds more like a follow up that can be tackled now that the behavior was noticed.
Sure, relying on something that might not work all the time can be a problem. But atleast this code change could help in finding the root cause. Or alert of an error, I agree with @KanjiMonster the message needs to be improved to be accessible by everyone.
@NilsRo Was this bug affecting usteer already before your PR and you just added a message so you could restart hostapd when it happened?
Where can the bug be reported, on this mailing list? https://w1.fi/
The rest could be better answered by NilsRo himself than me.
There was a problem hiding this comment.
It described here in detail. openwrt/openwrt#16875
It breaks "controlled" steering also before the PR for DAWN, Usteer, ... The wifi controller software could not get the identifiers anymore to send the target AP to the STA. So the bss transition request if more or less a "I_DONT_LIKE_YOU" but without any additional information. I added the log message to inform the user about it. Sometimes it fixes on its own but sometimes the hostap daemon has to be restarted or whole AP.
What do you suggest to have a better understanding of the message? As it happens from time to time and fixes itself it could be irritating so I put it to VERBOSE. But anyway if this is broken in general it is not visible without changing the logging level....
Some vendors calling it "optimized roaming" (Cisco) or also "roaming aggressiveness". I simply followed this direction but you or right. But its then an inaccuracy in general as all parameters for roaming are starting with roam even f there are AP related. But should be a change in config file so a good thing to do as it takes place for band steering and roaming. |
Add some new options and also changed disassociation_timer/_imminent call to have a better roaming experience. This is necessary for some Intel WiFi cards ignoring actual transition request frames due to expecting a combination of disassociation_timer and disassociation_imminent = true. According to posts on the OpenWrt forum this behavior isn't limited to just Intel WiFi cards. Affected stations respond with BSS-TM-RESP status=1 (error) and refuse to roam. - Add new aggr. levels combined with station specific configuration - Set disassociation imminent and timer as default option to have a better roaming by default - Use abridged flag to put priority on neighbors provided - Take care of correct disassociation_timer corresponding to kick_time - Let Disassociation Timer run a countdown before dissassociating the station if roam_kick_delay is longer than validity_period like defined in IEEE. Fix validity_period to 100 beacon intervals. - Add new logging messages (verbose and debug) to visualize the station behaviour Signed-off-by: Nils Hendrik Rottgardt <n.rottgardt@gmail.com>
Previous reason codes sent to stations were inaccurate. Respond with correct DEAUTH reason codes in order to give stations better control over their roaming decisions. - policy: correct DEAUTH reason is send to the station - ubus: add reason code to method usteer_ubus_kick_client - usteer: clearly define reason codes as enums Signed-off-by: Nils Hendrik Rottgardt <n.rottgardt@gmail.com>
Avoids ping-pong between bands by adding a dynamic signal threshold. If 5Ghz band is below roaming threshold (e.g. -70) and 2,4Ghz over band-steering threshold (e.g. -50) this adds a dynamic threshold depends on the signal level when station has connected. It adapts to worser signal slowly. This allows to reduce the check interval to lower values like 30s as new standard. - band_steering: added dynamic signal threshold - main: added config option band_steering_signal threshold which specifies the difference until a signal is defined as "better" and band steering is triggered - main: reduced interval to 30000ms to get in use of dynamic threshold more often. It is not neccessary to "wait" anymore to steer a station to a higher band. Signed-off-by: Nils Hendrik Rottgardt <n.rottgardt@gmail.com>
Multi Band Operations (MBO) provide stations with additional information about roaming requests. Reassociation delay avoids ping-pong between APs. By default a delay of 30s is requested from the station. MBO code 5 defines signal quality as a reason for roaming. This could be enhanced in the future by responding with other dynamically defined reasons to improve further how transitions are handled by stations. - ubus: add reason code to bss_transition request (actually static only) - ubus: add reassociation delay to let the client know how long it should stay at the new AP dereived from station block timeout. - main: add config option for reassociation delay with 30s as default Signed-off-by: Nils Hendrik Rottgardt <n.rottgardt@gmail.com>
|
Any chance of getting this merged? |
|
I hope it fixes all those code 7 times the STA does not respond and only finally when it gets a code 0 it does. |
It should if the aggressiveness is set higher. 7 is common if roaming request is ignored because it is friendly only. |
Hopefully before 2026. 🙄 BTW, usteer-ng works much better than (still) half-broken usteer. |
|
Anybody still maintaining this project and can merge this? Also the proposal in #2 appears relevant and interesting. |
This change is still included here and can be set in config with aggressiveness parameter. |
|
@Ramon00 : There are some new configurations available to handle new functionality but it is backward compatible. At best take a look here: https://github.com/openwrt/usteer/pull/10/files#diff-d9f0fa315a94c84c2dec48dd27f4184a89356693d724af7c3e7ea644b7ba1a83 If the PR is merged I will also add everything to the OpenWRT Wiki. If you have questions do not hesitate to contact me. |
|
@NilsRo Please consider bringing back usteer-ng as a separate package as this merging is going nowhere, it seems. |
|
+1 |
|
+1 |
|
I have tested usteer-ng and it is working great. It would be nice, if these changes where merged before next major release of OpenWRT. |
|
Since this PR is currently in limbo, I created a fork solely to build it with GitHub actions, and I also added the LuCI app. |
Please compile it so we could install it |
|
Hi, How do I change the aggressiveness value? I changed to 1 in config but the value remains 3 and some clients still being kicked out. But here is set it to 3 still |
|
@ondrani It is perhaps a bug fixed in ng fork - synced yesterday. Will add it to the pull later, uci integration was not fully working. |
|
Could you add this #3, if needed, to your PR? That way we could "fix" two in one. |
|
+1 |
Adds some central new functions: