-
Notifications
You must be signed in to change notification settings - Fork 294
xapi_ha: avoid raising Not_found when joining a liveset #6734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I thought I had make-checked it after rebasing on top of master. Some code changes used the changed functions, so I need to make further changes |
| List.iter | ||
| (fun sr -> | ||
| let vdi = Xha_statefile.find_or_create ~__context ~sr ~cluster_stack in | ||
| statefile_vdis := vdi :: !statefile_vdis | ||
| ) | ||
| srs ; | ||
| [sr] ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could drop this List.iter since the list only has a single element
| List.iter | ||
| (fun (_, exn) -> | ||
| (* Perform a disable since the pool HA state isn't consistent *) | ||
| error "Attempting to disable HA pool-wide" ; | ||
| Helpers.log_exn_continue | ||
| "Disabling HA after a failure during enable" disable_internal | ||
| __context ; | ||
| raise exn | ||
| ) | ||
| errors ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and above changed the code from raising the first exception to raising all of them one after another - I guess it'd be hard for anything to depend on this, so it should be alright (might increase noise in the logs though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we don't have resumable exceptions, only the first one is actually reported, or none at all
75ce642 to
52d1637
Compare
|
Fixed another bug because the cluster stack values are stringy, so I created a variant to model them. I would have like to use an enum at the API level, but that is very invasive and we know how problematic that can be |
| UNNECESSARY_LENGTH=$(local_grep "List.length.*=+\s*0") | ||
| UNNECESSARY_LENGTH=$((UNNECESSARY_LENGTH+$(local_grep "0\s*=+\s*List.length"))) | ||
| UNNECESSARY_LENGTH=$((UNNECESSARY_LENGTH+$(local_grep "List.length.*\s>\s*0"))) | ||
| UNNECESSARY_LENGTH=$((UNNECESSARY_LENGTH+$(local_grep "List.length.*\s<>\s*0"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest exchanging List.length.*\s>\s*0 and 0\s*<\s*List.length to bring similar logic code closer.
|
|
||
| let ok_or_raise map_error = function Ok v -> v | Error exn -> map_error exn | ||
|
|
||
| let master_address_exn __FUN e = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
master_address_exn, master_uuid_exn, and master_not_in_liveset_exn are similar, and only msg is different between them. Could they be combined into one function with an additional argument, e.g. type/kind, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That changes the current functions to have 3 lines instead of 4. It's not worth the loss of locality, IMO
| | [pbd] -> | ||
| pbd | ||
| | pbd :: pbds -> | ||
| (* shouldn't happen... delete all but first pbd to make db consistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be logged?
| (** Retrieves the value for the key, returns a value when it's found and is a | ||
| valid int, otherwise is [None] *) | ||
|
|
||
| val get_of_string : (string -> 'a option) -> string -> 'a option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads a bit complicated and difficult to grasp. I believe this could be phrased as a monad .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the best name I could muster. It's a get and then applying a user-provided function through bind. I really don't know how to name it in a succinct way.
The Not_found and hd exceptions keep popping up, and it's difficult to find them when there are no backtraces logged. Remove usages if them, even if they are not problematic so the actual problematic ones can be flushed out over time. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Until now, the valid string values were not written anywhere, change the situation. Ideally this would be done using an enum in the idl, but unfortunately this changes types of existing parameters in API calls, so it's quite risky. Instead have a conservative approach to only enumerate the valid values and make Constants the only source of truth for these values, including default ones, which were scatterred around previously. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Previously it was opened if the default stack was selected, but this could actually be different from XHAd Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Allows callers to avoid exceptions, the previous get is now called get_exn so it's clear which users have to be have to be mindful of exceptions. Not all get_exn were converted to get, previous behaviour was widespread, and doing the change without changing behaviour is not trivial, better to do it only once its detected. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Joining the liveset has been found to raise `Not_found` in some cases. Transform these exceptions to others that are more readable and show the exact cause. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
When joining a liveset, Not_found errors could be raised, but the users did not have information on what was that it could not be found. Add which IP address and which UUID could not be found. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
An xcp-ng user reported a failure when enabling HA, with the only error being a Not_found, with this change we now know that it's because the IP of the coordinator is not present in the local database's
ha_peersvalue.While doing this: