-
Notifications
You must be signed in to change notification settings - Fork 93
[scd] Change the subcription locking mechanism to a simple, shared lock. #1363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| DROP TABLE IF EXISTS scd_locks; | ||
|
|
||
| UPDATE schema_versions | ||
| SET schema_version = 'v3.2.0' | ||
| WHERE onerow_enforcer = TRUE; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| CREATE TABLE IF NOT EXISTS scd_locks ( | ||
| key INT64 PRIMARY KEY | ||
| ); | ||
|
|
||
| INSERT INTO scd_locks (key) VALUES (0); | ||
|
|
||
| UPDATE schema_versions | ||
| SET schema_version = 'v3.3.0' | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering whether this should be a major version bump: an older schema with a newer image will appear to work, however the locking won't be effective on the DSS instances with the older images. Do I understand correctly that this will have an impact only on performances? Or will it have an impact on the correctness of the results? If the former: probably not needed to major version bump. But not if the latter.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It won't work if the option is used, but only in that case, so I'm not sure about the answer. I would say that as it's only an experimental one, that safe to keep it as minor, as it shouldn't be required for normal usage. |
||
| WHERE onerow_enforcer = TRUE; | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| DROP TABLE IF EXISTS scd_locks; | ||
|
|
||
| UPDATE schema_versions set schema_version = 'v1.0.0' WHERE onerow_enforcer = TRUE; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| CREATE TABLE IF NOT EXISTS scd_locks ( | ||
the-glu marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| key BIGINT PRIMARY KEY | ||
| ); | ||
|
|
||
| INSERT INTO scd_locks (key) VALUES (0); | ||
|
|
||
| UPDATE schema_versions set schema_version = 'v1.1.0' WHERE onerow_enforcer = TRUE; | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| locals { | ||
| rid_db_schema = var.desired_rid_db_version == "latest" ? (var.datastore_type == "cockroachdb" ? "4.0.0" : "1.0.1") : var.desired_rid_db_version | ||
| scd_db_schema = var.desired_scd_db_version == "latest" ? (var.datastore_type == "cockroachdb" ? "3.2.0" : "1.0.1") : var.desired_scd_db_version | ||
| scd_db_schema = var.desired_scd_db_version == "latest" ? (var.datastore_type == "cockroachdb" ? "3.3.0" : "1.1.0") : var.desired_scd_db_version | ||
| aux_db_schema = var.desired_aux_db_version == "latest" ? "1.1.0" : var.desired_aux_db_version | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Performances | ||
|
|
||
| ## The SCD global lock option | ||
|
|
||
| !!! danger | ||
| All DSS instances in a DSS pool must use the same value for this option. Mixing will result in dramatically lower performance. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO there isn't enough safeguards or mitigation that this does not happen. At least IMO we want some visibility, e.g. expose this configuration flag on the |
||
|
|
||
| It has been reported in issue [#1311](https://github.com/interuss/dss/issues/1311) that creating a lot of overlapping operational intents may increase the datastore load in a way that creates timeouts. | ||
|
|
||
| By default, the code will try to lock on required subscriptions when working on operational intents, and having too many of them may lead to issues. | ||
|
|
||
| A solution to that is to switch to a global lock, that is just globally locking operational intents operations, regardless of subscriptions. | ||
|
|
||
| This will result in lower general throughput for operational intents that don't overlap, as only one of them can be processed at a time, but better performance in the issue's case as lock acquisition is simpler. | ||
|
|
||
| You should enable this option depending on your DSS usage/use case and what you want to maximize: | ||
| * If you have non-overlapping traffic and maximum global throughput, don't enable this flag | ||
| * If you have overlapping traffic and don't need high global throughput, enable this flag | ||
|
|
||
| The following graphs show example throughput without (on the left) and with the flag (on the right). This has been run on a local machine; on a real deployment you can expect lower performance (due to various latency), but similar relative numbers. | ||
|
|
||
| All graphs have been generated with the [loadtest present in the monitoring repository](https://github.com/interuss/monitoring/blob/main/monitoring/loadtest/README.md) using `SCD.py`. | ||
|
|
||
|  | ||
| *Overlapping requests. Notice the huge spikes on the left, as the datastore struggles to acquire locks.* | ||
|
|
||
|  | ||
| *Non-overlapping requests. Notice the reduction of performance on the right, with a single lock.* | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a column specifying the instance that created the lock? That might facilitate debugging if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, as we just do a select for update. We could add a column and update it, but that would mean more 'work' in the transaction (we now have to commit something).
And as it would be in the transaction, we would only see the value when it's committed (so too late).