Commit cc06340

ConversationStore: Implement migration to postgres (#4810)
* ConversationStore.Migration: Implement function to migrate a conversation
* ConversationStore.Cassandra: Add hybrid interpreter, which will work during migration
  Exceptions:
  1. Getting a paginated list of qualified conv Ids
  2. It will not work well if the migration fails to delete conv data from Cassandra after copying it to postgres
  These problems will be solved in following commits.
* ConversationStore.Cassandra: Fix edge case bug in getLocalConvIds
  When fetching `maxIds + 1` convs, it can happen that a user has exactly that many convs left. In this case the `hasMore` field of the page would be false, but we'd be sending a truncated list of convs.
* ConversationStore: Make GetConversationIds work during migration
  This is done by making the pagingState encode the last conversation Id served. The store effect has a new action to list only remote conv ids. The `GetConversationIds` action has been removed and implemented generally using `GetLocalConversationIds` and `GetRemoteConversationIds`. This makes the `MultiTablePage` type obsolete for conv ids, but it's still kept around so we don't break any APIs.
* ConversationStore.Cassandra: Save users joining their first remote conv in postgres
  This is consistent with creating new conversations in postgres. This way, when the migration is complete, already running galley instances won't create more data in Cassandra.
* ConversationStore.Cassandra: Use `embedClient`
* ConversationStore.Migration: Implement function to migrate the remote statuses of a user
* ConversationStore.Migration: Add top level functions to do the migration
* Conversation.{Migration,Cassandra}: Ensure pending deletes from Cassandra are never read
  Exception: Listing local conversation ids and listing team conversation ids
* ConversationStore.Cassandra: Document limitation in listing conv ids
* galley: Allow choosing migration interpreter for ConversationStore
* background-worker: Integrate with ConvSubsystem to allow migrating convs to postgres
* integration: Set cassandra keyspace and pg db names for dyn background worker
* integration: Add a test to test conversation migration
* background-worker: Add some observability to pg migration
* integration: Add another test with more convos, but use proteus for speed
* ConversationStore.{Cassandra,Postgres}: Use same ordering of UUIDs
  Cassandra takes the UUID version into account when ordering. Postgres doesn't, so the Postgres query has to emulate Cassandra's ordering.
* wire-subsystems: Move PGConstraints to Wire.Postgres
* ConversationStore.Cassandra: Implement SearchConversation for the migration interpreter
  It returns an empty result.
* galley-integration: Change assertions about pagination
  Sometimes we don't have to get an empty page to realize we're at the end.
* integration-setup: More CPU/memory for postgresql
  Also prevent CPU throttling and OOM kills.
* charts/integration: Mount background-worker secrets
* integration: Set cassandra keyspace for dynamic cannons
* integration-setup: Allow tests to run for 5 more mins
* galley-integration: Produce better error message
* docs: Add steps for migrations
* galley-integration: Relax the requirement that getConvs returns convs in same order as IDs
1 parent c1480bf commit cc06340

49 files changed (+2187 -251 lines changed)

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+Starting with this release, existing deployments can migrate the conversation data to
+PostgreSQL from Cassandra. This is necessary for channel search and management
+of channels from the team-management UI. It is highly recommended to take a
+backup of the Galley Cassandra before triggering the migration.
+
+The migration needs to happen in 3 steps:
+
+1. Prepare wire-server for migration.
+
+   This step makes sure that wire-server keeps working as expected during the
+   migration. To do this, deploy wire-server with this config change:
+
+   ```yaml
+   galley:
+     config:
+       postgresqlMigration:
+         conversation: migrate-to-postgresql
+   ```
+
+   This change should restart all the galley pods; any new conversations will
+   now be written to PostgreSQL.
+
+2. Trigger the migration and wait.
+
+   This step carries out the actual migration. To do this, deploy
+   wire-server with this config change:
+
+   ```yaml
+   background-worker:
+     config:
+       migrateConversations: true
+   ```
+
+   This change should restart the background-worker pods. It is recommended to
+   watch the logs and wait for both of these metrics to report `1.0`:
+   `wire_local_convs_migration_finished` and `wire_user_remote_convs_migration_finished`.
+   This can take a long time depending on the number of conversations in the DB.
+
+3. Configure wire-server to only use PostgreSQL for conversations.
+
+   This is the configuration that must be used from now on for every new
+   release.
+
+   ```yaml
+   galley:
+     config:
+       postgresqlMigration:
+         conversation: postgresql
+   background-worker:
+     config:
+       migrateConversations: false
+   ```
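
Each step above is written as a change on top of the previous one. As a minimal sketch, assuming all overrides live in a single values file for the wire-server umbrella chart (the single-file layout is an assumption; the keys come from the steps), the overrides in effect while step 2 runs would look like this:

```yaml
# Overrides in effect during step 2 (sketch; a single values file is assumed).
galley:
  config:
    postgresqlMigration:
      # Set in step 1 and kept while the migration runs.
      conversation: migrate-to-postgresql
background-worker:
  config:
    # Step 2: start migrating conversations to PostgreSQL.
    migrateConversations: true
```

Keeping galley on `migrate-to-postgresql` during step 2 matters: the background-worker chart's `values.yaml` in this commit notes that galley must be configured this way before `migrateConversations` is set to `true`.
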
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Support migration of all conversation data to PostgreSQL.

charts/background-worker/templates/_helpers.tpl

Lines changed: 16 additions & 0 deletions
@@ -23,3 +23,19 @@ created one (in case the CA is provided as PEM string.)
 {{- dict "name" "background-worker-cassandra" "key" "ca.pem" | toYaml -}}
 {{- end -}}
 {{- end -}}
+
+{{- define "useCassandraTLSGalley" -}}
+{{ or (hasKey .cassandraGalley "tlsCa") (hasKey .cassandraGalley "tlsCaSecretRef") }}
+{{- end -}}
+
+{{/* Return a Dict of TLS CA secret name and key
+This is used to switch between provided secret (e.g. by cert-manager) and
+created one (in case the CA is provided as PEM string.)
+*/}}
+{{- define "tlsSecretRefGalley" -}}
+{{- if .cassandraGalley.tlsCaSecretRef -}}
+{{ .cassandraGalley.tlsCaSecretRef | toYaml }}
+{{- else }}
+{{- dict "name" "background-worker-cassandra-galley" "key" "ca.pem" | toYaml -}}
+{{- end -}}
+{{- end -}}
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+{{/* Secret for the provided Cassandra TLS CA. */}}
+{{- if not (empty .Values.config.cassandraGalley.tlsCa) }}
+apiVersion: v1
+kind: Secret
+metadata:
+name: background-worker-cassandra-galley
+labels:
+app: background-worker
+chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
+release: "{{ .Release.Name }}"
+heritage: "{{ .Release.Service }}"
+type: Opaque
+data:
+ca.pem: {{ .Values.config.cassandraGalley.tlsCa | b64enc | quote }}
+{{- end }}
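
The `tlsSecretRefGalley` helper and the Secret template above allow two ways of supplying the CA for the Galley Cassandra connection: as an inline PEM string in `config.cassandraGalley.tlsCa`, from which the chart creates the `background-worker-cassandra-galley` Secret with key `ca.pem`, or as a reference to an existing Secret in `config.cassandraGalley.tlsCaSecretRef` with `name` and `key`. A sketch of both, written as two alternative values files (separated by `---` only for comparison); the host, Secret name, key, and certificate body are placeholders:

```yaml
# Alternative A (sketch): inline PEM; the chart renders the
# "background-worker-cassandra-galley" Secret with key "ca.pem".
background-worker:
  config:
    cassandraGalley:
      host: cassandra-galley.example.internal  # placeholder host
      tlsCa: |
        -----BEGIN CERTIFICATE-----
        (placeholder PEM body)
        -----END CERTIFICATE-----
---
# Alternative B (sketch): reference an existing Secret, e.g. one issued by cert-manager.
background-worker:
  config:
    cassandraGalley:
      host: cassandra-galley.example.internal  # placeholder host
      tlsCaSecretRef:
        name: galley-cassandra-ca  # hypothetical Secret name
        key: ca.crt                # hypothetical key inside that Secret
```

In both cases `useCassandraTLSGalley` evaluates to `true`, so the deployment mounts the Secret at `/etc/wire/background-worker/cassandra-galley` and the configmap points `cassandraGalley.tlsCa` at the file named by `key` inside that directory.
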

charts/background-worker/templates/configmap.yaml

Lines changed: 20 additions & 0 deletions
@@ -30,6 +30,24 @@ data:
 tlsCa: /etc/wire/background-worker/cassandra/{{- (include "tlsSecretRef" . | fromYaml).key }}
 {{- end }}
 
+cassandraGalley:
+endpoint:
+host: {{ .cassandraGalley.host }}
+port: 9042
+keyspace: galley
+{{- if hasKey .cassandraGalley "filterNodesByDatacentre" }}
+filterNodesByDatacentre: {{ .cassandraGalley.filterNodesByDatacentre }}
+{{- end }}
+{{- if eq (include "useCassandraTLSGalley" .) "true" }}
+tlsCa: /etc/wire/background-worker/cassandra-galley/{{- (include "tlsSecretRefGalley" . | fromYaml).key }}
+{{- end }}
+
+postgresql: {{ toYaml .postgresql | nindent 6 }}
+postgresqlPool: {{ toYaml .postgresqlPool | nindent 6 }}
+{{- if hasKey $.Values.secrets "pgPassword" }}
+postgresqlPassword: /etc/wire/background-worker/secrets/pgPassword
+{{- end }}
+
 {{- with .rabbitmq }}
 rabbitmq:
 host: {{ .host }}
@@ -46,6 +64,8 @@ data:
 {{- end }}
 {{- end }}
 
+migrateConversations: {{ .migrateConversations }}
+
 backendNotificationPusher:
 {{toYaml .backendNotificationPusher | indent 6 }}
 {{- end }}
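
For orientation, with the chart defaults from `values.yaml` (`cassandraGalley.host: aws-cassandra`, the CI `postgresql` and `postgresqlPool` settings, `migrateConversations: false`), no Galley Cassandra TLS, and a `pgPassword` key present in `secrets`, the keys added by this template would render to roughly the fragment below. It is a hand-written approximation, not captured `helm template` output; the key order inside `postgresql` and `postgresqlPool` assumes `toYaml` sorts map keys.

```yaml
# Hand-written approximation of the new keys in the rendered
# background-worker config (chart defaults, secrets.pgPassword set).
cassandraGalley:
  endpoint:
    host: aws-cassandra
    port: 9042
  keyspace: galley
postgresql:
  dbname: wire-server
  host: postgresql
  port: "5432"
  user: wire-server
postgresqlPool:
  acquisitionTimeout: 10s
  agingTimeout: 1d
  idlenessTimeout: 10m
  size: 5
postgresqlPassword: /etc/wire/background-worker/secrets/pgPassword
migrateConversations: false
```

The `postgresqlPassword` path matches where the integration chart in this commit mounts the `background-worker` secret; without a `pgPassword` key in `secrets`, that line is omitted.
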

charts/background-worker/templates/deployment.yaml

Lines changed: 16 additions & 0 deletions
@@ -27,6 +27,7 @@ spec:
 checksum/configmap: {{ include (print .Template.BasePath "/configmap.yaml") . | sha256sum }}
 checksum/secret: {{ include (print .Template.BasePath "/secret.yaml") . | sha256sum }}
 checksum/cassandra-secret: {{ include (print .Template.BasePath "/cassandra-secret.yaml") . | sha256sum }}
+checksum/cassandra-galley-secret: {{ include (print .Template.BasePath "/cassandra-galley-secret.yaml") . | sha256sum }}
 fluentbit.io/parser: json
 spec:
 serviceAccount: null
@@ -44,11 +45,19 @@ spec:
 secret:
 secretName: {{ (include "tlsSecretRef" .Values.config | fromYaml).name }}
 {{- end }}
+{{- if eq (include "useCassandraTLSGalley" .Values.config) "true" }}
+- name: "background-worker-cassandra-galley"
+secret:
+secretName: {{ (include "tlsSecretRefGalley" .Values.config | fromYaml).name }}
+{{- end }}
 {{- if .Values.config.rabbitmq.tlsCaSecretRef }}
 - name: "rabbitmq-ca"
 secret:
 secretName: {{ .Values.config.rabbitmq.tlsCaSecretRef.name }}
 {{- end }}
+{{- if .Values.additionalVolumes }}
+{{ toYaml .Values.additionalVolumes | nindent 8 }}
+{{- end }}
 containers:
 - name: background-worker
 image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
@@ -64,10 +73,17 @@ spec:
 - name: "background-worker-cassandra"
 mountPath: "/etc/wire/background-worker/cassandra"
 {{- end }}
+{{- if eq (include "useCassandraTLSGalley" .Values.config) "true" }}
+- name: "background-worker-cassandra-galley"
+mountPath: "/etc/wire/background-worker/cassandra-galley"
+{{- end }}
 {{- if .Values.config.rabbitmq.tlsCaSecretRef }}
 - name: "rabbitmq-ca"
 mountPath: "/etc/wire/background-worker/rabbitmq-ca/"
 {{- end }}
+{{- if .Values.additionalVolumeMounts }}
+{{ toYaml .Values.additionalVolumeMounts | nindent 10 }}
+{{- end }}
 env:
 - name: RABBITMQ_USERNAME
 valueFrom:

charts/background-worker/templates/secret.yaml

Lines changed: 3 additions & 0 deletions
@@ -15,4 +15,7 @@ data:
 {{- with .Values.secrets }}
 rabbitmqUsername: {{ .rabbitmq.username | b64enc | quote }}
 rabbitmqPassword: {{ .rabbitmq.password | b64enc | quote }}
+{{- if .pgPassword }}
+pgPassword: {{ .pgPassword | b64enc | quote }}
+{{- end }}
 {{- end }}

charts/background-worker/values.yaml

Lines changed: 33 additions & 0 deletions
@@ -31,6 +31,39 @@ config:
   # key: <ca-attribute>
   cassandra:
     host: aws-cassandra
+  cassandraGalley:
+    host: aws-cassandra
+
+  # Postgres connection settings
+  #
+  # Values are described in https://www.postgresql.org/docs/17/libpq-connect.html#LIBPQ-PARAMKEYWORDS
+  # To set the password via a background-worker secret see `secrets.pgPassword`.
+  #
+  # `additionalVolumeMounts` and `additionalVolumes` can be used to mount
+  # additional files (e.g. certificates) into the background-worker container.
+  # This does not work for password files (parameter `passfile`), because
+  # libpq-connect requires access rights (mask 0600) for them that we cannot
+  # provide for random uids.
+  #
+  # Below is an example configuration we're using for our CI tests.
+  postgresql:
+    host: postgresql # DNS name without protocol
+    port: "5432"
+    user: wire-server
+    dbname: wire-server
+  postgresqlPool:
+    size: 5
+    acquisitionTimeout: 10s
+    agingTimeout: 1d
+    idlenessTimeout: 10m
+
+
+  # Setting this to `true` will start the conversation migration to PostgreSQL.
+  #
+  # NOTE: It is very important that galley is configured with
+  # `settings.postgresMigration.conversation` set to `migrate-to-postgresql`
+  # before setting this to `true`.
+  migrateConversations: false
 
   backendNotificationPusher:
     pushBackoffMinWait: 10000 # in microseconds, so 10ms
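
The comment above says the `postgresql` keys are libpq connection parameters and suggests `additionalVolumes`/`additionalVolumeMounts` for extra files such as certificates. A minimal sketch of a values override for a TLS-verified PostgreSQL connection, assuming a pre-existing Secret holding the server CA (its name, the volume name, and the mount path are hypothetical) and assuming libpq's `sslmode` and `sslrootcert` keywords are passed through unchanged, as the comment implies:

```yaml
background-worker:
  # Top-level values consumed by the deployment template.
  additionalVolumes:
    - name: postgres-ca                  # hypothetical volume name
      secret:
        secretName: postgres-ca          # hypothetical pre-existing Secret
  additionalVolumeMounts:
    - name: postgres-ca
      mountPath: /etc/wire/background-worker/postgres-ca  # illustrative path
      readOnly: true
  config:
    postgresql:
      host: postgresql
      port: "5432"
      user: wire-server
      dbname: wire-server
      sslmode: verify-full               # libpq keyword
      sslrootcert: /etc/wire/background-worker/postgres-ca/ca.crt  # libpq keyword; key name assumed
  secrets:
    pgPassword: "<password>"             # placeholder; rendered into the background-worker Secret
```

The password goes through `secrets.pgPassword` rather than a mounted `passfile` because, as the comment notes, libpq requires `0600` permissions on password files, which the chart cannot guarantee for random UIDs. This override is meant to be merged with the rest of the release values (e.g. the RabbitMQ credentials under `secrets`).
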

charts/integration/templates/integration-integration.yaml

Lines changed: 8 additions & 0 deletions
@@ -60,10 +60,15 @@ spec:
 - name: "federator-ca"
 configMap:
 name: "federator-ca"
+
 - name: "background-worker-config"
 configMap:
 name: "background-worker"
 
+- name: "background-worker-secrets"
+secret:
+secretName: "background-worker"
+
 - name: "stern-config"
 configMap:
 name: "backoffice"
@@ -250,6 +255,9 @@ spec:
 - name: background-worker-config
 mountPath: /etc/wire/background-worker/conf
 
+- name: background-worker-secrets
+mountPath: /etc/wire/background-worker/secrets
+
 - name: stern-config
 mountPath: /etc/wire/stern/conf
 

docs/src/developer/reference/config-options.md

Lines changed: 57 additions & 3 deletions
@@ -1683,10 +1683,9 @@ used as `password` field.
 
 ### Using PostgreSQL for storing conversation data
 
-This is currently not the default and is experimental.
-The migration path from Cassandra is yet to be programmed.
+#### New Installations
 
-However, new installations can use this by configuring the wire-server helm
+New installations can use this by configuring the wire-server helm
 chart like this:
 
 ```yaml
@@ -1696,6 +1695,61 @@ galley:
       conversation: postgresql
 ```
 
+#### Migration for existing installations
+
+Existing installations should migrate the conversation data to PostgreSQL from
+Cassandra. This is necessary for channel search and management of channels from
+the team-management UI. It is highly recommended to take a backup of the Galley
+Cassandra before triggering the migration.
+
+The migration needs to happen in 3 steps:
+
+1. Prepare wire-server for migration.
+
+   This step makes sure that wire-server keeps working as expected during the
+   migration. To do this, deploy wire-server with this config change:
+
+   ```yaml
+   galley:
+     config:
+       postgresqlMigration:
+         conversation: migrate-to-postgresql
+   ```
+
+   This change should restart all the galley pods; any new conversations will
+   now be written to PostgreSQL.
+
+2. Trigger the migration and wait.
+
+   This step carries out the actual migration. To do this, deploy
+   wire-server with this config change:
+
+   ```yaml
+   background-worker:
+     config:
+       migrateConversations: true
+   ```
+
+   This change should restart the background-worker pods. It is recommended to
+   watch the logs and wait for both of these metrics to report `1.0`:
+   `wire_local_convs_migration_finished` and `wire_user_remote_convs_migration_finished`.
+   This can take a long time depending on the number of conversations in the DB.
+
+3. Configure wire-server to only use PostgreSQL for conversations.
+
+   This is the configuration that must be used from now on for every new
+   release.
+
+   ```yaml
+   galley:
+     config:
+       postgresqlMigration:
+         conversation: postgresql
+   background-worker:
+     config:
+       migrateConversations: false
+   ```
+
 ## Configure Cells
 
 If Cells integration is enabled, gundeck must be configured with the name of
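
Step 2 asks to wait until both migration metrics report `1.0`. If the background-worker metrics are scraped by Prometheus (an assumption; this commit does not set that up), one way to be notified is a rule in the standard Prometheus rule-file format, sketched below. Only the two metric names come from the docs; the group name, alert name, labels, and annotation are hypothetical. `min` is used so the condition holds only once every reporting series has reached `1`.

```yaml
# Hypothetical Prometheus rule file; only the metric names come from the docs.
groups:
  - name: wire-conversation-migration
    rules:
      - alert: WireConversationMigrationFinished
        # Fires only when both migration metrics are 1 on every scraped series.
        expr: |
          min(wire_local_convs_migration_finished) == 1
          and
          min(wire_user_remote_convs_migration_finished) == 1
        labels:
          severity: info
        annotations:
          summary: "Conversation migration to PostgreSQL finished; step 3 can be applied."
```

This only automates watching the metrics; checking the background-worker logs, as the docs recommend, is still worthwhile before moving on to step 3.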
