Commit 28b5b10

Fix postgres migrations deadlocks (#3220)
* Set transaction_per_migration=True
* Document Server upgrades
1 parent 7c2db9d commit 28b5b10

2 files changed: 31 additions, 1 deletion

docs/docs/guides/server-deployment.md

Lines changed: 22 additions & 0 deletions
@@ -400,6 +400,28 @@ export DSTACK_DB_MAX_OVERFLOW=80
 You have to ensure your Postgres installation supports that many connections by
 configuring [`max_connections`](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS) and/or using connection pooler.

+## Server upgrades
+
+When upgrading the `dstack` server, follow these guidelines to ensure a smooth transition and minimize downtime.
+
+### Before upgrading
+
+1. **Check the changelog**: Review the [release notes :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/releases){:target="_blank"} for breaking changes, new features, and migration notes.
+2. **Review backward compatibility**: Understand the [backward compatibility](#backward-compatibility) policy.
+3. **Back up your data**: Ensure you always create a backup before upgrading.
+
+### Best practices
+
+- **Test in staging**: Always test upgrades in a non-production environment first.
+- **Monitor logs**: Watch server logs during and after the upgrade for any errors or warnings.
+- **Keep backups**: Retain backups for at least a few days after a successful upgrade.
+
+### Troubleshooting
+
+**Deadlock when upgrading a multi-replica PostgreSQL deployment**
+
+If a deployment is stuck due to a deadlock when applying DB migrations, try scaling server replicas to 1 and retry the deployment multiple times. Some releases may not support rolling deployments, which is always noted in the release notes. If you think there is a bug, please [file an issue](https://github.com/dstackai/dstack/issues).
+
 ## FAQs

 ??? info "Can I run multiple replicas of dstack server?"
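For a PostgreSQL-backed server, the "Back up your data" step could be scripted roughly as follows. This is a minimal sketch, not part of `dstack`: the helper name `pg_dump_command` and the backup file naming are hypothetical, and it assumes the database URL is in SQLAlchemy form (for example `postgresql+asyncpg://user:password@host:5432/dstack`) and that the `pg_dump` client is installed. The driver suffix after `+` is stripped because `pg_dump` only accepts plain `postgresql://` or `postgres://` connection URIs.

```python
import os
from datetime import datetime, timezone
from urllib.parse import urlsplit, urlunsplit


def pg_dump_command(database_url: str, out_dir: str = ".") -> list[str]:
    """Build argv for a custom-format pg_dump of `database_url` (hypothetical helper)."""
    parts = urlsplit(database_url)
    # "postgresql+asyncpg" -> "postgresql": libpq does not understand driver suffixes.
    scheme = parts.scheme.split("+", 1)[0]
    if scheme not in ("postgresql", "postgres"):
        raise ValueError(f"not a PostgreSQL URL: {parts.scheme!r}")
    libpq_url = urlunsplit((scheme,) + parts[1:])
    # Timestamped file name so repeated backups do not overwrite each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    backup_file = os.path.join(out_dir, f"dstack-pre-upgrade-{stamp}.dump")
    return [
        "pg_dump",
        "--format=custom",  # compressed archive that pg_restore can read
        f"--file={backup_file}",
        f"--dbname={libpq_url}",
    ]
```

For example, `subprocess.run(pg_dump_command(url, "/backups"), check=True)` writes the dump, and `pg_restore` can restore it if the upgrade has to be rolled back.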

src/dstack/_internal/server/migrations/env.py

Lines changed: 9 additions & 1 deletion
@@ -36,7 +36,6 @@ def run_migrations_offline():
         literal_binds=True,
         dialect_opts={"paramstyle": "named"},
     )
-
     with context.begin_transaction():
         context.run_migrations()

@@ -61,12 +60,21 @@ def run_migrations(connection: Connection):
     # https://alembic.sqlalchemy.org/en/latest/batch.html#dealing-with-referencing-foreign-keys
     if connection.dialect.name == "sqlite":
         connection.execute(text("PRAGMA foreign_keys=OFF;"))
+    elif connection.dialect.name == "postgresql":
+        # lock_timeout is needed so that migrations that acquire locks
+        # do not wait for locks forever, blocking live queries.
+        # Better to fail and retry a deployment.
+        connection.execute(text("SET lock_timeout='10s';"))
     connection.commit()
     context.configure(
         connection=connection,
         target_metadata=target_metadata,
         compare_type=True,
         render_as_batch=True,
+        # Running each migration in a separate transaction.
+        # Running all migrations in one transaction may lead to deadlocks in HA deployments
+        # because lock ordering is not respected across all migrations.
+        transaction_per_migration=True,
     )
     with context.begin_transaction():
         context.run_migrations()
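The comment on `transaction_per_migration` can be illustrated with a toy lock-ordering model. This is a simplified sketch, not Alembic or dstack code: it treats every lock as exclusive and held until its transaction ends, only looks for two-party wait-for cycles, and the table names are hypothetical. A deadlock forms when the migration run holds table X and waits for Y while a live transaction on another replica holds Y and waits for X.

```python
def held_while_requesting(txns: list[list[str]]) -> set[tuple[str, str]]:
    """Return (held, requested) pairs: `held` is still locked when `requested` is taken."""
    pairs: set[tuple[str, str]] = set()
    for tables in txns:
        for i, requested in enumerate(tables):
            for held in tables[:i]:
                if held != requested:  # re-locking the same table is not a wait
                    pairs.add((held, requested))
    return pairs


def can_deadlock(
    migrations: list[list[str]], live_order: list[str], per_migration: bool
) -> bool:
    """Check for a two-party wait-for cycle between migrations and one live transaction."""
    if per_migration:
        # transaction_per_migration=True: locks are released after every migration.
        txns = migrations
    else:
        # One transaction: every lock is held until the last migration finishes.
        txns = [[table for migration in migrations for table in migration]]
    migration_pairs = held_while_requesting(txns)
    live_pairs = held_while_requesting([live_order])
    # Cycle: migrations hold X and wait for Y while the live txn holds Y and waits for X.
    return any((y, x) in live_pairs for x, y in migration_pairs)


if __name__ == "__main__":
    migrations = [["runs"], ["jobs"]]  # hypothetical: two migrations, one table each
    live = ["jobs", "runs"]            # a live transaction on another replica
    print(can_deadlock(migrations, live, per_migration=False))  # True
    print(can_deadlock(migrations, live, per_migration=True))   # False
```

The single-transaction run can deadlock because the lock on `runs` from the first migration is still held when the second migration asks for `jobs`; with one transaction per migration that lock is released first, so the cycle disappears. A single migration that itself locks tables in the opposite order to live queries can still form a cycle: `can_deadlock([["runs", "jobs"]], ["jobs", "runs"], per_migration=True)` returns `True`.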
