feat: Use Helm hooks for applying database migrations (MPT-12683) #730
base: integration
Conversation
ffaraone left a comment
Just some comments to start the usual discussion about how to name things :)
Please check the DB connection string URL-encoding, otherwise we will have problems in the presence of some characters.
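For illustration, a minimal sketch of the concern, assuming the connection string is assembled in Python (the function name and the mysql+pymysql scheme are hypothetical, not the services' actual code):

```python
from urllib.parse import quote_plus

def build_db_url(user: str, password: str, host: str, port: int, db: str) -> str:
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise break URL parsing when they appear in credentials.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"

# A password containing reserved characters survives intact:
print(build_db_url("optscale", "p@ss:w/rd", "db", 3306, "restapi"))
# mysql+pymysql://optscale:p%40ss%3Aw%2Frd@db:3306/restapi
```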
(force-pushed 624819c to 40c05e2)
Looks like I missed a few services:

Let me know if others are still missing. I'm marking this PR as Draft until this is finished.

I checked all services that use migrations (both with and without locks). Your list of missed services is correct.
(force-pushed 40c05e2 to fef00d7)
ffaraone left a comment

🥇
…ker and jira_bus too :)
…mongo migrations (MPT-12683)
…strings (MPT-12683)
…tcd client (MPT-12683)
…pendent k8s hooks (per service) instead of init containers and thus guarantee they will always run once per deployment (MPT-12683)
(force-pushed 6df48f4 to 13bbf61)
I've tried to start the cluster, but it's failing. Here is some debug info: it looks to me like the cluster is trying to start but fails while waiting for elk, which cannot start before the hook completes. I'm not absolutely sure, but maybe we need to wait for Jobs that don't have a deletion policy of hook-succeeded or hook-failed. With such a policy Helm deletes the Job after it succeeds (or before the next upgrade), the hook will run in the background, and Helm won't block waiting for it, but it looks like disruption is possible here because the services may start in parallel. Also a couple of concerns from my side:

So, in my opinion, in the current case we need to render one migration Job per service, not only one per DB.
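For reference, the deletion-policy annotations under discussion (a fragment for illustration, not a complete manifest from this PR):

```yaml
metadata:
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    # hook-succeeded: delete the Job once it completes successfully.
    # hook-failed: delete the Job if it failed during execution.
    # before-hook-creation: delete the previous Job before a new one
    # is launched (Helm's default when no policy is set).
    "helm.sh/hook-delete-policy": hook-succeeded
```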
After investigating the problem, I would not suggest triggering migrations with Helm hooks: hooks are synchronous relative to Helm, not to our cluster. IMHO, if we want to follow the best practice of separating migrations from the service, we need to ship migrations as a Kubernetes Job (not a hook): run it as a batch/v1 Job with a sane backoffLimit (we need to make sure all our changes are idempotent and retriable). This keeps migrations visible, observable, and decoupled from Helm's lifecycle. It's also a widely recommended approach:
https://medium.com/@inchararlingappa/handling-migration-with-helm-28b9884c94a6
https://devops.stackexchange.com/questions/15261/helm-long-running-jobs-vs-long-running-hooks
When are Helm hooks okay? For small projects, quick one-off tasks, or "install-time sanity checks" (not our case, with complicated waiters and startup logic).
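A minimal sketch of the proposed alternative: a plain Job shipped with the chart and no hook annotations (the name, image and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: restapi-migrations              # placeholder name
spec:
  # Retries presume the migration scripts are idempotent and safe
  # to re-run, as the comment above notes.
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: optscale/rest_api:local   # placeholder image
          command: ["python", "tools/db/migrate.py", "rest_api"]
```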
Description
Use Helm Hooks for database migrations for all the services using alembic, clickhouse or mongodb migrations:
- `rest_api` - alembic
- `auth` - alembic
- `herald` - alembic
- `jira_bus` - alembic
- `slacker` - alembic
- `katara` - alembic
- `risp_worker` - clickhouse
- `metroculus_worker` - clickhouse
- `gemini_worker` - clickhouse
- `insider_worker` - mongo
- `diworker` - mongo

There is a separate hook for each of the above services. That way we can:

- Wait not only for `mongo`, `clickhouse` and `rabbitmq` but also for `rest_api` to be running before the migrations are applied, because some of these migrations make calls to `rest_api`. And of course `rest_api`'s own migrations can't require the service to be running, as they are executed before the server starts. (A sketch of one such hook appears after this list.)
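A hedged sketch of what one such per-service hook Job could look like; the Job name, image, wait command and endpoint are illustrative assumptions, not the PR's actual templates:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: herald-migrations        # one Job per service (placeholder name)
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        # Block until rest_api answers, since some migrations call it.
        - name: wait-rest-api
          image: curlimages/curl:8.7.1
          command: ["sh", "-c",
                    "until curl -sf http://restapi/; do sleep 5; done"]
      containers:
        - name: migrate
          image: optscale/herald:local   # placeholder image
          command: ["python", "tools/db/migrate.py", "herald"]
```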
Related issue number

MPT-12683: https://softwareone.atlassian.net/browse/MPT-12683
Special notes

In our Optscale deployment we're facing issues with migrations depending on `EtcdLock`, mostly because we have a custom deployment of `etcd`. So, to work around that (and not rely on etcd), we're instead running the migrations in a Helm hook job.

As I needed to apply this to multiple services, I also extracted the `migrate.py` scripts into a common place -- `tools/db` (and made some minor changes to allow them to be run for any service and inside a Helm job); a sketch of such a runner follows.
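A minimal sketch of what a shared runner for the alembic-based services could look like; the file layout and argument handling are assumptions, not the actual `tools/db/migrate.py`:

```python
import argparse

from alembic import command
from alembic.config import Config

def main() -> None:
    parser = argparse.ArgumentParser(description="Apply migrations for one service")
    parser.add_argument("service", help="e.g. rest_api, auth, herald")
    args = parser.parse_args()

    # Assumes each service keeps an alembic.ini next to its migration
    # scripts; the path layout here is illustrative only.
    cfg = Config(f"{args.service}/alembic.ini")
    command.upgrade(cfg, "head")

if __name__ == "__main__":
    main()
```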
Checklist

Unit tests for the changes exist: N/A