workers: revision worker implementation #178
Force-pushed from 93dab06 to 4182a52.
@zzzeid Curious about the motivation for using `asyncio` here. If this is still exploratory work (it is a WIP after all), that's fine too, of course. :)
This is exploratory work, though the main motivation is graceful shutdown, which is half-broken right now. It's a relatively small change to get graceful shutdown working if we are mid-processing a job (at least as long as the job is not stalled). In the future we will have to shore up this implementation to allow for cancelling jobs at will or having a timeout (which will require asyncio) to avoid what happened in bug 1740791.

This is still a WIP: there are a couple more changes to add to get graceful shutdown working completely as expected on both workers, and there's more testing I am going to do. However, there should be no functional change to how jobs are processed. Rather than factor out buggy functionality into the new abstract worker, I figured I would fix it while I was there.

Edit: this is only exploratory in the sense that it's not yet fully tested and proven to behave correctly. Once that's done, if no problems are uncovered, this will likely land as-is unless there's a good reason not to.
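A job timeout of the kind mentioned above could look roughly like this with `asyncio.wait_for`, which cancels the awaited coroutine once the deadline passes. This is a minimal sketch, not Lando code: `process_job` and the job IDs are hypothetical, and `asyncio.sleep` stands in for real work. Cancellation only takes effect at an `await` point, so a job stuck inside a blocking sync call would not be interrupted this way.

```python
import asyncio


async def process_job(job_id: int, duration: float) -> str:
    # Hypothetical job; asyncio.sleep stands in for real, awaitable work.
    await asyncio.sleep(duration)
    return f"job {job_id} done"


async def run_with_timeout(job_id: int, duration: float, timeout: float) -> str:
    try:
        # wait_for cancels process_job if it runs longer than `timeout` seconds.
        return await asyncio.wait_for(process_job(job_id, duration), timeout)
    except asyncio.TimeoutError:
        return f"job {job_id} timed out"


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout(1, duration=0.01, timeout=1.0)))
    print(asyncio.run(run_with_timeout(2, duration=5.0, timeout=0.05)))
```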
Changing a sync app to run as async isn't really a small change. It might be small in terms of diff size, but it changes how the app works on a fundamental level. Before we add async features, we should consciously decide that they're worth the added complexity in Lando. It's also unclear to me why "to allow for cancelling jobs at will or having a timeout" would require asyncio. If you could elaborate on this I might get on board with the change, but in my experience Python's async features are only rarely worth the trouble, and this use case doesn't seem appropriate.
Can you give an example of the sort of trouble you'd run into? I think using asyncio can complicate debugging, so that could be a big drawback, but otherwise it's a pretty mature feature that could be leveraged in beneficial ways. Specifically, the revision worker could run multiple jobs concurrently, since it doesn't modify the state of the remote repo and will need to process a large number of revisions continuously. We could also have two types of workers and keep the landing worker sync if we anticipate problems.
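To illustrate the concurrency idea for the revision worker, here is a minimal sketch under stated assumptions: `preprocess_revision` is a hypothetical coroutine, revisions are plain integer IDs, and `asyncio.Semaphore` caps how many run at once (loosely analogous to a capacity setting). `asyncio.gather` returns results in input order. This only pays off if the work inside actually awaits non-blocking I/O.

```python
import asyncio
from typing import List


async def preprocess_revision(revision_id: int) -> str:
    # Hypothetical stand-in for awaitable I/O (e.g. an async HTTP call).
    await asyncio.sleep(0.01)
    return f"D{revision_id} processed"


async def process_all(revision_ids: List[int], capacity: int) -> List[str]:
    # The semaphore limits how many revisions are in flight at the same time.
    sem = asyncio.Semaphore(capacity)

    async def bounded(revision_id: int) -> str:
        async with sem:
            return await preprocess_revision(revision_id)

    # gather runs the coroutines concurrently and keeps results in input order.
    return await asyncio.gather(*(bounded(r) for r in revision_ids))


if __name__ == "__main__":
    print(asyncio.run(process_all([1, 2, 3, 4, 5], capacity=2)))
```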
There are other ways of solving this problem (specifically timeouts); asyncio is not a requirement. However, given the current reliance on the flags that control the running, paused, or processing state of the worker, this implementation is more of a bug fix. For example, if you send a SIGTERM or SIGKILL while a job is processing, the worker will never exit gracefully, since the worker state is modified after the signal handler is executed. The event loop solves this particular problem. Of course, we could also track this state elsewhere (e.g. in a lock file or the DB), but that adds a different kind of complexity. I'll be considering other approaches here, but I'd be curious to hear what kinds of problems you think we'll run into with asyncio.
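For comparison, a common sync pattern for graceful shutdown is to have the signal handler do nothing but set a flag, and have the worker loop check that flag between jobs, so an in-progress job always finishes before the worker exits. This is a minimal sketch with a hypothetical `SketchWorker` class, not Lando's worker. Only catchable signals such as SIGTERM and SIGINT can be handled this way; SIGKILL cannot be caught by any process, so no handler ever runs for it.

```python
import signal
import threading
from typing import Callable, Iterable, List


class SketchWorker:
    """Hypothetical worker: the handler only flips a flag, the loop acts on it."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.processed: List[str] = []

    def handle_signal(self, signum, frame) -> None:
        # Do no real work in the handler; just request a stop.
        self._stop.set()

    def install_handlers(self) -> None:
        # Python only allows this from the main thread.
        signal.signal(signal.SIGTERM, self.handle_signal)
        signal.signal(signal.SIGINT, self.handle_signal)

    def run(self, jobs: Iterable[str], on_job: Callable[[str], None] = lambda job: None) -> None:
        for job in jobs:
            # The flag is checked only between jobs, so the current job completes.
            if self._stop.is_set():
                break
            on_job(job)
            self.processed.append(job)
```

In a real process, `install_handlers()` would be called once from the main thread before `run()`; a SIGTERM arriving mid-job then lets that job finish and stops the loop before the next one.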
Force-pushed from 4182a52 to 9c13862.
Complications in debugging are one example. We'll also likely have to rewrite large parts of the app as async if we start writing async-specific functionality (/me shudders thinking of previous personal projects where this exact thing happened...), since you can't call async functions from non-async functions. Once we do that, we'll need to make sure our tests account for the change to async code flow. It's also worth noting that calling sync functions from async functions is blocking anyway and kills the benefit of having the event loop. I haven't checked, but some of our dependencies are likely blocking/sync-specific, for example database access. We would likely need to update all our … My point isn't that we shouldn't be doing things concurrently/in parallel, or that we shouldn't use async because it isn't a mature feature; it's that we should consider other alternatives as …
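The point about sync calls blocking the event loop can be made concrete. A blocking call such as `time.sleep` (or a sync DB driver call) inside a coroutine stalls every other task, while `asyncio.to_thread` (Python 3.9+) runs it in a worker thread so the calls can overlap. A minimal sketch; `blocking_query` is a hypothetical stand-in for a sync dependency.

```python
import asyncio
import time


def blocking_query(seconds: float) -> str:
    # Hypothetical sync dependency, e.g. a blocking DB driver call.
    time.sleep(seconds)
    return "rows"


async def run_direct(n: int, seconds: float) -> float:
    """Call the sync function straight from coroutines: the loop is blocked."""
    start = time.perf_counter()

    async def task() -> str:
        return blocking_query(seconds)  # blocks the whole event loop meanwhile

    await asyncio.gather(*(task() for _ in range(n)))
    return time.perf_counter() - start


async def run_offloaded(n: int, seconds: float) -> float:
    """Offload each sync call to the default thread pool so the calls overlap."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.to_thread(blocking_query, seconds) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"direct:    {asyncio.run(run_direct(3, 0.1)):.2f}s")
    print(f"to_thread: {asyncio.run(run_offloaded(3, 0.1)):.2f}s")
```

With three 0.1 s calls, the direct version takes about 0.3 s because the calls run one after another despite `gather`, while the `to_thread` version takes about 0.1 s. Offloading works, but every sync dependency has to be wrapped this way, which is part of the complexity being discussed.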
Yeah, this is an unfortunate situation. Could we avoid making any changes to the DB until the end of the worker's execution (i.e., a DB transaction that only commits once the full task is complete)? Is there other worker state to consider here?
Force-pushed from 9c13862 to 7f2a24e.
Force-pushed from bac1509 to 7add66f.
Force-pushed from 7add66f to d251d17.
Force-pushed from c5d2cab to 2962439.
Force-pushed from 2962439 to 1c52dec.
Force-pushed from e01494c to 8b62298.
Force-pushed from 0907457 to f849935.
cgsheeh left a comment
First pass, mostly some code-level nits that stood out to me.
I'm sure you are aware, but this is quite a lot of code on top of being a large design change. :) Splitting some of the trivial changes out into very small PRs would be desirable IMO, just to keep the review process moving along. I'll try to dig deeper into this tomorrow.
@@ -0,0 +1,448 @@
# This Source Code Form is subject to the terms of the Mozilla Public
I think it would be highly desirable for `worker.py` or `landing_worker.py` to be registered as a move in Git instead of a new file, to keep the `git blame` history for the file. My vote would be for `landing_worker.py`.
- use updated response from `differential.revision.search` endpoint
- translate `stackGraph` field to PHIDs and edges
- update `tests.mocks` such that revision response includes new field
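Translating the `stackGraph` field might look like the sketch below. It assumes, without checking the Phabricator docs, that `stackGraph` maps each revision PHID to a list of its parent revision PHIDs; the function name `parse_stack_graph` and the sample PHIDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def parse_stack_graph(
    stack_graph: Dict[str, List[str]],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return all PHIDs in the stack and the (child, parent) edges between them.

    Assumes stack_graph maps a revision PHID to the PHIDs of its parents.
    """
    phids: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for child, parents in stack_graph.items():
        phids.add(child)
        for parent in parents:
            phids.add(parent)
            edges.add((child, parent))
    return phids, edges


if __name__ == "__main__":
    graph = {
        "PHID-DREV-3": ["PHID-DREV-2"],
        "PHID-DREV-2": ["PHID-DREV-1"],
        "PHID-DREV-1": [],
    }
    print(parse_stack_graph(graph))
```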
This is a work in progress, do not merge!

- add new `RevisionWorker` that pre-processes revisions (bug 1788728)
- add new command (`lando-cli revision-worker`) to start new worker
- add new flags to stop workers gracefully
- add method to parse diff and list affected files
- add mots integration (bug 1740107)
- store mots output in revision model
- run pre/post mots query
- check mots hashes
- add mots to requirements
- include new Lando revision info via API endpoint
- create abstract `Worker` class (bug 1744327)
- add `repo.use_revision_worker` feature flag (bug 1788732)
- add proper loop/process functionality to workers
- refactored revision worker and landing worker
- implement stack hashes to detect changes in revisions
- functional multi edge search
- remove s3/boto/etc. dependencies (bug 1753728)
- add new "created" landing job status to handle revision propagation
- add patch caching on disk
- add many to many fields + association to revisions/landing jobs
- refactor dependency and stack fetching and parsing using networkx
- add main worker flag and capacity/throttle flags

TODO:
- add more test coverage for `revision_worker.py`
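The item about parsing a diff to list affected files could be approached as follows. This is a hypothetical sketch, not the PR's method: it treats a `--- ` line immediately followed by a `+++ ` line as a file header, strips the `a/`/`b/` prefixes, and skips `/dev/null` so added and deleted files are still reported once.

```python
from typing import List


def affected_files(diff: str) -> List[str]:
    """Return the paths touched by a unified (git-style) diff, in first-seen order."""
    lines = diff.splitlines()
    files: List[str] = []
    for old, new in zip(lines, lines[1:]):
        # A file header is a "--- " line immediately followed by a "+++ " line.
        if not (old.startswith("--- ") and new.startswith("+++ ")):
            continue
        for header in (old, new):
            # Drop any tab-separated timestamp after the path.
            path = header[4:].split("\t")[0].strip()
            if path == "/dev/null":
                continue  # added or deleted file; the other header names it
            if path.startswith(("a/", "b/")):
                path = path[2:]
            if path not in files:
                files.append(path)
    return files
```

Requiring the `---`/`+++` pair (rather than matching `--- ` alone) avoids most false positives from removed lines that themselves start with `-- `; a stricter parser would also track hunk lengths from the `@@` headers.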
Force-pushed from ec9962d to 2215e14.
NOTE: commit to be cleaned up / squashed.

- update project cache to be more streamlined
- add support for cache when testing locally
- refactor `build_stack_graph` to use revision directly
- reduce the number of calls to Phabricator API
- improve stack discovery
- fix tests
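One way to think about stack discovery: given (child, parent) edges between revisions, the stack containing a revision is its connected component when edge direction is ignored. The PR uses networkx for this; the sketch below swaps in a plain breadth-first search over the standard library to stay dependency-free. The function name and edge format are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def find_stack(start: str, edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every revision connected to start, ignoring edge direction."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in edges:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

    # Breadth-first search from the starting revision.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

Ignoring direction means starting from any revision in a stack (top, bottom, or a branch point) discovers the same set.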
This way, processor and revision workers are separated into their own classes with shared functionality in the base class.
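The split described above is essentially the template-method pattern: the base class owns the shared start/loop machinery and each subclass supplies its own unit of work. A minimal sketch with hypothetical names (`Worker`, `LandingWorker`, `RevisionWorker`, `loop_once`); it does not reproduce Lando's actual classes.

```python
from abc import ABC, abstractmethod
from typing import List


class Worker(ABC):
    """Shared machinery; subclasses only define one iteration of work."""

    def __init__(self) -> None:
        self.log: List[str] = []

    @property
    def name(self) -> str:
        return type(self).__name__

    def start(self, iterations: int) -> None:
        # Common setup/teardown lives here, once, for every worker type.
        self.log.append(f"{self.name} starting")
        for _ in range(iterations):
            self.loop_once()
        self.log.append(f"{self.name} stopped")

    @abstractmethod
    def loop_once(self) -> None:
        """Do one unit of work (e.g. pick up and process one job)."""


class LandingWorker(Worker):
    def loop_once(self) -> None:
        self.log.append("landing a job")


class RevisionWorker(Worker):
    def loop_once(self) -> None:
        self.log.append("pre-processing a revision")
```

Because `Worker` is abstract, it cannot be instantiated directly; forgetting to implement `loop_once` in a subclass fails at construction time rather than mid-run.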
- add ability to specify max loops in workers
- simplify sleep/throttle/stop/pause methods
- add various unit tests
- add integration test to test Supervisor/Processor functionality
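A bounded run loop covering the max-loops, throttle, pause, and stop items might look like this. It is a hypothetical sketch: `run_loop`, its parameters, and the use of `threading.Event` for the stop and pause flags are assumptions, not the PR's API. Capping iterations with `max_loops` is mainly useful in tests, where an otherwise infinite loop must terminate.

```python
import threading
from typing import Callable, Optional


def run_loop(
    step: Callable[[], None],
    stop: threading.Event,
    pause: threading.Event,
    throttle_seconds: float = 0.0,
    max_loops: Optional[int] = None,
) -> int:
    """Call step() repeatedly until stop is set or max_loops iterations pass.

    Returns how many times step() actually ran (paused iterations are skipped).
    """
    loops = 0
    completed = 0
    while not stop.is_set():
        if max_loops is not None and loops >= max_loops:
            break
        loops += 1
        if pause.is_set():
            # Skip work while paused, but keep looping so stop still takes effect.
            stop.wait(max(throttle_seconds, 0.01))
            continue
        step()
        completed += 1
        # Throttle between iterations; wait() returns early if stop gets set.
        stop.wait(throttle_seconds)
    return completed
```

Using `stop.wait()` instead of `time.sleep()` for the throttle means a stop request interrupts the sleep immediately instead of waiting out the full interval.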
- landing worker workflow
- merge conflict
- add ability to modify edges in phabdouble
- test updating stack layout
- add some logging for debugging
- lock table when updating revision
- modify repo config
Force-pushed from 342f979 to 45c76ef.
Closing this PR and moving to #224 so I can modify PR dependencies more easily.
This is a work in progress, do not merge!
All commits will be squashed into the minimum number of commits before merging.
`revision_worker.py` … `RevisionWorker` that pre-processes revisions (bug 1788728) … `*_WORKER_STOPPED`) … `get_stack_hashes`) … `Worker` class … `lando-cli landing-worker` to `lando-cli start-landing-worker`

TODO:

… `landing_worker` to `workers` module
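The stack-hash idea (detecting that a revision or stack changed between worker passes) could be sketched as hashing each revision's relevant fields in a stable order. Everything here is hypothetical: the PR description mentions `get_stack_hashes`, but this sketch uses its own name, `compute_stack_hashes`, and the fields hashed and the use of SHA-256 over sorted JSON are assumptions for illustration only.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


def revision_hash(revision: Dict[str, Any]) -> str:
    # sort_keys makes the serialized form, and so the hash, independent of key order.
    payload = json.dumps(revision, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compute_stack_hashes(stack: List[Dict[str, Any]]) -> Tuple[Dict[str, str], str]:
    """Return per-revision hashes keyed by id, plus one hash for the whole stack."""
    per_revision = {rev["id"]: revision_hash(rev) for rev in stack}
    # Combine in sorted id order so the stack hash does not depend on list order.
    combined = hashlib.sha256(
        "".join(per_revision[rid] for rid in sorted(per_revision)).encode("utf-8")
    ).hexdigest()
    return per_revision, combined
```

Comparing the stored combined hash with a freshly computed one shows whether anything in the stack changed, and the per-revision hashes show which revision did. The stack hash here ignores list order; if reordering a stack should count as a change, each revision's parent PHIDs would need to be among the hashed fields.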