jobs: factorise common job logic into BaseJob (bug 1983208)#475
jobs: factorise common job logic into BaseJob (bug 1983208)#475
Conversation
20883c3 to
73ee347
Compare
73ee347 to
4116cbe
Compare
cgsheeh
left a comment
There was a problem hiding this comment.
Looks great, thanks for cleaning this up!
Please see my comments about tuples in to_dict, but otherwise this LGTM.
| When(status=cls.SUBMITTED, then=1), | ||
| When(status=cls.IN_PROGRESS, then=2), |
There was a problem hiding this comment.
I wonder if we should process IN_PROGRESS jobs first, since those would have been at the start of the queue when the worker crashed. Probably out of scope for this PR.
There was a problem hiding this comment.
I think it would be better to not consider IN_PROGRESS jobs as ones we need to process, rather, they should be marked as FAILED if there's a crash. We should be able to automatically detect that when starting a landing worker. This would prevent a loop where a job that causes the worker to crash continuously gets attempted.
There was a problem hiding this comment.
I think the case to have IN_PROGESS jobs second is that if said job kills the worker badly, processing another SUBMITTED job next allows us to inch forwards, rather than be in a tight fail-loop that would block the processing of any message. Not great, but it turns a critical failure into a slow down.
There are two cases I can think of where we'd end up in this situation: either a poison pill job, in which case marking as FAILED would make sense, or the worker was killed due to other reasons (restarts, new deploy, cosmic rays, ...), in which case we may not want to set the job to FAILED (though it would be such a rare occurrence that, as long as there are notifications about the failure, we could probably take the hit without too much disruption).
In any case, I'd rather not change this as part of the refactor, so as to not muddle the waters, but I think it's worth considering our options.
There was a problem hiding this comment.
zzzeid
left a comment
There was a problem hiding this comment.
Mostly nits, just BaseJob.__str__ I think which technically is incorrect right now.
| When(status=cls.SUBMITTED, then=1), | ||
| When(status=cls.IN_PROGRESS, then=2), |
There was a problem hiding this comment.
I think it would be better to not consider IN_PROGRESS jobs as ones we need to process, rather, they should be marked as FAILED if there's a crash. We should be able to automatically detect that when starting a landing worker. This would prevent a loop where a job that causes the worker to crash continuously gets attempted.
Co-authored-by: Zeid <2043828+zzzeid@users.noreply.github.com>
Most addressed, except for IN_PROGRESS management, left for later.
BaseJobmodel and move all common attributes from LandingJob and AutomationJob to it, as well as queue-management logic.to_dictmethod for various JSON representation needs.importsthroughout.