[CV2-5117] batch formats #111
base: master
Conversation
def classify(self, task_prompt, items_count, max_tokens_per_item=200):
    pass
sorry, I think all changes in this file are just updates from formatter
I think line 292 is the only substantive change: for item in data["input_items"]:
:param num_of_keywords: int
:returns: str
"""
# TODO: loop over passed in items, or actually trigger processing in batch?
This looks fine as a first pass on the idea imo
@ahmednasserswe adding you here for visibility, let us know if you have any thoughts about this file specifically or the refactor as a whole.
class Message(BaseModel):
    body: GenericItem
    request_id: Union[str, int, float]
    items: List[GenericItem]  # to support batch, this is a list (can be only 1 item)
Looks fine to me, not terribly complicated!
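For concreteness, a payload matching this shape might look something like the sketch below (the `model_name` key and the item fields are illustrative assumptions, not something this PR defines):

```python
# Hypothetical batch message; "model_name" and the item fields are
# assumed here for illustration only.
message = {
    "model_name": "classycat",   # assumed routing key for the target model service
    "request_id": "req-123",
    "items": [                   # always a list, even when there is only 1 item
        {"id": "item-1", "text": "first text to classify"},
        {"id": "item-2", "text": "second text to classify"},
    ],
}
```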
id: Union[str, int, float]
id: Union[str, int, float]  # id (in calling system) for this content
content_hash: Optional[str] = None
callback_url: Optional[str] = None
Taking this out of here feels bad - why couldn't each item have its own callback?
Classycat is implemented so that it makes a single callback with all of the data in it. I suppose we could support both options? Like if the callback is present at the top level, make a single callback with all the results; if at the individual level, respond per item? (But does the entire batch input get returned with each individual item, or does presto have to parse apart the input list?) I guess having neither or both types of callback would be an error.
Thinking about this further, I really like having the callback at the upper level, because it is an instruction to the Presto system, not to the model (the model doesn't know what to do with it)
That's fine, but it does likely mean that we can't use batch processing for any event on Check API without some somewhat significant revisions - we'd likely be adding callback functionality specifically to deal with batching. Not a terrible outcome, but it does make it more complicated elsewhere
> batch processing for any event on Check API without some somewhat significant revisions
Can you explain how you see this working so we can ensure the format encompasses that as well?
Sure - right now, if we allowed for callback urls at the individual item level, we'd basically not have to make any changes to Check-API (or Alegre, for that matter). The downside, of course, is that you'd open yourself up to making a ton of HTTP calls, which is not great (IMO the answer here is to allow callbacks at either level), but also: no revisions, and potentially a single path for receiving responses for work regardless of whether the provenance was a batch job or an individual job. If we go the route of only a top-level callback, we'd probably have to do some temporary redis key storage to remember what was sent out and what needs to be done upon receiving the results of work. Also, if we're running large batches of vectored work... are we signing up for potentially gigantic POST bodies? Will they even work in SQS, or do they need to live in SQS at all? I think they would?
Anyways, to answer your question, I think the meat of the issue is that we'd have to introduce a bunch more state management paths for remembering and dealing with the outcomes of callbacks on Check API and Alegre - right now it's basically implied, because the structure of the request contains all we need to know, but this potentially just introduces more overhead. Not a terrible thing! Just something to consider.
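To make the temporary redis key storage idea concrete, a minimal sketch (key naming, TTL, and function names are assumptions, not anything in this PR):

```python
# Rough sketch of remembering what was sent out so a top-level
# callback can be reconciled later; all names here are assumed.
import json
import redis

r = redis.Redis()

def remember_batch(request_id, item_ids, ttl_seconds=3600):
    # store the dispatched item ids under a per-request key
    r.setex(f"presto:batch:{request_id}", ttl_seconds, json.dumps(item_ids))

def on_batch_callback(request_id, results):
    raw = r.get(f"presto:batch:{request_id}")
    expected = set(json.loads(raw)) if raw else set()
    returned = {item["id"] for item in results}
    return expected - returned  # ids we sent but got no result back for
```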
OK, so it sounds like for the format, you'd be ok with "callback can be at individual item or batch level (but must be at least one and not both)"?
> remember what was sent out and what needs to be done upon receiving the results of work
Or, you have to carry enough state along with the object to know what to do when it comes back. Which is what we are trying to support in making sure the ids match back to the original payload, i.e. for timpani, the items carry a target_state.
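If we did go with "either level, at least one, not both", the rule itself is simple to enforce; a sketch (the function name and message shape are assumed, not from this PR):

```python
# Sketch of the "exactly one callback level" rule under discussion.
def validate_callbacks(message: dict) -> None:
    top_level = message.get("callback_url") is not None
    per_item = any(item.get("callback_url") for item in message.get("items", []))
    if top_level == per_item:  # true for both-present and neither-present
        raise ValueError(
            "provide callback_url at the batch level or on individual items, not both or neither"
        )
```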
I'm generally in the callback belongs at the model level and not at the item level camp.
I think it's right that we would make changes to Alegre/Check API if they start sending batches of items.
- If a caller wants individual callbacks, they should send items one at a time.
- If the caller wants one callback for a batch, then they should send a batch of items.
Right now, everything on Alegre/Check is set up for sending items one at a time. When we implement the bulk similarity endpoint on Alegre, I'd expect to make changes to the callback handling in Alegre so that it can receive back one batch of items. This makes sense because Alegre can then process that batch efficiently as a batch - i.e., calling the bulk insertion point on OpenSearch/Elasticsearch. Having items come back one at a time from a bulk submission prevents the caller from handling the response as a batch.
tl;dr fine if we want to support callbacks at an individual level, but I don't see it as a priority because we already have a workaround: If a caller wants individual callbacks from Presto they should send items to Presto one at a time.
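For example (a sketch only, assuming the standard elasticsearch client; the index name and document shape are made up), a batch callback lets the receiver do one bulk round trip instead of N single inserts:

```python
# Sketch: why receiving results as one batch is useful downstream.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def index_batch(results):
    # index the whole callback payload in a single bulk request
    actions = (
        {"_index": "similarity_items", "_id": item["id"], "_source": item}
        for item in results
    )
    helpers.bulk(es, actions)
```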
Alright, I'm bought in for the top-level callback - thank you both for the discussion, I think it was useful to at least air it out as it'll have ripple effects. Ship it at the top level!
ashkankzme left a comment
sorry that it took me a while to get to it. I tried my best to be thorough and read everything in depth and left some comments. looking forward to the discussions!
Presto models should use these structures to ensure that tools that consume the data can access it in a consistent way across models

Details of specific input formats are usually in the schema.py files
one thing I tried to start in the previous refactor of parse_message() was to have these model-specific types defined in the model / in a separate file than schemas.py (e.g. classycat_response.py). going forward we don't want to keep updating schemas.py for new models/edits to existing models, and instead update the files specific to the model under construction/review. I have already filed a ticket for taking out whatever 'response classes' are defined in the schemas.py file, see this jira ticket
so I would edit this line to include the example of classycat_response.py and emphasize defining these models separately going forward.
General design

* Models that don't handle requests in parallel need to implement batch format (but can give errors if they receive more than 1 item)
* "top level" elements are processed and controlled by the Presto system
It would be beneficial to specify exactly what we mean by 'top level' here, i.e. is it specific fields (we should name them if so) or are there more generic rules for defining 'top level' elements?
yes, it is specific fields (which we should name) and the general concept that the outermost set of keys talks to Presto, and the next level down talks to the specific presto model service.
* Models that don't handle requests in parallel need to implement batch format (but can give errors if they receive more than 1 item)
* "top level" elements are processed and controlled by the Presto system
* Elements inside parameters, and inside individual input items are passed through to the model
is there a reason not to pass other parameters into the models? it's good to have the flexibility of using other parameters when needed, if doing so is not disruptive
yeah, this is confusing. By "top level" I mean the outermost set of dictionary keys in the payload. I think everything is passed through to the model? Presto is responsible for enforcing and checking keys at the outermost level, but shouldn't really enforce or check inside parameters or items lists (other than that ids exist). Anything that isn't in the presto schema isn't promised to be passed through (we could always add to the schema if it becomes too constraining).
I think greater flexibility is often faster in the short term, but then everything ends up having to do lots of conditional checking for elements that may or may not exist, vs being able to check and enforce schemas and fail back to the caller before enqueuing malformed messages to models
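A sketch of the intended split (exact key names vary across this PR, e.g. items vs input_items; the other names are illustrative):

```python
# Outermost keys: validated and acted on by Presto itself.
# Everything inside "parameters" and each item: passed through to the
# model; Presto only checks that item ids exist.
payload = {
    "model_name": "classycat",                       # Presto: routing (assumed name)
    "callback_url": "https://example.org/callback",  # Presto: where to send results
    "parameters": {                                  # opaque to Presto
        "schema_id": "abc123",
    },
    "items": [
        {"id": "item-1", "text": "..."},             # "id" enforced, the rest passes through
    ],
}
```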
* Models that don't handle requests in parallel need to implement batch format (but can give errors if they receive more than 1 item)
* "top level" elements are processed and controlled by the Presto system
* Elements inside parameters, and inside individual input items are passed through to the model
* All of the content in a batch (a single HTTP request) must go to the same model
future versions of classycat with local classification could receive an input batch, some of which would be processed locally (the items with previous similar references) and some processed with an LLM (more novel items). I would say this is a desired case for one batch being processed by two different models, and the current architecture allows for it. The outputs would ofc still be as if all were processed by the same model.
It would be good to have clarity on what exactly we mean here by 'all items in a batch go to the same model'?
I guess I mean more precisely, "from the perspective of the calling service, all of the items in a batch must be dispatched to the same Presto model service". Classycat can of course internally decide to process via different LLMs "under the hood", but don't mix payloads intended for classycat with payloads intended for paraphrase multilingual in the same call.
## Outgoing requests from presto

* The outgoing response request includes the payloads and parameters from the incoming request
* items in the response must to include their id, so that the caller can match individual-level properties from incoming request.
typo: extra 'to'
class ClassyCatResponse(BaseModel):
    # TODO: replase with the presto repose class?
is this to replace ClassyCatResponse with the presto response class? can you elaborate a bit more on what this could look like in practice please?
lol this comment may beat my personal best for number of spelling errors per line!
I think I meant that the response message and list of items that ClassyCat was defining are now part of the PrestoResponse class in schemas.py and enforced for all models. So ClassyCat can import and extend that class and the ResponseItem class (into the classycat schema files) to add the ClassyCat-specific fields.
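Something like this sketch, assuming PrestoResponse and ResponseItem end up in schemas.py as described (the import path and the ClassyCat-specific field are assumptions):

```python
# Sketch of ClassyCat extending the shared presto response classes;
# the import path and field names are assumed for illustration.
from typing import List, Optional

from lib.schemas import PrestoResponse, ResponseItem  # assumed location

class ClassyCatResponseItem(ResponseItem):
    labels: Optional[List[str]] = None  # classycat-specific output field (assumed)

class ClassyCatResponse(PrestoResponse):
    items: List[ClassyCatResponseItem]
```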
if 'yake_keywords' in model_name:
if (
    result_instance is None
):  # in case the model does not have a parse_input_message method implemented
imo this is not a more readable format than what was before lol, any chance to manually make this pretty or revert this line back?
@skyemeedan is there a way to opt out of formatting everything in the future? aside from cases like these, it makes it difficult to find out what the actual diffs are.
I agree it is hard to read when substantive and formatting changes are mixed together. Which is why I'm a huge fan of configuring my editor to use a strict PEP-8 syntax formatting hook (especially if we all do it!). I find it saves a huge amount of time not having to manually mess with formatting or linting fixes.
I could make a separate PR to first apply the formatting to the whole repo? ;-)
I can also just move the comment up a line and it should fit within the line length
)

for item in data["parameters"]["items"]:
for item in data["input_items"]:
As scott notes, this is the substantive change in this file, now looping over the set of input items, unnested from parameters
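In other words, a before/after sketch of the access pattern (handle() is a stand-in for the surrounding per-item work, not a function from this PR):

```python
# Before: items were nested under "parameters"
for item in data["parameters"]["items"]:
    handle(item)

# After: items are a top-level list of input items
for item in data["input_items"]:
    handle(item)
```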
Description
Proposal for changes and batchification of presto request and response structures, standardization of argument handling across classes, and some suggestions on edits to get us there
Reference: CV2-5117
How has this been tested?
Has it been tested locally? Are there automated tests?
Are there any external dependencies?
Are there changes required in sysops terraform for this feature or fix?
Have you considered secure coding practices when writing this code?
Please list any security concerns that may be relevant.