Description
What was unclear or otherwise insufficient?
I have been searching all over the docs for how to stop inference from inside the `onToken` function while tokens are being generated. I have a very particular use case: after a certain number of tokens, I can tell whether generation is going correctly or whether I should stop it.
Most approaches result in unhandled exceptions. Calling `completion.dispose();` seems to be a terrible idea, because it produces unhandled exceptions no matter where I put my try/catch blocks. Right now I throw an error in `onToken` and then call `completion.dispose()` in the error handler, but it hangs badly. It takes about as long as simply letting the completion run to the end.
I'd like to see in the documentation how this is done. It should be in the default example or the tutorial, but it isn't specified anywhere. Throwing an error seems to be the best option, and calling `completion.dispose()` on its own fails more often than not. Throwing an error and then calling dispose seems to be the only way to prevent the unhandled exception, but it is so slow that it can't be the intended approach: it would be faster to kill the program and reload the model.
Sorry, but I've spent too much time on this and still haven't been able to stop inference reliably. If you tell me how, I don't mind making a PR for the docs; it should be a simple enough change.
Recommended Fix
Just add a couple of lines to the documentation showing how to stop inference.
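For illustration, the kind of pattern such a doc snippet might show is cooperative cancellation: instead of throwing from inside `onToken`, the callback aborts an `AbortController`, and the generation loop checks the signal before each token and returns the partial text normally. This is a minimal sketch under assumptions: `fakeGenerate` is a hypothetical stand-in for the model's generation function, not the library's API, and whether the real prompt/completion call accepts an `AbortSignal` is an assumption to verify in its docs.

```typescript
// HYPOTHETICAL stand-in for a model's token generator. It is NOT the
// library's API; it only illustrates stopping generation via an AbortSignal.
async function fakeGenerate(
  tokens: string[],
  options: { signal?: AbortSignal; onToken?: (token: string) => void }
): Promise<string> {
  let output = "";
  for (const token of tokens) {
    // Check the signal before producing each token and stop cleanly,
    // returning whatever was generated so far instead of throwing.
    if (options.signal?.aborted) break;
    // Simulate per-token latency without blocking the event loop.
    await new Promise<void>((resolve) => setTimeout(() => resolve(), 0));
    output += token;
    options.onToken?.(token);
  }
  return output;
}

async function main(): Promise<string> {
  const controller = new AbortController();
  let seen = 0;
  return fakeGenerate(["a", "b", "c", "d", "e"], {
    signal: controller.signal,
    onToken: () => {
      seen += 1;
      // Hypothetical check: after 3 tokens, decide the output is going wrong.
      if (seen >= 3) controller.abort();
    },
  });
}

main().then((result) => console.log(result));
```

With these inputs, `main()` resolves to `"abc"`: the abort happens in the callback for the third token, and the loop stops before producing the fourth. Because nothing is thrown from the callback, there is no exception to catch and no need to dispose anything just to escape the loop.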
Additional Context
No response
Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, and I know how to start.