Thanks for putting this repo together! This is great work!
There is something I think I am completely missing somehow...
I am confused about the relations between sequences, segments, batches and chunks.
At inference, suppose I want to feed the model a long text (>> context window size) and then ask it questions about the whole thing.
So IIUC I do the following:
I initialize the model and the memory to the learned M0
I break the long text into window-size segments, feed it the first segment, and do a forward pass once, letting the memory be updated
Then, without reinitializing the memory I feed it the next segment, and so on
Finally, I can ask it questions about facts spread across all segments, and hopefully it retrieves them successfully (needle-in-a-haystack style)
If I then want to move on to another long text, I need to re-initialize the memory back to the learned M0 and go again
Is that correct?
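For concreteness, here is a toy sketch of the inference loop I have in mind. Everything here (the class, `init_memory`, the segment size, the memory update rule) is made up by me just to show the flow, not this repo's actual API:

```python
import torch
import torch.nn as nn

SEGMENT_LEN = 512      # assumed context-window / segment size
D_MODEL = 64           # toy hidden size
VOCAB = 1000

class ToyMemoryModel(nn.Module):
    """Stand-in for the real model: the memory is just a running vector state."""
    def __init__(self):
        super().__init__()
        self.m0 = nn.Parameter(torch.zeros(D_MODEL))   # learned initial memory M0
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def init_memory(self):
        # fresh copy of the learned M0 for every new document
        return self.m0.detach().clone()

    def forward(self, token_ids, memory):
        x = self.embed(token_ids) + memory             # "read" from memory
        memory = memory + x.mean(dim=0)                # toy "write" to memory
        return self.head(x), memory

model = ToyMemoryModel().eval()
long_text = torch.randint(0, VOCAB, (5 * SEGMENT_LEN,))   # pretend long document

with torch.no_grad():
    # (1) start from the learned M0
    memory = model.init_memory()

    # (2) feed window-sized segments one after another, never resetting the memory
    for start in range(0, long_text.numel(), SEGMENT_LEN):
        _, memory = model(long_text[start:start + SEGMENT_LEN], memory)

    # (3) ask the question with the accumulated memory still attached
    question = torch.randint(0, VOCAB, (32,))
    answer_logits, _ = model(question, memory)

    # (4) for the next long text, re-initialize the memory back to M0
    memory = model.init_memory()
```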
If so - does training work the same?
How do you do the next-token prediction on the long context?
Do you do the same, i.e. feed it a very long context and update the memory as you go?
Is the memory initialized to the learned M0 value after each long text? When is M0 updated?
How do batch and chunk fit in here?
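And here is my rough mental model of training, again with entirely made-up names and a toy memory update, just to illustrate where I imagine batch, chunk and M0 fitting in. Please correct me where this is wrong:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, CHUNK_LEN = 1000, 64, 128

class ToyMemoryLM(nn.Module):
    """Toy stand-in: the memory is a per-sequence vector added to the hidden state."""
    def __init__(self):
        super().__init__()
        self.m0 = nn.Parameter(torch.zeros(D_MODEL))   # learned initial memory M0
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def init_memory(self, batch):
        # each sequence in the batch starts from its own fresh copy of M0
        return self.m0.unsqueeze(0).expand(batch, -1)

    def forward(self, token_ids, memory):
        # token_ids: (batch, chunk), memory: (batch, d)
        x = self.embed(token_ids) + memory.unsqueeze(1)
        memory = memory + x.mean(dim=1)                # toy "write" to memory
        return self.head(x), memory

def train_step(model, optimizer, batch_token_ids):
    # batch = independent long sequences; chunk = window-sized slice per forward pass
    batch, long_len = batch_token_ids.shape
    memory = model.init_memory(batch)                  # reset to M0 for each new sequence
    loss = 0.0
    for start in range(0, long_len - 1, CHUNK_LEN):
        chunk  = batch_token_ids[:, start:start + CHUNK_LEN]
        target = batch_token_ids[:, start + 1:start + CHUNK_LEN + 1]
        logits, memory = model(chunk, memory)          # memory carried to the next chunk
        logits = logits[:, :target.size(1)]            # align when the last chunk is short
        loss = loss + F.cross_entropy(logits.reshape(-1, VOCAB), target.reshape(-1))
    # M0 is an ordinary learned parameter, so this optimizer step is what updates it,
    # together with the rest of the model weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ToyMemoryLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randint(0, VOCAB, (4, 4 * CHUNK_LEN))     # 4 pretend long sequences
print(train_step(model, opt, data))
```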
Finally - after training - do we expect the model to scale to texts longer than those it has seen in training?
I know these are noob questions, but I have been through the papers and the code and I can't figure it out...