Thanks for putting this repo together! This is great work!
There is something I think I am completely missing somehow...
I am confused about the relations between sequences, segments, batches and chunks.
At inference, suppose I want to feed the model a long text (>> context window size) and then ask it questions about the whole thing.
So IIUC I do the following:
I initialize the model and the memory to the learned M0
I break the long text into window-size segments, feed it the first segment, and do a forward pass once, letting the memory be updated
Then, without reinitializing the memory I feed it the next segment, and so on
Finally, I can ask it questions about facts spread across all segments, and hopefully it retrieves them successfully (needle-in-a-haystack style)
If I then want to move on to another long text, I need to re-initialize the memory back to the learned M0 and go again
Is that correct?
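For concreteness, here is a toy sketch of the inference loop I have in mind. Everything here (the class, `init_memory`, the segment size, the memory update rule) is made up by me just to show the flow, not this repo's actual API:

```python
import torch
import torch.nn as nn

SEGMENT_LEN = 512      # assumed context-window / segment size
D_MODEL = 64           # toy hidden size
VOCAB = 1000

class ToyMemoryModel(nn.Module):
    """Stand-in for the real model: the memory is just a running vector state."""
    def __init__(self):
        super().__init__()
        self.m0 = nn.Parameter(torch.zeros(D_MODEL))   # learned initial memory M0
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def init_memory(self):
        # fresh copy of the learned M0 for every new document
        return self.m0.detach().clone()

    def forward(self, token_ids, memory):
        x = self.embed(token_ids) + memory             # "read" from memory
        memory = memory + x.mean(dim=0)                # toy "write" to memory
        return self.head(x), memory

model = ToyMemoryModel().eval()
long_text = torch.randint(0, VOCAB, (5 * SEGMENT_LEN,))   # pretend long document

with torch.no_grad():
    # (1) start from the learned M0
    memory = model.init_memory()

    # (2) feed window-sized segments one after another, never resetting the memory
    for start in range(0, long_text.numel(), SEGMENT_LEN):
        _, memory = model(long_text[start:start + SEGMENT_LEN], memory)

    # (3) ask the question with the accumulated memory still attached
    question = torch.randint(0, VOCAB, (32,))
    answer_logits, _ = model(question, memory)

    # (4) for the next long text, re-initialize the memory back to M0
    memory = model.init_memory()
```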
If so - does training work the same?
How do you do the next-token prediction on the long context?
Do you do the same, i.e. feed it a very long context and update the memory as you go?
Is the memory initialized to the learned M0 value after each long text? When is M0 updated?
How do batch and chunk fit in here?
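And here is my rough mental model of training, again with entirely made-up names and a toy memory update, just to illustrate where I imagine batch, chunk and M0 fitting in. Please correct me where this is wrong:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, CHUNK_LEN = 1000, 64, 128

class ToyMemoryLM(nn.Module):
    """Toy stand-in: the memory is a per-sequence vector added to the hidden state."""
    def __init__(self):
        super().__init__()
        self.m0 = nn.Parameter(torch.zeros(D_MODEL))   # learned initial memory M0
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def init_memory(self, batch):
        # each sequence in the batch starts from its own fresh copy of M0
        return self.m0.unsqueeze(0).expand(batch, -1)

    def forward(self, token_ids, memory):
        # token_ids: (batch, chunk), memory: (batch, d)
        x = self.embed(token_ids) + memory.unsqueeze(1)
        memory = memory + x.mean(dim=1)                # toy "write" to memory
        return self.head(x), memory

def train_step(model, optimizer, batch_token_ids):
    # batch = independent long sequences; chunk = window-sized slice per forward pass
    batch, long_len = batch_token_ids.shape
    memory = model.init_memory(batch)                  # reset to M0 for each new sequence
    loss = 0.0
    for start in range(0, long_len - 1, CHUNK_LEN):
        chunk  = batch_token_ids[:, start:start + CHUNK_LEN]
        target = batch_token_ids[:, start + 1:start + CHUNK_LEN + 1]
        logits, memory = model(chunk, memory)          # memory carried to the next chunk
        logits = logits[:, :target.size(1)]            # align when the last chunk is short
        loss = loss + F.cross_entropy(logits.reshape(-1, VOCAB), target.reshape(-1))
    # M0 is an ordinary learned parameter, so this optimizer step is what updates it,
    # together with the rest of the model weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ToyMemoryLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randint(0, VOCAB, (4, 4 * CHUNK_LEN))     # 4 pretend long sequences
print(train_step(model, opt, data))
```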
Finally - after training - do we expect the model to scale to texts longer than those it has seen in training?
I know these are noob questions, but I have been through the papers and the code and I can't figure it out...