
Relations between sequence, segment, batch and chunk #8

@embarrassing-noob-questions

Description

Thanks for putting this repo together! This is great work!
There is something I think I am completely missing somehow...
I am confused about the relations between sequences, segments, batches, and chunks.

At inference, suppose I want to feed the model a long text (much longer than the context window) and then ask it questions about the whole thing.
If I understand correctly, I do the following:

1. I initialize the model and the memory to the learned M0.
2. I break the long text into window-sized segments, feed it the first segment, and do a forward pass once, letting the memory be updated.
3. Then, without reinitializing the memory, I feed it the next segment, and so on.
4. Finally, I can ask it questions about facts spread across all segments, and hopefully it retrieves them successfully (needle-in-a-haystack style).
5. If I then want to move on to another long text, I need to reset the memory to the learned M0 and go again.
Is that correct?
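To make sure I'm describing the loop I have in mind, here is a minimal sketch of the segment-wise inference procedure from the steps above. All names here (`ToyMemoryModel`, `reset_memory`, `forward`, the memory update rule) are hypothetical stand-ins I made up for illustration, not this repo's actual API:

```python
def split_into_segments(tokens, segment_len):
    """Break a long token sequence into window-sized segments."""
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


class ToyMemoryModel:
    """Stand-in for a model with a learned initial memory state M0."""

    def __init__(self, learned_m0):
        self.learned_m0 = learned_m0  # learned during training
        self.memory = None

    def reset_memory(self):
        # Step 1 / step 5: re-initialize memory to the learned M0
        # before starting a new long text.
        self.memory = list(self.learned_m0)

    def forward(self, segment):
        # Placeholder update: a real model would update its memory
        # with its own learned rule during the forward pass.
        self.memory.append(sum(segment))
        return self.memory


def process_long_text(model, tokens, segment_len):
    model.reset_memory()                        # memory <- M0
    for segment in split_into_segments(tokens, segment_len):
        model.forward(segment)                  # steps 2-3: sequential updates
    return model.memory                         # ready for questions over all segments
```

The point I want to confirm is only the control flow: memory persists across segments of one text and is reset to M0 between texts.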

If so, does training work the same way?
How do you do next-token prediction on the long context?
Do you do the same thing: feed it a very long context and update the memory as you go?
Is the memory re-initialized to the learned M0 value after each long text? And when is M0 itself updated?
How do batch and chunk fit in here?

Finally, after training, do we expect the model to scale to texts longer than those it has seen in training?

I know these are all noob questions, but I have been through the papers and through the code and I can't figure it out...
