Thanks for your excellent work!
I have some questions about how users can interact with MDocAgent.
As far as I can tell, MDocAgent currently only runs on specific benchmarks.
How can I use MDocAgent in a real-world PDF RAG scenario?
Suppose I have 500 PDFs and preprocess them to get page text and page images; that's the offline part.
Then there's the online part:
- A user inputs a query.
- Retrieval tools retrieve the top-k images and top-k texts.
- MDocAgent then generates the answer from the retrieved texts, the retrieved images, and the user's query.
- Finally, the answer is printed to the terminal or saved as JSON, etc.
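To make the online flow above concrete, here is a minimal sketch of the loop I have in mind. Everything in it is hypothetical: `Page`, `score`, `retrieve_topk`, `generate_answer`, and `answer_query` are placeholder names, not MDocAgent's actual API. The retrieval is a toy word-overlap score standing in for real text and image retrievers, and `generate_answer` is a stub marking where the MDocAgent agents would run.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical output of the offline step: one record per PDF page.
    doc_id: str
    page_no: int
    text: str
    image_path: str  # path to the rendered page image


def score(query: str, text: str) -> int:
    # Toy relevance score: count of shared lowercase word tokens.
    # Stand-in for a real text or visual retriever (assumption).
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t)


def retrieve_topk(query: str, pages: list[Page], k: int) -> list[Page]:
    # Rank all pages by score, highest first, and keep the top k.
    return sorted(pages, key=lambda p: score(query, p.text), reverse=True)[:k]


def generate_answer(query: str, texts: list[str], images: list[str]) -> str:
    # Placeholder for the MDocAgent generation step (hypothetical signature).
    return f"[answer to {query!r} using {len(texts)} texts and {len(images)} images]"


def answer_query(query: str, pages: list[Page], k_text: int = 3, k_image: int = 3) -> dict:
    # Retrieve top-k texts and top-k images separately.
    # Both use the toy score here; a real setup would plug in two retrievers.
    text_hits = retrieve_topk(query, pages, k_text)
    image_hits = retrieve_topk(query, pages, k_image)
    # Generate from the query plus the retrieved texts and image paths.
    answer = generate_answer(
        query,
        [p.text for p in text_hits],
        [p.image_path for p in image_hits],
    )
    # Return a JSON-serializable record that can be printed or saved.
    return {
        "query": query,
        "answer": answer,
        "text_sources": [{"doc_id": p.doc_id, "page": p.page_no} for p in text_hits],
        "image_sources": [{"doc_id": p.doc_id, "page": p.page_no} for p in image_hits],
    }


if __name__ == "__main__":
    pages = [
        Page("report.pdf", 1, "Revenue grew in 2023", "pages/report_1.png"),
        Page("report.pdf", 2, "Headcount and hiring plans", "pages/report_2.png"),
        Page("manual.pdf", 1, "Installation steps for the device", "pages/manual_1.png"),
    ]
    result = answer_query("How did revenue change in 2023?", pages, k_text=1, k_image=1)
    print(json.dumps(result, indent=2))
```

Keeping text and image retrieval as two separate calls mirrors the "top-k images and top-k texts" step, so each could later be swapped for its own real retriever without touching the rest of the loop.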
I've run into issues with these steps when trying to apply MDocAgent to my own PDF database for RAG.
It's possible I've missed something, but could you offer some guidance on how to set up this interaction?