Description
We need to develop a robust pipeline to increase the volume of our training datasets. The pipeline should take an existing dataset of LLM conversations (with system, user, assistant, and tool_call roles) and generate new, high-quality synthetic samples that preserve the intent and distribution of the target data.
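As a starting point for discussion, here is a minimal sketch of how a conversation record and a basic distribution check might look. The record layout (a list of `{"role", "content"}` dictionaries) and the names `validate_conversation`, `role_distribution`, and `max_distribution_gap` are assumptions for illustration, not an agreed schema. The share of messages per role is only one crude proxy for "same distribution"; intent preservation would need a separate check.

```python
from collections import Counter
from typing import Dict, List

# Roles named in the description; the record layout itself is an assumption.
ALLOWED_ROLES = {"system", "user", "assistant", "tool_call"}

Message = Dict[str, str]      # hypothetical shape: {"role": ..., "content": ...}
Conversation = List[Message]


def validate_conversation(conversation: Conversation) -> None:
    """Raise ValueError if any message uses a role outside ALLOWED_ROLES."""
    for index, message in enumerate(conversation):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has unknown role: {role!r}")


def role_distribution(dataset: List[Conversation]) -> Dict[str, float]:
    """Return the fraction of messages per role across the whole dataset."""
    counts = Counter(m["role"] for conv in dataset for m in conv)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {role: count / total for role, count in counts.items()}


def max_distribution_gap(source: List[Conversation],
                         synthetic: List[Conversation]) -> float:
    """Largest absolute per-role difference between two datasets."""
    a = role_distribution(source)
    b = role_distribution(synthetic)
    roles = set(a) | set(b)
    return max((abs(a.get(r, 0.0) - b.get(r, 0.0)) for r in roles), default=0.0)
```

A generated batch could be run through `validate_conversation` and then compared against the existing dataset with `max_distribution_gap`, rejecting the batch when the gap exceeds a threshold the team chooses.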