SDialog


SDialog is a modular Python toolkit for synthetic dialog generation, evaluation, and analysis. It standardizes a Dialog schema and offers persona-driven multi-agent simulation with LLMs, composable orchestration, built-in metrics, and mechanistic interpretability, so you can build reliable, controllable dialog systems or generate dialog data at scale.

Quick links: Docs • API • Demo (Colab) • Tutorials • Datasets (HF) • Issues

✨ Key features

  • Standard dialog schema with JSON import/export (aiming to standardize dialog dataset formats with your help 🙏)
  • Persona-driven multi-agent simulation with contexts, tools, and thoughts
  • Composable orchestration for precise control over behavior and flow
  • Built-in evaluation (metrics + LLM-as-judge) for comparison and iteration
  • Native mechanistic interpretability (inspect and steer activations)
  • Easy creation of user-defined components (personas, metrics, orchestrators, etc.) by inheriting from base classes; see the sketch below
  • Interoperability across OpenAI, Hugging Face, Ollama, AWS Bedrock, Google GenAI, Anthropic, and more
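
For example, user-defined personas can be created by subclassing the persona base class. Below is a minimal sketch, assuming sdialog.personas.Persona is that base class and that persona attributes are plain string fields; the Barista class and its fields are illustrative, not part of SDialog:

from sdialog.personas import Persona

# Hypothetical custom persona; the class name and fields are illustrative
class Barista(Persona):
    name: str = "Sam"
    role: str = "barista at a small coffee shop"
    communication_style: str = "cheerful and concise"

barista = Barista(name="Alex")  # override defaults per instance, as with the built-in personas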

If you are building conversational systems, benchmarking dialog models, producing synthetic training corpora, simulating diverse users to test or probe conversational systems, or analyzing internal model behavior, SDialog provides an end-to-end workflow.

⚡ Installation

pip install sdialog

Alternatively, a ready-to-use Apptainer image (.sif) with SDialog and all dependencies is available on Hugging Face and can be downloaded here.

apptainer exec --nv sdialog.sif python3 -c "import sdialog; print(sdialog.__version__)"

Note

This Apptainer image also has the Ollama server preinstalled.
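
For instance, you can start the bundled Ollama server and pull a model from inside the container (a sketch only; the model name below is just an example):

apptainer exec --nv sdialog.sif ollama serve &
apptainer exec --nv sdialog.sif ollama pull qwen3:14b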

๐Ÿ Quickstart tour

Here's a short, hands-on example: a support agent helps a customer disputing a double charge. We add a small refund rule and two simple tools, generate three dialogs for evaluation, then serve the agent on port 1333 for Open WebUI or any OpenAI-compatible client.

import sdialog
from sdialog import Context
from sdialog.agents import Agent
from sdialog.personas import SupportAgent, Customer
from sdialog.orchestrators import SimpleReflexOrchestrator

# First, let's set our preferred default backend:model and parameters
sdialog.config.llm("openai:gpt-4.1", temperature=1, api_key="YOUR_KEY")  # or export OPENAI_API_KEY=YOUR_KEY
# sdialog.config.llm("ollama:qwen3:14b")  # etc.

# Let's define our personas (use built-ins like in this example, or create your own!)
support_persona = SupportAgent(name="Ava", politeness="high", communication_style="friendly")
customer_persona = Customer(name="Riley", issue="double charge", desired_outcome="refund")

# (Optional) Let's define two mock tools (just plain Python functions) for our support agent
def account_verification(user_id):
    """Verify user account by user id."""
    return {"user_id": user_id, "verified": True}
def refund(amount):
    """Process a refund for the given amount."""
    return {"status": "refunded", "amount": amount}

# (Optional) Let's also include a small rule-based orchestrator for our support agent
react_refund = SimpleReflexOrchestrator(
  condition=lambda utt: "refund" in utt.lower(),
  instruction="Follow refund policy; verify account, apologize, refund.",
)

# Now, let's create the agents!
support_agent = Agent(
  persona=support_persona,
  think=True,  # Let's also enable thinking mode
  tools=[account_verification, refund],
  name="Support"
)
simulated_customer = Agent(
  persona=customer_persona,
  first_utterance="Hi!",
  name="Customer"
)

# Since we have one orchestrator, let's attach it to our target agent
support_agent = support_agent | react_refund

# Let's generate 3 dialogs between them! (we can evaluate them later)
# (Optional) Let's also define a concrete conversational context for the agents in these dialogs
web_chat = Context(location="chat", environment="web", circumstances="billing")
for ix in range(3):
  dialog = simulated_customer.dialog_with(support_agent, context=web_chat)  # Generate the dialog
  dialog.to_file(f"dialog_{ix}.json")  # Save it
  dialog.print(all=True)  # And pretty print it with all its events (thoughts, orchestration, etc.)

# Finally, let's serve our support agent to interact with real users (OpenAI-compatible API)
#    Point Open WebUI or any OpenAI-compatible client to: http://localhost:1333
support_agent.serve(port=1333)

🧪 Testing remote systems with simulated users

You can also use SDialog as a controllable test harness for any OpenAI-compatible system (e.g., one served with vLLM) by role-playing realistic or adversarial users against your deployed endpoint:

  • Black-box functional checks (does the system follow instructions? handle edge cases?)
  • Persona / use-case coverage (different goals, emotions, domains)
  • Regression testing (run the same persona batch each release and diff the dialogs; see the sketch after the example below)
  • Safety / robustness probing (angry, confused, or noisy users)
  • Automated evaluation (pipe generated dialogs directly into evaluators; see the Evaluation section below)

Core idea: wrap your system as an Agent whose model name string uses the openai: prefix, talk to it with simulated user Agents, and capture Dialogs you can save, diff, and score.

Below is a minimal example where our simulated customer interacts once with your hypothetical remote endpoint:

from sdialog.agents import Agent  # simulated_customer below is reused from the Quickstart above

# Our remote system (your conversational backend exposing an OpenAI-compatible API)
system = Agent(
  model="openai:your/model",  # Model name exposed by your server
  openai_api_base="http://your-endpoint.com:8000/v1",  # Base URL of the service
  openai_api_key="EMPTY",  # Or a real key if required
  name="System"
)

# Let's make our simulated customer talk with the system
dialog = simulated_customer.dialog_with(system)
dialog.to_file("dialog_0.json")

💾 Loading and saving dialogs

Dialogs are rich objects with helper methods (filter, slice, transform, etc.) and can be exported and loaded in several ways:

from sdialog import Dialog

# Load from JSON (generated by SDialog using `to_file()`)
dialog = Dialog.from_file("dialog_0.json")

# Load from HuggingFace Hub datasets
dialogs = Dialog.from_huggingface("sdialog/Primock-57")

# Create from plain text files or strings - perfect for converting existing datasets!
dialog_from_txt = Dialog.from_str("""
Alice: Hello there! How are you today?
Bob: I'm doing great, thanks for asking.
Alice: That's wonderful to hear!
""")
# Or, equivalently if the content is in a txt file
dialog_from_txt = Dialog.from_file("conversation.txt")

# Load from CSV files with custom column names
dialog_from_csv = Dialog.from_file("conversation.csv",
                                   csv_speaker_col="speaker",
                                   csv_text_col="value")

# All Dialog objects have rich manipulation methods
dialog.filter("Alice").rename_speaker("Alice", "Customer").upper().to_file("processed.json")
avg_words_turn = sum(len(turn) for turn in dialog) / len(dialog)  # average turn length (words per turn)

See the Dialog section in the documentation for more information.

📊 Evaluate and compare

Dialogs can be evaluated with the components in the sdialog.evaluation module. Use built-in metrics (readability, flow, linguistic features, LLM judges) or easily create new ones, then aggregate and compare datasets (sets of dialogs) via DatasetComparator.

from sdialog.evaluation import LLMJudgeRealDialog, LinguisticFeatureScore
from sdialog.evaluation import FrequencyEvaluator, MeanEvaluator
from sdialog.evaluation import DatasetComparator

reference = [...]   # list[Dialog]
candidate = [...]   # list[Dialog]

judge  = LLMJudgeRealDialog()
flesch = LinguisticFeatureScore(feature="flesch-reading-ease")

comparator = DatasetComparator([
  FrequencyEvaluator(judge, name="Realistic dialog rate"),
  MeanEvaluator(flesch, name="Mean Flesch Reading Ease"),
])

results = comparator({"reference": reference, "candidate": candidate})

# Plot results for each evaluator
comparator.plot()

🧠 Mechanistic interpretability

Attach Inspectors to capture per-token activations and optionally steer (add/ablate directions) to analyze or intervene in model behavior.

import sdialog
from sdialog.interpretability import Inspector
from sdialog.agents import Agent

sdialog.config.llm("huggingface:meta-llama/Llama-3.2-3B-Instruct")

agent = Agent(name="Bob")
inspector = Inspector(target="model.layers.16.post_attention_layernorm")
agent = agent | inspector

agent("How are you?")
agent("Cool!")

# Let's get the last response's first token activation vector!
act = inspector[-1][0].act # [response index][token index]

Steering intervention (subtracting a direction):

import torch
anger_direction = torch.load("anger_direction.pt")  # A direction vector (e.g., PCA / difference-in-mean vector)
agent_steered = agent | inspector - anger_direction  # Ablate the anger direction from the target activations

agent_steered("You are an extremely upset assistant")  # Agent "can't get angry anymore" :)

Tip

See the tutorial on using SDialog to remove the refusal capability from LLaMA 3.2.

📖 Documentation and tutorials

  • Demo notebook
  • Tutorials
  • API reference
  • Documentation
  • Documentation for AI coding assistants like Copilot is also available at https://sdialog.readthedocs.io/en/latest/llm.txt following the llm.txt specification. In your Copilot chat, simply use:
    #fetch https://sdialog.readthedocs.io/en/latest/llm.txt
    
    Your prompt goes here... (e.g., write a Python script using sdialog to create an agent for
    criminal investigation, define its persona, tools, orchestration...)
    

๐ŸŒ Project Vision & Community Call

To accelerate open, rigorous, and reproducible conversational AI research, SDialog invites the community to collaborate and help shape the future of open dialog generation.

๐Ÿค How You Can Help

  • 🗂️ Dataset Standardization: Help convert existing dialog datasets to SDialog format. Currently, each dataset stores dialogs in different formats, making cross-dataset analysis and model evaluation challenging. Converted datasets are made available as Hugging Face datasets in the SDialog organization for easy access and integration.
  • 🔧 Component Development: Create new personas, orchestrators, evaluators, generators, or backend integrations
  • 📊 Evaluation & Benchmarks: Design new metrics, evaluation frameworks, or comparative studies
  • 🧠 Interpretability Research: Develop new analysis tools, steering methods, or mechanistic insights
  • 📖 Documentation & Tutorials: Improve guides, add examples, or create educational content
  • 🐛 Issues & Discussions: Report bugs, request features, or share research ideas and use cases

Note

Example: Check out Primock-57, a sample dataset already available in SDialog format on Hugging Face.

If you have a dialog dataset you'd like to convert to SDialog format, need help with the conversion process, or want to contribute in any other way, please open an issue or reach out to us. We're happy to help and collaborate!
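
As an illustration of what a conversion can look like (a sketch only; the directory layout and the "Speaker: text" transcript format are assumptions about your source data), plain-text transcripts can be turned into SDialog JSON with the loading helpers shown earlier:

from pathlib import Path

from sdialog import Dialog

# Hypothetical source layout: one "Speaker: utterance" transcript per .txt file
Path("converted").mkdir(exist_ok=True)
for ix, path in enumerate(sorted(Path("my_dataset").glob("*.txt"))):
    dialog = Dialog.from_file(str(path))  # parses "Speaker: text" lines, as shown above
    dialog.to_file(f"converted/dialog_{ix}.json")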

💪 Contributing

See CONTRIBUTING.md. We welcome issues, feature requests, and pull requests. If you want to contribute to the project, please open an issue or submit a PR, and help us make SDialog better 👍

This project follows the all-contributors specification. All-contributors list:

  • Sergio Burdisso: 💻 🤔 📖 ✅
  • Labrak Yanis: 💻 🤔
  • Séverin: 💻 🤔 ✅
  • Ricard Marxer: 💻 🤔
  • Thomas Schaaf: 💻
  • David Liu: 💻
  • ahassoo1: 🤔 💻
  • Pawel Cyrta: 💻 🤔
  • ABCDEFGHIJKL: 💻

๐Ÿ™ Acknowledgments

This work was supported by the European Union Horizon 2020 project ELOQUENCE (grant number 101070558).

The initial development of this project began in preparation for the 2025 Jelinek Memorial Summer Workshop on Speech and Language Technologies (JSALT 2025) as part of the "Play your Part" research group.

๐Ÿ“ License

MIT License
Copyright (c) 2025 Idiap Research Institute
