We introduce 📺 TVWorld, an offline graph-based abstraction of real-world TV navigation that enables reproducible and deployment-free evaluation. On this basis, we derive two complementary benchmarks that comprehensively assess TV-use capabilities: TVWorld-N for topology-aware navigation and TVWorld-G for focus-aware grounding. These benchmarks expose a key limitation of existing agents: insufficient topology awareness for focus-based, long-horizon TV navigation. Motivated by this finding, we propose a Topology-Aware Training framework that injects topology awareness into LVLMs. Using this framework, we develop TVTheseus, a foundation model specialized for TV navigation.
- 🕹️ TVWorld-N is an offline interactive TV navigation environment for evaluating agents' topology-aware planning under focus-based remote-control, supporting both textual and visual goals. Operating purely on static graph assets, it is fully replayable and deployment-free (e.g., no VMs/emulators), and enables millisecond-level interaction, avoiding the instability and overhead of online GUI benchmarks.
- 🎯 TVWorld-G evaluates focus-aware grounding by requiring the agent to localize the currently highlighted element within the global screen layout using bounding-box annotations, directly reflecting the focus-based nature of TV control.
- 🤖 TVTheseus is a foundation model trained with the Topology-Aware Training framework, designed for robust and generalizable TV control by leveraging structured UI topology and focus-driven interaction.
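To make the graph-based abstraction concrete, here is a minimal sketch of how an offline, focus-based TV environment like TVWorld-N can be modeled: screens and focus positions are graph nodes, remote-control keys are edges, and navigation reduces to search over a static graph, which is what makes replay millisecond-fast and deployment-free. The toy layout and all names below are illustrative, not the actual TVWorld-N API.

```python
from collections import deque

# Toy focus graph: (screen, focused_element) -> {remote_key: next_state}
# (illustrative only; real TVWorld graphs are built from recorded TV UIs)
GRAPH = {
    ("home", "apps"):        {"RIGHT": ("home", "settings"), "OK": ("apps", "netflix")},
    ("home", "settings"):    {"LEFT": ("home", "apps"), "OK": ("settings", "network")},
    ("apps", "netflix"):     {"BACK": ("home", "apps")},
    ("settings", "network"): {"BACK": ("home", "settings")},
}

def plan_keys(start, goal):
    """BFS over the focus graph: shortest remote-key sequence from start to goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, keys = frontier.popleft()
        if state == goal:
            return keys
        for key, nxt in GRAPH.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, keys + [key]))
    return None  # goal unreachable from start

print(plan_keys(("home", "apps"), ("settings", "network")))  # ['RIGHT', 'OK']
```

Because the environment is a static graph, every episode is exactly replayable and a step is a dictionary lookup rather than a VM or emulator round-trip.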
We open-source our complete pipeline to support further research in this area. All code, models, and datasets are publicly available:
| 🤗 Model | 🤗 Graph Resources | 🤗 Benchmark |
|---|---|---|
| TVTheseus | TVWorld | TVWorld-N & TVWorld-G |
Dependable TV navigation requires TV-use agents to reason over focus-based UI transitions in a goal-directed manner, while remaining robust to navigation errors such as detours and stalled states. We collectively refer to this interaction-level competence as topology awareness. To embed this latent capability into TV-use agents, we introduce a two-stage training approach that first injects topology-aware inductive biases via topology-priming supervised fine-tuning, and then progressively consolidates them through topology-augmented reinforcement learning. Through this training paradigm, we obtain TVTheseus, a foundation model specialized for robust and generalizable TV control.
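The two-stage idea can be illustrated with a toy tabular policy (this is a hedged sketch, not the paper's training code): stage 1 "primes" the policy by imitating shortest-path actions on the navigation graph, standing in for topology-priming supervised fine-tuning, and stage 2 reinforces actions along rollouts that reach the goal, standing in for topology-augmented reinforcement learning. All structures and constants here are illustrative.

```python
import random
from collections import deque, defaultdict

# Toy screen graph: state -> {remote_key: next_state} (illustrative only)
GRAPH = {"home": {"RIGHT": "apps", "DOWN": "settings"},
         "apps": {"OK": "netflix", "BACK": "home"},
         "settings": {"BACK": "home"},
         "netflix": {"BACK": "apps"}}

def shortest_action(state, goal):
    """First key on a BFS shortest path from state to goal (the 'SFT' label)."""
    frontier, seen = deque([(state, None)]), {state}
    while frontier:
        s, first = frontier.popleft()
        if s == goal:
            return first
        for key, nxt in GRAPH[s].items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, first or key))
    return None

# Policy: preference scores over keys for each (state, goal) pair.
policy = defaultdict(lambda: defaultdict(float))

# Stage 1: topology-priming "SFT" -- copy BFS shortest-path labels into the policy.
for s in GRAPH:
    for g in GRAPH:
        a = shortest_action(s, g)
        if a:
            policy[(s, g)][a] += 1.0

# Stage 2: topology-augmented "RL" -- reinforce actions on rollouts that reach the goal.
rng = random.Random(0)
for _ in range(200):
    s, g = rng.choice(list(GRAPH)), rng.choice(list(GRAPH))
    path, cur = [], s
    for _ in range(5):                      # bounded horizon per episode
        if cur == g:
            break
        keys = list(GRAPH[cur])
        a = max(keys, key=lambda k: (policy[(cur, g)][k], rng.random()))
        path.append((cur, a))
        cur = GRAPH[cur][a]
    if cur == g:                            # sparse reward: 1 if the goal is reached
        for st, a in path:
            policy[(st, g)][a] += 0.1
```

The SFT stage gives the policy a topology-consistent prior, and the RL stage consolidates it from outcome reward only, mirroring the inject-then-consolidate structure described above.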
We recommend following the official VeRL installation guide. Below are the key package versions used in our setup:
```
python == 3.12
CUDA == 12.8
accelerate == 1.11.0
deepspeed == 0.18.2
flash_attn == 2.8.1
flashinfer-python == 0.3.1
ray == 2.51.1
torch == 2.8.0
transformers == 4.57.1
vllm == 0.11.0
xformers == 0.0.32.post1
xgrammar == 0.1.25
```
Please refer to this.
This repository is built on SWIRL. We gratefully acknowledge the open-source projects that made this work possible: VeRL, Qwen2.5-VL, vLLM.
If you find TVWorld useful in your project or research, please use the following BibTeX entry to cite our paper and give us a star. Thanks!
```bibtex
@article{ma2026tvworld,
  title={TVWorld: Foundations for Remote-Control TV Agents},
  author={Ma, Zhantao and Lu, Quanfeng and Zhong, Shuai and Yu, Dahai and Luo, Ping and Ng, Michael K},
  journal={arXiv preprint arXiv:2601.13142},
  year={2026}
}
```