
[WIP] Copy tinker to skyrl-train to eliminate cross-repo dependencies #1012

Draft
tyler-griggs wants to merge 1 commit into tyler/fix-tinker-placement-group from tyler/tinker-in-skyrl-train

@tyler-griggs

Summary

Experimental/WIP: copies the entire tinker implementation from skyrl-tx into skyrl-train to test whether eliminating the cross-repository dependency resolves Ray worker environment issues.

This PR is stacked on #1010 (placement group fixes).

What was copied

  • api.py - FastAPI server
  • engine.py - Background engine
  • db_models.py - SQLModel database models
  • types.py - Type definitions
  • config.py - Engine configuration
  • backends/backend.py - Abstract backend interface
  • backends/skyrl_train.py - SkyRL-Train backend (with the placement group fixes from #1010, "Fix placement group creation in SkyRL-Train backend")
  • backends/utils.py - Utilities

Import updates

  • from tx.tinker.* -> from skyrl_train.tinker.*
  • from tx.utils.log -> from loguru
  • Inlined tx.utils.storage.download_file()
  • Made ExternalInferenceClient import optional
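The prefix rewrite and the optional import can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `rewrite_tinker_imports` and `optional_import` are hypothetical helpers, and the PR does not say which module provides `ExternalInferenceClient`, so no real path is shown.

```python
import importlib
import re
from typing import Any, Optional

# Matches the tx.tinker package prefix but not e.g. "skyrl_tx.tinker"
# (no word boundary between "_" and "t").
_TINKER_PREFIX = re.compile(r"\btx\.tinker\b")


def rewrite_tinker_imports(source: str) -> str:
    """Hypothetical helper: point tx.tinker.* imports at the copied skyrl_train.tinker.* package."""
    return _TINKER_PREFIX.sub("skyrl_train.tinker", source)


def optional_import(module_name: str, attr: str) -> Optional[Any]:
    """Hypothetical helper: return module_name.attr, or None if the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return getattr(module, attr, None)
```

Callers would then check the result for `None` before constructing an external inference client, so the tinker server can still start when that dependency is absent.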

Dependencies added

Added a tinker optional dependency group to pyproject.toml:

  • fastapi, uvicorn, sqlmodel, sqlalchemy, aiosqlite, cloudpathlib, httpx
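To confirm that the tinker extra is actually installed in the environment that runs the API server, one option is to check that each package's top-level module can be located without importing it. A minimal sketch, assuming the import names match the package names listed above; `missing_modules` is a hypothetical helper, not part of the PR.

```python
import importlib.util

# Import names for the packages in the tinker optional dependency group.
TINKER_EXTRA_MODULES = (
    "fastapi",
    "uvicorn",
    "sqlmodel",
    "sqlalchemy",
    "aiosqlite",
    "cloudpathlib",
    "httpx",
)


def missing_modules(modules=TINKER_EXTRA_MODULES) -> list:
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

An empty list means every module in the group is importable; otherwise the returned names show which ones still need to be installed.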

Current Issue

Ray workers are still downloading all packages (vllm, torch, etc.) even though we're already in the correct environment. Investigating why the working gsm8k example doesn't hit this issue.

Logs show Ray is auto-packaging the working directory:

INFO packaging.py:588 -- Creating a file package for local module '/home/tyler/SkyRL/SkyRL-tinker/skyrl-train'.
(raylet) Creating virtual environment at: .venv
(raylet) Downloading vllm (452.9MiB)
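One way to check whether workers reuse the driver's venv or a freshly created `.venv` is to have each process report its interpreter and compare the reports. A minimal sketch: `describe_env` and `env_mismatches` are hypothetical helpers, and running `describe_env` inside a Ray task (for example via `ray.remote` and `ray.get`) is left to the caller.

```python
import os
import sys


def describe_env() -> dict:
    """Report which interpreter and virtual environment the current process uses."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
    }


def env_mismatches(driver: dict, worker: dict) -> list:
    """List the keys whose values differ between the driver's and a worker's report."""
    return sorted(key for key in driver if driver[key] != worker.get(key))
```

If a worker's `executable` points into a newly created `.venv` rather than the driver's environment, that matches the auto-packaging behavior shown in the logs.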

Next Steps

  • Debug why gsm8k doesn't trigger Ray auto-packaging
  • Fix Ray worker environment to use existing venv
  • Test sl_loop.py end-to-end
  • Consider if this approach is worth keeping vs fixing cross-repo dependencies

🤖 Co-Authored-By: Claude Sonnet 4.5

Copied tinker API, engine, and skyrl_train backend from skyrl-tx into
skyrl-train to test if eliminating the cross-repository dependency resolves
Ray worker environment issues.

Changes:
- Copied tinker/{api,engine,db_models,types,config}.py to skyrl_train/tinker/
- Copied backends/{backend,skyrl_train,utils}.py to skyrl_train/tinker/backends/
- Updated all imports: tx.tinker.* -> skyrl_train.tinker.*
- Updated imports: tx.utils.log -> loguru
- Inlined tx.utils.storage.download_file() to remove dependency
- Made ExternalInferenceClient import optional
- Updated engine subprocess to use sys.executable instead of uv run
- Added tinker optional dependencies to pyproject.toml
- Set RAY_RUNTIME_ENV_LOCAL_DEV_MODE=0 to disable auto-packaging

Known issue: Ray still auto-packages working directory causing workers to
reinstall all dependencies. Investigating why gsm8k example doesn't hit this.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
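The commit's change to launch the engine subprocess with `sys.executable` instead of `uv run` can be sketched as below, so the child process reuses the parent's interpreter and packages. The module name and arguments are hypothetical placeholders; the PR does not show the actual engine entry point or its flags.

```python
import subprocess
import sys


def build_engine_command(module: str, args: list) -> list:
    """Build a command that runs `module` with the current interpreter (no `uv run`)."""
    return [sys.executable, "-m", module, *args]


def launch_engine(module: str, args: list) -> subprocess.Popen:
    """Start the engine as a child process using this interpreter and its installed packages."""
    return subprocess.Popen(build_engine_command(module, args))
```

For example, `build_engine_command("skyrl_train.tinker.engine", [])` (a hypothetical module path) starts with the same Python binary as the API server, so no separate environment is resolved for the engine.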