Summary
After analyzing the codebase, I found several discrepancies between what's documented in the README and what's actually implemented/integrated:
Issues Identified
- Real-World cost directory is unused
The Real-World cost/ directory contains code but is never imported or referenced by any other part of the codebase
No actual real-world evaluation pipeline exists
README correctly states "Support for the Real-World environment will be provided later" but doesn't clarify that the current code is non-functional
- Simpler-Env trajectory evaluation is not integrated
Code exists: Data Collection/maniskill2_evaluator.py contains a complete GCPG (Guided-cost Preference Generation) implementation with:
Stage-wise cost calculation (cal_cost())
External reward computation (external_score_option1/2())
Final trajectory scoring with R_self, R_ext, and I_success components
But not used: The actual evaluation in test_cmds.txt and Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py only returns simple success/failure boolean values
Manual integration required: the README says "Overwrite ./Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py with ./Data Collection/maniskill2_evaluator.py", but this step is not automated or integrated into the main workflow
- LIBERO evaluation lacks trajectory scoring
Similar issue: Data Collection/libero_data_collect.py exists, but the standard LIBERO evaluation doesn't use trajectory-level preference generation
- Installation instructions are confusing
The README installation section contains many manual file overwriting operations:
"Overwrite ./Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py with ./Data Collection/maniskill2_evaluator.py"
"Overwrite experiments/robot/libero/run_libero_eval.py with ./Data Collection/libero_data_collect.py"
"Overwrite the modeling_prismatic.py in your tpo-model's folder"
These manual steps are error-prone and make the setup process unclear
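For reference, the trajectory scoring mentioned under the Simpler-Env issue combines R_self, R_ext, and I_success into a single score. A minimal sketch follows; the function name, weights, and linear combination are illustrative assumptions, not the repository's actual cal_cost() code:

```python
# Illustrative sketch of a GCPG-style trajectory score. The three components
# (self-reward, external reward, success indicator) are the ones named in
# Data Collection/maniskill2_evaluator.py; the weights and the linear
# combination below are assumptions for illustration.

def trajectory_score(r_self: float, r_ext: float, success: bool,
                     w_self: float = 1.0, w_ext: float = 1.0,
                     w_succ: float = 1.0) -> float:
    """Combine R_self, R_ext, and I_success into one scalar score."""
    i_success = 1.0 if success else 0.0
    return w_self * r_self + w_ext * r_ext + w_succ * i_success
```

A scalar score like this is what makes trajectories comparable for preference generation; the stock evaluator's boolean success/failure cannot rank two successful rollouts against each other.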
Expected vs Actual Behavior
Expected (from README):
Functional real-world evaluation capabilities
Integrated trajectory evaluation with GCPG rewards in simulation environments
Preference generation during evaluation
Actual:
Real-world code is placeholder/incomplete
Simulation evaluation uses standard success/failure metrics only
Trajectory evaluation code exists separately but requires manual file overwriting
Setup process involves many confusing manual file replacement steps
Suggestions
Clarify documentation: Update README to clearly indicate which features are fully implemented vs. planned
Streamline installation: Replace manual file overwriting with proper configuration options or installation scripts
Provide integration instructions: Add clear steps for enabling trajectory evaluation features
Consider automation: Create scripts to automatically set up the modified evaluators instead of manual file replacement
Fix or remove: Either complete the Real-World cost/ directory or clearly mark it as experimental/incomplete
Environment
Analyzed codebase structure and dependencies
Compared Data Collection modifications with original Simpler-Env evaluators
Checked actual usage in test commands and evaluation scripts