
Mismatch between documented features and actual implementation - Missing trajectory evaluation integration #20

@skyswordw

Description

## Summary

After analyzing the codebase, I found several discrepancies between what is documented in the README and what is actually implemented and integrated.

## Issues Identified

1. **Real-World cost directory is unused**
   - The `Real-World cost/` directory contains code but is never imported or referenced by any other part of the codebase.
   - No actual real-world evaluation pipeline exists.
   - The README correctly states "Support for the Real-World environment will be provided later" but does not clarify that the current code is non-functional.
2. **Simpler-Env trajectory evaluation is not integrated**
   - Code exists: `Data Collection/maniskill2_evaluator.py` contains a complete GCPG (Guided-cost Preference Generation) implementation with:
     - stage-wise cost calculation (`cal_cost()`),
     - external reward computation (`external_score_option1/2()`),
     - final trajectory scoring combining the `R_self`, `R_ext`, and `I_success` components.
   - But not used: the actual evaluation in `test_cmds.txt` and `Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py` only returns simple success/failure boolean values.
   - Manual integration required: the README says "Overwrite ./Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py with ./Data Collection/maniskill2_evaluator.py", but this step appears to be neither automated nor integrated into the main workflow.
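For context, the final trajectory score named above (combining `R_self`, `R_ext`, and `I_success`) might look roughly like the following. This is an illustrative sketch only; the weights, names, and combination rule are my assumptions, and the repository's actual logic lives in `Data Collection/maniskill2_evaluator.py`:

```python
def trajectory_score(r_self: float, r_ext: float, success: bool,
                     w_self: float = 1.0, w_ext: float = 1.0,
                     success_bonus: float = 1.0) -> float:
    """Hypothetical combination of the three GCPG score components.

    r_self        -- self-reward from stage-wise cost (cal_cost())
    r_ext         -- external reward (external_score_option1/2())
    success       -- the I_success indicator from the episode outcome
    The weights and bonus are illustrative, not the repo's values.
    """
    return w_self * r_self + w_ext * r_ext + (success_bonus if success else 0.0)
```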
3. **LIBERO evaluation lacks trajectory scoring**
   - Similar issue: `Data Collection/libero_data_collect.py` exists, but the standard LIBERO evaluation does not use trajectory-level preference generation.
4. **Installation instructions are confusing**
   - The README installation section contains many manual file-overwriting operations:
     - "Overwrite ./Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py with ./Data Collection/maniskill2_evaluator.py"
     - "Overwrite experiments/robot/libero/run_libero_eval.py with ./Data Collection/libero_data_collect.py"
     - "Overwrite the modeling_prismatic.py in your tpo-model's folder"
   - These manual steps are error-prone and make the setup process unclear.
## Expected vs. Actual Behavior

Expected (from the README):
- Functional real-world evaluation capabilities
- Integrated trajectory evaluation with GCPG rewards in simulation environments
- Preference generation during evaluation

Actual:
- Real-world code is a placeholder / incomplete
- Simulation evaluation uses standard success/failure metrics only
- Trajectory evaluation code exists separately but requires manual file overwriting
- The setup process involves many confusing manual file-replacement steps
## Suggestions

- Clarify documentation: update the README to clearly indicate which features are fully implemented vs. planned.
- Streamline installation: replace manual file overwriting with proper configuration options or installation scripts.
- Provide integration instructions: add clear steps for enabling trajectory evaluation features.
- Consider automation: create scripts to automatically set up the modified evaluators instead of relying on manual file replacement.
- Fix or remove: either fix the Real-World directory or clearly mark it as experimental/incomplete.
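As a starting point for the automation suggestion, the two README overwrite steps with fixed destinations could be wrapped in a small helper like the sketch below. The path pairs mirror the README; `apply_overwrites` and its backup behavior are my invention, not existing repo code. The third overwrite (`modeling_prismatic.py` "in your tpo-model's folder") is deliberately omitted because its destination depends on the user's model directory:

```python
import shutil
from pathlib import Path

# Source -> destination pairs taken from the README's manual overwrite steps.
# The modeling_prismatic.py step is omitted: its target path is user-specific.
OVERWRITES = [
    ("Data Collection/maniskill2_evaluator.py",
     "Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py"),
    ("Data Collection/libero_data_collect.py",
     "experiments/robot/libero/run_libero_eval.py"),
]

def apply_overwrites(repo_root: str, pairs=OVERWRITES, backup: bool = True) -> None:
    """Copy each modified evaluator over its upstream counterpart.

    If backup is True, an existing destination is first saved with a
    .orig suffix so the stock evaluator can be restored later.
    """
    root = Path(repo_root)
    for src, dst in pairs:
        src_path, dst_path = root / src, root / dst
        if backup and dst_path.exists():
            shutil.copy2(dst_path, dst_path.with_suffix(dst_path.suffix + ".orig"))
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_path, dst_path)
```

A one-off script like this (or an equivalent `setup.sh`) would replace the error-prone manual copy steps and make the intended evaluator wiring reproducible.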
## Environment

- Analyzed the codebase structure and dependencies
- Compared the Data Collection modifications with the original Simpler-Env evaluators
- Checked actual usage in test commands and evaluation scripts
