
Mismatch between documented features and actual implementation - Missing trajectory evaluation integration #20

@skyswordw

Description

## Summary

After analyzing the codebase, I found several discrepancies between what is documented in the README and what is actually implemented and integrated.

## Issues Identified

1. **Real-World cost directory is unused**
   - The `Real-World cost/` directory contains code but is never imported or referenced by any other part of the codebase.
   - No actual real-world evaluation pipeline exists.
   - The README correctly states "Support for the Real-World environment will be provided later" but does not clarify that the current code is non-functional.
2. **Simpler-Env trajectory evaluation is not integrated**
   - Code exists: `Data Collection/maniskill2_evaluator.py` contains a complete GCPG (Guided-cost Preference Generation) implementation with:
     - stage-wise cost calculation (`cal_cost()`),
     - external reward computation (`external_score_option1/2()`),
     - final trajectory scoring combining the `R_self`, `R_ext`, and `I_success` components.
   - But not used: the actual evaluation in `test_cmds.txt` and `Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py` only returns simple success/failure boolean values.
   - Manual integration required: the README says "Overwrite ./Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py with ./Data Collection/maniskill2_evaluator.py", but this step appears to be neither automated nor integrated into the main workflow.
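For context, the final trajectory score named above (combining `R_self`, `R_ext`, and `I_success`) might look roughly like the following. This is an illustrative sketch only; the weights, names, and combination rule are my assumptions, and the repository's actual logic lives in `Data Collection/maniskill2_evaluator.py`:

```python
def trajectory_score(r_self: float, r_ext: float, success: bool,
                     w_self: float = 1.0, w_ext: float = 1.0,
                     success_bonus: float = 1.0) -> float:
    """Hypothetical combination of the three GCPG score components.

    r_self        -- self-reward from stage-wise cost (cal_cost())
    r_ext         -- external reward (external_score_option1/2())
    success       -- the I_success indicator from the episode outcome
    The weights and bonus are illustrative, not the repo's values.
    """
    return w_self * r_self + w_ext * r_ext + (success_bonus if success else 0.0)
```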
3. **LIBERO evaluation lacks trajectory scoring**
   - Similar issue: `Data Collection/libero_data_collect.py` exists, but the standard LIBERO evaluation does not use trajectory-level preference generation.
4. **Installation instructions are confusing**
   - The README installation section contains many manual file-overwriting operations:
     - "Overwrite ./Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py with ./Data Collection/maniskill2_evaluator.py"
     - "Overwrite experiments/robot/libero/run_libero_eval.py with ./Data Collection/libero_data_collect.py"
     - "Overwrite the modeling_prismatic.py in your tpo-model's folder"
   - These manual steps are error-prone and make the setup process unclear.
## Expected vs. Actual Behavior

Expected (from the README):
- Functional real-world evaluation capabilities
- Integrated trajectory evaluation with GCPG rewards in simulation environments
- Preference generation during evaluation

Actual:
- Real-world code is a placeholder / incomplete
- Simulation evaluation uses standard success/failure metrics only
- Trajectory evaluation code exists separately but requires manual file overwriting
- The setup process involves many confusing manual file-replacement steps
## Suggestions

- Clarify documentation: update the README to clearly indicate which features are fully implemented vs. planned.
- Streamline installation: replace manual file overwriting with proper configuration options or installation scripts.
- Provide integration instructions: add clear steps for enabling trajectory evaluation features.
- Consider automation: create scripts to automatically set up the modified evaluators instead of relying on manual file replacement.
- Fix or remove: either fix the Real-World directory or clearly mark it as experimental/incomplete.
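As a starting point for the automation suggestion, the two README overwrite steps with fixed destinations could be wrapped in a small helper like the sketch below. The path pairs mirror the README; `apply_overwrites` and its backup behavior are my invention, not existing repo code. The third overwrite (`modeling_prismatic.py` "in your tpo-model's folder") is deliberately omitted because its destination depends on the user's model directory:

```python
import shutil
from pathlib import Path

# Source -> destination pairs taken from the README's manual overwrite steps.
# The modeling_prismatic.py step is omitted: its target path is user-specific.
OVERWRITES = [
    ("Data Collection/maniskill2_evaluator.py",
     "Simpler-env/simpler_env/evaluation/maniskill2_evaluator.py"),
    ("Data Collection/libero_data_collect.py",
     "experiments/robot/libero/run_libero_eval.py"),
]

def apply_overwrites(repo_root: str, pairs=OVERWRITES, backup: bool = True) -> None:
    """Copy each modified evaluator over its upstream counterpart.

    If backup is True, an existing destination is first saved with a
    .orig suffix so the stock evaluator can be restored later.
    """
    root = Path(repo_root)
    for src, dst in pairs:
        src_path, dst_path = root / src, root / dst
        if backup and dst_path.exists():
            shutil.copy2(dst_path, dst_path.with_suffix(dst_path.suffix + ".orig"))
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_path, dst_path)
```

A one-off script like this (or an equivalent `setup.sh`) would replace the error-prone manual copy steps and make the intended evaluator wiring reproducible.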
## Environment

- Analyzed the codebase structure and dependencies
- Compared the Data Collection modifications with the original Simpler-Env evaluators
- Checked actual usage in test commands and evaluation scripts
