missing code in ppo train #3

@shijunticc

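Generate and evaluate diffusion curves during training

if epoch % 10 == 0:  # generate diffusion curves once every 10 epochs
    observations, actions, rewards, next_observations, terminals = self.synth_er_generator.sample(
        num_samples=100000)
    # Use the generated samples here for policy optimization or other evaluation
    # For example, these samples could update PPO's experience pool, or be used directly to evaluate and improve the policy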
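
For reference, here is a rough sketch of what the placeholder might be expected to do. It is only a guess. `TransitionBuffer`, its fields, and its methods are hypothetical names that do not come from the repository, and the sketch assumes `self.synth_er_generator.sample` returns five array-like objects with the same first dimension. It only stores the sampled transitions and draws minibatches from them. PPO is on-policy, so how synthetic transitions should actually enter the update is left open.

```python
import numpy as np


class TransitionBuffer:
    """Hypothetical fixed-capacity FIFO store for transitions (not from the repository)."""

    FIELDS = ("observations", "actions", "rewards", "next_observations", "terminals")

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.data = {name: None for name in self.FIELDS}

    def __len__(self):
        first = self.data[self.FIELDS[0]]
        return 0 if first is None else len(first)

    def add_batch(self, *arrays):
        """Append one batch (arrays in FIELDS order), keeping only the newest `capacity` rows."""
        if len(arrays) != len(self.FIELDS):
            raise ValueError("expected one array per field")
        arrays = [np.asarray(a) for a in arrays]
        n = len(arrays[0])
        if any(len(a) != n for a in arrays):
            raise ValueError("all arrays must share the same first dimension")
        for name, arr in zip(self.FIELDS, arrays):
            old = self.data[name]
            merged = arr if old is None else np.concatenate([old, arr], axis=0)
            # Negative slicing drops the oldest rows once capacity is exceeded.
            self.data[name] = merged[-self.capacity:]

    def sample(self, batch_size, rng):
        """Draw a uniform random minibatch (with replacement) as a dict of arrays."""
        if len(self) == 0:
            raise ValueError("buffer is empty")
        idx = rng.integers(0, len(self), size=batch_size)
        return {name: values[idx] for name, values in self.data.items()}
```

Inside the `if epoch % 10 == 0:` branch, the five sampled arrays could then be passed to `add_batch(observations, actions, rewards, next_observations, terminals)` on a buffer instance owned by the trainer, assuming such an attribute were added.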

Thank you for sharing your work. Could you please add the missing code for this part?
