Missing code in PPO training

# Generate and evaluate diffusion curves during training
if epoch % 10 == 0:  # generate diffusion curves once every 10 epochs
    observations, actions, rewards, next_observations, terminals = self.synth_er_generator.sample(
        num_samples=100000)
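    # Use the generated samples here for policy optimization or other evaluation
    # For example, these samples could be used to update PPO's experience buffer, or directly to evaluate and improve the policy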

Thank you for sharing your work. Could you please add this missing part of the code?
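
To make the request concrete, here is a minimal sketch of the kind of step I imagine might go there. It is only a guess, not the repository's actual implementation: `DummyGenerator` is a hypothetical stand-in for `self.synth_er_generator` (I only assume it exposes `sample(num_samples=...)` returning five arrays, as in the snippet above), and `TransitionBuffer` and `maybe_add_synthetic` are names I made up for illustration.

```python
import numpy as np


class DummyGenerator:
    """Hypothetical stand-in for self.synth_er_generator; returns random transitions."""

    def __init__(self, obs_dim=4, act_dim=2, seed=0):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.rng = np.random.default_rng(seed)

    def sample(self, num_samples):
        obs = self.rng.normal(size=(num_samples, self.obs_dim))
        act = self.rng.normal(size=(num_samples, self.act_dim))
        rew = self.rng.normal(size=(num_samples,))
        next_obs = self.rng.normal(size=(num_samples, self.obs_dim))
        done = self.rng.random(num_samples) < 0.05
        return obs, act, rew, next_obs, done


class TransitionBuffer:
    """Hypothetical FIFO buffer that keeps at most `capacity` transitions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = None  # tuple of five arrays once the first batch is added

    def add_batch(self, batch):
        if self.data is None:
            merged = tuple(np.asarray(x) for x in batch)
        else:
            merged = tuple(
                np.concatenate([old, np.asarray(new)])
                for old, new in zip(self.data, batch)
            )
        # Keep only the most recent `capacity` transitions.
        self.data = tuple(x[-self.capacity:] for x in merged)

    def __len__(self):
        return 0 if self.data is None else len(self.data[0])


def maybe_add_synthetic(epoch, generator, buffer, every=10, num_samples=1000):
    """Every `every` epochs, sample synthetic transitions and add them to the buffer."""
    if epoch % every != 0:
        return False
    buffer.add_batch(generator.sample(num_samples=num_samples))
    return True
```

In the training loop this would be called once per epoch, e.g. `maybe_add_synthetic(epoch, self.synth_er_generator, buffer)`. How the synthetic transitions are then used in the PPO update (mixed with real rollouts, used only for evaluation, etc.) is exactly the part I cannot infer from the released code.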