Missing code in PPO training #3
Open
Description
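```python
# Generate and evaluate diffusion curves during training
if epoch % 10 == 0:  # generate diffusion curves once every 10 epochs
    observations, actions, rewards, next_observations, terminals = self.synth_er_generator.sample(
        num_samples=100000)
    # Use the generated samples here for policy optimization or other evaluation,
    # e.g. to update PPO's experience buffer, or directly to evaluate and improve the policy
```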
Thank you for sharing your work. Could you please add the missing code for this part?
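A minimal sketch of what the comments in the snippet describe, assuming the generator returns NumPy arrays in the same 5-tuple order (observations, actions, rewards, next observations, terminals). `HypotheticalSynthERGenerator`, `TransitionBuffer`, and `maybe_add_synthetic` are hypothetical names, not code from this repository, and the generator is a random stand-in rather than a diffusion model. It only shows the bookkeeping of sampling every 10 epochs and appending the samples to a buffer. Standard PPO also needs the behavior policy's log-probabilities and value estimates for each transition, which synthetic samples do not carry, so this is not a drop-in PPO update.

```python
import numpy as np


class HypotheticalSynthERGenerator:
    """Hypothetical stand-in for self.synth_er_generator.

    Returns random transitions in the same 5-tuple layout as the snippet;
    a real implementation would sample from a trained diffusion model.
    """

    def __init__(self, obs_dim, act_dim, seed=0):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.rng = np.random.default_rng(seed)

    def sample(self, num_samples):
        obs = self.rng.standard_normal((num_samples, self.obs_dim)).astype(np.float32)
        act = self.rng.standard_normal((num_samples, self.act_dim)).astype(np.float32)
        rew = self.rng.standard_normal(num_samples).astype(np.float32)
        next_obs = self.rng.standard_normal((num_samples, self.obs_dim)).astype(np.float32)
        term = self.rng.random(num_samples) < 0.01  # boolean episode-end flags
        return obs, act, rew, next_obs, term


class TransitionBuffer:
    """Hypothetical FIFO transition store that keeps the newest `capacity` rows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = None  # dict of arrays, created on the first add

    def add_batch(self, **arrays):
        if self.data is None:
            self.data = {k: v.copy() for k, v in arrays.items()}
        else:
            self.data = {k: np.concatenate([self.data[k], arrays[k]]) for k in self.data}
        # Drop the oldest transitions once capacity is exceeded
        self.data = {k: v[-self.capacity:] for k, v in self.data.items()}

    def __len__(self):
        return 0 if self.data is None else len(self.data["observations"])


def maybe_add_synthetic(epoch, generator, buffer, every=10, num_samples=100000):
    """Every `every` epochs, sample synthetic transitions and append them to the buffer.

    Returns True if samples were added this epoch.
    """
    if epoch % every != 0:
        return False
    obs, act, rew, next_obs, term = generator.sample(num_samples=num_samples)
    buffer.add_batch(observations=obs, actions=act, rewards=rew,
                     next_observations=next_obs, terminals=term)
    return True
```

For example, calling `maybe_add_synthetic(epoch, gen, buf, num_samples=1000)` for epochs 0 through 24 adds samples at epochs 0, 10 and 20; with `capacity=2500` the buffer then holds the newest 2500 of those 3000 transitions.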