@Haichao-Zhang Haichao-Zhang commented Dec 19, 2023

This PR implements the SMODICE algorithm.
SMODICE is an offline imitation learning algorithm. It is related to GAIL in that it also uses a discriminator to learn the reward function, but unlike GAIL it does not require on-policy samples.

It is therefore more suitable for offline and off-policy settings.
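
To illustrate the discriminator-as-reward idea mentioned above, here is a minimal NumPy sketch. It is not the code in this PR: the function names `train_discriminator` and `discriminator_reward`, the logistic-regression discriminator, and the log-ratio reward form `log c(s) - log(1 - c(s))` are assumptions made for illustration. The point it shows is that the discriminator is fit on two fixed datasets (expert states and offline states), so no on-policy rollouts are needed.

```python
import numpy as np


def train_discriminator(expert_states, offline_states, lr=0.1, steps=500):
    """Fit a logistic-regression discriminator c(s) = sigmoid(w . s + b).

    Label 1 marks expert states, label 0 marks states from the offline
    dataset. Both inputs are fixed arrays of shape [N, state_dim], so
    training never queries the current policy.
    """
    x = np.concatenate([expert_states, offline_states], axis=0)
    y = np.concatenate(
        [np.ones(len(expert_states)), np.zeros(len(offline_states))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = x @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))  # predicted P(expert | s)
        grad = p - y  # gradient of the binary cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def discriminator_reward(states, w, b):
    """Assumed reward form: r(s) = log c(s) - log(1 - c(s)).

    For a sigmoid discriminator this is exactly the logit w . s + b.
    """
    return states @ w + b
```

States that look like expert data receive a higher reward than states that look like the rest of the offline data, and that reward can then be handed to an offline policy-learning step.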

Below is the return curve of SMODICE (orange) compared with SAC (red):

[image: return curves of SMODICE and SAC]


2 participants