SMODICE Algorithm #1583

Haichao-Zhang · 2023-12-19T00:30:41Z

This PR implements the SMODICE algorithm.
SMODICE is an offline imitation learning algorithm. It is related to GAIL in that it also uses a discriminator to learn the reward function, but unlike GAIL it does not require on-policy samples.

Therefore, it is more suitable for offline and off-policy cases.
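
For readers unfamiliar with discriminator-based rewards, here is a minimal sketch of the idea, not the code in this PR. It assumes a discriminator that outputs the probability `c` that a sample comes from the expert data rather than the offline dataset, and uses the log-ratio `log(c / (1 - c))` as the reward. The function names and the `eps` clamp are hypothetical and chosen only for illustration.

```python
import math


def discriminator_reward(p_expert: float, eps: float = 1e-8) -> float:
    """Reward log(c / (1 - c)) from the discriminator's expert probability c.

    Hypothetical helper: positive when the discriminator thinks the sample
    looks like expert data, negative when it looks like offline data.
    """
    c = min(max(p_expert, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(c) - math.log(1.0 - c)


def discriminator_loss(expert_probs, offline_probs, eps: float = 1e-8) -> float:
    """Binary cross-entropy with expert samples labeled 1 and offline samples labeled 0.

    Both arguments are lists of discriminator outputs in (0, 1). No on-policy
    rollouts are involved: both batches come from fixed datasets.
    """
    expert_term = -sum(math.log(max(p, eps)) for p in expert_probs) / len(expert_probs)
    offline_term = -sum(math.log(max(1.0 - p, eps)) for p in offline_probs) / len(offline_probs)
    return expert_term + offline_term
```

Because both batches are drawn from stored data, the discriminator can be trained without interacting with the environment, which is the property that makes this kind of reward usable offline.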

Below is the return curve of SMODICE [orange] compared with SAC [red]:

Smodice Algorithm

92e40ba

Haichao-Zhang requested a review from emailweixu December 19, 2023 00:30

Haichao-Zhang added 2 commits December 18, 2023 16:40

Add to train-play test

944496e

update docstr

82c59d1

Haichao-Zhang force-pushed the PR_smodice branch from 6763d36 to 82c59d1 Compare December 19, 2023 06:03

Haichao-Zhang added 2 commits January 6, 2024 00:11

smodice bipedal walker

679aa11

Add gradient penalty

7065cda

Haichao-Zhang force-pushed the PR_smodice branch from 33ede27 to 7065cda Compare January 9, 2024 01:14
