Add backward logprobs and estimator_outputs in Trajectories #396
Conversation
josephdviviano left a comment:
Some questions / comments, but I do like this direction quite a bit! Thank you!
```python
terminating_idx: torch.Tensor | None = None,
is_backward: bool = False,
log_rewards: torch.Tensor | None = None,
log_probs: torch.Tensor | None = None,
```
This is going to be an annoying change but I suspect we should be explicit about forward_log_probs and forward_estimator_outputs in the namespace as well.
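For concreteness, a hypothetical sketch of that explicit naming; the forward_* / backward_* parameter names are the suggestion above, not the merged API, and the other parameters are taken from the quoted diff:

```python
import torch

class Trajectories:
    # Sketch only: explicit forward_*/backward_* names instead of bare
    # log_probs / estimator_outputs.
    def __init__(
        self,
        terminating_idx: torch.Tensor | None = None,
        is_backward: bool = False,
        log_rewards: torch.Tensor | None = None,
        forward_log_probs: torch.Tensor | None = None,  # previously: log_probs
        forward_estimator_outputs: torch.Tensor | None = None,
        backward_log_probs: torch.Tensor | None = None,
        backward_estimator_outputs: torch.Tensor | None = None,
    ) -> None:
        self.terminating_idx = terminating_idx
        self.is_backward = is_backward
        self.log_rewards = log_rewards
        self.forward_log_probs = forward_log_probs
        self.forward_estimator_outputs = forward_estimator_outputs
        self.backward_log_probs = backward_log_probs
        self.backward_estimator_outputs = backward_estimator_outputs
```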
```python
    self.backward_estimator_outputs.shape[: len(self.states.batch_shape)]
    == self.actions.batch_shape
    and self.backward_estimator_outputs.is_floating_point()
)
```
We could probably write private _check_estimator_outputs() and _check_log_probs() methods that deduplicate this logic.
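For instance, a minimal sketch of the deduplicated check as a private method on Trajectories; the helper name follows the comment, and having it take the tensor to validate is an assumption:

```python
def _check_estimator_outputs(self, estimator_outputs: torch.Tensor) -> bool:
    """Shared shape/dtype validation for forward and backward estimator outputs."""
    return (
        estimator_outputs.shape[: len(self.states.batch_shape)]
        == self.actions.batch_shape
        and estimator_outputs.is_floating_point()
    )
```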
```diff
 def terminating_states(self) -> States:
-    """The terminating states of the trajectories.
+    """The terminating states of the trajectories. If backward, the terminating states
+    are in 0-th position.
```
Worth explicitly stating whether these are s0 or s0+1?
Worth specifying the terminating state of the backward trajectory is s0, and the terminating state of the reversed forward trajectory is NOT s0.
We need to be careful about the definitions of reversed and backward. A backward trajectory (evaluated under pb) runs in the opposite direction of a forward trajectory (evaluated under pf). Either can be reversed, but the reverse of a forward trajectory still corresponds to pf, and a reversed backward trajectory still corresponds to pb.
Can we add these definitions here and ensure that the language is consistent throughout?
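A schematic example of the proposed terminology, with plain Python lists standing in for trajectory objects (state names are illustrative only):

```python
# A forward trajectory is sampled and evaluated under pf.
forward = ["s0", "s1", "s2", "sf"]
reversed_forward = forward[::-1]  # still a pf object, just stored in reverse order

# A backward trajectory is sampled and evaluated under pb.
backward = ["s2", "s1", "s0"]
reversed_backward = backward[::-1]  # still a pb object, just stored in reverse order
```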
```python
    (max_len + 2, len(self)), device=states.device
)
# shape (max_len + 2, n_trajectories, *state_dim)
actions = self.actions  # shape (max_len, n_trajectories, *action_dim)
```
Is it now possible to replace reverse_backward_trajectories() with simply reverse() which works on either forward or backward trajectories?
We could keep reverse_backward_trajectories() as essentially an alias:
```python
def reverse_backward_trajectories(self):
    assert self.is_backward
    return self.reverse()
```
I'm wondering if we should have @property attributes on both Transitions and Trajectories, self.has_forward and self.has_backward, for readability.
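A minimal sketch of those properties, assuming the explicit forward_/backward_ attribute names floated earlier in this review; a shared mixin is just one way to avoid duplicating them, and all names here are hypothetical:

```python
import torch

class _DirectionalContainerMixin:
    # Hypothetical mixin shared by Transitions and Trajectories.
    forward_log_probs: torch.Tensor | None
    backward_log_probs: torch.Tensor | None

    @property
    def has_forward(self) -> bool:
        # True when forward log-probs were populated during sampling.
        return self.forward_log_probs is not None

    @property
    def has_backward(self) -> bool:
        # True when backward log-probs were populated during sampling.
        return self.backward_log_probs is not None
```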
```python
Raises:
    ValueError: If backward transitions are provided.
"""
if transitions.is_backward:
```
I think we need some kind of check here. For example, transitions.has_forward()
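For example, reusing the hypothetical has_forward property sketched above:

```python
if not transitions.has_forward:
    raise ValueError("Forward log-probs are required to compute this loss.")
```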
```diff
 optimizer.zero_grad()
-loss = gflownet.loss(env, buffer_trajectories, recalculate_all_logprobs=True)
+loss = gflownet.loss(env, buffer_trajectories, recalculate_all_logprobs=False)
```
This is because the forward logprobs are already populated? How?
```diff
 if self.is_backward:
-    # terminating states (can be passed to log_reward fn)
+    # [IMPORTANT ASSUMPTION] When backward sampling, all provided states are the
+    # *terminating* states (can be passed to log_reward fn)
```
Why do we need this assumption?
```python
    transition.
"""
if transitions.is_backward:
    raise ValueError("Backward transitions are not supported")
```
I'm confused about how this will work in practice. I suspect we still need some kind of assertion here.
Description
- Add backward_logprobs and backward_estimator_outputs in Trajectories, so that they don't have to be recomputed every time to get the loss.
- Support log_rewards for backward sampling.
- Remove the is_backward flag from Transitions, since we do not properly support it.

Discussion needed
We can't calculate loss with "backward" Trajectories. We always need to use .reverse_backward_trajectories before the loss calculation. The question is, is it worth supporting direct loss calculation using the backward Trajectories? This won't be too hard to implement.

This was also the case for the Transitions (although I removed the backward flag from Transitions; this can be easily reverted). If we want to support training with backward Trajectories, do we also want the same for the backward Transitions?
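For reference, a hedged sketch of the current workflow; pb_sampler, env, and gflownet stand in for objects from the training scripts, and the exact call signatures are assumptions:

```python
# Sample under pb, reverse into forward order, then compute the loss.
# The backward log-probs stored during sampling can be reused by the loss,
# hence recalculate_all_logprobs=False as in the diff above.
backward_trajectories = pb_sampler.sample_trajectories(env, n=16)
trajectories = backward_trajectories.reverse_backward_trajectories()
loss = gflownet.loss(env, trajectories, recalculate_all_logprobs=False)
```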
TODO
Currently, we can store only one of the PF or PB during sampling (i.e., we always need to call our NN module for loss calculation). My next PR will resolve this by calculating both PF and PB within a single sampling process.