GC-DPR works in two steps:
- The first step runs a full-batch forward pass without gradients to get the full-batch contrastive loss and the corresponding gradient with respect to the embeddings.
- The second step runs mini-batch forward passes, assigns the cached embedding gradients, and then does the backward pass. The mini-batches loop through the full batch so that all parameter gradients are computed and accumulated (see the sketch after this list).
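For reference, here is a minimal PyTorch sketch of those two steps, assuming a simple in-batch-negative contrastive loss; the encoder and helper names are hypothetical and this is not the actual GC-DPR code:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb, p_emb):
    # In-batch negatives: each query's positive passage sits at the same index.
    scores = q_emb @ p_emb.t()
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, labels)

def gradient_cache_step(q_encoder, p_encoder, queries, passages, chunk_size):
    # Step 1: full-batch forward without gradients to get the embeddings.
    with torch.no_grad():
        q_emb = torch.cat([q_encoder(c) for c in queries.split(chunk_size)])
        p_emb = torch.cat([p_encoder(c) for c in passages.split(chunk_size)])

    # Full-batch loss and the gradient w.r.t. the embeddings (the "cache").
    q_emb, p_emb = q_emb.requires_grad_(), p_emb.requires_grad_()
    loss = contrastive_loss(q_emb, p_emb)
    loss.backward()
    q_grad_cache, p_grad_cache = q_emb.grad, p_emb.grad

    # Step 2: mini-batch forward with gradients; push the cached embedding
    # gradients through the encoder via a surrogate dot product and accumulate.
    for chunk, grad in zip(queries.split(chunk_size), q_grad_cache.split(chunk_size)):
        rep = q_encoder(chunk)
        (rep * grad).sum().backward()
    for chunk, grad in zip(passages.split(chunk_size), p_grad_cache.split(chunk_size)):
        rep = p_encoder(chunk)
        (rep * grad).sum().backward()
    return loss.detach()
```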
However, there may be one issue with this computation:
- The backbone model uses randomized dropout, which makes steps 1 and 2 inconsistent: the dropout masks drawn in step 1 differ from those drawn in step 2, so the embedding gradients from step 1 cannot be applied directly to step 2's forward pass. Either the gradients would have to be recomputed for every mini-batch, or a somewhat more sophisticated mechanism is needed to keep the two passes consistent (see the sketch after this list).
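One such mechanism, sketched below, is to capture the RNG state before each step-1 chunk forward and restore it before the corresponding step-2 forward, so both passes draw the same dropout masks. The names here are hypothetical and GC-DPR's actual fix may differ:

```python
import torch

class RNGCheckpoint:
    """Captures the CPU (and, if applicable, CUDA) RNG state so a later
    forward pass replays the same dropout masks."""
    def __init__(self, device):
        self.device = device
        self.cpu_state = torch.get_rng_state()
        self.cuda_state = (torch.cuda.get_rng_state(device)
                           if device.type == "cuda" else None)

    def restore(self):
        torch.set_rng_state(self.cpu_state)
        if self.cuda_state is not None:
            torch.cuda.set_rng_state(self.cuda_state, self.device)

# Usage inside the gradient-cache loop (sketch):
#
# Step 1: record the RNG state just before each no-grad chunk forward.
#     states, emb_chunks = [], []
#     for chunk in batch.split(chunk_size):
#         states.append(RNGCheckpoint(chunk.device))
#         with torch.no_grad():
#             emb_chunks.append(encoder(chunk))
#
# Step 2: restore the recorded state before the matching grad-enabled forward,
# so dropout in step 2 matches step 1 and the cached gradients stay valid.
#     for chunk, grad, state in zip(batch.split(chunk_size), grad_cache, states):
#         state.restore()
#         rep = encoder(chunk)
#         (rep * grad).sum().backward()
```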