Reference model for LoRA #4309

@SpaceHunterInf

Description

Greetings,

Quick question about our LoRA setup: are we initializing a separate reference model during training (e.g., for KL/anchor comparisons), or are we reusing the frozen base model as the reference since the base is frozen under PEFT?

If a separate reference is currently spun up, is there a way to configure the pipeline to reuse the base model as the reference to save memory/compute? Pointers to the relevant config flags or code path would be much appreciated.
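To make the question concrete, here is a minimal sketch of the memory-saving pattern being asked about: under LoRA the base weights stay frozen, so reference logits can in principle be recovered by simply disabling the adapter, rather than holding a second copy of the model. The `LoRALinear` class below is a hypothetical stand-in, not the actual pipeline code (in Hugging Face PEFT the analogous mechanism would be the `PeftModel.disable_adapter()` context manager).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a toggleable low-rank (LoRA) adapter."""
    def __init__(self, d_in, d_out, r=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad = False          # base stays frozen under PEFT/LoRA
        self.lora_a = nn.Linear(d_in, r, bias=False)
        self.lora_b = nn.Linear(r, d_out, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # standard LoRA init: B = 0
        self.adapter_enabled = True

    def forward(self, x):
        out = self.base(x)
        if self.adapter_enabled:
            out = out + self.lora_b(self.lora_a(x))
        return out

torch.manual_seed(0)
layer = LoRALinear(8, 8)
with torch.no_grad():
    layer.lora_b.weight.add_(0.1)            # pretend some adapter training happened

x = torch.randn(2, 8)
policy_out = layer(x)                        # policy: frozen base + adapter
layer.adapter_enabled = False
ref_out = layer(x)                           # reference: same frozen base, adapter off
layer.adapter_enabled = True

# The reference output is exactly the frozen base's output,
# so no separate reference model needs to live in memory.
assert torch.allclose(ref_out, layer.base(x))
assert not torch.allclose(policy_out, ref_out)
```

If the pipeline instead instantiates a distinct reference model, a flag that routes reference forward passes through this adapter-disabled path would give the same KL/anchor comparison at roughly half the model memory.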

Thank you very much.

Metadata

    Labels

⚡ PEFT (Related to PEFT) · ❓ question (Seeking clarification or more information)
