
Support composite calibration dataset #16

Open
ivanl-cerebras wants to merge 4 commits into main from il/composite_dataset

Conversation

@ivanl-cerebras
Contributor

@ivanl-cerebras ivanl-cerebras commented Mar 17, 2026

This PR adds:

  • Support for tool calling datasets (Salesforce/xlam-function-calling-60k, SWE-bench/SWE-smith-trajectories)
  • Support for composite dataset specification (format used: theblackcat102/evol-codealpaca-v1:128,Salesforce/xlam-function-calling-60k:128,open-r1/Mixture-of-Thoughts[code]:128,open-r1/Mixture-of-Thoughts[math]:128,open-r1/Mixture-of-Thoughts[science]:128,SWE-bench/SWE-smith-trajectories(tool):128)
  • Support for batch size > 1 during calibration (handles construction of padding masks and their treatment when computing pruning metrics)
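The composite dataset string above follows the pattern `name[subset](mode):count`, comma-separated. A minimal parser sketch for that grammar, assuming the exact grammar from the examples in this PR (the function and regex names are illustrative, not the PR's actual code):

```python
import re

# Hypothetical grammar inferred from the spec strings in this PR:
#   dataset-path [optional [subset]] [optional (mode)] : sample-count
SPEC_RE = re.compile(
    r"^(?P<name>[^\[\(:]+)"          # dataset path, e.g. open-r1/Mixture-of-Thoughts
    r"(?:\[(?P<subset>[^\]]+)\])?"   # optional subset, e.g. [code]
    r"(?:\((?P<mode>[^\)]+)\))?"     # optional mode flag, e.g. (tool)
    r":(?P<count>\d+)$"              # number of calibration samples
)

def parse_composite_spec(spec: str):
    """Split a comma-separated composite spec into (name, subset, mode, count) tuples."""
    parts = []
    for item in spec.split(","):
        m = SPEC_RE.match(item.strip())
        if m is None:
            raise ValueError(f"Malformed dataset spec: {item!r}")
        parts.append((m["name"], m["subset"], m["mode"], int(m["count"])))
    return parts
```

For example, `SWE-bench/SWE-smith-trajectories(tool):128` would parse to the dataset path, no subset, the `tool` mode, and 128 samples.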

Eval results for a trial run

Model: Qwen/Qwen3-30B-A3B
Compression level: 0.25
Dataset: "theblackcat102/evol-codealpaca-v1:8,Salesforce/xlam-function-calling-60k:8,open-r1/Mixture-of-Thoughts[code]:8,open-r1/Mixture-of-Thoughts[math]:8,open-r1/Mixture-of-Thoughts[science]:8,SWE-bench/SWE-smith-trajectories(tool):8"
Batch size: 16
MSL: 2048

HumanEval / HumanEval+: 0.9207 / 0.8780
MBPP / MBPP+: 0.8492 / 0.7222

@ivanl-cerebras ivanl-cerebras marked this pull request as ready for review March 19, 2026 11:27
@ivanl-cerebras
Contributor Author

ivanl-cerebras commented Mar 19, 2026

cc @nikolail-cerebras to review

Contributor

@mklasby mklasby left a comment


LGTM; minor changes to clean up the leftover debugger statements, and we can set the attention mask in a context manager to make this more robust.

tokenizer.save_pretrained(merged_model_dir)
except Exception as e:
    import pdb; breakpoint()

Let's raise here instead
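A sketch of what the reviewer is asking for: surface the failure with context instead of dropping into a debugger. The wrapper function and the `RuntimeError` message are illustrative assumptions; only the `raise ... from e` pattern is the point.

```python
def save_tokenizer(tokenizer, merged_model_dir: str) -> None:
    """Re-raise with context instead of entering pdb (per review comment).

    `tokenizer` is assumed to expose a Hugging Face-style
    `save_pretrained(dir)` method, as in the surrounding PR code.
    """
    try:
        tokenizer.save_pretrained(merged_model_dir)
    except Exception as e:
        # Chain the original exception so the traceback keeps the root cause.
        raise RuntimeError(
            f"Failed to save tokenizer to {merged_model_dir}"
        ) from e
```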

self._current_attention_mask: Optional[torch.Tensor] = None
super().__init__(model, hook_config)

def set_attention_mask(self, attention_mask: Optional[torch.Tensor]):

Rewrite as a context manager, or manage it in a pre-forward hook attached to the full model
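One way the context-manager variant could look, as a minimal sketch: the `Observer` class name stands in for the hook class in this PR, and only the mask bookkeeping is shown. The key property is that the mask is cleared in `finally`, so a failing forward pass cannot leave a stale mask behind.

```python
from contextlib import contextmanager
from typing import Optional


class Observer:
    """Illustrative stand-in for the PR's hook/observer class."""

    def __init__(self) -> None:
        self._current_attention_mask: Optional[object] = None

    @contextmanager
    def set_attention_mask(self, attention_mask):
        """Install the mask for the duration of one forward pass.

        Restores the previous value even if the forward raises, so a
        stale mask can never leak into the next batch.
        """
        previous = self._current_attention_mask
        self._current_attention_mask = attention_mask
        try:
            yield self
        finally:
            self._current_attention_mask = previous
```

Usage would then be `with observer.set_attention_mask(mask): model(**sample)`, which also composes naturally with the suggested-change snippet further down.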

logger.info("No previous data found @ %s", f_name)
for sample in tqdm(cat_data, desc=f"Processing {category} samples"):
model(sample.to(model.device))
attention_mask = sample.get("attention_mask", None)

Suggested change

attention_mask = sample.get("attention_mask", None)

attn_mask = sample.get("attention_mask", None)
with observer.set_attention_mask(attn_mask):
    ...
    model(**sample)
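The point of threading the attention mask through to the observer is the PR's third bullet: with batch size > 1, padded positions must be excluded when accumulating pruning statistics. A minimal sketch of that treatment, using NumPy for illustration (the function name and shapes are assumptions, not the PR's actual implementation):

```python
import numpy as np


def masked_mean_activation(acts: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Per-hidden-unit mean activation over real (non-padded) tokens.

    acts:            (batch, seq, hidden) activations
    attention_mask:  (batch, seq), 1 for real tokens, 0 for padding
    """
    mask = attention_mask.astype(acts.dtype)[..., None]  # (batch, seq, 1)
    total = (acts * mask).sum(axis=(0, 1))               # sum over real tokens only
    count = mask.sum()                                   # number of real tokens
    return total / count
```

Without the mask, pad-position activations would bias the statistic toward whatever the model emits on padding, which grows worse as batches (and hence padding) get larger.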
