Update GBS32 RCPs #1

Closed
ShriyaRishab wants to merge 1 commit into master from shriya/fix-gptoss-gbs32-rcps
@ShriyaRishab

import json
import os

# Convergence threshold: a run counts as converged at the first eval whose value is <= target.
target = 3.34

for gbs in ['gbs32']:
    print(f'=== {gbs} ===')
    results = []
    for i in range(20):
        logfile = os.path.join(gbs, f'run_{i}.log')
        samples = None
        with open(logfile) as f:
            for line in f:
                if '"key": "eval_accuracy"' not in line:
                    continue
                # extract the JSON after :::MLLOG
                json_str = line.split(':::MLLOG ', 1)[-1]
                entry = json.loads(json_str)
                if entry['value'] <= target:
                    samples = entry['metadata']['samples_count']
                    break
        results.append(samples)
        print(f'  run_{i}: {samples}')
    print(f'\nAll values: {results}')
    print()

Running this parsing script from small_llm_moe_pretraining/primus/rcp_logs gives a different number of samples to converge for GBS32 than the existing RCPs, but the new numbers look more realistic.

GBS16 takes about 195k samples to converge.
The old GBS32 RCPs converged in ~179k samples, which doesn't make sense: GBS32 would be expected to need at least as many samples as GBS16.
The new GBS32 RCPs converge in ~235k samples, which is much more reasonable and matches the logs provided.
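
The comparison above can be sketched as a small sanity check. This is only an illustration, not part of the RCP tooling: the helper names `mean_samples_to_converge` and `is_plausible` are hypothetical, the figures are the rounded values quoted in this description, and the rule that samples to converge should not decrease as the global batch size grows is an assumption that reflects the reasoning here, not a guarantee.

```python
from statistics import mean


def mean_samples_to_converge(samples):
    """Average samples-to-converge over converged runs.

    `samples` is a list like the `results` list printed by the parsing
    script: one entry per run, with None for runs that never hit the target.
    """
    converged = [s for s in samples if s is not None]
    if not converged:
        raise ValueError("no converged runs")
    return mean(converged)


def is_plausible(samples_by_gbs):
    """Return True if samples-to-converge never decreases as GBS grows.

    Assumption (hypothetical rule of thumb): a larger global batch size
    should need at least as many samples as a smaller one.
    """
    ordered = [samples_by_gbs[g] for g in sorted(samples_by_gbs)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Rounded figures quoted in this description, keyed by global batch size.
old_rcps = {16: 195_000, 32: 179_000}
new_rcps = {16: 195_000, 32: 235_000}

print("old RCPs plausible:", is_plausible(old_rcps))
print("new RCPs plausible:", is_plausible(new_rcps))
```

Feeding the `results` list from the parsing script into `mean_samples_to_converge` would give the per-GBS figure to plug into the dictionaries instead of the rounded values used here.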
