1B model can't recall facts beyond a distance of about 134 tokens #2

@Max042004

The Mamba architecture features fixed memory usage, but this results in information loss.

I wrote a test script to check how well the 1B model memorizes facts. It injects unrelated filler context of varying length to test whether the model can still recall the secret code given at the beginning of the prompt.
The results are below:

 SSM Long-Context Recall Test
 Model: ./bitmamba_1b.bin
 Temp: 0.0 (deterministic)
============================================================

[Group 1] Code type: common word (apple)
-----------------------------------------------------------
no noise   | expected="apple"      | got="apple. The secret code is appl" | PASS
short      | expected="apple"      | got="apple.The secret code is apple" | PASS
medium     | expected="apple"      | got="apple. The weather is sunny to" | PASS
long       | expected="apple"      | got="written in the sky. The sun ri" | FAIL
very long  | expected="apple"      | got="a riddle. The farmer planted a" | FAIL

[Group 2] Code type: invented word (zorbak)
-----------------------------------------------------------
no noise   | expected="zorbak"     | got="a code that is used to communi" | FAIL
short      | expected="zorbak"     | got="zorbak.The secret"            | PASS
medium     | expected="zorbak"     | got="zorbak.The children"          | PASS
long       | expected="zorbak"     | got="zhiryak. The secret code"     | FAIL
very long  | expected="zorbak"     | got="zhir. The farmer��"       | FAIL

[Group 3] Code type: short number (42)
-----------------------------------------------------------
no noise   | expected="42"         | got="a code that is used to identif" | FAIL
short      | expected="42"         | got="��I love you.��"      | FAIL
medium     | expected="42"         | got="a code.The children are playin" | FAIL
long       | expected="42"         | got="a code. The secret code is a" | FAIL
very long  | expected="42"         | got="a code. The farmer planted a n" | FAIL

[Group 4] Code type: long number (99234)
-----------------------------------------------------------
no noise   | expected="99234"      | got="a code that is used to identif" | FAIL
short      | expected="99234"      | got="��I am a bird.�"        | FAIL
medium     | expected="99234"      | got="a code.The children are playin" | FAIL
long       | expected="99234"      | got="written in the sky. The sun ri" | FAIL
very long  | expected="99234"      | got="a code. The farmer planted a n" | FAIL

============================================================
 Results: 5 passed, 15 failed out of 20 tests
 Recall accuracy: 25%
============================================================

As the results show, the 1B model cannot recall the secret code once the 'long' filler (about 134 tokens) is injected. It performs even worse when the secret code is rare in the training data, such as "zorbak" or numbers: the number codes fail at every distance, including with no filler at all.
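
To separate the effect of distance from the effect of code type, the result table can be tallied per filler length. The following is a minimal sketch, not part of the original script: `summarize_by_distance` is a hypothetical helper name, and it assumes the `label | expected | got | STATUS` line layout printed by the test script. Lines whose `got` field happens to contain a `|` are skipped.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): tally PASS counts per
# filler length from the test log read on stdin.
# Usage, after sourcing this file: ./scripts/test_ssm_recall.sh | summarize_by_distance
summarize_by_distance() {
    awk -F'|' '
        NF == 4 {                                  # only result rows have 4 fields
            label = $1; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", label)     # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (!(label in total)) order[++n] = label   # keep first-seen order
            total[label]++
            if (status == "PASS") pass[label]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%-10s %d/%d\n", order[i], pass[order[i]], total[order[i]]
        }'
}
```

Applied to the table above, this gives 1/4 for no noise, 2/4 for short and medium, and 0/4 for long and very long, which makes it easier to see that some failures occur even without any filler.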

Test script:

#!/bin/bash
# SSM Long-Context Recall Test
#
# Tests whether the model can recall a secret code embedded in a long context.
# Varies: distance (filler tokens between code and query) and code type.
#
# Usage: ./scripts/test_ssm_recall.sh
# Run from the bitmamba.cpp directory.

BINARY="./bitmamba"
MODEL="./bitmamba_1b.bin"
TEMP=0.0   # deterministic: argmax sampling
MAX_TOKENS=8

PASS=0
FAIL=0
TOTAL=0

# Filler sentences (unrelated to secret codes)
FILLER_SHORT="The weather is sunny today. Birds are singing outside."
FILLER_MEDIUM="The weather is sunny today. Birds are singing outside. A cat sat on the mat. The river flows quietly downstream. Children played in the park until evening. The baker prepared fresh bread early in the morning."
FILLER_LONG="The weather is sunny today. Birds are singing outside. A cat sat on the mat. The river flows quietly downstream. Children played in the park until evening. The baker prepared fresh bread early in the morning. Stars appeared one by one as darkness fell. The old clock on the wall ticked steadily. Leaves rustled in the gentle breeze outside the window. A fisherman cast his line into the still lake. The mountain trail wound upward through the pine forest. Smoke rose lazily from the chimney of the farmhouse. The librarian carefully arranged the returned books on the shelf. Rain tapped softly against the glass pane all night long."
FILLER_VERYLONG="The weather is sunny today. Birds are singing outside. A cat sat on the mat. The river flows quietly downstream. Children played in the park until evening. The baker prepared fresh bread early in the morning. Stars appeared one by one as darkness fell. The old clock on the wall ticked steadily. Leaves rustled in the gentle breeze outside the window. A fisherman cast his line into the still lake. The mountain trail wound upward through the pine forest. Smoke rose lazily from the chimney of the farmhouse. The librarian carefully arranged the returned books on the shelf. Rain tapped softly against the glass pane all night long. The engineer inspected each bridge support carefully. A dog barked twice then went silent. The farmer harvested wheat before the storm arrived. Candles flickered on the dinner table as wind entered through a crack. The student read the same paragraph three times before understanding it. An old man fed pigeons near the fountain every afternoon without fail."

run_test() {
    local label="$1"
    local prompt="$2"
    local expected="$3"

    # Run inference, capture only stdout (generated text), join lines,
    # and strip leading whitespace
    local output
    output=$("$BINARY" "$MODEL" "$prompt" tokenizer "$TEMP" 1.0 0.0 1.0 0 "$MAX_TOKENS" clean 2>/dev/null | tr -d '\n' | sed 's/^[[:space:]]*//')

    TOTAL=$((TOTAL + 1))

    # Check if expected value appears at start of output
    if echo "$output" | grep -qi "^${expected}"; then
        STATUS="PASS"
        PASS=$((PASS + 1))
    else
        STATUS="FAIL"
        FAIL=$((FAIL + 1))
    fi

    printf "%-10s | expected=%-12s | got=%-30s | %s\n" \
        "$label" "\"$expected\"" "\"${output:0:30}\"" "$STATUS"
}

echo "============================================================"
echo " SSM Long-Context Recall Test"
echo " Model: $MODEL"
echo " Temp: $TEMP (deterministic)"
echo "============================================================"
echo ""

# --- Test group 1: Common English word ---
echo "[Group 1] Code type: common word (apple)"
echo "-----------------------------------------------------------"
run_test "no noise"    "The secret code is apple. The secret code is"                                             "apple"
run_test "short"       "The secret code is apple. ${FILLER_SHORT} The secret code is"                            "apple"
run_test "medium"      "The secret code is apple. ${FILLER_MEDIUM} The secret code is"                           "apple"
run_test "long"        "The secret code is apple. ${FILLER_LONG} The secret code is"                             "apple"
run_test "very long"   "The secret code is apple. ${FILLER_VERYLONG} The secret code is"                         "apple"
echo ""

# --- Test group 2: Rare/invented word ---
echo "[Group 2] Code type: invented word (zorbak)"
echo "-----------------------------------------------------------"
run_test "no noise"    "The secret code is zorbak. The secret code is"                                            "zorbak"
run_test "short"       "The secret code is zorbak. ${FILLER_SHORT} The secret code is"                           "zorbak"
run_test "medium"      "The secret code is zorbak. ${FILLER_MEDIUM} The secret code is"                          "zorbak"
run_test "long"        "The secret code is zorbak. ${FILLER_LONG} The secret code is"                            "zorbak"
run_test "very long"   "The secret code is zorbak. ${FILLER_VERYLONG} The secret code is"                        "zorbak"
echo ""

# --- Test group 3: Short number ---
echo "[Group 3] Code type: short number (42)"
echo "-----------------------------------------------------------"
run_test "no noise"    "The secret code is 42. The secret code is"                                                "42"
run_test "short"       "The secret code is 42. ${FILLER_SHORT} The secret code is"                               "42"
run_test "medium"      "The secret code is 42. ${FILLER_MEDIUM} The secret code is"                              "42"
run_test "long"        "The secret code is 42. ${FILLER_LONG} The secret code is"                                "42"
run_test "very long"   "The secret code is 42. ${FILLER_VERYLONG} The secret code is"                            "42"
echo ""

# --- Test group 4: Long number ---
echo "[Group 4] Code type: long number (99234)"
echo "-----------------------------------------------------------"
run_test "no noise"    "The secret code is 99234. The secret code is"                                             "99234"
run_test "short"       "The secret code is 99234. ${FILLER_SHORT} The secret code is"                            "99234"
run_test "medium"      "The secret code is 99234. ${FILLER_MEDIUM} The secret code is"                           "99234"
run_test "long"        "The secret code is 99234. ${FILLER_LONG} The secret code is"                             "99234"
run_test "very long"   "The secret code is 99234. ${FILLER_VERYLONG} The secret code is"                         "99234"
echo ""

# --- Summary ---
echo "============================================================"
echo " Results: $PASS passed, $FAIL failed out of $TOTAL tests"
PERCENT=$(( PASS * 100 / TOTAL ))
echo " Recall accuracy: ${PERCENT}%"
echo "============================================================"
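
The script above uses only four filler lengths, so the point where recall breaks lies somewhere between 'medium' and 'long'. A finer sweep could grow the filler one sentence at a time. The sketch below is an assumption-laden illustration: `build_prompt` and `count_words` are hypothetical helpers, the sentence list reuses only the first six filler sentences, and the word count is a rough proxy, not the model's token count. It builds the prompts without calling the model.

```shell
#!/bin/bash
# Hypothetical helpers for a finer-grained distance sweep (not in the original script).
SENTENCES=(
    "The weather is sunny today."
    "Birds are singing outside."
    "A cat sat on the mat."
    "The river flows quietly downstream."
    "Children played in the park until evening."
    "The baker prepared fresh bread early in the morning."
)

# Build "The secret code is <code>. <first n sentences> The secret code is".
build_prompt() {
    local code="$1" n="$2"
    local prompt="The secret code is ${code}."
    local i
    for (( i = 0; i < n && i < ${#SENTENCES[@]}; i++ )); do
        prompt+=" ${SENTENCES[i]}"
    done
    printf '%s The secret code is' "$prompt"
}

# Rough size proxy: whitespace-separated words, NOT model tokens.
count_words() {
    printf '%s\n' "$1" | wc -w | tr -d '[:space:]'
}

# Example sweep: show how the prompt grows with each added sentence.
for n in 0 1 2 3 4 5 6; do
    p=$(build_prompt apple "$n")
    printf 'sentences=%d words=%s\n' "$n" "$(count_words "$p")"
done
```

With `n` set to 0, 2, and 6, `build_prompt` reproduces the 'no noise', 'short', and 'medium' prompts of the original script, so each generated prompt could be passed to `run_test` in place of the fixed filler variables.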
