Commit b9980d5

unamedkr and claude committed
pillar1(R5): scripts/test_tokenizer.sh — BPE stale-entry regression guard
4-test regression preventing the R3 bug from being reintroduced:

  Qwen3-0.6B   "Hello"               → token [9707] (HF-verified)
  Qwen3-0.6B   "The quick brown fox" → [785, 3974, 13876, 38835]
  Qwen3.5-4B   "Hello"               → 1 token (structural merge check)
  Qwen3.6-35B  "Hello"               → 1 token (structural merge check)

If tokens[top.pos] < 0 staleness check is ever removed from the heap BPE
merge loop (tq_tokenizer.c:1442), at least one of these will fail.

Cheap, runs in <10s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 257c2f0 commit b9980d5

1 file changed

Lines changed: 81 additions & 0 deletions

scripts/test_tokenizer.sh

@@ -0,0 +1,81 @@
#!/usr/bin/env bash
# Tokenizer-level regression: verifies our BPE output matches HF-known-good
# token IDs for prompts that previously triggered the R3 BPE stale-entry bug.
#
# If this test fails, the heap-based BPE merge loop has likely regressed
# (e.g., someone removed the `tokens[top.pos] < 0` staleness check at
# tq_tokenizer.c:1442). See bench/results/2026-04-20_bpe_fix_proof.md.

set -u
BIN="${BIN:-./build/quant}"
MODELS_DIR="${MODELS_DIR:-models}"
PASS=0
FAIL=0
SKIP=0

check_tokens() {
    local model="$1"
    local prompt="$2"
    local expected="$3" # space-separated token IDs
    local extra_env="${4:-}"
    if [[ ! -f "$MODELS_DIR/$model" ]]; then
        printf " %-40s [SKIP] not found\n" "$model"
        SKIP=$((SKIP+1))
        return
    fi
    local got
    got=$(env $extra_env TQ_DEBUG=1 "$BIN" "$MODELS_DIR/$model" -p "$prompt" -n 1 -T 0 2>&1 \
        | grep 'prompt tokens' | sed -E 's/.*prompt tokens \([0-9]+\): //' \
        | awk '{$1=$1; print}') # normalize whitespace
    if [[ "$got" == "$expected"* ]]; then
        printf " %-40s [PASS] %-20s → %s\n" "$model" "\"$prompt\"" "$got"
        PASS=$((PASS+1))
    else
        printf " %-40s [FAIL] \"%s\" expected:%s got:%s\n" "$model" "$prompt" "$expected" "$got"
        FAIL=$((FAIL+1))
    fi
}

# Qwen3.5/3.6 share a 248320-token vocab different from Qwen3-0.6B's 151936.
# We only assert the structural property: "Hello" merges to a SINGLE token
# (not the pre-R3 broken pair [Hel, ll]=two tokens that decoded to "Helll").
check_tokens_single() {
    local model="$1"
    local prompt="$2"
    local extra_env="${3:-}"
    if [[ ! -f "$MODELS_DIR/$model" ]]; then
        printf " %-40s [SKIP] not found\n" "$model"
        SKIP=$((SKIP+1))
        return
    fi
    local got
    got=$(env $extra_env TQ_DEBUG=1 "$BIN" "$MODELS_DIR/$model" -p "$prompt" -n 1 -T 0 2>&1 \
        | grep 'prompt tokens' | sed -E 's/.*prompt tokens \(([0-9]+)\).*/\1/')
    if [[ "$got" == "1" ]]; then
        printf " %-40s [PASS] \"%s\" → 1 token (merged)\n" "$model" "$prompt"
        PASS=$((PASS+1))
    else
        printf " %-40s [FAIL] \"%s\" → %s tokens (expected 1)\n" "$model" "$prompt" "$got"
        FAIL=$((FAIL+1))
    fi
}

echo "=== Tokenizer regression (BPE stale-entry guard — Pillar 1 R3) ==="
echo "Models dir: $MODELS_DIR"
echo ""

# Qwen3 family — the originally broken path.
# Expected token IDs verified against HF AutoTokenizer 2026-04-20.
check_tokens "Qwen3-0.6B-Q4_K_M.gguf" "Hello" "9707" \
    "TQ_NO_METAL=1 TQ_NO_MLOCK=1"
check_tokens "Qwen3-0.6B-Q4_K_M.gguf" "The quick brown fox" "785 3974 13876 38835" \
    "TQ_NO_METAL=1 TQ_NO_MLOCK=1"
check_tokens_single "Qwen3.5-4B-Q4_K_M.gguf" "Hello" \
    "TQ_NO_METAL=1 TQ_NO_MLOCK=1"
check_tokens_single "Qwen3.6-35B-A3B-UD-IQ4_XS.gguf" "Hello" \
    "TQ_NO_METAL=1 TQ_NO_MLOCK=1"

echo ""
echo "--- Summary --- PASS=$PASS FAIL=$FAIL SKIP=$SKIP"
[[ "$FAIL" -eq 0 ]] || exit 1
exit 0
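
The parsing and comparison steps in the script can be tried without a model or the `quant` binary. The sketch below reuses the same `grep`/`sed`/`awk` stages on a made-up debug line. The helper names (`extract_ids`, `extract_count`, `matches_prefix`) and the sample line are assumptions for illustration only; the line's shape is inferred from the `sed` patterns in the script, and real `TQ_DEBUG=1` output may look different.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the commit. They mirror the pipelines
# in check_tokens and check_tokens_single so the parsing can be checked
# in isolation.

# Token IDs: drop everything up to "prompt tokens (N): ", then let awk
# collapse runs of whitespace and trim the ends.
extract_ids() {
    printf '%s\n' "$1" \
        | grep 'prompt tokens' \
        | sed -E 's/.*prompt tokens \([0-9]+\): //' \
        | awk '{$1=$1; print}'
}

# Token count: keep only the number inside the parentheses.
extract_count() {
    printf '%s\n' "$1" \
        | grep 'prompt tokens' \
        | sed -E 's/.*prompt tokens \(([0-9]+)\).*/\1/'
}

# Prefix comparison, equivalent to [[ "$got" == "$expected"* ]]:
# succeeds when $1 (got) starts with the literal string $2 (expected).
matches_prefix() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Assumed sample line; the "[debug]" prefix and spacing are invented.
sample='[debug] prompt tokens (4): 785  3974 13876 38835 '
echo "ids=$(extract_ids "$sample") count=$(extract_count "$sample")"
# prints: ids=785 3974 13876 38835 count=4
```

Because the comparison is a prefix match, a result such as `9707 11` still passes against an expected `9707`; the commit does not say whether that looseness is intended. The script itself takes its binary and model directory from the `BIN` and `MODELS_DIR` environment variables, which default to `./build/quant` and `models`.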
