Triage of Tiny and Base model families (all quantization and language variants) to evaluate whether these smaller models are worth further benchmarking.

May 9, 2026 | Apple M4 Max / 36 GB / macOS 26.4.1 | 12 models | 50 samples per dataset
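Accuracy (%) per test condition: EN = English, EN noisy = noisy English audio, ES = Spanish, DA = Danish, HU = Hungarian. Speed is processing time in milliseconds per second of audio (lower is faster).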
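| Model | Samples | Disk | RAM | Speed | Avg overall | Avg EN | Avg multi | EN | EN noisy | ES | DA | HU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Base full | 50 | 142 MB | 334 MB | 57 ms | 69.1% | 91.2% | 54.4% | 91.8% | 90.6% | 88.4% | 39.5% | 35.3% |
| Base q8_0 | 50 | 78 MB | 247 MB | 57 ms | 69.0% | 91.2% | 54.2% | 92.4% | 90.1% | 88.6% | 38.4% | 35.5% |
| Base q5_1 | 50 | 57 MB | 219 MB | 57 ms | 68.7% | 90.3% | 54.3% | 90.8% | 89.9% | 88.9% | 37.6% | 36.4% |
| Tiny full | 50 | 75 MB | 224 MB | 57 ms | 58.0% | 87.0% | 38.7% | 87.6% | 86.4% | 85.0% | 14.2% | 16.9% |
| Tiny q8_0 | 50 | 42 MB | 174 MB | 57 ms | 57.6% | 87.0% | 38.0% | 87.2% | 86.8% | 85.0% | 11.2% | 17.9% |
| Tiny q5_1 | 50 | 31 MB | 157 MB | 58 ms | 56.5% | 86.5% | 36.5% | 87.1% | 86.0% | 83.5% | 14.4% | 11.7% |
| Base q5_1 en | 50 | 57 MB | 217 MB | 58 ms | 33.7% | 92.4% | -5.3% | 92.7% | 92.1% | 3.8% | -9.6% | -10.2% |
| Base q8_0 en | 50 | 78 MB | 247 MB | 58 ms | 33.7% | 92.1% | -5.2% | 92.4% | 91.8% | 3.2% | -7.9% | -10.9% |
| Base full en | 50 | 142 MB | 333 MB | 59 ms | 33.0% | 92.4% | -6.7% | 93.1% | 91.8% | 0.8% | -7.0% | -13.8% |
| Tiny full en | 50 | 75 MB | 224 MB | 59 ms | 26.5% | 88.4% | -14.8% | 88.9% | 87.8% | -1.2% | -18.7% | -24.4% |
| Tiny q5_1 en | 50 | 31 MB | 157 MB | 59 ms | 25.2% | 88.2% | -16.7% | 88.8% | 87.5% | -1.0% | -26.0% | -23.2% |
| Tiny q8_0 en | 50 | 42 MB | 173 MB | 58 ms | 24.7% | 88.6% | -17.9% | 89.1% | 88.1% | -2.3% | -20.7% | -30.7% |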
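
The three average columns look like unweighted means of the per-condition scores: Avg EN over EN and EN noisy, Avg multi over ES, DA, and HU, and Avg overall over all five conditions. The page does not state this; it is inferred from the rows, which match it to one decimal place. The sketch below checks that assumption against the Base full row. The helper name `group_averages` is hypothetical.

```python
from statistics import mean

# Per-condition accuracy (%) for one row of the table (Base full).
# Assumption: the average columns are unweighted means of these values.
scores = {"EN": 91.8, "EN noisy": 90.6, "ES": 88.4, "DA": 39.5, "HU": 35.3}

EN_CONDITIONS = ("EN", "EN noisy")
MULTI_CONDITIONS = ("ES", "DA", "HU")


def group_averages(row: dict[str, float]) -> dict[str, float]:
    """Return Avg EN, Avg multi, and Avg overall, rounded to one decimal."""
    avg_en = mean(row[c] for c in EN_CONDITIONS)
    avg_multi = mean(row[c] for c in MULTI_CONDITIONS)
    avg_overall = mean(row.values())  # all five conditions weighted equally
    return {
        "Avg EN": round(avg_en, 1),
        "Avg multi": round(avg_multi, 1),
        "Avg overall": round(avg_overall, 1),
    }


print(group_averages(scores))
```

Weighting every condition equally means the three non-English conditions make up three fifths of Avg overall, which is why the English-only models rank last overall even though they have the highest Avg EN.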

At a glance

Ratings are computed from the benchmark data on a scale of 1 to 10. Accuracy is based on word error rate (WER) and does not yet account for punctuation.

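| Name | Lang | Translate | Speed | Accuracy |
|---|---|---|---|---|
| Base full | all | | 9 | 5 |
| Base q5_1 | all | | 9 | 5 |
| Base q8_0 | all | | 9 | 5 |
| Tiny full | all | | 9 | 3 |
| Tiny q8_0 | all | | 9 | 3 |
| Tiny q5_1 | all | | 9 | 2 |
| Base full en | en | | 8 | 1 (en: 9) |
| Base q5_1 en | en | | 9 | 1 (en: 9) |
| Base q8_0 en | en | | 9 | 1 (en: 9) |
| Tiny full en | en | | 8 | 1 (en: 9) |
| Tiny q5_1 en | en | | 8 | 1 (en: 9) |
| Tiny q8_0 en | en | | 8 | 1 (en: 9) |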
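
The At a glance note says accuracy is derived from word error rate (WER) and ignores punctuation, but the page does not give the exact formula. A common choice, assumed here, is accuracy = 100 × (1 − WER). Under that assumption the negative entries in the results table follow naturally: WER counts inserted words, so a transcript full of words that are not in the reference (for example, English-only output for Hungarian audio) can push WER above 1. The sketch below computes word-level WER with a Levenshtein alignment; the normalization (lowercasing and stripping punctuation) and the helper names `normalize`, `wer`, and `accuracy` are illustrative assumptions, not the benchmark's actual code.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and drop punctuation so only words are compared (assumed normalization)."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Assumed mapping: 100 * (1 - WER); becomes negative once WER exceeds 1."""
    return 100.0 * (1.0 - wer(reference, hypothesis))


print(accuracy("Hola, ¿cómo estás?", "hola como estas"))
print(accuracy("Jó napot", "you know, but I think so"))
```

In the second call the two-word Hungarian reference is matched against six unrelated English words, giving six edits, a WER of 3, and an accuracy well below zero.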

Charts

Average accuracy by group: bar chart of average English, multilingual, and overall accuracy per model.
Speed comparison across conditions: bar chart of transcription speed across models and test conditions.
Accuracy by model and test condition: bar chart of model accuracy across the English, Spanish, Danish, and Hungarian benchmark conditions.