Initial model benchmarks to identify models that can be left out of a more extensive benchmark with far more samples. This does not include the Tiny/Base models, which have already been tested.

May 9, 2026 | Apple M4 Max / 36 GB / macOS 26.4.1 | 34 models | 50 samples per dataset
Accuracy (%) per language. Speed in ms per second of audio.
Model                Samples  Disk    RAM     Speed   Avg overall  Avg EN  Avg multi  EN     EN noisy  ES     DA      HU
Large V3             50       3.0 GB  4.0 GB  183 ms  92.7%        95.0%   91.1%      96.3%  93.8%     96.5%  87.3%   89.5%
Large V3 q5_0        50       1.1 GB  2.0 GB  146 ms  92.3%        94.9%   90.6%      96.3%  93.6%     96.5%  86.9%   88.3%
Large V3 Turbo q8_0  50       834 MB  1.1 GB  108 ms  91.6%        95.2%   89.2%      95.5%  94.8%     96.7%  83.2%   87.5%
Large V3 Turbo q5_0  50       574 MB  800 MB  101 ms  91.5%        94.9%   89.3%      95.1%  94.6%     96.8%  84.5%   86.5%
Large V3 Turbo       50       1.6 GB  1.9 GB  109 ms  91.5%        95.1%   89.1%      95.4%  94.7%     96.7%  83.2%   87.4%
Large V2             50       3.0 GB  4.0 GB  185 ms  91.1%        94.8%   88.6%      95.0%  94.7%     96.1%  84.5%   85.3%
Large V2 q8_0        50       1.6 GB  2.5 GB  154 ms  91.1%        94.9%   88.6%      95.0%  94.9%     96.1%  84.6%   85.2%
Large V2 q5_0        50       1.1 GB  2.0 GB  146 ms  91.0%        94.9%   88.4%      95.3%  94.6%     96.1%  83.9%   85.3%
Large V1             50       3.0 GB  4.0 GB  184 ms  89.4%        94.2%   86.3%      94.7%  93.7%     95.8%  80.7%   82.2%
Parakeet TDT v3      50       500 MB  80 MB   19 ms   89.1%        94.2%   85.6%      95.7%  92.7%     94.4%  78.0%   84.5%
Medium full          50       1.5 GB  2.1 GB  116 ms  88.0%        94.1%   83.9%      94.8%  93.4%     95.2%  78.2%   78.3%
Medium q8_0          50       766 MB  1.4 GB  106 ms  87.9%        94.2%   83.6%      95.0%  93.4%     95.2%  77.9%   77.9%
Medium q5_0          50       514 MB  1.1 GB  99 ms   87.8%        94.3%   83.5%      95.3%  93.3%     95.2%  77.1%   78.2%
Small full           50       465 MB  807 MB  59 ms   81.2%        93.1%   73.3%      94.3%  92.0%     94.0%  64.2%   61.8%
Small q8_0           50       252 MB  558 MB  58 ms   81.0%        93.1%   73.0%      94.1%  92.1%     94.0%  63.9%   61.0%
Small q5_1           50       181 MB  476 MB  58 ms   80.8%        93.5%   72.4%      94.5%  92.4%     93.8%  63.6%   59.8%
Base full            50       142 MB  334 MB  57 ms   69.1%        91.2%   54.4%      91.8%  90.6%     88.4%  39.5%   35.3%
Base q8_0            50       78 MB   247 MB  57 ms   69.0%        91.2%   54.2%      92.4%  90.1%     88.6%  38.4%   35.5%
Base q5_1            50       57 MB   219 MB  57 ms   68.7%        90.3%   54.3%      90.8%  89.9%     88.9%  37.6%   36.4%
Tiny full            50       75 MB   224 MB  57 ms   58.0%        87.0%   38.7%      87.6%  86.4%     85.0%  14.2%   16.9%
Tiny q8_0            50       42 MB   174 MB  57 ms   57.6%        87.0%   38.0%      87.2%  86.8%     85.0%  11.2%   17.9%
Tiny q5_1            50       31 MB   157 MB  58 ms   56.5%        86.5%   36.5%      87.1%  86.0%     83.5%  14.4%   11.7%
Base q5_1 en         50       57 MB   217 MB  58 ms   33.7%        92.4%   -5.3%      92.7%  92.1%     3.8%   -9.6%   -10.2%
Base q8_0 en         50       78 MB   247 MB  58 ms   33.7%        92.1%   -5.2%      92.4%  91.8%     3.2%   -7.9%   -10.9%
Base full en         50       142 MB  333 MB  59 ms   33.0%        92.4%   -6.7%      93.1%  91.8%     0.8%   -7.0%   -13.8%
Medium full en       50       1.5 GB  2.1 GB  116 ms  32.1%        94.6%   -9.6%      95.7%  93.5%     10.8%  -13.9%  -25.6%
Small full en        50       465 MB  807 MB  61 ms   32.0%        94.0%   -9.4%      95.3%  92.8%     9.0%   -15.3%  -21.9%
Small q8_0 en        50       252 MB  558 MB  60 ms   31.2%        94.0%   -10.7%     95.3%  92.7%     10.1%  -19.3%  -23.0%
Medium q8_0 en       50       766 MB  1.4 GB  103 ms  31.0%        94.3%   -11.2%     95.7%  93.0%     10.0%  -16.3%  -27.4%
Medium q5_0 en       50       514 MB  1.1 GB  96 ms   30.9%        94.4%   -11.4%     95.7%  93.1%     9.2%   -11.4%  -32.0%
Small q5_1 en        50       181 MB  475 MB  62 ms   30.1%        94.2%   -12.6%     95.4%  93.0%     5.7%   -17.2%  -26.3%
Tiny full en         50       75 MB   224 MB  59 ms   26.5%        88.4%   -14.8%     88.9%  87.8%     -1.2%  -18.7%  -24.4%
Tiny q5_1 en         50       31 MB   157 MB  59 ms   25.2%        88.2%   -16.7%     88.8%  87.5%     -1.0%  -26.0%  -23.2%
Tiny q8_0 en         50       42 MB   173 MB  58 ms   24.7%        88.6%   -17.9%     89.1%  88.1%     -2.3%  -20.7%  -30.7%
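
To make the derived columns easier to read, here is a minimal Python sketch of how they can be reproduced from the per-language columns. It assumes the averages are plain unweighted means (Avg EN over EN and EN noisy, Avg multi over ES, DA and HU, Avg overall over all five), which matches the Large V3 row but is not confirmed by the benchmark itself. The function names `realtime_speedup` and `group_averages` are hypothetical and used only for illustration.

```python
from statistics import mean


def realtime_speedup(ms_per_audio_second: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return 1000.0 / ms_per_audio_second


def group_averages(en, en_noisy, es, da, hu):
    """Return (avg_overall, avg_en, avg_multi).

    Assumption: every group average is an unweighted mean of its columns.
    """
    avg_en = mean([en, en_noisy])
    avg_multi = mean([es, da, hu])
    avg_overall = mean([en, en_noisy, es, da, hu])
    return avg_overall, avg_en, avg_multi


# Per-language accuracies and speed taken from the Large V3 row.
overall, avg_en, avg_multi = group_averages(96.3, 93.8, 96.5, 87.3, 89.5)
print(f"overall={overall:.2f} en={avg_en:.2f} multi={avg_multi:.2f}")
print(f"Large V3: {realtime_speedup(183):.1f}x real time")
print(f"Parakeet TDT v3: {realtime_speedup(19):.1f}x real time")
```

Read this way, 183 ms per second of audio is roughly 5.5 times faster than real time, and 19 ms is roughly 52.6 times faster. Because the published percentages are rounded, averages recomputed from them can differ from the table by 0.1.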

At a glance

Ratings computed from benchmark data, scaled 1 to 10. Accuracy is based on Word Error Rate (WER) and does not include punctuation yet.
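
As a rough illustration of a WER-based accuracy, the sketch below computes a word-level edit distance and converts it to a percentage. It assumes accuracy is defined as 100 × (1 − WER) and that text is lowercased and stripped of ASCII punctuation before scoring; neither detail is confirmed by the benchmark, and `word_error_rate` and `accuracy_percent` are hypothetical names. Under that assumption, a transcript with more word errors than the reference has words yields a negative accuracy, which would be consistent with the negative values reported for the English-only models on ES, DA and HU.

```python
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def _words(text: str) -> list[str]:
    # Assumed normalization: lowercase, drop ASCII punctuation, split on whitespace.
    return text.lower().translate(_PUNCT_TABLE).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_percent(reference: str, hypothesis: str) -> float:
    """Assumed definition: 100 * (1 - WER). Can be negative when WER > 1."""
    return 100.0 * (1.0 - word_error_rate(reference, hypothesis))


print(accuracy_percent("the cat sat", "The cat sat."))
print(accuracy_percent("hola mundo", "oh la moon though please"))
```

In the second call the two-word reference is matched against five unrelated words, so the edit distance is 5, the WER is 2.5, and the resulting accuracy is -150.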

Name                 Lang  Translate  Speed  Accuracy
Large V3             all              5      10
Large V3 Turbo q5_0  all              7      9
Large V3 q5_0        all              6      9
Parakeet TDT v3      25               10     9
Large V1             all              5      9
Large V2             all              5      9
Large V2 q5_0        all              6      9
Large V2 q8_0        all              6      9
Large V3 Turbo       all              7      9
Large V3 Turbo q8_0  all              7      9
Medium full          all              7      9
Medium q5_0          all              7      9
Medium q8_0          all              7      9
Small q5_1           all              8      7
Small full           all              8      7
Small q8_0           all              8      7
Base full            all              9      5
Base q5_1            all              9      5
Base q8_0            all              9      5
Tiny full            all              9      3
Tiny q8_0            all              9      3
Tiny q5_1            all              9      2
Base full en         en               8      1 (en: 9)
Base q5_1 en         en               9      1 (en: 9)
Base q8_0 en         en               9      1 (en: 9)
Medium full en       en               7      1 (en: 10)
Medium q5_0 en       en               8      1 (en: 10)
Medium q8_0 en       en               7      1 (en: 10)
Small full en        en               8      1 (en: 10)
Small q5_1 en        en               8      1 (en: 10)
Small q8_0 en        en               8      1 (en: 10)
Tiny full en         en               8      1 (en: 9)
Tiny q5_1 en         en               8      1 (en: 9)
Tiny q8_0 en         en               8      1 (en: 9)

Charts

Average accuracy by group: bar chart showing average English, multilingual, and overall accuracy per model.
Speed comparison across conditions: bar chart comparing transcription speed across models and test conditions.
Accuracy by model and test condition: bar chart comparing model accuracy across English, Spanish, Danish, and Hungarian benchmark conditions.