Comparing Google Cloud Speech-to-Text Models | Generated by AI

2025.10

Here’s a breakdown of the differences between the Speech-to-Text models in Google Cloud’s Speech-to-Text API V2, organized by model family and use case, without tables to keep it readable:


Chirp Family (Universal Speech Models - USM)

chirp

chirp_2

chirp_3
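As a rough illustration of where these identifiers are used, the sketch below builds (but does not send) a V2 `recognize` request with the model name in the `model` field of the recognition config. It uses only the Python standard library. The project ID, the `us-central1` location, the sample audio bytes, and the helper name `build_recognize_request` are all placeholders chosen for this example, and the regional host pattern is an assumption you should check against the current Google documentation, along with which regions serve each Chirp model.

```python
import base64
import json

# Model identifiers listed in the Chirp section above.
CHIRP_MODELS = ("chirp", "chirp_2", "chirp_3")


def build_recognize_request(project_id, location, model, audio_bytes,
                            language_code="en-US"):
    """Return (url, body) for a Speech-to-Text V2 recognize call.

    Hypothetical helper for illustration only: nothing is sent over the
    network, and authentication is left out entirely.
    """
    if model not in CHIRP_MODELS:
        raise ValueError(f"unknown model: {model!r}")

    # Assumption: regional locations use a "<location>-speech" host,
    # while "global" uses the plain host.
    if location == "global":
        host = "speech.googleapis.com"
    else:
        host = f"{location}-speech.googleapis.com"

    # "_" refers to the implicit default recognizer.
    url = (f"https://{host}/v2/projects/{project_id}"
           f"/locations/{location}/recognizers/_:recognize")

    body = {
        "config": {
            "autoDecodingConfig": {},          # let the service detect the encoding
            "languageCodes": [language_code],
            "model": model,                    # the identifier being compared here
        },
        # Inline audio must be base64-encoded in the JSON body.
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return url, body


url, body = build_recognize_request("my-project", "us-central1",
                                    "chirp_2", b"\x00\x01")
print(url)
print(json.dumps(body, indent=2))
```

Switching models is then a one-string change, which makes it easy to run the same audio through several identifiers and compare the transcripts.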


Legacy / General-Purpose Models

These are older architecture models maintained primarily for backward compatibility:


Summary of Model Differences by Use Case


Context from the Google Documentation

Google’s “Select a transcription model” page (updated September 4, 2025) includes all of the above model identifiers and descriptions. It also notes that the legacy models are not actively maintained but are retained for backward compatibility. (Google Cloud)

An earlier note about “latest_long” / “latest_short” refers to another set of Conformer-based models. These are outside the Chirp lineage and are not relevant if you’re focused specifically on the Chirp or legacy models. (Google Cloud)


TL;DR: How to Pick


Let me know if you’d like help choosing the right model based on your specific audio type, language, or whether you need streaming or batch processing!



openai/gpt-5
