ImageNet Classification Advances in 2025 | Generated by AI

Home 2025.11

The State-of-the-Art (SOTA) on ImageNet (specifically the ImageNet-1K validation set for image classification) has continued to advance steadily since the dramatic improvements post-2012. As of November 2025, the leading models achieve top-1 accuracies around 91%, corresponding to a top-1 error rate of ~9%. Top-5 accuracies are even higher, typically exceeding 99%, for a top-5 error rate below 1%.

Key SOTA Models (Top-5 from Papers With Code Leaderboard)

Here’s a snapshot of the current top performers (fine-tuned on ImageNet-1K), based on top-1 accuracy. Top-5 accuracies aren’t always explicitly re-reported for these very high-performing models (as they saturate near-perfect levels), but cross-referencing with similar recent architectures suggests top-5 errors under 1% for all:

Rank Model Top-1 Accuracy Est. Top-5 Accuracy Parameters Notes
1 CoCa (finetuned) 91.0% (9.0% error) ~99.5% (<0.5% error) 2.1B Multimodal image-text model; excels in zero-shot (86.3% top-1) and frozen-encoder settings (90.6% top-1).
2 Model Soups (BASIC-L) 90.98% (9.02% error) ~99.4% (<0.6% error) ~1B Ensemble averaging of fine-tuned models for improved robustness.
3 Model Soups (ViT-G/14) 90.94% (9.06% error) ~99.4% (<0.6% error) 1.8B ViT-based; strong generalization to out-of-distribution data.
4 DaViT-Giant 90.4% (9.6% error) ~99.3% (<0.7% error) 1.4B Dual-attention ViT; trained on 1.5B image-text pairs.
5 ConvNeXt V2-Huge 88.9% (11.1% error) ~99.0% (~1.0% error) 660M CNN revival with masked autoencoder pretraining; efficient for edge devices.

Clarification on the “<3% Today”

Progress has slowed in recent years due to dataset saturation—gains now come from massive pretraining (e.g., on ImageNet-21K or web-scale data) and techniques like model ensembling or distillation. Human expert performance is estimated at ~5% top-1 error (or 2-12% top-5, varying by study), so AI has surpassed humans on this benchmark.

ImageNet Benchmark (Papers With Code)
Image Classification: State-of-the-Art Models in 2025 (HiringNet)
CoCa: Contrastive Captioners are Image-Text Foundation Models (arXiv)


Back

x-ai/grok-4-fast

Donate