Recurrent Neural Networks' Surprising Power | Generated by AI


This 2015 blog post by Andrej Karpathy is a classic introduction to Recurrent Neural Networks (RNNs), highlighting their surprising power in handling sequences like text generation. Karpathy shares his enthusiasm after training RNNs for tasks like image captioning, which produced coherent outputs with minimal tuning. He releases open-source code for character-level language models using LSTMs (a type of RNN) and demonstrates their “magic” through various text-generation experiments. Below is a structured summary of the key sections.

Introduction

Karpathy describes RNNs as “unreasonably effective” for sequential data, contrasting them with traditional feedforward networks that handle fixed-size inputs/outputs. He trains simple RNNs on text corpora to predict and generate characters, questioning how they capture language patterns so well. The post includes code on GitHub for replicating the demos.

Key Concepts: How RNNs Work

RNNs excel at sequences (e.g., sentences, videos) by maintaining an internal “state” (hidden vector) that carries information across time steps. Unlike static networks, they apply the same transformation repeatedly: at every step the hidden state is updated from the previous state and the current input as h = tanh(W_hh · h + W_xh · x), and the output is read off as y = W_hy · h.

A simple Python snippet illustrates the step function:

import numpy as np

class RNN:
    def step(self, x):
        # Update the hidden state: h = tanh(W_hh · h + W_xh · x)
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        y = np.dot(self.W_hy, self.h)  # read the output off the new state
        return y
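
To see the recurrence in action, the class above can be wired up with randomly initialized weights and stepped over a sequence of input vectors; the sizes and initialization below are arbitrary choices for illustration, not values from the post:

hidden_size, input_size, output_size = 100, 65, 65  # illustrative sizes only

rnn = RNN()
rnn.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden-to-hidden weights
rnn.W_xh = np.random.randn(hidden_size, input_size) * 0.01   # input-to-hidden weights
rnn.W_hy = np.random.randn(output_size, hidden_size) * 0.01  # hidden-to-output weights
rnn.h = np.zeros(hidden_size)                                # initial hidden state

for t in range(5):
    x = np.random.randn(input_size)  # placeholder inputs; real inputs would be one-hot characters
    y = rnn.step(x)                  # the hidden state carries information across these calls
print(y.shape)  # (65,)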

Character-Level Language Modeling

The core example: train on text to predict the next character (one-hot encoded), producing a probability distribution over the vocabulary (e.g., the roughly 65 distinct characters found in an English text corpus) at every step. Generation works by sampling a character from that distribution and feeding it back in as the next input. The recurrent connections let the model use context, so the same recent input can lead to different predictions: for example, ‘l’ after “hel” but ‘o’ after “hell”. Training uses mini-batch SGD with optimizers like RMSProp.
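
The following is a minimal sketch of that sampling loop, written here for illustration: the toy vocabulary, the model_step helper, and the random (untrained) weights are stand-ins rather than Karpathy's actual char-rnn code; in practice the weights would be learned first with mini-batch SGD/RMSProp.

import numpy as np

vocab = sorted(set("hello world"))  # toy vocabulary; char-rnn builds this from the training file
char_to_ix = {c: i for i, c in enumerate(vocab)}
ix_to_char = {i: c for i, c in enumerate(vocab)}
V, H = len(vocab), 32  # vocabulary size and a (hypothetical) hidden size

# Stand-in for trained parameters: random weights, purely to show the data flow.
Wxh = np.random.randn(H, V) * 0.01
Whh = np.random.randn(H, H) * 0.01
Why = np.random.randn(V, H) * 0.01

def model_step(x_onehot, h):
    h = np.tanh(Wxh @ x_onehot + Whh @ h)  # same recurrence as the step() function above
    logits = Why @ h                       # unnormalized scores over the vocabulary
    return logits, h

def sample(seed_char, n_chars):
    h = np.zeros(H)
    x = np.zeros(V)
    x[char_to_ix[seed_char]] = 1.0  # one-hot encode the seed character
    out = [seed_char]
    for _ in range(n_chars):
        logits, h = model_step(x, h)
        p = np.exp(logits - logits.max())
        p /= p.sum()                   # softmax -> probability distribution
        ix = np.random.choice(V, p=p)  # sample the next character
        x = np.zeros(V)
        x[ix] = 1.0                    # feed the sample back as the next input
        out.append(ix_to_char[ix])
    return "".join(out)

print(sample("h", 20))  # gibberish here, since the weights are untrained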

Demonstrations: RNN-Generated Text

All of the demos use the author’s char-rnn code trained on a single text file each, and show the progression from early gibberish to increasingly coherent output as training proceeds.

“Temperature” controls sampling: low values give repetitive but safe text; high values give more creative but erratic output.
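
Concretely, temperature rescales the logits before the softmax. The snippet below is a generic illustration of the idea rather than code from char-rnn: values below 1 concentrate probability on the most likely characters, while values above 1 flatten the distribution.

import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Divide the logits by the temperature, then softmax and sample an index.
    scaled = np.asarray(logits, dtype=float) / temperature
    p = np.exp(scaled - scaled.max())
    p /= p.sum()
    return np.random.choice(len(p), p=p)

logits = np.array([2.0, 1.0, 0.1])
print(sample_with_temperature(logits, temperature=0.5))  # conservative: sticks to the top choice
print(sample_with_temperature(logits, temperature=1.5))  # adventurous: spreads probability mass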

Challenges and Limitations

Conclusion

RNNs shine in NLP (translation, speech), vision (captioning), and beyond, evolving from nonsense to structured generation. Karpathy visualizes activations (e.g., neurons detecting quotes) and teases an RNN trained on the post itself, outputting meta-text like “I’ve the RNN with and works…”. He recommends further reading and code resources, predicting RNNs’ role in smarter AI.

The Unreasonable Effectiveness of Recurrent Neural Networks

