Controlling RNN Text Generation Temperature


Understanding Temperature in RNN Sampling

In recurrent neural networks (RNNs) for text generation, like those Andrej Karpathy explores in his famous blog post on character-level language models, temperature is a hyperparameter that tweaks how “random” or “predictable” the generated text feels. It’s applied during the sampling step, where the RNN predicts the next character (or token) based on the previous ones. Without it, you’re stuck choosing between greedy decoding (always picking the most likely next character, which quickly falls into boring loops) and sampling straight from the model’s raw distribution, which can drift into noise. Temperature strikes a balance by reshaping the model’s probability distribution over possible next characters.

Quick Math Behind It

The RNN outputs logits (raw, unnormalized scores) for each possible next character. These get turned into probabilities using a temperature-scaled softmax:

\[ p_i = \frac{\exp(\text{logit}_i / T)}{\sum_j \exp(\text{logit}_j / T)} \]

Here \(T > 0\) is the temperature: \(T = 1\) leaves the softmax unchanged, \(T < 1\) sharpens the distribution toward the highest-scoring characters, and \(T > 1\) flattens it so less likely characters get more probability mass. The next character is sampled from this distribution, fed back as input, and the process repeats to generate a sequence.
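As a concrete sketch in Python with NumPy (the function name and toy logits are illustrative, not taken from Karpathy's code), the whole step looks like this:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Scale logits by temperature, softmax them, and sample one index."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()            # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()              # temperature-scaled softmax probabilities
    return int(rng.choice(len(probs), p=probs))

# Toy logits over a 4-character vocabulary
logits = [2.0, 1.0, 0.5, -1.0]
next_char_ix = sample_with_temperature(logits, temperature=0.8)
```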

Low Temperature: Repetitive but Safe

With a low temperature (say \(T = 0.2\)), the scaled logits spread far apart, so the softmax piles almost all of the probability onto the few characters the model is most sure about. The text stays well spelled and grammatical, but it tends to circle through the same safe words and phrases.

High Temperature: Creative but Erratic

With a high temperature (say \(T = 2\)), the scaled logits squeeze together, so the softmax spreads probability onto characters the model considers unlikely. The samples look more inventive and varied, but misspellings, made-up words, and incoherent stretches creep in quickly.
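To see the contrast in numbers, this small sketch (same toy logits as above, purely illustrative) prints the distribution produced by the same scores at a low, a neutral, and a high temperature:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()            # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
for T in (0.2, 1.0, 2.0):
    print(f"T={T}: {np.round(softmax_with_temperature(logits, T), 3)}")
# T=0.2 puts almost all mass on the top character (near-greedy, repetitive)
# T=1.0 is the model's native distribution
# T=2.0 is much flatter, so rarer characters get sampled far more often
```

At \(T = 0.2\) the top character takes roughly 99% of the mass, which is why low-temperature samples loop; at \(T = 2\) even the least likely of the four still gets close to 10%.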

Why It Matters in RNNs

RNNs (or LSTMs in Karpathy’s setup) are autoregressive: each prediction builds on the last, so small biases in sampling compound over long sequences. Temperature lets you tune exploration vs. exploitation: low values exploit the characters the model is most confident about, while high values explore less likely continuations, as the loop sketched below shows.
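Here is a minimal sketch of that autoregressive loop with the temperature scaling folded in. The names `generate` and `fake_step` are made up for illustration, assuming a one-step forward function that returns logits and a new hidden state; this is not any particular library's API.

```python
import numpy as np

def generate(step_fn, h, seed_ix, vocab_size, length, temperature=1.0, rng=None):
    """Autoregressive sampling: each sampled character becomes the next input.

    `step_fn(ix, h)` stands in for one forward step of a trained RNN and must
    return (logits, new_hidden_state); it is a hypothetical placeholder here.
    """
    if rng is None:
        rng = np.random.default_rng()
    ix, out = seed_ix, []
    for _ in range(length):
        logits, h = step_fn(ix, h)                     # one RNN step
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        scaled -= scaled.max()                         # numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()
        ix = int(rng.choice(vocab_size, p=probs))      # sample the next character id
        out.append(ix)                                 # it is fed back on the next pass
    return out

# Dummy stand-in for a trained RNN step, just so the loop runs end to end.
def fake_step(ix, h):
    rng = np.random.default_rng(ix)
    return rng.normal(size=5), h

print(generate(fake_step, h=None, seed_ix=0, vocab_size=5, length=10, temperature=0.8))
```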

In essence, temperature isn’t “learning” anything new; it’s just dialing the RNN’s creativity knob during inference. Low keeps it chill and predictable; high unleashes the chaos.

Reference: Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks”

