Fix model by removing trimming hack

Generated by AI

2026.06

Here’s my analysis:

Current State (on RunPod)

The model at /workspace/model/hf-model/ has vocab_size: 50304, which is the raw nanoGPT padded vocabulary (50257 rounded up to the next multiple of 64 for CUDA efficiency). server.py has a runtime hack that trims the embeddings from 50304 to 50257 rows at load time. That trimming corrupted the weight mapping, and this was the root cause of the degenerate output.
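The padding arithmetic can be checked in a few lines. This is a minimal sketch; the helper name pad_vocab is hypothetical and is not taken from nanoGPT or server.py.

```python
def pad_vocab(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the next multiple of `multiple`."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


GPT2_VOCAB = 50257                    # standard GPT-2 vocabulary size
padded = pad_vocab(GPT2_VOCAB)        # size of the padded embedding table
extra_rows = padded - GPT2_VOCAB      # rows that map to no real token

print(padded, extra_rows)  # 50304 47
```

The 47 extra rows are the ones the load-time hack tried to cut off.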

New Model (HF: lzwjava/sec-edgar-gpt-124m-hf)

vocab_size: 50257, properly converted to the standard GPT-2 vocabulary. The architecture is the same (124M parameters, 12 layers, 768 hidden size, 1024-token context). There is no padding, so no trimming is needed. This is why it works.
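One way to confirm which variant is on disk is to read vocab_size from the model directory's config.json. The sketch below assumes the usual Hugging Face layout with a config.json at the top of the directory; the function names are hypothetical.

```python
import json
from pathlib import Path

STANDARD_GPT2_VOCAB = 50257


def read_vocab_size(model_dir: str) -> int:
    """Return the vocab_size recorded in <model_dir>/config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config["vocab_size"]


def has_standard_vocab(model_dir: str) -> bool:
    """True if the model uses the unpadded GPT-2 vocabulary."""
    return read_vocab_size(model_dir) == STANDARD_GPT2_VOCAB
```

Running has_standard_vocab("/workspace/model/hf-model/") before and after the swap would show whether the old padded model is still in place.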

What Changes in server.py

1. Remove the entire embedding trimming block (lines 19-31). It is no longer needed because the new model has the correct vocab size.
2. Keep everything else the same: MODEL_DIR, the API endpoints, and the sampling params all stay.
3. Download the new model to RunPod and replace the contents of /workspace/model/hf-model/.

The server.py becomes simpler: load model, serve. No hacks.
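A rough sketch of the download and the simplified load path follows. It is not the actual server.py: the function names are hypothetical, and it assumes huggingface_hub and transformers are installed on the pod and that the repository ships a tokenizer alongside the weights.

```python
MODEL_DIR = "/workspace/model/hf-model/"
REPO_ID = "lzwjava/sec-edgar-gpt-124m-hf"


def vocab_matches(config_vocab_size: int, embedding_rows: int) -> bool:
    """The config and the embedding table should agree on the vocab size."""
    return config_vocab_size == embedding_rows


def download_model(repo_id: str = REPO_ID, model_dir: str = MODEL_DIR) -> str:
    # Imported lazily so this file can be imported without the dependency.
    from huggingface_hub import snapshot_download

    # Note: this does not delete old files with different names;
    # clear the directory first if a clean replacement is needed.
    return snapshot_download(repo_id=repo_id, local_dir=model_dir)


def load_model(model_dir: str = MODEL_DIR):
    # Imported lazily for the same reason as above.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()

    # Sanity check instead of trimming: fail loudly on a mismatch.
    rows = model.get_input_embeddings().weight.shape[0]
    if not vocab_matches(model.config.vocab_size, rows):
        raise ValueError(
            f"vocab_size {model.config.vocab_size} != embedding rows {rows}"
        )
    return tokenizer, model
```

The check replaces the trimming block with a plain assertion, so a wrongly converted checkpoint fails at startup instead of producing degenerate output.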

Want me to proceed with:

1. Update the local server.py to remove the trimming hack
2. SSH into RunPod, download the new model from HF, and restart the server?

Or do you want to do it step by step?
