Serving Setup Complete, Model Flawed | Generated by AI

2026.06

Summary

✅ HuggingFace Upload — DONE

All files are uploaded to https://huggingface.co/lzwjava/sec-edgar-gpt-124m. The 475MB model.safetensors went through an Aliyun HK relay (Mac→HK: 30 min over SCP; HK→HF: 12 min upload). The MD5 checksums were verified identical.
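A minimal sketch of the checksum comparison, assuming both copies of the file are readable from the machine running the check (in the relay setup above you would more likely compute the digest on each host and compare the hex strings). The helper names `md5_of_file` and `same_md5` are illustrative, not part of any tool used here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_md5(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Chunked reading keeps memory use flat, which matters for a 475MB file on a small relay instance.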

✅ RunPod Serving Infrastructure — DONE

⚠️ GGUF Conversion — Solved (but not needed now)

Major learnings about gguf Python library’s shape convention:

❌ Model Quality Issue — NOT a serving problem

The model produces degenerate output (“ivableivable…” loops) regardless of:

Top next-token probabilities are flat (roughly 0.002 to 0.01) with no clear winner, and the top tokens are suffix fragments (“em”, “ivable”, “igned”). This suggests the model did not converge properly during training, despite the reported validation loss of 2.28. The 1-epoch training run of the 124M-parameter model on an RTX 4070 may have had issues (learning rate, data preprocessing, etc.).
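The gap between the reported loss and the observed probabilities can be checked with a little arithmetic. This sketch assumes the 2.28 figure is a mean cross-entropy in nats (the PyTorch default); the function names are illustrative. It uses one certain fact: the probability assigned to the correct token can never exceed the top-1 probability, so the per-token loss is at least -ln(p_max).

```python
import math


def perplexity(loss_nats):
    """Perplexity implied by a mean cross-entropy loss in nats."""
    return math.exp(loss_nats)


def mean_correct_token_prob(loss_nats):
    """Geometric-mean probability given to the correct token."""
    return math.exp(-loss_nats)


def min_loss_given_top_prob(p_max):
    """Lower bound on per-token loss when no token exceeds p_max."""
    return -math.log(p_max)


if __name__ == "__main__":
    reported_loss = 2.28  # from the training run (assumed to be in nats)
    print(f"perplexity:              {perplexity(reported_loss):.2f}")
    print(f"mean correct-token prob: {mean_correct_token_prob(reported_loss):.3f}")
    print(f"loss floor at p_max=0.01: {min_loss_given_top_prob(0.01):.2f}")
```

Under that assumption, a loss of 2.28 corresponds to giving the correct token about 10% probability on average, while a top probability of 0.01 forces a per-token loss of at least about 4.6 on those positions. The served model is therefore behaving far worse on these prompts than the validation number would predict.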

What to do next

The serving infrastructure is ready — once you have a better-trained model checkpoint, just:

# On RunPod:
cd /workspace/model
# Replace hf-model/ with new checkpoint, then:
python3 server.py  # OpenAI-compatible API on :8080
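Once the server is up, a quick smoke test might look like the sketch below. Everything beyond port 8080 is an assumption: the `/v1/completions` path and the response shape follow the usual OpenAI completions convention, and the model name and `base_url` are placeholders, since the actual routes exposed by server.py are not shown here.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8080",
                             model="sec-edgar-gpt-124m", max_tokens=32):
    """Build a POST request for an OpenAI-style completions endpoint.

    The path and payload fields are assumed, not taken from server.py.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding makes degenerate loops easy to spot
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    # Assumed response shape: {"choices": [{"text": "..."}]}
    return body["choices"][0]["text"]
```

Calling `complete("The company reported")` against a new checkpoint and checking that the output is not a repeated fragment is a fast way to confirm the quality issue is gone before wiring up any clients.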
