train/mfu metric incorrectly zero | Generated by AI
The MLflow DB is healthy. Current state:
2 runs in the database:
1. RUNNING (started Jun 10 ~20:37 UTC)
2. FINISHED (started Jun 10 ~14:31 UTC)
Latest metrics at step 700:
train/loss = 3.707
train/tok_per_sec = 55,780
train/mfu = 0.0 <-- this is wrong, should be non-zero
train/dt = 1.175s per step
total_train_time = 810s (~13.5 min)
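For reference, MFU (model FLOPs utilization) is usually derived from throughput: achieved model FLOPs per second divided by the hardware's peak FLOPs per second. Below is a minimal sketch, assuming the common ~6*N FLOPs-per-token approximation for a training step (forward + backward, attention FLOPs ignored). `estimate_mfu` is a hypothetical helper, and the parameter count and peak-FLOPs values are placeholder assumptions, not values from this run. With any realistic inputs, the logged 55,780 tok/s gives a clearly non-zero MFU, so a logged 0.0 most likely comes from how the metric is computed or logged rather than from idle hardware.

```python
def estimate_mfu(tok_per_sec: float, n_params: float, peak_flops: float) -> float:
    """Hypothetical helper: MFU from training throughput.

    Assumes ~6 * n_params FLOPs per token per training step
    (forward + backward), ignoring attention FLOPs.
    peak_flops is the hardware peak in FLOPs/s for the devices
    that produced tok_per_sec.
    """
    if tok_per_sec <= 0 or n_params <= 0 or peak_flops <= 0:
        raise ValueError("tok_per_sec, n_params and peak_flops must be positive")
    achieved_flops_per_sec = 6.0 * n_params * tok_per_sec
    return achieved_flops_per_sec / peak_flops


if __name__ == "__main__":
    tok_per_sec = 55_780   # train/tok_per_sec from the log above
    n_params = 124e6       # ASSUMPTION: placeholder model size
    peak_flops = 312e12    # ASSUMPTION: placeholder single-GPU bf16 peak
    print(f"{estimate_mfu(tok_per_sec, n_params, peak_flops):.3f}")  # prints 0.133 with these inputs
```

If tok_per_sec is aggregated across several GPUs, peak_flops should be the per-device peak multiplied by the device count.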
The DB is live and updating. The RUNNING run is actively logging. What’s the issue you’re seeing? Is it:
- The UI can’t connect to this DB?
- The train/mfu = 0.0 value looks wrong?
- Something else?