On-Policy Imitation Learning | Generated by AI
On-policy distillation is a reinforcement learning technique in which a student policy is trained to imitate a teacher policy on data generated by the student’s own current behavior (on-policy data), rather than on a fixed offline dataset.
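To make the definition concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than a reference implementation: the five-state chain environment, the fixed teacher that moves right with probability 0.9, the tabular softmax student, and the helper names (`teacher_probs`, `rollout`, `train`). The student collects states by acting with its own current policy, and the teacher is queried only on those visited states, where the student takes a gradient step on the reverse KL divergence KL(student || teacher).

```python
import math
import random

N_STATES, N_ACTIONS, HORIZON = 5, 2, 8  # toy sizes, chosen for illustration


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_probs(state):
    # Hypothetical fixed teacher: moves right with probability 0.9 in every state.
    return [0.1, 0.9]


def reverse_kl(p, q):
    # KL(p || q) with p the student and q the teacher distribution.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def step(state, action):
    # Toy chain dynamics: action 0 moves left, action 1 moves right, clipped at the ends.
    return min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)


def rollout(student_logits, rng):
    # On-policy data: the visited states come from the student's own actions.
    state, visited = 0, []
    for _ in range(HORIZON):
        visited.append(state)
        probs = softmax(student_logits[state])
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]
        state = step(state, action)
    return visited


def train(iterations=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    student_logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iterations):
        # Fresh rollout from the current student each iteration (no fixed dataset).
        for state in rollout(student_logits, rng):
            p = softmax(student_logits[state])
            q = teacher_probs(state)  # teacher labels the student-visited state
            kl = reverse_kl(p, q)
            for j in range(N_ACTIONS):
                # Exact gradient of KL(p || q) with respect to logit j of a softmax.
                grad = p[j] * ((math.log(p[j]) - math.log(q[j])) - kl)
                student_logits[state][j] -= lr * grad
    return student_logits


if __name__ == "__main__":
    logits = train()
    for s in range(N_STATES):
        print(s, [round(x, 3) for x in softmax(logits[s])])
```

An offline variant would instead fix a set of teacher-generated states once and train on those. The only change here is that `rollout` is called with the current student inside the training loop, so the training distribution tracks the student as it improves.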