ComputerRL Framework for Desktop Automation Agents
“ComputerRL” is a recent research paper titled “ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents” (arXiv).
Here are the key ideas and contributions in a nutshell:
What is ComputerRL?
- It’s a framework that lets autonomous agents operate desktop environments (apps, GUIs) the way a human does (clicking, typing, navigating menus) while also making programmatic API calls when they are available.
- The core idea is the API-GUI paradigm: combining GUI-based interactions (needed when no API exists, or when the GUI is the natural interface) with API calls (which are more precise, robust, and efficient) in a single action space, as sketched below.
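As a rough illustration (my sketch, not the paper’s actual interface; GuiAction, ApiAction, and execute are hypothetical names), a unified API-GUI action space might look like this:

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical action types for an API-GUI agent; names are illustrative,
# not taken from the ComputerRL codebase.

@dataclass
class GuiAction:
    """Low-level GUI interaction, e.g. a mouse click or keystroke."""
    kind: str          # "click", "type", "scroll", ...
    x: int = 0
    y: int = 0
    text: str = ""

@dataclass
class ApiAction:
    """Direct programmatic call when the target app exposes an API."""
    endpoint: str      # e.g. "calc.set_cell"
    args: dict = field(default_factory=dict)

Action = Union[GuiAction, ApiAction]

def execute(action: Action) -> str:
    """Dispatch one action to the desktop environment (stub)."""
    if isinstance(action, ApiAction):
        # Precise and robust: call the app's API directly.
        return f"API call {action.endpoint}({action.args})"
    # Fall back to emulated human input for GUI-only apps.
    return f"GUI {action.kind} at ({action.x}, {action.y})"

# The agent can mix both modes within a single trajectory:
print(execute(ApiAction("calc.set_cell", {"cell": "A1", "value": 42})))
print(execute(GuiAction(kind="click", x=320, y=118)))
```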
Why it’s different / what problems it solves
- A big challenge in training RL agents on desktop/GUI tasks is the inefficiency and instability of booting many virtual machines and keeping simulated environments running for long periods. ComputerRL addresses this by running many desktop environments in parallel (a toy sketch of the pattern follows after this list).
- Long RL runs also tend to suffer entropy collapse, where the policy becomes too deterministic too early and stops exploring. They propose Entropulse, a training strategy that alternates reinforcement-learning phases with supervised fine-tuning to restore policy entropy and keep exploration alive (also sketched below).
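First, the parallel-environment pattern. This is a minimal toy sketch of one worker process per desktop environment, not ComputerRL’s actual infrastructure (env_worker and the pipe protocol are my own stand-ins):

```python
import multiprocessing as mp

def env_worker(env_id: int, conn) -> None:
    """One worker per (virtual) desktop: receives actions, returns observations."""
    state = f"desktop-{env_id}:home_screen"     # stand-in for a real VM handle
    while True:
        action = conn.recv()
        if action is None:                       # shutdown signal
            break
        # In a real system this would drive a VM via GUI/API actions.
        state = f"desktop-{env_id}:after_{action}"
        conn.send(state)

if __name__ == "__main__":
    n_envs = 8                                   # scale this up for real training
    pipes, procs = [], []
    for i in range(n_envs):
        parent, child = mp.Pipe()
        p = mp.Process(target=env_worker, args=(i, child))
        p.start()
        pipes.append(parent)
        procs.append(p)

    # Step all environments in parallel with one batched action.
    for conn in pipes:
        conn.send("click")
    observations = [conn.recv() for conn in pipes]
    print(observations)

    for conn in pipes:
        conn.send(None)                          # stop workers
    for p in procs:
        p.join()
```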
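Second, an Entropulse-style schedule. The paper alternates RL and SFT phases; the assumption here that the switch is triggered by an entropy threshold is mine (ENTROPY_FLOOR, rl_phase, and sft_phase are toy stand-ins):

```python
import math

ENTROPY_FLOOR = 0.5  # hypothetical threshold signalling entropy collapse

def entropy(probs):
    """Shannon entropy of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rl_phase(probs):
    """Stand-in RL update: policies tend to sharpen (entropy drops)."""
    sharpened = [p ** 1.5 for p in probs]
    total = sum(sharpened)
    return [p / total for p in sharpened]

def sft_phase(probs):
    """Stand-in SFT update: pull the policy back toward a broader prior."""
    n = len(probs)
    return [0.5 * p + 0.5 / n for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]
for step in range(20):
    if entropy(probs) < ENTROPY_FLOOR:
        probs = sft_phase(probs)   # restore entropy, then resume RL
        tag = "sft"
    else:
        probs = rl_phase(probs)    # normal on-policy RL phase
        tag = "rl"
    print(f"step {step:2d} [{tag}] entropy={entropy(probs):.3f}")
```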
Experiments & Results
- They apply the framework to open models such as GLM-4-9B-0414 and Qwen2.5-14B.
- Evaluation uses the OSWorld benchmark, which tests agents on tasks in Ubuntu desktop environments.
- Their agent AutoGLM-OS-9B (based on GLM-4-9B-0414) achieves a new state-of-the-art accuracy of 48.1% on those desktop automation tasks.
If you’re curious, I can also summarize how this might compare with, say, AgentBench or Anthropic’s computer-use agents in terms of stability, throughput, and so on.