ComputerRL Framework for Desktop Automation Agents
“ComputerRL” is a recent research paper titled “ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents” (arXiv).
Here are the key ideas and contributions in a nutshell:
What is ComputerRL?
- It’s a framework that lets autonomous agents operate desktop environments (apps, GUIs) the way a human does (clicking, typing, navigating menus) while also making programmatic API calls when they are available.
- The core idea is the API-GUI paradigm: combining GUI-based interactions (needed when no API exists, or when the GUI is the natural interface) with API calls (which are more precise, robust, and efficient) in a single action space, as sketched below.
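As a rough illustration (my sketch, not the paper’s actual interface; GuiAction, ApiAction, and execute are hypothetical names), a unified API-GUI action space might look like this:

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical action types for an API-GUI agent; names are illustrative,
# not taken from the ComputerRL codebase.

@dataclass
class GuiAction:
    """Low-level GUI interaction, e.g. a mouse click or keystroke."""
    kind: str          # "click", "type", "scroll", ...
    x: int = 0
    y: int = 0
    text: str = ""

@dataclass
class ApiAction:
    """Direct programmatic call when the target app exposes an API."""
    endpoint: str      # e.g. "calc.set_cell"
    args: dict = field(default_factory=dict)

Action = Union[GuiAction, ApiAction]

def execute(action: Action) -> str:
    """Dispatch one action to the desktop environment (stub)."""
    if isinstance(action, ApiAction):
        # Precise and robust: call the app's API directly.
        return f"API call {action.endpoint}({action.args})"
    # Fall back to emulated human input for GUI-only apps.
    return f"GUI {action.kind} at ({action.x}, {action.y})"

# The agent can mix both modes within a single trajectory:
print(execute(ApiAction("calc.set_cell", {"cell": "A1", "value": 42})))
print(execute(GuiAction(kind="click", x=320, y=118)))
```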
Why it’s different / what problems it solves
- A big challenge in training RL agents on desktop/GUI tasks is the inefficiency and instability of booting many virtual machines and keeping simulated environments running for long periods. ComputerRL addresses this by running many desktop environments in parallel (a toy sketch of the pattern follows after this list).
- Long RL runs also tend to suffer entropy collapse, where the policy becomes too deterministic too early and stops exploring. They propose Entropulse, a training strategy that alternates reinforcement-learning phases with supervised fine-tuning to restore policy entropy and keep exploration alive (also sketched below).
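First, the parallel-environment pattern. This is a minimal toy sketch of one worker process per desktop environment, not ComputerRL’s actual infrastructure (env_worker and the pipe protocol are my own stand-ins):

```python
import multiprocessing as mp

def env_worker(env_id: int, conn) -> None:
    """One worker per (virtual) desktop: receives actions, returns observations."""
    state = f"desktop-{env_id}:home_screen"     # stand-in for a real VM handle
    while True:
        action = conn.recv()
        if action is None:                       # shutdown signal
            break
        # In a real system this would drive a VM via GUI/API actions.
        state = f"desktop-{env_id}:after_{action}"
        conn.send(state)

if __name__ == "__main__":
    n_envs = 8                                   # scale this up for real training
    pipes, procs = [], []
    for i in range(n_envs):
        parent, child = mp.Pipe()
        p = mp.Process(target=env_worker, args=(i, child))
        p.start()
        pipes.append(parent)
        procs.append(p)

    # Step all environments in parallel with one batched action.
    for conn in pipes:
        conn.send("click")
    observations = [conn.recv() for conn in pipes]
    print(observations)

    for conn in pipes:
        conn.send(None)                          # stop workers
    for p in procs:
        p.join()
```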
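Second, an Entropulse-style schedule. The paper alternates RL and SFT phases; the assumption here that the switch is triggered by an entropy threshold is mine (ENTROPY_FLOOR, rl_phase, and sft_phase are toy stand-ins):

```python
import math

ENTROPY_FLOOR = 0.5  # hypothetical threshold signalling entropy collapse

def entropy(probs):
    """Shannon entropy of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rl_phase(probs):
    """Stand-in RL update: policies tend to sharpen (entropy drops)."""
    sharpened = [p ** 1.5 for p in probs]
    total = sum(sharpened)
    return [p / total for p in sharpened]

def sft_phase(probs):
    """Stand-in SFT update: pull the policy back toward a broader prior."""
    n = len(probs)
    return [0.5 * p + 0.5 / n for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]
for step in range(20):
    if entropy(probs) < ENTROPY_FLOOR:
        probs = sft_phase(probs)   # restore entropy, then resume RL
        tag = "sft"
    else:
        probs = rl_phase(probs)    # normal on-policy RL phase
        tag = "rl"
    print(f"step {step:2d} [{tag}] entropy={entropy(probs):.3f}")
```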
Experiments & Results
- They apply the framework to open models such as GLM-4-9B-0414 and Qwen2.5-14B.
- Evaluation uses the OSWorld benchmark, which tests agents on tasks in Ubuntu desktop environments.
- Their agent AutoGLM-OS-9B (based on GLM-4-9B-0414) achieves a new state-of-the-art accuracy of 48.1% on those desktop automation tasks.
If you’re curious, I can also summarize how this might compare with, say, AgentBench or Anthropic’s computer-use agents in terms of stability, throughput, and so on.