This might be pretty big. One of my biggest frustrations with smaller models (especially MoE) is their failure to track workflow state at a high level. I'm constantly reminding them what we decided on or asking them to revisit, and reminding them eats context.
Seems like this might make that a lot less painful, if not out of the box then with some minimal tuning or even just good prompting.
I'm a fan of this direction. For me the most interesting use case for these world models isn't even training, it's verification. If this thing, or some idealized version of it, can reliably simulate state transitions, could you use it to verify an agent's execution path against hard constraints and replace or eclipse LLM-as-a-judge?
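To make the verification idea concrete, here's a rough sketch of what I have in mind. Everything in it is hypothetical: `toy_world_model` is a deterministic stand-in for the learned model (in practice that would be an LLM call), and `verify_path`, the state layout, and the constraint format are names I made up for illustration, not anything from the Qwen-AgentWorld repo.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration only.
State = Dict[str, int]
Action = Tuple[str, int]                      # (kind, amount)
WorldModel = Callable[[State, Action], State]
Constraint = Tuple[str, Callable[[State], bool]]


def toy_world_model(state: State, action: Action) -> State:
    """Deterministic stand-in for a learned world model: (state, action) -> next state."""
    kind, amount = action
    new_state = dict(state)  # copy so the caller's state is not mutated
    if kind == "spend":
        new_state["budget"] -= amount
    elif kind == "refund":
        new_state["budget"] += amount
    return new_state


def verify_path(
    world_model: WorldModel,
    initial_state: State,
    actions: List[Action],
    constraints: List[Constraint],
) -> Optional[Tuple[int, str]]:
    """Simulate the actions step by step and return (step index, constraint name)
    for the first violated hard constraint, or None if the whole path is clean."""
    state = initial_state
    for step, action in enumerate(actions):
        state = world_model(state, action)
        for name, holds in constraints:
            if not holds(state):
                return step, name
    return None


constraints = [("budget_non_negative", lambda s: s["budget"] >= 0)]

clean = verify_path(toy_world_model, {"budget": 10},
                    [("spend", 4), ("spend", 5)], constraints)
violating = verify_path(toy_world_model, {"budget": 10},
                        [("spend", 4), ("spend", 9), ("refund", 20)], constraints)
```

The constraints themselves are plain code, so the only fuzzy part is the simulated transition. That's also the catch: a check like this is only as trustworthy as the world model's predictions.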
A regular LLM acts as a "policy," mapping a current state to a specific action (states → actions). Their new LLM acts as a "world model," mapping a current state and a chosen action to a predicted future state ((states, actions) → subsequent states). Instead of deciding "what to do," its explicit objective is to predict the exact environment observation that will result from the interaction history and the agent's current action.
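In code terms the difference is basically the function signature. A minimal sketch, with toy stand-ins I made up (`toy_policy`, `toy_world_model`, and the string-encoded states are assumptions for illustration, not the actual model interface):

```python
from typing import Callable, List, Tuple

# Hypothetical encodings: a real state would be something like a serialized
# observation, and a real action a tool call or UI event.
State = str
Action = str

Policy = Callable[[State], Action]              # state -> action
WorldModel = Callable[[State, Action], State]   # (state, action) -> next state


def toy_policy(state: State) -> Action:
    """Decides what to do given the current state."""
    return "submit_credentials" if state == "page:login" else "noop"


def toy_world_model(state: State, action: Action) -> State:
    """Predicts what the environment looks like after the action."""
    if state == "page:login" and action == "submit_credentials":
        return "page:dashboard"
    return state


def rollout(policy: Policy, world_model: WorldModel,
            state: State, steps: int) -> List[Tuple[State, Action, State]]:
    """Alternate the two: the policy picks an action, the world model
    predicts the resulting state, and the loop continues from there."""
    trajectory = []
    for _ in range(steps):
        action = policy(state)
        next_state = world_model(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


trajectory = rollout(toy_policy, toy_world_model, "page:login", 2)
```

The rollout loop is where the two meet: the policy never touches a real environment, it only ever sees states the world model predicted.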
I assumed at first that it was trained on synthetic data, but they actually went and deployed real physical hosts and virtual machines (e.g. Ubuntu, macOS, and Android) and browsers. They ran agentic systems on these continuously and recorded the actual, real-world interactions.
So it's an LLM that infers the next state, or outcome, as structured data, e.g. literal HTML code, UI view hierarchies, or accessibility trees.
> Figure 1: Overview of Qwen-AgentWorld. Top: Qwen-AgentWorld is a unified native language world model across seven domains. Bottom: We explore two complementary strategies for applying world modeling to enhance language agents (mainly using the 35B-A3B model as agent): Decouple and Unify, where the world model serves as the environment simulator and agent foundation model, respectively.
Where is the mistake?
The bars above the label "Infinite Real-World Envs" show growth, for example, from approx. 42 to 55 (a gain of about 13), but the red label says "+7.1". It's wrong for all of them.
https://github.com/QwenLM/Qwen-AgentWorld
https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B