Next Embedding Prediction Makes World Models Stronger

NE-Dreamer is a new model-based reinforcement learning agent that uses a temporal transformer to predict future encoder embeddings directly, bypassing pixel reconstruction. This decoder-free approach achieves superior performance on DeepMind Control Suite and DMLab benchmarks, particularly in partially observable environments requiring complex memory and spatial reasoning. The method establishes a scalable framework for advanced MBRL by focusing on temporal predictive alignment in latent space.

NE-Dreamer: A New Model-Based Reinforcement Learning Agent Achieves State-of-the-Art Performance

A new model-based reinforcement learning (MBRL) agent, NE-Dreamer, has been introduced, demonstrating superior performance in complex, partially observable environments. The agent's core innovation is its use of a temporal transformer to predict future encoder embeddings directly, bypassing traditional pixel reconstruction and focusing on temporal predictive alignment in the latent space. This decoder-free approach allows the model to learn coherent and predictive state representations more efficiently, establishing a new scalable framework for advanced MBRL.

Core Methodology: Next-Embedding Prediction with Temporal Transformers

Unlike previous MBRL agents that rely on decoding latent states back to image pixels, NE-Dreamer operates in a decoder-free paradigm. Its architecture centers on a temporal transformer that processes sequences of latent states to predict the next-step encoder embeddings. This method directly optimizes for temporal consistency and predictive power within the representation space itself.
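The data flow described above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: it fuses each step's encoder embedding with the action taken, runs one causal self-attention block (so position t sees only steps up to t), and projects each position to a prediction of the next step's embedding. All weights are random and untrained; only the shapes and the causal masking are the point.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(h, Wq, Wk, Wv):
    """Single-head causal self-attention: position t attends only to steps <= t."""
    T, d = h.shape
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # mask out the future
    return softmax(scores, axis=-1) @ v

def predict_next_embeddings(emb, act, rng):
    """Toy one-layer causal transformer: given embeddings emb[0..T-1] and
    actions act[0..T-1], return predictions of emb[1..T] (random weights,
    shape/flow illustration only -- not NE-Dreamer's actual architecture)."""
    T, d = emb.shape
    x = np.concatenate([emb, act], axis=1)                # fuse state and action
    W_in = rng.standard_normal((x.shape[1], d)) * 0.02
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
    W_out = rng.standard_normal((d, d)) * 0.02
    h = x @ W_in
    h = h + causal_attention(h, Wq, Wk, Wv)               # residual attention block
    return h @ W_out                                      # predicted next embeddings

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))   # T=8 steps of 16-dim encoder embeddings
act = rng.standard_normal((8, 4))    # 4-dim actions
pred = predict_next_embeddings(emb, act, rng)
print(pred.shape)  # (8, 16): pred[t] is the forecast of emb[t+1]
```

Because the attention mask is causal, every prediction depends only on the history up to that step, which is what lets the same model serve as a world model for rollouts.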

By eliminating the need for reconstruction losses or auxiliary supervision tasks, the model dedicates its capacity solely to learning the dynamics of the environment. This focus on temporal predictive alignment is critical for handling partial observability, where understanding the sequence of states is more important than reconstructing any single observation.
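The article does not spell out the exact training objective, but decoder-free agents of this kind typically score the predicted embedding against the encoder's actual next-step embedding in latent space, with no pixel loss anywhere. The sketch below assumes one common choice, squared distance between L2-normalized embeddings, purely for illustration:

```python
import numpy as np

def next_embedding_loss(pred, target_emb):
    """Hypothetical decoder-free objective (an assumption, not the paper's
    stated loss): mean squared distance between L2-normalized predicted
    embeddings and the encoder's actual next-step embeddings. The target
    would act as a stop-gradient constant during training; no observation
    is ever reconstructed."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target_emb / np.linalg.norm(target_emb, axis=-1, keepdims=True)
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))

# Toy check: identical embeddings give zero loss, antipodal ones the maximum.
rng = np.random.default_rng(1)
e = rng.standard_normal((5, 8))
print(next_embedding_loss(e, e))             # 0.0 for a perfect match
print(round(next_embedding_loss(e, -e), 6))  # 4.0 for antipodal unit vectors
```

Normalizing before comparison keeps the loss bounded and stops the encoder from shrinking embeddings to trivially reduce the error, a standard concern for latent-prediction objectives.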

Benchmark Performance and Results

Researchers evaluated NE-Dreamer on two challenging benchmark suites. On the DeepMind Control Suite, a standard for continuous control, the agent matched or exceeded the performance of the established DreamerV3 and other leading decoder-free agents.

More impressively, on a demanding subset of DMLab tasks that require complex memory and spatial reasoning, NE-Dreamer achieved substantial performance gains. These results, detailed in the preprint arXiv:2603.02765v1, underscore the agent's strength in environments where long-term temporal dependencies are key to success.

Why This Matters for AI and Robotics

The success of NE-Dreamer represents a significant shift in how agents can learn world models. Its performance validates a more streamlined, representation-focused approach to model-based RL.

  • Scalability for Complex Tasks: The decoder-free, temporal transformer framework is highly scalable, proving effective in high-dimensional, partially observable domains common in real-world robotics and AI.
  • Efficient Representation Learning: By directly optimizing latent states for prediction, the model learns more efficiently than methods burdened by pixel-level reconstruction, a potential pathway to more sample-efficient AI.
  • Advancing Embodied AI: Superior performance on memory and reasoning tasks in DMLab indicates progress toward agents that can plan and act in complex, human-like environments, a core challenge for embodied intelligence.

The introduction of NE-Dreamer establishes next-embedding prediction as a powerful and effective principle, setting a new direction for research in model-based reinforcement learning.