New AI Framework Tackles Multiple, Urgent Questions in 3D Environments
Researchers have introduced a novel challenge for AI agents: answering multiple, asynchronous questions with varying levels of urgency while navigating a 3D space. This new problem, termed Embodied Questions Answering (EQsA), moves beyond the classical single-question paradigm to better reflect real-world deployment needs. The team has also proposed ConEQsA, an agentic framework designed to tackle this challenge through concurrent scheduling and shared memory, and released a new benchmark called CAEQs for evaluation.
From Single to Multiple: The EQsA Challenge
Classical Embodied Question Answering (EQA) tasks an AI agent with answering one question by actively exploring a simulated environment, like a house. However, this single-threaded approach fails in scenarios where an agent, such as a home assistant robot, must handle a stream of questions—like "Where are my keys?" followed later by "Is the stove off?"—each with different priorities. The EQsA formulation explicitly models this realistic, multi-question workload where questions arrive asynchronously and carry human-annotated urgency labels.
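A minimal sketch of how such an asynchronous, urgency-tagged workload might be represented is shown below; the class and field names are illustrative, not the paper's actual interface:

```python
from dataclasses import dataclass

@dataclass
class EmbodiedQuestion:
    """One item in an asynchronous EQsA workload (illustrative fields)."""
    text: str            # e.g., "Is the stove off?"
    arrival_time: float  # seconds since episode start; questions arrive mid-run
    urgency: int         # human-annotated urgency, e.g., 1 (low) to 3 (high)

# Questions arrive as a stream rather than all at once.
incoming = [
    EmbodiedQuestion("Where are my keys?", arrival_time=0.0, urgency=1),
    EmbodiedQuestion("Is the stove off?", arrival_time=12.5, urgency=3),
]
```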
To provide a standard testbed, the researchers created the Concurrent Asynchronous Embodied Questions (CAEQs) benchmark. It comprises 40 diverse indoor scenes, each paired with five questions, for a total of 200 questions. The benchmark gives the community a common basis for comparing how well different AI systems manage concurrent inquiries, a significant step toward more practical embodied AI.
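As a rough picture of what one evaluation unit could look like (the actual file format and field names are assumptions):

```python
# Hypothetical shape of one CAEQs scene entry: an indoor scene paired with
# five asynchronous, urgency-tagged questions. 40 such scenes x 5 questions
# yield the benchmark's 200 questions in total.
caeqs_scene = {
    "scene_id": "indoor_scene_017",
    "questions": [
        {"text": "Where are my keys?",    "arrival_time": 0.0,  "urgency": 1},
        {"text": "Is the window closed?", "arrival_time": 8.0,  "urgency": 2},
        {"text": "Is the stove off?",     "arrival_time": 12.5, "urgency": 3},
        {"text": "Is the TV on?",         "arrival_time": 20.0, "urgency": 1},
        {"text": "Is the door locked?",   "arrival_time": 31.0, "urgency": 2},
    ],
}
```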
The ConEQsA Framework: Shared Memory and Priority Planning
The proposed solution, ConEQsA, is an agentic system built for efficiency and responsiveness. Its core innovation is a shared group memory: information gathered while answering one question is reused for others, cutting redundant exploration and, with it, overall response time. It also employs a priority-planning method that dynamically schedules which question to address next based on its urgency and the agent's current context and location.
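The sketch below illustrates both ideas under stated assumptions: a memory keyed by location, and a scoring function that trades urgency and waiting time against travel cost. The scoring formula is a plausible stand-in, not ConEQsA's exact scheduler.

```python
import math

class SharedGroupMemory:
    """Observations cached once and reused across all pending questions."""

    def __init__(self):
        self._observations = {}  # location (x, y) -> cached observation

    def record(self, location, observation):
        self._observations[location] = observation

    def lookup(self, location):
        # Returns a cached observation if this location was already explored.
        return self._observations.get(location)

def priority(question, agent_pos, target_pos, now):
    """Hypothetical urgency-aware score: more urgent, longer-waiting, and
    nearer questions rank higher. The paper's exact method may differ."""
    travel_cost = math.dist(agent_pos, target_pos)  # straight-line distance
    waiting = now - question.arrival_time
    return question.urgency * (1.0 + waiting) / (1.0 + travel_cost)
```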
This concurrent, urgency-aware approach allows the agent to interleave actions for different questions, making it far more efficient than naive sequential baselines. For instance, while moving to a location to answer a high-urgency query, the agent can simultaneously gather visual data relevant to a lower-priority question queued for later.
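Putting the pieces together, one decision step might look like the following. It reuses the `SharedGroupMemory` and `priority` helpers sketched above; `targets` is an assumed mapping from each question to the location the agent currently believes will answer it (how that mapping is produced is left out here).

```python
def next_goal(pending, targets, memory, agent_pos, now):
    """One illustrative decision step of a concurrent, urgency-aware agent."""
    # Direct answers: questions whose target is already in shared memory
    # cost no extra travel and can be resolved immediately.
    for q in list(pending):
        if memory.lookup(targets[q.text]) is not None:
            pending.remove(q)  # answered from memory, no new exploration

    if not pending:
        return None  # nothing left to navigate toward

    # Otherwise head for the highest-priority target; observations recorded
    # en route (via memory.record) may pre-answer lower-priority questions.
    best = max(pending, key=lambda q: priority(q, agent_pos, targets[q.text], now))
    return targets[best.text]
```

Removing a question here stands in for emitting its answer; a real agent would also verify that the cached observation actually resolves the question before answering.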
New Metrics for a New Problem: DAR and NUWL
Evaluating performance in the EQsA setting requires new metrics beyond simple accuracy. The researchers propose two key performance indicators. The Direct Answer Rate (DAR) measures the percentage of questions answered correctly without requiring new exploration, highlighting the system's ability to leverage its shared memory. The Normalized Urgency-Weighted Latency (NUWL) is a composite score that penalizes delays in answering urgent questions more heavily, ensuring the evaluation protocol rewards systems that correctly prioritize critical tasks.
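The paper gives the precise definitions; as a rough, assumed reading of the two metrics (result fields and the NUWL normalization below are guesses, hedged accordingly):

```python
def direct_answer_rate(results):
    """DAR sketch: share of questions answered correctly without any new
    exploration. The `results` fields are assumptions, not the paper's schema."""
    direct = sum(1 for r in results if r["correct"] and not r["explored"])
    return direct / len(results)

def normalized_urgency_weighted_latency(results):
    """NUWL sketch: answer latencies weighted by urgency, normalized by the
    total urgency weight so scores are comparable across question sets.
    The paper's exact normalization may differ; lower is better."""
    weighted = sum(r["urgency"] * r["latency"] for r in results)
    return weighted / sum(r["urgency"] for r in results)
```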
Empirical results on the CAEQs benchmark demonstrate that ConEQsA consistently outperforms strong sequential baselines. The findings confirm that urgency-aware, concurrent scheduling is not just beneficial but essential for creating embodied agents that are both responsive and efficient under realistic conditions. The code for ConEQsA is publicly available for further research and development.
Why This Matters for the Future of Embodied AI
- Bridges the Simulation-to-Reality Gap: EQsA moves AI testing closer to real-world applications where agents must juggle multiple, time-sensitive requests, a critical step for deploying useful home or service robots.
- Introduces Critical Resource Management: The problem framework forces AI to consider not just "what" to answer, but "when" and "in what order," introducing essential concepts of computational and temporal resource allocation.
- Establishes a New Benchmark Standard: The CAEQs benchmark and the DAR/NUWL metrics provide the community with the first standardized tools to measure progress in multi-question embodied reasoning, enabling more meaningful comparisons between future AI systems.