AI Agent
Frontier developments in the AI Agent field: autonomous agents, AI assistants, tool calling, planning and reasoning, and more.
SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling
SaFeR is a novel AI framework for generating safety-critical autonomous driving test scenarios that balances adversarial...
SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling
SaFeR is a novel AI framework for generating safety-critical test scenarios for autonomous vehicles that balances advers...
Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
The Sim2Sea framework enables successful zero-shot transfer of AI navigation policies from simulation to real-world auto...
Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
The Sim2Sea framework enables zero-shot transfer of AI navigation policies from simulation to real-world maritime vessel...
Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
Sim2Sea is a comprehensive framework that successfully bridges the simulation-to-reality gap for autonomous maritime nav...
Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
Sim2Sea is a novel framework that successfully enables autonomous maritime navigation systems trained entirely in simula...
Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Researchers developed a novel framework for online Continual Reinforcement Learning (CRL) that enables robots to autonom...
Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Researchers have developed a novel Continual Reinforcement Learning (CRL) framework that enables autonomous robots to de...
Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Researchers have developed a novel Continual Reinforcement Learning framework that enables AI-powered robots to autonomo...
Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Researchers have developed a novel online continual reinforcement learning framework that enables autonomous robots to d...
Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Researchers developed a novel Continual Reinforcement Learning framework that enables robotic agents to autonomously det...
Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Researchers have developed a reactive mission design framework combining Probabilistic Mission Design (ProMis) with Reac...
Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Researchers developed a reactive mission design framework combining Probabilistic Mission Design (ProMis) with Reactive ...
Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Researchers developed a novel reactive reasoning framework that enables autonomous vehicles like drones to perform real-...
Right in Time: Reactive Reasoning in Regulated Traffic Spaces
This research introduces a novel reactive framework combining Probabilistic Mission Design (ProMis) with Reactive Circui...
GIPO: Gaussian Importance Sampling Policy Optimization
Gaussian Importance Sampling Policy Optimization (GIPO) is a novel reinforcement learning method that addresses data ine...
GIPO: Gaussian Importance Sampling Policy Optimization
GIPO (Gaussian Importance Sampling Policy Optimization) is a novel reinforcement learning method that addresses data eff...
GIPO: Gaussian Importance Sampling Policy Optimization
GIPO (Gaussian Importance Sampling Policy Optimization) is a novel reinforcement learning technique that addresses ineff...
GIPO: Gaussian Importance Sampling Policy Optimization
Gaussian Importance Sampling Policy Optimization (GIPO) is a novel reinforcement learning technique that addresses data ...
GIPO: Gaussian Importance Sampling Policy Optimization
GIPO (Gaussian Importance Sampling Policy Optimization) is a novel reinforcement learning technique that replaces PPO's ...
RVN-Bench: A Benchmark for Reactive Visual Navigation
RVN-Bench (Reactive Visual Navigation Benchmark) is a standardized evaluation framework for safe, vision-based indoor ro...
RVN-Bench: A Benchmark for Reactive Visual Navigation
RVN-Bench (Reactive Visual Navigation Benchmark) is a standardized evaluation framework for collision-aware indoor visua...
RVN-Bench: A Benchmark for Reactive Visual Navigation
RVN-Bench is a new benchmark for reactive visual navigation that evaluates AI agents' ability to navigate unseen indoor ...
RVN-Bench: A Benchmark for Reactive Visual Navigation
RVN-Bench is a new benchmark for reactive visual navigation that addresses collision avoidance in unseen indoor environm...
RVN-Bench: A Benchmark for Reactive Visual Navigation
RVN-Bench (Reactive Visual Navigation Benchmark) is the first standardized benchmark for collision-aware indoor visual n...
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
A new study exposes critical flaws in AI role-playing agent evaluation, showing models rely on character names rather th...
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
A new study exposes critical flaws in AI role-playing agent evaluation, showing models like GPT-4 and Claude perform wel...
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
A new study reveals fundamental flaws in AI role-playing agent evaluation, showing models rely on pre-existing knowledge...
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
New research exposes a critical flaw in AI role-playing agent evaluation, showing models rely on character name recognit...
IROSA: Interactive Robot Skill Adaptation using Natural Language
IROSA (Interactive Robot Skill Adaptation) is a novel framework that enables robots to adapt skills using natural langua...
IROSA: Interactive Robot Skill Adaptation using Natural Language
IROSA is a novel framework that combines large language models with imitation learning to enable robots to adapt their s...
IROSA: Interactive Robot Skill Adaptation using Natural Language
IROSA (Interactive Robot Skill Adaptation) is a novel framework that combines foundation models with imitation learning ...
IROSA: Interactive Robot Skill Adaptation using Natural Language
IROSA (Interactive Robot Skill Adaptation) is a novel framework that enables robots to adapt skills through natural lang...
IROSA: Interactive Robot Skill Adaptation using Natural Language
IROSA (Interactive Robot Skill Adaptation) is a novel framework that combines large language models with imitation learn...
On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Researchers have developed an LLM-driven auditing agent capable of autonomously navigating and evaluating 456 data broke...
On the Suitability of LLM-Driven Agents for Dark Pattern Audits
A new study demonstrates that LLM-driven agents can autonomously audit websites for manipulative dark patterns, specific...
On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Researchers developed an LLM-driven agent to autonomously audit 456 data broker websites for dark patterns within CCPA d...
On the Suitability of LLM-Driven Agents for Dark Pattern Audits
A new study demonstrates an LLM-driven agent designed to systematically audit 456 data broker websites for dark patterns...
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
SWE-CI is a novel benchmark that evaluates AI agents' ability to manage long-term software evolution through Continuous ...
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
SWE-CI is a novel benchmark for evaluating AI agents on long-term software maintenance within real development cycles. I...
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
SWE-CI is a novel benchmark designed to evaluate AI agents on long-term software maintenance within real-world continuou...
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
SWE-CI is a novel benchmark that evaluates AI agents on their ability to manage long-term software evolution through con...
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
SWE-CI is a novel benchmark designed to evaluate AI-powered coding agents on their ability to manage long-term software ...
MACC: Multi-Agent Collaborative Competition for Scientific Exploration
MACC (Multi-Agent Collaborative Competition) is a novel institutional architecture designed to automate scientific disco...
MACC: Multi-Agent Collaborative Competition for Scientific Exploration
The MACC (Multi-Agent Collaborative Competition) framework is a novel institutional architecture designed to study how A...
MACC: Multi-Agent Collaborative Competition for Scientific Exploration
The MACC (Multi-Agent Collaborative Competition) framework is a novel institutional architecture for multi-agent AI syst...
MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Researchers have introduced the MACC (Multi-Agent Collaborative Competition) framework, a novel institutional architectu...
Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Researchers have introduced a novel three-layer cognition-to-control (C2C) architecture to bridge the gap between high-l...
Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
The cognition-to-control (C2C) framework is a three-layer AI architecture designed for human-robot collaboration, integr...
Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Researchers have developed a novel hierarchical AI architecture called cognition-to-control (C2C) that enables sophistic...
Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Researchers have developed a novel three-layer cognition-to-control (C2C) architecture that enables sophisticated human-...
Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Researchers developed the Cognition-to-Control (C2C) framework, a three-layer architecture that explicitly bridges high-...
Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Researchers developed ALTERNATING-MARL, a novel multi-agent reinforcement learning framework for large-scale systems whe...
Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Researchers from Stanford University and Google DeepMind developed ALTERNATING-MARL, a novel multi-agent reinforcement l...
Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Researchers developed ALTERNATING-MARL, a novel algorithmic framework for cooperative multi-agent reinforcement learning...
Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
The arXiv paper 'Networking Foundations for Agentic Peer-to-Peer Networks' proposes a new architecture for Client-Side A...
Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
Agentic Peer-to-Peer Networks represent a paradigm shift from cloud-based AI to persistent local agents that exchange dy...
Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
This research paper introduces the first formal networking framework for Agentic Peer-to-Peer Networks, where Client-Sid...