AI Agent

The latest developments in the AI Agent field: autonomous agents, AI assistants, tool calling, planning and reasoning, and more.

SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling
Agent

SaFeR is a novel AI framework for generating safety-critical autonomous driving test scenarios that balances adversarial...

SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling
Agent

SaFeR is a novel AI framework for generating safety-critical test scenarios for autonomous vehicles that balances advers...

Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
Agent

The Sim2Sea framework enables successful zero-shot transfer of AI navigation policies from simulation to real-world auto...

Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
Agent

The Sim2Sea framework enables zero-shot transfer of AI navigation policies from simulation to real-world maritime vessel...

Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
Agent

Sim2Sea is a comprehensive framework that successfully bridges the simulation-to-reality gap for autonomous maritime nav...

Sim2Sea: Sim-to-Real Policy Transfer for Maritime Vessel Navigation in Congested Waters
Agent

Sim2Sea is a novel framework that successfully enables autonomous maritime navigation systems trained entirely in simula...

Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Agent

Researchers developed a novel framework for online Continual Reinforcement Learning (CRL) that enables robots to autonom...

Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Agent

Researchers have developed a novel Continual Reinforcement Learning (CRL) framework that enables autonomous robots to de...

Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Agent

Researchers have developed a novel Continual Reinforcement Learning framework that enables AI-powered robots to autonomo...

Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Agent

Researchers have developed a novel online continual reinforcement learning framework that enables autonomous robots to d...

Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback
Agent

Researchers developed a novel Continual Reinforcement Learning framework that enables robotic agents to autonomously det...

Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Agent

Researchers have developed a reactive mission design framework combining Probabilistic Mission Design (ProMis) with Reac...

Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Agent

Researchers developed a reactive mission design framework combining Probabilistic Mission Design (ProMis) with Reactive ...

Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Agent

Researchers developed a novel reactive reasoning framework that enables autonomous vehicles like drones to perform real-...

Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Agent

Researchers developed a reactive mission design framework combining Probabilistic Mission Design (ProMis) with Reactive ...

Right in Time: Reactive Reasoning in Regulated Traffic Spaces
Agent

This research introduces a novel reactive framework combining Probabilistic Mission Design (ProMis) with Reactive Circui...

GIPO: Gaussian Importance Sampling Policy Optimization
Agent

Gaussian Importance Sampling Policy Optimization (GIPO) is a novel reinforcement learning method that addresses data ine...

GIPO: Gaussian Importance Sampling Policy Optimization
Agent

GIPO (Gaussian Importance Sampling Policy Optimization) is a novel reinforcement learning method that addresses data eff...

GIPO: Gaussian Importance Sampling Policy Optimization
Agent

GIPO (Gaussian Importance Sampling Policy Optimization) is a novel reinforcement learning technique that addresses ineff...

GIPO: Gaussian Importance Sampling Policy Optimization
Agent

Gaussian Importance Sampling Policy Optimization (GIPO) is a novel reinforcement learning technique that addresses data ...

GIPO: Gaussian Importance Sampling Policy Optimization
Agent

GIPO (Gaussian Importance Sampling Policy Optimization) is a novel reinforcement learning technique that replaces PPO's ...

RVN-Bench: A Benchmark for Reactive Visual Navigation
Agent

RVN-Bench (Reactive Visual Navigation Benchmark) is a standardized evaluation framework for safe, vision-based indoor ro...

RVN-Bench: A Benchmark for Reactive Visual Navigation
Agent

RVN-Bench (Reactive Visual Navigation Benchmark) is a standardized evaluation framework for collision-aware indoor visua...

RVN-Bench: A Benchmark for Reactive Visual Navigation
Agent

RVN-Bench is a new benchmark for reactive visual navigation that evaluates AI agents' ability to navigate unseen indoor ...

RVN-Bench: A Benchmark for Reactive Visual Navigation
Agent

RVN-Bench is a new benchmark for reactive visual navigation that addresses collision avoidance in unseen indoor environm...

RVN-Bench: A Benchmark for Reactive Visual Navigation
Agent

RVN-Bench (Reactive Visual Navigation Benchmark) is the first standardized benchmark for collision-aware indoor visual n...

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
Agent

A new study exposes critical flaws in AI role-playing agent evaluation, showing models rely on character names rather th...

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
Agent

A new study exposes critical flaws in AI role-playing agent evaluation, showing models like GPT-4 and Claude perform wel...

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
Agent

A new study reveals fundamental flaws in AI role-playing agent evaluation, showing models rely on pre-existing knowledge...

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
Agent

New research exposes a critical flaw in AI role-playing agent evaluation, showing models rely on character name recognit...

IROSA: Interactive Robot Skill Adaptation using Natural Language
Agent

IROSA (Interactive Robot Skill Adaptation) is a novel framework that enables robots to adapt skills using natural langua...

IROSA: Interactive Robot Skill Adaptation using Natural Language
Agent

IROSA is a novel framework that combines large language models with imitation learning to enable robots to adapt their s...

IROSA: Interactive Robot Skill Adaptation using Natural Language
Agent

IROSA (Interactive Robot Skill Adaptation) is a novel framework that combines foundation models with imitation learning ...

IROSA: Interactive Robot Skill Adaptation using Natural Language
Agent

IROSA (Interactive Robot Skill Adaptation) is a novel framework that enables robots to adapt skills through natural lang...

IROSA: Interactive Robot Skill Adaptation using Natural Language
Agent

IROSA (Interactive Robot Skill Adaptation) is a novel framework that combines large language models with imitation learn...

On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Agent

Researchers have developed an LLM-driven auditing agent capable of autonomously navigating and evaluating 456 data broke...

On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Agent

A new study demonstrates that LLM-driven agents can autonomously audit websites for manipulative dark patterns, specific...

On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Agent

Researchers developed an LLM-driven agent to autonomously audit 456 data broker websites for dark patterns within CCPA d...

On the Suitability of LLM-Driven Agents for Dark Pattern Audits
Agent

A new study demonstrates an LLM-driven agent designed to systematically audit 456 data broker websites for dark patterns...

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Agent

SWE-CI is a novel benchmark that evaluates AI agents' ability to manage long-term software evolution through Continuous ...

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Agent

SWE-CI is a novel benchmark for evaluating AI agents on long-term software maintenance within real development cycles. I...

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Agent

SWE-CI is a novel benchmark designed to evaluate AI agents on long-term software maintenance within real-world continuou...

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Agent

SWE-CI is a novel benchmark that evaluates AI agents on their ability to manage long-term software evolution through con...

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Agent

SWE-CI is a novel benchmark designed to evaluate AI-powered coding agents on their ability to manage long-term software ...

MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Agent

MACC (Multi-Agent Collaborative Competition) is a novel institutional architecture designed to automate scientific disco...

MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Agent

The MACC (Multi-Agent Collaborative Competition) framework is a novel institutional architecture designed to study how A...

MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Agent

The MACC (Multi-Agent Collaborative Competition) framework is a novel institutional architecture for multi-agent AI syst...

MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Agent

The MACC (Multi-Agent Collaborative Competition) framework is a novel institutional architecture designed to study how A...

MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Agent

Researchers have introduced the MACC (Multi-Agent Collaborative Competition) framework, a novel institutional architectu...

Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Agent

Researchers have introduced a novel three-layer cognition-to-control (C2C) architecture to bridge the gap between high-l...

Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Agent

The cognition-to-control (C2C) framework is a three-layer AI architecture designed for human-robot collaboration, integr...

Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Agent

Researchers have developed a novel hierarchical AI architecture called cognition-to-control (C2C) that enables sophistic...

Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Agent

Researchers have developed a novel three-layer cognition-to-control (C2C) architecture that enables sophisticated human-...

Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Agent

Researchers developed the Cognition-to-Control (C2C) framework, a three-layer architecture that explicitly bridges high-...

Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Agent

Researchers developed ALTERNATING-MARL, a novel multi-agent reinforcement learning framework for large-scale systems whe...

Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Agent

Researchers from Stanford University and Google DeepMind developed ALTERNATING-MARL, a novel multi-agent reinforcement l...

Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Agent

Researchers developed ALTERNATING-MARL, a novel algorithmic framework for cooperative multi-agent reinforcement learning...

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
Agent

The arXiv paper 'Networking Foundations for Agentic Peer-to-Peer Networks' proposes a new architecture for Client-Side A...

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
Agent

Agentic Peer-to-Peer Networks represent a paradigm shift from cloud-based AI to persistent local agents that exchange dy...

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
Agent

This research paper introduces the first formal networking framework for Agentic Peer-to-Peer Networks, where Client-Sid...