AI Agent
The latest developments in the AI Agent field, covering autonomous agents, AI assistants, tool calling, and planning and reasoning.
Mastercard’s AI payment demo points to agent-led commerce
A recent demonstration from Mastercard suggests that payment systems may be heading toward a future where software agent...
Deploying agentic finance AI for immediate business ROI
Agentic finance AI improves business efficiency and ROI only when deployed with strict governance and clear return on in...
Nokia and AWS pilot AI automation for real-time 5G network slicing
Telecom networks may soon begin adjusting themselves in real time, as operators test systems that allow AI agents to man...
Trace raises $3M to solve the AI agent adoption problem in enterprise
Trace is launching with $3 million in seed funding, including investment from Y Combinator, Zeno Ventures, Transpose Pla...
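InnoCare's BCL2 inhibitor in combination with orelabrutinib completes patient enrollment in Phase 3 trial
36Kr has learned that InnoCare, a high-tech biopharmaceutical company, announced today that the registrational Phase III clinical trial of its independently developed novel BCL2 inhibitor mesutoclax (ICP-248) in combination with the BTK inhibitor orelabrutinib as first-line treatment for chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL) has completed patient...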
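US media: AI giants to sign pledge to supply their own power
According to a report by the US news site Axios on the 25th, representatives of several major US technology companies plan to visit the White House next week to meet President Trump. During the visit they will sign written documents pledging to supply or purchase the electricity their artificial intelligence (AI) data centers need. According to the report, several US tech giants have already pledged to take measures to prevent consumers from being hit by...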
Anthropic acquires computer-use AI startup Vercept after Meta poached one of its founders
Seattle-based Vercept developed complex agentic tools, including a computer-use agent that could complete tasks inside a...
A Comparative Analysis of Social Network Topology in Reddit and Moltbook
arXiv:2602.13920v3 Announce Type: replace-cross Abstract: Recent advances in agent-mediated systems have enabled a new p...
Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks
arXiv:2602.05066v2 Announce Type: replace-cross Abstract: As AI agents automate critical workloads, they remain vulnerab...
Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation
arXiv:2602.02007v2 Announce Type: replace-cross Abstract: Agent memory systems often adopt the standard Retrieval-Augmen...
RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind
arXiv:2601.15715v3 Announce Type: replace-cross Abstract: Although artificial intelligence (AI) has become deeply integr...
Stabilizing Off-Policy Training for Long-Horizon LLM Agent via Turn-Level Importance Sampling and Clipping-Triggered Normalization
arXiv:2511.20718v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) algorithms such as PPO and GRPO ar...
SPACeR: Self-Play Anchoring with Centralized Reference Models
arXiv:2510.18060v2 Announce Type: replace-cross Abstract: Developing autonomous vehicles (AVs) requires not only safety ...
FML-bench: Benchmarking Machine Learning Agents for Scientific Research
arXiv:2510.10472v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have sparked growing interest in ...
ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference
arXiv:2509.14537v2 Announce Type: replace-cross Abstract: Capturing professionals' decision-making in creative workflows...
Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management
arXiv:2401.12455v2 Announce Type: replace-cross Abstract: Life-cycle management of large-scale transportation systems re...
OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents
arXiv:2602.19439v2 Announce Type: replace Abstract: Supply chain optimization models frequently become infeasible becaus...
OR-Agent: Bridging Evolutionary Search and Structured Research for Automated Algorithm Discovery
arXiv:2602.13769v2 Announce Type: replace Abstract: Automating scientific discovery in complex, experiment-driven domain...
OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage
arXiv:2602.13477v2 Announce Type: replace Abstract: As Large Language Model (LLM) agents become more capable, their coor...
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
arXiv:2601.10402v4 Announce Type: replace Abstract: The advancement of artificial intelligence toward agentic science is...
InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis
arXiv:2507.14899v3 Announce Type: replace Abstract: Non-destructive testing (NDT), particularly X-ray inspection, is vit...
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
arXiv:2602.22190v1 Announce Type: cross Abstract: Open-source native GUI agents still lag behind closed-source systems o...
SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents
arXiv:2602.22124v1 Announce Type: cross Abstract: Small language models (SLMs) offer compelling advantages in cost, late...
Training Generalizable Collaborative Agents via Strategic Risk Aversion
arXiv:2602.21515v1 Announce Type: cross Abstract: Many emerging agentic paradigms require agents to collaborate with one...
Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG
arXiv:2602.21447v1 Announce Type: cross Abstract: Current stateless defences for multimodal agentic RAG fail to detect a...
The Headless Firm: How AI Reshapes Enterprise Boundaries
arXiv:2602.21401v1 Announce Type: cross Abstract: The boundary of the firm is determined by coordination cost. We argue ...
Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
arXiv:2602.21368v1 Announce Type: cross Abstract: Given a black-box AI system and a task, at what confidence level can a...
A General Equilibrium Theory of Orchestrated AI Agent Systems
arXiv:2602.21255v1 Announce Type: cross Abstract: We establish a general equilibrium theory for systems of large languag...
AgenticTyper: Automated Typing of Legacy Software Projects Using Agentic AI
arXiv:2602.21251v1 Announce Type: cross Abstract: Legacy JavaScript systems lack type safety, making maintenance risky. ...
Budget-Aware Agentic Routing via Boundary-Guided Training
arXiv:2602.21227v1 Announce Type: cross Abstract: As large language models (LLMs) evolve into autonomous agents that exe...
Field-Theoretic Memory for AI Agents: Continuous Dynamics for Context Preservation
arXiv:2602.21220v1 Announce Type: cross Abstract: We present a memory system for AI agents that treats stored informatio...
2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
arXiv:2602.21889v1 Announce Type: new Abstract: Across a growing number of fields, human decision making is supported by...
Power and Limitations of Aggregation in Compound AI Systems
arXiv:2602.21556v1 Announce Type: new Abstract: When designing compound AI systems, a common approach is to query multip...
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
arXiv:2602.21534v1 Announce Type: new Abstract: Agentic reinforcement learning (ARL) has rapidly gained attention as a p...
Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information
arXiv:2602.21496v1 Announce Type: new Abstract: While defenses for structured PII are mature, Large Language Models (LLM...
A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
arXiv:2602.21351v1 Announce Type: new Abstract: The rapid accumulation of Earth science data has created a significant s...
Google and Samsung just launched the AI features Apple couldn’t with Siri
Google just announced that Gemini will soon be able to take care of some multistep tasks on your phone, like ordering fo...
OpenClaw creator’s advice to AI builders is to be more playful and allow yourself time to improve
Peter Steinberger talks about the creation of his viral AI agent OpenClaw and how being more "playful" makes for a bette...
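Sanctioned by Google, OpenClaw founder fires back: "Anthropic would call first; you just ban accounts"
Editors | Zenan, Yang Wen. OpenClaw, which has been making headlines frequently of late, has finally been "sanctioned" for once. This Monday, Google announced it was restricting some developers from using Antigravity, its vibe coding platform, and accused them of "malicious use". The move sparked controversy on social media. W...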
The Metaphysics We Train: A Heideggerian Reading of Machine Learning
arXiv:2602.19028v2 Announce Type: replace-cross Abstract: This paper offers a phenomenological reading of contemporary m...
ST-EVO: Towards Generative Spatio-Temporal Evolution of Multi-Agent Communication Topologies
arXiv:2602.14681v3 Announce Type: replace-cross Abstract: LLM-powered Multi-Agent Systems (MAS) have emerged as an effec...
AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
arXiv:2602.07906v2 Announce Type: replace-cross Abstract: Autonomous Machine Learning Engineering (MLE) requires agents ...
Repurposing Synthetic Data for Fine-grained Search Agent Supervision
arXiv:2510.24694v2 Announce Type: replace-cross Abstract: LLM-based search agents are increasingly trained on entity-cen...
A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
arXiv:2510.23587v2 Announce Type: replace-cross Abstract: The rapid advancement of large language models (LLMs) has spur...
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
arXiv:2510.22620v2 Announce Type: replace-cross Abstract: AI agents powered by large language models (LLMs) are being de...
Towards Scalable Oversight via Partitioned Human Supervision
arXiv:2510.22500v2 Announce Type: replace-cross Abstract: As artificial intelligence (AI) systems approach and surpass e...
Performance Asymmetry in Model-Based Reinforcement Learning
arXiv:2505.19698v3 Announce Type: replace-cross Abstract: Recently, Model-Based Reinforcement Learning (MBRL) has achie...
BrowseComp-V³: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
arXiv:2602.12876v2 Announce Type: replace Abstract: Multimodal large language models (MLLMs), equipped with increasingly...
STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
arXiv:2602.03022v2 Announce Type: replace Abstract: The proliferation of Large Language Models (LLMs) in function callin...
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
arXiv:2510.07172v3 Announce Type: replace Abstract: Large language models are emerging as powerful tools for scientific ...
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
arXiv:2509.25609v2 Announce Type: replace Abstract: Environments built for people are increasingly operated by a new cla...
DS-STAR: Data Science Agent for Solving Diverse Tasks across Heterogeneous Formats and Open-Ended Queries
arXiv:2509.21825v4 Announce Type: replace Abstract: While large language models (LLMs) have shown promise in automating ...
TASER: Table Agents for Schema-guided Extraction and Recommendation
arXiv:2508.13404v4 Announce Type: replace Abstract: Real-world financial filings report critical information about an en...
A Survey on the Optimization of Large Language Model-based Agents
arXiv:2503.12434v2 Announce Type: replace Abstract: With the rapid development of Large Language Models (LLMs), LLM-base...
"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems
arXiv:2602.21127v1 Announce Type: cross Abstract: Large language model (LLM) agents are rapidly becoming trusted copilot...
Cooperative-Competitive Team Play of Real-World Craft Robots
arXiv:2602.21119v1 Announce Type: cross Abstract: Multi-agent deep Reinforcement Learning (RL) has made significant prog...
Toward an Agentic Infused Software Ecosystem
arXiv:2602.20979v1 Announce Type: cross Abstract: Fully leveraging the capabilities of AI agents in software development...
See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
arXiv:2602.20951v1 Announce Type: cross Abstract: Despite recent advances in diffusion models, AI generated images still...
Some Simple Economics of AGI
arXiv:2602.20946v1 Announce Type: cross Abstract: For millennia, human cognition was the primary engine of progress on E...
Airavat: An Agentic Framework for Internet Measurement
arXiv:2602.20924v1 Announce Type: cross Abstract: Internet measurement faces twin challenges: complex analyses require e...