Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation

Uni-Skill is a novel AI framework that enables robots to autonomously evolve their skill libraries through a closed-loop system called SkillFolder. This hierarchical knowledge engine, inspired by VerbNet, organizes robotic actions from high-level verbs to fine-grained trajectories using millions of annotated video demonstrations. The system achieves state-of-the-art performance with strong zero-shot generalization across novel tasks, representing a significant advancement toward general-purpose robotic agents.

Uni-Skill: A Breakthrough Framework for Self-Evolving Robotic Skills

A new AI framework called Uni-Skill is poised to overcome a major bottleneck in robotic learning by enabling machines to autonomously evolve their own skill libraries. Detailed in a new paper (arXiv:2603.02623v1), this unified, skill-centric system moves beyond static, pre-programmed capabilities, allowing robots to request, retrieve, and implement new skills on the fly when faced with novel tasks. This shift from manual, demonstration-heavy training to efficient, offline structural retrieval represents a significant leap toward adaptable and general-purpose robotic agents.

Traditional skill-centric methods, which often use Vision-Language Models (VLMs) for planning, are constrained by fixed libraries. When a robot encounters a task it cannot complete with its existing skills, the process typically halts, requiring human engineers to manually design and code a new capability. Uni-Skill breaks this cycle by introducing a closed-loop system where a planning module can identify missing skills and automatically request implementations, leading to a self-augmented skill library that grows and adapts over time.
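The closed loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea, not the paper's implementation: the class and function names (`SkillLibrary`, `retrieve_from_repository`, `plan_and_execute`) are invented for clarity, and the repository lookup is a stub standing in for the actual retrieval pipeline.

```python
# Hypothetical sketch of closed-loop skill acquisition: when the planner
# hits a step the library cannot serve, it requests an implementation
# from the repository and registers it, so the library self-augments.

class SkillLibrary:
    def __init__(self, skills):
        self.skills = dict(skills)  # skill name -> executable implementation

    def has(self, name):
        return name in self.skills

    def register(self, name, impl):
        self.skills[name] = impl  # the library grows instead of halting


def retrieve_from_repository(skill_name):
    """Stub standing in for an offline query against the skill repository."""
    return lambda: f"executing {skill_name} from retrieved demonstrations"


def plan_and_execute(task_steps, library):
    results = []
    for step in task_steps:
        if not library.has(step):                  # planner identifies a gap
            impl = retrieve_from_repository(step)  # request an implementation
            library.register(step, impl)           # augment the library
        results.append(library.skills[step]())
    return results


library = SkillLibrary({"pick": lambda: "executing pick"})
out = plan_and_execute(["pick", "twist_open"], library)
```

The key design point is that the planner never blocks on a human: an unknown step triggers retrieval and registration in the same pass, after which the skill is available to every future plan.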

SkillFolder: The Hierarchical Knowledge Engine

The core innovation enabling this automatic skill acquisition is SkillFolder, a novel, VerbNet-inspired repository constructed from large-scale, unstructured robotic videos. Unlike a simple database, SkillFolder organizes robotic knowledge using a hierarchical skill taxonomy that captures actions at multiple levels of abstraction—from high-level verbs like "pour" to fine-grained spatial trajectories.
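A taxonomy like this can be modeled as a tree whose interior nodes are progressively more specific skill phrases and whose leaves hold annotated demonstrations. The sketch below is an assumption about the structure, using invented names (`SkillNode`, `Demonstration`); the paper does not publish a schema.

```python
# Hypothetical VerbNet-style skill taxonomy: high-level verbs branch into
# finer-grained variants, each holding demonstrations with trajectory data.
from dataclasses import dataclass, field


@dataclass
class Demonstration:
    video_id: str
    trajectory: list  # fine-grained waypoints, e.g. (x, y, z) tuples


@dataclass
class SkillNode:
    name: str                                      # e.g. "pour/from_kettle"
    children: dict = field(default_factory=dict)   # sub-skill name -> node
    demonstrations: list = field(default_factory=list)

    def add_child(self, name):
        node = SkillNode(name=f"{self.name}/{name}")
        self.children[name] = node
        return node


root = SkillNode(name="manipulation")
pour = root.add_child("pour")                      # high-level verb
kettle = pour.add_child("from_kettle")             # fine-grained variant
kettle.demonstrations.append(
    Demonstration(video_id="demo_001",
                  trajectory=[(0.1, 0.2, 0.3), (0.1, 0.2, 0.1)])
)
```

Organizing demonstrations by path rather than in a flat table lets a query stop at whichever level of abstraction it needs: the verb alone for behavior patterns, or a leaf for concrete trajectory references.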

This taxonomy is populated with millions of automatically annotated video demonstrations. When Uni-Skill's planner identifies a need for a new skill—for instance, "twist open a jar"—it queries SkillFolder. The system then retrieves relevant, structured examples that provide both semantic supervision for the behavior pattern and precise references for movement trajectories. This process eliminates the need for inefficient, task-specific manual annotation at deployment time, enabling few-shot skill inference directly from retrieved examples.
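The query step might look like the following sketch. Everything here is an assumption for illustration: the paper's semantic matching runs through its VLM components, so a simple token-overlap score stands in for it, and `retrieve_examples` is an invented helper.

```python
# Hypothetical retrieval for few-shot skill inference: match a requested
# skill phrase against repository entries and return the top-k structured
# examples. Token overlap is a stand-in for semantic (VLM-based) matching.

def token_overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)  # Jaccard similarity


def retrieve_examples(query, repository, k=2):
    """repository maps skill phrases -> lists of demonstration records."""
    best_phrase = max(repository, key=lambda p: token_overlap(query, p))
    return best_phrase, repository[best_phrase][:k]


repository = {
    "twist open jar": [{"video": "v1", "trajectory": []},
                       {"video": "v2", "trajectory": []}],
    "pour water": [{"video": "v3", "trajectory": []}],
}
phrase, examples = retrieve_examples("twist open a jar", repository)
```

The retrieved records then serve two roles at once, matching the description above: the matched phrase supplies semantic supervision for the behavior pattern, while the trajectory fields supply movement references, with no per-task manual annotation at deployment time.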

Superior Performance in Simulation and Reality

The research team validated Uni-Skill through comprehensive experiments in simulated and real-world robotic settings. The results demonstrate state-of-the-art performance compared to existing VLM-based skill-centric approaches. Uni-Skill exhibited advanced reasoning capabilities in compositional task planning and showed strong zero-shot generalization across a wide spectrum of novel tasks that were not present during its initial training phase.

This performance highlights the framework's practical viability. By decoupling skill planning from a fixed set of implementations and grounding new requests in a vast, structured repository of observed behavior, Uni-Skill provides a more robust and scalable pathway toward robots that can operate in dynamic, unstructured environments.

Why This Matters: The Future of Robotic Autonomy

The development of Uni-Skill marks a critical step forward in AI and robotics for several key reasons:

  • Eliminates Manual Bottlenecks: It shifts the paradigm from human-in-the-loop skill creation to automated, offline skill retrieval, dramatically accelerating a robot's ability to learn and adapt.
  • Enables Continuous Learning: The self-augmenting library allows robots to perpetually expand their capabilities without complete retraining, a cornerstone of long-term autonomy.
  • Improves Generalization: By leveraging a hierarchical taxonomy of skills learned from diverse video data, the system achieves superior zero-shot performance on unseen tasks, a holy grail in machine learning.
  • Provides a Scalable Architecture: The separation of the planning module (Uni-Skill) from the knowledge repository (SkillFolder) creates a flexible framework that can be improved by scaling up the video data or enhancing the planner independently.

In essence, Uni-Skill transforms the robot from a tool with a limited manual into an apprentice with a continuously expanding textbook, capable of looking up and learning how to perform new actions autonomously. This research lays a foundational architecture for the next generation of generalist robots.