
Factory floors, households and disaster-recovery sites may never be the same if a robotics project at research institute SRI succeeds in getting machines to talk to humans.
The project would help robot teams better navigate unfamiliar environments — such as factories, homes and public spaces — through enhanced robot-to-robot communication.
“Before large language models, robots could move and perform tasks, but they couldn’t explain what they saw and how they did things,” said Han-Pang Chiu, technical director in SRI’s Vision and Robotics Laboratory, in a Zoom interview on Tuesday. “We’re moving from command-based robotics to conversational collaboration. That’s a fundamental shift — and it’s going to change how we work with machines.”
Chiu’s Shared Understanding for Wide-Area human-robot Collaboration (SUWAC) project aims to apply the emerging capabilities of large language models (LLMs) to collaboration between humans and robots. He is equipping a team of robots with an LLM-based framework that lets researchers communicate with the robots in natural spoken or typed language.
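In broad strokes, such a framework pairs the operator’s words with what the robot currently perceives and lets the language model produce both a reply and an action the robot can execute. The sketch below is a minimal illustration under that assumption; ask_llm is a hypothetical placeholder, not SRI’s API:

```python
# Minimal sketch of an LLM-backed command loop (illustrative, not SRI's code).
# ask_llm() stands in for whatever chat-completion client is actually used.

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError("wire this to an LLM client")

def handle_request(user_text: str, scene_summary: str) -> str:
    """Combine the operator's request with the robot's scene summary."""
    prompt = (
        "You control a mobile robot. Reply with one short sentence for the "
        "operator, then a line 'ACTION: <command>' the robot can execute.\n"
        f"Current scene: {scene_summary}\n"
        f"Operator says: {user_text}"
    )
    return ask_llm(prompt)

# Example (runs once ask_llm is wired to a model):
# handle_request("Can you find my backpack?",
#                "A table is on the dais; a backpack is on the table.")
```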
SUWAC, he said, is the first use of an LLM for wide-area robotic search. It draws on state-of-the-art perception systems (based on technologies like LiDAR, stereoscopic vision, and object recognition) developed by SRI’s Center for Vision Technologies.
This could eventually lead to real-world uses such as teams of search-and-rescue robots that explore the scenes of natural disasters, household robots whose owners can speak commands like “wash the dishes,” or factory robots that chat as they work.
Case in point: In one demo, a canine-like quadruped robot (Robodog) and a wheeled robot (Roborover) are in a room that looks like a small high school theater. A researcher asks, “I don’t remember where I left my backpack and laptop. Can you two find them?” The robots exchange information about the room and divide the work. The quadruped knows that the wheeled robot will find the stairs challenging and offers to search the dais; the wheeled robot offers to search the rest of the room. Soon, both the backpack and the laptop are found.
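As a rough illustration of that capability-aware division of labor, here is a minimal sketch; the capability tags and area names are hypothetical, and the real system reasons through LLM dialogue rather than hard-coded rules:

```python
# Illustrative sketch (not SRI's implementation): each robot advertises what it
# can traverse, and each area goes to a capable robot with the lightest workload.
from dataclasses import dataclass, field

@dataclass
class Robot:
    name: str
    capabilities: set = field(default_factory=set)  # e.g. {"stairs", "flat"}

def divide_search(areas, robots):
    """areas maps area name -> required capability; returns robot -> areas."""
    plan = {r.name: [] for r in robots}
    for area, needs in areas.items():
        capable = [r for r in robots if needs in r.capabilities]
        if capable:  # skip areas no robot can reach
            choice = min(capable, key=lambda r: len(plan[r.name]))
            plan[choice.name].append(area)
    return plan

robodog = Robot("Robodog", {"stairs", "flat"})  # legged, can climb onto the dais
roborover = Robot("Roborover", {"flat"})        # wheeled, flat floors only

print(divide_search({"dais": "stairs", "main floor": "flat"}, [robodog, roborover]))
# {'Robodog': ['dais'], 'Roborover': ['main floor']}
```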
In another demo, a robot searches for a hiding human in a large room full of couches and chairs. Through common-sense reasoning, the robot first looks behind furniture big enough to conceal a person rather than wasting time peeking behind a small chair. “Common sense is what separates humans from most AI,” Chiu said. “We aim to bring that ability into robotic planning.”
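That size-based pruning is easy to picture in code. The following is an illustrative sketch only, with made-up object sizes and thresholds rather than SRI’s planner:

```python
# Keep only hiding spots large enough to conceal a person, biggest first.
# Hypothetical perception output: object -> rough bounding box (width, depth, height) in meters.
detected = {
    "couch":       (2.2, 0.9, 0.8),
    "armchair":    (0.9, 0.9, 1.0),
    "small chair": (0.4, 0.4, 0.9),
    "bookshelf":   (1.8, 0.4, 2.0),
}

PERSON_MIN_WIDTH = 0.5   # rough footprint a crouching adult needs
PERSON_MIN_HEIGHT = 0.6

def could_conceal_person(size) -> bool:
    width, depth, height = size
    return max(width, depth) >= PERSON_MIN_WIDTH and height >= PERSON_MIN_HEIGHT

candidates = [name for name, size in detected.items() if could_conceal_person(size)]
candidates.sort(key=lambda name: -max(detected[name]))  # largest objects first

print(candidates)  # ['couch', 'bookshelf', 'armchair'] (the small chair is skipped)
```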
A key technology breakthrough, Chiu said, is a “3D scene graph,” a more efficient way of capturing and categorizing information than the data-rich “point clouds” often used in vision-based robotics. The scene graph, he said, provides a “missing link” for robots to categorize and label visual data in a way that’s easy to exchange with other robots and explain to human operators. Scene graphs are readily interpretable by LLMs, enabling the robots to understand what is nearby and what actions might be appropriate.
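A scene graph can be as simple as labeled nodes (rooms, regions, objects) connected by relations, which flatten naturally into sentences an LLM can reason over. The sketch below uses assumed field names and toy data; SRI’s actual scene-graph format is not spelled out in the article:

```python
# Toy 3D scene graph and a helper that turns its relations into plain sentences
# a language model (or a human operator) can read. Illustrative only.
scene_graph = {
    "nodes": [
        {"id": "room_1",  "type": "room",   "label": "theater"},
        {"id": "dais_1",  "type": "region", "label": "dais"},
        {"id": "table_1", "type": "object", "label": "table",    "position": [3.1, 0.4, 0.0]},
        {"id": "bag_1",   "type": "object", "label": "backpack", "position": [3.0, 0.6, 0.0]},
    ],
    "edges": [
        {"from": "dais_1",  "to": "room_1",  "relation": "part_of"},
        {"from": "table_1", "to": "dais_1",  "relation": "on"},
        {"from": "bag_1",   "to": "table_1", "relation": "on"},
    ],
}

def describe(graph) -> str:
    """Flatten graph relations into short sentences."""
    labels = {n["id"]: n["label"] for n in graph["nodes"]}
    lines = [
        f"The {labels[e['from']]} is {e['relation'].replace('_', ' ')} the {labels[e['to']]}."
        for e in graph["edges"]
    ]
    return " ".join(lines)

print(describe(scene_graph))
# The dais is part of the theater. The table is on the dais. The backpack is on the table.
```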
“Instead of relying on humans to describe every detail, we let the robot perceive, interpret, and explain its world in language,” Chiu said. “That saves time and allows for much more natural collaboration. It’s unlike anything before.”
SRI envisions teams of variously capable robots — on foot, on wheels or tracks, in the air, and even under the waves — being directed by humans from great distances using everyday language. Recently, SRI licensed the SUWAC technology to robotics startup Avsr AI with the aim of commercializing SUWAC’s novel capabilities.
“This gives robots a better understanding of what is around them; the possibilities are endless in hotels, hospitals, restaurants, manufacturing,” Avsr AI CEO Vikrant Tomar said in an interview. He expects to see the first applications of the robots within a year or so. One possible early use he foresees is a team of robots working in an assisted-living facility, helping residents find misplaced medication or personal items.
“Talking to a static machine is different from talking to a mobile robot,” Chiu said. “We’re demonstrating that generative AI, already adept at answering chat and voice prompts on computer systems, has a massive role to play in robotics and 3D navigation.”