Researchers at the University of Texas at Dallas have developed an innovative system to enhance the object recognition capabilities of robots. The technology, created by a team of UT Dallas computer scientists, allows a robot to interact with objects multiple times, collecting a sequence of images that aids in object recognition. This is a significant departure from previous methods, which relied on a single interaction.
The research was recently presented at the Robotics: Science and Systems conference held in Daegu, South Korea. The conference is a prestigious platform for showcasing advancements in robotics, with papers selected based on novelty, technical quality, potential impact, and clarity.
While we may still be some distance away from having robots cook dinner or clean up after us, this research is a significant step towards enhancing the ability of robots to identify and remember objects. Dr. Yu Xiang, senior author of the paper and assistant professor of computer science in the Erik Jonsson School of Engineering and Computer Science, emphasizes the importance of this ability for practical applications.
“In order for a robot to perform tasks such as fetching a mug or a bottle of water, it needs to be able to recognize these objects,” explains Xiang. The technology developed by the UTD researchers is designed to help robots detect a wide variety of objects found in everyday environments, such as homes, and to generalize to similar versions of common items, such as water bottles that come in varied shapes or sizes.
The lab’s robot, named Ramp, is trained using toy packages of common foods such as spaghetti, ketchup, and carrots. Ramp is a mobile manipulator robot from Fetch Robotics, standing about 4 feet tall with a mechanical arm featuring seven joints and a two-fingered square “hand” for object manipulation.
The new system involves the robot interacting with each item 15 to 20 times, as opposed to previous methods that relied on a single interaction. This allows the robot to take more detailed images using its RGB-D camera, which includes a depth sensor, thereby reducing the potential for errors. This process of recognizing, differentiating, and remembering objects is known as segmentation, a critical function for robots to perform tasks effectively.
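The paper's actual segmentation pipeline is more involved than what a news summary can convey, but the intuition behind using many interactions instead of one can be illustrated with a toy sketch. The following Python snippet, which is not the UT Dallas team's code and uses hypothetical function names and simulated data, fuses noisy per-frame object masks (such as those a segmentation network might produce from each RGB-D frame) by per-pixel voting, showing how evidence from repeated observations can reduce single-frame errors.

```python
import numpy as np

def consensus_object_mask(frame_masks, min_votes_fraction=0.6):
    """Fuse per-frame binary object masks into a more reliable mask.

    frame_masks: list of HxW boolean arrays, one per interaction/viewpoint.
    min_votes_fraction: keep a pixel only if it is marked "object" in at
        least this fraction of the frames.
    """
    stack = np.stack(frame_masks, axis=0)            # shape (N, H, W)
    votes = stack.sum(axis=0)                        # per-pixel object count
    threshold = min_votes_fraction * len(frame_masks)
    return votes >= threshold                        # consensus mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w = 64, 64
    true_mask = np.zeros((h, w), dtype=bool)
    true_mask[20:40, 25:45] = True                   # ground-truth object region

    # Simulate 15 noisy single-frame segmentations (the new system uses
    # 15 to 20 interactions per item, versus one in earlier methods).
    noisy = [true_mask ^ (rng.random((h, w)) < 0.1) for _ in range(15)]

    fused = consensus_object_mask(noisy)
    per_frame_err = np.mean([np.mean(m ^ true_mask) for m in noisy])
    fused_err = np.mean(fused ^ true_mask)
    print(f"avg single-frame pixel error: {per_frame_err:.3f}")
    print(f"fused-mask pixel error:       {fused_err:.3f}")
```

In this simplified setting, each single frame mislabels roughly 10 percent of pixels, while the fused mask's error drops well below that, which is the basic reason repeated interaction helps a robot segment and remember objects more reliably.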
“To the best of our knowledge, this is the first system that leverages long-term robot interaction for object segmentation,” says Xiang. This approach mimics how children learn to interact with toys and helps improve the algorithm that guides the robot’s decision-making process.
Ninad Khargonkar, a computer science doctoral student involved in the project, describes the real-world application of the algorithm as a key learning experience. “It’s one thing to develop an algorithm and test it on an abstract data set; it’s another thing to test it out on real-world tasks,” Khargonkar said.
The next phase for the researchers is to enhance other functions such as planning and control, which could enable tasks like sorting recycled materials. The research team also included computer science graduate student Yangxiao Lu; computer science seniors Zesheng Xu and Charles Averill; Kamalesh Palanisamy MS’23; Dr. Yunhui Guo, assistant professor of computer science; and Dr. Nicholas Ruozzi, associate professor of computer science. Dr. Kaiyu Hang from Rice University also participated in the project.
The research was supported in part by the Defense Advanced Research Projects Agency as part of its Perceptually-enabled Task Guidance program, which focuses on developing AI technologies that help users perform complex physical tasks by providing task guidance through augmented reality, expanding their skill sets and reducing errors.