We’re trying to train AI systems to make all kinds of meaningful decisions in fields ranging from robotics to medicine to political science. For example, AI systems can be used to intelligently control traffic in congested cities, helping drivers reach their destination faster while improving safety and sustainability.
Unfortunately, teaching AI systems to make good decisions is not an easy task.
The reinforcement learning models underlying these AI decision-making systems still often fail when faced with small differences in the tasks they were trained to perform. In the case of traffic, models can struggle to control a series of intersections with different speed limits, number of lanes, and traffic patterns.
To increase the reliability of reinforcement learning models for complex tasks with variability, researchers at MIT have introduced a more efficient algorithm for training reinforcement learning models.
The algorithm strategically selects the best tasks to train an AI agent to effectively perform all tasks within a set of related tasks. For traffic light control, each task is one intersection in a task space that includes all intersections in the city.
This method maximizes performance while keeping training costs low by focusing on the few intersections that contribute most to the overall effectiveness of the algorithm.
The researchers found that their method was 5 to 50 times more efficient than standard approaches on a range of simulated tasks. This increased efficiency allows algorithms to learn better solutions faster, ultimately improving the performance of AI agents.
“By thinking outside the box, we were able to see amazing performance gains with very simple algorithms. Less complex algorithms are easier to implement and easier for others to understand. Therefore, they are more likely to be adopted by the community,” says senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).
Wu is joined on the paper by lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS graduate student. The research will be presented at the Conference on Neural Information Processing Systems.
Finding a middle ground
To train an algorithm to control traffic lights at many intersections in a city, engineers typically choose between two main approaches: train one algorithm for each intersection independently, using only data from that intersection, or train one larger algorithm using data from all intersections and then apply it to each one.
But each approach comes with drawbacks. Training a separate algorithm for each task (such as a given intersection) is a time-consuming process that requires enormous amounts of data and computation, while training one algorithm for all tasks often leads to subpar performance.
Wu and her colleagues sought a sweet spot between these two approaches.
Their method selects a subset of tasks and trains one algorithm for each task separately. The key is to strategically select individual tasks that are most likely to improve the overall performance of the algorithm for all tasks.
The method leverages a common trick in the field of reinforcement learning called zero-shot transfer learning, in which an already trained model is applied to a new task without further training. With transfer learning, models often perform remarkably well on new, related tasks.
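To make the idea of zero-shot transfer concrete, here is a minimal toy sketch: a fixed-timing signal controller is "tuned" for one intersection and then reused, unchanged, at a slightly different one. The controller rule, the `green_duration` parameter, and all numbers are invented for illustration; they are not from the researchers' system.

```python
# Toy illustration of zero-shot transfer: a controller tuned for one
# intersection is applied, unchanged, to a different one.
# The rule and all parameters here are hypothetical.

def make_controller(green_duration):
    """Return a simple policy tuned for one intersection."""
    def controller(queue_length):
        # Trivial rule: extend the green phase when the queue
        # exceeds the duration this controller was tuned for.
        return "extend_green" if queue_length > green_duration else "switch"
    return controller

# "Train" (here: tune) the policy on intersection A...
policy_a = make_controller(green_duration=5)

# ...then apply it zero-shot to intersection B, with no further tuning.
print(policy_a(queue_length=7))  # the same policy, reused as-is
```

The policy transfers well only when intersection B resembles intersection A, which is why the choice of which tasks to train on matters so much.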
“We know it would be ideal to train on all tasks, but we wondered: could we train on a subset of those tasks, apply the result to all tasks, and still see a performance boost?” says Wu.
To identify which tasks to select to maximize expected performance, the researchers developed an algorithm called Model-Based Transfer Learning (MBTL).
The MBTL algorithm has two parts. First, it models how well each algorithm would perform if trained independently on a single task. Second, it models how much each algorithm’s performance would degrade if it were transferred to other tasks, a concept known as generalization performance.
By explicitly modeling generalization performance, MBTL can estimate the value of training on new tasks.
MBTL does this sequentially, first selecting the task that leads to the highest performance improvement, and then selecting additional tasks that yield the largest marginal improvement in overall performance.
Because MBTL focuses only on the most promising tasks, it can significantly improve the efficiency of the training process.
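The selection loop described above can be sketched in a few lines. This is an illustrative toy, not the researchers' implementation: the task space, the constant standalone-performance model, and the assumption that transferred performance decays linearly with task distance are all made up for the example.

```python
import numpy as np

# Hypothetical 1-D task space: 10 intersections, each described by one
# normalized parameter (e.g., speed limit). All numbers are invented.
tasks = np.linspace(0.0, 1.0, 10)

def trained_performance(task):
    # Part 1 (assumed model): standalone training works equally well everywhere.
    return 1.0

def transferred_performance(source, target, slope=2.0):
    # Part 2 (assumed model): generalization performance decays linearly
    # with distance between the training task and the target task.
    return max(0.0, trained_performance(source) - slope * abs(source - target))

def mbtl_select(tasks, budget):
    """Greedy sequential selection: each step picks the training task that
    most improves the best achievable performance across all tasks."""
    selected = []
    for _ in range(budget):
        def coverage(candidate):
            chosen = selected + [candidate]
            # Each task is served by its best available trained policy.
            return sum(max(transferred_performance(s, t) for s in chosen)
                       for t in tasks)
        best = max((t for t in tasks if t not in selected), key=coverage)
        selected.append(best)
    return selected

print(mbtl_select(tasks, budget=3))
```

Under these toy assumptions, the greedy loop naturally spreads the chosen training tasks across the task space, so every intersection has a reasonably close "neighbor" policy to transfer from.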
Reducing training costs
When the researchers tested this technique on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and executing several classic control tasks, their method proved five to 50 times more efficient than standard approaches.
This means they could arrive at the same solution by training on far less data. With a 50x boost in efficiency, for instance, the MBTL algorithm could train on just two tasks and achieve the same performance as a standard method that uses data from 100 tasks.
“From the perspective of the two main approaches, this implies that the data from the other 98 tasks were not necessary, or that training on all 100 tasks confuses the algorithm, so its performance ends up worse than ours,” says Wu.
With MBTL, even a small amount of additional training time can significantly improve performance.
In the future, the researchers plan to design MBTL algorithms that can be extended to more complex problems, such as higher-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially next-generation mobility systems.
Funding for this research was provided, in part, by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Fellowship Program, and an Amazon Robotics PhD Fellowship.