Home robots trained at the factory to perform household tasks may fail to effectively scrub the sink or take out the trash when deployed in a user's kitchen, since this new environment differs from their training space.
To avoid this, engineers often try to match the simulated training environment as closely as possible to the real world where the agent will be deployed.
However, researchers from MIT and elsewhere have found that, despite this conventional wisdom, training in a completely different environment can sometimes yield a better-performing artificial intelligence agent.
Their results show that, in some cases, an AI agent trained in a simulated world with little uncertainty, or "noise," performed better than a competing AI agent trained in the same noisy world that was used to test both agents.
Researchers call this unexpected phenomenon the indoor training effect.
"If you learn to play tennis in an indoor environment where there is no noise, you may be able to master different shots more easily. Then, if you move to a noisier environment, like a windy tennis court, you could have a higher chance of playing well than if you had started learning in the windy environment," says Bono.
The researchers studied this phenomenon by training AI agents to play Atari games. They were surprised to find that the indoor training effect occurred consistently across Atari games and game variations.
They hope these results fuel additional research toward developing better training methods for AI agents.
"This is an entirely new axis to think about," says Spandan Madan, a graduate student at Harvard University.
Bono and Madan are joined on the paper by Ishaan Grover, a graduate student at MIT; Mao Yasueda, a graduate student at Yale University; Cynthia Breazeal, professor of media arts and sciences and leader of the Personal Robotics Group in the MIT Media Lab; Hanspeter Pfister, a professor of computer science at Harvard University; and Gabriel Kreiman, a professor at Harvard Medical School. The research will be presented at the Association for the Advancement of Artificial Intelligence Conference.
Training trouble
The researchers set out to explore why reinforcement learning agents tend to perform so poorly when tested in environments that differ from their training space.
Reinforcement learning is a trial-and-error method in which an agent explores a training space and learns to take actions that maximize its reward.
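For readers unfamiliar with the setup, here is a minimal sketch of that trial-and-error loop in Python, using a hypothetical toy environment and tabular Q-learning, one common reinforcement learning algorithm; the paper's agents and games are more complex.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a 5-state corridor where reaching the
# rightmost state yields a reward of 1 and ends the episode.
class ChainEnv:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.state = max(0, self.state - 1) if action == 0 else min(4, self.state + 1)
        done = self.state == 4
        return self.state, (1.0 if done else 0.0), done

env = ChainEnv()
q = defaultdict(float)                 # value estimates, keyed by (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

for _ in range(500):  # trial-and-error episodes
    state, done = env.reset(), False
    while not done:
        # Sometimes explore at random; otherwise exploit the best-known action.
        if random.random() < epsilon:
            action = random.choice([0, 1])
        else:
            action = max([0, 1], key=lambda a: q[(state, a)])
        next_state, reward, done = env.step(action)
        # Nudge the estimate toward the reward plus the discounted future value.
        best_next = max(q[(next_state, 0)], q[(next_state, 1)])
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
```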
The team developed a technique that explicitly adds a certain amount of noise to one element of the reinforcement learning problem called the transition function. The transition function defines the probability that an agent will move from one state to another, based on the action it chooses.
If the agent is playing Pac-Man, the transition function might define the probability that the ghosts on the game board will move up, down, left, or right. In standard reinforcement learning, the AI is trained and tested using the same transition function.
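One plausible way to picture noise injected into such a transition function is to blend a ghost's usual move distribution with a uniform one. The sketch below is illustrative, with made-up probabilities, and is not the paper's exact formulation:

```python
import random

MOVES = ["up", "down", "left", "right"]

def noisy_transition(base_probs, noise):
    """Sample a ghost move after injecting noise into the transition function.

    base_probs: the game's usual probability for each move (sums to 1).
    noise: in [0, 1]; 0 reproduces the original game, 1 is fully random.
    """
    # Blend the game's own move distribution with a uniform distribution.
    mixed = [(1 - noise) * base_probs[m] + noise / len(MOVES) for m in MOVES]
    return random.choices(MOVES, weights=mixed)[0]

# A ghost that usually chases to the right, sampled with 30 percent noise.
print(noisy_transition({"up": 0.1, "down": 0.1, "left": 0.1, "right": 0.7}, noise=0.3))
```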
Following this conventional approach, the researchers added noise to the transition function, and, as expected, it hurt the agent's Pac-Man performance.
But when they trained an agent in a noise-free Pac-Man game and then tested it in an environment where noise was injected into the transition function, it performed better than an agent trained on the noisy game.
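A hedged sketch of that experimental protocol on a toy problem, not the authors' Atari setup: train one agent in a noise-free version of an environment and another in a noisy version, then evaluate both under the same noisy transition function. The sketch illustrates the protocol, not a guaranteed reproduction of the effect.

```python
import random
from collections import defaultdict

ACTIONS = [-1, 1]  # move left or right along a 7-state corridor

class NoisyCorridor:
    """Toy environment whose transition function carries tunable noise:
    with probability `noise`, the chosen move is replaced by a random one."""
    def __init__(self, noise):
        self.noise = noise

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = random.choice(ACTIONS) if random.random() < self.noise else action
        self.state = max(0, min(6, self.state + move))
        done = self.state == 6
        return self.state, (1.0 if done else 0.0), done

def train(noise, episodes=2000, alpha=0.2, gamma=0.95, epsilon=0.2):
    env, q = NoisyCorridor(noise), defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = env.step(action)
            best = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best - q[(state, action)])
            state = nxt
    return q

def evaluate(q, noise, episodes=500, cap=200):
    """Average steps to reach the goal under a noisy transition function."""
    env, total = NoisyCorridor(noise), 0
    for _ in range(episodes):
        state, done, steps = env.reset(), False, 0
        while not done and steps < cap:
            state, _, done = env.step(max(ACTIONS, key=lambda a: q[(state, a)]))
            steps += 1
        total += steps
    return total / episodes

# Train "indoors" (noise-free) and "outdoors" (noisy), test both in noise.
q_indoor, q_matched = train(noise=0.0), train(noise=0.4)
print("indoor-trained, tested noisy: ", evaluate(q_indoor, noise=0.4))
print("matched-trained, tested noisy:", evaluate(q_matched, noise=0.4))
```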
"The rule of thumb is that you should try to capture the deployment condition's transition function as well as you can during training. We really tested this insight to death because we couldn't believe it ourselves," says Madan.
Injecting varying amounts of noise into the transition function let the researchers test many environments, but it didn't create realistic games. The more noise they injected into Pac-Man, the more likely the ghosts were to randomly teleport to different squares.
To see whether the indoor training effect also occurs in normal Pac-Man games, they adjusted the underlying probabilities so the ghosts moved normally but were more likely to move up and down, rather than left and right. AI agents trained in the noise-free environment still performed better in these realistic games.
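In code terms, this realistic variation could look like re-weighting the base move distribution rather than adding teleport-style noise; the numbers below are invented for illustration:

```python
import random

MOVES = ["up", "down", "left", "right"]

# The ghosts still take ordinary steps (no teleporting), but the underlying
# probabilities are tilted so vertical moves become more likely than horizontal.
normal_probs = [0.25, 0.25, 0.25, 0.25]
tilted_probs = [0.35, 0.35, 0.15, 0.15]  # illustrative values only

print(random.choices(MOVES, weights=tilted_probs)[0])
```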
"It was not only due to the way we added noise to create ad hoc environments. This seems to be a property of the reinforcement learning problem itself, and that was even more surprising to see," says Bono.
Exploration explanations
When the researchers dug deeper in search of an explanation, they saw some correlations in how the AI agents explore the training space.
When both AI agents explore mostly the same areas, the agent trained in the noise-free environment performs better, perhaps because it is easier for it to learn the rules of the game without the interference of noise.
But if their exploration patterns differ, the agent trained in the noisy environment tends to perform better. This might happen because that agent needs to understand patterns it can't learn in the noise-free environment.
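One hypothetical way to quantify such exploration overlap is the fraction of states visited by both agents out of all states visited by either, a Jaccard-style measure; the paper may use a different metric:

```python
def exploration_overlap(visited_a, visited_b):
    """Jaccard overlap of the state sets two agents visited during training."""
    return len(visited_a & visited_b) / len(visited_a | visited_b)

# e.g. states logged as (x, y) board positions during training
a = {(1, 1), (1, 2), (2, 2)}
b = {(1, 2), (2, 2), (3, 2)}
print(exploration_overlap(a, b))  # 0.5 -> half of all visited states are shared
```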
"If I only learn to play tennis with my forehand in the noise-free environment, but then in the noisy one I also have to play with my backhand, I won't play as well if I trained in the noise-free environment," says Bono.
In the future, the researchers want to explore how the indoor training effect might arise in more complex reinforcement learning environments, or with other techniques such as computer vision and natural language processing. They also want to build training environments designed to leverage the indoor training effect, which could help AI agents perform better in uncertain environments.