Public beta of toolkit for developing machine learning for robots and games released

April 27, 2016

Make a three-dimensional bipedal robot walk forward as fast as possible, without falling over (credit: OpenAI Gym)

OpenAI (a non-profit AI research company sponsored by Elon Musk and others) has released the public beta of OpenAI Gym, a toolkit for developing and comparing algorithms for reinforcement learning (RL), a type of machine learning.

OpenAI Gym consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results. OpenAI Gym is compatible with algorithms written in any framework, such as TensorFlow and Theano. The environments are initially written in Python (other languages planned).
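To give a sense of what an "environment" means here, the following is a minimal sketch of the loop in which an agent acts and an environment responds. The `CorridorEnv` class, its `reset` and `step` methods, and the two example policies are all hypothetical names written for illustration; this is not the OpenAI Gym API itself.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (observation, reward, done)."""
        if action not in (0, 1):
            raise ValueError("action must be 0 or 1")
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = max(0, min(self.length, self.position + move))
        self.steps += 1
        reached_goal = self.position == self.length
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


def random_policy(observation):
    """Ignore the observation and pick a direction at random."""
    return random.choice([0, 1])


def always_right(observation):
    """Always move toward the goal."""
    return 1


if __name__ == "__main__":
    env = CorridorEnv()
    print("random agent:", run_episode(env, random_policy))
    print("always-right agent:", run_episode(env, always_right))
```

Because every policy talks to the environment through the same two methods, different algorithms can be swapped in and scored on the same task, which is the kind of comparison a shared suite of environments is meant to support.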

If you’d like to dive in right away, you can work through a tutorial, and you can help out while learning by reproducing a result.

What is reinforcement learning?

Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It’s exciting for two reasons, according to OpenAI’s Greg Brockman and John Schulman:

  • RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot’s motors so that it’s able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.
  • RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind’s Atari results, BRETT from Pieter Abbeel’s group, and AlphaGo all used deep RL algorithms, which did not make too many assumptions about their environment, and thus can be applied in other settings.

However, RL research is also slowed down by two factors:

  • The need for better benchmarks. In supervised (human-managed) learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don’t have enough variety, and they are often difficult to even set up and use.
  • Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.
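The standardization point can be illustrated with a small sketch. It replays two fixed action sequences in a toy corridor under two reward definitions that differ only by a per-step cost. The corridor, the function names, and the reward values are all assumptions made up for this example and are not taken from any published benchmark.

```python
def run_corridor(actions, reward_fn, length=5):
    """Replay a fixed action sequence in a toy corridor and return the score.

    actions: iterable of +1 (right) / -1 (left) moves.
    reward_fn: maps (position, reached_goal) to a float reward.
    """
    position = 0
    total = 0.0
    for move in actions:
        # Clamp the position so the agent stays inside the corridor.
        position = max(0, min(length, position + move))
        reached_goal = position == length
        total += reward_fn(position, reached_goal)
        if reached_goal:
            break
    return total


def sparse_reward(position, reached_goal):
    """Variant A: reward of 1 only on reaching the goal."""
    return 1.0 if reached_goal else 0.0


def step_penalty_reward(position, reached_goal):
    """Variant B: goal bonus of 10, plus a cost of 1 for every step taken."""
    return (10.0 if reached_goal else 0.0) - 1.0


direct = [1, 1, 1, 1, 1]                      # walks straight to the goal
wandering = [1, -1, 1, -1, 1, 1, 1, 1, 1]     # backtracks twice, then reaches the goal

for name, reward_fn in [("sparse", sparse_reward), ("step penalty", step_penalty_reward)]:
    print(name, run_corridor(direct, reward_fn), run_corridor(wandering, reward_fn))
```

Under the sparse variant both trajectories receive the same score, while the step-penalty variant separates them. Two papers that report results on "the same" task but use different variants would therefore not be reporting comparable numbers.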

OpenAI Gym is an attempt to fix both problems.

More information, including environments (Atari games, 2D and 3D robots, and toy text, for example), is available here.

“During the public beta, we’re looking for feedback on how to make this into an even better tool for research,” says the OpenAI team. “If you’d like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people’s results, or even implementing your own environments. Also please join us in the community chat!”
