Blog Post (Part III): Deep Reinforcement Learning with OpenAI Gym
Apr 27, 2016
This is part 3 of a blog series on deep reinforcement learning. See “Part 1: Demystifying Deep Reinforcement Learning” for an introduction to the topic and “Part 2: Deep Reinforcement Learning with Neon” for the original implementation in Simple-DQN.
In this blog post we extend Simple-DQN to work with OpenAI Gym, a new toolkit for developing and comparing reinforcement learning algorithms. Read more about the release on their blog. We will cover how to train and test an agent in the new environment using Neon.
Update: Code has been updated and is now at https://github.com/tambetm/simple_dqn.
Figure 1. Agent Environment Loop
OpenAI Gym provides a simple interface for interacting with the environment. Given an observation of the previous state and the reward, the agent chooses an action to perform in the environment, which then returns the next observation and reward.
observation, reward, done, info = environment.step(action)
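To make the step contract concrete, here is a minimal sketch of an environment that follows the same four-value convention, driven by a fixed policy and by a random agent. CountdownEnv and run_episode are hypothetical names for illustration, not part of OpenAI Gym, and the toy game (action 1 scores a point, episodes last a fixed number of steps) is made up. Recent Gym and Gymnasium releases return five values from step, splitting done into terminated and truncated; the sketch follows the four-value form used in this post.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment using the (observation, reward, done, info) convention."""

    def __init__(self, length=5):
        self.length = length
        self.steps_left = length

    def reset(self):
        # Start a new episode and return the first observation
        self.steps_left = self.length
        return self.steps_left

    def step(self, action):
        # Action 1 earns a point, any other action earns nothing;
        # the episode ends after `length` steps
        self.steps_left -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.steps_left == 0
        return self.steps_left, reward, done, {}


def run_episode(env, choose_action):
    """Play one episode with the given policy and return the total reward."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = choose_action(observation)
        observation, reward, done, info = env.step(action)
        total += reward
    return total


env = CountdownEnv()
print(run_episode(env, lambda obs: 1))                      # prints 5.0
print(run_episode(env, lambda obs: random.choice([0, 1])))  # random agent
```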
In our case, the environment is an Atari game, the observation is a game screen, and the reward is the score obtained from that action. Since OpenAI Gym accesses the Arcade Learning Environment (ALE) through a different interface (atari_py), we create a wrapper class, GymEnvironment, around the OpenAI Gym environment so that it works with the Simple-DQN training code. Previously, Simple-DQN retrieved the screen and terminal state directly from ALE after performing an action, whereas the OpenAI Gym environment returns this data every time the agent acts on the environment. So instead we store these values as fields in our wrapper and use them as needed. Creating an environment also differs slightly: we specify the game with an environment id such as “Breakout-v0” instead of loading a ROM file directly.
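import cv2
import gym
class GymEnvironment(object):
    def __init__(self, env_id, args):
        self.gym = gym.make(env_id)  # e.g. "Breakout-v0"
        self.obs = None
        self.terminal = None
    def restart(self):
        self.gym.reset()  # start a new episode and clear the cached step results
        self.obs = None
        self.terminal = None
    def act(self, action):
        self.obs, reward, self.terminal, _ = self.gym.step(action)  # cache screen and terminal flag
        return reward
    def getScreen(self):
        assert self.obs is not None
        return cv2.resize(cv2.cvtColor(self.obs, cv2.COLOR_RGB2GRAY), (84, 84))  # RGB -> 84x84 grayscale
    def isTerminal(self):
        assert self.terminal is not None
        return self.terminal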
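The caching pattern used by GymEnvironment can be exercised without Gym, Atari ROMs or OpenCV installed. In this sketch, FakeGymEnv is a hypothetical stand-in whose episodes end after three steps, and CachingWrapper is a hypothetical, stripped-down version of the wrapper that skips image preprocessing; it shows how ALE-style code can act first and query the screen and terminal flag afterwards.

```python
class FakeGymEnv:
    """Hypothetical stand-in for a Gym env: reward equals the action, episode ends after 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, float(action), self.t >= 3, {}


class CachingWrapper:
    """Caches step() results so callers can ask for the screen and terminal state later."""

    def __init__(self, env):
        self.env = env
        self.obs = None
        self.terminal = None

    def restart(self):
        self.env.reset()
        self.obs = None        # nothing observed yet in the new episode
        self.terminal = None

    def act(self, action):
        self.obs, reward, self.terminal, _ = self.env.step(action)
        return reward

    def get_screen(self):
        assert self.obs is not None, "call act() before get_screen()"
        return self.obs

    def is_terminal(self):
        assert self.terminal is not None, "call act() before is_terminal()"
        return self.terminal


wrapper = CachingWrapper(FakeGymEnv())
wrapper.restart()
total = 0.0
while True:
    total += wrapper.act(1)
    if wrapper.is_terminal():
        break
print(wrapper.get_screen(), total)  # prints: 3 3.0
```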
To train with OpenAI Gym instead of ALE, we just specify the environment (OpenAI Gym or ALE) and the game. OpenAI Gym returns the full 210 × 160 RGB screen, which we then convert to grayscale and resize to 84 × 84.
./train.sh Breakout-v0 --environment gym
This will train a model using the OpenAI Gym environment and save model snapshots every epoch.
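The grayscale conversion and resize mentioned above can also be sketched with NumPy alone, for example where OpenCV is unavailable. This is an approximation, not Simple-DQN's code: preprocess is a hypothetical helper, it uses the BT.601 luma weights (0.299, 0.587, 0.114) that OpenCV's RGB-to-gray conversion also uses, and it samples pixels by nearest neighbour rather than interpolating as cv2.resize does by default, so values can differ slightly from the OpenCV result.

```python
import numpy as np


def preprocess(frame, out_h=84, out_w=84):
    """Convert an RGB frame (H, W, 3) to an (out_h, out_w) uint8 grayscale image."""
    # Weighted sum of the R, G, B channels (BT.601 luma weights)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame[..., :3].astype(np.float32) @ weights
    # Nearest-neighbour resize: pick evenly spaced source rows and columns
    rows = (np.arange(out_h) * frame.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * frame.shape[1] / out_w).astype(int)
    return np.clip(np.round(gray[np.ix_(rows, cols)]), 0, 255).astype(np.uint8)


frame = np.zeros((210, 160, 3), dtype=np.uint8)  # same shape as a Gym Atari frame
frame[:, :, 0] = 255                             # pure red image
small = preprocess(frame)
print(small.shape, small.dtype, small[0, 0])     # prints: (84, 84) uint8 76
```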
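To test a trained model on OpenAI Gym, we will first create a GymAgent that keeps a history of the most recent observations and uses the trained network to choose actions, exploring randomly with a small probability: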
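import random
import numpy as np
class GymAgent(object):
    def __init__(self, env, net, memory, args):
        self.env = env
        self.net = net
        self.memory = memory  # history buffer fed to the network
        self.history_length = args.history_length
        self.exploration_rate_test = args.exploration_rate_test
    def add(self, observation):
        # shift the history window by one frame and append the newest observation
        self.memory[0, :-1] = self.memory[0, 1:]
        self.memory[0, -1] = np.array(observation)
    def get_action(self, t, observation):
        self.add(observation)
        # act randomly until the history is full, and with probability
        # exploration_rate_test afterwards; otherwise act greedily
        if t < self.history_length or random.random() < self.exploration_rate_test:
            action = self.env.action_space.sample()
        else:
            qvalues = self.net.predict(self.memory)
            action = np.argmax(qvalues[0])  # best action for batch row 0
        return action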
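The history buffer and the exploration rule in GymAgent can be checked in isolation. This sketch assumes memory has shape (batch_size, history_length, height, width) with only batch row 0 filled at test time, and that predict returns one row of Q-values per batch entry; StubNet, push_frame and epsilon_greedy are hypothetical stand-ins for the trained network and the agent's methods, with tiny 2×2 frames for readability.

```python
import random

import numpy as np


class StubNet:
    """Hypothetical network: Q-value of action a is the mean of the newest frame times (a + 1)."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def predict(self, states):
        newest = states[:, -1].reshape(states.shape[0], -1).mean(axis=1)
        return newest[:, None] * np.arange(1, self.num_actions + 1)


def push_frame(memory, frame):
    # Drop the oldest frame, shift the rest left, append the newest at the end
    memory[0, :-1] = memory[0, 1:]
    memory[0, -1] = frame


def epsilon_greedy(qvalues, epsilon, num_actions):
    # With probability epsilon pick a random action, otherwise the highest Q-value
    if random.random() < epsilon:
        return random.randrange(num_actions)
    return int(np.argmax(qvalues))


history_length, num_actions = 4, 3
memory = np.zeros((1, history_length, 2, 2), dtype=np.float32)
for value in range(1, 6):                 # push frames filled with 1, 2, 3, 4, 5
    push_frame(memory, np.full((2, 2), value))
print(memory[0, :, 0, 0])                 # prints: [2. 3. 4. 5.]

qvalues = StubNet(num_actions).predict(memory)[0]
print(epsilon_greedy(qvalues, epsilon=0.0, num_actions=num_actions))  # prints: 2
```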
Then we can simply instantiate the agent with the environment and the saved model, and call get_action in the test loop to choose an action at each time step: the action with the highest predicted Q-value, or a random one while exploring.
agent = GymAgent(env, net, memory, args)
num_episodes = 10
for i_episode in range(num_episodes):
    observation = env.reset()
    for t in range(10000):
        action = agent.get_action(t, observation)
        observation, reward, done, info = env.step(action)
        if done:  # episode finished, start the next one
            break
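To log a score for each test episode, the loop also has to accumulate rewards and stop once done is set. The sketch below shows that bookkeeping under the four-value step convention used in this post; ToyEnv, RandomAgent and evaluate are hypothetical stand-ins for the Gym environment, a trained GymAgent and the test script, so it runs without Gym or saved weights.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical stand-in: episodes last a fixed number of steps; action 1 scores a point."""

    def __init__(self, episode_length=20):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, float(action == 1), self.t >= self.episode_length, {}


class RandomAgent:
    """Hypothetical agent exposing the same get_action(t, observation) method as GymAgent."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def get_action(self, t, observation):
        return random.randrange(self.num_actions)


def evaluate(env, agent, num_episodes=10, max_steps=10000):
    """Run num_episodes test episodes and return the total reward of each one."""
    scores = []
    for _ in range(num_episodes):
        observation = env.reset()
        total = 0.0
        for t in range(max_steps):
            action = agent.get_action(t, observation)
            observation, reward, done, info = env.step(action)
            total += reward
            if done:  # episode over: stop stepping and record the score
                break
        scores.append(total)
    return scores


scores = evaluate(ToyEnv(), RandomAgent(num_actions=2))
print(len(scores), statistics.mean(scores), max(scores))
```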
All of the testing code is in src/test_gym.py, which can be run with:
python src/test_gym.py Breakout-v0 <output_folder> --load_weights <saved_model_pkl>
This will log the testing results and record videos to the specified output_folder which we can then upload to OpenAI Gym for evaluation. It is also recommended to upload a gist describing how to reproduce your results.
Figure 2. Evaluation Results on OpenAI Gym
An example video of an agent playing several episodes:
To train a model on Nervana Cloud, first install and configure ncloud, a command-line client that helps you use and manage Nervana’s deep learning cloud.
Assuming the necessary dependencies are installed, we can run training with:
ncloud train src/main.py --args "Breakout-v0 --environment gym" --custom_code_url https://github.com/tambetm/simple_dqn
and testing with:
ncloud train src/test_gym.py --args "Breakout-v0 --load_weights <saved_model_pkl>" --custom_code_url https://github.com/tambetm/simple_dqn
To find out more about Nervana Cloud, visit Nervana’s Products page.
OpenAI Gym provides a nice toolkit for training and testing reinforcement learning algorithms. Extending Simple-DQN to work with OpenAI Gym was relatively straightforward, and we hope others can easily build on this work to develop better learning algorithms.