Question

Reinforcement Learning

What are some advantages and disadvantages of A3C over DQN? What are some potential issues that can be caused by asynchronous updates in A3C?

Answer #1

Asynchronous Advantage Actor-Critic

But what comes after? The same company that is responsible for DQN, DeepMind, more recently introduced the A3C architecture. It is supposed to be faster, simpler, and more robust than DQN, and to achieve better results. But how?

You can figure out the biggest difference by looking at the name of this architecture: Asynchronous Advantage Actor-Critic. In DQN, a single agent (a so-called worker) interacts with a single environment and generates training data. A3C instead launches several workers asynchronously (as many as your CPU can handle) and lets each of them interact with its own instance of the environment. Each worker also trains its own copy of the network and periodically shares its updates with a shared global network.
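To make the asynchronous part concrete, here is a minimal Python sketch (my own illustration, not DeepMind's code): several threads stand in for workers, each with its own random "rollout", pushing gradient updates into a shared parameter vector. The GlobalNet/worker names and the toy squared-error update are assumptions; a real A3C worker computes actor-critic gradients from its own copy of the game environment.

```python
# Sketch of the asynchronous-workers idea behind A3C (not a full actor-critic).
import threading
import numpy as np

class GlobalNet:
    """Shared parameters that every worker reads from and writes to."""
    def __init__(self, n_params):
        self.theta = np.zeros(n_params)
        # The real A3C applies updates lock-free (Hogwild-style), which is why
        # workers can end up pushing gradients computed from stale parameters.
        self.lock = threading.Lock()

    def apply_gradients(self, grad, lr=0.01):
        with self.lock:
            self.theta -= lr * grad

def worker(global_net, worker_id, n_steps=200):
    rng = np.random.default_rng(worker_id)      # each worker has its own "environment"
    for _ in range(n_steps):
        local_theta = global_net.theta.copy()   # pull the latest global parameters
        # Toy rollout: pretend we observed features x and a return target y.
        x = rng.normal(size=local_theta.shape)
        y = rng.normal()
        # Gradient of the squared error between predicted value and return.
        grad = 2.0 * (local_theta @ x - y) * x
        global_net.apply_gradients(grad)        # push the update asynchronously

if __name__ == "__main__":
    net = GlobalNet(n_params=4)
    threads = [threading.Thread(target=worker, args=(net, i)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("final shared parameters:", net.theta)
```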

The Advantage

So why exactly is this better than a traditional DQN? There are multiple reasons. First, by launching more workers asynchronously, you collect much more training data in parallel, which makes data collection faster.

Since every worker also has its own environment instance, you get more diverse, less correlated data, which is known to make the network more robust and to produce better results!

Deep Q-Network

DQN was introduced in two papers: Playing Atari with Deep Reinforcement Learning at NIPS in 2013 and Human-level control through deep reinforcement learning in Nature in 2015. Interestingly, there were only a few papers about DQN between 2013 and 2015. I guess the reason was that people could not reproduce the DQN implementation without the details in the Nature version.

[Image: DQN agent playing Breakout]

DQN overcomes unstable learning mainly through four techniques:

  • Experience Replay
  • Target Network
  • Clipping Rewards
  • Skipping Frames

I will explain each technique one by one.

Experience Replay

Experience Replay was originally proposed in Reinforcement Learning for Robots Using Neural Networks in 1993. A DNN easily overfits to the most recent episodes, and once it is overfitted it is hard to generate diverse experiences. To solve this problem, Experience Replay stores experiences, including state transitions, rewards, and actions, which are the data needed to perform Q-learning, and samples mini-batches from them to update the neural network (a minimal buffer sketch follows the list below). This technique is expected to bring the following benefits:

  • reduces correlation between experiences when updating the DNN
  • increases learning speed with mini-batches
  • reuses past transitions to avoid catastrophic forgetting
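Here is a minimal replay-buffer sketch, assuming transitions of the form (state, action, reward, next_state, done); the class and method names are illustrative, not the original DQN code.

```python
# A simple uniform replay buffer: store transitions, sample random mini-batches.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are discarded first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive steps of the same episode.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```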

Target Network

In the TD-error calculation, the target values change every time the DNN is updated, and an unstable target makes training difficult. The Target Network technique therefore freezes the parameters of a separate target network and only replaces them with the latest network every few thousand steps.
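The following sketch shows the update schedule, assuming a framework-agnostic network object with get/set_weights; the stub class, the SYNC_EVERY constant, and the fake gradient step are my own illustrative assumptions.

```python
# Keep a frozen copy of the online network and sync it only occasionally.
import copy

class QNetworkStub:
    """Stand-in for a deep Q-network; just holds a parameter dict."""
    def __init__(self):
        self.weights = {"w": 0.0}
    def get_weights(self):
        return copy.deepcopy(self.weights)
    def set_weights(self, weights):
        self.weights = copy.deepcopy(weights)

online_net = QNetworkStub()   # updated at every training step
target_net = QNetworkStub()   # frozen copy used to compute the TD targets
SYNC_EVERY = 10_000           # copy the online weights only every few thousand steps

for step in range(1, 50_001):
    online_net.weights["w"] += 0.001              # pretend gradient update
    if step % SYNC_EVERY == 0:
        # Only here do the target parameters change, keeping the TD targets stable.
        target_net.set_weights(online_net.get_weights())
```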

[Image: the target Q function, shown in the red rectangle, is held fixed]

Clipping Rewards

Each game has a different score scale. For example, in Pong a player gets +1 point for winning a rally and -1 for losing it, whereas in Space Invaders a player gets 10-30 points for defeating an invader. This difference in scale can make training unstable, so the Clipping Rewards technique clips the scores: all positive rewards are set to +1 and all negative rewards to -1.
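As a quick sketch, the clipping rule amounts to keeping only the sign of the raw game score (the function name here is illustrative):

```python
def clip_reward(raw_reward):
    """Keep only the sign of the raw game score, as in DQN's reward clipping."""
    if raw_reward > 0:
        return 1.0
    if raw_reward < 0:
        return -1.0
    return 0.0
```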

Skipping Frames

The ALE can render 60 frames per second, but people do not actually take that many actions per second, and the agent does not need to compute Q-values on every frame. With the Skipping Frames technique, DQN selects an action only every 4 frames (repeating it in between) and uses the past 4 frames as input. This reduces computational cost and lets the agent gather more experience.
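A minimal frame-skip wrapper might look like the sketch below; it assumes an older Gym-style environment whose step(action) returns a 4-tuple, and the wrapper name is my own.

```python
class FrameSkip:
    """Repeat the chosen action for `skip` frames and sum the rewards."""
    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def step(self, action):
        total_reward = 0.0
        done = False
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```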

Performance

All of the above techniques enable DQN to achieve stable training.

[Image: DQN far outperforms a naive DQN without these techniques]

The Nature version shows how much Experience Replay and the Target Network contribute to stability.

[Image: performance with and without Experience Replay and the Target Network]

Experience Replay is very important in DQN, and the Target Network further improves performance.
