Reinforcement Learning
What are some advantages and disadvantages of A3C over DQN? What are some potential issues that can be caused by asynchronous updates in A3C?
Asynchronous Advantage Actor-Critic
But what comes after? DeepMind, the same company responsible for DQN, more recently introduced the A3C architecture. It is supposed to be faster, simpler, and more robust than DQN, and also able to achieve better results. But how?
You can figure out the biggest difference by looking at the name of this mysterious architecture: Asynchronous Advantage Actor-Critic. In DQN, a single agent (a so-called worker) interacts with a single environment, generating training data. A3C instead launches several workers asynchronously (as many as your CPU can handle) and lets each of them interact with its own instance of the environment. Each worker also trains its own copy of the network and periodically shares its results with the global network.
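To make the pattern concrete, here is a toy sketch of the asynchronous-worker idea in Python. The environment, model, and "gradient" below are stand-ins I made up for illustration; a real A3C uses an actor-critic network and proper gradients.

# Toy sketch of A3C-style asynchronous workers (not DeepMind's code).
import threading
import random

class SharedModel:
    """Stands in for the global network; here just a single scalar weight."""
    def __init__(self):
        self.weight = 0.0
        self.lock = threading.Lock()

    def apply_gradient(self, grad, lr=0.01):
        # Workers push gradients asynchronously; the lock keeps each
        # individual update atomic.
        with self.lock:
            self.weight -= lr * grad

def worker(global_model, steps=100):
    local_weight = global_model.weight      # local copy of the network
    for _ in range(steps):
        # Each worker interacts with its own environment instance
        # (simulated here by random rewards).
        reward = random.random()
        grad = local_weight - reward        # placeholder gradient
        global_model.apply_gradient(grad)
        local_weight = global_model.weight  # re-sync with the global network

global_model = SharedModel()
threads = [threading.Thread(target=worker, args=(global_model,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("trained weight:", global_model.weight)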
The Advantage
So why exactly is this better than a traditional DQN? There are multiple reasons. First, by launching more workers asynchronously, you collect far more training data in parallel, which makes data collection much faster.
Since every worker instance also has its own environment, you get more diverse data, which is known to make the network more robust and produce better results!
Deep Q-Network
DQN was introduced in two papers: Playing Atari with Deep Reinforcement Learning at NIPS in 2013 and Human-level control through deep reinforcement learning in Nature in 2015. Interestingly, there were only a few papers about DQN between 2013 and 2015. I guess the reason was that people could not reproduce the DQN implementation without the details in the Nature version.
DQN agent playing Breakout
DQN overcomes unstable learning mainly through four techniques: Experience Replay, Target Network, Clipping Rewards, and Skipping Frames. I explain each technique one by one.
Experience Replay
Experience Replay was originally proposed in Reinforcement Learning for Robots Using Neural Networks in 1993. A DNN easily overfits to its most recent episodes, and once it is overfitted, it is hard to produce varied experiences. To solve this problem, Experience Replay stores experiences, including state transitions, rewards, and actions, which are the data necessary to perform Q-learning, and samples mini-batches from this memory to update the neural network. This reduces the correlation between consecutive experiences and lets each experience be reused in many updates.
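As a rough illustration (my own sketch, not the paper's code), a replay buffer can be as simple as a bounded queue plus uniform sampling:

# Minimal replay buffer sketch.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall out

    def store(self, state, action, reward, next_state, done):
        # Store everything Q-learning needs for one update.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between
        # consecutive frames of the same episode.
        return random.sample(self.buffer, batch_size)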
Target Network
In the TD error calculation, the target function changes frequently along with the DNN, and an unstable target function makes training difficult. The Target Network technique therefore fixes the parameters of the target function and replaces them with those of the latest network only every few thousand steps.
The target Q function, in the red rectangle, is fixed
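A minimal sketch of the idea, with names of my own choosing: the TD target is computed from a frozen copy of the weights, and that copy is refreshed only every few thousand steps.

# Target-network sketch (illustrative names, not the paper's code).
import copy

def maybe_sync(online_params, target_params, step, sync_every=10_000):
    # Copy the latest online network into the target network only
    # once every `sync_every` steps; otherwise keep it fixed.
    if step % sync_every == 0:
        target_params = copy.deepcopy(online_params)
    return target_params

def td_target(reward, next_q_values_from_target, gamma=0.99, done=False):
    # y = r                                    if the episode ended
    # y = r + gamma * max_a' Q_target(s', a')  otherwise
    if done:
        return reward
    return reward + gamma * max(next_q_values_from_target)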
Clipping Rewards
Each game has a different score scale. For example, in Pong, players get +1 point for winning a rally and -1 point for losing it. In Space Invaders, however, players get 10-30 points for defeating an invader. This difference can make training unstable, so the Clipping Rewards technique clips the scores: all positive rewards are set to +1 and all negative rewards to -1.
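In code, the clipping rule is just the sign of the reward; the example values below are the Pong and Space Invaders scores mentioned above:

# Reward clipping: every positive score becomes +1, every negative -1.
def clip_reward(reward):
    return (reward > 0) - (reward < 0)  # sign of the reward

print(clip_reward(30))   # +1 (Space Invaders kill)
print(clip_reward(-1))   # -1 (lost Pong rally)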
Skipping Frames
The ALE can render 60 frames per second, but people don't actually take that many actions in a second, and the agent doesn't need to calculate Q-values for every frame. The Skipping Frames technique therefore has DQN calculate Q-values only every 4 frames and use the past 4 frames as input. This reduces computational cost and gathers more experiences.
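A sketch of the frame-skip step, assuming a Gym-style env.step interface (my illustration, not the DQN source):

# Act every `skip` frames, repeating the chosen action in between.
def step_with_skip(env, action, skip=4):
    total_reward, done = 0.0, False
    for _ in range(skip):
        obs, reward, done, info = env.step(action)  # repeat same action
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info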
Performance
All of the above techniques enable DQN to achieve stable training.
DQN overwhelms a naive DQN
The Nature version shows how much Experience Replay and the Target Network contribute to stability.
Performance with and without Experience Replay and Target Network
Experience Replay is very important in DQN, and the Target Network also increases its performance.