Reinforcement Learning Final Project
2023-05-11
1 Introduction
The goal of the final project is to implement two kinds of model-free RL methods:
value-based RL and policy-based RL. Within this scope, you are free to choose which
RL methods to use to solve the two benchmark environments.
2 Review
2.1 Value-Based Reinforcement Learning
Value-based methods strive to fit the action-value function or the state-value function,
e.g. Monte Carlo and TD learning for model-free policy evaluation, and SARSA and Q-learning
for model-free control. Off-policy training is easy to implement in
value-based methods, and DQN achieves remarkable performance in the off-policy setting.
In DQN (Mnih et al., 2015), past experiences stored in the experience replay buffer can
be used to train the deep Q-network. In many transfer algorithms for DQN,
an expert's experiences are often used to fit the current value function. Hence
value-based methods are often more sample efficient.
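To make the off-policy, replay-based training described above concrete, below is a minimal sketch of a DQN-style update step. It is only an illustration, not a recommended setup: the network architecture, buffer size, and hyperparameters are assumptions, PyTorch is assumed as the framework, and a small vector observation is used for brevity (an Atari agent would use a convolutional network and frame preprocessing).

# Minimal sketch of a DQN-style update with experience replay
# (illustrative sizes and hyperparameters; assumes PyTorch and a
# discrete-action environment with vector observations).
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99            # assumed sizes for illustration

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())    # periodically re-sync during training
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay_buffer = deque(maxlen=100_000)             # stores (s, a, r, s_next, done) tuples

def dqn_update(batch_size=32):
    # sample a mini-batch of past transitions from the replay buffer
    batch = random.sample(replay_buffer, batch_size)
    s, a, r, s_next, done = map(torch.tensor, zip(*batch))
    s, s_next = s.float(), s_next.float()
    # Q(s, a) for the actions actually taken
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # bootstrapped off-policy target
        target = r.float() + gamma * (1 - done.float()) * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()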
Although value-based RL methods like DQN and its variants achieve remarkable
performance on some tasks, e.g. Atari games, their inherent drawbacks hinder their
development.
First, action selection in value-based methods is based on the action
values, which is inherently unsuited to continuous action spaces.
Second, non-linear value function approximation, such as a neural network, is
unstable and brittle with respect to its hyperparameters.
2.2 Policy-Based Reinforcement Learning
In the original policy gradient $\nabla_\theta \log \pi_\theta(s_t, a_t)\, v_t$, the return $v_t$ is an unbiased estimate
of the expected long-term value $Q^{\pi}(s,a)$ under the policy $\pi_\theta(s)$ (the Actor).
However, the original policy gradient suffers from high variance. The Actor-Critic
algorithm uses a Q-value function $Q_w(s,a)$, named the Critic, to estimate $Q^{\pi}(s,a)$.
Though the Critic may introduce bias, it can dramatically reduce variance, and a
proper choice of function approximator may avoid the bias.
The biggest drawback of policy gradient methods is sample inefficiency, since
policy gradients are estimated from rollouts. Although actor-critic methods use
value approximators (the Critic) instead of full rollouts, their on-policy style remains
sample inefficient. Prior works, such as DDPG (Lillicrap, 2015) and Soft Actor-Critic
(Haarnoja, 2018), strive to introduce an off-policy mode to Actor-Critic.
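To make the Actor-Critic idea above concrete, below is a minimal sketch of a one-step actor-critic update for a discrete-action policy. As a simplification of the Q-value Critic $Q_w(s,a)$ described above, it uses a state-value Critic with the TD error as the advantage; the network shapes and learning rate are illustrative assumptions, and PyTorch is assumed.

# Minimal sketch of a one-step actor-critic update (illustrative sizes and
# learning rate; assumes PyTorch and a discrete action space).
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99            # assumed sizes for illustration

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def actor_critic_update(s, a, r, s_next, done):
    """s, s_next: float tensors of shape (obs_dim,); a: int; r: float; done: bool."""
    v_s = critic(s).squeeze()                     # V(s), the Critic
    with torch.no_grad():
        v_s_next = critic(s_next).squeeze()
        td_target = r + gamma * (1.0 - float(done)) * v_s_next
    advantage = td_target - v_s                   # TD error used as the advantage estimate
    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(torch.tensor(a))
    actor_loss = -log_prob * advantage.detach()   # policy gradient: -log pi(a|s) * advantage
    critic_loss = advantage.pow(2)                # fit the Critic to the TD target
    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()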
3 Experiment Environments and Requirements
OpenAI provides the benchmark environment toolkit `gym' to facilitate the
development of reinforcement learning. Eight types of experiment environments are
available (see https://gym.openai.com/envs/#atari for more). In our
project, you are required to train agents on Atari and MuJoCo. You should
choose appropriate and effective RL methods to achieve scores as high as you can
in the environments. To get started with gym, refer to
https://github.com/openai/gym; a minimal interaction loop is also sketched at the
end of Section 3.1.
3.1 Atari Games Environment Description
The Atari 2600 is a home video game console developed in 1977. Dozens of its
games are provided by `gym'. In our project, we limit the choice of environment
to the following:
VideoPinball-ramNoFrameskip-v4
BreakoutNoFrameskip-v4
PongNoFrameskip-v4
BoxingNoFrameskip-v4
You should choose at least one of these environments to test your value-based method.
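As a starting point with gym, the following is a minimal sketch of a random-action interaction loop on one of the Atari environments listed above. It assumes the classic gym API (reset() returning an observation and step() returning a 4-tuple), which differs slightly in newer gym/gymnasium releases, and the episode count is an illustrative choice; the random policy is only a placeholder for your agent.

# Minimal random-agent interaction loop with a gym Atari environment
# (classic gym API assumed; newer gym/gymnasium versions return extra values).
import gym

env = gym.make("PongNoFrameskip-v4")              # any of the listed Atari environments
for episode in range(3):                          # illustrative episode count
    obs = env.reset()
    done, episode_return = False, 0.0
    while not done:
        action = env.action_space.sample()        # replace with your agent's policy
        obs, reward, done, info = env.step(action)
        episode_return += reward
    print(f"episode {episode}: return = {episode_return}")
env.close()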
3.2 MuJoCo Continuous Control Environment Description
MuJoCo stands for Multi-Joint dynamics with Contact; it was originally designed
for testing model-based control methods. Now, the MuJoCo simulator is a
commonly adopted benchmark for continuous control. We narrow down the
choice of environments to the following:
Hopper-v2
Humanoid-v2
HalfCheetah-v2
Ant-v2
You should choose at least one of these environments to test your policy-based method.
3.3 Requirements
Here is the experiment content:
You are required to choose and implement value-based RL algorithms and
test them on at least one of the Atari games listed above.
You are required to choose and implement policy-based RL algorithms
and test them on at least one of the MuJoCo environments listed above.
The choice of algorithms within the scope of value-based and policy-based methods
is not limited. For ease of running your submitted code and grading, we impose
a few requirements on this project.
Programming language: Python 3
The final results should be produced by running your code with the environment name, like the following:
python run.py --env_name BreakoutNoFrameskip-v4
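A minimal sketch of how run.py could parse this argument and dispatch to the chosen environment is shown below; the flag name --env_name follows the command above, while the default environment and the printed fields are illustrative assumptions.

# Minimal sketch of argument parsing for run.py (flag name taken from the
# command above; default environment is an illustrative choice).
import argparse

import gym

def main():
    parser = argparse.ArgumentParser(description="RL final project entry point")
    parser.add_argument("--env_name", type=str, default="BreakoutNoFrameskip-v4",
                        help="gym environment id to train on")
    args = parser.parse_args()

    env = gym.make(args.env_name)
    print(f"Training on {args.env_name}: "
          f"observation space {env.observation_space}, action space {env.action_space}")
    # ... construct your agent and start training here ...

if __name__ == "__main__":
    main()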
4 Report and Submission
4.1 About Submission
You are required to accomplish this project individually.
Your report and source code should be zipped and named after "Name StuID".
In addition, a README file with instructions for running your code should
be included inside the zip file.
The submission deadline is June 8, 2023.
4.2 About Report
The report should cover, but is not limited to, the following sections:
A description of the algorithms you use.
The performance the algorithms achieve in the selected environments.
An analysis of the algorithms.
4.3 Bonus
Modification of the algorithms that achieves better performance.
Testing your algorithms on more than one environment.
Excellent analysis of the algorithms.