
COMP3702 Artificial Intelligence (Semester 2, 2022) Assignment 2: HexBot MDP


Key information:

  • This assignment will assess your skills in developing algorithms for solving MDPs.
  • Assignment 2 contributes 20% to your final grade.
  • This assignment consists of two parts: (1) programming and (2) a report.
  • This is an individual assignment.
  • Both code and report are to be submitted via Gradescope (https://www.gradescope.com/). You can find instructions on how to register for the COMP3702 Gradescope site on Blackboard.
  • Your program (Part 1, 60/100) will be graded using the Gradescope code autograder, using testcases similar to those in the support code provided at https://gitlab.com/3702-2022/a2-support.
  • Your report (Part 2, 40/100) should fit the template provided, be in .pdf format and named according to the format a2-COMP3702-[SID].pdf, where SID is your student ID. Reports will be graded by the teaching team.

The HexBot Robot AI Environment

You have been tasked with developing a planning algorithm for automatically controlling HexBot, a multi-purpose robot which operates in a hexagonal environment and has the capability to push, pull and rotate ‘Widgets’ in order to reposition them to target locations. To aid you in this task, we have provided support code for the HexBot robot environment which you will interface with to develop your solution. To optimally solve a level, your AI agent must efficiently find a sequence of actions so that every Target cell is occupied by part of a Widget, while incurring the minimum possible action cost.

Levels in HexBot are composed of a hexagonal grid of cells, where each cell contains a character representing the cell type. An example game level is shown in Figure 1.



Figure 1: Example game level of HexBot

Environment representation

Hexagonal Grid

The environment is represented by a hexagonal grid. Each cell of the hex grid is indexed by (row, column) coordinates. The hex grid is indexed top to bottom, left to right. That is, the top left corner has coordinates (0, 0) and the bottom right corner has coordinates (nrows − 1, ncols − 1). Even-numbered columns (starting from zero) are in the top half of the row, and odd-numbered columns are in the bottom half of the row. An example is shown in Figure 2.
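To make this indexing concrete, the sketch below enumerates the neighbours of a cell under the convention just described. It is an illustration only: the function name, direction ordering and bounds handling are our own, not part of the support code.

    def hex_neighbours(row, col, nrows, ncols):
        """Return the in-bounds neighbours of cell (row, col) on the hex grid."""
        if col % 2 == 0:
            # Even column: the cell sits in the top half of its row.
            offsets = [(-1, 0), (1, 0),      # up, down
                       (-1, -1), (0, -1),    # up-left, down-left
                       (-1, 1), (0, 1)]      # up-right, down-right
        else:
            # Odd column: the cell sits in the bottom half of its row.
            offsets = [(-1, 0), (1, 0),      # up, down
                       (0, -1), (1, -1),     # up-left, down-left
                       (0, 1), (1, 1)]       # up-right, down-right
        return [(row + dr, col + dc) for dr, dc in offsets
                if 0 <= row + dr < nrows and 0 <= col + dc < ncols]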

Robot and its Actions

The HexBot robot occupies a single cell in the hex grid. In the visualisation, the robot is represented by the cell marked with the character ‘R’. The side of the cell marked with ‘*’ represents the front of the robot. The state of the robot is defined by its (row, column) coordinates and its orientation (i.e. the direction its front side is pointing towards).

At each time step, the agent is prompted to select an action. The robot has 4 nominal actions:

  • Forward: move to the adjacent cell in the direction of the front of the robot (keeping the same orientation)
  • Reverse: move to the adjacent cell in the opposite direction to the front of the robot (keeping the same orientation)
  • Spin Left: rotate left (relative to the robot’s front, i.e. counterclockwise) by 60 degrees (staying in the same cell)
  • Spin Right: rotate right (i.e. clockwise) by 60 degrees (staying in the same cell)

Each time the robot selects an action, there is a fixed probability (given as a parameter of each testcase) for the robot to ‘drift’ by 60 degrees in a clockwise or counterclockwise direction (separate probabilities for each drift direction) before the selected nominal action is performed. The probability of drift occurring depends on which nominal action is selected, with some actions more likely to result in drift. Drifting CW and CCW are mutually exclusive events. Drift occurring does not cause any additional cost/reward penalty to be incurred, except in the case where the movement direction resulting from drift causes a collision to occur.
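One way to picture this is to enumerate the possible outcomes of a single nominal action together with their probabilities. The sketch below assumes drift can be modelled as a free 60-degree spin applied before the nominal action; the parameter and action names are illustrative, not those of the support code.

    def action_outcomes(action, drift_cw_prob, drift_ccw_prob):
        """Enumerate (movement sequence, probability) pairs for one nominal action."""
        p_cw = drift_cw_prob[action]    # P(drift 60 degrees clockwise first)
        p_ccw = drift_ccw_prob[action]  # P(drift 60 degrees counterclockwise first)
        # Drift CW and CCW are mutually exclusive, so the remaining mass is no drift.
        return [
            (['SPIN_RIGHT', action], p_cw),    # drift CW, then the nominal action
            (['SPIN_LEFT', action], p_ccw),    # drift CCW, then the nominal action
            ([action], 1.0 - p_cw - p_ccw),    # no drift
        ]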

Action Costs

Each action has an associated cost, representing the amount of energy used by performing that action. If the robot moves without pushing or pulling a widget, the cost of the action is given by a base action cost; actions which push or pull a widget incur a higher cost.

Obstacles

Some cells in the hex grid are obstacles. In the visualisation, these cells are filled with the character ‘X’. Any action which causes the robot or any part of a Widget to enter an obstacle cell results in collision, causing a penalty value (given as a parameter of each testcase) to be subtracted from the received reward. The outside boundary of the hex grid behaves in the same way as an obstacle.

Widgets

Widgets are objects which occupy multiple cells of the hexagonal grid, and can be rotated and translated by the HexBot robot. The state of each widget is defined by its centre position (row, column) coordinates and its orientation. Widgets have rotational symmetries: orientations which are rotationally symmetric are considered to be the same.

In the visualisation, each Widget in the environment is assigned a unique letter ‘a’, ‘b’, ‘c’, etc. Cells which are occupied by a widget are marked with the letter assigned to that widget (surrounded by round brackets). The centre position of the widget is marked by the uppercase version of the letter, while all other cells occupied by the widget are marked with the lowercase.
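Because rotationally symmetric orientations are the same state, it can be useful to canonicalise a widget’s orientation before storing or hashing a state. A minimal sketch, assuming orientations are numbered 0 to 5 in 60-degree steps (the support code’s actual representation may differ):

    def canonical_orientation(orientation, symmetry_order):
        """Collapse rotationally symmetric orientations onto one representative.
        symmetry_order is how many of the six rotations map the widget onto itself."""
        distinct = 6 // symmetry_order   # number of genuinely distinct orientations
        return orientation % distinct

    # e.g. a straight widget maps onto itself after 180 degrees (symmetry_order = 2),
    # so only orientations 0, 1 and 2 are distinct.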

Interactive mode

A good way to gain an understanding of the game is to play it. You can play the game to get a feel for how it works by launching an interactive game session from the terminal with the following command:

$ python play.py <input_file>.txt
where <input_file>.txt is a valid testcase file (from the support code, with path relative to the current directory), e.g. testcases/ex1.txt.

HexBot as an MDP

In this assignment, you will write the components of a program to play HexBot, with the objective of finding a high-quality solution to the problem using various sequential decision-making algorithms based on the Markov decision process (MDP) framework. This assignment will test your skills in defining an MDP for a practical problem and developing effective exact algorithms for MDPs.

What is provided to you

We will provide supporting code in Python, in the form of:

1. A class representing the HexBot game map and a number of helper functions
2. A parser method to take an input file (testcase) and convert it into a HexBot map
3. A tester
4. Testcases to test and evaluate your solution
5. A script to allow you to play HexBot interactively
6. A solution file template

Your assignment task

Your task is to develop algorithms for computing policies, and to write a report on your algorithms’ performance. You will be graded on both your submitted program (Part 1, 60%) and the report (Part 2, 40%). These percentages will be scaled to the 20% course weighting for this assessment item. The two solution strategies covered by this assignment are:

1. Value Iteration (VI)
2. Policy Iteration (PI)

Individual testcases specify which strategy (value iteration/policy iteration) will be applied, but you may modify the strategy specified in your local copy of the testcases for the purpose of comparing the performance of your two algorithms. Note that any local changes you make to the testcases will not modify the testcases on Gradescope (against which your programming component will be graded). The difficulty of higher-level testcases will be designed to require a more advanced solution (e.g. linear algebra policy iteration).
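For reference, “linear algebra policy iteration” refers to evaluating a fixed policy exactly by solving a linear system rather than by iterative sweeps. A minimal numpy sketch, assuming you have already built dense transition and reward arrays for your MDP (which the support code does not provide directly):

    import numpy as np

    def evaluate_policy(P_pi, r_pi, gamma):
        """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        P_pi: (S, S) transition matrix under the policy; r_pi: (S,) rewards."""
        n = P_pi.shape[0]
        return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

    def policy_iteration_step(P, r, policy, gamma):
        """One evaluate-then-improve step.
        P: (A, S, S) transition tensor, r: (A, S) rewards, policy: (S,) action indices."""
        S = P.shape[1]
        P_pi = P[policy, np.arange(S)]      # rows selected by the current policy
        r_pi = r[policy, np.arange(S)]
        v = evaluate_policy(P_pi, r_pi, gamma)
        q = r + gamma * (P @ v)             # (A, S) action values
        return q.argmax(axis=0), v          # greedy policy and its value estimate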

Part 1 — The programming task

We now provide you with some details explaining how your code will interact with the testcases and the autograder (with special thanks to Nick Collins for his efforts making this work seamlessly).

Implement your solution using the supplied solution.py template file. You are required to fill in the following method stubs:

__init__(game_env)

vi_initialise()

vi_is_converged()

vi_iteration()

vi_plan_offline()

You can add additional helper methods and classes (either in solution.py or in files you create) if you wish. To ensure your code is handled correctly by the autograder, you should avoid using any try-except blocks in your implementation of the above methods (as this can interfere with our time-out handling). Refer to the documentation in solution.py for more details.
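As a rough illustration of how the value iteration stubs might fit together (the game_env interface details and the helper methods named below are assumptions; consult the documentation in solution.py for the real interface):

    class Solver:
        def __init__(self, game_env):
            self.game_env = game_env
            self.values = {}        # state -> current value estimate
            self.epsilon = 1e-4     # convergence threshold (a tunable choice)
            self.converged = False

        def vi_initialise(self):
            # enumerate_states() is an assumed helper that expands the reachable
            # state space from game_env; it is not part of the support code.
            self.states = self.enumerate_states()
            self.values = {s: 0.0 for s in self.states}

        def vi_is_converged(self):
            return self.converged

        def vi_iteration(self):
            # One synchronous Bellman backup over every state:
            #   V(s) <- max_a sum_s' P(s' | s, a) * [R(s, a, s') + gamma * V(s')]
            # expected_value() and actions() are assumed helpers, not shown.
            max_delta = 0.0
            new_values = {}
            for s in self.states:
                best = max(self.expected_value(s, a) for a in self.actions(s))
                new_values[s] = best
                max_delta = max(max_delta, abs(best - self.values[s]))
            self.values = new_values
            self.converged = max_delta < self.epsilon

        def vi_plan_offline(self):
            self.vi_initialise()
            while not self.vi_is_converged():
                self.vi_iteration()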

Part 2 — The report

The report tests your understanding of MDP algorithms and the methods you have used in your code, and contributes 40/100 of your assignment mark.

Question 1. MDP problem definition (15 marks)

a) Define the State space, Action space, Transition function, and Reward function components of the HexBot MDP, as well as where these are represented in your code. (10 marks)

b) Describe the purpose of a discount factor in MDPs. (2.5 marks)

c) State what the following dimensions of complexity are of your agent in the HexBot MDP (see https://artint.info/html/ArtInt_12.html for definitions). (2.5 marks)

  • Planning Horizon
  • Sensing Uncertainty

Question 2. Comparison of algorithms and optimisations (15 marks)

This question requires a comparison of your implementation of value iteration (VI) and policy iteration (PI). If you did not implement PI, you may receive partial marks for this question by providing insightful, relevant comments on your implementation of VI. For example, if you tried standard VI and asynchronous VI, you may compare these two approaches for partial marks.

  a) Describe your implementations of value iteration and policy iteration in one sentence each. Include details such as whether you used asynchronous updates, and how you handled policy evaluation in PI. (2 marks)
  b) Pick three representative testcases and compare the performance of Value Iteration (VI) and Policy Iteration (PI) according to the following measures. Please include the numerical values of your experiments in your comparisons.

  • Time to converge to the solution. (3 marks)
  • Number of iterations to converge to the solution. (3 marks)

In order to do this, you’ll need to modify the # solver type in your local copy of the testcases (the text files in the testcases directory).
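A small harness along the following lines can collect both measures in one run; it assumes your solver follows the solution.py stub naming, with analogous methods prefixed pi_ for policy iteration:

    import time

    def measure(solver, prefix='vi'):
        """Run the initialise/iterate/converged loop; return (seconds, iterations)."""
        initialise = getattr(solver, prefix + '_initialise')
        iteration = getattr(solver, prefix + '_iteration')
        converged = getattr(solver, prefix + '_is_converged')
        start = time.time()
        initialise()
        count = 0
        while not converged():
            iteration()
            count += 1
        return time.time() - start, count

    # e.g. elapsed, n_iters = measure(my_solver, 'vi')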

Question 3. “Risky Business”: Optimal policy variation based on probabilities & costs (10 marks)

One consideration in the solution of a Markov Decision Process (i.e. the optimal policy) is the trade-off between a risky higher reward and a lower-risk lower reward, which depends on the probabilities of the non-deterministic dynamics of the environment and the rewards associated with certain states and actions.

If you did not implement PI, you may change the solver type to VI in order to answer this question.

  a) How do you expect the optimal path to change as the hazard penalty and transition probabilities change? Use facts about the algorithms to justify why you expect these changes to have such effects. (5 marks)
  b) Picking three values for hazard penalty, and three sets of values for the transition probabilities, explore how the optimal policy changes over the 9 combinations of these factors. Do the experimental results align with what you thought should happen? If not, why? (5 marks)
