Assignments
This course has a strong practical component, consisting of three graded assignments:
Tabular reinforcement learning (individual)
Deep value-based reinforcement learning (group of 3)
Deep policy-based reinforcement learning (group of 3)
Together, your average grade for these three assignments determines 50% of your final grade (see Grading).
Grading scheme & Template
You can find the full grading scheme here.
Use the template for AS1 or the template for AS2/3 for your reports, which have a page limit of 8 pages.
Assignment PDF & Code
You get the assignment pdf and starting code from Brightspace (under Assignments).
Note: you first need to enroll in a group for the specific assignment before the assignment appears in your list.
Start your assignment early (at least come to every practical session!). From previous years we know you need this time to explore the assignment, and you will not be able to finish it if you start in the last week.
Rules
Make sure you stick to the rules below for the course report:
Template: Write your report in this LaTeX template.
Do not alter the style file or page limits.
You do not need to follow all the specific formatting advice in the template (since it is very detailed and meant for a conference). We only use the template to make sure your reports are 1) neat, 2) comparable and 3) do not get too long. Therefore, simply enter your names and student numbers as authors, think of a nice title, and include your text, figures and tables as you would usually do in a LaTeX document. You may include an abstract, but this is not mandatory.
Note that you need to replace \bibliography{example_paper} on line 545 of the template with \bibliography{main} if you want to use the main.bib file for your references (this is a slight inconsistency in the public template).
Page limit: Your report has a page limit of 8 pages (excl. references). We will not grade any material beyond the page limit.
You may include additional results/explanation, but put them in an Appendix. (It is up to the grader to read these: you are in principle graded based on your main report.)
Groups: You take the second and third assignment in groups of three.
If you can't find teammates, try to get in touch with each other through the Brightspace forum.
Contributions: Make sure you indicate (at the end of your report) what each team member contributed.
Important: you should do all work together, and learn together. Splitting the assignment is not allowed.
If we ask you to explain your assignment, every team member should be able to explain all aspects.
Code: Include your code with the report.
Make sure your code runs from the command line.
When you use new packages, indicate your dependencies.
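As an illustration of the two code rules above, a minimal sketch of a command-line entry point could look as follows. The script name, the argument names and their defaults are hypothetical and not prescribed by the assignment; a plain list of the packages you installed (for example in a requirements.txt file) is one common way to indicate dependencies.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (all names and defaults here are illustrative)."""
    parser = argparse.ArgumentParser(description="Run one experiment of the assignment.")
    parser.add_argument("--n-repetitions", type=int, default=20,
                        help="number of independent repetitions to average over")
    parser.add_argument("--learning-rate", type=float, default=0.1,
                        help="learning rate of the agent")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Placeholder: this is where your own experiment code would be called.
    print(f"repetitions={args.n_repetitions}, learning rate={args.learning_rate}")
    return args


if __name__ == "__main__":
    main()
```

With a file like this, a grader could run, for example, `python experiment.py --n-repetitions 10` without opening an editor or notebook.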
Deadline
Each assignment has a clear deadline for handing in (see Schedule). You submit your assignment through the Brightspace page of the course.
Being late: We deduct a full point from your assignment grade for every full day (24 hours) that you hand it in late.
Yes, this indeed means that the first 23 hours and 59 minutes of lateness are 'free'.
For example: Your submission, which scores a grade of 8.0, was submitted 60 hours after the deadline. We deduct 2 points for being more than 48 hours (2 days) late, and you receive a 6.0.
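The late rule can be written out as a small calculation. This is only a sketch of the arithmetic described above; the function names are made up for illustration, and it assumes that a submission exactly 24 hours late counts as one full day.

```python
def late_penalty(hours_late: float) -> int:
    """Points deducted: one point per full 24-hour day after the deadline."""
    if hours_late <= 0:
        return 0  # on time (or early): no deduction
    return int(hours_late // 24)


def grade_after_penalty(grade: float, hours_late: float) -> float:
    """Assignment grade after subtracting the late penalty."""
    return grade - late_penalty(hours_late)


# The example from the text: a grade of 8.0, handed in 60 hours late.
print(grade_after_penalty(8.0, 60))  # 60 // 24 = 2 full days, so 8.0 - 2 = 6.0
```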
Guidelines
Study these lecture slides first!
Then carefully read the guidelines on report structure and experimentation. Most importantly:
Report structure: Give your report a decent structure, with at least an Introduction, Methodology, Results, Discussion and Conclusion section.
You may also split up the methodology and results sections per topic, i.e., have something like:
Sec. 1: Introduction
Sec. 2: Topic I
2a Methodology
2b Results
Sec. 3: Topic II
3a Methodology
3b Results
Sec. 4: Discussion (incl. Conclusion)
Understanding: Show your understanding! (See the explanation at the bottom of this page.)
In the introduction/discussion/conclusion, show that you understand the bigger picture of the assignment (why are we doing this, what is the problem, how do the studied methods differ, etc.).
In the methodology section, explain why certain design choices in algorithms are made, what their benefits and problems are, etc.
In the results section, interpret what you see. Describe what you conclude from a figure, what you find interesting, what could be possible explanations, etc.
Statistical practice: Run repetitions of your experiment. Do not tune the seed. Average your results over repetitions. If possible, include confidence intervals.
Hyperparameter tuning: Always tune (some) hyperparameters. Start with the most important ones, such as the learning rate and the exploration parameter.
Equations: Write equations to explain your method. Keep notation consistent. Introduce every new symbol.
Captions: Write captions with all your figures and tables. Make the captions self-contained (describe the setting, what you conclude, and how the figure/table was produced).
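To make the statistical practice guideline above concrete, here is a minimal sketch of averaging a learning curve over repetitions and attaching a confidence interval. The function `run_experiment` is a hypothetical stand-in that only generates noisy numbers; you would replace it with your own training run. The interval uses a normal approximation (z = 1.96 for roughly 95%), which is an assumption of this sketch; with few repetitions a t-based interval is more appropriate.

```python
import math
import random
import statistics


def run_experiment(seed: int, n_episodes: int = 50) -> list[float]:
    """Hypothetical stand-in for one training run: one noisy return per episode."""
    rng = random.Random(seed)
    return [0.1 * episode + rng.gauss(0.0, 1.0) for episode in range(n_episodes)]


def mean_with_ci(runs: list[list[float]], z: float = 1.96):
    """Per-episode mean over repetitions, with a normal-approximation interval."""
    n = len(runs)
    means, lowers, uppers = [], [], []
    for values in zip(*runs):  # all repetitions of the same episode
        mean = statistics.mean(values)
        sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
        means.append(mean)
        lowers.append(mean - z * sem)
        uppers.append(mean + z * sem)
    return means, lowers, uppers


# The seeds only make the repetitions reproducible; they are not tuned.
runs = [run_experiment(seed) for seed in range(20)]
means, lowers, uppers = mean_with_ci(runs)
print(f"final episode: {means[-1]:.2f} [{lowers[-1]:.2f}, {uppers[-1]:.2f}]")
```

The three returned lists can be plotted as a mean curve with a shaded band, which is usually easier to interpret than 20 overlapping individual curves.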
Debugging
The teaching assistants are there to answer content questions, and (unfortunately) not to help you debug your code. This is something you really need to learn to do yourself.
For example, we cannot help you with questions like: "I get this error message" or "My code is not working, can you find my mistake?".
Remember: coding is not only about typing actual code, but also about verifying what your code is effectively doing. When you ask a question, make sure you understand what your code is doing.
Use a debugging tool (or at least a print statement) to find out what a variable stores, how it changes during algorithm execution, whether that matches your expectations of the algorithm, etc.
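As an example of this kind of verification, the sketch below prints the intermediate values of a single update and compares the result with a hand calculation. It assumes a tabular Q-learning style update purely for illustration; the function name, the table layout and the parameter values are not taken from the assignment.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular update: move Q[s][a] a step of size alpha towards the target."""
    target = r + gamma * max(Q[s_next])
    old_value = Q[s][a]
    Q[s][a] = old_value + alpha * (target - old_value)
    # Print the intermediate values so you can compare them with your expectation.
    print(f"s={s} a={a} r={r} target={target:.3f} Q[s][a]: {old_value:.3f} -> {Q[s][a]:.3f}")
    return Q


# A tiny table with 3 states and 2 actions, initialised to zero.
Q = [[0.0, 0.0] for _ in range(3)]
q_update(Q, s=0, a=1, r=1.0, s_next=2)

# Hand calculation: target = 1.0 + 0.9 * 0.0 = 1.0, new value = 0.0 + 0.1 * 1.0 = 0.1
assert abs(Q[0][1] - 0.1) < 1e-12
```

If the printed values or the assertion do not match what you computed by hand, you have located the step where your implementation and your understanding of the algorithm diverge.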