STAT5002 Assignment
Steph Stammel 2024-05-03
This assignment is due May 31 at 23:59. You will upload a .pdf document that is a rendered .Rmd file. This assessment is worth 8% of your total grade. The marks assigned to each part of the assignment are displayed below.
This assignment uses assignment-data.csv. It shows the year-on-year percentage changes in residential house pricing in different countries over time. You can find the original data here.
Part 1: Choose a single country (40% of grade)
Part 1a (10 marks)
Using the data provided, choose one country and create an informative chart describing housing prices. You may choose the time period to draw data from as well as the country.
You may explain the chart in no more than five sentences. What does it mean? What story does it tell about your chosen country and residential price?
Part 1b (10 marks)
Calculate the correlation coefficient between your chosen country's residential price changes and time.
Please interpret this correlation coefficient and explain its meaning, considering your answer to the previous question.
Is there a true relationship, or could something else be creating the appearance of a relationship? What do you think?
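As a minimal sketch of the calculation asked for in Part 1b, the Python snippet below computes a Pearson correlation coefficient between a growth series and a time index. The numbers are made up for illustration and are not taken from assignment-data.csv. Representing time as 1, 2, 3, ... is an assumption; calendar years would give the same correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up year-on-year growth rates (%), NOT from assignment-data.csv.
growth = [4.1, 3.8, 5.2, 2.9, 1.5, 0.7, 2.2, 3.1]
# Assumption: time is a simple index (1 = first period, 2 = second, ...).
time_index = list(range(1, len(growth) + 1))

r = pearson(time_index, growth)
print(f"Correlation with time: {r:.2f}")
```

A coefficient like this only measures linear association. It does not show whether time itself drives the change or whether something else is moving over the same period.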
Part 1c (5 marks)
Create a model predicting residential housing price growth for your country over time. The only exogenous variable you should use is time.
Please show your model in a neat, readable way. Calculate the AIC, BIC and R² of this model.
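The sketch below shows one way the quantities in Part 1c could be computed by hand in Python for a straight-line model on time. The data are made up and are not from assignment-data.csv. The AIC and BIC here follow one common convention: the Gaussian log-likelihood, with the error variance counted as a parameter. Other software may differ by a constant, so only compare values calculated the same way.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_stats(x, y, intercept, slope):
    """R-squared, AIC and BIC for a fitted straight line."""
    n = len(y)
    fitted = [intercept + slope * a for a in x]
    rss = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual sum of squares
    mean_y = sum(y) / n
    tss = sum((b - mean_y) ** 2 for b in y)             # total sum of squares
    k = 3  # assumption: intercept, slope and error variance are all counted
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return {
        "r2": 1 - rss / tss,
        "aic": -2 * log_lik + 2 * k,
        "bic": -2 * log_lik + k * math.log(n),
    }

# Made-up year-on-year growth rates (%), NOT from assignment-data.csv.
growth = [4.1, 3.8, 5.2, 2.9, 1.5, 0.7, 2.2, 3.1]
time_index = list(range(1, len(growth) + 1))

intercept, slope = fit_line(time_index, growth)
stats = fit_stats(time_index, growth, intercept, slope)
print(f"growth = {intercept:.2f} + {slope:.2f} * time")
print(f"R2 = {stats['r2']:.2f}, AIC = {stats['aic']:.2f}, BIC = {stats['bic']:.2f}")
```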
Part 1d (10 marks)
Create another model, this time using ln(time) as a variable instead of time. Please show your model in a neat, readable way.
• Interpret the coefficient on ln(time).
• Calculate the AIC and BIC of this model.
• Compare the two models and describe the strengths and weaknesses of each. Choose one as the better model and explain why.
• For both models please provide a prediction of residential house price growth in 2025. What do you think about the quality of these forecasts?
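A minimal Python sketch of the two models in Part 1d, fitted to made-up data (not from assignment-data.csv), is given below. It assumes time is an index starting at 1. This matters for the log model, because ln(time) changes shape depending on where time starts: an index of 1, 2, 3, ... and calendar years give different fits. In a model of this form, the coefficient on ln(time) divided by 100 is approximately the change in growth associated with a 1% increase in time.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up year-on-year growth rates (%), NOT from assignment-data.csv.
growth = [4.1, 3.8, 5.2, 2.9, 1.5, 0.7, 2.2, 3.1]
# Assumption: time is an index starting at 1, so ln(time) is defined.
time_index = list(range(1, len(growth) + 1))
log_time = [math.log(t) for t in time_index]

lin_intercept, lin_slope = fit_line(time_index, growth)
log_intercept, log_slope = fit_line(log_time, growth)

# Forecast one period beyond the end of the sample with each model.
next_t = len(growth) + 1
lin_forecast = lin_intercept + lin_slope * next_t
log_forecast = log_intercept + log_slope * math.log(next_t)

print(f"Linear model: growth = {lin_intercept:.2f} + {lin_slope:.2f} * time")
print(f"Log model:    growth = {log_intercept:.2f} + {log_slope:.2f} * ln(time)")
print(f"Next-period forecasts: linear {lin_forecast:.2f}, log {log_forecast:.2f}")
```

Both forecasts extrapolate beyond the observed data, so it is worth checking whether either curve stays plausible as time moves further out.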
Part 1e (5 marks)
• Plot the historic data and model predictions for each model annually for the next 5 quarters.
• Compare them to the historic data. What do you think of each model’s forecast?
Part 2: International analysis (30% of grade)
Part 2a (10 marks)
• Create an informative chart (or charts) that describes the change in residential house prices for three countries of your choice. You may choose the time period to draw data from as well as the countries.
You may explain the chart in no more than five sentences. What does it mean? What story does it tell about your chosen countries and residential price?
Part 2b (10 marks)
Using the three countries you chose above, calculate the correlation coefficients for each combination of countries.
Are these correlations strong or weak? Why do you think this is so?
Add a new set of correlations: each country’s change in residential price with time. Is this correlation strong or weak? Why do you think this is so?
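For Part 2b, one possible way to compute the correlation for each combination of three countries is sketched below in Python. The country names and values are hypothetical placeholders, not data from assignment-data.csv. The same helper can be used for each country against a time index.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical countries and made-up growth rates (%), same periods for all.
series = {
    "Country A": [4.1, 3.8, 5.2, 2.9, 1.5, 0.7, 2.2, 3.1],
    "Country B": [3.0, 3.5, 4.8, 2.0, 1.1, 1.4, 2.6, 2.9],
    "Country C": [1.0, 2.5, 0.3, 4.4, 3.2, 5.1, 2.8, 0.9],
}

# One correlation per unordered pair of countries.
pairwise = {
    (a, b): pearson(series[a], series[b])
    for a, b in combinations(series, 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: {r:.2f}")

# Each country's correlation with a simple time index (1, 2, 3, ...).
time_index = list(range(1, 9))
with_time = {name: pearson(time_index, values) for name, values in series.items()}
for name, r in with_time.items():
    print(f"{name} vs time: {r:.2f}")
```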
Part 2c (10 marks)
• Create a model that predicts the change in residential prices for your first chosen country, using your second and third countries and time.
• You may use any form of model you wish. Please present the output of the model in a neat, informative way.
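Part 2c allows any form of model. As one illustration only, the Python sketch below fits a linear regression of a first country on two other countries and time, using the normal equations. The data and country labels are hypothetical, and the helper names `solve` and `ols` are made up for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical countries and made-up growth rates (%), NOT from assignment-data.csv.
country_a = [4.1, 3.8, 5.2, 2.9, 1.5, 0.7, 2.2, 3.1]  # response
country_b = [3.0, 3.5, 4.8, 2.0, 1.1, 1.4, 2.6, 2.9]
country_c = [1.0, 2.5, 0.3, 4.4, 3.2, 5.1, 2.8, 0.9]
time_index = list(range(1, len(country_a) + 1))

coefs = ols([country_b, country_c, time_index], country_a)
for name, value in zip(["Intercept", "Country B", "Country C", "Time"], coefs):
    print(f"{name:<10} {value:6.2f}")
```

This sketch returns coefficients only. Standard errors and significance, which Part 2d asks about, would normally come from a statistics package rather than from hand-written code like this.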
Part 2d (10 marks)
Considering the model calculated in part 2c:
• What do you notice about the significance of the coefficients on country variables? Why do you think this is?
• What do you notice about the significance of the coefficient of time? What do you think this means?
• What does this mean for the correlation coefficients you calculated in the last part?
• Describe the similarities and differences between each country’s changes in residential house price. You may include charts for explanation if you wish.
Part 3: Thinking about our models (30% of grade)
You may choose either of the two models you have calculated in previous parts for this question.
Part 3a (10 marks)
Describe potential flaws in your model. How could you test or look for them?
Part 3b (10 marks)
Implement the tests and measures you suggested above and interpret them. Please provide your results and conclusions neatly and concisely.
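The choice of tests in Part 3b depends on the flaws you identified. As one example only, the Python sketch below computes the Durbin-Watson statistic, a common check for first-order autocorrelation in regression residuals. It is applied to a straight-line fit on made-up data that is not from assignment-data.csv. Values near 2 suggest little first-order autocorrelation, values towards 0 suggest positive autocorrelation and values towards 4 suggest negative autocorrelation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up year-on-year growth rates (%), NOT from assignment-data.csv.
growth = [4.1, 3.8, 5.2, 2.9, 1.5, 0.7, 2.2, 3.1]
time_index = list(range(1, len(growth) + 1))

intercept, slope = fit_line(time_index, growth)
# Residuals must stay in time order for this statistic to be meaningful.
residuals = [g - (intercept + slope * t) for t, g in zip(time_index, growth)]

dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {dw:.2f}")
```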
Part 3c (10 marks)
What are the consequences of the outcomes from part 3b on your model?
Appendix: resources and instructions to help you create a great assignment!
General
• You need not display your code
• The assignment should be no more than ten pages long, including any graphics or tables
• In this assignment, answers can be presented visually, in text or in a combination of both. As long as it is informative, you may choose any combination.
• A really good chart is worth more marks than a poorly written paragraph
• Any chart presented in the assignment should use at least 11 point font
• You should consider that charts that are too small are not informative and budget for the size of the charts required to be informative
• You should not print out the data or any unnecessary output
• Assignments that are longer than the page limit will be penalised
• You should not require the full page count to get an excellent mark. More words are not necessarily better!
Data visualisation:
An excellent response to a data visualisation question will have:
• A clear and informative chart that explains the key points.
• All labels, titles and captions will be informative and present as required.
• The chart will be of appropriate size to enable easy reading of the material.
• See below for some resources.
Books
“The Visual Display of Quantitative Information” by Edward Tufte
“Storytelling with Data: A Data Visualization Guide for Business Professionals” by Cole Nussbaumer Knaflic
“Data Points: Visualization That Means Something” by Nathan Yau
Websites that might give you some ideas
Flowing Data
Information is beautiful
Visualising data
Storytelling with Data - YouTube Channel
Data visualisation checklist
More reading on linear regression
Learning Statistics with R by Danielle Navarro, especially Chapter 15.
R Markdown
You DO NOT have to prepare this assignment in Rmarkdown. You may use any software you wish, as long as the assignment is readable and neat.
Some people may choose to use Rmarkdown, however. If so: Rmarkdown cookbook
Making your model outputs neat and tidy
broom
Hints and tips for this assignment
• More words will not mean higher marks. Answers should be provided succinctly. If you have a choice between answering in one sentence or five, choose one sentence
• No answer should take more than 1 paragraph to write (maybe 2 at the most)
• Many of you are doing this course in a language that is not your native language. You’re much cleverer than I am!
• This assignment does not require perfect English to get good marks
• If you do not feel confident in your writing skills, you may answer in bullet points instead
• As long as your answers are clear and understandable, this is fine
• Imperfect grammar with clear and correct answers will not be penalised
Some writing and presentation conventions
• Business writing uses few decimal places; academic writing uses more. I aim for 2 or fewer in business writing and about 4 in academia. For this assignment, I recommend using 2 decimal places
• For students learning to write formally for the first time (I did not start this until I was a Ph.D. student), using present tense is often a good choice for clarity of discussion
• Many people write sentences or bullet points that are too long. You can test this by reading your sentence or bullet point out loud. If you have to take a breath before you are finished, it is too long! Another thing to watch for is multiple clauses (use of commas, colons, semicolons and parentheses) in a sentence. This is another indication it may be too long
• Your charts should be quite large - this is why the page limit is long, even if I only expect a few sentences for most questions
• I find that half a page is a good choice for size of charts