- To apply univariate and bivariate statistical techniques to explore associated variables’ distributions and relationships.
- To assess the spatial processes and patterns in event (point) data and interpret the findings.
- To measure global and local spatial autocorrelation in real-world areal data and interpret the findings.
Introduction

In the lectures you have been introduced to various statistical strategies in spatial analysis. Statistics is important in the process of spatial analysis as a way of investigating geographic patterns in spatial data and the relationships between features. You have also been introduced to techniques to analyse the spatial processes and patterns in point data. In this assignment, you will explore the distribution of Melbourne house sales in 2016 and 2017 using summary statistics and frequency histograms. You will also investigate the relationship between house sales and Census data using scatterplots and correlation coefficients, and you will interpret your results and discuss their significance. You will identify geographic patterns to develop your understanding of how geographic phenomena behave, specifically, quantifying patterns in point data using a range of techniques and interpreting the results. The last part of this assignment will ask you to quantify and interpret the spatial autocorrelation properties of an areal dataset, specifically, aggregated data from the Australian census.
Preparatory tasks

To assist you with this assignment, three practicals have been developed:
• geom90006_practical_2_1.ipynb
• geom90006_practical_2_2.ipynb
• geom90006_practical_2_3.ipynb
Data

The required data is provided in the student repository on GitLab.
• Melbourne house sales dataset: Melbourne house sales for A2.csv
  – Note: this dataset is different to the one you used in Assignment 1. Make sure you use the A2 version.
• ABS 2021 Census table G02 (Selected Medians and Averages) data for Victoria at SAL (Suburb and Locality) level: Census 2021 table G02 Victoria.gpkg
• ABS 2021 Greater Melbourne GCCSA boundary polygon: Greater Melbourne (ABS) study area.shp
Outline of tasks

In a Jupyter notebook, complete the following tasks, either by writing Python code annotated with comments and/or Markdown blocks, or by using Markdown blocks to respond to discussion, explanation or interpretation questions.
Part 1: Essential programming and statistics

This section of the assignment is designed to exercise your programming muscles. It may be more challenging for those with no coding background!
- Load the Melbourne house sales data.
- Generate the following text (with the missing values calculated using Python): “The Melbourne house sales data has ___ rows and ___ fields. In this dataset, the suburb of Brunswick is frequently represented (___ houses sold), but the suburb with the most sales is ___ (___ houses sold).” Hint: Please refer to preparatory practical six.
- Generate summary statistics (mean, standard deviation, median, minimum, maximum) for the land size and sale price fields. Make sure your output is human-readable: for example, values should be presented with appropriate numbers of decimal places and correct units of measurement.
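Hint: a minimal sketch of human-readable summary statistics with pandas is shown below. The values and the column names ("land_size", "sale_price") are made up for illustration; check the actual field names in the A2 CSV.

```python
import pandas as pd

# Illustrative stand-in for the house sales data; the real values
# come from the A2 CSV and the column names may differ.
df = pd.DataFrame({
    "land_size": [450.0, 320.0, 610.0, 280.0, 500.0],               # square metres
    "sale_price": [905_000, 720_500, 1_250_000, 640_000, 980_000],  # AUD
})

# Format each statistic with thousands separators, one decimal place, and units.
for column, unit in [("land_size", "m2"), ("sale_price", "AUD")]:
    s = df[column]
    print(f"{column}: mean {s.mean():,.1f} {unit}, std {s.std():,.1f} {unit}, "
          f"median {s.median():,.1f} {unit}, min {s.min():,.1f} {unit}, "
          f"max {s.max():,.1f} {unit}")
```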
- Create a histogram for the sale price field. This should be well-presented with a title, clear axis labelling, a suitable number of bins, and round numbers used where possible (e.g. 100, 200, 300 instead of 89.3, 178.6, 267.9).
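Hint: one way to get round bin edges is to build them yourself with NumPy rather than accepting the default bins. The sketch below uses made-up prices and a $250,000 step (both assumptions); `plt.hist(prices, bins=bins)` accepts the same `bins` array when you draw the actual chart.

```python
import numpy as np

# Made-up sale prices in AUD; the real data comes from the A2 CSV.
prices = np.array([420_000, 650_000, 810_000, 905_000, 1_150_000,
                   760_000, 530_000, 980_000, 1_400_000, 690_000])

# Round bin edges: $0 up to the next $250k above the maximum, in $250k steps.
step = 250_000
top = step * int(np.ceil(prices.max() / step))
bins = np.arange(0, top + step, step)

counts, edges = np.histogram(prices, bins=bins)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"${lo:>9,.0f} - ${hi:>9,.0f}: {c}")
# plt.hist(prices, bins=bins) draws the same histogram; remember a title
# and axis labels (e.g. "Sale price (AUD)" and "Number of houses sold").
```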
- The sale price field is not normally distributed. Using appropriate statistical terminology, briefly discuss how this field varies from the normal distribution. Explain how we can make these conclusions from the histogram and, separately, from the summary statistics.
Part 2: Basic spatial skills

The entire discipline of spatial analysis is founded on the First Law of Geography, which states that “everything is related to everything else, but near things are more related than distant things”. This law obviously relies on the notion of “distance”. Let’s begin the spatial analysis component of this assignment by calculating distances.
- Using the longitude and latitude values provided in the house sales data, calculate the distance from Flinders Street Station (longitude 144.9671°E, latitude 37.8182°S) to each house.
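Hint: one option for distances from raw longitude/latitude is the haversine great-circle formula, sketched below with a hypothetical house location. Note that a latitude given as 37.8182°S is a negative number in decimal degrees.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (decimal degrees)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Flinders Street Station: 37.8182 S becomes a NEGATIVE latitude.
station_lon, station_lat = 144.9671, -37.8182
# A hypothetical house 0.1 degrees of longitude east of the station:
d = haversine_km(station_lon, station_lat, 145.0671, -37.8182)
print(f"{d:.2f} km")
```

In the notebook you would apply this row-by-row to the longitude and latitude columns of the house sales DataFrame.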
- Are these distances accurate and realistic? Explain.
- Transform the latitudes and longitudes to the GDA2020 MGA Zone 55 coordinate system (EPSG:7855) and repeat the distance calculation using your transformed coordinates. Don’t forget to transform the coordinates of Flinders Street Station as well!
- Calculate the mean house price of houses 0-5 km from Flinders Street Station, 5-10 km from Flinders Street Station, and so on, up to 30+ km from Flinders Street Station, and offer your insights into your results.
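Hint: `pandas.cut` is a convenient way to group distances into bands before averaging. The sketch below assumes you have already computed a distance column (here called `distance_km`, with made-up values and prices).

```python
import pandas as pd

# Made-up distances (km from Flinders Street Station) and prices.
df = pd.DataFrame({
    "distance_km": [2.1, 4.8, 7.5, 12.0, 18.3, 26.0, 33.5, 41.0],
    "price": [1_400_000, 1_250_000, 980_000, 850_000, 720_000,
              650_000, 600_000, 580_000],
})

# Bands 0-5, 5-10, ..., 25-30, then a single open-ended 30+ band.
edges = [0, 5, 10, 15, 20, 25, 30, float("inf")]
labels = ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30", "30+"]
df["band"] = pd.cut(df["distance_km"], bins=edges, labels=labels)

# observed=False keeps empty bands in the output (shown as NaN).
mean_price = df.groupby("band", observed=False)["price"].mean()
print(mean_price)
```

A `mean_price.plot(kind="bar")` call on this result is one route to the challenge column chart.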
- Challenge: Display your results from the previous question on a column chart.
Part 3: Spatial joins and bivariate statistics
- Load the ABS Census data from its GeoPackage.
- Perform a spatial join of the Census locality polygons onto the house sales dataset.
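Hint: in practice this is a single call in GeoPandas, for example `gpd.sjoin(houses, localities, how="left", predicate="within")`, assuming `houses` and `localities` are GeoDataFrames in the same CRS. Under the hood a spatial join relies on a point-in-polygon test; the pure-Python ray-casting sketch below, using a made-up rectangular “locality”, shows that underlying idea.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y):  # this edge straddles the horizontal ray
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside

# Toy "locality" polygon and two house points (coordinates are made up).
locality = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(point_in_polygon(2.0, 1.5, locality))  # house inside  -> True
print(point_in_polygon(5.0, 1.5, locality))  # house outside -> False
```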
- Generate scatterplots for the following pairs of variables:
  • sale price – median age
  • sale price – average household size
  • sale price – total household income
- Determine the Pearson correlation coefficient of each of the relationships you generated a scatterplot for.
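Hint: in the notebook you would typically use `DataFrame.corr(method="pearson")` or `scipy.stats.pearsonr`. The sketch below computes the coefficient directly from its definition, on made-up price/median-age pairs, so you can see what those library calls are doing.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up pairs; the real pairs come from your spatially joined dataset.
price = [640_000, 720_000, 905_000, 980_000, 1_250_000]
median_age = [31, 34, 36, 40, 45]
print(f"r = {pearson_r(price, median_age):.3f}")
```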
- Taking into consideration the scatterplots and correlation coefficients, give a real-world interpretation of your results from the previous two questions.
- (a) In this section of the assignment, why did you need to perform a join? (b) The house sales dataset already contains suburb names. Suggest why we might be performing a spatial join, as opposed to a traditional column-based join where the two datasets are matched using the variable they have in common.
Part 4: Making maps

Displaying your data and your analysis results as a map is an essential skill for a spatial data analyst. Using Python, it is easy to generate a rudimentary map with default settings (see example), but for inclusion in professional reports, a high standard of cartography with cartographic elements is expected. A professional map should include cartographic elements such as:
• a border or box around the map area (only necessary if the map content extends to the very edge of the map area);
• an orientation (north arrow or graticule) if this is not obvious from the map itself;
• a clear legend explaining all colours and symbols used on the map, other than obvious ones like roads and rivers;
• a title providing a clear indication to the readers of the map’s topic;
• a scale (this may be a scale bar or a grid or graticule); and
• a caption or label showing the source of data.

The mnemonic BOLTSS is used to remember these items. When using Python it is also important to remove unnecessary “chart junk”. For instance, most maps do not have axis labels or default grid lines. You can use the CartoPy and matplotlib libraries to help you produce standard maps with cartographic elements.

For the following point pattern analysis, we will focus on the following three localities of Melbourne as our study areas:
• Carlton – named Carlton (Vic.) in the Census dataset
• Doncaster East
• Keilor East
- Make a professional-quality map of the house price points in one of these study areas. The study area boundary should be included. Use a relevant Python library to include an appropriate basemap that helps the map audience to put the data into context.
Part 5: Point pattern analysis – house sales
- For the house sales data in each of the three study area localities (separately), calculate the Average Nearest Neighbour distance and the z-score. For Challenge marks, do this by writing your own code using (for example) GeoPandas directly, instead of using a point pattern analysis library. Hint: The formula for the z-score is on this page .
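Hint: under the standard complete spatial randomness (CSR) null hypothesis, the expected mean nearest-neighbour distance is 0.5/√(n/A) and its standard error is 0.26136/√(n²/A), where n is the number of points and A the study area. A minimal NumPy sketch (brute-force distance matrix, toy point set) follows; the real inputs would be your projected MGA Zone 55 coordinates and the locality polygon’s area.

```python
import numpy as np

def ann_z(points, area):
    """Average Nearest Neighbour distance and CSR z-score.

    points : (n, 2) array of projected coordinates (e.g. metres)
    area   : study-region area in the same squared units
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise distance matrix; self-distances masked out with infinity.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)
    se = 0.26136 / np.sqrt(n**2 / area)
    z = (observed - expected) / se
    return observed, expected, z

# Toy case: four points at the corners of a unit square (maximally dispersed),
# so the observed distance exceeds the expected one and z is large and positive.
obs, exp, z = ann_z([(0, 0), (0, 1), (1, 0), (1, 1)], area=1.0)
print(f"observed {obs:.3f}, expected {exp:.3f}, z = {z:.2f}")
```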
- Calculate the P-values corresponding to the z-scores in the previous question.
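Hint: a two-tailed p-value for a standard-normal z-score can be computed with the standard library alone via the complementary error function; `scipy.stats.norm.sf(abs(z)) * 2` gives the same result.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

print(f"{two_tailed_p(1.96):.4f}")  # approximately 0.05
print(f"{two_tailed_p(3.0):.4f}")
```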
- Display a graph of the K or L function (y-axis) against distance (x-axis) for each of the study area localities.
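Hint: point pattern libraries such as pointpats provide K- and L-function estimators. The naive NumPy sketch below (no edge correction, toy point set) shows the underlying estimator, K(r) = (A/n²) · #{ordered pairs i≠j with dᵢⱼ ≤ r}, and Besag’s L transform L(r) = √(K(r)/π).

```python
import numpy as np

def ripley_k(points, area, distances):
    """Naive (no edge correction) Ripley's K at the given distances."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude self-pairs
    return np.array([(area / n**2) * np.sum(d <= r) for r in distances])

points = [(0, 0), (0, 1), (1, 0), (1, 1)]  # toy pattern in a unit square
radii = np.array([0.5, 1.0, 1.5])
k = ripley_k(points, area=1.0, distances=radii)
l = np.sqrt(k / np.pi)  # L function; plotting L(r) - r against r is common
for r, kv, lv in zip(radii, k, l):
    print(f"r={r:.1f}  K={kv:.3f}  L={lv:.3f}")
```

Plotting `k` (or `l`) against `radii` with matplotlib gives the required graph.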
- Give an interpretation of your results from the previous two questions. Describe in detail (a) what these results tell you – and don’t tell you – about the point patterns and about your data, and (b) what you have learned about these two analysis techniques.
- Challenge: The K function is subject to “edge effects”. In your own words, explain what this means and why this is a problem, giving an example. Then suggest how the K function could be adapted to account for edge effects.
Part 6: Spatial autocorrelation analysis

Let’s leave aside the house sales data for now and focus on the Census polygons. We will perform a spatial autocorrelation analysis to measure the extent to which the Census variables exhibit spatial dependence.
- The provided Census data is for the whole state of Victoria. For ease of analysis, we will focus on Greater Melbourne as defined by the ABS. “Clip” your Census polygons to the Greater Melbourne (ABS) study area.shp shapefile provided. Hint: GeoPandas provides a clip method that you can use.
- The Census data contains several attributes about the population in each locality. Select one of these attributes for your analysis.
- Draw a well-presented choropleth map of this attribute for Greater Melbourne.
- Using a spatial weights/neighbourhoods system of your choice, compute the global Moran’s I index for this attribute and give your interpretation of the results.
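Hint: for the real task you would typically build weights with libpysal (e.g. `Queen.from_dataframe`) and compute the index with `esda.Moran`. The NumPy sketch below shows the formula itself, I = (n/S₀) · (zᵀWz)/(zᵀz), on a toy binary weights matrix for a 2×2 grid of rook-adjacent zones (a made-up example).

```python
import numpy as np

def morans_i(values, w):
    """Global Moran's I for values with a binary spatial weights matrix w."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    z = x - x.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Toy weights: a 2x2 grid of zones numbered 0..3 row by row, rook adjacency
# (diagonal cells 0/3 and 1/2 are NOT neighbours).
w = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
values = [1, 0, 0, 1]  # checkerboard pattern: every neighbour disagrees
print(f"Moran's I = {morans_i(values, w):.2f}")  # perfect negative autocorrelation
```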
- Display the Moran’s scatterplot for this attribute. Focusing on one particular locality (that is, one data point on your scatterplot), give your interpretation of what that suburb’s position on the scatterplot tells you about that suburb and its surrounding areas.
- Compute the local Moran’s I index for all polygons in your study area. Present your results as a map and give your interpretation.
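Hint: `esda.Moran_Local` is the usual tool here. As a sketch of what it computes, each polygon’s local statistic is Iᵢ = (zᵢ/m₂) · Σⱼ wᵢⱼ zⱼ, where m₂ = zᵀz/n. The toy example below (made-up 2×2 rook-adjacency grid with a checkerboard pattern) gives a negative value for every cell, since each cell disagrees with all of its neighbours.

```python
import numpy as np

def local_morans_i(values, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, for binary weights."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    m2 = (z @ z) / len(x)
    return (z / m2) * (w @ z)

# Toy weights: 2x2 grid of zones 0..3 row by row, rook adjacency.
w = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
local = local_morans_i([1, 0, 0, 1], w)
print(local)  # every cell disagrees with both neighbours -> all negative
```

Mapping these per-polygon values (often classified into high-high, low-low, high-low, low-high clusters) produces the required map.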
Submission

Submit a Jupyter notebook file (.ipynb) to Canvas. As you are using only the data provided by us, there is no need to submit any data. Your notebook must load the files from a data subfolder. The files must have exactly the same names as they do in this repository. For example:

    import pandas as pd
    house_sales_df = pd.read_csv('data/Melbourne house sales for A2.csv')

    import geopandas as gpd
    census_polygons = gpd.read_file('data/Census 2021 table G02 Victoria.gpkg')

If this is not adhered to, the markers will have difficulty running your code and you will lose marks. Please speak to your tutor if you would like to use libraries that are not in the provided spanalytics.yml environment. This is an individual assignment. Read about academic integrity here .
Marking criteria

• Correctness of output – 20%
• Code structure – 10%
• Theoretical and conceptual understanding of fundamentals – 20%
• Quality of analytical insights and interpretation – 30%
• Professional standard (spelling, layout, presentation of charts/maps) – 10%
• Challenge questions – 10%