ITEC3040 Introduction to Data Analytics
Assignment 1, Due: June 14, 2023, 11:55 pm
Submission Instructions:
This is an individual assignment. Use eClass to submit your work. At the top of each file, include your name and student number. You may use any software package (for example, R, SAS, MATLAB, or Python), but Excel is not allowed.
- Show ALL your work!!!
- Submit ALL your program(s) along with your solutions (including comments, results, and graphs).
Evaluation is based on the work you submitted.
1. Commercial properties.
A commercial real estate company evaluates vacancy rates, square footage, rental rates, and operating expenses for commercial properties in a large metropolitan area in order to give clients quantitative information on which to base rental decisions. The data below are taken from suburban commercial properties that are the newest, best located, most attractive, and most expensive in five specific geographic areas. The first column is the rental rate, the second is the age of the property, the third is the operating expenses and taxes, the fourth is the vacancy rate, and the last is the total square footage.
(Data source: Applied Linear Regression Models, by Michael H. Kutner, Christopher J. Nachtsheim, and John Neter.)
(a) What is the response variable? What are the predictors?
(b) Calculate the mean, median, and standard deviation of the rental rates. Comment on your results.
(c) Draw boxplots for the age of properties, operating expenses and taxes, vacancy rates, and total square footage. What do you find from the boxplots?
(d) Draw histograms for the age of properties, operating expenses and taxes, vacancy rates, and total square footage. What do you find from the histograms?
(e) Is the age of properties normally distributed? Verify your answer.
(f) Is there a correlation between the response variable and the predictors? Draw a scatter plot of the response variable against each predictor and calculate the correlation coefficients. What are your findings?
(g) Normalize the age of properties using z-score normalization.
(h) Normalize the age of properties using z-score normalization with the mean absolute deviation instead of the standard deviation.
(i) Comment on the methods you used in (g) and (h).
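A minimal sketch for part (b), assuming Python 3 with only the standard library and assuming the rental-rate column has already been read into a list of numbers (the dataset itself is not reproduced here). The function name `describe` is hypothetical, chosen only for illustration.

```python
import statistics


def describe(values):
    """Return the mean, median, and sample standard deviation of a sequence (hypothetical helper)."""
    data = list(values)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # statistics.stdev divides by n - 1 (sample SD);
        # statistics.pstdev would divide by n (population SD)
        "sd": statistics.stdev(data),
    }
```

For example, `describe(rental_rates)` returns a dictionary with the three statistics. Reporting the sample standard deviation is an assumption here; it is the usual choice when the properties are treated as a sample from a larger population.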
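Drawing the boxplots in part (c) needs a plotting library (for example matplotlib's `pyplot.boxplot`). As a complement, the sketch below, assuming only the Python standard library, computes the numbers a boxplot displays so that findings such as skew or outliers can be stated quantitatively. The helper name `boxplot_summary` is hypothetical, and the 1.5 × IQR fence (Tukey's rule) is an assumed convention.

```python
import statistics


def boxplot_summary(values, k=1.5):
    """Five-number summary plus Tukey fences Q1 - k*IQR and Q3 + k*IQR (hypothetical helper)."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics,
    # which should agree with NumPy's default percentile calculation
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Points beyond the fences are the ones a boxplot draws individually
    outliers = [x for x in data if x < lower or x > upper]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "lower_fence": lower, "upper_fence": upper, "outliers": outliers,
    }
```

Calling it once per column (age, operating expenses and taxes, vacancy rate, square footage) gives a small table to compare against the drawn boxplots.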
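For part (e), one numerical check to pair with a histogram or normal Q-Q plot is the sample skewness and excess kurtosis, both of which are 0 for a normal distribution. The sketch below uses the simple moment estimators (g1 = m3 / m2^1.5 and g2 = m4 / m2^2 - 3) and assumes only the standard library; `moment_shape` is a hypothetical name. Values near 0 are consistent with normality but do not prove it; a formal test such as Shapiro-Wilk (`scipy.stats.shapiro` in SciPy) is a common next step.

```python
def moment_shape(values):
    """Biased moment estimators of skewness and excess kurtosis (hypothetical helper)."""
    data = list(values)
    n = len(data)
    mean = sum(data) / n
    # Central moments, each divided by n (not n - 1)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    # Note: constant data gives m2 == 0 and a ZeroDivisionError
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis
```

A clearly positive skewness, for instance, would indicate a long right tail in the age of properties.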
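For part (f), the Pearson correlation coefficient between the response and one predictor can be computed directly from its definition, r = Sxy / sqrt(Sxx · Syy). This is a sketch assuming only the standard library; `pearson_r` is a hypothetical name, and on Python 3.10+ `statistics.correlation` computes the same quantity. The scatter plots themselves would need a plotting library such as matplotlib (`pyplot.scatter`).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r(predictor, rental_rates)` once for each of the four predictors gives the coefficients to compare with the scatter plots.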
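Parts (g) and (h) differ only in the scale used: z-score normalization maps each value v to (v - mean) / SD, while the variant in (h) uses the mean absolute deviation s = (1/n) · sum of |v - mean| in place of the SD. The sketch below assumes only the standard library; `zscore` and `zscore_mad` are hypothetical names, and using the population SD (divide by n, like the default `ddof=0` of `scipy.stats.zscore`) is an assumption.

```python
import statistics


def zscore(values):
    """z-score normalization: (v - mean) / standard deviation (hypothetical helper)."""
    data = list(values)
    mean = statistics.mean(data)
    # Population SD; swap in statistics.stdev if the sample SD is preferred
    sd = statistics.pstdev(data)
    return [(v - mean) / sd for v in data]


def zscore_mad(values):
    """z-score variant dividing by the mean absolute deviation (hypothetical helper)."""
    data = list(values)
    mean = statistics.mean(data)
    # Mean absolute deviation about the mean (not the median absolute deviation)
    mad = sum(abs(v - mean) for v in data) / len(data)
    return [(v - mean) / mad for v in data]
```

Because deviations are not squared in the mean absolute deviation, a single extreme value enlarges it less than it enlarges the SD, so the two sets of normalized ages can be compared side by side.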