
STAT350 Introduction to Statistics Homework 1: Bike project

Homework 1 (40 pts)

Part 1: Introduction to the data

This homework assignment is based on the Bike project, a study of the impact of weather conditions on bike sharing demand.

Background

Biking as an alternative transportation mode provides numerous benefits, not only to individuals' health but also to the whole community, by alleviating problems common in big cities, where traffic congestion, insufficient parking facilities, and air and noise pollution are a daily burden. One benefit is the flexibility of traveling short distances easily. Biking also supports long-distance travel by covering the first or last mile to and from transit stations, thereby shortening travel time. Moreover, biking is convenient because it offers door-to-door service: bike parking facilities are usually next door, so a bike is available to riders at any time.

These benefits have created a gap that bike sharing systems have emerged to fill, so that an individual does not have to own a bike to ride one. However, as beneficial as bike sharing is, its demand is greatly affected by weather conditions and by trip purpose (whether or not the trip is a work trip). Past studies have shown that weather conditions such as temperature, humidity, and wind speed have a significant impact on bike sharing demand (Fuller et al. 2013; Gebhart and Noland 2014; Heinen et al. 2010).

Data Source

The bike sharing usage data from 2011 to 2012 used in this study were collected by Capital Bikeshare (CaBi), one of the largest bike sharing providers in the United States, with more than 2500 bikes distributed across 300 stations in Washington, DC and Arlington, VA (Capital Bikeshare 2012). The whole dataset, including the bike sharing usage data, weather data, and holiday schedule, was obtained from the University of California Irvine Center for Machine Learning website: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Data Description

In this study, a literature review was conducted to choose relevant and adequate variables to evaluate how bike sharing usage is impacted by weather conditions. Ultimately, the following continuous and categorical variables were chosen:

Variable notation | Variable name | Variable type | Description
X1 | Temperature | Continuous | Normalized temperature in Celsius. The values are divided by 41 (max).
X2 | Humidity | Continuous | Normalized humidity. The values are divided by 100 (max).
X3 | Windspeed | Continuous | Normalized windspeed. The values are divided by 100 (max).
X4 | Working day | Categorical | 1 if the day is neither a weekend nor a holiday, and 0 otherwise.
Y | Count | Continuous | The total number of rental bike users.
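
To make the scaling of the temperature variable concrete, the block below sketches how a normalized temperature maps back to degrees Celsius and what that implies for a regression slope. It assumes X1 is obtained by dividing the Celsius temperature by 41, as the variable description states; the symbols T (temperature in Celsius) and b0, b1 (fitted intercept and slope) are introduced here for illustration and are not from the assignment.

```latex
% Assumption: X_1 is the Celsius temperature T divided by its maximum, 41.
X_1 = \frac{T}{41} \quad\Longleftrightarrow\quad T = 41\,X_1
% A fitted line in X_1 can then be rewritten in degrees Celsius:
\hat{Y} = b_0 + b_1 X_1 = b_0 + \frac{b_1}{41}\,T
```

Under this assumption, a one-unit increase in X1 corresponds to a 41-degree change in temperature, so the slope per degree Celsius would be b1/41.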


The data are available on the course website: Data and Resource > Data used in class > BikeProject.csv

Part 2: Introduction to the homework types and format requirements

There are two kinds of problems: conceptual and application. A conceptual problem focuses on definitions, notation, and formulas. For this kind of problem, you are expected to compute by hand (or with basic arithmetic functions in Excel or R), not with a function that directly returns the answer. Formulas and working should be shown clearly. By default, all questions in the homework assignment are of this type.

An application problem focuses on R application skills and output interpretation. Such a problem usually contains the phrase “use R…” or “according to the R output”. For this kind of problem, you don’t need to compute the results by hand. Instead, get the result from R and proceed.

For instance, in homework 1, problems 1 to 3 are conceptual problems and problem 4 is an application problem.

For problems 1 to 3, you may use Excel or R to compute the residuals, the sums of squares, and the means of the variables before computing the residual standard error. When computing each item, show the formula and details and use the correct notation. You may not use a linear regression function such as lm() to compute the numbers, because the purpose of these problems is to become familiar with the formulas and notation.

For problem 4, you may use a linear regression function such as lm() to run the analysis; the purpose is to become familiar with the R output.

Part 3: Homework questions

In this homework, we consider a simple linear regression Y ~ X, where X = X1 is the temperature. The goal is to study the impact of temperature on the bike rental counts.
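
A simple linear regression of this kind is usually written as follows. This is a sketch in common textbook notation, which may differ from the course's: the symbols for the intercept, slope, random error, error standard deviation, estimates, and residuals are assumed here rather than taken from the assignment.

```latex
% Population model (assumed notation): intercept \beta_0, slope \beta_1, random error \varepsilon_i
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}
% Fitted line from a sample, evaluated at X = X_h
\hat{Y}_h = b_0 + b_1 X_h
% Residual: the sample counterpart of the random error
e_i = Y_i - \hat{Y}_i
```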


1. (10) Estimate the parameters (β0, β1) for a linear regression to predict Y based on X. Complete the following with details.
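
For reference, the least-squares estimates can be computed by hand from sample means and sums of squares. The block is a sketch in assumed textbook notation (b0 and b1 for the estimates, n for the sample size, s for the residual standard error); it is not the assignment's own template.

```latex
% Sample means
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i
% Least-squares slope and intercept
b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1\bar{X}
% Residuals, error sum of squares, and residual standard error
e_i = Y_i - (b_0 + b_1 X_i), \qquad SSE = \sum_{i=1}^{n} e_i^2, \qquad s = \sqrt{\frac{SSE}{n-2}}
```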


2. (8) In order to estimate the linear impact of X on Y at a confidence level of (1 − α), you should use the critical value, or t value, denoted by t(___, ____), which has a value of ____ (use a basic R function or Excel for the exact value) at , and _____ at . The standard error of the estimate is _______________ (formula) = ________ (value). The margin of error of the confidence interval is _________ at , and _____ at .
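
The quantities asked for here fit together as sketched below. The notation is assumed: s{b1} for the standard error of the slope, s for the residual standard error, n for the sample size, and α for one minus the confidence level. In R, the critical value can be obtained with the basic quantile function qt(1 - alpha/2, df = n - 2), where alpha and n are placeholder variable names.

```latex
% Standard error of the slope estimate b_1
s\{b_1\} = \frac{s}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}
% Critical value: the (1 - \alpha/2) quantile of the t distribution with n - 2 degrees of freedom
t_{\text{crit}} = t(1 - \alpha/2;\; n - 2)
% Margin of error and two-sided confidence interval for \beta_1
ME = t_{\text{crit}}\, s\{b_1\}, \qquad b_1 \pm ME
```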


3. (10) Perform a hypothesis test on the linear impact of X on Y using a t test at a significance level of 0.1.

Note:

·       If a question doesn’t specify the hypothesized value, it is a two-sided test against 0.

·       Every hypothesis test should include the following components: H0/Ha defined in symbols (β1, etc.), the test statistic (notation and formulas), the rejection region defined by a critical value (or the p-value computed from a probability formula), and the conclusion.
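
Applied to the slope, these components typically take the following shape. This is a sketch in assumed notation (t_obs for the test statistic, s{b1} for the standard error of b1, T_{n-2} for a t-distributed variable with n − 2 degrees of freedom), shown for the two-sided test against 0 at α = 0.1. In R, the p-value can be computed with the basic function pt, for example 2 * pt(-abs(t_obs), df = n - 2), where t_obs and n are placeholder variable names.

```latex
% Hypotheses: two-sided test against 0
H_0: \beta_1 = 0 \qquad \text{versus} \qquad H_a: \beta_1 \neq 0
% Test statistic
t_{\text{obs}} = \frac{b_1 - 0}{s\{b_1\}}
% Rejection region at \alpha = 0.1, so 1 - \alpha/2 = 0.95
\text{reject } H_0 \text{ if } |t_{\text{obs}}| > t(0.95;\; n - 2)
% Equivalent p-value rule
p = 2\,P\left(T_{n-2} \ge |t_{\text{obs}}|\right), \qquad \text{reject } H_0 \text{ if } p < 0.1
```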


4. (6) Use R to obtain a summary of this SLR model. For each of the following concepts, highlight it on the output and give its notation, its value, and an interpretation. Compute the item with R or Excel if it is not directly available in the summary.

For example, the point estimate of the linear impact of X on Y:

The point estimate of the linear impact of X on Y is 0.037756. It means that when X increases by 1 unit, Y increases by 0.037756 units. It measures the linear impact of X on Y through the SLR model.

a) The standard error of the point estimate of the linear impact of X on Y

b) The residual standard error

c) The degrees of freedom of the residuals (the interpretation of this concept will be covered later)

d) The mean squared error (MSE)

e) The standard deviation of the dependent variable Y, denoted by ; briefly explain how it is related to the total sum of squares, SST = Σ(Yi − Ȳ)²
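
Items b), d), and e) are linked by a few standard identities, sketched below in assumed notation (s for the residual standard error, MSE for the mean squared error, s_Y for the sample standard deviation of Y, n for the sample size). In R, sd() returns the sample standard deviation with the n − 1 divisor.

```latex
% Error sum of squares from the residuals e_i
SSE = \sum_{i=1}^{n} e_i^2
% Mean squared error and residual standard error (n - 2 residual degrees of freedom)
MSE = \frac{SSE}{n - 2}, \qquad s = \sqrt{MSE}
% Total sum of squares and the sample standard deviation of Y
SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2, \qquad s_Y = \sqrt{\frac{SST}{n - 1}} \quad\Longleftrightarrow\quad SST = (n - 1)\,s_Y^2
```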


5. (6) Multiple choice.

·       The tendency, or the form by which the response variable Y varies with X, can be estimated with a linear function ____ (T/F). The linear function has a true form of  in the population domain.

·       At a general X = Xh level, the predicted value is estimated by . Both  and  are variables and can be estimated by  and  on a sample ________ (T/F)

·       The deviation between the actual response variable Y and the predicted Y, or  at a given X = Xh level, is called the random error and is denoted by _____ ( / ), which can be estimated with a value denoted by _____ ( / ) in a sample.

·       This random error is assumed to have a distribution of  _______ (T/F), where the standard deviation can be estimated by the standard error term denoted by ___ ( / ) computed from a sample.

·       The actual response variable, , represents the linear relationship between X and Y. The two “ingredients” in this relationship can be identified as ________.

A.                   B.        C.
