STAT217 Worksheet #6

 

Question 1

You are given the following data of shipping distance and the time it takes for a shipment to travel that distance by courier.

Distance in km   825   215   1070   550   480   920   1350   325   670   1215
Time in days     3.5   1     4      2     1     3     4.5    1.5   3     5

a)     Construct a model in which the shipping time depends on the distance. State what the model is. (time = 0.1181 + 0.0036*distance)

b)     What percentage of the variation in shipping time is explained by the model? (90.05%)

c)     Test the hypothesis that the model is significant at the 5% level of significance. (F = 72.3959; conclude model is significant)

d)     Construct a 95% confidence interval of the slope coefficient. If this interval were used to test the hypothesis in part c, why would the same conclusion be reached? (0.0026 < B1 < 0.0046; hypothesized slope of zero is not in the interval)

e)     If the distance is 500 km, how many days should you expect the shipping time to be? Round to 2 decimals. (1.91 days)

f)      If the distance is 500 km, what is the range of the average shipping time for 95% of the time? Round to 2 decimals. (1.48 to 2.34 days)

g)     For a particular shipment that travels 500 km, what is the range of the shipping time for 95% of the time? Round to 2 decimals. (0.72 to 3.10 days)
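Parts a) through g) all follow from a handful of sums over the data. The sketch below recomputes them in plain Python from the table above. The function name `simple_ols` and the variable names are illustrative choices, and the critical value t(0.025, 8) = 2.306 is hard-coded from a t table rather than computed.

```python
import math

distance = [825, 215, 1070, 550, 480, 920, 1350, 325, 670, 1215]  # km
time = [3.5, 1, 4, 2, 1, 3, 4.5, 1.5, 3, 5]                        # days


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x, returning the pieces used in parts a) to g)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = b1 * sxy          # regression sum of squares
    sse = syy - ssr         # residual sum of squares
    mse = sse / (n - 2)     # residual mean square
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
            "r2": ssr / syy, "F": ssr / mse, "s": math.sqrt(mse)}


fit = simple_ols(distance, time)
T_CRIT = 2.306  # t(0.025, df = 8) from a t table (hard-coded assumption)

# d) 95% confidence interval for the slope
se_b1 = fit["s"] / math.sqrt(fit["sxx"])
slope_ci = (fit["b1"] - T_CRIT * se_b1, fit["b1"] + T_CRIT * se_b1)

# e) point estimate at 500 km
x0 = 500
y_hat = fit["b0"] + fit["b1"] * x0

# f) interval for the average shipping time, g) interval for one shipment
h = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
half_mean = T_CRIT * fit["s"] * math.sqrt(h)
half_pred = T_CRIT * fit["s"] * math.sqrt(1 + h)
mean_ci = (y_hat - half_mean, y_hat + half_mean)
pred_int = (y_hat - half_pred, y_hat + half_pred)

print(f"time = {fit['b0']:.4f} + {fit['b1']:.4f}*distance")
print(f"r^2 = {fit['r2']:.4f}, F = {fit['F']:.4f}")
print(f"slope CI: {slope_ci[0]:.4f} to {slope_ci[1]:.4f}")
print(f"at 500 km: {y_hat:.2f} days")
print(f"mean: {mean_ci[0]:.2f} to {mean_ci[1]:.2f}")
print(f"single shipment: {pred_int[0]:.2f} to {pred_int[1]:.2f}")
```

The only difference between f) and g) is the extra 1 under the square root, which adds the variability of a single shipment to the uncertainty in the fitted line.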

 

Question 2

Suppose you are given the following information:

size of home (in thousands of square feet)

home price (in thousands of dollars)

size    1.82    1.59   1.57    1.81    2.01    1.57   1.87    1.82    1.59    1.95
price   173.1   160    164.6   183.5   194.8   166    178.7   181.5   160.5   196.5

a)     Construct a model in which the home price depends on the home size (price = 43.4702 + 75.2556*size)

b)     What percentage of the variation in price is explained by the model? (88.52%)

c)     Test the hypothesis that the model is significant at the 5% level of significance. (F = 61.6842; conclude model is significant)

d)     Construct a 95% confidence interval of the slope coefficient. If this interval were used to test the hypothesis in part c, why would the same conclusion be reached? (53.1597 < B1 < 97.3515; hypothesized slope of zero is not in the interval)

e)     If a home has 1500 square feet, what would you expect the price to be? Round to the nearest hundred. ($156,400)

f)      If the square footage is 1500 square feet, what is the range of the average home price for 95% of the time? Round to the nearest hundred. ($149,600 to $163,100)

g)     If a particular home with 1500 square feet is put up for sale, what is the range of the selling price of this home for 95% of the time? Round to the nearest hundred. ($143,400 to $169,300)
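Part d) asks why the interval and the F test agree. With a single predictor, the F statistic is the square of the slope's t statistic, so the two procedures carry the same information. The sketch below uses only the answers quoted in parts a) and c) to recover the interval in part d), and also shows the unit conversion needed in part e). The critical value 2.306 is t(0.025, 8) from a t table, and the variable names are illustrative.

```python
import math

# Values quoted in parts a) and c) above.
b0, b1 = 43.4702, 75.2556   # price in $1000s, size in 1000s of square feet
f_stat = 61.6842            # ANOVA F statistic
T_CRIT = 2.306              # t(0.025, df = 8) from a t table (n = 10 homes)

# With one predictor, F = t^2 for the slope, so the slope's standard
# error can be backed out of the F statistic.
t_stat = math.sqrt(f_stat)
se_b1 = b1 / t_stat

lower = b1 - T_CRIT * se_b1
upper = b1 + T_CRIT * se_b1
zero_in_interval = lower <= 0 <= upper

# e) 1500 square feet is 1.5 in the units of the model.
price_at_1500 = b0 + b1 * (1500 / 1000)

print(f"{lower:.4f} < B1 < {upper:.4f}; zero inside: {zero_in_interval}")
print(f"price at 1500 sq ft: {price_at_1500:.4f} thousand dollars")
```

The interval excludes zero exactly when the t statistic exceeds the critical value in absolute size, which is the same event as F exceeding its own critical value.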

 

Question 3

You are given the following 11 readings of wood density and stiffness:

Density   Stiffness   Density   Stiffness
21.7      47661       15        25319
15.2      28028       25.6      96305
23.4      104170      15        26222
15.4      25312       24.4      72594
14.5      22148       7         5304
16.7      49499

 

 

An initial model was built in which density is used to predict stiffness. Here is the plot of the fitted values against the residuals:

a)     Why does a linearizing transformation appear to be in order? (see key)

b)     If density is used to predict stiffness, which is the better transformation of y based on r^2: natural log or square root? Find a suitable transformation and create a linear model using the transformed data. (ln stiffness = 7.9059 + 0.145*density)

c)     Is the model significant? Test at a 5% level of significance. (F = 95.7331; conclude the model is significant)

d)     If a piece of wood has a density of 20, what would you expect the stiffness to be? Round to the nearest hundred. (49,300)

e)     Construct a 95% confidence interval for the average stiffness at a density of 20, rounding the limits to the nearest hundred. (40,600 < μy < 59,900)

f)      Suppose a square root transformation had been done instead. What would be the 95% confidence interval of the average stiffness for a density of 20, rounding the limits to the nearest hundred? (45,600 < μy < 62,700)
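The transformed models in parts b) through f) are fitted on the transformed scale and then converted back to stiffness units. The sketch below shows one way to do that in plain Python. It assumes the 11 readings pair up as (density, stiffness) reading across each row of the table, hard-codes t(0.025, 9) = 2.262 from a t table, and uses the illustrative helper names `fit_line` and `mean_ci`.

```python
import math

density = [21.7, 15, 15.2, 25.6, 23.4, 15, 15.4, 24.4, 14.5, 7, 16.7]
stiffness = [47661, 25319, 28028, 96305, 104170, 26222, 25312, 72594,
             22148, 5304, 49499]
T_CRIT = 2.262  # t(0.025, df = 9) from a t table (hard-coded assumption)


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x with r^2, F and residual std. error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    ssr = b1 * sxy
    mse = (syy - ssr) / (n - 2)
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": y_bar - b1 * x_bar,
            "b1": b1, "r2": ssr / syy, "F": ssr / mse, "s": math.sqrt(mse)}


def mean_ci(fit, x0):
    """95% CI for the mean transformed response at x0: (lower, estimate, upper)."""
    y_hat = fit["b0"] + fit["b1"] * x0
    h = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    half = T_CRIT * fit["s"] * math.sqrt(h)
    return y_hat - half, y_hat, y_hat + half


log_fit = fit_line(density, [math.log(v) for v in stiffness])
sqrt_fit = fit_line(density, [math.sqrt(v) for v in stiffness])

# Back-transform the limits: exp() undoes ln; squaring undoes the square
# root (valid here because all three limits are positive).
log_lo, log_mid, log_hi = (math.exp(v) for v in mean_ci(log_fit, 20))
sqrt_lo, sqrt_mid, sqrt_hi = (v ** 2 for v in mean_ci(sqrt_fit, 20))

print(f"ln model:   r^2 = {log_fit['r2']:.4f}, F = {log_fit['F']:.4f}")
print(f"sqrt model: r^2 = {sqrt_fit['r2']:.4f}")
print(f"ln stiffness = {log_fit['b0']:.4f} + {log_fit['b1']:.4f}*density")
print(f"ln model at 20:   {log_lo:.0f} < {log_mid:.0f} < {log_hi:.0f}")
print(f"sqrt model at 20: {sqrt_lo:.0f} < {sqrt_mid:.0f} < {sqrt_hi:.0f}")
```

The interval is built on the transformed scale and only the endpoints are converted back, which is why the back-transformed interval is not symmetric about the point estimate.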

 

Question 4

Given the following data:

speed limit   30   40   50   60   70   80   90   100   110
accidents     8    30   31   27   56   71   79   97    134

A linear regression model was built in which the speed limit was used to predict the number of accidents in a five-year period. Here is the plot of the fitted values against the residuals:

Along with the plot points:

Fitted      2.888889   16.97222   31.05556   45.13889   59.22222   73.30556   87.38889   101.4722   115.5556
Residuals   5.111111   13.02778   -0.05556   -18.1389   -3.22222   -2.30556   -8.38889   -4.47222   18.44444

a)     What would be the best course of action? (see key)
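The plot points listed above can be reproduced from the data, which makes the residual pattern easier to inspect. The sketch below fits the straight-line model in plain Python and records the sign of each residual. The variable names are illustrative.

```python
speed = [30, 40, 50, 60, 70, 80, 90, 100, 110]
accidents = [8, 30, 31, 27, 56, 71, 79, 97, 134]

n = len(speed)
x_bar = sum(speed) / n
y_bar = sum(accidents) / n

# Least-squares slope and intercept for accidents = b0 + b1*speed
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, accidents))
sxx = sum((x - x_bar) ** 2 for x in speed)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in speed]
residuals = [y - f for y, f in zip(accidents, fitted)]

# One character per observation, ordered by speed limit.
signs = "".join("+" if r > 0 else "-" for r in residuals)

for f, r in zip(fitted, residuals):
    print(f"{f:10.5f} {r:10.5f}")
print(signs)
```

The residuals are positive at the two lowest speed limits and at the highest, and negative in between. That curved pattern is what a squared term, such as the (speed-70)^2 variable in the second model, is meant to absorb.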

Here is the output of the second model:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9823
R Square            0.9648
Adjusted R Square   0.9531
Standard Error      8.6845
Observations        9

ANOVA
             df   SS            MS         F        Significance F
Regression   2    12419.03175   6209.516   82.332   4.34541E-05
Residual     6    452.5238095   75.42063
Total        8    12871.55556

               Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept      -48.0119       8.9920           -5.3394   0.0018    -70.0147    -26.0092
speed          1.4083         0.1121           12.5613   2E-05     1.1340      1.6827
(speed-70)^2   0.0130         0.0049           2.6223    0.0395    0.0009      0.0251

b)     What percentage of the variation in the number of accidents over a 5-year period is explained by the model? (96.48%)

c)     If the speed limit is 60 km per hour, what is the expected number of accidents over a 5-year period? Round to the nearest whole number. (38)

d)     Is the model significant? Test at a 5% level of significance. (conclude model is significant)

e)     Based on a 5% level of significance, why are both independent variables significant? (both p-values < 5%)
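Most of the summary output can be rebuilt from the three ANOVA sums of squares, which is a useful check when reading parts b) through e). The sketch below does that arithmetic and evaluates the second model at 60 km per hour. The dictionary keys and the function name `predict_accidents` are illustrative.

```python
import math

# ANOVA rows from the summary output of the second model.
ss_reg, df_reg = 12419.03175, 2
ss_res, df_res = 452.5238095, 6
ss_tot, df_tot = ss_reg + ss_res, df_reg + df_res

r2 = ss_reg / ss_tot                                 # b) R Square
adj_r2 = 1 - (ss_res / df_res) / (ss_tot / df_tot)   # Adjusted R Square
f_stat = (ss_reg / df_reg) / (ss_res / df_res)       # d) F = MSR / MSE
std_err = math.sqrt(ss_res / df_res)                 # Standard Error

# Coefficients as printed (rounded to 4 decimals).
coef = {"intercept": -48.0119, "speed": 1.4083, "speed_sq": 0.0130}


def predict_accidents(speed_limit):
    """Expected accidents over 5 years; the squared term is centred at 70."""
    return (coef["intercept"]
            + coef["speed"] * speed_limit
            + coef["speed_sq"] * (speed_limit - 70) ** 2)


# e) each slope is significant when its p-value is below 0.05
p_values = {"speed": 2e-05, "(speed-70)^2": 0.0395}
all_significant = all(p < 0.05 for p in p_values.values())

print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
print(f"F = {f_stat:.3f}, standard error = {std_err:.4f}")
print(f"accidents at 60 km/h: {predict_accidents(60):.2f}")
print(f"both variables significant: {all_significant}")
```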

 

Question 5

The average gas mileage for vehicles (L/100 km) was recorded, as well as the engine size (cc), vehicle weight (kg) and whether or not the vehicle had been serviced in the past 6 months (1 = yes, 0 = no). Various models were built to determine which of these factors contributed to gas mileage. Here is a computer output of the initial model using all the variables:

 

Regression model:

mileage = -2.5385 + 0.0039*size + 0.0003*weight - 0.9772*service

Percentage of variation in mileage explained by the model: 92.23%

Adjusted for the number of variables: 90.78%

Ho: the model is not significant

Ha: the model is significant

Reject Ho if test statistic > 3.239

Test statistic = 63.329

P-value = 0

Reject Ho

Conclude the model is significant

                          95% Confidence Interval
            Coefficient   Lower limit   Upper limit   P-value   VIF
Intercept   -2.5385       -5.4566       0.3796        0.0838
size        0.0039        -0.0072       0.0149        0.4714    91.1945
weight      0.0003        -0.0007       0.0014        0.528     91.117
service     -0.9772       -1.759        -0.1953       0.0175    1.0176

 

Next, a second model using size and service:

Regression model:

mileage = -2.7125 + 0.0072*size - 0.9713*service

Percentage of variation in mileage explained by the model: 92.03%

Adjusted for the number of variables: 91.09%

Ho: the model is not significant

Ha: the model is significant

Reject Ho if test statistic > 3.592

Test statistic = 98.157

P-value = 0

Reject Ho

Conclude the model is significant

                          95% Confidence Interval
            Coefficient   Lower limit   Upper limit   P-value   VIF
Intercept   -2.7125       -5.511        0.086         0.0567
size        0.0072        0.0061        0.0083        0         1.017
service     -0.9713       -1.7357       -0.2068       0.0158    1.017

 

Finally, a third model using weight and service:

Regression model:

mileage = -2.2372 + 0.0007*weight - 0.9876*service

Percentage of variation in mileage explained by the model: 91.97%

Adjusted for the number of variables: 91.02%

Ho: the model is not significant

Ha: the model is significant

Reject Ho if test statistic > 3.592

Test statistic = 97.331

P-value = 0

Reject Ho

Conclude the model is significant

                          95% Confidence Interval
            Coefficient   Lower limit   Upper limit   P-value   VIF
Intercept   -2.2372       -4.9732       0.4988        0.1026
weight      0.0007        0.0006        0.0008        0         1.0161
service     -0.9876       -1.7547       -0.2205       0.0147    1.0161

 

a)     Using the criteria of adjusted r^2, ANOVA p-values, individual t tests (testing at 5%) and presence of multicollinearity, which of the 3 models is the best? (model #2)

b)     Using the second model, if an engine is 1500 cc and has been serviced, what would its average mileage be? (7.1162 L/100 km)
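The comparison in part a) and the prediction in part b) can be checked with a few lines of arithmetic. The sketch below relies on the standard identity VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others, to show how strongly size and weight overlap in the first model. The cut-off of 10 for flagging multicollinearity is a common rule of thumb and is an assumption here, not something the worksheet states; the function and variable names are illustrative.

```python
# Coefficients of the second model (size and service) as printed above.
INTERCEPT, B_SIZE, B_SERVICE = -2.7125, 0.0072, -0.9713


def predict_mileage(engine_cc, serviced):
    """Average mileage in L/100 km; serviced is coded 1 = yes, 0 = no."""
    return INTERCEPT + B_SIZE * engine_cc + B_SERVICE * (1 if serviced else 0)


def implied_r2(vif):
    """R^2 of a predictor regressed on the other predictors, from its VIF."""
    return 1 - 1 / vif


# VIF columns from the three outputs above.
vifs = {
    "model 1": {"size": 91.1945, "weight": 91.117, "service": 1.0176},
    "model 2": {"size": 1.017, "service": 1.017},
    "model 3": {"weight": 1.0161, "service": 1.0161},
}
VIF_LIMIT = 10  # rule-of-thumb threshold (assumption, not from the worksheet)

flagged = {
    model: [name for name, vif in table.items() if vif > VIF_LIMIT]
    for model, table in vifs.items()
}

print(f"1500 cc, serviced: {predict_mileage(1500, True):.4f} L/100 km")
print(f"size vs. other predictors in model 1: R^2 = {implied_r2(91.1945):.3f}")
for model, names in flagged.items():
    print(model, "flagged:", names)
```

In the first model, size and weight are each almost fully explained by the other predictors, which is consistent with their wide confidence intervals and large p-values there. Dropping either one removes the overlap, and the remaining choice between models 2 and 3 rests on the adjusted r^2.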