STAT217 final exam practice questions

 

These questions cover material since the midterm. You should also be able to solve problems from before the midterm.

 

Question 1

An office supply store examined annual expenditures on office supplies for a sample of its customers, grouped by number of employees. These were the results:

Under 10    10 to 49    50 to 99    100+
50          85          120         102
125         92          105         125
80          65          114         134
72          110         95          105
65          100         120         122
75          102         117         140
85          98          102         117
95          102         135         128

Analysis of the data indicates it is normally distributed with equal variances.

a)      Is there any significant difference in average annual expenditures among the companies of different sizes? Test at a 5% level of significance.

b)      Between which company sizes is there a significant difference? Test at a 5% level of significance.

c)      Construct a 95% simultaneous confidence interval for the difference between the groups with the largest and smallest averages. Why is this confidence interval consistent with the results of the analysis in part a?
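A minimal Python sketch of the part (a) calculation, assuming the results table is read row by row so that each column is one company-size group. The names `groups` and `one_way_anova` are illustrative and not part of the question. It computes only the one-way ANOVA F statistic; the critical value still has to come from an F table.

```python
# One-way ANOVA F statistic for the Question 1 data (standard library only).
# Assumption: each column of the results table is one company-size group.
groups = {
    "Under 10": [50, 125, 80, 72, 65, 75, 85, 95],
    "10 to 49": [85, 92, 65, 110, 100, 102, 98, 102],
    "50 to 99": [120, 105, 114, 95, 120, 117, 102, 135],
    "100+": [102, 125, 134, 105, 122, 140, 117, 128],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(s) / len(s) for s in samples]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(groups.values()))
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

The statistic is compared with the upper 5% point of the F distribution with 3 and 28 degrees of freedom (about 2.95 in a standard table).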

 

Question 2

Four special-ed classes were taught math using different pedagogies. These were the results of the final exam:

Class 1    Class 2    Class 3    Class 4
52         62         24         8
60         70         80         92
29         71         82         12
84         72         39         15
43         18         85         87
33         65         32         80
40         65         39         16
51         69         41         85

Analysis of the data indicates that not all the class data are normally distributed. (Classes 3 and 4 in particular appear to be bimodal.)  Is there any significant difference in the pedagogies? Test at a 5% level of significance.
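Because normality fails, a rank-based test is the natural choice here. The sketch below assumes the Kruskal-Wallis test is the one intended and that the table is read row by row (one column per class). The helper names `average_ranks` and `kruskal_wallis_h` are illustrative, and no tie correction is applied.

```python
# Kruskal-Wallis H statistic for the Question 2 data (standard library only).
classes = [
    [52, 60, 29, 84, 43, 33, 40, 51],  # Class 1
    [62, 70, 71, 72, 18, 65, 65, 69],  # Class 2
    [24, 80, 82, 39, 85, 32, 39, 41],  # Class 3
    [8, 92, 12, 15, 87, 80, 16, 85],   # Class 4
]


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(samples):
    """Return (H, df), with no correction for ties."""
    pooled = [v for s in samples for v in s]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    pos = 0
    for s in samples:
        rank_sum = sum(ranks[pos:pos + len(s)])
        total += rank_sum ** 2 / len(s)
        pos += len(s)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1), len(samples) - 1


h_stat, df = kruskal_wallis_h(classes)
print(f"H = {h_stat:.3f} with {df} degrees of freedom")
```

H is compared with the upper 5% point of the chi-square distribution with 3 degrees of freedom (7.815).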

 


Question 3

A sales office wanted to examine the relationship between the number of hours per week its sales staff cold-called and gross monthly income. These are the results (income in thousands of dollars):

hours     12     15     10     20     14     30      16     22      8      17
income    6.2    6.8    4.5    9.2    6.6    12.2    7.4    10.3    4.8    8.9

a)      If the number of hours cold-calling is used to predict monthly gross income, state the model. Round the values to 4 decimals.

b)      What percentage of the variation in income is explained by the number of hours cold-calling?

c)      Is the model significant? Test at a 5% level of significance.

d)     Construct a 95% confidence interval for the slope. If this confidence interval were used to test the hypothesis in part c, why would the same conclusion be reached?

e)      Based on 25 hours of cold-calling per week, what would be the expected gross monthly income? Round to the nearest dollar.

f)       Construct a 95% confidence interval for the average gross monthly income based on 25 hours of cold-calling per week.
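The least-squares quantities behind parts a, b, c and e can be sketched in a few lines of Python. This is only an outline with illustrative variable names; the confidence intervals in parts d and f additionally need a t critical value with n - 2 = 8 degrees of freedom (2.306 at 95%).

```python
# Simple linear regression of income on hours (standard library only).
import math

hours = [12, 15, 10, 20, 14, 30, 16, 22, 8, 17]
income = [6.2, 6.8, 4.5, 9.2, 6.6, 12.2, 7.4, 10.3, 4.8, 8.9]  # thousands of dollars

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(income) / n
s_xx = sum((x - mean_x) ** 2 for x in hours)
s_yy = sum((y - mean_y) ** 2 for y in income)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, income))

slope = s_xy / s_xx                      # part a
intercept = mean_y - slope * mean_x      # part a
r_squared = s_xy ** 2 / (s_xx * s_yy)    # part b

# Part c: t statistic for H0: slope = 0, using the residual mean square.
sse = s_yy - slope * s_xy
se_slope = math.sqrt(sse / (n - 2) / s_xx)
t_stat = slope / se_slope

# Part e: point prediction at 25 hours, converted from thousands to dollars.
predicted_dollars = round((intercept + slope * 25) * 1000)

print(f"income = {intercept:.4f} + {slope:.4f} * hours")
print(f"R^2 = {r_squared:.4f}, t = {t_stat:.2f}, prediction = ${predicted_dollars}")
```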

 

Question 4

A researcher wanted to determine if salaried workers were working more than 10 hours of overtime per month on average but had no prior data to work with. A survey of 9 people had the following results (hours of overtime per month):

7    12    14    8    12.5    16    11    13    20

Does the data follow a normal distribution? Test at a 5% level of significance. Use the following table:

X       Z        F(z)      S(z)    S'(z)    D
7       -1.43    0.0764
8       -1.17    0.121
11      -0.41    0.3409
12      -0.16    0.4364
12.5    -0.03    0.488
13      0.10     0.5398
14      0.35     0.6368
16      0.86     0.8051
20      1.88     0.97
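One way to fill in the empty columns is sketched below. It assumes S(z) is the empirical cumulative proportion i/n at each ordered value, S'(z) is the proportion (i - 1)/n just before it, and D is the larger of |F(z) - S(z)| and |F(z) - S'(z)|, as in a Lilliefors-type test. Those column definitions are an assumption, not something stated in the question; the F(z) values are copied from the table.

```python
# Completing the normality-test table for Question 4 (standard library only).
x_sorted = [7, 8, 11, 12, 12.5, 13, 14, 16, 20]
f_values = [0.0764, 0.121, 0.3409, 0.4364, 0.488, 0.5398, 0.6368, 0.8051, 0.97]

n = len(x_sorted)
rows = []
for i, (x, f) in enumerate(zip(x_sorted, f_values), start=1):
    s_upper = i / n          # assumed S(z): proportion of data <= x
    s_lower = (i - 1) / n    # assumed S'(z): proportion of data < x
    d = max(abs(f - s_upper), abs(f - s_lower))
    rows.append((x, f, s_upper, s_lower, d))

for x, f, s_upper, s_lower, d in rows:
    print(f"{x:>5}  F={f:.4f}  S={s_upper:.4f}  S'={s_lower:.4f}  D={d:.4f}")

d_stat = max(row[4] for row in rows)  # test statistic: largest D in the table
print(f"test statistic D = {d_stat:.4f}")
```

The test statistic is then compared with the tabulated critical value for n = 9 at the 5% level (roughly 0.27 in commonly printed Lilliefors tables; use the table supplied with the course).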

 

 


 

Question 5

Three judges were asked to rate 6 pairs of dancers. These were the results:

 

          Judge 1    Judge 2    Judge 3
Pair 1    5          6          6
Pair 2    10         9          9
Pair 3    7          5          5
Pair 4    1          1          3
Pair 5    8          8          8
Pair 6    7          8          8

a)      Is there any significant difference among the judges in how they rated the pairs? Test at a 5% level of significance.

b)      Is there any significant difference among the pairs in how they are rated? Test at a 5% level of significance.

 

Question 6

A company with 4 offices examined the monthly expenses in a variety of categories. These were the results (in thousands of dollars):

 

                   Office 1    Office 2    Office 3    Office 4
Salary             50.2        46.3        60.7        52.3
Utilities          1.3         1.4         0.9         1.5
Office supplies    0.7         0.6         1.3         0.9
Transportation     1.2         0.8         1.5         0.9
Entertainment      0.5         0.4         0.6         0.5
Miscellaneous      2.3         1.8         1.7         1.9

Analysis of the data in each category indicates it is normally distributed with equal variances.

a)      Is there any significant difference among the offices in their average monthly expenses? Test at a 5% level of significance.

b)      If salary is excluded, is there any significant difference among the expense categories? Test at a 5% level of significance.

c)      Based on the expense categories used in part b, between which expense categories is there a significant difference at a 5% level of significance?
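One possible reading of parts a and b is a two-factor ANOVA without replication, with expense categories as rows and offices as columns. The sketch below follows that reading, which is an assumption; a one-way layout for part b is also defensible. The function name `two_way_anova` and the variable names are illustrative. It returns the F statistics only, so critical values still come from an F table.

```python
# Two-factor ANOVA without replication for the Question 6 table
# (standard library only). Rows are expense categories, columns are offices.
expenses = [
    [50.2, 46.3, 60.7, 52.3],  # Salary
    [1.3, 1.4, 0.9, 1.5],      # Utilities
    [0.7, 0.6, 1.3, 0.9],      # Office supplies
    [1.2, 0.8, 1.5, 0.9],      # Transportation
    [0.5, 0.4, 0.6, 0.5],      # Entertainment
    [2.3, 1.8, 1.7, 1.9],      # Miscellaneous
]


def two_way_anova(table):
    """Return F statistics for rows and columns plus the degrees of freedom."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df": (df_rows, df_cols, df_error),
    }


part_a = two_way_anova(expenses)      # F_cols tests the offices
part_b = two_way_anova(expenses[1:])  # Salary row dropped; F_rows tests categories
print(f"part a: F(offices) = {part_a['F_cols']:.3f}, df = {part_a['df']}")
print(f"part b: F(categories) = {part_b['F_rows']:.3f}, df = {part_b['df']}")
```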

 


 

Question 7

A researcher examined the relationship between gross annual income and number of employees. These were the results:

 

Income / Employees    < 10    10 to 19    20 to 49    50 +    Total
< $10K                286     300         142         5       733
$10K to under $25K    42      152         130         29      353
$25K to under $50K    7       62          25          56      150
$50K +                0       9           8           89      106
Total                 335     523         305         179     1342

a)      If we test to see if gross annual income depends on the number of employees at a 5% level of significance, show why the first term of the test statistic (using the observed value of 286) is sufficient to reject the null hypothesis.

b)      To what degree does gross annual income depend on the number of employees? Round to 2 decimals.
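The chi-square arithmetic for both parts can be sketched as follows. For part b the code uses Cramér's V as the measure of association; that choice is an assumption, since the question does not name the measure the course expects. Variable names are illustrative.

```python
# Chi-square test of independence for the Question 7 table
# (standard library only). Rows are income bands, columns are employee bands.
import math

observed = [
    [286, 300, 142, 5],
    [42, 152, 130, 29],
    [7, 62, 25, 56],
    [0, 9, 8, 89],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
n_rows, n_cols = len(observed), len(observed[0])


def expected(i, j):
    """Expected count under independence: row total * column total / n."""
    return row_totals[i] * col_totals[j] / n


# Part a: contribution of the first cell (observed 286) to the statistic.
first_term = (observed[0][0] - expected(0, 0)) ** 2 / expected(0, 0)

chi_square = sum(
    (observed[i][j] - expected(i, j)) ** 2 / expected(i, j)
    for i in range(n_rows)
    for j in range(n_cols)
)
df = (n_rows - 1) * (n_cols - 1)

# Part b (assumed measure): Cramer's V.
cramers_v = math.sqrt(chi_square / (n * (min(n_rows, n_cols) - 1)))

print(f"first term = {first_term:.2f}, df = {df}")
print(f"chi-square = {chi_square:.2f}, Cramer's V = {cramers_v:.2f}")
```

Every term of the statistic is non-negative, so once the first term alone exceeds the 5% critical value for 9 degrees of freedom (16.919), the full statistic must exceed it as well.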

 

Question 8

A real estate firm wanted to see which of the following contribute to the selling price of a house (reported in thousands): square footage (reported in hundreds), number of bedrooms, whether the house has an attached garage (attached=1), and whether the house has a developed basement (developed=1). The initial model using all the variables had the following results:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9456
R Square             0.8942
Adjusted R Square    0.8338
Standard Error       36.1073
Observations         12

ANOVA
              df    SS            MS            F          Significance F
Regression    4     77145.6604    19286.4151    14.7932    0.0016
Residual      7     9126.1687     1303.7384
Total         11    86271.8292

             Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%    VIF
Intercept    -11.0880        76.5998           -0.1448    0.8890     -192.2176    170.0415
sq. ft.      21.5768         4.5930            4.6978     0.0022     10.7161      32.4376      2.5561
bedrooms     33.3956         13.0367           2.5617     0.0375     2.5687       64.2226      1.3362
garage       -30.7437        36.9822           -0.8313    0.4332     -118.1927    56.7053      2.7975
basement     38.9510         27.0114           1.4420     0.1925     -24.9207     102.8227     1.6323

 

a)      If a house has 1500 square feet, 3 bedrooms, an attached garage and an undeveloped basement, what would be the average selling price? Round to the nearest thousand.

b)      Why does the initial model not suffer from multicollinearity?
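The part a prediction is a direct substitution into the fitted equation, remembering that square footage is entered in hundreds (1500 square feet is 15) and that the two indicator variables take the values 1 and 0. The sketch below also lists the VIFs for part b; the cutoff of 5 is one commonly quoted rule of thumb and is an assumption here. The dictionary keys are illustrative names.

```python
# Prediction from the initial four-variable model in Question 8.
coefficients = {
    "intercept": -11.0880,
    "sq_ft_hundreds": 21.5768,
    "bedrooms": 33.3956,
    "garage": -30.7437,
    "basement": 38.9510,
}

# 1500 sq. ft. -> 15 hundreds; attached garage -> 1; undeveloped basement -> 0.
house = {"sq_ft_hundreds": 15, "bedrooms": 3, "garage": 1, "basement": 0}

price = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in house.items()
)
print(f"predicted price = {price:.4f} thousand, rounded: {round(price)} thousand")

# Part b: variance inflation factors copied from the regression output.
vifs = {"sq. ft.": 2.5561, "bedrooms": 1.3362, "garage": 2.7975, "basement": 1.6323}
vif_cutoff = 5  # assumed rule of thumb; some texts use 10
max_vif = max(vifs.values())
print(f"largest VIF = {max_vif}, below cutoff: {max_vif < vif_cutoff}")
```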

 

Two more models were built:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9276
R Square             0.8605
Adjusted R Square    0.8295
Standard Error       36.5705
Observations         12

ANOVA
              df    SS            MS            F          Significance F
Regression    2     74235.2359    37117.6180    27.7536    0.0001
Residual      9     12036.5932    1337.3992
Total         11    86271.8292

             Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%    VIF
Intercept    -15.3659        60.4445           -0.2542    0.8050     -152.1008    121.3691
sq. ft.      21.2965         2.9506            7.2178     0.0000     14.6219      27.9711      1.0283
bedrooms     34.9638         11.5832           3.0185     0.0145     8.7608       61.1668      1.0283

 

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8481
R Square             0.7192
Adjusted R Square    0.6912
Standard Error       49.2159
Observations         12

ANOVA
              df    SS            MS            F          Significance F
Regression    1     62049.7365    62049.7365    25.6170    0.0005
Residual      10    24222.0927    2422.2093
Total         11    86271.8292

             Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%    VIF
Intercept    103.4504        61.7319           1.6758     0.1247     -34.0969     240.9977
sq. ft.      19.8191         3.9158            5.0613     0.0005     11.0942      28.5440      1.0000

 

c)      Using the criteria of adjusted R Square, ANOVA p-value, and t-test p-values (compared to 5%), which one of these models would you consider the best?
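The three criteria can be lined up side by side with a short script. The sketch below recomputes adjusted R Square from R Square, n = 12 and the number of predictors k, then keeps only models whose ANOVA p-value and slope p-values are all below 5%. Treating "best" as the highest adjusted R Square among those models is an assumption about how the criteria are meant to be combined; the model labels are illustrative.

```python
# Comparing the three Question 8 models on the stated criteria.
models = {
    "all four predictors": {
        "r2": 0.8942, "k": 4, "anova_p": 0.0016,
        "slope_p": [0.0022, 0.0375, 0.4332, 0.1925],
    },
    "sq. ft. and bedrooms": {
        "r2": 0.8605, "k": 2, "anova_p": 0.0001,
        "slope_p": [0.0000, 0.0145],
    },
    "sq. ft. only": {
        "r2": 0.7192, "k": 1, "anova_p": 0.0005,
        "slope_p": [0.0005],
    },
}
n = 12
alpha = 0.05


def adjusted_r2(r2, n, k):
    """Adjusted R Square = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


for name, m in models.items():
    m["adj_r2"] = adjusted_r2(m["r2"], n, m["k"])
    m["all_significant"] = m["anova_p"] < alpha and all(
        p < alpha for p in m["slope_p"]
    )
    print(f"{name}: adjusted R2 = {m['adj_r2']:.4f}, "
          f"all tests significant = {m['all_significant']}")

candidates = [name for name, m in models.items() if m["all_significant"]]
best = max(candidates, key=lambda name: models[name]["adj_r2"])
print(f"preferred model under these assumptions: {best}")
```

The recomputed adjusted values can differ from the printed output in the fourth decimal because R Square itself is rounded.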