STAT217
final exam practice questions
This covers material since the midterm. Of course, you should be able to solve problems from before the midterm.
Question 1
An office supply store examined annual expenditures on office supplies for a sample of its customers, segregating them by number of employees. These were the results:
| Under 10 | 10 to 49 | 50 to 99 | 100+ |
| 50 | 85 | 120 | 102 |
| 125 | 92 | 105 | 125 |
| 80 | 65 | 114 | 134 |
| 72 | 110 | 95 | 105 |
| 65 | 100 | 120 | 122 |
| 75 | 102 | 117 | 140 |
| 85 | 98 | 102 | 117 |
| 95 | 102 | 135 | 128 |
Analysis of the data indicates it is normally distributed with equal variances.
a) Is there any significant difference in average annual expenditures among the companies of different sizes? Test at a 5% level of significance.
b) For which size companies is there a significant difference? Test at a 5% level of significance.
c) Construct a 95% simultaneous confidence interval of the difference between the groups with the largest and smallest averages. Why is this confidence interval consistent with the results of the analysis in part a?
Question 2
Four special-ed classes were taught math using different pedagogies. These were the results of the final exam:
| Class 1 | Class 2 | Class 3 | Class 4 |
| 52 | 62 | 24 | 8 |
| 60 | 70 | 80 | 92 |
| 29 | 71 | 82 | 12 |
| 84 | 72 | 39 | 15 |
| 43 | 18 | 85 | 87 |
| 33 | 65 | 32 | 80 |
| 40 | 65 | 39 | 16 |
| 51 | 69 | 41 | 85 |
Analysis of the data indicates that not all the class data are normally distributed. (Classes 3 and 4 in particular appear to be bimodal.) Is there any significant difference in the pedagogies? Test at a 5% level of significance.
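Because normality fails here, a rank-based test is the natural choice. The sketch below assumes the Kruskal-Wallis test is the intended method and computes its H statistic in plain Python. The helper names `average_ranks` and `kruskal_wallis_h` are illustrative, and no tie correction is applied, so the result may differ slightly from software that applies one.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (assumption: no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Columns of the Question 2 table, one list per class.
classes = [
    [52, 60, 29, 84, 43, 33, 40, 51],
    [62, 70, 71, 72, 18, 65, 65, 69],
    [24, 80, 82, 39, 85, 32, 39, 41],
    [8, 92, 12, 15, 87, 80, 16, 85],
]
h_stat = kruskal_wallis_h(classes)
print(f"H = {h_stat:.4f} (compare with chi-square, df = {len(classes) - 1})")
```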
Question 3
A sales office wanted to examine the relationship between the number of hours per week its sales staff cold-called and gross monthly income. These are the results (income in thousands of dollars):
| hours | 12 | 15 | 10 | 20 | 14 | 30 | 16 | 22 | 8 | 17 |
| income | 6.2 | 6.8 | 4.5 | 9.2 | 6.6 | 12.2 | 7.4 | 10.3 | 4.8 | 8.9 |
a) If the number of hours cold-calling is used to predict monthly gross income, state the model. Round the values to 4 decimals.
b) What percentage of the variation in income is explained by the number of hours cold-calling?
c) Is the model significant? Test at a 5% level of significance.
d) Construct a 95% confidence interval of the slope. If this confidence interval were used to test the hypothesis in part c, why would the same conclusion be reached?
e) Based on 25 hours of cold-calling per week, what would be the expected gross monthly income? Round to the nearest dollar.
f) Construct a 95% confidence interval of the average gross monthly income based on 25 hours of cold calling per week.
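The point estimates in parts a, b and e can be checked with a short least-squares calculation. This is a minimal sketch in plain Python with illustrative variable names; the tests and intervals in parts c, d and f additionally need a t-table value, which is not computed here.

```python
hours = [12, 15, 10, 20, 14, 30, 16, 22, 8, 17]
income = [6.2, 6.8, 4.5, 9.2, 6.6, 12.2, 7.4, 10.3, 4.8, 8.9]  # thousands of dollars

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(income) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - mean_x) ** 2 for x in hours)
s_yy = sum((y - mean_y) ** 2 for y in income)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, income))

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Point estimate for part e: expected income at 25 hours per week.
predicted_25 = intercept + slope * 25

print(f"income = {intercept:.4f} + {slope:.4f} * hours")
print(f"r^2 = {r_squared:.4f}")
print(f"predicted income at 25 hours = {predicted_25 * 1000:.0f} dollars")
```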
Question 4
A researcher wanted to determine if salaried workers were working more than 10 hours of overtime per month on average but had no prior data to work with. A survey of people had the following results:
| 7 | 12 | 14 |
| 8 | 12.5 | 16 |
| 11 | 13 | 20 |
Does the data follow a normal distribution? Test at a 5% level of significance. Use the following table:
| X | Z | F(z) | S(z) | S'(z) | D |
| 7 | -1.43 | 0.0764 | | | |
| 8 | -1.17 | 0.121 | | | |
| 11 | -0.41 | 0.3409 | | | |
| 12 | -0.16 | 0.4364 | | | |
| 12.5 | -0.03 | 0.488 | | | |
| 13 | 0.10 | 0.5398 | | | |
| 14 | 0.35 | 0.6368 | | | |
| 16 | 0.86 | 0.8051 | | | |
| 20 | 1.88 | 0.97 | | | |
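The empty columns of the table can be filled in mechanically. The sketch below assumes that S(z) is the empirical cumulative proportion i/n at each ordered observation, that S'(z) is the proportion just below it, (i-1)/n, and that D is the larger of the two absolute differences from F(z). These definitions are assumptions about the course's notation. The resulting test statistic still has to be compared with a tabulated critical value for n = 9.

```python
# Ordered observations and F(z) values copied from the table.
x = [7, 8, 11, 12, 12.5, 13, 14, 16, 20]
f_z = [0.0764, 0.121, 0.3409, 0.4364, 0.488, 0.5398, 0.6368, 0.8051, 0.97]

n = len(x)
rows = []
for i, (xi, f) in enumerate(zip(x, f_z), start=1):
    s = i / n             # S(z): assumed to be the empirical CDF at this value
    s_prev = (i - 1) / n  # S'(z): assumed to be the empirical CDF just below it
    d = max(abs(f - s), abs(f - s_prev))
    rows.append((xi, f, s, s_prev, d))

# The test statistic is the largest D over all rows.
d_max = max(row[4] for row in rows)

for xi, f, s, s_prev, d in rows:
    print(f"{xi:>5} {f:.4f} {s:.4f} {s_prev:.4f} {d:.4f}")
print(f"test statistic D = {d_max:.4f}")
```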
Question 5
Three judges were asked to rate 6 pairs of dancers. These were the results:
| | Judge 1 | Judge 2 | Judge 3 |
| Pair 1 | 5 | 6 | 6 |
| Pair 2 | 10 | 9 | 9 |
| Pair 3 | 7 | 5 | 5 |
| Pair 4 | 1 | 1 | 3 |
| Pair 5 | 8 | 8 | 8 |
| Pair 6 | 7 | 8 | 8 |
a) Is there any significant difference among the judges in how they rated the pairs? Test at a 5% level of significance.
b) Is there any significant difference among the pairs in how they are rated? Test at a 5% level of significance.
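Since the ratings are ordinal and every judge rates every pair, one plausible approach is the Friedman rank test. That choice is an assumption, not something the question states. The sketch below computes the Friedman statistic for both parts in plain Python, without a tie correction; the helper names are illustrative.

```python
def row_ranks(row):
    """Rank the values within one block, giving tied values their average rank."""
    ordered = sorted(row)
    # First 1-based position of v in sorted order plus half the number of extra ties.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in row]


def friedman_statistic(blocks):
    """Friedman statistic; each block holds one value per treatment."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(row_ranks(row)):
            rank_sums[j] += r
    return 12 / (b * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * b * (k + 1)


# Rows are pairs 1 to 6, columns are judges 1 to 3.
ratings = [
    [5, 6, 6],
    [10, 9, 9],
    [7, 5, 5],
    [1, 1, 3],
    [8, 8, 8],
    [7, 8, 8],
]

# Part a: judges are the treatments, pairs are the blocks.
fr_judges = friedman_statistic(ratings)
# Part b: pairs are the treatments, judges are the blocks (transpose the table).
fr_pairs = friedman_statistic([list(col) for col in zip(*ratings)])

print(f"judges: Fr = {fr_judges:.4f} (chi-square, df = 2)")
print(f"pairs:  Fr = {fr_pairs:.4f} (chi-square, df = 5)")
```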
Question 6
A company with 4 offices examined the monthly expenses in a variety of categories. These were the results (in thousands of dollars):
| | Office 1 | Office 2 | Office 3 | Office 4 |
| Salary | 50.2 | 46.3 | 60.7 | 52.3 |
| Utilities | 1.3 | 1.4 | 0.9 | 1.5 |
| Office supplies | 0.7 | 0.6 | 1.3 | 0.9 |
| Transportation | 1.2 | 0.8 | 1.5 | 0.9 |
| Entertainment | 0.5 | 0.4 | 0.6 | 0.5 |
| Miscellaneous | 2.3 | 1.8 | 1.7 | 1.9 |
Analysis of the data in each category indicates it is normally distributed with equal variances.
a) Is there any significant difference among the offices in their average monthly expenses? Test at a 5% level of significance.
b) If salary is excluded, is there any significant difference among the expense categories? Test at a 5% level of significance.
c) Based on the expense categories used in part b, between which expense categories is there a significant difference at a 5% level of significance?
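Parts a and b can be approached as a two-way layout with one observation per cell (a randomized block design). That reading of the question is an assumption. The sketch below computes the row and column F statistics in plain Python; the function name `two_way_anova` is illustrative, and the F critical values for the printed degrees of freedom must be looked up separately.

```python
def two_way_anova(table):
    """F statistics for rows and columns of a two-way layout without replication."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df": (df_rows, df_cols, df_error),
    }


# Rows are expense categories, columns are offices 1 to 4 (thousands of dollars).
expenses = {
    "Salary": [50.2, 46.3, 60.7, 52.3],
    "Utilities": [1.3, 1.4, 0.9, 1.5],
    "Office supplies": [0.7, 0.6, 1.3, 0.9],
    "Transportation": [1.2, 0.8, 1.5, 0.9],
    "Entertainment": [0.5, 0.4, 0.6, 0.5],
    "Miscellaneous": [2.3, 1.8, 1.7, 1.9],
}

# Part a: all categories; the office effect is the column F.
part_a = two_way_anova(list(expenses.values()))
# Part b: salary dropped; the category effect is the row F.
part_b = two_way_anova([v for k, v in expenses.items() if k != "Salary"])

print(f"part a, offices: F = {part_a['F_cols']:.4f}, df = {part_a['df']}")
print(f"part b, categories: F = {part_b['F_rows']:.4f}, df = {part_b['df']}")
```

The `df` tuple lists the row, column and error degrees of freedom in that order.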
Question 7
A researcher examined the relationship between gross annual income and number of employees. These were the results:
| | < 10 | 10 to 19 | 20 to 49 | 50+ | Total |
| < $10K | 286 | 300 | 142 | 5 | 733 |
| $10K to under $25K | 42 | 152 | 130 | 29 | 353 |
| $25K to under $50K | 7 | 62 | 25 | 56 | 150 |
| $50K+ | 0 | 9 | 8 | 89 | 106 |
| Total | 335 | 523 | 305 | 179 | 1342 |
a) If we test to see if gross annual income depends on the number of employees at a 5% level of significance, show why the first term of the test statistic (using the observed value of 286) is sufficient to reject the null hypothesis.
b) To what degree does gross annual income depend on the number of employees? Round to 2 decimals.
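The reasoning behind part a can be checked numerically. Every term of the chi-square statistic is non-negative, so if a single term already exceeds the critical value, the full statistic must exceed it too. The sketch below uses illustrative variable names and takes 16.919 as the tabulated 5% critical value for 9 degrees of freedom.

```python
row_total = 733      # total for the "< $10K" income row
col_total = 335      # total for the "< 10" employees column
grand_total = 1342
observed = 286

# Expected count under independence, then this cell's chi-square contribution.
expected = row_total * col_total / grand_total
first_term = (observed - expected) ** 2 / expected

# Degrees of freedom for a 4 x 4 table of counts.
df = (4 - 1) * (4 - 1)
critical_value = 16.919  # tabulated chi-square value, alpha = 0.05, df = 9

print(f"expected = {expected:.2f}, first term = {first_term:.2f}, df = {df}")
print("first term alone exceeds the critical value:", first_term > critical_value)
```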
Question 8
A real estate firm wanted to see which of the following variables contribute to the selling price of a house (reported in thousands): square footage (reported in hundreds), number of bedrooms, whether the house has an attached garage (attached=1), and whether the house has a developed basement (developed=1). The initial model using all the variables had the following results:
SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.9456 |
| R Square | 0.8942 |
| Adjusted R Square | 0.8338 |
| Standard Error | 36.1073 |
| Observations | 12 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 4 | 77145.6604 | 19286.4151 | 14.7932 | 0.0016 |
| Residual | 7 | 9126.1687 | 1303.7384 | | |
| Total | 11 | 86271.8292 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | VIF |
| Intercept | -11.0880 | 76.5998 | -0.1448 | 0.8890 | -192.2176 | 170.0415 | |
| sq. ft. | 21.5768 | 4.5930 | 4.6978 | 0.0022 | 10.7161 | 32.4376 | 2.5561 |
| bedrooms | 33.3956 | 13.0367 | 2.5617 | 0.0375 | 2.5687 | 64.2226 | 1.3362 |
| garage | -30.7437 | 36.9822 | -0.8313 | 0.4332 | -118.1927 | 56.7053 | 2.7975 |
| basement | 38.9510 | 27.0114 | 1.4420 | 0.1925 | -24.9207 | 102.8227 | 1.6323 |
a) If a house has 1500 square feet, 3 bedrooms, an attached garage and an undeveloped basement, what would be the average selling price? Round to the nearest thousand.
b) Why does the initial model not suffer from multicollinearity?
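For part a, the prediction is the intercept plus each coefficient multiplied by the corresponding input, taking care with units: square footage is entered in hundreds, so 1500 square feet becomes 15. A minimal sketch with illustrative dictionary keys:

```python
# Coefficients copied from the initial model's SUMMARY OUTPUT.
coefficients = {
    "intercept": -11.0880,
    "sq_ft": 21.5768,     # per hundred square feet
    "bedrooms": 33.3956,
    "garage": -30.7437,   # attached = 1
    "basement": 38.9510,  # developed = 1
}

# House from part a: 1500 sq. ft. (15 hundreds), 3 bedrooms,
# attached garage, undeveloped basement.
house = {"sq_ft": 15, "bedrooms": 3, "garage": 1, "basement": 0}

price = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in house.items()
)
print(f"predicted price = {price:.4f} thousand, about {round(price)} thousand")
```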
Two more models were built:
SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.9276 |
| R Square | 0.8605 |
| Adjusted R Square | 0.8295 |
| Standard Error | 36.5705 |
| Observations | 12 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 2 | 74235.2359 | 37117.6180 | 27.7536 | 0.0001 |
| Residual | 9 | 12036.5932 | 1337.3992 | | |
| Total | 11 | 86271.8292 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | VIF |
| Intercept | -15.3659 | 60.4445 | -0.2542 | 0.8050 | -152.1008 | 121.3691 | |
| sq. ft. | 21.2965 | 2.9506 | 7.2178 | 0.0000 | 14.6219 | 27.9711 | 1.0283 |
| bedrooms | 34.9638 | 11.5832 | 3.0185 | 0.0145 | 8.7608 | 61.1668 | 1.0283 |

SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.8481 |
| R Square | 0.7192 |
| Adjusted R Square | 0.6912 |
| Standard Error | 49.2159 |
| Observations | 12 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 62049.7365 | 62049.7365 | 25.6170 | 0.0005 |
| Residual | 10 | 24222.0927 | 2422.2093 | | |
| Total | 11 | 86271.8292 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | VIF |
| Intercept | 103.4504 | 61.7319 | 1.6758 | 0.1247 | -34.0969 | 240.9977 | |
| sq. ft. | 19.8191 | 3.9158 | 5.0613 | 0.0005 | 11.0942 | 28.5440 | 1.0000 |
c) Using the criteria of adjusted R², ANOVA p-value, and t-test p-values (compared to 5%), which one of these models would you consider the best?