STAT177 Homework sheet #1 Solutions
1) We look in the cumulative relative frequency column for the 30-40 class to find the percentage of clients under 40, which is 70.24%.
2) P(at least 20) = 100% - P(Under 20) = 100% - 3.39% = 96.61%
3) There are 744 clients between the ages of 20 and 30 and 1087 clients between the ages of 30 and 40. So, there are 1831 clients between the ages of 20 and 40. Since there are 2739 clients in total, the percentage between 20 and 40 is 1831/2739 = 66.85%.
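The complement rule in question 2 and the add-then-divide step in question 3 can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the inputs are only the figures quoted above (3.39% under 20; 744, 1087 and 2739 clients).

```python
# Question 2: complement rule, P(at least 20) = 1 - P(under 20)
p_under_20 = 0.0339
p_at_least_20 = 1 - p_under_20

# Question 3: add the two class frequencies, then divide by the total count
clients_20_30 = 744
clients_30_40 = 1087
total_clients = 2739
clients_20_40 = clients_20_30 + clients_30_40
p_20_40 = clients_20_40 / total_clients

print(f"P(at least 20) = {p_at_least_20:.2%}")  # 96.61%
print(f"P(20 to 40) = {p_20_40:.2%}")           # 66.85%
```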
4) Since the median is the 50th percentile, we look for the first class in which the cumulative relative frequency exceeds 50%. We see that 30.56% of the clients are under the age of 30, but 70.24% are under the age of 40. That means the median is between the ages of 30 and 40, so the median is in the 30-40 age class.
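The rule used in question 4 (find the first class whose cumulative relative frequency exceeds 50%) can be sketched as a short loop. The sketch assumes the classes are listed in increasing order of age and uses only the three cumulative percentages quoted in questions 2 and 4; the label "under 20" is an illustrative name, and the later classes are omitted because the search stops once 50% is passed.

```python
def median_class(classes):
    """Return the label of the first class whose cumulative relative
    frequency exceeds 0.5, i.e. the class that contains the median.

    `classes` is a list of (label, cumulative_relative_frequency) pairs
    in increasing order of the class limits.
    """
    for label, cumulative in classes:
        if cumulative > 0.5:
            return label
    return None  # only reached if the cumulative column never passes 50%


# Cumulative relative frequencies quoted in the solutions above
age_classes = [("under 20", 0.0339), ("20-30", 0.3056), ("30-40", 0.7024)]
print(median_class(age_classes))  # 30-40
```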
5) To do this using Excel, I would recreate this frequency table in Excel with the class in column A and the frequency in column B, with the titles in row 1. To create the histogram, click the graph icon, choose the Column type and click Next.
· The data range is B2:B8.
· I click the Series tab. The Category (X) axis label is A2:A8. I click Next.
· Under the Titles tab, I enter “Ages of Outreach Clients” as the chart title.
· I click the Legends tab and uncheck the Show legend box. I click Finish.
After the graph is created, I can adjust the size of the graph. I can also adjust the point size of the labels by right-clicking on them and choosing the Format option. Here is what my graph looks like after adjusting it:
[Figure: column chart titled “Ages of Outreach Clients”]
6) Since the number of times a person eats out is a counting number, the data is discrete.
7) To create the table, we use classes of width 2 with upper class limits 2, 4, 6 and 8. Here is the table:
| Class | Frequency | Relative Frequency | Cumulative Relative Frequency |
| 1 to 2 | 5 | 25.00% | 25.00% |
| 3 to 4 | 9 | 45.00% | 70.00% |
| 5 to 6 | 4 | 20.00% | 90.00% |
| 7 to 8 | 2 | 10.00% | 100.00% |
8) From the cumulative relative frequency column of the table (the 5 to 6 class), we see it is 90%.
9) Since 25% eat out no more than twice per month, that means 100% - 25% = 75% eat out more than twice per month.
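The relative and cumulative relative frequency columns in question 7, and the look-ups in questions 8 and 9, can be reproduced from the four class frequencies. This is a minimal sketch; `frequency_table` is a hypothetical helper, and the cumulative column is built from running counts so that it ends at exactly 100%.

```python
from itertools import accumulate


def frequency_table(classes, counts):
    """Return (class, frequency, relative freq, cumulative relative freq) rows."""
    n = sum(counts)
    rows = []
    for label, count, running in zip(classes, counts, accumulate(counts)):
        rows.append((label, count, count / n, running / n))
    return rows


classes = ["1 to 2", "3 to 4", "5 to 6", "7 to 8"]
counts = [5, 9, 4, 2]
table = frequency_table(classes, counts)

for label, freq, rel, cum in table:
    print(f"{label:<7} {freq:>2} {rel:>8.2%} {cum:>8.2%}")

# Question 8: cumulative relative frequency of the "5 to 6" class
print(f"{table[2][3]:.0%}")      # 90%
# Question 9: complement of "no more than twice" (the "1 to 2" class)
print(f"{1 - table[0][3]:.0%}")  # 75%
```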
10) Choose Data Analysis from the Tools menu. (If it is not there, choose Add-Ins from the Tools menu and check the Analysis ToolPak box.) From Data Analysis, choose Descriptive Statistics. Suppose the data is in column A from A1 through A20.
1. The input range is A1:A20.
2. I choose B1 for the output range.
3. I check the Summary statistics box and click OK.
Here is the output:
| Column1 | |
| Mean | 3.85 |
| Standard Error | 0.424729 |
| Median | 4 |
| Mode | 4 |
| Standard Deviation | 1.8994459 |
| Sample Variance | 3.6078947 |
| Kurtosis | -0.151232 |
| Skewness | 0.494213 |
| Range | 7 |
| Minimum | 1 |
| Maximum | 8 |
| Sum | 77 |
| Count | 20 |
From this we see that the mean is 3.85 and the standard deviation is 1.90, rounded to 2 decimals.
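Several entries in the Excel output are tied together by simple formulas, which gives a quick consistency check: the mean is Sum/Count, the sample variance is the square of the standard deviation (Excel's Descriptive Statistics reports the sample versions, with an n - 1 divisor), and the standard error is the standard deviation divided by the square root of n. The sketch below uses only the values printed in the output, since the 20 raw observations are not listed in this sheet.

```python
import math

# Values copied from the Descriptive Statistics output above
total = 77
count = 20
std_dev = 1.8994459

mean = total / count                    # Mean = Sum / Count
variance = std_dev ** 2                 # Sample Variance = (Standard Deviation)^2
std_error = std_dev / math.sqrt(count)  # Standard Error = s / sqrt(n)

print(f"mean = {mean:.2f}")                   # 3.85
print(f"variance = {variance:.7f}")           # 3.6078947
print(f"standard error = {std_error:.6f}")    # 0.424729
print(f"standard deviation = {std_dev:.2f}")  # 1.90
```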
11) We use the outlier detection from Gerry’s stats tools. Here is the output:
| Outlier detection | |
| 25th percentile | 2.75 |
| 75th percentile | 5 |
| Lower outer fence | -4 |
| Lower inner fence | -0.625 |
| Upper inner fence | 8.375 |
| Upper outer fence | 11.75 |
| Lower extreme outliers | None |
| Lower mild outliers | None |
| Upper mild outliers | None |
| Upper extreme outliers | None |
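We see that there are no outliers: the smallest value (1) and the largest value (8) both lie inside the inner fences of -0.625 and 8.375.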
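The fences in the table follow the usual IQR rule: with IQR = 75th percentile - 25th percentile = 5 - 2.75 = 2.25, the inner fences lie 1.5 IQR beyond the quartiles and the outer fences 3 IQR beyond them. The sketch below starts from the two quartiles reported by Gerry's tool (it does not try to reproduce the tool's own percentile method) and checks the minimum and maximum from question 10. The function names `fences` and `classify` are hypothetical.

```python
def fences(q1, q3):
    """Return (lower outer, lower inner, upper inner, upper outer) fences."""
    iqr = q3 - q1
    return q1 - 3 * iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr, q3 + 3 * iqr


def classify(x, q1, q3):
    """Label x as a mild or extreme outlier, or return None if it is neither."""
    lower_outer, lower_inner, upper_inner, upper_outer = fences(q1, q3)
    if x < lower_outer:
        return "lower extreme"
    if x < lower_inner:
        return "lower mild"
    if x > upper_outer:
        return "upper extreme"
    if x > upper_inner:
        return "upper mild"
    return None


q1, q3 = 2.75, 5
print(fences(q1, q3))  # (-4.0, -0.625, 8.375, 11.75)
# Smallest and largest observations from question 10
print(classify(1, q1, q3), classify(8, q1, q3))  # None None
```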
To begin, we create a crosstab of the joint proportions for doing one's own return and getting a refund.
| | Do own return | Not do own return | Total |
| Refund | 0.68 | 0.08 | 0.76 |
| No refund | 0.19 | 0.05 | 0.24 |
| Total | 0.87 | 0.13 | 1.00 |
12) P(return or refund) = P(return) + P(refund) - P(return and refund) = 87% + 76% - 68% = 95%
13) P(refund | return) = P(return and refund)/P(return) = 68/87 = 78.16%
14) P(refund | not own return) = P(not own return and refund)/P(not own return) = 8/13 = 61.54%, which is less than the 78.16% from question 13. Those who don’t do their own return are not more likely to get a refund than those who do.
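Questions 12 to 14 all come from the crosstab: marginal probabilities are row or column totals, the addition rule handles "or", and a conditional probability is a joint cell divided by the marginal probability of the condition. A minimal sketch using the four joint proportions; the dictionary keys and helper names are illustrative.

```python
# Joint proportions from the crosstab, keyed by (return status, refund status)
joint = {
    ("own", "refund"): 0.68,
    ("not own", "refund"): 0.08,
    ("own", "no refund"): 0.19,
    ("not own", "no refund"): 0.05,
}


def p_return(status):
    """Marginal probability of a return status (column total)."""
    return sum(p for (r, _), p in joint.items() if r == status)


def p_refund(status):
    """Marginal probability of a refund status (row total)."""
    return sum(p for (_, f), p in joint.items() if f == status)


# Question 12: addition rule
p_own_or_refund = p_return("own") + p_refund("refund") - joint[("own", "refund")]

# Questions 13 and 14: P(A | B) = P(A and B) / P(B)
p_refund_given_own = joint[("own", "refund")] / p_return("own")
p_refund_given_not_own = joint[("not own", "refund")] / p_return("not own")

print(f"{p_own_or_refund:.0%}")         # 95%
print(f"{p_refund_given_own:.2%}")      # 78.16%
print(f"{p_refund_given_not_own:.2%}")  # 61.54%
```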
15) P(balance < $1000) = 0.0806 + 0.269 = 0.3496 = 34.96%
16) P(balance < $1000 | income between $35K and $50K) = (0.0508 + 0.1377)/0.3029 = 0.1885/0.3029 = 0.6223 = 62.23%
17) P(balance < $1000 | income < $75K) = (0.0107 + 0.0508 + 0.0191 + 0.0169 + 0.1377 + 0.0826)/(0.0276 + 0.3029 + 0.3241) = 0.3178/0.6546 = 0.4855 = 48.55%
18) P(income at least $50K | balance at least $1000) = (0.1419 + 0.0911 + 0.0254 + 0.0805 + 0.1229 + 0.0742)/(0.3474 + 0.303) = 0.536/0.6504 = 0.8241 = 82.41%
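Questions 15 to 18 share one pattern: add the crosstab cells that satisfy the event, and for a conditional probability divide by the sum of the cells or totals that satisfy the condition. The helper below is a hypothetical convenience function; the cell values are exactly the ones listed in the solutions, since the full crosstab is not reproduced in this sheet.

```python
def conditional(event_cells, condition_cells):
    """P(event | condition) = sum of joint cells / sum of condition cells."""
    return sum(event_cells) / sum(condition_cells)


# Question 15: an unconditional probability is a plain sum of cells
p15 = 0.0806 + 0.269
# Question 16: balance < $1000 given income $35K-$50K
p16 = conditional([0.0508, 0.1377], [0.3029])
# Question 17: balance < $1000 given income < $75K
p17 = conditional([0.0107, 0.0508, 0.0191, 0.0169, 0.1377, 0.0826],
                  [0.0276, 0.3029, 0.3241])
# Question 18: income at least $50K given balance at least $1000
p18 = conditional([0.1419, 0.0911, 0.0254, 0.0805, 0.1229, 0.0742],
                  [0.3474, 0.303])

# Prints Q15: 34.96%, Q16: 62.23%, Q17: 48.55%, Q18: 82.41%
for q, p in [(15, p15), (16, p16), (17, p17), (18, p18)]:
    print(f"Q{q}: {p:.2%}")
```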
19) The percentage of households with a balance of $1000 or more is (0.3474 + 0.303) = 0.6504 = 65.04%. We then compute the percentage of these households in each income group:
| Income | Calculation | Proportion |
| < 35K | 0/0.6504 | 0 |
| 35K – 50K | (0.089 + 0.0254)/0.6504 | 0.1759 |
| 50K – 75K | (0.1419 + 0.0805)/0.6504 | 0.3419 |
| 75K – 100K | (0.0911 + 0.1229)/0.6504 | 0.3290 |
| 100K + | (0.0254 + 0.0742)/0.6504 | 0.1531 |
Thus, a household with a balance of at least $1000 is most likely to belong to the 50K-75K income group, at 34.19%.
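Question 19 is a conditional distribution: each income group's share of the households with a balance of at least $1000, followed by picking the largest share. A minimal sketch using the cell values from the table in question 19; the dictionary layout is an illustrative choice.

```python
# Cells with a balance of at least $1000, by income group
cells = {
    "< 35K": [0.0, 0.0],
    "35K - 50K": [0.089, 0.0254],
    "50K - 75K": [0.1419, 0.0805],
    "75K - 100K": [0.0911, 0.1229],
    "100K +": [0.0254, 0.0742],
}

p_balance_1000 = 0.3474 + 0.303  # P(balance of at least $1000)

# P(income group | balance of at least $1000) for each group
shares = {group: sum(v) / p_balance_1000 for group, v in cells.items()}
most_likely = max(shares, key=shares.get)

for group, share in shares.items():
    print(f"{group:<11} {share:.4f}")
print("most likely:", most_likely)  # 50K - 75K
```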
20) We look for cells in the crosstab with a value of zero. From this, we see that no household with income of $75,000 or more has a balance under $500. Likewise, no household with income under $35,000 has a balance of $1000 or more.