STAT177 Homework sheet #1 Solutions

 

1)     We look in the cumulative relative frequency column for the 30-40 class to find the percentage who are under 40, which is 70.24%.

2)     P(at least 20) = 100% - P(Under 20) = 100% - 3.39% = 96.61%

3)     There are 744 clients between the ages of 20 and 30 and 1087 clients between the ages of 30 and 40. So, there are 1831 clients between the ages of 20 and 40. Since there are 2739 clients in total, the percentage between 20 and 40 is 1831/2739 = 66.85%.

4)     Since the median is the 50th percentile, we look for the first class in which the cumulative relative frequency exceeds 50%. We see that 30.56% of the clients are under the age of 30, but 70.24% are under the age of 40. That means the median is between the ages of 30 and 40, so we would find the median in the 30-40 age class.
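The reasoning in questions 3 and 4 can be sketched in Python. This is only an illustration: it uses the three cumulative percentages and the client counts quoted in the solutions above (the remaining age classes are not listed here), and the function name median_class is made up for this sketch.

```python
# Cumulative relative frequencies (percent) quoted in the solutions above.
# Each entry is (upper age bound of the class, percent of clients under that age).
# Only the classes mentioned in the text are listed.
cumulative_pct = [
    (20, 3.39),
    (30, 30.56),
    (40, 70.24),
]


def median_class(cum):
    """Return (lower, upper) for the first class whose cumulative % reaches 50."""
    lower = None
    for upper, pct in cum:
        if pct >= 50.0:
            return lower, upper
        lower = upper
    return None  # 50% is not reached within the classes listed


print(median_class(cumulative_pct))  # (30, 40)

# Question 3: share of clients between 20 and 40, from the counts in the text.
share_20_to_40 = (744 + 1087) / 2739
print(f"{share_20_to_40:.2%}")  # 66.85%
```

The same 66.85% also follows from the cumulative column, since 70.24% - 3.39% = 66.85%.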

5)     To do this using Excel, I would recreate this frequency table in Excel with the class in column A and the frequency in column B, with the titles in row 1. To create the histogram, click the graph icon and choose the Column type and click Next.

·       The data range is B2:B8.

·       I click the Series tab. The Category (X) axis label is A2:A8. I click Next.

·       Under the Titles tab, I enter “Ages of Outreach Clients” as the chart title.

·       I click the Legends tab and uncheck the Show legend box. I click Finish.

After the graph is created, I can adjust the size of the graph. I can also adjust the point size of the labels by right clicking on them and choosing the Format option. Here is what my graph looks like after adjusting it:

[Figure: column chart titled “Ages of Outreach Clients”]

6)     Since the number of times a person eats out is a counting number, the data is discrete.

7)     To create the table, the upper class limits are 2, 4, 6 and 8. Here is the table:

Class     Frequency   Relative Frequency   Cumulative Relative Frequency
1 to 2    5           25.00%               25.00%
3 to 4    9           45.00%               70.00%
5 to 6    4           20.00%               90.00%
7 to 8    2           10.00%               100.00%

8)     From the cumulative relative frequency column of the table (the 5 to 6 class), we see it is 90%.

9)     Since 25% eat out no more than twice per month, that means 100% - 25% = 75% eat out more than twice per month.
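As a cross-check on the table in question 7 and the answers to questions 8 and 9, the relative and cumulative relative frequencies can be recomputed from the four class frequencies. The sketch below is in Python rather than Excel, and the variable names are chosen only for illustration.

```python
from itertools import accumulate

# Classes and frequencies taken from the table in question 7.
classes = ["1 to 2", "3 to 4", "5 to 6", "7 to 8"]
frequencies = [5, 9, 4, 2]

total = sum(frequencies)  # 20 observations
relative = [f / total for f in frequencies]
# Accumulate the integer counts first, then divide, to avoid rounding drift.
cumulative = [c / total for c in accumulate(frequencies)]

for row in zip(classes, frequencies, relative, cumulative):
    print("{:<8}{:>4}{:>10.2%}{:>10.2%}".format(*row))

# Question 8: the 90% is the cumulative figure for the 5 to 6 class.
print(f"{cumulative[2]:.0%}")  # 90%

# Question 9: complement of the first cumulative value.
more_than_twice = 1 - cumulative[0]
print(f"{more_than_twice:.0%}")  # 75%
```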

10) Choose Data Analysis from the Tools menu. (If it is not there, click Add-Ins and check the Analysis ToolPak box.) From Data Analysis choose Descriptive Statistics. Suppose the data is in column A from A1 through A20.

1.     The input range is A1:A20.

2.     I choose B1 for the output range.

3.     I check the Summary statistics box and click OK.

Here is the output:

Column1

Mean                 3.85
Standard Error       0.424729
Median               4
Mode                 4
Standard Deviation   1.8994459
Sample Variance      3.6078947
Kurtosis             -0.151232
Skewness             0.494213
Range                7
Minimum              1
Maximum              8
Sum                  77
Count                20

From this we see that the mean is 3.85 and the standard deviation is 1.90 if we round to 2 decimals.
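The raw 20 observations are not reproduced in these solutions, so the output cannot be regenerated here, but several lines of it are tied together by standard formulas: mean = sum/count, sample variance = (standard deviation)^2, and standard error = standard deviation/sqrt(count). A short Python sketch, using only numbers copied from the output above, confirms they agree.

```python
import math

# Values copied from the Descriptive Statistics output above.
total, count = 77, 20
std_dev = 1.8994459

mean = total / count                     # Sum / Count
variance = std_dev ** 2                  # sample variance is the square of the SD
std_error = std_dev / math.sqrt(count)   # standard error of the mean

print(round(mean, 2))       # 3.85
print(round(variance, 4))   # 3.6079
print(round(std_error, 4))  # 0.4247
```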

11) We use the outlier detection from Gerry’s stats tools. Here is the output:

Outlier detection

25th percentile:          2.75
75th percentile:          5
Lower outer fence:        -4
Lower inner fence:        -0.625
Upper inner fence:        8.375
Upper outer fence:        11.75

Lower extreme outliers:   None
Lower mild outliers:      None
Upper mild outliers:      None
Upper extreme outliers:   None

We see that there are no outliers.
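The fences in this output are consistent with the usual convention of placing inner fences 1.5 interquartile ranges, and outer fences 3 interquartile ranges, beyond the quartiles. That the tool uses exactly this rule is an assumption; the Python sketch below simply shows that the rule reproduces the numbers shown, starting from the two percentiles and the minimum and maximum reported earlier.

```python
# Quartiles copied from the outlier detection output.
q1, q3 = 2.75, 5.0
iqr = q3 - q1  # interquartile range, 2.25

# Assumed convention: inner fences at 1.5 * IQR, outer fences at 3 * IQR.
fences = {
    "lower outer": q1 - 3.0 * iqr,
    "lower inner": q1 - 1.5 * iqr,
    "upper inner": q3 + 1.5 * iqr,
    "upper outer": q3 + 3.0 * iqr,
}
for name, value in fences.items():
    print(f"{name} fence: {value}")

# Minimum and maximum from the Descriptive Statistics output in question 10.
data_min, data_max = 1, 8
has_outliers = data_min < fences["lower inner"] or data_max > fences["upper inner"]
print(has_outliers)  # False
```

Since the smallest value (1) and the largest value (8) both lie inside the inner fences, no observation can be even a mild outlier.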

For questions 12 to 14, we begin by creating a crosstab of the joint probabilities.

 

            Do own return   Not do own return   Total
Refund      0.68            0.08                0.76
No refund   0.19            0.05                0.24
Total       0.87            0.13                1.0

12) P(return or refund) = P(return) + P(refund) – P(return and refund) = 87% + 76% - 68% = 95%

13) P(refund | return) = P(return and refund)/P(return) = 68/87 = 78.16%

14) P(refund | not own return) = P(not own return and refund)/P(not own return) = 8/13 = 61.54%, which is less than the 78.16% from question 13. Those who don’t do their own return are therefore not more likely to get a refund than those who do.
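The calculations in questions 12 to 14 can be reproduced from the four joint probabilities in the crosstab. The Python sketch below is one way to lay this out; the dictionary keys and the helper name marginal are illustrative choices, not part of the original solution.

```python
# Joint probabilities from the crosstab: (refund status, who does the return).
joint = {
    ("refund", "own"): 0.68,
    ("refund", "other"): 0.08,
    ("no refund", "own"): 0.19,
    ("no refund", "other"): 0.05,
}


def marginal(index, value):
    """Sum the joint probabilities whose key matches value at the given position."""
    return sum(p for key, p in joint.items() if key[index] == value)


p_refund = marginal(0, "refund")  # 0.76
p_own = marginal(1, "own")        # 0.87
p_other = marginal(1, "other")    # 0.13

# Question 12: addition rule for "does own return OR gets a refund".
p_own_or_refund = p_own + p_refund - joint[("refund", "own")]
# Questions 13 and 14: conditional probability = joint / marginal.
p_refund_given_own = joint[("refund", "own")] / p_own
p_refund_given_other = joint[("refund", "other")] / p_other

print(f"{p_own_or_refund:.2%}")       # 95.00%
print(f"{p_refund_given_own:.2%}")    # 78.16%
print(f"{p_refund_given_other:.2%}")  # 61.54%
```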

15) P(balance < $1000) = 0.0806 + 0.269 = 0.3496 = 34.96%

16) P(balance < $1000 | income between $35K and $50K) = (0.0508 + 0.1377)/(0.3029) = 0.1885/0.3029 = 0.6223 = 62.23%

17) P(balance < $1000 | income < $75K) = (0.0107 + 0.0508 + 0.0191 + 0.0169 + 0.1377 + 0.0826)/(0.0276 + 0.3029 + 0.3241) = 0.3178/0.6546 = 0.4855 = 48.55%

18) P(income at least $50K | balance at least $1000) = (0.1419 + 0.0911 + 0.0254 + 0.0805 + 0.1229 + 0.0742)/(0.3474 + 0.303) = 0.536/0.6504 = 0.8241 = 82.41%
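Questions 16 and 17 both divide the joint probability of a low balance within some income groups by the total probability of those income groups. A minimal Python sketch follows, using only the cells and income-group totals quoted in those two answers; the full crosstab is not reproduced in these solutions, and the names under_1000 and p_low_balance_given are made up for illustration.

```python
# For each income group: (the two joint probabilities with balance under $1000,
# the income-group total). All values are copied from questions 16 and 17.
under_1000 = {
    "under 35K": ((0.0107, 0.0169), 0.0276),
    "35K to 50K": ((0.0508, 0.1377), 0.3029),
    "50K to 75K": ((0.0191, 0.0826), 0.3241),
}


def p_low_balance_given(groups):
    """P(balance < $1000 | income in groups) = joint probability / marginal."""
    joint = sum(sum(under_1000[g][0]) for g in groups)
    marginal = sum(under_1000[g][1] for g in groups)
    return joint / marginal


print(f"{p_low_balance_given(['35K to 50K']):.2%}")      # 62.23%
print(f"{p_low_balance_given(list(under_1000)):.2%}")    # 48.55%
```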

19) The total percentage that has a balance of $1000 or more is (0.3474 + 0.303) = 0.6504 = 65.04%. We then compute the percentage in each income group:

Income group   Share of households with a balance of $1000 or more
< 35K          0/0.6504 = 0
35K – 50K      (0.089 + 0.0254)/0.6504 = 0.1759
50K – 75K      (0.1419 + 0.0805)/0.6504 = 0.3419
75K – 100K     (0.0911 + 0.1229)/0.6504 = 0.329
100K +         (0.0254 + 0.0742)/0.6504 = 0.1531

Thus, if a household has a balance of at least $1000, it is most likely to belong to the 50K-75K income group, which accounts for 34.19% of such households.
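The normalization in question 19, and the related answer to question 18, can be checked with a few lines of Python. The sketch uses only the joint probabilities quoted in those answers; the group labels are illustrative.

```python
# Joint probability of (income group AND balance of $1000 or more),
# each written as the sum of the two high-balance cells quoted in question 19.
high_balance = {
    "under 35K": 0.0,
    "35K to 50K": 0.089 + 0.0254,
    "50K to 75K": 0.1419 + 0.0805,
    "75K to 100K": 0.0911 + 0.1229,
    "100K and over": 0.0254 + 0.0742,
}

total = 0.3474 + 0.303  # P(balance of $1000 or more) = 0.6504

# Divide each joint probability by the total to condition on a high balance.
conditional = {group: p / total for group, p in high_balance.items()}
for group, p in conditional.items():
    print(f"{group:<14}{p:.4f}")

most_likely = max(conditional, key=conditional.get)
print(most_likely)  # 50K to 75K

# Question 18: income of at least $50K given a balance of at least $1000.
p_at_least_50k = sum(
    conditional[g] for g in ("50K to 75K", "75K to 100K", "100K and over")
)
print(f"{p_at_least_50k:.2%}")  # 82.41%
```

The five conditional shares add up to 1, which is a quick check that the cells listed account for the whole 65.04%.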

20) We look at the crosstab for the values of zero. From this, we see that no household with income of $75,000 or more has a balance under $500. As well, no household with income under $35,000 has a balance of $1000 or more.