Solved
Contributor
Posts: 46

# Inferring data about a large population from a small sample

Hi all,

I have a question that might be as much about stats as it is about SAS programming. I'm hoping that you folks can help.

I have a sample of 35 records from a population of 9635 that looks like this:

```
data have;
    /* Costs contain thousands separators, so read them with the COMMA informat */
    input Unit Repair_cost :comma12.;
    datalines;
1 10,277.00
2 33,615.00
3 23,442.00
4 11,220.00
5 41,321.00
6 40,801.00
7 20,896.00
8 44,753.00
9 28,659.00
10 19,753.00
11 28,760.00
12 24,537.00
13 20,536.00
14 20,959.00
15 5,693.00
16 8,290.00
17 28,715.00
18 41,550.00
19 18,459.00
20 49,197.00
21 28,955.00
22 46,149.00
23 25,273.00
24 45,867.00
25 24,716.00
26 43,519.00
27 27,884.00
28 37,714.00
29 8,001.00
30 42,151.00
31 43,197.00
32 27,245.00
33 31,736.00
34 9,503.00
35 14,946.00
;
run;
```

I figure I can calculate the SD and 95% confidence limits for the sample by using:

```
ods select BasicIntervals;
proc univariate data=have cibasic;
    var Repair_cost;
run;
```

That should give me the mean repair cost per unit and its 95% confidence interval. My question is: can I then multiply the mean and the upper and lower limits by the population size (9635) to get an expected total repair cost and associated confidence limits? It makes intuitive sense to me, but I've found that in stats, my intuition isn't always correct.

If I can't do it this way, can someone suggest the best way to get a predicted total repair cost and associated confidence interval for the entire population of 9635 based on the sample of 35 I have above?

Any help is much appreciated.

Thanks so much

Mike

Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 23,663

## Re: Inferring data about a large population from a small sample

If it's a simple random sample you can use the method you initially suggested.

If the sampled machines do not reflect your population of machines, and each one carries a specific weight so that the sample matches the total population, then that would be a weighted analysis.
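For a simple random sample, the scaling approach from the original question can be sketched as below. This is a minimal illustration, not a definitive implementation: it assumes the `have` data set has been read with a numeric `Repair_cost`, and the names `stats`, `total`, `xbar`, `lcl`, `ucl`, and the `total_*` variables are hypothetical.

```sas
/* Per-unit mean and its 95% confidence limits (default alpha = 0.05) */
proc means data=have noprint;
    var Repair_cost;
    output out=stats mean=xbar lclm=lcl uclm=ucl;
run;

/* Scale the mean and its limits by the population size from the post */
data total;
    set stats;
    N = 9635;
    total_est = N * xbar;   /* estimated total repair cost */
    total_lcl = N * lcl;    /* scaled lower limit */
    total_ucl = N * ucl;    /* scaled upper limit */
run;

proc print data=total noobs;
    var N xbar total_est total_lcl total_ucl;
    format xbar total_est total_lcl total_ucl comma18.2;
run;
```

The limits produced this way ignore any finite population correction, so they should be read as a ballpark interval for the total.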

@righcoastmike wrote:

UPDATE: I think proc surveymeans might be what I am looking for, but I'm still not sure how to get an expected total repair cost with 95% confidence limits for the entire population (9635 units), based on the data in the sample (35 units).

All Replies
Super User
Posts: 23,663

## Re: Inferring data about a large population from a small sample

Is there a guarantee that all units need to be repaired at some point? This is what I would call a back of the napkin type estimate....

Contributor
Posts: 46

## Re: Inferring data about a large population from a small sample

These numbers are the expected costs for each unit in one year, so yes, I suppose they could be considered "guaranteed". We have this data projected out for 10 years (so 10 tables like the one I posted, one for each year from 2018 to 2027). Basically, assuming that the estimates are correct, we are looking for an estimated total repair cost in each year, as well as the total over 10 years, with a 95% confidence interval.

Not sure if that helps or confuses, but thanks for having a think about this with me.

Mike

Super User
Posts: 10,761

## Re: Inferring data about a large population from a small sample

No, you can't. It depends on how the sample was drawn from the population.

Calling @Rick_SAS. Maybe he can shed some light.

Contributor
Posts: 46

## Re: Inferring data about a large population from a small sample

If it helps, my sample should be considered as a simple random sample.

Super User
Posts: 10,761

## Re: Inferring data about a large population from a small sample

If I'm right, then your sample estimator is BLUE (best linear unbiased estimator), i.e. the sample mean should be close to the population mean. The same goes for the mean's confidence limits.

Contributor
Posts: 46

## Re: Inferring data about a large population from a small sample

UPDATE: I think proc surveymeans might be what I am looking for, but I'm still not sure how to get an expected total repair cost with 95% confidence limits for the entire population (9635 units), based on the data in the sample (35 units).

Contributor
Posts: 46

## Re: Inferring data about a large population from a small sample

Thanks Reeza, much appreciated.

Mike

SAS Super FREQ
Posts: 4,237

## Re: Inferring data about a large population from a small sample

This is an interesting question.  I think the confidence interval will depend on the assumed distribution of the prices. For example, the sum of IID exponential random variables has a gamma distribution.  The sum of IID normal variables is normal.

Assuming a simple random sample, the expected sum is N*XBar, where XBar is the sample mean and N=9635. However, I don't think multiplying the lower/upper limits by N gives the correct CI. I think that interval is too conservative (that is, wider than it needs to be). If you want a ballpark figure, you can use it.
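For reference, the usual textbook interval for a population total can be written out as follows. This is a sketch under stated assumptions: a simple random sample of size $n$ drawn without replacement from $N$ units, with the sample mean treated as approximately normal. The symbols $\hat{T}$, $s$ (sample standard deviation), and $t_{0.975,\,n-1}$ (Student t quantile) are notation introduced here, not taken from the thread.

$$
\hat{T} = N\bar{x}, \qquad
\widehat{\mathrm{SE}}(\hat{T}) = N\,\frac{s}{\sqrt{n}}\,\sqrt{1-\frac{n}{N}}, \qquad
\hat{T} \pm t_{0.975,\,n-1}\,\widehat{\mathrm{SE}}(\hat{T})
$$

With $n = 35$ and $N = 9635$, the finite population correction $\sqrt{1 - n/N}$ is about 0.998, so this interval is only slightly narrower than the one obtained by multiplying the limits for the mean by $N$.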

Contributor
Posts: 46

## Re: Inferring data about a large population from a small sample

Thanks Rick,

I would rather be too conservative as opposed to not, and for now I think a ballpark would work. At this point though, I'm just curious about how one would go about calculating the CI for the total properly. I'll keep looking and post a response here if I figure anything out.
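One possible way to get an interval for the total directly is PROC SURVEYMEANS, sketched below. This is not a definitive implementation: it assumes a simple random sample drawn without replacement and a `have` data set with a numeric `Repair_cost`. The data set `have_w` and the variable `SamplingWeight` are hypothetical names. Without a WEIGHT statement every observation gets a weight of 1, so the reported sum would describe only the 35 sampled units.

```sas
/* Each sampled unit stands in for N/n = 9635/35 units of the population */
data have_w;
    set have;
    SamplingWeight = 9635 / 35;
run;

/* TOTAL= gives the population size, which applies the finite population   */
/* correction; SUM and CLSUM request the estimated total and its 95% limits */
proc surveymeans data=have_w total=9635 mean clm sum clsum;
    var Repair_cost;
    weight SamplingWeight;
run;
```

For the 10 yearly tables, the same step could be repeated per year. An interval for the 10-year total would need the per-unit 10-year sums as the analysis variable, since the yearly totals come from the same units and are not independent.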

Super User
Posts: 23,663