Hi all,
I have a question that might be as much about stats as it is about SAS programming. I'm hoping that you folks can help.
I have a sample of 35 records from a population of 9635 that looks like this:
data have;
/* the :comma12. informat reads values with thousands separators, e.g. 10,277.00 */
input Unit Repair_cost :comma12.;
datalines;
1 10,277.00
2 33,615.00
3 23,442.00
4 11,220.00
5 41,321.00
6 40,801.00
7 20,896.00
8 44,753.00
9 28,659.00
10 19,753.00
11 28,760.00
12 24,537.00
13 20,536.00
14 20,959.00
15 5,693.00
16 8,290.00
17 28,715.00
18 41,550.00
19 18,459.00
20 49,197.00
21 28,955.00
22 46,149.00
23 25,273.00
24 45,867.00
25 24,716.00
26 43,519.00
27 27,884.00
28 37,714.00
29 8,001.00
30 42,151.00
31 43,197.00
32 27,245.00
33 31,736.00
34 9,503.00
35 14,946.00
;
run;
I figure I can calculate the SD and 95% confidence limits for the sample by using:
ods select BasicIntervals;
proc univariate data=have cibasic;
var Repair_cost;
run;
That should give me the mean repair cost per unit and a 95% confidence interval for that mean. My question is: can I then multiply the mean and the upper and lower limits by the population size (9635) to get an expected total repair cost and associated confidence limits? It makes intuitive sense to me, but I've found that in stats, my intuition isn't always correct.
If I can't do it this way, can someone suggest the best way to get a predicted total repair cost and associated confidence interval for the entire population of 9635 based on the sample of 35 I have above?
Any help is much appreciated.
Thanks so much
Mike
If it's a simple random sample you can use the method you initially suggested.
If the sampled machines do not reflect your population of machines, and each one carries a specific weight so that the sample matches the total population, then you would need a weighted analysis.
@righcoastmike wrote:
******UPDATE********* I think proc surveymeans might be what I am looking for, but I'm still not sure how to get an expected total repair cost with 95% confidence intervals for the entire population (9635 units), based on the data in the sample (35 units).
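For what it's worth, here is a minimal sketch of how the total could be requested from PROC SURVEYMEANS. It assumes the 35 records are a simple random sample drawn without replacement from the 9635 units. The data set name have_w and the variable SamplingWeight are names made up for this example; the weight is needed because, without a WEIGHT statement, the SUM statistic is just the sum over the 35 sampled units.

```sas
/* Sketch only: assumes WORK.HAVE is a simple random sample of n=35
   taken without replacement from a population of N=9635 units. */
data have_w;                       /* have_w is a hypothetical name */
   set have;
   /* SamplingWeight is a hypothetical name: under simple random
      sampling each sampled unit represents N/n = 9635/35 units */
   SamplingWeight = 9635 / 35;
run;

/* TOTAL= supplies the population size for the finite population
   correction; SUM and CLSUM request the estimated total and its
   confidence limits (95% by default), MEAN and CLM the per-unit mean. */
proc surveymeans data=have_w total=9635 mean clm sum clsum;
   var Repair_cost;
   weight SamplingWeight;
run;
```

If the sample was not a simple random sample, the weights (and possibly STRATA or CLUSTER statements) would have to reflect the actual design instead.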
Is there a guarantee that all units need to be repaired at some point? This is what I would call a back of the napkin type estimate....
These numbers are the expected costs for each unit in one year, so yes, I suppose they could be considered "guaranteed". We have this data projected out for 10 years (so 10 tables like the one I posted, one for every year from 2018 to 2027). Basically, assuming that the estimates are correct, we are looking for an estimated total repair cost in each year, as well as the total over 10 years, with a 95% confidence interval.
Not sure if that helps or confuses, but thanks for having a think about this with me.
Mike
No, you can't. It depends on how the sample was drawn from the population.
Calling @Rick_SAS. Maybe he can shed some light.
If it helps, my sample should be considered as a simple random sample.
******UPDATE********* I think proc surveymeans might be what I am looking for, but I'm still not sure how to get an expected total repair cost with 95% confidence intervals for the entire population (9635 units), based on the data in the sample (35 units).
Thanks Reeza, much appreciated.
Mike
This is an interesting question. I think the confidence interval will depend on the assumed distribution of the prices. For example, the sum of IID exponential random variables has a gamma distribution. The sum of IID normal variables is normal.
Assuming a simple random sample, the expected sum is N*XBar, where XBar is the sample mean and N=9635. However, I don't think multiplying the lower/upper limits by N gives the correct CI. I think that interval is too conservative (that is, wider than it needs to be). If you want a ballpark figure, you can use it.
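For reference, the textbook design-based formulas for a total under simple random sampling without replacement can be written as follows. This is a sketch under that sampling assumption, not a statement about the distribution of the costs; here n=35 is the sample size, s is the sample standard deviation, and t is the 97.5th percentile of the t distribution with n-1 degrees of freedom.

```latex
\hat{T} = N\,\bar{x}, \qquad
\widehat{\mathrm{SE}}(\hat{T}) = N\,\frac{s}{\sqrt{n}}\,\sqrt{1-\frac{n}{N}}, \qquad
\hat{T} \;\pm\; t_{0.975,\,n-1}\,\widehat{\mathrm{SE}}(\hat{T})
```

Without the finite population correction, this is exactly N times the usual confidence interval for the mean. With n=35 and N=9635 the correction factor is sqrt(1 - 35/9635), about 0.998, so under these assumptions the multiplied interval is only about 0.2% wider than the corrected one.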
Thanks Rick,
I would rather be too conservative as opposed to not, and for now I think a ballpark would work. At this point though, I'm just curious about how one would go about calculating the CI for the total properly. I'll keep looking and post a response here if I figure anything out.
I think it becomes a prediction interval, not a confidence interval and that would be wider than the confidence interval.