righcoastmike
Quartz | Level 8

Hi all,

 

I have a question that might be as much about stats as it is about SAS programming. I'm hoping that you folks can help. 

 

I have a sample of 35 records from a population of 9635 that looks like this: 

 

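Data have;
   infile datalines expandtabs;        /* treat the tab separators as blanks */
   input Unit Repair_cost :comma12.;   /* COMMA informat reads values such as 10,277.00 */
datalines;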
1	10,277.00
2	33,615.00
3	23,442.00
4	11,220.00
5	41,321.00
6	40,801.00
7	20,896.00
8	44,753.00
9	28,659.00
10	19,753.00
11	28,760.00
12	24,537.00
13	20,536.00
14	20,959.00
15	5,693.00
16	8,290.00
17	28,715.00
18	41,550.00
19	18,459.00
20	49,197.00
21	28,955.00
22	46,149.00
23	25,273.00
24	45,867.00
25	24,716.00
26	43,519.00
27	27,884.00
28	37,714.00
29	8,001.00
30	42,151.00
31	43,197.00
32	27,245.00
33	31,736.00
34	9,503.00
35	14,946.00
;
run;

I figure I can calculate the SD and 95% confidence limits for the sample by using: 

 

ods select BasicIntervals;
proc univariate data=have cibasic;
   var Repair_cost;
run;

That should give me the mean repair cost and 95% confidence interval for an individual unit. My question is: can I then multiply the mean and the upper and lower limits by the total population (9635) to get an expected total repair cost and associated confidence limits? It makes intuitive sense to me, but I've found that in stats, my intuition isn't always correct. 

 

If I can't do it this way, can someone suggest the best way to get a predicted total repair cost and associated confidence interval for the entire population of 9635 based on the sample of 35 I have above? 

 

Any help is much appreciated. 

 

Thanks so much

 

Mike 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

If it's a simple random sample you can use the method you initially suggested. 

 

If it was a sample where the machines do not reflect your population of machines and each one has a specific weight attached to it to match the total population then that would be weighted analysis. 

 


@righcoastmike wrote:

******UPDATE********* I think proc surveymeans might be what I am looking for, but I'm still not sure how to get an expected total repair cost with 95% confidence intervals for the entire population (9635 units), based on the data in the sample (35 units).


 


11 REPLIES 11
Reeza
Super User

Is there a guarantee that all units need to be repaired at some point? This is what I would call a back of the napkin type estimate....

 



righcoastmike
Quartz | Level 8

These numbers are the expected costs for each unit in one year, so yes, I suppose they could be considered "guaranteed". We have this data projected out for 10 years (so 10 tables structured like the one I posted, one for each year from 2018 to 2027). Basically, assuming that the estimates are correct, we are looking for an estimated total repair cost in each year, as well as the total over 10 years, each with a 95% confidence interval. 

 

Not sure if that helps or confuses, but thanks for having a think about this with me. 

 

Mike

Ksharp
Super User

No, you can't. It depends on how the sample was drawn from the population.

Or calling @Rick_SAS. Maybe he can shed some light.

righcoastmike
Quartz | Level 8

If it helps, my sample should be considered as a simple random sample. 

Ksharp
Super User
If I am right, then your sample estimator is BLUE (the best linear unbiased estimator), i.e. the sample mean should be close to the population mean. The same goes for the mean's confidence limits.
righcoastmike
Quartz | Level 8

******UPDATE********* I think proc surveymeans might be what I am looking for, but I'm still not sure how to get an expected total repair cost with 95% confidence intervals for the entire population (9635 units), based on the data in the sample (35 units).

Reeza
Super User

If it's a simple random sample you can use the method you initially suggested. 

 

If it was a sample where the machines do not reflect your population of machines and each one has a specific weight attached to it to match the total population then that would be weighted analysis. 

 



righcoastmike
Quartz | Level 8

Thanks Reeza, much appreciated. 

 

Mike

Rick_SAS
SAS Super FREQ

This is an interesting question.  I think the confidence interval will depend on the assumed distribution of the prices. For example, the sum of IID exponential random variables has a gamma distribution.  The sum of IID normal variables is normal.

 

Assuming a simple random sample, the expected sum is N*XBar, where XBar is the sample mean and N=9635. However, I don't think multiplying the lower/upper limits by N gives the correct CI. I think that interval is too conservative (that is, wider than it needs to be). If you want a ballpark figure, you can use it. 
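
For reference, here is a sketch of the textbook survey-sampling formulas for a population total under simple random sampling without replacement. This is an illustration under that assumption, not something worked out in the thread; the symbols n (sample size, 35), s (sample standard deviation) and \hat{T} (estimated total) are introduced here.

```latex
\hat{T} = N\bar{x}, \qquad
\widehat{\mathrm{SE}}(\hat{T}) = N \,\frac{s}{\sqrt{n}}\,\sqrt{1 - \frac{n}{N}}, \qquad
\text{95\% CI: } \hat{T} \pm t_{0.975,\,n-1}\,\widehat{\mathrm{SE}}(\hat{T})
```

Multiplying the limits of the CI for the mean by N gives the same interval without the finite population correction \sqrt{1 - n/N}. With n = 35 and N = 9635 that factor is about 0.998, so the multiplied interval would be only slightly wider than the corrected one.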

 

 

righcoastmike
Quartz | Level 8

Thanks Rick, 

 

I would rather be too conservative as opposed to not, and for now I think a ballpark would work. At this point though, I'm just curious about how one would go about calculating the CI for the total properly. I'll keep looking and post a response here if I figure anything out. 
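
One possible way to get the total and its confidence limits directly is PROC SURVEYMEANS with a sampling weight. The following is a minimal sketch, assuming a simple random sample without replacement and that the data set `have` has been read in correctly; the data set name `have_w` and the variable name `SamplingWeight` are made up for illustration.

```sas
/* Assumption: simple random sample of 35 units from a population of 9635. */
/* Each sampled unit then represents 9635/35 units of the population.      */
data have_w;
   set have;
   SamplingWeight = 9635 / 35;
run;

/* TOTAL= supplies the population size for the finite population correction. */
/* MEAN and CLM request the mean and its confidence limits;                  */
/* SUM and CLSUM request the estimated total and its confidence limits.      */
proc surveymeans data=have_w total=9635 mean clm sum clsum;
   var Repair_cost;
   weight SamplingWeight;
run;
```

Without the WEIGHT statement every observation gets a weight of 1, so the SUM statistic would be the sum over the 35 sampled units rather than an estimate for the whole population.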

Reeza
Super User

I think it becomes a prediction interval, not a confidence interval and that would be wider than the confidence interval. 
