I calculated confidence intervals following the reference https://www.lexjansen.com/pharmasug/2003/Posters/P048.pdf and implemented the following code for my dataset:
PROC MEANS DATA=data NOPRINT ;
  by model reason ;
  VAR default ;
  OUTPUT OUT=xxtmp N=n MEAN=mean STDERR=stderr LCLM=lclm Uclm=uclm ;
RUN ;
DATA xxtmp15 ;
  SET xxtmp ;
  lo = mean - ( TINV ( 0.9 , n-1 ) * stderr ) ;
  hi = mean + ( TINV ( 0.9 , n-1 ) * stderr ) ;
RUN ;
However, the resulting lower confidence limit is negative.
I understand that I should use the log-normal distribution to avoid a negative lower confidence limit, but I don't know how to specify this in PROC MEANS.
Any help on this is highly appreciated.
Kind regards,
Mari
Negative values in a confidence interval are not impossible, so please explain further why this is a problem.
Also, why do you compute the confidence intervals in a data step when they are already computed by PROC MEANS and stored via the UCLM and LCLM options?
Paige Miller
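For reference, a minimal sketch of letting PROC MEANS produce the limits directly, reusing the dataset and variable names from the first post. The confidence level is an assumption: ALPHA=0.1 requests a two-sided 90% interval, whereas PROC MEANS defaults to 95%. Note also that TINV(0.9, n-1) in the data step corresponds to a two-sided 80% interval; a two-sided 90% interval would use TINV(0.95, n-1). Either way, these t-based limits can still fall below 0 for a 0/1 variable.

```sas
/* Sort first so that BY-group processing works */
proc sort data=data;
  by model reason;
run;

/* ALPHA=0.1 requests two-sided 90% limits (assumed level; the default is 0.05) */
proc means data=data noprint alpha=0.1;
  by model reason;
  var default;
  output out=xxtmp n=n mean=mean stderr=stderr lclm=lclm uclm=uclm;
run;
```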
I believe the reason may be the small sample size.
I get the very same results with or without the data step.
I have read in the literature that this needs to be done using the log-normal distribution, but I can't find a way to specify it.
Thanks
So, you still have not explained why you consider negative confidence intervals a problem that needs to be fixed.
You use Lognormal distribution confidence intervals only if your data has a lognormal distribution.
Paige Miller
I consider the negative lower confidence limit a problem because the variable is a default rate, which cannot be negative. The goal is to see what the minimum and maximum values of the default % are for a specific population.
May I ask how I can check whether my data is log-normally distributed? (I will also look into it myself.)
Thanks
Lognormal and normal distributions are not appropriate for rates, which I assume are percentages.
You might be able to make use of binomial distribution confidence intervals, again depending on the distribution of your data. What is the distribution of your data?
Paige Miller
I think I understand better now, thanks to your comments.
In my case, I calculated the default % for a population from a flag indicating whether a client has defaulted (1 if yes, 0 if not). So ideally I should be looking at the binomial distribution. I found this code helpful:
proc freq data=data
by model reason;
tables default/ nocum norow binomial;
output out=results;
exact binomial;
run;
Do you think the above is the right approach?
And if this is calculated for the non-defaults (0), how can I tell SAS to estimate the confidence limits for the defaults (1) instead?
Thanks
Kind regards,
Mari
Yes, the binomial distribution is appropriate here, and so is your code.
If the confidence interval for the non-defaults is (made up example) 6% to 19%, then the confidence interval for the defaults is that confidence interval subtracted from 100% (or 81% to 94%).
And you can't get negative confidence intervals using the binomial distribution.
Paige Miller
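The complement rule above can be written out as a short derivation. The symbols are introduced here for illustration only: $p$ is the default proportion, $q = 1 - p$ the non-default proportion, and $[L_q, U_q]$ the confidence interval for $q$.

```latex
\[
L_q \le q \le U_q
\;\Longleftrightarrow\;
1 - U_q \le 1 - q \le 1 - L_q
\;\Longleftrightarrow\;
1 - U_q \le p \le 1 - L_q .
\]
% Made-up example from the post: [L_q, U_q] = [0.06, 0.19]
\[
[\,1 - 0.19,\; 1 - 0.06\,] = [\,0.81,\; 0.94\,].
\]
```

The limits swap roles: the upper limit for the non-defaults gives the lower limit for the defaults, and vice versa.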
In PROC FREQ, you can use the BINOMIAL LEVEL= option to specify the variable level for the binomial proportion. For example,
tables default / binomial(level='1');
You can use the BINOMIAL CL= option to specify the type(s) of binomial confidence limits to compute. Please see the doc for more info.
It's true that some asymptotic methods might produce an out-of-range confidence limit (e.g., negative) for particular data. PROC FREQ truncates the binomial confidence limits at 0 and 1.
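As a rough illustration of the CL= option, the sketch below requests two interval types for the defaults. The choice of WILSON and EXACT is an assumption made for this example, not a recommendation from the thread; both methods keep the limits between 0 and 1 by construction. The dataset and variable names are taken from the question.

```sas
/* Binomial proportion for default = 1, with Wilson (score) and exact
   (Clopper-Pearson) confidence limits; the CL= types are illustrative */
proc freq data=data;
  tables default / binomial(level='1' cl=(wilson exact));
run;
```

Other CL= types are listed in the PROC FREQ documentation.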
Thank you so much for your help and thought-provoking comments.
Kind regards,
@ggfggrr wrote:
proc freq data=data
by model reason;
tables default/ nocum norow binomial;
output out=results;
exact binomial;
run;
Do you think the above is the right approach?
And if this is calculated for the non-defaults (0), how can I tell SAS to estimate the confidence limits for the defaults (1) instead?
Just to add to the good advice you've already received:
- As an alternative to computing the "100%−x%" differences you can use the LEVEL= suboption of the BINOMIAL option of the TABLES statement (see documentation) -- EDIT: I hadn't seen @Watts's post, sorry:
tables default / nocum binomial(level='1');
- The NOROW option is redundant here (as no crosstabulation is produced).
- You should add the BINOMIAL keyword to the OUTPUT statement in order to obtain the desired results in the output dataset:
output out=results binomial;
- The EXACT BINOMIAL statement is not needed for the exact confidence interval, but requests an exact test (in your case: of the default null hypothesis P=0.5). Do you really want this?
- I assume the missing semicolon after your PROC FREQ statement is only a typo.
- If you have downloaded the PDF file from the URL in your first post, you may want to change the file name to something like P048_FAULTY!.pdf or delete it.
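Putting the points above together, the corrected step might look like the following sketch. It assumes the dataset and variable names from the question, that default is coded 0/1, and that no exact test is wanted (so the EXACT statement is dropped). The PROC SORT and PROC PRINT steps are additions for illustration.

```sas
/* BY-group processing requires sorted input */
proc sort data=data;
  by model reason;
run;

proc freq data=data;
  by model reason;
  /* LEVEL='1' targets the defaults; NOROW dropped as redundant */
  tables default / nocum binomial(level='1');
  /* BINOMIAL keyword writes the proportion and its limits to the dataset */
  output out=results binomial;
run;

/* Inspect the per-group estimates and confidence limits */
proc print data=results;
run;
```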
Thanks so much; knowing these options helps me a lot.
Kind regards.
Mari