Solved: Negative Confidence Interval using PROC MEANS

ggfggrr · Posted 06-18-2019 10:16 AM

I calculated the confidence intervals following the reference https://www.lexjansen.com/pharmasug/2003/Posters/P048.pdf and implemented the following code for my dataset:

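```
PROC MEANS DATA=data NOPRINT;
   BY model reason;        /* the BY statement requires DATA sorted by MODEL and REASON */
   VAR default;
   /* LCLM/UCLM are two-sided 95% limits at the default ALPHA=0.05 */
   OUTPUT OUT=xxtmp N=n MEAN=mean
          STDERR=stderr LCLM=lclm UCLM=uclm;
RUN;
```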

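```
DATA xxtmp15;
   SET xxtmp;
   /* TINV(0.9, n-1) is the 90th percentile of t, so LO/HI form an 80% two-sided */
   /* interval; they match LCLM/UCLM only if PROC MEANS is run with ALPHA=0.2.   */
   lo = mean - (TINV(0.9, n-1) * stderr);
   hi = mean + (TINV(0.9, n-1) * stderr);
RUN;
```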

However, the lower limit of the resulting confidence interval is negative.
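Why the lower limit can drop below zero: assuming `default` is a 0/1 indicator (as described later in the thread), its mean is the sample default proportion $\hat p = k/n$, and the standard error used in the code above simplifies as sketched below. Here $t_{q,\,n-1}$ is the t quantile returned by TINV, with $q = 0.9$ in the DATA step above.

$$
s^2 = \frac{\sum_i (x_i-\hat p)^2}{n-1} = \frac{n\,\hat p(1-\hat p)}{n-1},
\qquad
\mathrm{SE} = \frac{s}{\sqrt n} = \sqrt{\frac{\hat p(1-\hat p)}{n-1}}
$$

$$
\mathrm{lo} = \hat p - t_{q,\,n-1}\sqrt{\frac{\hat p(1-\hat p)}{n-1}} < 0
\iff
0 < \hat p < \frac{t_{q,\,n-1}^2}{n-1+t_{q,\,n-1}^2}
$$

For a hypothetical group of $n = 20$ clients with $k = 1$ default: $\hat p = 0.05$, $\mathrm{SE} = \sqrt{0.05 \cdot 0.95/19} = 0.05$, $t_{0.9,\,19} \approx 1.328$, so $\mathrm{lo} \approx 0.05 - 0.066 = -0.016$. Small samples with low default rates therefore produce negative lower limits almost automatically.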

I understand that I should use the log-normal distribution to avoid a negative lower confidence limit. However, I don't know exactly how to specify this in PROC MEANS.

Any help on this is highly appreciated.

Kind regards,

Mari

PaigeMiller · Posted 06-18-2019 12:55 PM

Yes, binomial distribution and your code is appropriate here.

If the confidence interval for the non-defaults is (made up example) 6% to 19%, then the confidence interval for the defaults is that confidence interval subtracted from 100% (or 81% to 94%).

And you can't get negative confidence intervals using the binomial distribution.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 06-18-2019 10:29 AM

Negative values in a confidence interval are not impossible, so explain further why this is a problem.

Also, why do you compute the confidence intervals via a DATA step when they are already computed by PROC MEANS and stored using the LCLM and UCLM options?

--
Paige Miller

ggfggrr · Posted 06-18-2019 10:38 AM

I believe the reason may be the small sample size.

I get the same results with or without the DATA step.

I have read in the literature that this needs to be done using the log-normal distribution, but I can't find a way to specify this.

Thanks

PaigeMiller · Posted 06-18-2019 10:50 AM

So, you still have not explained why you consider negative confidence intervals a problem that needs to be fixed.

You should use lognormal confidence intervals only if your data have a lognormal distribution.
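For reference, a lognormal-based interval is usually built on the log scale and then back-transformed, which is why it cannot go below zero. A sketch, assuming every value is strictly positive, with $y_i = \ln x_i$, mean $\bar y$ and standard deviation $s_y$:

$$
\left[\exp\!\left(\bar y - t_{1-\alpha/2,\,n-1}\,\frac{s_y}{\sqrt n}\right),\;
\exp\!\left(\bar y + t_{1-\alpha/2,\,n-1}\,\frac{s_y}{\sqrt n}\right)\right]
$$

This interval targets the geometric mean (the median of a lognormal distribution), not the arithmetic mean. It cannot be computed at all for a 0/1 variable, because $\ln 0$ is undefined.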

--
Paige Miller

ggfggrr · Posted 06-18-2019 10:53 AM

I consider the negative lower confidence limit a problem because the variable is a default rate, which cannot be negative. The goal is to see the minimum and maximum plausible values of the default % for a specific population.

How can I check whether my data are log-normally distributed? (I am going to check this myself as well.)

Thanks

PaigeMiller · Posted 06-18-2019 10:56 AM

Lognormal and normal are not appropriate for rates, which I assume are percentages.

You might be able to make use of binomial distribution confidence intervals, again depending on the distribution of your data. What is the distribution of your data?

--
Paige Miller

ggfggrr · Posted 06-18-2019 11:46 AM

I think I am understanding better with your comments.

In my case, I calculated the default % for a population from an indicator of whether each client has defaulted (1 if yes, 0 if not). So ideally I should be looking at the binomial distribution. I found this code helpful:

```
proc freq data=data
   by model reason;
   tables default/ nocum norow binomial;
   output out=results;
   exact binomial;
run;
```

Do you think the above is the right approach?

And if this is calculated for the non-defaults (0), how can I tell SAS to estimate the confidence limits for the defaults (1) instead?

Thanks

Kind regards,

Mari

PaigeMiller · Posted 06-18-2019 12:55 PM

Yes, binomial distribution and your code is appropriate here.

If the confidence interval for the non-defaults is (made up example) 6% to 19%, then the confidence interval for the defaults is that confidence interval subtracted from 100% (or 81% to 94%).
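In symbols: because the two proportions sum to 1, the limits swap roles when you take the complement, so the lower limit for the defaults comes from the upper limit for the non-defaults:

$$
p_1 = 1 - p_0 \;\Longrightarrow\; [L_1,\, U_1] = [\,1-U_0,\; 1-L_0\,],
\qquad \text{e.g. } [0.06,\, 0.19] \mapsto [0.81,\, 0.94].
$$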

And you can't get negative confidence intervals using the binomial distribution.
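One way to see this for the exact (Clopper-Pearson) binomial limits: they are quantiles of Beta distributions, which live on $[0, 1]$. With $k$ defaults out of $n$ clients, confidence level $1-\alpha$, and $B(\cdot\,;\, a, b)$ denoting the Beta quantile function:

$$
L = B\!\left(\tfrac{\alpha}{2};\; k,\; n-k+1\right),
\qquad
U = B\!\left(1-\tfrac{\alpha}{2};\; k+1,\; n-k\right),
$$

with $L = 0$ when $k = 0$ and $U = 1$ when $k = n$.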

--
Paige Miller

Watts · Posted 06-18-2019 01:15 PM

In PROC FREQ, you can use the BINOMIAL LEVEL= option to specify the variable level for the binomial proportion. For example,

```
tables default / binomial(level='1');
```

You can use the BINOMIAL CL= option to specify the type(s) of binomial confidence limits to compute. Please see the doc for more info.

It's true that some asymptotic methods might produce an out-of-range confidence limit (e.g., negative) for particular data. PROC FREQ truncates the binomial confidence limits at 0 and 1.
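For example, the asymptotic Wald interval uses $\mathrm{SE} = \sqrt{\hat p(1-\hat p)/n}$:

$$
\hat p \pm z_{1-\alpha/2}\,\mathrm{SE},
\qquad
\hat p - z_{1-\alpha/2}\,\mathrm{SE} < 0 \iff 0 < \hat p < \frac{z_{1-\alpha/2}^2}{n + z_{1-\alpha/2}^2},
$$

so with truncation the reported limits are $\left[\max(0,\ \hat p - z_{1-\alpha/2}\,\mathrm{SE}),\ \min(1,\ \hat p + z_{1-\alpha/2}\,\mathrm{SE})\right]$.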

ggfggrr · Posted 06-18-2019 01:18 PM

Thank you so much for your help and thought-provoking comments.

Kind regards,

FreelanceReinh · Posted 06-18-2019 01:20 PM

@ggfggrr wrote:
```
proc freq data=data
   by model reason;
   tables default/ nocum norow binomial;
   output out=results;
   exact binomial;
run;
```
Do you think the above is the right approach?

And if this is calculated for the non-defaults (0), how can I tell SAS to estimate the confidence limits for the defaults (1) instead?

Just to add to the good advice you've already received:

As an alternative to computing the "100%−x%" differences, you can use the LEVEL= suboption of the BINOMIAL option in the TABLES statement (see documentation). EDIT: I hadn't seen @Watts's post, sorry:
```
tables default / nocum binomial(level='1');
```
The NOROW option is redundant here (as no crosstabulation is produced).
You should add the BINOMIAL keyword to the OUTPUT statement in order to obtain the desired results in the output dataset:
```
output out=results binomial;
```
The EXACT BINOMIAL statement is not needed to obtain the exact confidence interval; it requests an exact test (in your case, of the default null hypothesis P=0.5). Do you really want this?
I assume the missing semicolon after your PROC FREQ statement is only a typo.
If you have downloaded the PDF file from the URL in your first post, you may want to change the file name to something like P048_FAULTY!.pdf or delete it.

ggfggrr · Posted 06-19-2019 05:15 AM

Thanks so much; knowing about these options helps me a lot.

Kind regards.

Mari

SAS Innovate 2025: Call for Content

Classroom Training Available!