yli33
Fluorite | Level 6

Hi,

 

I have recently encountered wide to extremely wide 95% confidence intervals around odds ratio point estimates. After checking the sample sizes in each exposure/outcome category, I concluded that small sample sizes were not the underlying reason (most categories held at least 10% of the sample, which does not seem small to me). I would therefore appreciate any possible alternative explanations for the wide confidence intervals. Thank you in advance.

 

Best,

Lisa

Reeza
Super User

Verify the standard deviation/variance on that particular variable. If it is highly variable, a wide CI is not surprising at all.

 

And I'm assuming you probably standardized your variables? Otherwise, if they're on different scales you might also get something like this happening. 


These aren't the only reasons, but they're what I'd start checking next.

 

PS -> Look at the formula for a confidence interval and the parts that go into it. Examine each of those parts to see what's causing the specific increase here, and then trace it back to the original data.

yli33
Fluorite | Level 6

Hi Reeza,

 

Thank you very much for your suggestions on interpreting the wide confidence intervals. I have just checked the standard error (0.7246) for one odds ratio point estimate (1.82) with a 95% confidence interval of (0.44, 7.52); that standard error is slightly larger than the ones I obtained for point estimates with narrower confidence intervals. Moreover, I do not think that I need to standardize the variables in a logistic regression, because once I take the antilog of the log odds ratio (i.e., the beta coefficient estimate), the point estimates revert to the original scale, correct? Thus, would it be reasonable to conclude that the wide confidence intervals are due to larger standard errors? Thank you again!
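As a rough check on the numbers quoted above, the following Python sketch rebuilds a Wald-type interval on the log-odds scale and exponentiates the endpoints. It assumes a normal critical value of 1.96; a survey procedure may use a t critical value based on the design degrees of freedom, so small differences are to be expected. The function name `or_wald_ci` is made up for illustration.

```python
import math

def or_wald_ci(odds_ratio, se_log_or, z=1.96):
    """Wald interval for an odds ratio, computed on the log-odds scale.

    Assumes a normal critical value z; this is an illustration, not the
    exact computation performed by any particular survey procedure.
    """
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return lower, upper

# Values reported in the post: OR = 1.82, SE of the log odds ratio = 0.7246
lower, upper = or_wald_ci(1.82, 0.7246)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# The ratio upper/lower equals exp(2 * z * SE), so it depends only on the SE
width_ratio = upper / lower
print(f"upper/lower = {width_ratio:.1f}")
```

The result agrees with the reported interval up to rounding of the point estimate. Because the interval is symmetric on the log scale, a modest increase in the standard error widens the interval multiplicatively on the odds ratio scale.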

 

Best,

Lisa

PaigeMiller
Diamond | Level 26

Wide confidence intervals happen for lots of reasons: the data itself is not consistent, or you have outliers in the data, or you have a poorly specified model, or you have (partial) collinearity between the x-variables, and probably dozens of other reasons.

--
Paige Miller
yli33
Fluorite | Level 6

Hi Paige,

 

Thank you for providing several possible explanations for the wide 95% confidence intervals. Since I am only conducting bivariate survey logistic regression with a single categorical predictor and a single response variable, I do not think that outliers or multicollinearity would be a concern for my analyses. From my review of the crude odds ratio point estimates and their confidence intervals, it appears that the wide to extremely wide confidence intervals may indeed be attributable to relatively small sample sizes within individual cells of the 2x2 contingency table.
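To illustrate why a cell holding 10% of the sample can still give a wide interval, here is a minimal Python sketch using the textbook large-sample standard error of the log odds ratio for a 2x2 table, sqrt(1/a + 1/b + 1/c + 1/d). The counts are hypothetical, and the formula assumes unweighted data from a simple random sample; a design-based variance from a survey procedure would generally differ.

```python
import math

def or_ci_from_table(a, b, c, d, z=1.96):
    """Odds ratio, SE of log(OR), and Wald CI from 2x2 cell counts.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    Assumes unweighted counts and simple random sampling.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, se, lower, upper

# Hypothetical table with n = 100: the smallest cells hold 6% and 10% of the sample
or_small, se_small, lo_small, hi_small = or_ci_from_table(10, 40, 6, 44)

# Same cell percentages, ten times the sample size
or_large, se_large, lo_large, hi_large = or_ci_from_table(100, 400, 60, 440)

print(f"n=100:  OR={or_small:.2f}, SE={se_small:.3f}, CI=({lo_small:.2f}, {hi_small:.2f})")
print(f"n=1000: OR={or_large:.2f}, SE={se_large:.3f}, CI=({lo_large:.2f}, {hi_large:.2f})")
```

Both tables have identical cell percentages and the same odds ratio, yet the standard error is driven by the absolute counts, and mostly by the smallest cell. Checking percentages alone therefore does not rule out small-cell effects.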

 

Best,

Lisa

PaigeMiller
Diamond | Level 26

You can have outliers with only one X variable. You can also have very inconsistent data, which would result in wide confidence intervals. Or small sample sizes will do it.

--
Paige Miller
yli33
Fluorite | Level 6

Hi Paige, 

 

Thank you again for suggesting plausible reasons for the observed wide to extremely wide confidence intervals. However, I would like some clarification on the possibility of outliers in categorical variables; I have verified from descriptive summary statistics that no extraneous category exists. Also, could you perhaps explain why certain odds ratio point estimates for an exposed group and a response group are significant, while others for the same exposed group but a different response group are not? I have noticed that some of the confidence intervals associated with the significant point estimates are rather narrow, which may indicate relatively larger sample sizes. Nevertheless, I would still like your advice on this observation.

 

Also, could you explain another observation: I see larger point estimates for the "low/medium" response category but smaller point estimates for the "high" response category. Some of the larger point estimates come with a slightly wider confidence interval, which may be due to relatively smaller sample sizes. Another possible reason is that the odds are relatively greater in the "low/medium" response category than in the "high" response category for the same exposed group. Yet another, rather obvious, explanation is that the log odds ratio estimate is simply higher in the "low/medium" response category, which has the larger odds ratio point estimate, than in the "high" response category, which has the slightly smaller one. Again, I would like to seek your advice on these observations. Thank you!

 

Best,

Lisa

PaigeMiller
Diamond | Level 26

If you are talking about logistic regression (are you?), then you can have outliers in x if it is continuous. You do state that you have a single categorical x variable, so that means you can't have outliers in x. But you can still have inconsistent data in a category, giving you wide confidence intervals.

 

As far as your other question, can't you look at the sample sizes and see if they are correlated with the width of the confidence interval? 

--
Paige Miller
yli33
Fluorite | Level 6

Hi Paige,

 

Yes, I am employing survey-based binary logistic regression models for the categorical variables in my dataset. Nevertheless, I have not observed any extraneous categories within either the independent or the dependent outcome variable, although there are some missing values in both variables.

 

Regarding my previous questions, I would just like to know whether there may be alternative explanations for the significant point estimates and relatively larger point estimates seen for one category of the dependent variable but not for the other. While I have also confirmed the relative sample sizes for each independent/dependent variable category, I was only able to conclude that wider confidence intervals correspond to relatively smaller sample sizes, since some weighted percentages were greater than 10%.

 

Best,

Lisa

PaigeMiller
Diamond | Level 26

You can have inconsistent data in a category, resulting in wide confidence intervals. 

--
Paige Miller
yli33
Fluorite | Level 6

Hi Paige,

 

Yes, I definitely think that inconsistent data in certain categories may be another major reason for the observed wide confidence intervals. Thank you again!

 

Best, 

Lisa

ballardw
Super User

It really helps to show the code, at a minimum.

One potential cause in addition to @PaigeMiller's comments is the sample design information you provide in the survey proc.

 

Clusters, for example, may behave unexpectedly if some of your clusters have 0 or 100 percent prevalence (no variability within the cluster) and the few clusters that do provide variability differ considerably from one another.
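The cluster effect can be sketched with made-up data. The Python example below compares a naive standard error that treats every observation as independent with a simple between-cluster standard error (the standard deviation of the cluster means divided by the square root of the number of clusters). That estimator is only appropriate under strong assumptions, namely equal-sized clusters, single-stage sampling, no weights, and no finite population correction; it is not the exact computation a survey procedure performs.

```python
import math
import statistics

# Hypothetical data: six clusters of 20 respondents each (1 = has the outcome).
# Four clusters show 0% or 100% prevalence; two are mixed.
clusters = [
    [0] * 20,
    [0] * 20,
    [1] * 20,
    [1] * 20,
    [0] * 10 + [1] * 10,
    [0] * 10 + [1] * 10,
]

all_obs = [y for cluster in clusters for y in cluster]
p_hat = sum(all_obs) / len(all_obs)

# Naive SE: pretends all 120 observations are independent
naive_se = math.sqrt(p_hat * (1 - p_hat) / len(all_obs))

# Between-cluster SE: uses only the variability of the cluster means
cluster_means = [sum(cluster) / len(cluster) for cluster in clusters]
cluster_se = statistics.stdev(cluster_means) / math.sqrt(len(clusters))

print(f"prevalence = {p_hat:.2f}")
print(f"naive SE   = {naive_se:.4f}")
print(f"cluster SE = {cluster_se:.4f}")
```

In this toy setup the cluster-based standard error is several times the naive one, because the effective information comes from six clusters rather than 120 respondents. Any interval built on it widens accordingly.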

 

You may want to look at the data with surveymeans / surveyfreq requesting the CV statistic. Largish values for this statistic, rule of thumb >.5, tell you that you may have suspect reliability for those sample cells.
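For reference, the coefficient of variation here is the standard error of an estimate divided by the estimate itself. The Python sketch below applies the rule of thumb mentioned above (CV > 0.5) to a set of hypothetical cell proportions and standard errors. The cell labels and numbers are invented for illustration; in practice the standard errors would come from the survey procedure's design-based output.

```python
def coefficient_of_variation(estimate, std_err):
    """CV = standard error relative to the size of the estimate."""
    return std_err / abs(estimate)

# Hypothetical (estimated proportion, design-based standard error) per cell
cells = {
    "exposed, high": (0.04, 0.025),
    "exposed, low/medium": (0.12, 0.03),
    "unexposed, high": (0.30, 0.04),
    "unexposed, low/medium": (0.54, 0.05),
}

CV_THRESHOLD = 0.5  # rule of thumb quoted in the thread

cvs = {name: coefficient_of_variation(est, se) for name, (est, se) in cells.items()}
flagged = [name for name, cv in cvs.items() if cv > CV_THRESHOLD]

for name, cv in cvs.items():
    marker = "  <- suspect reliability" if name in flagged else ""
    print(f"{name:24s} CV = {cv:.3f}{marker}")
```

Cells flagged this way are natural candidates to check first when an odds ratio involving them has a very wide interval.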
