I'm calculating odds ratios between two binary (yes/no) variables. At some point in the process, the missing values for one of the variables were recoded to -1 and left that way when the odds ratios were run. Fearing this might lead to inaccurate results, I recoded the -1s back to missing, ran frequencies to verify that all -1s were now missing, and reran the odds ratios. The results didn't change at all.
After thinking about it, I believe the explanation is that the value of -1 would affect stats like mean and standard deviation. But for odds ratios, these variables are being treated as categorical rather than continuous because there are only 3 levels. Whether the missing cases are assigned a category ("-1") or just left as missing doesn't affect the odds for the "0" and "1" categories. Is my thinking correct, or is there something else going on here?
You didn't change the relative counts of 1 vs 0. Those stayed the same. If you had changed the -1 to 0 the ratios would change.
You should probably provide the code you were using and indicate the variable(s) of interest. Several procedures calculate odds ratios, and they may behave a bit differently depending on the procedure and options chosen.
Also, do you have any sort of custom format applied to the variable? A variable that plays a group role could well not show a difference if the definition for one of the ranges of a custom format looked like "low - 1" or similar.
No custom formats have been applied. "SI_flag" is the variable of interest. Here is the code:
PROC LOGISTIC data=analysis_for_model;
  CLASS SI_flag (ref="0");
  MODEL mgmt_flag(EVENT='1')=SI_flag;
  ODS output oddsratios=SI_flag_OR;
RUN;
QUIT;
Okay, nothing complex.
Did you verify that the records with your original values of -1 actually have a value for the dependent variable? If whatever caused the -1 (or the original missing value) for SI_flag is also associated with a missing value for mgmt_flag, then those observations would not have been used in either run, as there was nothing for the calculation.
Check your logs from running with both sets of data and see how many observations were read and how many were used by the model. If the number didn't change, I strongly suspect that the dependent variable is missing as well.
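One way to run that check directly, as a minimal sketch that reuses the dataset and variable names from the code posted above: a PROC FREQ cross-tabulation with the MISSING option, so that missing values are counted as their own level instead of being dropped from the table.

```sas
/* Cross-tabulate the predictor by the response.                      */
/* The MISSING option treats missing values as a valid level, so any  */
/* records with a missing SI_flag or mgmt_flag appear in the table.   */
proc freq data=analysis_for_model;
  tables SI_flag*mgmt_flag / missing norow nocol nopercent;
run;
```

Running this against both versions of the data would show whether the -1 (or missing) rows of SI_flag carry non-missing values of mgmt_flag.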
mgmt_flag doesn't have any missing -- all values are either 0 or 1. When comparing records with missing SI_flag to mgmt_flag, sometimes mgmt_flag is 0 and sometimes it's 1.
In the file version where SI_flag is set to missing, there is a note in the log. The number of deleted observations matches the number of records where SI_flag is missing:
Note: 5389 observations were deleted due to missing values for the response or explanatory variables.
When running the version with missing set to -1, the odds ratio output shows these point estimates (confidence intervals):
SI_flag -1 vs 0    0.968 (0.897, 1.044)
SI_flag 1 vs 0     0.742 (0.608, 0.906)
When running the version recoded back to missing, the odds ratio output shows this point estimate (confidence interval):
SI_flag 1 vs 0     0.742 (0.608, 0.906)
@ballardw wrote:
You didn't change the relative counts of 1 vs 0. Those stayed the same. If you had changed the -1 to 0 the ratios would change.
I think I understand now... because the relative counts didn't change, the odds of having a value of 1 vs 0 didn't change either. Thanks!
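The same point can be written out with a small contingency table. As a sketch, let $n_{sm}$ be the number of records with SI_flag $= s$ and mgmt_flag $= m$ (this notation is introduced here for illustration and does not come from the thread). With a single categorical predictor the model is saturated, so the fitted odds ratio should equal the sample odds ratio, and its Wald standard error on the log scale uses the same four cells:

```latex
\[
\mathrm{OR}_{1\text{ vs }0}
  = \frac{n_{11}/n_{10}}{n_{01}/n_{00}}
  = \frac{n_{11}\,n_{00}}{n_{10}\,n_{01}},
\qquad
\mathrm{SE}\!\left(\log \mathrm{OR}_{1\text{ vs }0}\right)
  = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{10}} + \frac{1}{n_{01}} + \frac{1}{n_{00}}}
\]
```

Neither expression involves the counts for the -1 (or missing) records, which is consistent with the estimate and confidence interval for 1 vs 0 being identical in both runs. Recoding -1 to 0 would instead add those records to $n_{00}$ and $n_{01}$, and the ratio would change.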