09-24-2015 03:35 PM
I'm running PROC LOGISTIC and trying to figure out how to determine the frequencies of each variable used in my final model. The output tells me the number of observations read, but is there a way to find the 'n' for each variable in my multivariate model?
09-24-2015 03:51 PM
If I am understanding this question correctly, you want to know how many observations are used for each variable in your model. I believe the answer is the same for all variables, and it is the number you already have. If an observation is missing a value for even one variable in the model, the entire observation is dropped from the model calculations, even if it has data for all the other variables. So the 'n' is the number of observations with no missing values in any model variable (the outcome included). You may want to look into methods for handling missing data, so that you can fill in these missing values beforehand with imputed values.
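As a rough sketch of how you could check this yourself, the code below counts non-missing and missing values per variable, and then counts the observations that are complete across all of them. The dataset name work.mydata and the variables y, x1, x2, x3 are hypothetical placeholders, and I am assuming they are all numeric (PROC MEANS only analyzes numeric variables).

```sas
/* Hypothetical names: work.mydata, outcome y, predictors x1-x3 (all numeric). */

/* N = non-missing count and NMISS = missing count, for each variable separately. */
proc means data=work.mydata n nmiss;
  var y x1 x2 x3;
run;

/* Complete cases: observations with no missing value in any listed variable.
   CMISS returns the number of missing arguments for the current row. */
proc sql;
  select count(*) as complete_cases
  from work.mydata
  where cmiss(y, x1, x2, x3) = 0;
quit;
```

The complete_cases count should match the "Number of Observations Used" line that PROC LOGISTIC prints for a model with the same variables, while the per-variable N values from PROC MEANS can each be larger.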
Best of luck,
09-24-2015 04:06 PM
Hmm, I included the code below, where I specified that observations with missing values for the variables in the model are dropped. However, I've seen tables in many publications where the 'n' is listed for variables in both unadjusted and adjusted models, and the variables in the adjusted models have different 'n' values.
proc logistic descending data=meley.burum7; /*MODEL 1*/
model exclusive= mbmi /*keep*/
Lib_lessthan8 Lib_morethan8 Gha_zone1112; /*ACC: Time living in Ghana with 4 groups*/
/* keep only observations with no missing values in the variables listed below */
where exclusive ne . and mbmi ne . and childage ne . and primi ne . and marital_status_model2 ne . and dailyincome2 ne .
and Read_or_Write ne . and Q38_Borrowed_money_from_others ne . and acc25_timeinghana ne . and mom_age ne . ;
run;
09-24-2015 11:51 PM
09-25-2015 10:00 AM - edited 09-25-2015 10:02 AM
The problem is that if an observation has missing values, then comparing it to observations that are complete is like comparing apples and oranges. For example, suppose you are predicting exclusive, and after fitting your model you find that both childage and primi are very influential. If one observation has values for both childage and primi and another is missing primi, then it would be inconsistent to use the same model for both.

This has to do with the fact that when you estimate model coefficients, you are finding the impact of each independent variable on the dependent variable while holding all other independent variables constant. If you don't have those other variables (due to missing values), then what you really have is a completely different model, e.g. y=x_1+x_2 vs. y=x_1. So the 'n', or number of observations, should be the same for all variables used in a given model.

I am not sure what you have seen in other publications; perhaps they were running multiple experiments with different cohorts and then comparing them? By default, SAS drops any observation that has a missing value in any variable used in the model, for this reason. I hope that helps.
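One possible reason the 'n' differs between unadjusted and adjusted results in a table (this is a guess, not something I can confirm for those publications) is that each model is fit separately, so each one drops a different set of observations. Here is a minimal sketch using the dataset and variable names posted earlier in this thread, without the WHERE statement so that the default handling of missing values is visible:

```sas
/* Unadjusted model: only observations missing exclusive or mbmi are dropped. */
proc logistic descending data=meley.burum7;
  model exclusive = mbmi;
run;

/* Adjusted model: observations missing ANY of the listed variables are dropped,
   so this model can use fewer observations than the unadjusted one. */
proc logistic descending data=meley.burum7;
  model exclusive = mbmi Lib_lessthan8 Lib_morethan8 Gha_zone1112;
run;
```

Compare the "Number of Observations Used" line in the two outputs. Within either model, though, that single number applies to every variable in it.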