Hi All,
I have a question about variable selection in SAS. Say I have the following data set:
age gender height weight death
89 0 . 110 1
70 1 58 . 0
50 1 173 130 1;
I want to fit a logistic regression with backward variable selection, so I coded it like this:
proc logistic data=have descending;
class gender;
model death = age gender height weight /selection=backward fast ;
run;
But since the data have missing values, only the last observation will be used for this model selection (i.e., the procedure deletes every observation that has a missing value in any variable of the full model).
However, I want the procedure to include more observations when it evaluates the model with age, gender, and height, i.e., to use observations 2 and 3. Is there an option to make this happen in SAS?
Thank you very much!
Best,
In classical linear models, the regression needs to form the so-called SSCP matrix X`*X. Forming this matrix product requires removing observations that have missing values.
See a previous discussion about this topic for other options, including multiple imputation with PROC MI.
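To see how many observations survive the complete-case rule described above, here is a minimal sketch using the three rows from the question. The data set name CHECK and the flag variables complete_full and complete_reduced are made up for illustration; CMISS counts the missing arguments in each row.

```sas
/* The three example rows from the question */
data have;
  input age gender height weight death;
  datalines;
89 0 . 110 1
70 1 58 . 0
50 1 173 130 1
;
run;

/* Flag rows that are complete for the full model and for a reduced model */
data check;
  set have;
  /* 1 if no variable in the full model is missing, else 0 */
  complete_full    = (cmiss(age, gender, height, weight, death) = 0);
  /* 1 if no variable in the model without WEIGHT is missing, else 0 */
  complete_reduced = (cmiss(age, gender, height, death) = 0);
run;

/* The sums are the number of usable observations for each model */
proc means data=check sum;
  var complete_full complete_reduced;
run;
```

For these three rows only observation 3 is complete for the full model, while observations 2 and 3 are complete once weight is dropped, which is the difference the original question is asking about.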
If you are committed to PROC LOGISTIC, multiple imputation is a good solution. For survey data, SAS provides PROC SURVEYLOGISTIC. If you can express your model as a mixed model, PROC GLIMMIX handles missing data differently.
There is extensive literature in this area. I particularly like the research and suggestions by Paul Allison, and recommend that you do an internet search for
logistic regression "missing data" site:statisticalhorizons.com
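As a rough illustration of the multiple-imputation route mentioned above, the usual pattern is PROC MI, then PROC LOGISTIC by imputation, then PROC MIANALYZE. This is only a sketch under several assumptions: HAVE is the full data set (far more rows than the three shown), the values are plausibly missing at random, and the number of imputations, the seed, and the data set names MI_OUT and LGSPARMS are arbitrary choices of mine. It fits the full model only; combining backward selection with multiple imputation is a separate problem that this sketch does not address.

```sas
/* Step 1: create 20 completed copies of the data (nimpute and seed are arbitrary) */
proc mi data=have nimpute=20 seed=12345 out=mi_out;
  fcs reg(height weight);              /* regression imputation for the two incomplete variables */
  var age gender death height weight;  /* variables used in the imputation model */
run;

/* Step 2: fit the same logistic model in each completed data set.
   GENDER is coded 0/1, so it is entered here as a numeric covariate
   instead of a CLASS variable (an assumption made to keep step 3 simple). */
proc logistic data=mi_out descending;
  by _Imputation_;
  model death = age gender height weight;
  ods output ParameterEstimates=lgsparms;
run;

/* Step 3: combine the per-imputation estimates into one set of inferences */
proc mianalyze parms=lgsparms;
  modeleffects Intercept age gender height weight;
run;
```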
I don't understand what you mean by "add more observations." SAS (and all statistical software) analyzes the data you have. Can you provide an example? For the data you've presented, what would you like to happen if only two or three covariates are being analyzed?
I am not aware of any variable selection technique for logistic regression in which different observations are used for different sets of candidate variables.
There are other predictive models that are more tolerant of missing values. You might look at PROC HPSPLIT, which uses tree-based models for building regression models.
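If you want to try the tree-based suggestion, a minimal PROC HPSPLIT call might look like the following. This is a sketch, not a tuned model: it assumes HAVE is the full data set, and the ASSIGNMISSING=, GROW, and PRUNE settings are illustrative choices that you should check against the documentation for your SAS release.

```sas
/* Classification tree for DEATH; rows with missing predictors are kept
   and routed according to the ASSIGNMISSING= rule */
proc hpsplit data=have assignmissing=similar;
  class death gender;                      /* categorical target and predictor */
  model death = age gender height weight;
  grow entropy;                            /* splitting criterion */
  prune costcomplexity;                    /* pruning method */
run;
```

Unlike PROC LOGISTIC, this does not drop an observation just because one of its predictors is missing, so all of the rows can contribute to the fitted tree.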