I have a dataset with a lot of missing data. Rather than impute, I was advised to build two logistic regression prediction models: (1) one that includes only the variables on which most participants have observations, and (2) one that includes all variables but only the participants who have most of their observations.
Originally I was going to use cmiss(of _ALL_) = 0 to obtain complete cases for the second model, but this left me with an incredibly small sample size. Is there a way to include participants who have, say, 90% of the information filled out? That is, if I have 20 variables (columns), then only participants who have information in at least 18 of those columns are kept.
Conversely, is there a way to only keep variables that contain 90% of the information? That is, if I have 50 participants (rows), then variables that contain at least 45 observations will be kept. I suppose I can manually look and drop variable names, but I was hoping for something more efficient.
Thank you in advance!
How about: cmiss(of _ALL_)/20 <= 0.1?
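As a minimal sketch of that condition in a DATA step, assuming the input dataset is WORK.HAVE and the output is WANT_ROWS (both names are hypothetical), the variable count can be read from DICTIONARY.COLUMNS instead of hard-coding 20:

```sas
/* Hypothetical names: WORK.HAVE is the input, WANT_ROWS is the output. */

/* Count the variables in the dataset so the divisor is not hard-coded. */
proc sql noprint;
    select count(*) into :nvars trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE';
quit;

data want_rows;
    set have;
    /* CMISS counts missing values across numeric and character variables. */
    /* Keep the row when at most 10% of its variables are missing.         */
    if cmiss(of _all_) / &nvars <= 0.1;
run;
```

With 20 variables this keeps rows with at most 2 missing values, that is, at least 18 filled columns. If an ID variable or the outcome should not count toward the threshold, list the relevant variables explicitly instead of using _ALL_ and adjust the divisor to match.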
Hahaha. The answer was so obvious. Thank you!
Do you know how to do a similar thing for columns instead of rows? Would transposing be the best option?
What do you mean by "similar thing for columns"? Drop a column if fewer than 90% of its values are non-missing?
Yes.
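One way to do this without transposing the full dataset is to count non-missing values per variable with PROC SQL and build a KEEP list from the result. This is only a sketch under assumptions: the input is WORK.HAVE, the output is WANT_COLS, the helper tables NONMISS_COUNTS and NONMISS_LONG and the macro variables are made-up names, and variable names are assumed to follow standard SAS naming rules.

```sas
/* Hypothetical names: WORK.HAVE is the input, WANT_COLS is the output. */
%let minfrac = 0.9;   /* keep variables with at least 90% non-missing values */

proc sql noprint;
    /* Build one COUNT() expression per variable. COUNT(var) counts     */
    /* non-missing values for both numeric and character columns.       */
    select catx(' ', cats('count(', name, ')'), 'as', name)
        into :count_exprs separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE';

    /* One-row table holding the non-missing count of every variable. */
    create table nonmiss_counts as
    select &count_exprs
    from have;

    /* Total number of rows, used as the denominator. */
    select count(*) into :nobs trimmed from have;
quit;

/* Reshape the single row into one row per variable (_NAME_, n_nonmiss). */
proc transpose data=nonmiss_counts out=nonmiss_long(rename=(col1=n_nonmiss));
run;

/* Collect the names of variables that meet the threshold. */
proc sql noprint;
    select _name_ into :keep_vars separated by ' '
    from nonmiss_long
    where n_nonmiss / &nobs >= &minfrac;
quit;

data want_cols;
    set have(keep=&keep_vars);
run;
```

With 50 rows, a variable is kept when it has at least 45 non-missing values. Only the one-row summary table is transposed, not the data itself. If no variable meets the threshold, KEEP_VARS is empty and the final step fails, so check the macro variable first on sparse data.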
I have faced this problem on several occasions: you have to decide which combination of variables and observations to retain for an analysis that excludes observations with missing values. I wrote a macro that finds ALL of those combinations, called complete subsets, to help you choose. You might choose, for example, the largest subset (nbVars x nbObs) that includes some key variables.
The macro is described here:
Good luck.
If my variable is stored as numeric but is to be treated as categorical in the analysis, does this macro still work with it?
Ah. Never mind. It seems SAS University Edition does not support PROC OPTNET. Thank you though.
Please note that proc optnet was introduced with SAS 9.3TS1M2.