amarikow57
Obsidian | Level 7

I have a dataset with lots of missing data. Rather than impute, I was advised to build two logistic regression prediction models: (1) one that includes only the variables that most participants have observations on, and (2) one that includes all variables but only the participants who have most of their observations.

Originally I was going to use cmiss(of _ALL_) = 0 to obtain complete cases for the second model, but this left me with an incredibly small sample size. Is there a way to keep participants who have, say, 90% of their information filled in? That is, if I have 20 variables (columns), only participants with values in at least 18 of those columns are kept.

Conversely, is there a way to keep only the variables that have at least 90% of their values filled in? That is, if I have 50 participants (rows), only variables with at least 45 non-missing observations are kept. I suppose I could look through them manually and drop variables by name, but I was hoping for something more efficient.


Thank you in advance!


8 replies
yabwon
Onyx | Level 15

How about: cmiss(of _ALL_)/20 <= 0.1?
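A minimal sketch of that row rule in a DATA step, assuming the input data set is WORK.HAVE and the 20 analysis variables are named X1-X20 (all of these names are hypothetical placeholders). Listing the analysis variables explicitly, rather than using _ALL_, keeps the denominator equal to the number of variables actually being checked, even if the data set also contains an ID or outcome column. If the variables are not consecutively numbered, a positional list such as firstvar--lastvar can be used instead.

```sas
/* Sketch: keep participants (rows) with at most 10% of the analysis
   variables missing. WORK.HAVE, WORK.WANT and X1-X20 are hypothetical names. */
data want;
  set have;
  /* CMISS counts missing values of numeric and character variables alike */
  /* 20 variables: at most 2 missing, i.e. at least 18 present */
  if cmiss(of x1-x20) / 20 <= 0.1;
run;
```

The subsetting IF writes only the rows that pass the test to WANT, so no helper variable has to be created and dropped.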

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



amarikow57
Obsidian | Level 7

Hahaha. The answer was so obvious. Thank you! 


Do you know how to do a similar thing for columns instead of rows? Would transposing be the best option?


yabwon
Onyx | Level 15

What do you mean by "similar thing for columns"? Drop a column if it has less than 90% non-missing values?

B




amarikow57
Obsidian | Level 7

Yes.
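A minimal sketch of the column rule that avoids transposing the data, again assuming WORK.HAVE as input and WORK.WANT as output (hypothetical names, as are the macro variables THRESHOLD, COUNTLIST and KEEPLIST). In PROC SQL, COUNT(column) returns the number of non-missing values for both numeric and character columns, so one query can count every column at once; the column list is generated from DICTIONARY.COLUMNS. It assumes ordinary SAS variable names (no name literals) and that at least one variable meets the threshold, because KEEP= with an empty list fails.

```sas
/* Sketch: keep only variables that are non-missing in at least 90% of rows.
   WORK.HAVE, WORK.WANT and the macro variable names are hypothetical. */
%let threshold = 0.9;

proc sql noprint;
  /* 1. Build "count(x1) as x1, count(sex) as sex, ..." from the metadata */
  select catx(' ', cats('count(', name, ')'), 'as', name)
    into :countlist separated by ', '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE';

  /* 2. One row: total row count plus the non-missing count of every column */
  create table nonmiss as
    select count(*) as _nobs, &countlist
    from work.have;
quit;

/* 3. Collect the names of columns whose count meets the threshold */
data _null_;
  set nonmiss;
  array cnt[*] _numeric_;          /* every count column, plus _NOBS */
  length keeplist $32767;
  do i = 1 to dim(cnt);
    if upcase(vname(cnt[i])) ne '_NOBS' and cnt[i] >= &threshold * _nobs then
      keeplist = catx(' ', keeplist, vname(cnt[i]));
  end;
  call symputx('keeplist', keeplist);
run;

/* 4. Subset the columns (KEEPLIST must not be empty) */
data want;
  set have(keep=&keeplist);
run;
```

With 50 rows and THRESHOLD at 0.9, a column is kept when its count is at least 45, matching the rule in the question.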

PGStats
Opal | Level 21

I have faced this problem on several occasions, where you have to decide which combination of variables and observations to retain for an analysis that excludes observations with missing values. I wrote a macro that finds ALL of those combinations, called complete subsets, to help you choose. You might choose, for example, the largest subset (nbVars x nbObs) that includes some key variables.


The macro is described here:


https://communities.sas.com/t5/SAS-Communities-Library/Finding-a-complete-sub-matrix-aka-finding-max... 


Good luck.

PG
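The linked macro enumerates all complete subsets. As a much simpler check that needs no extra procedures, the sketch below scores one hand-picked candidate subset by counting the rows that are complete on it and reporting nbVars x nbObs. This is not PGStats's method; WORK.HAVE and the variable names in CANDIDATE are hypothetical placeholders.

```sas
/* Sketch: score one candidate variable subset by how many rows are complete
   on it. WORK.HAVE and the names in CANDIDATE are hypothetical. */
%let candidate = age sex bmi smoker;

data _null_;
  set have end=last;
  /* sum statement: NCOMPLETE is retained and starts at 0 */
  if cmiss(of &candidate) = 0 then ncomplete + 1;
  if last then do;
    nvars  = countw("&candidate", ' ');
    ncells = nvars * ncomplete;
    put 'Variables: ' nvars 'Complete rows: ' ncomplete 'Cells (nbVars x nbObs): ' ncells;
  end;
run;
```

Rerunning it with different lists in CANDIDATE (for example, always including the key predictors) gives a rough, manual way to compare options; it does not search the combinations the way the macro does.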
amarikow57
Obsidian | Level 7

If a variable is stored as numeric but is to be treated as categorical in the analysis, does this macro still work with it?

amarikow57
Obsidian | Level 7

Ah, never mind. It seems SAS University Edition does not support PROC OPTNET. Thank you, though.

PGStats
Opal | Level 21

Please note that proc optnet was introduced with SAS 9.3TS1M2.

PG
