06-28-2017 12:17 PM
Hi, I have a problem of output observations which contains missing values in some variables, but not all the variables.
Currentlly, I have a dataset, which contains 53 character variables and 24 numeric variables, and about 100000 observations. I would like to output a new table which contains those observations, which have missing values in some variables, but not all the variables.
Or, I like to output each table for each variable, which only contains observations which contain missing values under this variable.
Could anyone show me code to do this?
06-28-2017 01:55 PM
The second is very easy though routine and tedious repeatedly:
where missing (varname);
Repeat for each variable of interest.
If your variable names are not too long you could automate this with:
proc sql; create table vars as select name from dictionary.columns where libname='MYLIB' and memname='SETNAME' ; quit; data _null_; set vars; call execute ("data "|| cats('_',name)||';'); call execute ("set MYLIB.SETNAME;"); call execute ("where missing("||name||");"); call execute ("run;"); run;
MYLIB should be replaced with the name of your library or WORK if you didn't specify and SETNAME with the name of your data set.
In the Proc SQL code those should be in upper case as that is how they are stored in the Dictionary.Columns table.
Missing in some but not all needs additional explanation as you have a lot of possible combinations.