Hi,
I have a file with almost 1,000 variables. I want a single report that lists every var in my file, the # of missing values, and the number of non-missing values:
Varname nmiss n
var1 100 0
var2 0 100
var3 50 50
....
etc.
I usually use proc iml for this, but I received an error message about not having enough memory, so I am looking for an alternate methodology. I know that missingness for character and numeric variables can be displayed using PROC FREQ, but I really want the output all together instead of separate for each variable. Note that I have too many vars to list all of them by hand, and that I have both numeric and character variables.
Any help is much appreciated!
If you just want to know whether there are missing values use NLEVELS option on PROC FREQ. You can use ODS OUTPUT to make it a dataset.
proc freq data=mydata nlevel;
ods output nlevels=nlevels;
tables _all_ / noprint;
run;
proc print data=nlevels;
run;
If you just want to know whether there are missing values use NLEVELS option on PROC FREQ. You can use ODS OUTPUT to make it a dataset.
proc freq data=mydata nlevel;
ods output nlevels=nlevels;
tables _all_ / noprint;
run;
proc print data=nlevels;
run;
That worked perfectly! Thank you
The DATA step below will process an arbitrary data set HAVE to generate the kind of file you want as long as HAVE contains fewer than 100001 variables (you can change this limit by revaluing the macro variable nVarLim):
%let nVarLim = 100000 ;
data want (keep = varname nmiss n) ;
set have end = z nobs = nobs ;
array nn _numeric_ ;
array cc _character_ ;
array mm [2, &nVarLim] _temporary_ (%eval(2*&nVarLim) * 0) ;
do over nn ;
mm [1, _i_] + missing (nn) ;
end ;
do over cc ;
mm [2, _i_] + missing (nn) ;
end ;
if z ;
do over nn ;
VarName = put (vname (nn), $32.) ;
Nmiss = mm [1, _i_] ;
N = nobs - Nmiss ;
output ;
end ;
do over cc ;
VarName = put (vname (cc), $32.) ;
Nmiss = mm [2, _i_] ;
N = nobs - Nmiss ;
output ;
end ;
run ;
Note that if HAVE should happen to have no numeric or no character variables, you'd get a harmless log warning "Defining an array with zero elements". Of course, instead of setting nVarLim to an insanely big number, you can populate it with the real number of variables in HAVE from the dictionary tables.
An even more dynamic approach that doesn't require even that and confines the entire thing to a single DATA step is to use a hash table instead of the temporary array:
data _null_ ;
if _n_ = 1 then do ;
dcl hash h (ordered:"A") ;
h.definekey ("varname") ;
h.definedata ("varname", "nmiss", "n") ;
h.definedone () ;
end ;
set have end = z ;
array nn _numeric_ ;
array cc _character_ ;
do over nn ;
_n_ = missing (nn) ;
VarName = put (vname (nn), $32.) ;
link getmiss ;
end ;
do over cc ;
_n_ = missing (cc) ;
VarName = put (vname (cc), $32.) ;
link getmiss ;
end ;
if z then h.output (dataset:"want") ;
return ;
getmiss: if h.find() ne 0 then call missing (Nmiss, N) ;
Nmiss + _n_ = 1 ;
N + _n_ = 0 ;
h.replace() ;
return ;
run ;
The above note about the cases where either numeric or character variables are absent remains valid. Also note that if you have a very large input data set, the array-based program is likely to run quite a bit faster, as in the hash program above the FIND and REPLACE methods are called NOBS*NVARS times.
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.