BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Walternate
Obsidian | Level 7

Hi,

 

I have a file with almost 1,000 variables. I want a single report that lists every var in my file, the # of missing values, and the number of non-missing values:

 

Varname       nmiss      n

var1               100         0

var2                 0        100

var3               50         50

....

etc.

 

I usually use proc iml for this, but I received an error message about not having enough memory, so I am looking for an alternate methodology. I know that missingness for character and numeric variables can be displayed using PROC FREQ, but I really want the output all together instead of separate for each variable. Note that I have too many vars to list all of them by hand, and that I have both numeric and character variables. 

 

Any help is much appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you just want to know whether there are missing values use NLEVELS option on PROC FREQ. You can use ODS OUTPUT to make it a dataset.

proc freq data=mydata nlevel;
  ods output nlevels=nlevels;
  tables _all_ / noprint;
run;

proc print data=nlevels;
run;

 

 

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

If you just want to know whether there are missing values use NLEVELS option on PROC FREQ. You can use ODS OUTPUT to make it a dataset.

proc freq data=mydata nlevel;
  ods output nlevels=nlevels;
  tables _all_ / noprint;
run;

proc print data=nlevels;
run;

 

 

Walternate
Obsidian | Level 7

That worked perfectly! Thank you Smiley Happy

hashman
Ammonite | Level 13

@Walternate :

The DATA step below will process an arbitrary data set HAVE to generate the kind of file you want as long as HAVE contains fewer than 100001 variables (you can change this limit by revaluing the macro variable nVarLim):

%let nVarLim = 100000 ;                                       
                                                              
data want (keep = varname nmiss n) ;                          
  set have end = z nobs = nobs ;                              
  array nn _numeric_ ;                                        
  array cc _character_ ;                                      
  array mm [2, &nVarLim] _temporary_ (%eval(2*&nVarLim) * 0) ;
  do over nn ;                                                
    mm [1, _i_] + missing (nn) ;                              
  end ;                                                       
  do over cc ;                                                
    mm [2, _i_] + missing (nn) ;                              
  end ;                                                       
  if z ;                                                      
  do over nn ;                                                
    VarName = put (vname (nn), $32.) ;                        
    Nmiss = mm [1, _i_] ;                                     
    N = nobs - Nmiss ;                                        
    output ;                                                  
  end ;                                                       
  do over cc ;                                                
    VarName = put (vname (cc), $32.) ;                        
    Nmiss = mm [2, _i_] ;                                     
    N = nobs - Nmiss ;                                        
    output ;                                                  
  end ;                                                       
run ;                                                         

Note that if HAVE should happen to have no numeric or no character variables, you'd get a harmless log warning "Defining an array with zero elements". Of course, instead of setting nVarLim to an insanely big number, you can populate it with the real number of variables in HAVE from the dictionary tables.

 

An even more dynamic approach that doesn't require even that and confines the entire thing to a single DATA step is to use a hash table instead of the temporary array:

data _null_ ;                                             
  if _n_ = 1 then do ;                                    
    dcl hash h (ordered:"A") ;                            
    h.definekey ("varname") ;                             
    h.definedata ("varname", "nmiss", "n") ;              
    h.definedone () ;                                     
  end ;                                                   
  set have end = z ;                                      
  array nn _numeric_ ;                                    
  array cc _character_ ;                                  
  do over nn ;                                            
    _n_ = missing (nn) ;                                  
    VarName = put (vname (nn), $32.) ;                    
    link getmiss ;                                        
  end ;                                                   
  do over cc ;                                            
    _n_ = missing (cc) ;                                  
    VarName = put (vname (cc), $32.) ;                    
    link getmiss ;                                        
  end ;                                                   
  if z then h.output (dataset:"want") ;                   
  return ;                                                
  getmiss: if h.find() ne 0 then call missing (Nmiss, N) ;
           Nmiss + _n_ = 1 ;                              
           N     + _n_ = 0 ;                              
           h.replace() ;                                  
  return ;                                                
run ;                                                     

The above note about the cases where either numeric or character variables are absent remains valid. Also note that if you have a very large input data set, the array-based program is likely to run quite a bit faster, as in the hash program above the FIND and REPLACE methods are called NOBS*NVARS times. 

 

Kind regards

Paul D.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1018 views
  • 2 likes
  • 3 in conversation