How to count non-missing rows for *all* columns and store results in a single data set

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

How to count non-missing rows for *all* columns and store results in a single data set

[ Edited ]

Hi everyone,

 

I am trying to find an elegant way to collect some metadata for all columns in a data set. What I want to know is, for each and every variable in a data set, how many rows are populated? I would like the result of this query to be output into a single data set which contains one row per variable in the original data, and a column giving the number of non-missing rows for that variable. Here's some example data and output:

 

data demo_data;
  infile datalines dsd truncover;
  input Make:$13. Model:$40. Type:$8. Origin:$6. DriveTrain:$5. MSRP:DOLLAR8. Invoice:DOLLAR8. EngineSize:32. Cylinders:32. Horsepower:32. MPG_City:32. MPG_Highway:32. Weight:32. Wheelbase:32. Length:32.;
datalines4;
Acura,,SUV,Asia,All,"$36,945","$33,337",3.5,6,,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820","$21,761",2,4,,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990","$24,647",2.4,4,,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195","$30,299",,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755","$39,014",,6,225,18,24,3880,115,197
;;;;

Desired output:

 

namenum_not_missing
Cylinders5
DriveTrain5
EngineSize3
Horsepower2
Invoice5
Length5
MPG_City5
MPG_Highway5
MSRP5
Make5
Model4
Origin5
Type5
Weight5
Wheelbase5

 

... where I have inserted missing values in the demo dataset and the resulting metadata shows that some columns contain missing values.

 

I do have a solution to this problem already, but I would like to know whether there is a more elegant or SAS-like way to get the same results (if only to satisfy my own curiosity!). My current (inelegant) solution is as follows:

 

proc contents data=demo_data out=cont_demo_data(keep=name varnum type length) noprint; run;
proc sql noprint;
	select catx(" ", "count(", name, ") as ", name)
	into :list separated by ', '
	from cont_demo_data;
quit;
proc sql;
	create table counts_demo_data as
	select
		&list.
	from demo_data;
quit;
proc transpose 
	data = counts_demo_data
	out = counts_demo_data_t(rename=(_NAME_=name COL1=num_not_missing));
run;

So as you can see, it's somewhat inelegant but it does at least get to the correct answer.

 

Other approaches I've tried that don't give me the result I need:

  • Using proc means as this will only work for numeric variables
  • proc freq with _NUMERIC_ or _CHAR_ in the tables statement - this gives a separate table for each variable, not all in the same table

Thanks!

 

Adam

 

p.s. I'm using base SAS 9.2


Accepted Solutions
Solution
‎10-17-2017 07:32 AM
Trusted Advisor
Posts: 1,312

Re: How to count non-missing rows for *all* columns and store results in a single data set

First, it is possible to use PROC FREQ to generate a single dataset with all the variables and their frequencies.  But instead of the OUTPUT statement, you have to use ODS OUTPUT:

 

proc format;
  value nmis .='.' other='x';
  value $cmis ' '='.' other='x';
run;

ods output onewayfreqs=freq (keep=table frequency f_:);
ods listing close;
proc freq data=demo_data ;
  tables _all_ / missing ;
  format _numeric_ nmis.  _character_ $cmis.;
run;
ods listing;
ods output close;

 

But I think your program is more compact.  And you can make it even more compact by dropping the PROC CONTENTS, and having the first proc sql read from dictionary.columns:

 

proc sql noprint;
	select catx(" ", "count(", name, ") as ", name)
	into :list separated by ', '
	from dictionary.columns 
          where libname='WORK' and memname='DEMO_DATA';
quit;

 

And I suppose you can put the CREATE TABLE statement in the first PROC SQL step.

View solution in original post


All Replies
Super User
Posts: 23,323

Re: How to count non-missing rows for *all* columns and store results in a single data set

In SAS 9.2 that's probably as elegant as you're going to get, and even in 9.4. 

 

The solution I was going to suggest, isn't a succinct. The only risk is for big data sets, where the macro variable may be longer than the 32K character limit. 

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

 

Full non macro solution

Valued Guide
Posts: 2,191

Re: How to count non-missing rows for *all* columns and store results in a single data set

proc summary data= your_data ;
var _numeric_ ;
output nmiss= ;
proc transpose data= _last_( drop= _FREQ_ _TYPE_ )
         out= nmiss_counts(rename= col1 = nmiss ) ;
run ;

should work in any old release

unless you have columns in your_data  named _type_ or _freq_  --> then I wouldn't know

Valued Guide
Posts: 2,191

Re: How to count non-missing rows for *all* columns and store results in a single data set

proc summary data= your_data ;
var _numeric_ ;
output n= ;
proc transpose data= _last_( drop= _FREQ_ _TYPE_ )
       out= non_miss_counts(rename= col1 = n ) ;
run ;

revised to provide counts of non-missing (proc means / summary statistic N) rather than the NMISS statistic

Senior User
Posts: 1

Re: How to count non-missing rows for *all* columns and store results in a single data set

Based on your data demo_data, I built an 'elegant' macro to give you the desired output dataset.

 

The basic idea is to 1) use sas sql to count the number of non-missing values; 2) use union to concatenate the names and the non-missing values; 3) use macro and do-loop to go through the variable list and repeat the step1 and 2.

 

So, as shown in the codes, I first created the variable list, and then calculate the length of the list that will be used for do-loop based on the position of each variable. With the help of do-loop, I combined each variable name and its non-missing value into one dataset, starting from the second variable (that's why I used 'if then' to control the loop). And voila. below is the output dataset.

 

 

 Capture1013.PNGNon-missing value dataset

%let varlist=Make#Model#Type#Origin#DriveTrain#MSRP#Invoice#EngineSize#Cylinders#Horsepower#MPG_City#MPG_Highway#Weight#Wheelbase#Length;                                                                                         
                                                                                                                                                                                                                                  
%macro nmiss;                                                                                                                                                                                                                     
%let varnum=%eval(1+(%length(%sysfunc(compbl(&varlist))) - %length(%sysfunc(compress(&varlist,'#')))));                                                                                                                           
proc sql;                                                                                                                                                                                                                         
create table nmiss as                                                                                                                                                                                                             
%do i= 1 %to &varnum.;                                                                                                                                                                                                            
%if &i ne 1 %then union;                                                                                                                                                                                                          
select "%scan(&varlist.,&i,'#')" as name, count(%scan(&varlist.,&i,'#'))  as num_not_missing                                                                                                                                      
from demo_data                                                                                                                                                                                                                    
%end;                                                                                                                                                                                                                             
;                                                                                                                                                                                                                                 
quit;                                                                                                                                                                                                                             
%mend;                                                                                                                                                                                                                            
%nmiss; 

 

Well, if you are not comfortable with macro,  you can also use the following codes in sas sql. The coding could be long depending on the number of the variables, but the idea is really straightforward.

 

For example:

select 'Make' as name, count(Make) as num_not_missing from demo_data

union

select 'Model' as name, count(Model) as num_not_missing from demo_data

union

...

Respected Advisor
Posts: 4,687

Re: How to count non-missing rows for *all* columns and store results in a single data set

[ Edited ]

@adam_t

Here a data step approach which should work for SAS 9.2 and for any number of variables in your source data set.

data demo_data;
  infile datalines dsd truncover;
  input Make:$13. Model:$40. Type:$8. Origin:$6. DriveTrain:$5. MSRP:DOLLAR8. Invoice:DOLLAR8. EngineSize:32. Cylinders:32. Horsepower:32. MPG_City:32. MPG_Highway:32. Weight:32. Wheelbase:32. Length:32.;
datalines4;
Acura,,SUV,Asia,All,"$36,945","$33,337",3.5,6,,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820","$21,761",2,4,,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990","$24,647",2.4,4,,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195","$30,299",,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755","$39,014",,6,225,18,24,3880,115,197
;;;;
run;

data _null_;
  set demo_data end=last;
  array _c_var {*} _character_;
  array _n_var {*} _numeric_;

  /* hash to collect variable stats */
  if _n_=1 then 
    do;
      length name $32 num_not_missing 8;
      dcl hash h1(ordered:'y');
      h1.defineKey('name');
      h1.defineData('name','num_not_missing');
      h1.defineDone();
    end;

  /* stats for char vars */
  do i=1 to dim(_c_var);
    name=vname(_c_var[i]);
    num_not_missing=0;
    if h1.find() ne 0 then h1.add(); 
    if not missing(_c_var[i]) then 
      do;
        num_not_missing= sum(num_not_missing, 1);
        _rc=h1.replace();
      end;
  end;

  /* stats for num vars */
  do i=1 to dim(_n_var);
    name=vname(_n_var[i]);
    num_not_missing=0;
    if h1.find() ne 0 then h1.add(); 
    if not missing(_n_var[i]) then 
      do;
        num_not_missing= sum(num_not_missing, 1);
        _rc=h1.replace();
      end;
  end;

  /* write stats to table want */
  if last then h1.output(dataset:'want');

run;

proc print data=want noobs;
run;

 

Solution
‎10-17-2017 07:32 AM
Trusted Advisor
Posts: 1,312

Re: How to count non-missing rows for *all* columns and store results in a single data set

First, it is possible to use PROC FREQ to generate a single dataset with all the variables and their frequencies.  But instead of the OUTPUT statement, you have to use ODS OUTPUT:

 

proc format;
  value nmis .='.' other='x';
  value $cmis ' '='.' other='x';
run;

ods output onewayfreqs=freq (keep=table frequency f_:);
ods listing close;
proc freq data=demo_data ;
  tables _all_ / missing ;
  format _numeric_ nmis.  _character_ $cmis.;
run;
ods listing;
ods output close;

 

But I think your program is more compact.  And you can make it even more compact by dropping the PROC CONTENTS, and having the first proc sql read from dictionary.columns:

 

proc sql noprint;
	select catx(" ", "count(", name, ") as ", name)
	into :list separated by ', '
	from dictionary.columns 
          where libname='WORK' and memname='DEMO_DATA';
quit;

 

And I suppose you can put the CREATE TABLE statement in the first PROC SQL step.

New Contributor
Posts: 2

Re: How to count non-missing rows for *all* columns and store results in a single data set

Thank you everyone for your responses, this was really helpful! I've some great ideas to go away and try now.

 

Adam

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 746 views
  • 7 likes
  • 6 in conversation