BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Wolverine
Pyrite | Level 9

This seems like it should be quite easy to do, but I haven't found a way yet.  I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables.  I don't want to run proc freq, as this will give me a huge list of every name in the dataset. 

 

My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs.  But that's time consuming and tedious.  Isn't there an option in proc freq or another procedure to do this automatically?

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Wolverine,

 


@Wolverine wrote:

I don't want to run proc freq, as this will give me a huge list of every name in the dataset.


This doesn't happen if you apply a suitable format to the variables in question such as

proc format;
value miss
._-.z = 'missing'
other = 'non-missing';

value $miss
' ' = 'missing'
other = 'non-missing';
run;

proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;

View solution in original post

5 REPLIES 5
FreelanceReinh
Jade | Level 19

Hello @Wolverine,

 


@Wolverine wrote:

I don't want to run proc freq, as this will give me a huge list of every name in the dataset.


This doesn't happen if you apply a suitable format to the variables in question such as

proc format;
value miss
._-.z = 'missing'
other = 'non-missing';

value $miss
' ' = 'missing'
other = 'non-missing';
run;

proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;
Reeza
Super User

Here's a variant on the program from @FreelanceReinh that cleans up the display into a few nice options.

 

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

Change the macro variables in the code and re-run it on your own data set to get your own results.

It should be pretty fast.

 

*change these to what you need in code above;
%let INPUT_DSN = class; 
%let OUTPUT_DSN = want;

@Wolverine wrote:

This seems like it should be quite easy to do, but I haven't found a way yet.  I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables.  I don't want to run proc freq, as this will give me a huge list of every name in the dataset. 

 

My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs.  But that's time consuming and tedious.  Isn't there an option in proc freq or another procedure to do this automatically?


 

Ksharp
Super User
%let lib=sashelp;
%let dsn=heart;



data _null_;
 set sashelp.vcolumn(where=(libname=upcase("&lib") and memname=upcase("&dsn"))) end=last;
 if _n_=1 then call execute('proc sql;create table want as select ');
 call execute(cat("nmiss(",name,")/(select count(*) from &lib..&dsn ) as ",name," format=percent8.2"));
 if last then call execute("from &lib..&dsn ;quit;" );
  else call execute(',');
run;

Wolverine
Pyrite | Level 9

I accepted FreelanceReinhard's solution because it's simple and easy to apply, and it serves my purposes quite well for this particular data.  But I will make a note of Reeza's macro, as I anticipate it could be useful in the future for larger/more complex datasets.  Thanks!

Reeza
Super User
Just a note, that my code is not a macro.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1018 views
  • 7 likes
  • 4 in conversation