BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Wolverine
Pyrite | Level 9

This seems like it should be quite easy to do, but I haven't found a way yet.  I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables.  I don't want to run proc freq, as this will give me a huge list of every name in the dataset. 

 

My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs.  But that's time consuming and tedious.  Isn't there an option in proc freq or another procedure to do this automatically?

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Wolverine,

 


@Wolverine wrote:

I don't want to run proc freq, as this will give me a huge list of every name in the dataset.


This doesn't happen if you apply a suitable format to the variables in question such as

proc format;
value miss
._-.z = 'missing'
other = 'non-missing';

value $miss
' ' = 'missing'
other = 'non-missing';
run;

proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;

View solution in original post

5 REPLIES 5
FreelanceReinh
Jade | Level 19

Hello @Wolverine,

 


@Wolverine wrote:

I don't want to run proc freq, as this will give me a huge list of every name in the dataset.


This doesn't happen if you apply a suitable format to the variables in question such as

proc format;
value miss
._-.z = 'missing'
other = 'non-missing';

value $miss
' ' = 'missing'
other = 'non-missing';
run;

proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;
Reeza
Super User

Here's a variant on the program from @FreelanceReinh that cleans up the display into a few nice options.

 

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

Change the macro variables in the code and re-run it on your own data set to get your own results.

It should be pretty fast.

 

*change these to what you need in code above;
%let INPUT_DSN = class; 
%let OUTPUT_DSN = want;

@Wolverine wrote:

This seems like it should be quite easy to do, but I haven't found a way yet.  I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables.  I don't want to run proc freq, as this will give me a huge list of every name in the dataset. 

 

My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs.  But that's time consuming and tedious.  Isn't there an option in proc freq or another procedure to do this automatically?


 

Ksharp
Super User
%let lib=sashelp;
%let dsn=heart;



data _null_;
 set sashelp.vcolumn(where=(libname=upcase("&lib") and memname=upcase("&dsn"))) end=last;
 if _n_=1 then call execute('proc sql;create table want as select ');
 call execute(cat("nmiss(",name,")/(select count(*) from &lib..&dsn ) as ",name," format=percent8.2"));
 if last then call execute("from &lib..&dsn ;quit;" );
  else call execute(',');
run;

Wolverine
Pyrite | Level 9

I accepted FreelanceReinhard's solution because it's simple and easy to apply, and it serves my purposes quite well for this particular data.  But I will make a note of Reeza's macro, as I anticipate it could be useful in the future for larger/more complex datasets.  Thanks!

Reeza
Super User
Just a note, that my code is not a macro.
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2158 views
  • 7 likes
  • 4 in conversation