DATA Step, Macro, Functions and more

Is there any macro to check percentage of missing values in a dataset

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

Is there any macro to check percentage of missing values in a dataset

Just feel curious. Thanks. If you have, can you post it here? Smiley Happy


Accepted Solutions
Solution
‎02-21-2012 02:46 PM
PROC Star
Posts: 7,492

Is there any macro to check percentage of missing values in a dataset

You could just use a simple datastep.  e.g.:

/*create a test file with some missing data*/

data have;

  set sashelp.class;

  if _n_ in (5,10,12) then do;

    call missing(height,weight);

  end;

  call missing(age);

run;

data want (keep=percent_missing);

  set have end=last;

  array nums _numeric_;

  array chars _character_;

  do over nums;

    if missing(nums) then m+1;

    else n+1;

  end;

  do over chars;

    if missing(chars) then m+1;

    else n+1;

  end;

  if last then do;

    percent_missing=100*m/sum(m,n);

    output;

  end;

run;

proc print data=want;

run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: Is there any macro to check percentage of missing values in a dataset

Ruth,

I am sure there must be better approaches, I am here just to start the discussion. following code will only work one variable at a time, but by tweaking it, it could also work on multiple variables.

DATA have;

infile cards;

input a @@;

cards;

1 . 2 . 3 4 5 6 7 . 8 . 9

;

%macro mpcnt (dat=, var=);

data _null_;

set &dat end=last nobs=nobs;

if missing(&var) then _n+1;

if last then do;

       _p= _n/nobs;

put "percent=" _p percent7.2 ;

end;

run;

%mend;

%mpcnt (dat=have,var=a)

Or for SQL solution,

%macro sql_pcnt (dat=, var=);

proc sql;

select sum(missing(&var))/count(*) as Missing_percent format percent7.2

from &dat;

quit;

%mend;

%sql_pcnt (dat=have,var=a)

Hope you can have some taste of the logic and work out some solution for your real life data.

Regards,

Haikuo

Super User
Posts: 5,516

Is there any macro to check percentage of missing values in a dataset

Ruth,

I imagine there's not much out there in the way of macros, because SAS handles the problem easily enough without macros.

If you can work with the number (instead of percentage) of missing values, it's as easy as this:

proc means data=mydata maxdec=0 nmiss;

run;

If you really must have percentages, then create an output data set:

proc means data=mydata noprint;

   var _numeric_;

   output out=missings (drop=_type_) nmiss=;

run;

In your output data set MISSINGS, each variable name will contain the number of observations that the original variable was missing.  You will also have the automatic variable _FREQ_, which is the total number of observations.  If you bring this into a DATA step, it will be easy enough to calculate the percentages.

Good luck.

Super User
Posts: 19,869

Is there any macro to check percentage of missing values in a dataset

You can create a format for missing based on numeric and character variables and run proc freq.

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

data have;

     set have;

format _numeric_ nmissfmt. _character_ $missfmt.;

run;

proc freq data=have;

run;

But here's my macro. It needs work since I chopped it together from another macro I use, so not all the code in here is actually required.

%macro sum_missing(libname, dsetin, dsetout);

*Delete old dataset;

proc datasets nodetails nolist;

    delete &dsetout;

quit;

*Upcase all macro variables to have consistency;

data _null_;

    call symput ("libname", put(upcase("&libname."), $8.));

    call symput ("dsetin", put(upcase("&dsetin."), $32.));

run;

*Formats for missing for character and numeric variables;

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

*Copy dataset and apply missing formats to it;

data step1;

    set &libname..&dsetin;

    format _character_ $missfmt. _numeric_ nmissfmt.;

run;

options mprint symbolgen;

*Get variable list;

proc sql noprint;

    select name  into : var_list separated by " "

    from dictionary.columns

    where upcase(libname) ="&libname"

    and upcase(memname)="&dsetin";

    *Get count of total number of observations into macro variable;

    select count(*) into : obs_count

    from &libname..&dsetin.;

quit;

*Start looping through the variable list here;

%let i=1;

%do %while (%scan(&var_list, &i, " ") ^=%str());

%let var=%scan(&var_list, &i, " ");   

*Get format of missing;

data _null_;

set step1;

call symput("var_fmt", vformat(&var));

call symput("var_label", vlabel(&var));

run;

*Get count of missing;

proc freq data=step1 noprint;

    table &var/missing out=cross_tab1;

run;

data cross_tab2;

    set cross_tab1;

    length variable $50.;

    category=put(&var, &var_fmt.);

    variable="&var_label.";

   

    if _n_=1 and category='Not Missing' then do;;

        Number_Missing=&obs_count-count;

        Percent_Missing=Number_Missing/&obs_count.;

        percent=percent/100;

        output;

    end;

    else if _n_=1 and category='Missing' then do;

       

        Number_Missing=count;

        Percent_Missing=percent/100;

        output;

    end;

        format percent: percent10.1;

    keep variable Number_Missing Percent_Missing;

run;

proc append base=&dsetout data=cross_tab2 force;

run;

proc datasets nodetails nolist;

    delete cross_tab1 cross_tab2;

run; quit;

*Increment counter;

%let i=%eval(&i+1);

%end; *Categorical;

proc datasets nodetails nolist;

    delete step1;

run; quit;

%mend;

Contributor
Posts: 52

Is there any macro to check percentage of missing values in a dataset

A few good answers now. Thanks.

At the same time, can I ask if a program can automatically create a new temporary dataset which reports the percentage of missing values for all variables? Probably some use of the dictionary is needed.

Solution
‎02-21-2012 02:46 PM
PROC Star
Posts: 7,492

Is there any macro to check percentage of missing values in a dataset

You could just use a simple datastep.  e.g.:

/*create a test file with some missing data*/

data have;

  set sashelp.class;

  if _n_ in (5,10,12) then do;

    call missing(height,weight);

  end;

  call missing(age);

run;

data want (keep=percent_missing);

  set have end=last;

  array nums _numeric_;

  array chars _character_;

  do over nums;

    if missing(nums) then m+1;

    else n+1;

  end;

  do over chars;

    if missing(chars) then m+1;

    else n+1;

  end;

  if last then do;

    percent_missing=100*m/sum(m,n);

    output;

  end;

run;

proc print data=want;

run;

Respected Advisor
Posts: 3,156

Is there any macro to check percentage of missing values in a dataset

Art, Wow! I did not know there is such a thing called 'do over'. I was thinking using proc content, and get the counts of numeric and character variable into macro variables. Much appreciated!

Haikuo

PROC Star
Posts: 7,492

Is there any macro to check percentage of missing values in a dataset

Just be alerted that some users have said that we shouldn't use that form of a loop because it is no longer documented.

I happen to like it and, unlike functions, etc. that have never been documented, this one has been, but isn't in the current documentation.

Super User
Posts: 10,044

Is there any macro to check percentage of missing values in a dataset

Or SQL.

/*create a test file with some missing data*/
data have;
  set sashelp.class;
  if _n_ in (5,10,12) then do;
    call missing(height,weight);
  end;
  call missing(age);
run;
proc sql ;
 select name into : list separated by ','
  from dictionary.columns
   where libname='WORK' and memname='HAVE';

 select nvar into : nvar
  from dictionary.tables
   where libname='WORK' and memname='HAVE';

 select count(*) into : total from have;

 select sum(cmiss(&list))/(&nvar * &total) as percent
  from have;

quit;

Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 949 views
  • 6 likes
  • 6 in conversation