BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ruth
Fluorite | Level 6

Just feel curious. Thanks. If you have, can you post it here? Smiley Happy

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

You could just use a simple datastep.  e.g.:

/*create a test file with some missing data*/

data have;

  set sashelp.class;

  if _n_ in (5,10,12) then do;

    call missing(height,weight);

  end;

  call missing(age);

run;

data want (keep=percent_missing);

  set have end=last;

  array nums _numeric_;

  array chars _character_;

  do over nums;

    if missing(nums) then m+1;

    else n+1;

  end;

  do over chars;

    if missing(chars) then m+1;

    else n+1;

  end;

  if last then do;

    percent_missing=100*m/sum(m,n);

    output;

  end;

run;

proc print data=want;

run;

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

Ruth,

I am sure there must be better approaches, I am here just to start the discussion. following code will only work one variable at a time, but by tweaking it, it could also work on multiple variables.

DATA have;

infile cards;

input a @@;

cards;

1 . 2 . 3 4 5 6 7 . 8 . 9

;

%macro mpcnt (dat=, var=);

data _null_;

set &dat end=last nobs=nobs;

if missing(&var) then _n+1;

if last then do;

       _p= _n/nobs;

put "percent=" _p percent7.2 ;

end;

run;

%mend;

%mpcnt (dat=have,var=a)

Or for SQL solution,

%macro sql_pcnt (dat=, var=);

proc sql;

select sum(missing(&var))/count(*) as Missing_percent format percent7.2

from &dat;

quit;

%mend;

%sql_pcnt (dat=have,var=a)

Hope you can have some taste of the logic and work out some solution for your real life data.

Regards,

Haikuo

Astounding
PROC Star

Ruth,

I imagine there's not much out there in the way of macros, because SAS handles the problem easily enough without macros.

If you can work with the number (instead of percentage) of missing values, it's as easy as this:

proc means data=mydata maxdec=0 nmiss;

run;

If you really must have percentages, then create an output data set:

proc means data=mydata noprint;

   var _numeric_;

   output out=missings (drop=_type_) nmiss=;

run;

In your output data set MISSINGS, each variable name will contain the number of observations that the original variable was missing.  You will also have the automatic variable _FREQ_, which is the total number of observations.  If you bring this into a DATA step, it will be easy enough to calculate the percentages.

Good luck.

Reeza
Super User

You can create a format for missing based on numeric and character variables and run proc freq.

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

data have;

     set have;

format _numeric_ nmissfmt. _character_ $missfmt.;

run;

proc freq data=have;

run;

But here's my macro. It needs work since I chopped it together from another macro I use, so not all the code in here is actually required.

%macro sum_missing(libname, dsetin, dsetout);

*Delete old dataset;

proc datasets nodetails nolist;

    delete &dsetout;

quit;

*Upcase all macro variables to have consistency;

data _null_;

    call symput ("libname", put(upcase("&libname."), $8.));

    call symput ("dsetin", put(upcase("&dsetin."), $32.));

run;

*Formats for missing for character and numeric variables;

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

*Copy dataset and apply missing formats to it;

data step1;

    set &libname..&dsetin;

    format _character_ $missfmt. _numeric_ nmissfmt.;

run;

options mprint symbolgen;

*Get variable list;

proc sql noprint;

    select name  into : var_list separated by " "

    from dictionary.columns

    where upcase(libname) ="&libname"

    and upcase(memname)="&dsetin";

    *Get count of total number of observations into macro variable;

    select count(*) into : obs_count

    from &libname..&dsetin.;

quit;

*Start looping through the variable list here;

%let i=1;

%do %while (%scan(&var_list, &i, " ") ^=%str());

%let var=%scan(&var_list, &i, " ");   

*Get format of missing;

data _null_;

set step1;

call symput("var_fmt", vformat(&var));

call symput("var_label", vlabel(&var));

run;

*Get count of missing;

proc freq data=step1 noprint;

    table &var/missing out=cross_tab1;

run;

data cross_tab2;

    set cross_tab1;

    length variable $50.;

    category=put(&var, &var_fmt.);

    variable="&var_label.";

   

    if _n_=1 and category='Not Missing' then do;;

        Number_Missing=&obs_count-count;

        Percent_Missing=Number_Missing/&obs_count.;

        percent=percent/100;

        output;

    end;

    else if _n_=1 and category='Missing' then do;

       

        Number_Missing=count;

        Percent_Missing=percent/100;

        output;

    end;

        format percent: percent10.1;

    keep variable Number_Missing Percent_Missing;

run;

proc append base=&dsetout data=cross_tab2 force;

run;

proc datasets nodetails nolist;

    delete cross_tab1 cross_tab2;

run; quit;

*Increment counter;

%let i=%eval(&i+1);

%end; *Categorical;

proc datasets nodetails nolist;

    delete step1;

run; quit;

%mend;

Ruth
Fluorite | Level 6

A few good answers now. Thanks.

At the same time, can I ask if a program can automatically create a new temporary dataset which reports the percentage of missing values for all variables? Probably some use of the dictionary is needed.

art297
Opal | Level 21

You could just use a simple datastep.  e.g.:

/*create a test file with some missing data*/

data have;

  set sashelp.class;

  if _n_ in (5,10,12) then do;

    call missing(height,weight);

  end;

  call missing(age);

run;

data want (keep=percent_missing);

  set have end=last;

  array nums _numeric_;

  array chars _character_;

  do over nums;

    if missing(nums) then m+1;

    else n+1;

  end;

  do over chars;

    if missing(chars) then m+1;

    else n+1;

  end;

  if last then do;

    percent_missing=100*m/sum(m,n);

    output;

  end;

run;

proc print data=want;

run;

Haikuo
Onyx | Level 15

Art, Wow! I did not know there is such a thing called 'do over'. I was thinking using proc content, and get the counts of numeric and character variable into macro variables. Much appreciated!

Haikuo

art297
Opal | Level 21

Just be alerted that some users have said that we shouldn't use that form of a loop because it is no longer documented.

I happen to like it and, unlike functions, etc. that have never been documented, this one has been, but isn't in the current documentation.

Ksharp
Super User

Or SQL.

/*create a test file with some missing data*/
data have;
  set sashelp.class;
  if _n_ in (5,10,12) then do;
    call missing(height,weight);
  end;
  call missing(age);
run;
proc sql ;
 select name into : list separated by ','
  from dictionary.columns
   where libname='WORK' and memname='HAVE';

 select nvar into : nvar
  from dictionary.tables
   where libname='WORK' and memname='HAVE';

 select count(*) into : total from have;

 select sum(cmiss(&list))/(&nvar * &total) as percent
  from have;

quit;

Ksharp

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3504 views
  • 6 likes
  • 6 in conversation