BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kegyp20057
Calcite | Level 5

I hope this is an easy question.  I have a dataset of survey responses.  Most of the survey reponses use either a likert response scale (ex, very dissatisfied/dissatisfied/satisfied/very satisfied) or yes/no response scale.  I'm looking for a way to generate a table that lists how many overall valid and missing responses for each variable.  Ideally, the report would look something like this:

Survey Question# Valid# Missing
I am satisfied with my education.1,254139
I graduated in four years or less.1,109284
I would recommend this institution to others.1,3930
It is important to participate in community service activities.1,008385

The data look like this:

EducationGraduatedRecommendCommServ
Very SatisfiedYesYes
NoVery Important
DissatisfiedYesNot Important

Thanks in advance for you help.

Karen E.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I have a macro you can use, but it counts missing, which may or may not be the same as invalid data, depending on how much you cleaned it up ahead of time.

It uses your datalabels from the dataset so the output dataset is formatted nicely.  Its definitely not efficient, but it works and I haven't seen reason to modify it Smiley Happy

/*This is a macro to summarize a data set into missing and non-missing observations*/

%macro sum_missing(libname, dsetin, dsetout);

*Delete old dataset;

proc datasets nodetails nolist;

    delete &dsetout;

quit;

*Upcase all macro variables to have consistency;

data _null_;

    call symput ("libname", upcase("&libname."));

    call symput ("dsetin", upcase("&dsetin."));

run;

*Formats for missing for character and numeric variables;

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

ods table variablesshort=varshort;

proc contents data=&libname..&dsetin. short;

run;

data _null_;

set varshort;

call symput ("var_list", variables);

run;

proc sql noprint;

    *Get count of total number of observations into macro variable;

    select count(*) into : obs_count

    from &libname..&dsetin.;

quit;

*Start looping through the variable list here;

%let i=1;

%do %while (%scan(&var_list, &i, " ") ^=%str());

%let var=%scan(&var_list, &i, " ");   

*Get count of missing;

proc freq data=&libname..&dsetin. noprint;

    table &var/missing out=cross_tab1;

    format _character_ $missfmt. _numeric_ nmissfmt.;

run;

*Get format of missing;

data _null_;

set cross_tab1;

call symput("var_fmt", vformat(&var));

call symput("var_label", vlabel(&var));

run;

data cross_tab2;

    set cross_tab1;

    length variable $50.;

    category=put(&var, &var_fmt.);

    variable="&var_label.";

   

    if _n_=1 and category='Not Missing' then do;;

        Number_Missing=&obs_count-count;

        Percent_Missing=Number_Missing/&obs_count.;

        percent=percent/100;

        output;

    end;

    else if _n_=1 and category='Missing' then do;

       

        Number_Missing=count;

        Percent_Missing=percent/100;

        output;

    end;

        format percent: percent10.1;

    keep variable Number_Missing Percent_Missing;

run;

proc append base=&dsetout data=cross_tab2 force;

run;

proc datasets nodetails nolist;

    delete cross_tab1 cross_tab2;

run; quit;

*Increment counter;

%let i=%eval(&i+1);

%end; *Categorical;

proc datasets nodetails nolist;

    delete step1;

run; quit;

%mend;

data class;

set sashelp.class;

if age=14 then call missing(height, weight, sex);

if name='Alfred' then call missing(sex, age, height);

label age="Fancy Age Label";

run;

%sum_missing(work, class, class_missing);

View solution in original post

7 REPLIES 7
ballardw
Super User

Procs Report, Tabulate and Means/Summary  support the N, number values not missing, and NMISS, number missing, statistics.

If your responses are not numeric you probably want Report or Tabulate.

An example for Tabulate:

proc tabulate data=have;

     class q1 q2 q3 / missing; /* you want to count missing*/

     table q1 q2 q3, n='# valid'*f=comma6. nmiss='# missing'*f=comma6.;

run;

Where "have" is your data set and q1 q2 q3 are your question variable names.

If you have labels assigned to your variables you should get a nice table similar to what you wanted

kegyp20057
Calcite | Level 5

Hi,

Thanks for your suggestion.  This was helpful.  However, the only issue is that it would give me the number of valid/missing per response value as well as the question.  I want it to only return the question.  However, I have the answer I needed.

Thanks,

Karen

Reeza
Super User

I have a macro you can use, but it counts missing, which may or may not be the same as invalid data, depending on how much you cleaned it up ahead of time.

It uses your datalabels from the dataset so the output dataset is formatted nicely.  Its definitely not efficient, but it works and I haven't seen reason to modify it Smiley Happy

/*This is a macro to summarize a data set into missing and non-missing observations*/

%macro sum_missing(libname, dsetin, dsetout);

*Delete old dataset;

proc datasets nodetails nolist;

    delete &dsetout;

quit;

*Upcase all macro variables to have consistency;

data _null_;

    call symput ("libname", upcase("&libname."));

    call symput ("dsetin", upcase("&dsetin."));

run;

*Formats for missing for character and numeric variables;

proc format;

value $ missfmt ' '="Missing"

other="Not Missing"

;

value nmissfmt . ="Missing"

other="Not Missing"

;

run;

ods table variablesshort=varshort;

proc contents data=&libname..&dsetin. short;

run;

data _null_;

set varshort;

call symput ("var_list", variables);

run;

proc sql noprint;

    *Get count of total number of observations into macro variable;

    select count(*) into : obs_count

    from &libname..&dsetin.;

quit;

*Start looping through the variable list here;

%let i=1;

%do %while (%scan(&var_list, &i, " ") ^=%str());

%let var=%scan(&var_list, &i, " ");   

*Get count of missing;

proc freq data=&libname..&dsetin. noprint;

    table &var/missing out=cross_tab1;

    format _character_ $missfmt. _numeric_ nmissfmt.;

run;

*Get format of missing;

data _null_;

set cross_tab1;

call symput("var_fmt", vformat(&var));

call symput("var_label", vlabel(&var));

run;

data cross_tab2;

    set cross_tab1;

    length variable $50.;

    category=put(&var, &var_fmt.);

    variable="&var_label.";

   

    if _n_=1 and category='Not Missing' then do;;

        Number_Missing=&obs_count-count;

        Percent_Missing=Number_Missing/&obs_count.;

        percent=percent/100;

        output;

    end;

    else if _n_=1 and category='Missing' then do;

       

        Number_Missing=count;

        Percent_Missing=percent/100;

        output;

    end;

        format percent: percent10.1;

    keep variable Number_Missing Percent_Missing;

run;

proc append base=&dsetout data=cross_tab2 force;

run;

proc datasets nodetails nolist;

    delete cross_tab1 cross_tab2;

run; quit;

*Increment counter;

%let i=%eval(&i+1);

%end; *Categorical;

proc datasets nodetails nolist;

    delete step1;

run; quit;

%mend;

data class;

set sashelp.class;

if age=14 then call missing(height, weight, sex);

if name='Alfred' then call missing(sex, age, height);

label age="Fancy Age Label";

run;

%sum_missing(work, class, class_missing);

kegyp20057
Calcite | Level 5

Hi Reeza,

Hopefully, I'm replying to the correct message.  I'm new to this forum.  Thanks for your help.  This definitely solves my problem.

Karen

Scott_Mitchell
Quartz | Level 8

Hi Kegyp,

I am glad you posted this question.  It has been a long time since I have used Proc Tabulate, even a simple one like this and it gave me an excuse to brush up on my skills.

I hope it helps you, if you haven't solved this problem already.

DATA HAVE ;

  INFILE CARDS DLM='|' DSD TRUNCOVER ;

  LENGTH EDUCATION $14 GRADUATED $3 RECOMMEND $3 COMMSERV $14;

  INPUT EDUCATION $ GRADUATED $ RECOMMEND $ COMMSERV $;

CARDS;

VERY SATISFIED|YES|YES|

|NO||VERY IMPORTANT

DISSATISFIED||YES|NOT IMPORTANT

;

RUN;

PROC TRANSPOSE DATA=HAVE DELIMITER=_ OUT=HAVE1;

  VAR EDUCATION GRADUATED RECOMMEND COMMSERV;

RUN;

PROC TRANSPOSE DATA=HAVE1 DELIMITER=_ OUT=HAVETRANS;

  BY _NAME_ NOTSORTED;

  VAR  COL:  ;

RUN;

PROC FORMAT;

VALUE $MISS " "="# MISSING"

OTHER="# VALID";

VALUE $DESC "EDUCATION" = "I AM SATISFIED WITH MY EDUCATION."

             "GRADUATED" = "I GRADUATED IN FOUR YEARS OR LESS."

  "RECOMMEND" = "I WOULD RECOMMEND THIS INSTITUTION TO OTHERS."

  "COMMSERV"  = "IT IS IMPORTANT TO PARTICIPATE IN COMMUNITY SERVICE ACTIVITIES.";

RUN;

PROC TABULATE DATA = HAVETRANS OUT = STUFF MISSING;

     CLASS _NAME_ COL1 / MISSING;

     TABLE _NAME_="",COL1=""*(N="");

  FORMAT _NAME_ $DESC. COL1 $MISS.;

RUN;

kegyp20057
Calcite | Level 5

Scott,

Thanks for your help.  Another suggestion worked better for me just because I have quite a few variables.  However, your suggestion solves my problem.

Karen

kegyp20057
Calcite | Level 5

Hi Reeza,

Hopefully, I'm replying to the correct message.  I'm new to this forum.  Thanks for your help.  This definitely solves my problem.

Karen

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1446 views
  • 6 likes
  • 4 in conversation