BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I have a question about how to go about getting a count of all missing and all nonmissing entries in a wide and not too long dataset.

For example, I have REALD fields like

 

DEYE     DEAR     DMHD    DCOM     DPHY     DDRS  .....  (there are more)

 

individual is asked if they experience difficulty in these areas as well as others

 

where  DEYE  =  blind

            DEAR =   hearing

            DMHD =  mood, intense feelings,control of behaviour

            DCOM =  communicating (understanding or being understood)

            DPHY =   walking; climbing stairs

            DDRS =   dressing and bathing

 

the dataset looks like this

                                            DEYE     DEAR     DMHD    DCOM     DPHY     DDRS  .....

Obs 1                                   No            Yes          No          No                        No          

Obs 2                                   Yes           No           No          No           No        No

Obs 3                                   No            No           No          No           No    

Obs 4                                                                   Yes         No           No        Yes

   .

   .

   .

Obs n                                  No            No           No            No          No         No

 

 

So each cell can contain a 'No', 'Yes",  or neither  '  '  missing

 

My question is what code approach I could take to be able to check each field (and there are many many others with the same type of possible responses {No, Yes, missing})   in order to get a count for each

field by type of response?

I hope that makes some sense.

 

Thank you for help and assistance.

 

wklierman

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

TRANSPOSE + PROC FREQ/TABULATE. 

 

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
		end;
		output;
	end;
	drop i;
run;

proc transpose data=have out=long (rename=(_name_=Code col1=Value));
	by obs;
	var DDA--DDD;
run;

*you probably want to control the names better here;

proc freq data=long;
	table CODE*VALUE / out=want;
run;

If you're interested in missing/non missing only, you can use the a custom format.

This is overkill but also works for your example.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
			if vars(i) = '003' then call missing(vars(i));
		end;
		output;
	end;
	drop i;
run;



*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;
Missing Report of have

Variable	# Missing	N_Not Missing	% Missing	PCT_Not Missing	Missing	Not Missing
DDA	8	42	16.00	84.00	8 ( 16.0%)	42 ( 84.0%)
DDB	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
DDC	15	35	30.00	70.00	15 ( 30.0%)	35 ( 70.0%)
DDD	16	34	32.00	68.00	16 ( 32.0%)	34 ( 68.0%)
DDE	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
obs	.	50	.	100.00	 	50 ( 100%)

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111 

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Convert the data to a long format and use Proc Summary, Proc Tabulate or Proc Report. 

 

Post sample data in the form of a data step if you want a usable code answer.

wlierman
Lapis Lazuli | Level 10
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.
Reeza
Super User

You can make fake data like I did in my posts though or by spending a few minutes in Excel making similar data.

 


@wlierman wrote:
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.

 

Reeza
Super User

TRANSPOSE + PROC FREQ/TABULATE. 

 

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
		end;
		output;
	end;
	drop i;
run;

proc transpose data=have out=long (rename=(_name_=Code col1=Value));
	by obs;
	var DDA--DDD;
run;

*you probably want to control the names better here;

proc freq data=long;
	table CODE*VALUE / out=want;
run;

If you're interested in missing/non missing only, you can use the a custom format.

This is overkill but also works for your example.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
			if vars(i) = '003' then call missing(vars(i));
		end;
		output;
	end;
	drop i;
run;



*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;
Missing Report of have

Variable	# Missing	N_Not Missing	% Missing	PCT_Not Missing	Missing	Not Missing
DDA	8	42	16.00	84.00	8 ( 16.0%)	42 ( 84.0%)
DDB	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
DDC	15	35	30.00	70.00	15 ( 30.0%)	35 ( 70.0%)
DDD	16	34	32.00	68.00	16 ( 32.0%)	34 ( 68.0%)
DDE	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
obs	.	50	.	100.00	 	50 ( 100%)

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111 

wlierman
Lapis Lazuli | Level 10

Thank you for the very extensive response.

 

It looks to be a perfect response for what I need to do.

 

Take care.

 

wklierman

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 563 views
  • 5 likes
  • 3 in conversation