BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I have a question about how to go about getting a count of all missing and all nonmissing entries in a wide and not too long dataset.

For example, I have REALD fields like

 

DEYE     DEAR     DMHD    DCOM     DPHY     DDRS  .....  (there are more)

 

individual is asked if they experience difficulty in these areas as well as others

 

where  DEYE  =  blind

            DEAR =   hearing

            DMHD =  mood, intense feelings,control of behaviour

            DCOM =  communicating (understanding or being understood)

            DPHY =   walking; climbing stairs

            DDRS =   dressing and bathing

 

the dataset looks like this

                                            DEYE     DEAR     DMHD    DCOM     DPHY     DDRS  .....

Obs 1                                   No            Yes          No          No                        No          

Obs 2                                   Yes           No           No          No           No        No

Obs 3                                   No            No           No          No           No    

Obs 4                                                                   Yes         No           No        Yes

   .

   .

   .

Obs n                                  No            No           No            No          No         No

 

 

So each cell can contain a 'No', 'Yes",  or neither  '  '  missing

 

My question is what code approach I could take to be able to check each field (and there are many many others with the same type of possible responses {No, Yes, missing})   in order to get a count for each

field by type of response?

I hope that makes some sense.

 

Thank you for help and assistance.

 

wklierman

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

TRANSPOSE + PROC FREQ/TABULATE. 

 

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
		end;
		output;
	end;
	drop i;
run;

proc transpose data=have out=long (rename=(_name_=Code col1=Value));
	by obs;
	var DDA--DDD;
run;

*you probably want to control the names better here;

proc freq data=long;
	table CODE*VALUE / out=want;
run;

If you're interested in missing/non missing only, you can use the a custom format.

This is overkill but also works for your example.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
			if vars(i) = '003' then call missing(vars(i));
		end;
		output;
	end;
	drop i;
run;



*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;
Missing Report of have

Variable	# Missing	N_Not Missing	% Missing	PCT_Not Missing	Missing	Not Missing
DDA	8	42	16.00	84.00	8 ( 16.0%)	42 ( 84.0%)
DDB	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
DDC	15	35	30.00	70.00	15 ( 30.0%)	35 ( 70.0%)
DDD	16	34	32.00	68.00	16 ( 32.0%)	34 ( 68.0%)
DDE	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
obs	.	50	.	100.00	 	50 ( 100%)

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111 

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Convert the data to a long format and use Proc Summary, Proc Tabulate or Proc Report. 

 

Post sample data in the form of a data step if you want a usable code answer.

wlierman
Lapis Lazuli | Level 10
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.
Reeza
Super User

You can make fake data like I did in my posts though or by spending a few minutes in Excel making similar data.

 


@wlierman wrote:
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.

 

Reeza
Super User

TRANSPOSE + PROC FREQ/TABULATE. 

 

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
		end;
		output;
	end;
	drop i;
run;

proc transpose data=have out=long (rename=(_name_=Code col1=Value));
	by obs;
	var DDA--DDD;
run;

*you probably want to control the names better here;

proc freq data=long;
	table CODE*VALUE / out=want;
run;

If you're interested in missing/non missing only, you can use the a custom format.

This is overkill but also works for your example.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
			if vars(i) = '003' then call missing(vars(i));
		end;
		output;
	end;
	drop i;
run;



*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;
Missing Report of have

Variable	# Missing	N_Not Missing	% Missing	PCT_Not Missing	Missing	Not Missing
DDA	8	42	16.00	84.00	8 ( 16.0%)	42 ( 84.0%)
DDB	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
DDC	15	35	30.00	70.00	15 ( 30.0%)	35 ( 70.0%)
DDD	16	34	32.00	68.00	16 ( 32.0%)	34 ( 68.0%)
DDE	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
obs	.	50	.	100.00	 	50 ( 100%)

 

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111 

wlierman
Lapis Lazuli | Level 10

Thank you for the very extensive response.

 

It looks to be a perfect response for what I need to do.

 

Take care.

 

wklierman

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 734 views
  • 5 likes
  • 3 in conversation