Solved: How to get a count of missing and nonmissing entries in a sparse datas...

wlierman · Posted 05-04-2021 01:59 PM

I have a question about how to go about getting a count of all missing and all nonmissing entries in a wide and not too long dataset.

For example, I have REALD fields like

DEYE DEAR DMHD DCOM DPHY DDRS ..... (there are more)

individual is asked if they experience difficulty in these areas as well as others

where DEYE = blind

DEAR = hearing

DMHD = mood, intense feelings,control of behaviour

DCOM = communicating (understanding or being understood)

DPHY = walking; climbing stairs

DDRS = dressing and bathing

the dataset looks like this

DEYE DEAR DMHD DCOM DPHY DDRS .....

Obs 1 No Yes No No No

Obs 2 Yes No No No No No

Obs 3 No No No No No

Obs 4 Yes No No Yes

.

Obs n No No No No No No

So each cell can contain a 'No', 'Yes", or neither ' ' missing

My question is what code approach I could take to be able to check each field (and there are many many others with the same type of possible responses {No, Yes, missing}) in order to get a count for each

field by type of response?

I hope that makes some sense.

Thank you for help and assistance.

wklierman

Reeza · Posted 05-04-2021 02:15 PM

TRANSPOSE + PROC FREQ/TABULATE.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
		end;
		output;
	end;
	drop i;
run;

proc transpose data=have out=long (rename=(_name_=Code col1=Value));
	by obs;
	var DDA--DDD;
run;

*you probably want to control the names better here;

proc freq data=long;
	table CODE*VALUE / out=want;
run;

If you're interested in missing/non missing only, you can use the a custom format.

This is overkill but also works for your example.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
			if vars(i) = '003' then call missing(vars(i));
		end;
		output;
	end;
	drop i;
run;



*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

Missing Report of have

Variable	# Missing	N_Not Missing	% Missing	PCT_Not Missing	Missing	Not Missing
DDA	8	42	16.00	84.00	8 ( 16.0%)	42 ( 84.0%)
DDB	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
DDC	15	35	30.00	70.00	15 ( 30.0%)	35 ( 70.0%)
DDD	16	34	32.00	68.00	16 ( 32.0%)	34 ( 68.0%)
DDE	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
obs	.	50	.	100.00	 	50 ( 100%)

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

View solution in original post

PeterClemmensen · Posted 05-04-2021 02:03 PM

Convert the data to a long format and use Proc Summary, Proc Tabulate or Proc Report.

Post sample data in the form of a data step if you want a usable code answer.

The DATA to DATA Step Macro
Blog: SASnrd

wlierman · Posted 05-04-2021 02:37 PM

Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.

Reeza · Posted 05-04-2021 02:40 PM

You can make fake data like I did in my posts though or by spending a few minutes in Excel making similar data.

@wlierman wrote:
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.

Reeza · Posted 05-04-2021 02:15 PM

TRANSPOSE + PROC FREQ/TABULATE.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
		end;
		output;
	end;
	drop i;
run;

proc transpose data=have out=long (rename=(_name_=Code col1=Value));
	by obs;
	var DDA--DDD;
run;

*you probably want to control the names better here;

proc freq data=long;
	table CODE*VALUE / out=want;
run;

If you're interested in missing/non missing only, you can use the a custom format.

This is overkill but also works for your example.

data have;
	call streaminit(45);
	length obs 8.;
	array vars(*) $3. DDA DDB DDC DDE DDD;

	do obs=1 to 50;

		do i=1 to dim(vars);
			vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
			if vars(i) = '003' then call missing(vars(i));
		end;
		output;
	end;
	drop i;
run;



*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

Missing Report of have

Variable	# Missing	N_Not Missing	% Missing	PCT_Not Missing	Missing	Not Missing
DDA	8	42	16.00	84.00	8 ( 16.0%)	42 ( 84.0%)
DDB	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
DDC	15	35	30.00	70.00	15 ( 30.0%)	35 ( 70.0%)
DDD	16	34	32.00	68.00	16 ( 32.0%)	34 ( 68.0%)
DDE	11	39	22.00	78.00	11 ( 22.0%)	39 ( 78.0%)
obs	.	50	.	100.00	 	50 ( 100%)

https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111

wlierman · Posted 05-04-2021 02:34 PM

Thank you for the very extensive response.

It looks to be a perfect response for what I need to do.

Take care.

wklierman

How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

Re: How to get a count of missing and nonmissing entries in a sparse dataset

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away