I have a question about how to go about getting a count of all missing and all nonmissing entries in a wide and not too long dataset.
For example, I have REALD fields like
DEYE DEAR DMHD DCOM DPHY DDRS ..... (there are more)
individual is asked if they experience difficulty in these areas as well as others
where DEYE = blind
DEAR = hearing
DMHD = mood, intense feelings,control of behaviour
DCOM = communicating (understanding or being understood)
DPHY = walking; climbing stairs
DDRS = dressing and bathing
the dataset looks like this
DEYE DEAR DMHD DCOM DPHY DDRS .....
Obs 1 No Yes No No No
Obs 2 Yes No No No No No
Obs 3 No No No No No
Obs 4 Yes No No Yes
.
.
.
Obs n No No No No No No
So each cell can contain a 'No', 'Yes", or neither ' ' missing
My question is what code approach I could take to be able to check each field (and there are many many others with the same type of possible responses {No, Yes, missing}) in order to get a count for each
field by type of response?
I hope that makes some sense.
Thank you for help and assistance.
wklierman
TRANSPOSE + PROC FREQ/TABULATE.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
end;
output;
end;
drop i;
run;
proc transpose data=have out=long (rename=(_name_=Code col1=Value));
by obs;
var DDA--DDD;
run;
*you probably want to control the names better here;
proc freq data=long;
table CODE*VALUE / out=want;
run;
If you're interested in missing/non missing only, you can use the a custom format.
This is overkill but also works for your example.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
if vars(i) = '003' then call missing(vars(i));
end;
output;
end;
drop i;
run;
*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
Missing Report of have Variable # Missing N_Not Missing % Missing PCT_Not Missing Missing Not Missing DDA 8 42 16.00 84.00 8 ( 16.0%) 42 ( 84.0%) DDB 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) DDC 15 35 30.00 70.00 15 ( 30.0%) 35 ( 70.0%) DDD 16 34 32.00 68.00 16 ( 32.0%) 34 ( 68.0%) DDE 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) obs . 50 . 100.00 50 ( 100%)
https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
Convert the data to a long format and use Proc Summary, Proc Tabulate or Proc Report.
Post sample data in the form of a data step if you want a usable code answer.
You can make fake data like I did in my posts though or by spending a few minutes in Excel making similar data.
@wlierman wrote:
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.
TRANSPOSE + PROC FREQ/TABULATE.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
end;
output;
end;
drop i;
run;
proc transpose data=have out=long (rename=(_name_=Code col1=Value));
by obs;
var DDA--DDD;
run;
*you probably want to control the names better here;
proc freq data=long;
table CODE*VALUE / out=want;
run;
If you're interested in missing/non missing only, you can use the a custom format.
This is overkill but also works for your example.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
if vars(i) = '003' then call missing(vars(i));
end;
output;
end;
drop i;
run;
*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
Missing Report of have Variable # Missing N_Not Missing % Missing PCT_Not Missing Missing Not Missing DDA 8 42 16.00 84.00 8 ( 16.0%) 42 ( 84.0%) DDB 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) DDC 15 35 30.00 70.00 15 ( 30.0%) 35 ( 70.0%) DDD 16 34 32.00 68.00 16 ( 32.0%) 34 ( 68.0%) DDE 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) obs . 50 . 100.00 50 ( 100%)
https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
Thank you for the very extensive response.
It looks to be a perfect response for what I need to do.
Take care.
wklierman
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.