I have a question about how to go about getting a count of all missing and all nonmissing entries in a wide and not too long dataset.
For example, I have REALD fields like
DEYE DEAR DMHD DCOM DPHY DDRS ..... (there are more)
individual is asked if they experience difficulty in these areas as well as others
where DEYE = blind
DEAR = hearing
DMHD = mood, intense feelings,control of behaviour
DCOM = communicating (understanding or being understood)
DPHY = walking; climbing stairs
DDRS = dressing and bathing
the dataset looks like this
DEYE DEAR DMHD DCOM DPHY DDRS .....
Obs 1 No Yes No No No
Obs 2 Yes No No No No No
Obs 3 No No No No No
Obs 4 Yes No No Yes
.
.
.
Obs n No No No No No No
So each cell can contain a 'No', 'Yes", or neither ' ' missing
My question is what code approach I could take to be able to check each field (and there are many many others with the same type of possible responses {No, Yes, missing}) in order to get a count for each
field by type of response?
I hope that makes some sense.
Thank you for help and assistance.
wklierman
TRANSPOSE + PROC FREQ/TABULATE.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
end;
output;
end;
drop i;
run;
proc transpose data=have out=long (rename=(_name_=Code col1=Value));
by obs;
var DDA--DDD;
run;
*you probably want to control the names better here;
proc freq data=long;
table CODE*VALUE / out=want;
run;
If you're interested in missing/non missing only, you can use the a custom format.
This is overkill but also works for your example.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
if vars(i) = '003' then call missing(vars(i));
end;
output;
end;
drop i;
run;
*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
Missing Report of have Variable # Missing N_Not Missing % Missing PCT_Not Missing Missing Not Missing DDA 8 42 16.00 84.00 8 ( 16.0%) 42 ( 84.0%) DDB 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) DDC 15 35 30.00 70.00 15 ( 30.0%) 35 ( 70.0%) DDD 16 34 32.00 68.00 16 ( 32.0%) 34 ( 68.0%) DDE 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) obs . 50 . 100.00 50 ( 100%)
https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
Convert the data to a long format and use Proc Summary, Proc Tabulate or Proc Report.
Post sample data in the form of a data step if you want a usable code answer.
You can make fake data like I did in my posts though or by spending a few minutes in Excel making similar data.
@wlierman wrote:
Thank you. The data is from OPERA and not deidentified. So I have to maintain the rules that have been formulated.
TRANSPOSE + PROC FREQ/TABULATE.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
end;
output;
end;
drop i;
run;
proc transpose data=have out=long (rename=(_name_=Code col1=Value));
by obs;
var DDA--DDD;
run;
*you probably want to control the names better here;
proc freq data=long;
table CODE*VALUE / out=want;
run;
If you're interested in missing/non missing only, you can use the a custom format.
This is overkill but also works for your example.
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DDA DDB DDC DDE DDD;
do obs=1 to 50;
do i=1 to dim(vars);
vars(i)=put(rand('table', 0.4, 0.4 , 0.2), z3.);
if vars(i) = '003' then call missing(vars(i));
end;
output;
end;
drop i;
run;
*set input data set name;
%let INPUT_DSN = have;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
Missing Report of have Variable # Missing N_Not Missing % Missing PCT_Not Missing Missing Not Missing DDA 8 42 16.00 84.00 8 ( 16.0%) 42 ( 84.0%) DDB 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) DDC 15 35 30.00 70.00 15 ( 30.0%) 35 ( 70.0%) DDD 16 34 32.00 68.00 16 ( 32.0%) 34 ( 68.0%) DDE 11 39 22.00 78.00 11 ( 22.0%) 39 ( 78.0%) obs . 50 . 100.00 50 ( 100%)
https://gist.github.com/statgeek/2de1faf1644dc8160fe721056202f111
Thank you for the very extensive response.
It looks to be a perfect response for what I need to do.
Take care.
wklierman
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.