You can use SAS features, such as formats to make this easier.
See this example (note this is overkill and generic code so possibly a little bit longer than what you need). You should just need to change the input data set name and desired output data set name to have it work for you.
/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
@shami wrote:
I would like to create a table that looks like this:
Demographics |
Identified |
|
Missing |
|
Total |
|
N |
% |
N |
% |
|
PATIENT_COUNTY1
|
|
|
|
|
|
PATIENT_RACE1
|
|
|
|
|
|
PATIENT_ETHNICITY1
|
|
|
|
|
|
PATIENT_BIRTH_SEX1
|
|
|
|
|
|
PATIENT_DOB1
|
|
|
|
|
|
I have the following code below, I don't know how to proceed from here or if I'm doing the right thing or if this step is even necessary.
Thank you.
DATA WANT;
SET HAVE;
LENGTH PATIENT_COUNTY1 $20
PATIENT_RACE1 $50
PATIENT_ETHNICITY1 $50
PATIENT_BIRTH_SEX1 $20
PATIENT_DOB1 $50
GENDER_IDENTITY1 $20;
IF PATIENT_COUNTY = '' THEN PATIENT_COUNTY1 = 'MISSING';
ELSE PATIENT_COUNTY1 = 'IDENTIFIED';
IF PATIENT_RACE = '' THEN PATIENT_RACE1 = 'MISSING';
ELSE PATIENT_RACE1 = 'IDENTIFIED';
IF PATIENT_ETHNICITY = '' THEN PATIENT_ETHNICITY1 = 'MISSING';
ELSE PATIENT_ETHNICITY1 = 'IDENTIFIED';
IF PATIENT_BIRTH_SEX = '' THEN PATIENT_BIRTH_SEX1 = 'MISSING';
ELSE PATIENT_BIRTH_SEX1 = 'IDENTIFIED';
IF PATIENT_DOB = '' THEN PATIENT_DOB1 = 'MISSING';
ELSE PATIENT_DOB1 = 'IDENTIFIED';
IF GENDER_IDENTITY = '' THEN GENDER_IDENTITY1 = 'MISSING';
ELSE GENDER_IDENTITY1 = 'IDENTIFIED';
RUN;
The above code gives me the following table:
ID
|
PATIENT_COUNTY1
|
PATIENT_RACE1
|
PATIENT_ETHNICITY1
|
PATIENT_BIRTH_SEX1
|
PATIENT_DOB1
|
1
|
IDENTIFIED
|
IDENTIFIED
|
MISSING
|
IDENTIFIED
|
MISSING
|
2
|
IDENTIFIED
|
IDENTIFIED
|
MISSING
|
IDENTIFIED
|
MISSING
|
3
|
IDENTIFIED
|
IDENTIFIED
|
MISSING
|
IDENTIFIED
|
MISSING
|