Summary table of missed and identified entries

shami · Posted 08-19-2024 12:47 PM

I would like to create a table that looks like this:

Demographics	Identified		Missing		Total
	N	%	N	%
PATIENT_COUNTY1
PATIENT_RACE1
PATIENT_ETHNICITY1
PATIENT_BIRTH_SEX1
PATIENT_DOB1

I have the following code below, I don't know how to proceed from here or if I'm doing the right thing or if this step is even necessary.

Thank you.

DATA WANT;
SET HAVE;

LENGTH PATIENT_COUNTY1 $20 
PATIENT_RACE1 $50
PATIENT_ETHNICITY1 $50
PATIENT_BIRTH_SEX1 $20 
PATIENT_DOB1 $50
GENDER_IDENTITY1 $20;

IF PATIENT_COUNTY = '' THEN PATIENT_COUNTY1 = 'MISSING';
ELSE PATIENT_COUNTY1 = 'IDENTIFIED';

IF PATIENT_RACE = '' THEN PATIENT_RACE1 = 'MISSING';
ELSE PATIENT_RACE1 = 'IDENTIFIED';

IF PATIENT_ETHNICITY = '' THEN PATIENT_ETHNICITY1 = 'MISSING';
ELSE PATIENT_ETHNICITY1 = 'IDENTIFIED';

IF PATIENT_BIRTH_SEX = '' THEN PATIENT_BIRTH_SEX1 = 'MISSING';
ELSE PATIENT_BIRTH_SEX1 = 'IDENTIFIED';

IF PATIENT_DOB = '' THEN PATIENT_DOB1 = 'MISSING';
ELSE PATIENT_DOB1 = 'IDENTIFIED';

IF GENDER_IDENTITY = '' THEN GENDER_IDENTITY1 = 'MISSING';
ELSE GENDER_IDENTITY1 = 'IDENTIFIED';

RUN;

The above code gives me the following table:

ID	PATIENT_COUNTY1	PATIENT_RACE1	PATIENT_ETHNICITY1	PATIENT_BIRTH_SEX1	PATIENT_DOB1
1	IDENTIFIED	IDENTIFIED	MISSING	IDENTIFIED	MISSING
2	IDENTIFIED	IDENTIFIED	MISSING	IDENTIFIED	MISSING
3	IDENTIFIED	IDENTIFIED	MISSING	IDENTIFIED	MISSING

Reeza · Posted 08-19-2024 03:10 PM

You can use SAS features, such as formats to make this easier.

See this example (note this is overkill and generic code so possibly a little bit longer than what you need). You should just need to change the input data set name and desired output data set name to have it work for you.

/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;

@shami wrote:

I would like to create a table that looks like this:

Demographics Identified Missing Total

N % N %

PATIENT_COUNTY1

PATIENT_RACE1

PATIENT_ETHNICITY1

PATIENT_BIRTH_SEX1

PATIENT_DOB1

I have the following code below, I don't know how to proceed from here or if I'm doing the right thing or if this step is even necessary.

Thank you.
DATA WANT;
SET HAVE;

LENGTH PATIENT_COUNTY1 $20 
PATIENT_RACE1 $50
PATIENT_ETHNICITY1 $50
PATIENT_BIRTH_SEX1 $20 
PATIENT_DOB1 $50
GENDER_IDENTITY1 $20;

IF PATIENT_COUNTY = '' THEN PATIENT_COUNTY1 = 'MISSING';
ELSE PATIENT_COUNTY1 = 'IDENTIFIED';

IF PATIENT_RACE = '' THEN PATIENT_RACE1 = 'MISSING';
ELSE PATIENT_RACE1 = 'IDENTIFIED';

IF PATIENT_ETHNICITY = '' THEN PATIENT_ETHNICITY1 = 'MISSING';
ELSE PATIENT_ETHNICITY1 = 'IDENTIFIED';

IF PATIENT_BIRTH_SEX = '' THEN PATIENT_BIRTH_SEX1 = 'MISSING';
ELSE PATIENT_BIRTH_SEX1 = 'IDENTIFIED';

IF PATIENT_DOB = '' THEN PATIENT_DOB1 = 'MISSING';
ELSE PATIENT_DOB1 = 'IDENTIFIED';

IF GENDER_IDENTITY = '' THEN GENDER_IDENTITY1 = 'MISSING';
ELSE GENDER_IDENTITY1 = 'IDENTIFIED';

RUN;
The above code gives me the following table:

ID

PATIENT_COUNTY1

PATIENT_RACE1

PATIENT_ETHNICITY1

PATIENT_BIRTH_SEX1

PATIENT_DOB1

1

IDENTIFIED

IDENTIFIED

MISSING

IDENTIFIED

MISSING

2

IDENTIFIED

IDENTIFIED

MISSING

IDENTIFIED

MISSING

3

IDENTIFIED

IDENTIFIED

MISSING

IDENTIFIED

MISSING

Summary table of missed and identified entries

Re: Summary table of missed and identified entries

2025 SAS Hackathon: There is still time!