I have a table of PATIENTS with missing values, and the dummy variables for the drug columns (STATIN, BETA_BLOCKER, etc.) are coded as 0's and 1's where 0 = No and 1 = Yes.
I want a tabe where:
• Each patient would get a value of 0 (not missing) or 1 (missing) for each variable
• Create column for total number of missing variables per patient
• Create column with a string of 0's and 1's showing the pattern of missing variables per patient
data PATIENTS;
infile datalines dlm=',';
input PAT_ID AGE SEX $ RACE $ LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93704,.,F,Black,90,0,0,0,0,0
93705,26,F,White,120,1,1,1,0,0
93706,28,M,Black,.,0,0,0,0,0
93707,.,F,.,110,0,0,0,0,0
93708,47,M,.,103,0,0,0,0,0
93709,38,.,Black,94,1,0,0,1,1
93710,44,F,Black,102,0,0,0,0,0
93711,.,M,.,.,0,0,0,0,0
93712,24,F,White,.,0,0,0,0,0
93713,40,F,White,.,1,0,0,0,0
93714,40,.,Black,108,0,0,0,0,0
93715,25,M,Black,134,1,1,0,0,1
93716,23,M,White,132,0,0,0,1,0
93717,.,F,White,.,0,1,1,1,0
93718,41,F,.,105,0,0,1,0,0
;
The result would look like the table below:
(Notice that that since the drug columns in the PATIENTS table (STATIN, BETA_BLOCKER, etc.) are dummy variables they will all get 0's since they have a value.)
PAT_ID | AGE | SEX | RACE | LDL | STATIN | BETA_BLOCKER | ACEI | DIURETIC | STEROID | NUMBER_MISSING | PATTERN_MISSING |
93704 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 100000000 |
93705 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 000000000 |
93706 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 000100000 |
93707 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 101000000 |
93708 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 001000000 |
93709 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 010000000 |
93710 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 000000000 |
93711 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 101100000 |
93712 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 000100000 |
93713 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 000100000 |
93714 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 010000000 |
93715 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 000000000 |
93716 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 000000000 |
93717 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 100100000 |
93718 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 001000000 |
Data is not reading in correctly. Can you try posting it again?
Also, the missing function already assigns 0 or 1. missing(var) gets assigned to 1 if it's missing and 0 if it's not missing.
This is what your current datalines makes. It doesn't look like what you intended to me.
Add DSD option to the INFILE statement. Also FIRSTOBS=2
@tarheel13 I corrected the data so it now reads correctly.
@ballardw already gave you a solution, which is the same as what I was suggesting: to use the missing function.
Since your Sex and Race variables are character they can't be numeric. You can make new variables if you want the indicator to be numeric.
Please test your data step before posting. Yours throws invalid data errors and does not read Age correctly when missing.
SAS will return 1 and 0 for True or False
data PATIENTS; infile datalines dlm=',' dsd truncover; input PAT_ID AGE SEX $ RACE $ LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID; datalines; 93704,,F,Black,90,0,0,0,0,0 93705,26,F,White,120,1,1,1,0,0 93706,28,M,Black,,0,0,0,0,0 93707,,F,,110,0,0,0,0,0 93708,47,M,,103,0,0,0,0,0 93709,38,,Black,94,1,0,0,1,1 93710,44,F,Black,102,0,0,0,0,0 93711,,M,,,0,0,0,0,0 93712,24,F,White,,0,0,0,0,0 93713,40,F,White,,1,0,0,0,0 93714,40,,Black,108,0,0,0,0,0 93715,25,M,Black,134,1,1,0,0,1 93716,23,M,White,132,0,0,0,1,0 93717,,F,White,,0,1,1,1,0 93718,41,F,,105,0,0,1,0,0 ; data want; set patients; array c (*) sex race; array n (*) AGE LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID; do i=1 to dim(c); c[i]= put(missing(c[i]),f1.); end; do i=1 to dim(n); n[i]= missing(n[i]); end; number_missing= sum(missing(sex),missing(race), sum(of n(*))); pattern = cats(age,sex,race,LDL,STATIN, BETA_BLOCKER, ACEI, DIURETIC, STEROID); drop i; run;
data PATIENTS;
infile datalines dlm=',' dsd;
input PAT_ID AGE SEX $ RACE $ LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93704,.,F,Black,90,0,0,0,0,0
93705,26,F,White,120,1,1,1,0,0
93706,28,M,Black,.,0,0,0,0,0
93707,.,F,.,110,0,0,0,0,0
93708,47,M,.,103,0,0,0,0,0
93709,38,.,Black,94,1,0,0,1,1
93710,44,F,Black,102,0,0,0,0,0
93711,.,M,.,.,0,0,0,0,0
93712,24,F,White,.,0,0,0,0,0
93713,40,F,White,.,1,0,0,0,0
93714,40,.,Black,108,0,0,0,0,0
93715,25,M,Black,134,1,1,0,0,1
93716,23,M,White,132,0,0,0,1,0
93717,.,F,White,.,0,1,1,1,0
93718,41,F,.,105,0,0,1,0,0
;
proc sql;
create table want as
select PAT_ID,
missing(AGE) as age,
missing(SEX) as sex,
missing(RACE) as race,
missing(LDL) as ldl,
missing(STATIN) as statin,
missing(BETA_BLOCKER) as beta_blocker,
missing(ACEI) as acei,
missing(DIURETIC) as diuretic,
missing(STEROID) as steroid,
sum(calculated age,
calculated sex,
calculated race,
calculated ldl,
calculated statin,
calculated beta_blocker,
calculated diuretic,
calculated steroid
) as NUMBER_MISSING ,
cats(calculated age,
calculated sex,
calculated race,
calculated ldl,
calculated statin,
calculated beta_blocker,
calculated diuretic,
calculated steroid
)as PATTERN_MISSING
from patients;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.