BookmarkSubscribeRSS Feed
PharmlyDoc
Quartz | Level 8

I have a table of PATIENTS with missing values, and the dummy variables for the drug columns (STATIN, BETA_BLOCKER, etc.) are coded as 0's and 1's where 0 = No  and 1 = Yes. 

 

I want a tabe where:

• Each patient would get a value of 0 (not missing) or 1 (missing) for each variable

• Create column for total number of missing variables per patient 

• Create column with a string of 0's and 1's showing the pattern of missing variables per patient 

 

data PATIENTS;
infile datalines dlm=',';
input PAT_ID AGE SEX $ RACE $ LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93704,.,F,Black,90,0,0,0,0,0
93705,26,F,White,120,1,1,1,0,0
93706,28,M,Black,.,0,0,0,0,0
93707,.,F,.,110,0,0,0,0,0
93708,47,M,.,103,0,0,0,0,0
93709,38,.,Black,94,1,0,0,1,1
93710,44,F,Black,102,0,0,0,0,0
93711,.,M,.,.,0,0,0,0,0
93712,24,F,White,.,0,0,0,0,0
93713,40,F,White,.,1,0,0,0,0
93714,40,.,Black,108,0,0,0,0,0
93715,25,M,Black,134,1,1,0,0,1
93716,23,M,White,132,0,0,0,1,0
93717,.,F,White,.,0,1,1,1,0
93718,41,F,.,105,0,0,1,0,0
;

 

 

The result would look like the table below:

(Notice that that since the drug columns in the PATIENTS table (STATIN, BETA_BLOCKER, etc.) are dummy variables they will all get 0's since they have a value.) 

 

PAT_ID AGE SEX RACE LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID NUMBER_MISSING PATTERN_MISSING
93704 1 0 0 0 0 0 0 0 0 1 100000000
93705 0 0 0 0 0 0 0 0 0 0 000000000
93706 0 0 0 1 0 0 0 0 0 1 000100000
93707 1 0 1 0 0 0 0 0 0 2 101000000
93708 0 0 1 0 0 0 0 0 0 1 001000000
93709 0 1 0 0 0 0 0 0 0 1 010000000
93710 0 0 0 0 0 0 0 0 0 0 000000000
93711 1 0 1 1 0 0 0 0 0 3 101100000
93712 0 0 0 1 0 0 0 0 0 1 000100000
93713 0 0 0 1 0 0 0 0 0 1 000100000
93714 0 1 0 0 0 0 0 0 0 1 010000000
93715 0 0 0 0 0 0 0 0 0 0 000000000
93716 0 0 0 0 0 0 0 0 0 0 000000000
93717 1 0 0 1 0 0 0 0 0 2 100100000
93718 0 0 1 0 0 0 0 0 0 1 001000000
6 REPLIES 6
tarheel13
Rhodochrosite | Level 12

Data is not reading in correctly. Can you try posting it again? 

Also, the missing function already assigns 0 or 1. missing(var) gets assigned to 1 if it's missing and 0 if it's not missing. 

This is what your current datalines makes. It doesn't look like what you intended to me. Screen Shot 2021-09-23 at 10.43.37 AM.png

tarheel13
Rhodochrosite | Level 12

@ballardw already gave you a solution, which is the same as what I was suggesting: to use the missing function. 

ballardw
Super User

Since your Sex and Race variables are character they can't be numeric. You can make new variables if you want the indicator to be numeric.

 

Please test your data step before posting. Yours throws invalid data errors and does not read Age correctly when missing.

 

SAS will return 1 and 0 for True or False

data PATIENTS;
infile datalines dlm=',' dsd truncover;
input PAT_ID AGE SEX $ RACE $ LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93704,,F,Black,90,0,0,0,0,0
93705,26,F,White,120,1,1,1,0,0
93706,28,M,Black,,0,0,0,0,0
93707,,F,,110,0,0,0,0,0
93708,47,M,,103,0,0,0,0,0
93709,38,,Black,94,1,0,0,1,1
93710,44,F,Black,102,0,0,0,0,0
93711,,M,,,0,0,0,0,0
93712,24,F,White,,0,0,0,0,0
93713,40,F,White,,1,0,0,0,0
93714,40,,Black,108,0,0,0,0,0
93715,25,M,Black,134,1,1,0,0,1
93716,23,M,White,132,0,0,0,1,0
93717,,F,White,,0,1,1,1,0
93718,41,F,,105,0,0,1,0,0
;

data want;
   set patients;
   array c (*) sex race;
   array n (*) AGE  LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
   do i=1 to dim(c);
      c[i]= put(missing(c[i]),f1.);
   end;
   do i=1 to dim(n);
      n[i]= missing(n[i]);
   end;
   number_missing= sum(missing(sex),missing(race),    sum(of n(*)));
   pattern = cats(age,sex,race,LDL,STATIN, BETA_BLOCKER, ACEI, DIURETIC, STEROID);
   drop i;
run;
Ksharp
Super User
data PATIENTS;
infile datalines dlm=',' dsd;
input PAT_ID AGE SEX $ RACE $ LDL STATIN BETA_BLOCKER ACEI DIURETIC STEROID;
datalines;
93704,.,F,Black,90,0,0,0,0,0
93705,26,F,White,120,1,1,1,0,0
93706,28,M,Black,.,0,0,0,0,0
93707,.,F,.,110,0,0,0,0,0
93708,47,M,.,103,0,0,0,0,0
93709,38,.,Black,94,1,0,0,1,1
93710,44,F,Black,102,0,0,0,0,0
93711,.,M,.,.,0,0,0,0,0
93712,24,F,White,.,0,0,0,0,0
93713,40,F,White,.,1,0,0,0,0
93714,40,.,Black,108,0,0,0,0,0
93715,25,M,Black,134,1,1,0,0,1
93716,23,M,White,132,0,0,0,1,0
93717,.,F,White,.,0,1,1,1,0
93718,41,F,.,105,0,0,1,0,0
;

proc sql;
create table want as
select PAT_ID, 
missing(AGE) as age, 
missing(SEX) as sex,
missing(RACE) as race,
missing(LDL) as ldl,
missing(STATIN) as statin,
missing(BETA_BLOCKER) as beta_blocker,
missing(ACEI) as acei,
missing(DIURETIC) as diuretic,
missing(STEROID) as steroid,

sum(calculated age,
calculated  sex,
calculated  race,
calculated  ldl,
calculated  statin,
calculated  beta_blocker,
calculated  diuretic,
calculated  steroid
) as NUMBER_MISSING ,

cats(calculated age,
calculated  sex,
calculated  race,
calculated  ldl,
calculated  statin,
calculated  beta_blocker,
calculated  diuretic,
calculated  steroid
)as PATTERN_MISSING
 from patients;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 590 views
  • 2 likes
  • 5 in conversation