it seems to me that you tried to filled up missing demographic info for patient, and for each patient it can only one value if not missing. For instance, a patient (same CPI) can not be both male or female at the same time. therefore my simple Proc SQL approach should have a shot:
DATA HAVE;
INPUT CPI :$3. (EnrollDate VisitDate) (:MMDDYY10.) (BP1 Race Gender) (:$10.);
CARDS;
123 1/1/2001 1/1/2001 120/80 White Female
123 1/1/2001 2/1/2001 119/85 . .
123 9/9/2009 9/9/2009 115/75 White .
123 9/9/2009 10/9/2009 . . .
456 2/2/2002 2/2/2002 118/85 Black Male
456 2/2/2002 3/2/2002 110/82 Black Male
789 3/3/2003 3/3/2003 135/81 Hispanic .
789 3/3/2003 4/3/2003 132/83 Hispanic Male
789 3/3/2003 5/3/2003 . . Male
789 3/3/2003 6/3/2003 128/81 . .
;
proc sql;
create table want as
select CPI, EnrollDate FORMAT=MMDDYY10., VisitDate FORMAT=MMDDYY10., BP1,
COALESCEC(RACE, MAX(RACE)) AS RACE,
COALESCEC(GENDER, MAX(GENDER)) AS GENDER
FROM HAVE
GROUP BY CPI
;
QUIT;
Note: coalescec() is abundant in your case, Max() alone is sufficient .
Haikuo
For demographic variables I would probably create a DM data with one obs per CPI and not LOCF those variables. Here is UPDATE method LOCF and NOCB(next obs carried back?). This method has the advantage of LOCFing a lot of variables without a lot of code. There is also the assumption that any completely missing visits have been added to the data.
data _null_,
Can you explain, please, syntax of operation array f first:; ?
I mean for what first:; stands?
Katie,
Here is an alternative approach that you might want to try:
data have;
input CPI (EnrollDate VisitDate) (:mmddyy10.) (BP1 Race Gender) ($);
cards;
123 1/1/2001 1/1/2001 120/80 White Female
123 1/1/2001 2/1/2001 119/85 . .
123 9/9/2009 9/9/2009 115/75 White .
123 9/9/2009 10/9/2009 . . .
456 2/2/2002 2/2/2002 118/85 Black Male
456 2/2/2002 3/2/2002 110/82 Black Male
789 3/3/2003 3/3/2003 135/81 Hispanic .
789 3/3/2003 4/3/2003 132/83 Hispanic Male
789 3/3/2003 5/3/2003 . . Male
789 3/3/2003 6/3/2003 128/81 . .
;
data want (drop=_:);
length _race _gender $8;
do until (last.cpi);
set have;
by cpi;
race=coalescec(race,_race);
gender=coalescec(gender,_gender);
_race=race;
_gender=gender;
end;
do until (last.cpi);
set have (drop=race gender);
by cpi;
output;
end;
run;
Thank you all so much for your help. I believe I've gotten it to work!
Thanks again!
Calling all data scientists and open-source enthusiasts! Want to solve real problems that impact your company or the world? Register to hack by August 31st!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.