Hello! I am working with health data in the following format:
ID = Patient ID
Date = Date of Procedure
DocIDProcedure1 = the ID number of the doctor that performed the 1st procedure
DocIDProcedure2 = the ID number of the doctor that performed the 2nd procedure
DocIDProcedure3 = the ID number of the doctor that performed the 3rd procedure
This is an abbreviated list. I have a list of over 50 procedures for each row.
The data looks like the following:
ID Date DocIDProcedure1 DocIDProcedure2 DocIDProcedure3
1 Nov112020 1234
1 Nov112020 3456
1 Nov112020 7890
2 Dec032020 2465
2 Dec032020 1170
What I would like to do:
For each ID/Date combination, I would like to back and forward fill the doctor IDs so that they show up on each line (for the same ID and date).
So, I would like the above dataset to look like the following:
ID Date DocIDProcedure1 DocIDProcedure2 DocIDProcedure3
1 Nov112020 1234 3456 7890
1 Nov112020 1234 3456 7890
1 Nov112020 1234 3456 7890
2 Dec032020 1170 2465
2 Dec032020 1170 2465
This is really stumping me. Any help would be greatly appreciated!
Mikki
data have;
input (ID Date ) ($) DocIDProcedure1 DocIDProcedure2 DocIDProcedure3;
cards;
1 Nov112020 1234 . .
1 Nov112020 . 3456 .
1 Nov112020 . . 7890
2 Dec032020 . . 2465
2 Dec032020 . 1170 .
;
data want;
do _n_=1 by 1 until(last.date);
update have(obs=0) have;
by id date;
end;
do _n_=1 to _n_;
output;
end;
run;
proc print noobs;run;
ID | Date | DocIDProcedure1 | DocIDProcedure2 | DocIDProcedure3 |
---|---|---|---|---|
1 | Nov11202 | 1234 | 3456 | 7890 |
1 | Nov11202 | 1234 | 3456 | 7890 |
1 | Nov11202 | 1234 | 3456 | 7890 |
2 | Dec03202 | . | 1170 | 2465 |
2 | Dec03202 | . | 1170 | 2465 |
data have;
input (ID Date ) ($) DocIDProcedure1 DocIDProcedure2 DocIDProcedure3;
cards;
1 Nov112020 1234 . .
1 Nov112020 . 3456 .
1 Nov112020 . . 7890
2 Dec032020 . . 2465
2 Dec032020 . 1170 .
;
data want;
do _n_=1 by 1 until(last.date);
update have(obs=0) have;
by id date;
end;
do _n_=1 to _n_;
output;
end;
run;
proc print noobs;run;
ID | Date | DocIDProcedure1 | DocIDProcedure2 | DocIDProcedure3 |
---|---|---|---|---|
1 | Nov11202 | 1234 | 3456 | 7890 |
1 | Nov11202 | 1234 | 3456 | 7890 |
1 | Nov11202 | 1234 | 3456 | 7890 |
2 | Dec03202 | . | 1170 | 2465 |
2 | Dec03202 | . | 1170 | 2465 |
Can we assume that for any given block of ID*DATE combinations there is only one non missing value in each of those ...Proc... variables?
Why not just collapse into one observation per block? Are there other variables you didn't mention whose values vary over the observations in a block?
data have;
input (ID Date ) ($) DocIDProcedure1 DocIDProcedure2 DocIDProcedure3;
cards;
1 Nov112020 1234 . .
1 Nov112020 . 3456 .
1 Nov112020 . . 7890
2 Dec032020 . . 2465
2 Dec032020 . 1170 .
;
run;
data want(rename=(DocIDProcedure1_=DocIDProcedure1 DocIDProcedure2_=DocIDProcedure2 DocIDProcedure3_=DocIDProcedure3));
do until(last.id);
set have;
by id;
DocIDProcedure1_=coalesce(DocIDProcedure1_,DocIDProcedure1);
DocIDProcedure2_=coalesce(DocIDProcedure2_,DocIDProcedure2);
DocIDProcedure3_=coalesce(DocIDProcedure3_,DocIDProcedure3);
end;
do until(last.id);
set have;
by id;
output;
end;
drop DocIDProcedure1 DocIDProcedure2 DocIDProcedure3;
run;
data have; input (ID Date ) ($) DocIDProcedure1 DocIDProcedure2 DocIDProcedure3; cards; 1 Nov112020 1234 . . 1 Nov112020 . 3456 . 1 Nov112020 . . 7890 2 Dec032020 . . 2465 2 Dec032020 . 1170 . ; run; data have; set have; dummy='x'; run; proc sql; create table want(drop=dummy) as select id,date,dummy, max(DocIDProcedure1) as DocIDProcedure1, max(DocIDProcedure2) as DocIDProcedure2, max(DocIDProcedure3) as DocIDProcedure3 from have group by id,date; quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.