Hi SAS pro
I am doing a research on polypharmacy and want to re-structure the dataset to see the start and end of single drug, and the start and end of two drugs, and the start and end of three drugs.
The example of original
dataset is like this :
ID | drug | start | end | SCORE |
1 | A | 06/01/2020 | 11/15/2020 | 2 |
1 | B | 08/01/2020 | 09/09/2020 | 1 |
1 | C | 06/01/2020 | 07/01/2020 | 3 |
1 | D | 10/01/2020 | 12/31/2020 | 4 |
2 | A | 01/15/2020 | 03/04/2020 | 2 |
2 | C | 01/27/2020 | 03/09/2020 | 3 |
3 | A | 03/01/2020 | 04/01/2020 | 2 |
3 | C | 03/26/2020 | 04/05/2020 | 3 |
3 | D | 03/28/2020 | 05/05/2020 | 4 |
4 | A | 04/04/2020 | 06/07/2020 | 2 |
4 | C | 04/12/2020 | 07/01/2020 | 3 |
I want to change the dataset as below:
ID | drugs | start | end | score |
1 | A+C | 06/01/2020 | 07/01/2020 | 2+3=5 |
1 | A | 07/02/2020 | 07/31/2020 | 2 |
1 | A+B | 08/01/2020 | 09/09/2020 | 2+1=3 |
1 | A | 09/10/2020 | 09/30/2020 | 2 |
1 | A+D | 10/01/2020 | 11/15/2020 | 2+4=6 |
1 | D | 11/16/2020 | 12/31/2020 | 4 |
2 | A | 01/15/2020 | 01/26/2020 | 2 |
2 | A+C | 01/27/2020 | 03/04/2020 | 2+3=5 |
2 | C | 03/05/2020 | 03/09/2020 | 3 |
I know this analysis is comprehensive.. I am much appreciate if you could help me to sort out this analysis.
You will have a much better chance of someone answering if you post your example data as a datastep, e.g.
data have;
length id 8 drug $1 start end score 8;
informat start end mmddyy10.;
format start end date9.;
input ID drug start end SCORE;
cards;
1 A 06/01/2020 11/15/2020 2
1 B 08/01/2020 09/09/2020 1
1 C 06/01/2020 07/01/2020 3
1 D 10/01/2020 12/31/2020 4
2 A 01/15/2020 03/04/2020 2
2 C 01/27/2020 03/09/2020 3
3 A 03/01/2020 04/01/2020 2
3 C 03/26/2020 04/05/2020 3
3 D 03/28/2020 05/05/2020 4
4 A 04/04/2020 06/07/2020 2
4 C 04/12/2020 07/01/2020 3
;run;
I think the easiest way around this is to create a "long" dataset, with one entry for each drug, each day it was active:
data long;
set have;
do date=start to end;
output;
end;
drop start end;
format date date9.;
run;
Then sort that by ID and date, and calculate combination and score for each date:
proc sort data=long;
by id date drug;
run;
data long2;
length drugs $10;
total=0;
do until(last.date);
set long;
by id date;
call catx('+',drugs,drug);
total+score;
end;
drop drug score;
rename total=score;
run;
And finally, convert back to intervals:
data want;
do until(last.drugs);
set long2;
by id drugs notsorted;
if first.drugs then
start=date;
end;
format start date9.;
rename date=end;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.