BookmarkSubscribeRSS Feed
simonqin
Fluorite | Level 6

Hi SAS pro

I am doing a research on polypharmacy and want to re-structure the dataset to see the start and end of single drug, and the start and end of two drugs, and the start and end of three drugs.

 

The example of original

dataset is like this :

IDdrugstartendSCORE 
1A06/01/202011/15/20202
1B08/01/202009/09/20201
1C06/01/202007/01/20203
1D10/01/202012/31/20204
2A01/15/202003/04/20202
2C01/27/202003/09/20203
3A03/01/202004/01/20202
3C03/26/202004/05/20203
3D03/28/202005/05/20204
4A04/04/202006/07/20202
4C04/12/202007/01/20203

 

I want to change the dataset as below:

 

IDdrugsstart endscore 
1A+C06/01/202007/01/20202+3=5
1A07/02/202007/31/20202
1A+B08/01/202009/09/20202+1=3
1A09/10/202009/30/20202
1A+D10/01/202011/15/20202+4=6
1D11/16/202012/31/20204
2A01/15/202001/26/20202
2A+C01/27/202003/04/20202+3=5
2C03/05/202003/09/20203

 

I know this analysis is comprehensive.. I am much appreciate if you could help me to sort out this analysis.

1 REPLY 1
s_lassen
Meteorite | Level 14

You will have a much better chance of someone answering if you post your example data as a datastep, e.g.

data have;                                
  length id 8 drug $1 start end score 8;  
  informat start end mmddyy10.;           
  format start end date9.;                
  input ID drug start end SCORE;          
cards;                                    
1 A 06/01/2020 11/15/2020 2               
1 B 08/01/2020 09/09/2020 1               
1 C 06/01/2020 07/01/2020 3               
1 D 10/01/2020 12/31/2020 4               
2 A 01/15/2020 03/04/2020 2               
2 C 01/27/2020 03/09/2020 3               
3 A 03/01/2020 04/01/2020 2               
3 C 03/26/2020 04/05/2020 3               
3 D 03/28/2020 05/05/2020 4               
4 A 04/04/2020 06/07/2020 2               
4 C 04/12/2020 07/01/2020 3               
;run;

I think the easiest way around this is to create a "long" dataset, with one entry for each drug, each day it was active:

data long;              
  set have;             
  do date=start to end; 
    output;             
    end;                
  drop start end;       
  format date date9.;   
run;

Then sort that by ID and date, and calculate combination and score for each date:

proc sort data=long;              
  by id date drug;                
run;                              
data long2;                       
  length drugs $10;               
  total=0;                        
  do until(last.date);            
    set long;                     
    by id date;                   
    call catx('+',drugs,drug);    
    total+score;                  
    end;                          
  drop drug score;                
  rename total=score;             
run;

And finally, convert back to intervals:

data want;                 
  do until(last.drugs);    
    set long2;             
    by id drugs notsorted; 
    if first.drugs then    
      start=date;          
    end;                   
  format start date9.;     
  rename date=end;         
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 593 views
  • 2 likes
  • 2 in conversation