Split dataset, rename variables, and put it back together

Reply
Occasional Contributor 127
Occasional Contributor
Posts: 7

Split dataset, rename variables, and put it back together

[ Edited ]

Hello, 

I am struggling write a SAS code that would do the following : 

Step 1:  Split the data by ID and Drug for example: 

             new1: CardID1, drug1     

             new2: CadrID1, durg2

             new3: CardID1, drug3

             

             new4: CardID2, drug1 

             new5: CardID2, drug1     

             new6: CardID2, durg2

             new7: CardID2, drug3

             new8: CardID2, durg4

            

I have thousands of CardIDs and they all have different number of drugs.  Here's my code: 

data _null_;
if 0 then set both;
declare hash dog(ordered: 'a');
dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/
dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7',
				'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7',
       			'end_dt','start_dt');
dog.definedone();
do _n_=1 by 1 until (last.UniqueID2);
set both;
by UniqueID2;
dog.add();
end;
dog.output (dataset: 'new'||compress(UniqueID2));
run;

 

 

Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1.
I have the following code that needs help: 

data new1; 
set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ 
array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset  New3 drug3day1-drug3day180 */
array filldates(*) filldate1 - filldate11; 
array dayssupply(*) dayssupply1-dayssupply11;
do ii=1 to 180; daydummy(ii)=0;end;
do ii=1 to 180;
do i = 1 to dim(filldates) while (filldates(i) ne .);
if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1
then daydummy(ii)=1;
end;
end;

do u=2 to 11 while (filldates(u) ne .);
if filldates(u)<filldates(u-1)+dayssupply(u-1)
then filldates(u)=filldates(u-1)+dayssupply(u-1);
end;

drop i ii u;
dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered';
p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered';
run;

Step 3: Combines the files back by CardID and calculates cumulative PDC score. 
/*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/
proc sort data=new1; by cardID; run; proc sort data=new2; by cardID; run;
proc sort data=new3;
by cardID;
run;
data pdc_1_2; merge new1 new2 new3; by CardID; dayscovered_both=0; array drug1 (180) day1 - day180; array drug2 (180) drug2day1 - drug2day180;
array drug3 (180) drug3day1-drug3day180;

if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;
end;
label dayscovered_both = ‘Total Days Covered on Both Drugs’;
p_dayscovered_both= dayscovered_both / 180;
label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;
run;


 Please help, I am desperate and stuck!
Thanks. 

Super User
Posts: 23,296

Re: Split dataset, rename variables, and put it back together

What's your question?

 

Spoiler

127 wrote:

Hello, 

I am struggling write a SAS code that would do the following : 

Step 1:  Split the data by ID and Drug for example: 

             new1: CardID1, drug1     

             new2: CadrID1, durg2

             new3: CardID1, drug3

             

             new4: CardID2, drug1 

             new5: CardID2, drug1     

             new6: CardID2, durg2

             new7: CardID2, drug3

             new8: CardID2, durg4

            

I have thousands of CardIDs and they all have different number of drugs.  Here's my code: 

data _null_;
if 0 then set both;
declare hash dog(ordered: 'a');
dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/
dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7',
				'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7',
       			'end_dt','start_dt');
dog.definedone();
do _n_=1 by 1 until (last.UniqueID2);
set both;
by UniqueID2;
dog.add();
end;
dog.output (dataset: 'new'||compress(UniqueID2));
run;

 

 

Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1.
I have the following code that needs help:

data new1; set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset New3 drug3day1-drug3day180 */ array filldates(*) filldate1 - filldate11; array dayssupply(*) dayssupply1-dayssupply11; do ii=1 to 180; daydummy(ii)=0;end; do ii=1 to 180; do i = 1 to dim(filldates) while (filldates(i) ne .); if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1 then daydummy(ii)=1; end; end; do u=2 to 11 while (filldates(u) ne .); if filldates(u)<filldates(u-1)+dayssupply(u-1) then filldates(u)=filldates(u-1)+dayssupply(u-1); end; drop i ii u; dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered'; p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered'; run;

Step 3: Combines the files back by CardID and calculates cumulative PDC score.
/*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/
proc sort data=new1; by cardID; run; proc sort data=new2; by cardID; run;
proc sort data=new3;
by cardID;
run;
data pdc_1_2; merge new1 new2 new3; by CardID; dayscovered_both=0; array drug1 (180) day1 - day180; array drug2 (180) drug2day1 - drug2day180;
array drug3 (180) drug3day1-drug3day180;

if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;
end;
label dayscovered_both = ‘Total Days Covered on Both Drugs’;
p_dayscovered_both= dayscovered_both / 180;
label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;
run;

Please help, I am desperate and stuck!
Thanks.

 


Occasional Contributor 127
Occasional Contributor
Posts: 7

Re: Split dataset, rename variables, and put it back together

I posted noted the questions in the code lines where I need help. This code works but only for a single ID, I don't know how to do for thousands for records and how to rename the variables and put the dataset back together.

Super User
Posts: 23,296

Re: Split dataset, rename variables, and put it back together

What happens if you don't split the data, run it as is and merge by CardID and Drug instead at the end?

Occasional Contributor 127
Occasional Contributor
Posts: 7

Re: Split dataset, rename variables, and put it back together

Thanks Reeza for the question. 

When your suggestion it didn't work. I think I have to split the data for each ID and drug b/c I need to calculate the PDC for each one of the separately and then when I merge all the files, it just crates one record for that ID runs the last array and looks at all the days to see if the prescription days are overlapping based on all the different medication that are filled: example (this example i am including is shorten version, but it would go all the way to 180 days):  

Example.PNG

 

I am still fairly new to SAS so probably there's a  better way. 

Super User
Posts: 6,632

Re: Split dataset, rename variables, and put it back together

General impressions only due to a lack of time on my part:

 

Why go through the first step at all?  Sort your data by CardID Drug.  Instead of keeping a separate data set, roll up those observations up into a single observation, similar to the way you are processing in Step 2.  You would have to reset your variables when beginning a new drug, and retain variables as you move from one observation to the next for the same drug.  But that's much easier than what you are doing now.

 

Why should day5 for one drug be the same as day5 for another drug?  They could easily have different start dates, and a different range of dates.  Is your program treating day5 as being the same day for all drugs?

 

 

Occasional Contributor 127
Occasional Contributor
Posts: 7

Re: Split dataset, rename variables, and put it back together

Posted in reply to Astounding

Thanks "Astounding" for your suggestion. 

You're right the drug 1 day5 maybe different than drug 2 day5 depending on their refill dates. If it splits the datasets for each person for each drug then day5 variable would be in each of those data sets with either 0 (for no drug prescribed for that day) or 1 (had drug prescribed during that day).

I posted an image of what it would look like and what the end results would be. I just don't know how to get there..

 

Super User
Posts: 23,296

Re: Split dataset, rename variables, and put it back together

That's how your data looks in Excel, I'm assuming it's not in SAS in that format. How does it look like SAS? Or is the point of this exercise to clean it up to make it in a workable format? If so, is converting it to a CSV an option in the IMPORT stage?

Occasional Contributor 127
Occasional Contributor
Posts: 7

Re: Split dataset, rename variables, and put it back together

This would be in SAS after run all the code i have right now, so there would be 3 separate datasets for one ID with 3 different drugs and I want to calculate the score from all those 3 separate datasets but I need to figure out how to concatenate the datasets to count the overlaping days, the problem is that each of those variables have the same name so when I merge them in the last step, Step 3, it only transfers data from one drug with the data values and assigns missing values to the other drugs, therefore, i can figure out the overplaping days with the code. My data in SAS before i run any of my SAS codes looks something like this: 

 

ID NCD (drug)  filldate1 Supply1 filldate2 Supply2   filldate3     Supplly3

1  10                 1/1/2013  30        2/2/2013   60        4/6/2013      30

1  35                 1/1/2013  30        3/5/2013   30        8/6/2013      60

1   68                5/5/2013  60        8/10/2013  60       10/1/2013    60

2   50                2/2/2013  30        5/10/2013  30       11/1/2013     30

 

Here's what i want to do: 

1. Figure out the number of days per person each drug covers during a 180 day period

2. Calculate the numbers of days overlapping for each person for all their drugs combined, then calculate the score for Total of days covered for all drugs /180 days  

Super User
Posts: 23,296

Re: Split dataset, rename variables, and put it back together

Ok, so you have a working program after you split your file. If you want to call it for all other data sets you could wrap it in a macro and use CALL EXECUTE to call it for every data set and append them all at the end. 

 

However, this approach is problematic for several reasons, which is why I and others have suggested you modify it so that you don't split your data set in the first place. You're also describing your entire process/problem at once, when it may help a lot to ask simpler questions one at a time. A big question takes time to process, read and figure out answers. Smaller questions can have answers in minutes or hours. 

 

 

Occasional Contributor 127
Occasional Contributor
Posts: 7

Re: Split dataset, rename variables, and put it back together

You're right, I apologize for that, it's my first time posting and I have been looking at this forever. I should have just posted my last posting first. 

 

Can any one please help/provide me with code example that i can work off? 

I really appropriate it.  

Super User
Posts: 23,296

Re: Split dataset, rename variables, and put it back together

Add what you would expect as output from that three records and it would probably be a good idea to include more than one ID to make it more realistic. 

 


127 wrote:

You're right, I apologize for that, it's my first time posting and I have been looking at this forever. I should have just posted my last posting first. 

 

Can any one please help/provide me with code example that i can work off? 

I really appropriate it.  


 

Occasional Contributor 127
Occasional Contributor
Posts: 7

Re: Split dataset, rename variables, and put it back together

here's a bigger fake dataset. I would expect the final dataset to have one line per ID with calculated variables PDC (number of days covered by all drugs/180 days=PDC) 

 

 

IDNDCFilldate1Supply1Filldate2Supply2Filldate3Supply3Filldate4Supply4Filldate5Supply5Filldate6Supply6Filldate7Supply7
1101/1/2013302/2/2013604/6/201330        
1351/1/2013303/5/2013308/6/201360        
1685/5/2013608/10/20136010/1/201360        
2501/7/2013302/8/2013303/11/201330        
2604/5/2013304/29/2013306/3/2013307/3/2013308/21/20133    
2703/11/2013284/17/2013286/3/2013167/17/2013338/21/2013339/19/20133311/12/201333
2809/30/20133010/24/20133012/2/201330        
                
Ask a Question
Discussion stats
  • 12 replies
  • 170 views
  • 0 likes
  • 3 in conversation