Re: Split dataset, rename variables, and put it back together

Report Inappropriate Content · Posted 02-14-2018 02:34 PM

Hello,

I am struggling write a SAS code that would do the following :

Step 1: Split the data by ID and Drug for example:

new1: CardID1, drug1

new2: CadrID1, durg2

new3: CardID1, drug3

new4: CardID2, drug1

new5: CardID2, drug1

new6: CardID2, durg2

new7: CardID2, drug3

new8: CardID2, durg4

I have thousands of CardIDs and they all have different number of drugs. Here's my code:

data _null_;
if 0 then set both;
declare hash dog(ordered: 'a');
dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/
dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7',
				'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7',
       			'end_dt','start_dt');
dog.definedone();
do _n_=1 by 1 until (last.UniqueID2);
set both;
by UniqueID2;
dog.add();
end;
dog.output (dataset: 'new'||compress(UniqueID2));
run;

Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1.
I have the following code that needs help: 

data new1; 
set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ 
array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset  New3 drug3day1-drug3day180 */
array filldates(*) filldate1 - filldate11; 
array dayssupply(*) dayssupply1-dayssupply11;
do ii=1 to 180; daydummy(ii)=0;end;
do ii=1 to 180;
do i = 1 to dim(filldates) while (filldates(i) ne .);
if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1
then daydummy(ii)=1;
end;
end;

do u=2 to 11 while (filldates(u) ne .);
if filldates(u)<filldates(u-1)+dayssupply(u-1)
then filldates(u)=filldates(u-1)+dayssupply(u-1);
end;

drop i ii u;
dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered';
p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered';
run;

Step 3: Combines the files back by CardID and calculates cumulative PDC score. 
/*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/
proc sort data=new1; 
by cardID; 
run; 

proc sort data=new2; 
by cardID; 
run; 

proc sort data=new3; 
by cardID; 
run; 

data pdc_1_2;
merge new1 new2 new3;
by CardID;
dayscovered_both=0;
array drug1 (180) day1 - day180;
array drug2 (180) drug2day1 - drug2day180;
array drug3 (180) drug3day1-drug3day180;

if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;
end;
label dayscovered_both = ‘Total Days Covered on Both Drugs’;
p_dayscovered_both= dayscovered_both / 180;
label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;
run;

Please help, I am desperate and stuck!
Thanks.

Reeza · Posted 02-14-2018 02:37 PM

What's your question?

Spoiler

@127 wrote:

Hello,

I am struggling write a SAS code that would do the following :

Step 1: Split the data by ID and Drug for example:

new1: CardID1, drug1

new2: CadrID1, durg2

new3: CardID1, drug3

new4: CardID2, drug1

new5: CardID2, drug1

new6: CardID2, durg2

new7: CardID2, drug3

new8: CardID2, durg4

I have thousands of CardIDs and they all have different number of drugs. Here's my code:

data _null_;
if 0 then set both;
declare hash dog(ordered: 'a');
dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/
dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7',
				'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7',
       			'end_dt','start_dt');
dog.definedone();
do _n_=1 by 1 until (last.UniqueID2);
set both;
by UniqueID2;
dog.add();
end;
dog.output (dataset: 'new'||compress(UniqueID2));
run;

Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1.
I have the following code that needs help: 

data new1; 
set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ 
array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset  New3 drug3day1-drug3day180 */
array filldates(*) filldate1 - filldate11; 
array dayssupply(*) dayssupply1-dayssupply11;
do ii=1 to 180; daydummy(ii)=0;end;
do ii=1 to 180;
do i = 1 to dim(filldates) while (filldates(i) ne .);
if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1
then daydummy(ii)=1;
end;
end;

do u=2 to 11 while (filldates(u) ne .);
if filldates(u)<filldates(u-1)+dayssupply(u-1)
then filldates(u)=filldates(u-1)+dayssupply(u-1);
end;

drop i ii u;
dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered';
p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered';
run;

Step 3: Combines the files back by CardID and calculates cumulative PDC score. 
/*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/
proc sort data=new1; 
by cardID; 
run; 

proc sort data=new2; 
by cardID; 
run; 

proc sort data=new3; 
by cardID; 
run; 

data pdc_1_2;
merge new1 new2 new3;
by CardID;
dayscovered_both=0;
array drug1 (180) day1 - day180;
array drug2 (180) drug2day1 - drug2day180;
array drug3 (180) drug3day1-drug3day180;

if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;
end;
label dayscovered_both = ‘Total Days Covered on Both Drugs’;
p_dayscovered_both= dayscovered_both / 180;
label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;
run;

Please help, I am desperate and stuck! 
Thanks.

wrote: Hello, I am struggling write a SAS code that would do the following : Step 1: Split the data by ID and Drug for example: new1: CardID1, drug1 new2: CadrID1, durg2 new3: CardID1, drug3 new4: CardID2, drug1 new5: CardID2, drug1 new6: CardID2, durg2 new7: CardID2, drug3 new8: CardID2, durg4 I have thousands of CardIDs and they all have different number of drugs. Here's my code: data _null_; if 0 then set both; declare hash dog(ordered: 'a'); dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/ dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7', 'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7', 'end_dt','start_dt'); dog.definedone(); do _n_=1 by 1 until (last.UniqueID2); set both; by UniqueID2; dog.add(); end; dog.output (dataset: 'new'||compress(UniqueID2)); run; Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1.I have the following code that needs help: data new1; set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset New3 drug3day1-drug3day180 */ array filldates(*) filldate1 - filldate11; array dayssupply(*) dayssupply1-dayssupply11; do ii=1 to 180; daydummy(ii)=0;end; do ii=1 to 180; do i = 1 to dim(filldates) while (filldates(i) ne .); if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1 then daydummy(ii)=1; end; end; do u=2 to 11 while (filldates(u) ne .); if filldates(u)<filldates(u-1)+dayssupply(u-1) then filldates(u)=filldates(u-1)+dayssupply(u-1); end; drop i ii u; dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered'; p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered'; run;Step 3: Combines the files back by CardID and calculates cumulative PDC score. /*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/proc sort data=new1; by cardID; run; proc sort data=new2; by cardID; run; proc sort data=new3; by cardID; run; data pdc_1_2; merge new1 new2 new3; by CardID; dayscovered_both=0; array drug1 (180) day1 - day180; array drug2 (180) drug2day1 - drug2day180;array drug3 (180) drug3day1-drug3day180;if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;end;label dayscovered_both = ‘Total Days Covered on Both Drugs’;p_dayscovered_both= dayscovered_both / 180;label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;run;Please help, I am desperate and stuck! Thanks.

Report Inappropriate Content · Posted 02-14-2018 02:40 PM

I posted noted the questions in the code lines where I need help. This code works but only for a single ID, I don't know how to do for thousands for records and how to rename the variables and put the dataset back together.

Reeza · Posted 02-14-2018 02:42 PM

What happens if you don't split the data, run it as is and merge by CardID and Drug instead at the end?

Report Inappropriate Content · Posted 02-14-2018 03:21 PM

Thanks Reeza for the question.

When your suggestion it didn't work. I think I have to split the data for each ID and drug b/c I need to calculate the PDC for each one of the separately and then when I merge all the files, it just crates one record for that ID runs the last array and looks at all the days to see if the prescription days are overlapping based on all the different medication that are filled: example (this example i am including is shorten version, but it would go all the way to 180 days):

I am still fairly new to SAS so probably there's a better way.

Astounding · Posted 02-14-2018 02:54 PM

General impressions only due to a lack of time on my part:

Why go through the first step at all? Sort your data by CardID Drug. Instead of keeping a separate data set, roll up those observations up into a single observation, similar to the way you are processing in Step 2. You would have to reset your variables when beginning a new drug, and retain variables as you move from one observation to the next for the same drug. But that's much easier than what you are doing now.

Why should day5 for one drug be the same as day5 for another drug? They could easily have different start dates, and a different range of dates. Is your program treating day5 as being the same day for all drugs?

Report Inappropriate Content · Posted 02-14-2018 03:50 PM

Thanks "Astounding" for your suggestion.

You're right the drug 1 day5 maybe different than drug 2 day5 depending on their refill dates. If it splits the datasets for each person for each drug then day5 variable would be in each of those data sets with either 0 (for no drug prescribed for that day) or 1 (had drug prescribed during that day).

I posted an image of what it would look like and what the end results would be. I just don't know how to get there..

Reeza · Posted 02-14-2018 04:01 PM

That's how your data looks in Excel, I'm assuming it's not in SAS in that format. How does it look like SAS? Or is the point of this exercise to clean it up to make it in a workable format? If so, is converting it to a CSV an option in the IMPORT stage?

Report Inappropriate Content · Posted 02-14-2018 04:24 PM

This would be in SAS after run all the code i have right now, so there would be 3 separate datasets for one ID with 3 different drugs and I want to calculate the score from all those 3 separate datasets but I need to figure out how to concatenate the datasets to count the overlaping days, the problem is that each of those variables have the same name so when I merge them in the last step, Step 3, it only transfers data from one drug with the data values and assigns missing values to the other drugs, therefore, i can figure out the overplaping days with the code. My data in SAS before i run any of my SAS codes looks something like this:

ID NCD (drug) filldate1 Supply1 filldate2 Supply2 filldate3 Supplly3

1 10 1/1/2013 30 2/2/2013 60 4/6/2013 30

1 35 1/1/2013 30 3/5/2013 30 8/6/2013 60

1 68 5/5/2013 60 8/10/2013 60 10/1/2013 60

2 50 2/2/2013 30 5/10/2013 30 11/1/2013 30

Here's what i want to do:

1. Figure out the number of days per person each drug covers during a 180 day period

2. Calculate the numbers of days overlapping for each person for all their drugs combined, then calculate the score for Total of days covered for all drugs /180 days

Reeza · Posted 02-14-2018 04:27 PM

Ok, so you have a working program after you split your file. If you want to call it for all other data sets you could wrap it in a macro and use CALL EXECUTE to call it for every data set and append them all at the end.

However, this approach is problematic for several reasons, which is why I and others have suggested you modify it so that you don't split your data set in the first place. You're also describing your entire process/problem at once, when it may help a lot to ask simpler questions one at a time. A big question takes time to process, read and figure out answers. Smaller questions can have answers in minutes or hours.

Report Inappropriate Content · Posted 02-14-2018 04:33 PM

You're right, I apologize for that, it's my first time posting and I have been looking at this forever. I should have just posted my last posting first.

Can any one please help/provide me with code example that i can work off?

I really appropriate it.

Reeza · Posted 02-14-2018 04:40 PM

Add what you would expect as output from that three records and it would probably be a good idea to include more than one ID to make it more realistic.

@127 wrote:

You're right, I apologize for that, it's my first time posting and I have been looking at this forever. I should have just posted my last posting first.

Can any one please help/provide me with code example that i can work off?

I really appropriate it.

Report Inappropriate Content · Posted 02-14-2018 05:03 PM

here's a bigger fake dataset. I would expect the final dataset to have one line per ID with calculated variables PDC (number of days covered by all drugs/180 days=PDC)

ID	NDC	Filldate1	Supply1	Filldate2	Supply2	Filldate3	Supply3	Filldate4	Supply4	Filldate5	Supply5	Filldate6	Supply6	Filldate7	Supply7
1	10	1/1/2013	30	2/2/2013	60	4/6/2013	30
1	35	1/1/2013	30	3/5/2013	30	8/6/2013	60
1	68	5/5/2013	60	8/10/2013	60	10/1/2013	60
2	50	1/7/2013	30	2/8/2013	30	3/11/2013	30
2	60	4/5/2013	30	4/29/2013	30	6/3/2013	30	7/3/2013	30	8/21/2013	3
2	70	3/11/2013	28	4/17/2013	28	6/3/2013	16	7/17/2013	33	8/21/2013	33	9/19/2013	33	11/12/2013	33
2	80	9/30/2013	30	10/24/2013	30	12/2/2013	30

SAS Innovate 2025: Call for Content

Classroom Training Available!