Hello,
I am struggling write a SAS code that would do the following :
Step 1: Split the data by ID and Drug for example:
new1: CardID1, drug1
new2: CadrID1, durg2
new3: CardID1, drug3
new4: CardID2, drug1
new5: CardID2, drug1
new6: CardID2, durg2
new7: CardID2, drug3
new8: CardID2, durg4
I have thousands of CardIDs and they all have different number of drugs. Here's my code:
data _null_;
if 0 then set both;
declare hash dog(ordered: 'a');
dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/
dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7',
'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7',
'end_dt','start_dt');
dog.definedone();
do _n_=1 by 1 until (last.UniqueID2);
set both;
by UniqueID2;
dog.add();
end;
dog.output (dataset: 'new'||compress(UniqueID2));
run;
Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1. I have the following code that needs help: data new1; set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset New3 drug3day1-drug3day180 */ array filldates(*) filldate1 - filldate11; array dayssupply(*) dayssupply1-dayssupply11; do ii=1 to 180; daydummy(ii)=0;end; do ii=1 to 180; do i = 1 to dim(filldates) while (filldates(i) ne .); if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1 then daydummy(ii)=1; end; end; do u=2 to 11 while (filldates(u) ne .); if filldates(u)<filldates(u-1)+dayssupply(u-1) then filldates(u)=filldates(u-1)+dayssupply(u-1); end; drop i ii u; dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered'; p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered'; run;
Step 3: Combines the files back by CardID and calculates cumulative PDC score.
/*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/
proc sort data=new1; by cardID; run; proc sort data=new2; by cardID; run;
proc sort data=new3;
by cardID;
run;
data pdc_1_2; merge new1 new2 new3; by CardID; dayscovered_both=0; array drug1 (180) day1 - day180; array drug2 (180) drug2day1 - drug2day180;
array drug3 (180) drug3day1-drug3day180;
if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;
end;
label dayscovered_both = ‘Total Days Covered on Both Drugs’;
p_dayscovered_both= dayscovered_both / 180;
label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;
run;
Please help, I am desperate and stuck!
Thanks.
What's your question?
@127 wrote:
Hello,
I am struggling write a SAS code that would do the following :
Step 1: Split the data by ID and Drug for example:
new1: CardID1, drug1
new2: CadrID1, durg2
new3: CardID1, drug3
new4: CardID2, drug1
new5: CardID2, drug1
new6: CardID2, durg2
new7: CardID2, drug3
new8: CardID2, durg4
I have thousands of CardIDs and they all have different number of drugs. Here's my code:
data _null_; if 0 then set both; declare hash dog(ordered: 'a'); dog.definekey ('UniqueID2','NDC','Card'); /*NDC is a drug name charter variables, Card is just a short ID i crated to split the data*/ dog.definedata ('CardID','NDC', 'DaysSupply1','DaysSupply2', 'DaysSupply3', 'DaysSupply4', 'DaysSupply5', 'DaysSupply6', 'DaysSupply7', 'filldate1', 'filldate2', 'filldate3', 'filldate4', 'filldate5', 'filldate6', 'filldate7', 'end_dt','start_dt'); dog.definedone(); do _n_=1 by 1 until (last.UniqueID2); set both; by UniqueID2; dog.add(); end; dog.output (dataset: 'new'||compress(UniqueID2)); run;
Step 2: Assign unique names to drug days and drug fill and calculate PDC score for all the data sets created in Step 1.
I have the following code that needs help:
data new1; set new1 ; /*This needs to call out all the datasets for each person created in Step1. Then repeat this for every other person. I am not sure how to do this w/o typing hundred of dataset name*/ array daydummy(180) day1-day180; /*Because in the next data step, I use merge and an array, for each new dataset the variable names need to change so that dataset New2 would have variables names drug2day1-drug2day180, dataset New3 drug3day1-drug3day180 */ array filldates(*) filldate1 - filldate11; array dayssupply(*) dayssupply1-dayssupply11; do ii=1 to 180; daydummy(ii)=0;end; do ii=1 to 180; do i = 1 to dim(filldates) while (filldates(i) ne .); if filldates(i)<= start_dt + ii -1 <= filldates(i)+dayssupply(i)-1 then daydummy(ii)=1; end; end; do u=2 to 11 while (filldates(u) ne .); if filldates(u)<filldates(u-1)+dayssupply(u-1) then filldates(u)=filldates(u-1)+dayssupply(u-1); end; drop i ii u; dayscovered=sum(of day1-day180);label dayscovered='Total Days Covered'; p_dayscovered=dayscovered/180;label p_dayscovered='Proportion of Days Covered'; run;Step 3: Combines the files back by CardID and calculates cumulative PDC score.
/*These 3 datasets are just for one person, I need to do that for all CardIDs for all different number of datasets each person will have*/
proc sort data=new1; by cardID; run; proc sort data=new2; by cardID; run;
proc sort data=new3;
by cardID;
run;
data pdc_1_2; merge new1 new2 new3; by CardID; dayscovered_both=0; array drug1 (180) day1 - day180; array drug2 (180) drug2day1 - drug2day180;
array drug3 (180) drug3day1-drug3day180;
if drug1(i) = 1 AND drug2(i) = 1 then dayscovered_both = dayscovered_both + 1;
end;
label dayscovered_both = ‘Total Days Covered on Both Drugs’;
p_dayscovered_both= dayscovered_both / 180;
label p_dayscovered_both = ‘Proportion of Days Covered on Both Drugs’;
run;
Please help, I am desperate and stuck!
Thanks.
What happens if you don't split the data, run it as is and merge by CardID and Drug instead at the end?
Thanks Reeza for the question.
When your suggestion it didn't work. I think I have to split the data for each ID and drug b/c I need to calculate the PDC for each one of the separately and then when I merge all the files, it just crates one record for that ID runs the last array and looks at all the days to see if the prescription days are overlapping based on all the different medication that are filled: example (this example i am including is shorten version, but it would go all the way to 180 days):
I am still fairly new to SAS so probably there's a better way.
General impressions only due to a lack of time on my part:
Why go through the first step at all? Sort your data by CardID Drug. Instead of keeping a separate data set, roll up those observations up into a single observation, similar to the way you are processing in Step 2. You would have to reset your variables when beginning a new drug, and retain variables as you move from one observation to the next for the same drug. But that's much easier than what you are doing now.
Why should day5 for one drug be the same as day5 for another drug? They could easily have different start dates, and a different range of dates. Is your program treating day5 as being the same day for all drugs?
Thanks "Astounding" for your suggestion.
You're right the drug 1 day5 maybe different than drug 2 day5 depending on their refill dates. If it splits the datasets for each person for each drug then day5 variable would be in each of those data sets with either 0 (for no drug prescribed for that day) or 1 (had drug prescribed during that day).
I posted an image of what it would look like and what the end results would be. I just don't know how to get there..
That's how your data looks in Excel, I'm assuming it's not in SAS in that format. How does it look like SAS? Or is the point of this exercise to clean it up to make it in a workable format? If so, is converting it to a CSV an option in the IMPORT stage?
This would be in SAS after run all the code i have right now, so there would be 3 separate datasets for one ID with 3 different drugs and I want to calculate the score from all those 3 separate datasets but I need to figure out how to concatenate the datasets to count the overlaping days, the problem is that each of those variables have the same name so when I merge them in the last step, Step 3, it only transfers data from one drug with the data values and assigns missing values to the other drugs, therefore, i can figure out the overplaping days with the code. My data in SAS before i run any of my SAS codes looks something like this:
ID NCD (drug) filldate1 Supply1 filldate2 Supply2 filldate3 Supplly3
1 10 1/1/2013 30 2/2/2013 60 4/6/2013 30
1 35 1/1/2013 30 3/5/2013 30 8/6/2013 60
1 68 5/5/2013 60 8/10/2013 60 10/1/2013 60
2 50 2/2/2013 30 5/10/2013 30 11/1/2013 30
Here's what i want to do:
1. Figure out the number of days per person each drug covers during a 180 day period
2. Calculate the numbers of days overlapping for each person for all their drugs combined, then calculate the score for Total of days covered for all drugs /180 days
Ok, so you have a working program after you split your file. If you want to call it for all other data sets you could wrap it in a macro and use CALL EXECUTE to call it for every data set and append them all at the end.
However, this approach is problematic for several reasons, which is why I and others have suggested you modify it so that you don't split your data set in the first place. You're also describing your entire process/problem at once, when it may help a lot to ask simpler questions one at a time. A big question takes time to process, read and figure out answers. Smaller questions can have answers in minutes or hours.
You're right, I apologize for that, it's my first time posting and I have been looking at this forever. I should have just posted my last posting first.
Can any one please help/provide me with code example that i can work off?
I really appropriate it.
Add what you would expect as output from that three records and it would probably be a good idea to include more than one ID to make it more realistic.
@127 wrote:
You're right, I apologize for that, it's my first time posting and I have been looking at this forever. I should have just posted my last posting first.
Can any one please help/provide me with code example that i can work off?
I really appropriate it.
here's a bigger fake dataset. I would expect the final dataset to have one line per ID with calculated variables PDC (number of days covered by all drugs/180 days=PDC)
ID | NDC | Filldate1 | Supply1 | Filldate2 | Supply2 | Filldate3 | Supply3 | Filldate4 | Supply4 | Filldate5 | Supply5 | Filldate6 | Supply6 | Filldate7 | Supply7 |
1 | 10 | 1/1/2013 | 30 | 2/2/2013 | 60 | 4/6/2013 | 30 | ||||||||
1 | 35 | 1/1/2013 | 30 | 3/5/2013 | 30 | 8/6/2013 | 60 | ||||||||
1 | 68 | 5/5/2013 | 60 | 8/10/2013 | 60 | 10/1/2013 | 60 | ||||||||
2 | 50 | 1/7/2013 | 30 | 2/8/2013 | 30 | 3/11/2013 | 30 | ||||||||
2 | 60 | 4/5/2013 | 30 | 4/29/2013 | 30 | 6/3/2013 | 30 | 7/3/2013 | 30 | 8/21/2013 | 3 | ||||
2 | 70 | 3/11/2013 | 28 | 4/17/2013 | 28 | 6/3/2013 | 16 | 7/17/2013 | 33 | 8/21/2013 | 33 | 9/19/2013 | 33 | 11/12/2013 | 33 |
2 | 80 | 9/30/2013 | 30 | 10/24/2013 | 30 | 12/2/2013 | 30 | ||||||||
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.