the following drug data ;
the goal is to count days overlap of at least two drugs taken and list drug names per patient id(PID). i am looking forward the output data should look like the following.
PID | drug_name | drug_cnt | overlapping_flag | start_date | end_date | count_days |
A | ARIPIPRAZOLE | 1 | 0 | 1/1/15 | 2/1/15 | 31 |
A | ARIPIPRAZOLE_DESIPRAMINE | 2 | 1 | 2/1/15 | 2/20/15 | 19 |
A | ARIPIPRAZOLE_DESIPRAMINE_QUETIAPINE | 3 | 1 | 2/20/15 | 3/1/15 | 9 |
A | DESIPRAMINE_QUETIAPINE | 2 | 1 | 2/20/15 | 4/1/15 | 40 |
A | QUETIAPINE | 1 | 0 | 4/10/15 | 5/1/15 | 21 |
the data source as the below:
PID | drug | med_taken_begin_dt | med_taken_finish_dt | MD_taken_by_days |
1 | HALOPERIDOL DECANOATE | 11Dec2013 | 15Dec2017 | 1465 |
1 | LITHIUM CARBONATE | 23Jan2014 | 07Feb2018 | 1476 |
1 | CLOZAPINE | 03Mar2014 | 08Feb2018 | 1438 |
2 | SERTRALINE | 25Feb2014 | 30Jun2014 | 125 |
2 | LITHIUM CARBONATE | 27Feb2014 | 20Jan2017 | 1058 |
2 | QUETIAPINE | 03Mar2014 | 14Feb2017 | 1079 |
2 | HALOPERIDOL | 03Apr2014 | 08Sep2016 | 889 |
2 | TRAZODONE | 01May2014 | 20Jan2017 | 995 |
2 | BUPROPION HCL | 10Jul2014 | 31Jul2014 | 21 |
2 | BUPROPION HCL | 31Jul2014 | 09Sep2014 | 40 |
2 | DULOXETINE | 11Sep2014 | 17Oct2014 | 36 |
2 | ARIPIPRAZOLE | 23Apr2015 | 25Jun2015 | 63 |
2 | LITHIUM CARBONATE | 26Apr2015 | 01May2015 | 5 |
2 | LORAZEPAM | 27Apr2015 | 27Apr2015 | 0 |
2 | FLUOXETINE | 15Oct2015 | 09Nov2015 | 25 |
2 | QUETIAPINE | 14Apr2016 | 15Sep2016 | 154 |
2 | SEROQUEL | 13Jun2016 | 20Jun2016 | 7 |
2 | OLANZAPINE | 08Sep2016 | 24Mar2017 | 197 |
2 | RISPERIDONE | 01Feb2017 | 19Mar2017 | 46 |
2 | CLOZAPINE | 01Mar2017 | 31Mar2017 | 30 |
3 | DESIPRAMINE | 11Nov2013 | 22Sep2014 | 315 |
3 | QUETIAPINE | 26Nov2013 | 26Jan2017 | 1157 |
3 | ARIPIPRAZOLE | 10Jan2014 | 26Jan2017 | 1112 |
3 | VENLAFAXINE | 28Mar2014 | 26Jan2017 | 1035 |
Thanks lots!
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Some questions:
Do you have your data is SAS yet?
Are your existing date variables SAS date values or character?
Do you have a maximum number of "overlaps" you want to consider? Since your example start end periods cover 4 or 5 years there could be a great many drugs at the same time.
Your apparent rule is to combine the drug names by inserting an underscore between the overlaps. Is that the actual case?
Also your example output is for a PID not included in your input so it is a tad difficult to check.
But this may provide enough to get you started. NOTE: the date values must be SAS date values.
data have; infile datalines dlm='|'; informat PID $5. drug $25. med_taken_begin_dt med_taken_finish_dt date9. MD_taken_by_days best5. ; format med_taken_begin_dt med_taken_finish_dt date9.; input PID drug med_taken_begin_dt med_taken_finish_dt MD_taken_by_days; datalines; 1|HALOPERIDOL DECANOATE|11Dec2013|15Dec2017|1465| 1|LITHIUM CARBONATE|23Jan2014|07Feb2018|1476| 1|CLOZAPINE|03Mar2014|08Feb2018|1438| 2|SERTRALINE|25Feb2014|30Jun2014|125| 2|LITHIUM CARBONATE|27Feb2014|20Jan2017|1058| 2|QUETIAPINE|03Mar2014|14Feb2017|1079| 2|HALOPERIDOL|03Apr2014|08Sep2016|889| 2|TRAZODONE|01May2014|20Jan2017|995| 2|BUPROPION HCL|10Jul2014|31Jul2014|21| 2|BUPROPION HCL|31Jul2014|09Sep2014|40| 2|DULOXETINE|11Sep2014|17Oct2014|36| 2|ARIPIPRAZOLE|23Apr2015|25Jun2015|63| 2|LITHIUM CARBONATE|26Apr2015|01May2015|5| 2|LORAZEPAM|27Apr2015|27Apr2015|0| 2|FLUOXETINE|15Oct2015|09Nov2015|25| 2|QUETIAPINE|14Apr2016|15Sep2016|154| 2|SEROQUEL|13Jun2016|20Jun2016|7| 2|OLANZAPINE|08Sep2016|24Mar2017|197| 2|RISPERIDONE|01Feb2017|19Mar2017|46| 2|CLOZAPINE|01Mar2017|31Mar2017|30| 3|DESIPRAMINE|11Nov2013|22Sep2014|315| 3|QUETIAPINE|26Nov2013|26Jan2017|1157| 3|ARIPIPRAZOLE|10Jan2014|26Jan2017|1112| 3|VENLAFAXINE|28Mar2014|26Jan2017|1035| ; run; data temp; set have; do takedt = med_taken_begin_dt to med_taken_finish_dt; output; end; format takedt date9.; drop med_taken_begin_dt med_taken_finish_dt MD_taken_by_days; run; proc sort data=temp; by pid takedt; run; proc transpose data=temp out=temptrans (drop=_name_); by pid takedt; var drug; run; data combineddrug; set temptrans; combineddrug= catx('_',of col:); drop col: ; run; proc summary data=combineddrug nway; class pid combineddrug; var takedt; output out= combinesummary (drop=_:) min= start_date max=end_date; run; data want; set combinesummary; drug_cnt = countw(combineddrug,'_'); overlapping_flag = (drug_cnt > 1); count_days = end_date - start_date +1; format start_date end_date mmddyy6.; run;
The most problematic issue is with the data step that creates combineddrug. You may well need to set a length statement to accomodate the combined lengths of all of the names, making sure to allow for the inserted _ . The default as used is set to 200 characters. if that isn't enough then add something like Length combineddrug $ 500; Keep increasing the number until everything fits.
This is a common question, perhaps one of the solutions above will help you out.
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Some questions:
Do you have your data is SAS yet?
Are your existing date variables SAS date values or character?
Do you have a maximum number of "overlaps" you want to consider? Since your example start end periods cover 4 or 5 years there could be a great many drugs at the same time.
Your apparent rule is to combine the drug names by inserting an underscore between the overlaps. Is that the actual case?
Also your example output is for a PID not included in your input so it is a tad difficult to check.
But this may provide enough to get you started. NOTE: the date values must be SAS date values.
data have; infile datalines dlm='|'; informat PID $5. drug $25. med_taken_begin_dt med_taken_finish_dt date9. MD_taken_by_days best5. ; format med_taken_begin_dt med_taken_finish_dt date9.; input PID drug med_taken_begin_dt med_taken_finish_dt MD_taken_by_days; datalines; 1|HALOPERIDOL DECANOATE|11Dec2013|15Dec2017|1465| 1|LITHIUM CARBONATE|23Jan2014|07Feb2018|1476| 1|CLOZAPINE|03Mar2014|08Feb2018|1438| 2|SERTRALINE|25Feb2014|30Jun2014|125| 2|LITHIUM CARBONATE|27Feb2014|20Jan2017|1058| 2|QUETIAPINE|03Mar2014|14Feb2017|1079| 2|HALOPERIDOL|03Apr2014|08Sep2016|889| 2|TRAZODONE|01May2014|20Jan2017|995| 2|BUPROPION HCL|10Jul2014|31Jul2014|21| 2|BUPROPION HCL|31Jul2014|09Sep2014|40| 2|DULOXETINE|11Sep2014|17Oct2014|36| 2|ARIPIPRAZOLE|23Apr2015|25Jun2015|63| 2|LITHIUM CARBONATE|26Apr2015|01May2015|5| 2|LORAZEPAM|27Apr2015|27Apr2015|0| 2|FLUOXETINE|15Oct2015|09Nov2015|25| 2|QUETIAPINE|14Apr2016|15Sep2016|154| 2|SEROQUEL|13Jun2016|20Jun2016|7| 2|OLANZAPINE|08Sep2016|24Mar2017|197| 2|RISPERIDONE|01Feb2017|19Mar2017|46| 2|CLOZAPINE|01Mar2017|31Mar2017|30| 3|DESIPRAMINE|11Nov2013|22Sep2014|315| 3|QUETIAPINE|26Nov2013|26Jan2017|1157| 3|ARIPIPRAZOLE|10Jan2014|26Jan2017|1112| 3|VENLAFAXINE|28Mar2014|26Jan2017|1035| ; run; data temp; set have; do takedt = med_taken_begin_dt to med_taken_finish_dt; output; end; format takedt date9.; drop med_taken_begin_dt med_taken_finish_dt MD_taken_by_days; run; proc sort data=temp; by pid takedt; run; proc transpose data=temp out=temptrans (drop=_name_); by pid takedt; var drug; run; data combineddrug; set temptrans; combineddrug= catx('_',of col:); drop col: ; run; proc summary data=combineddrug nway; class pid combineddrug; var takedt; output out= combinesummary (drop=_:) min= start_date max=end_date; run; data want; set combinesummary; drug_cnt = countw(combineddrug,'_'); overlapping_flag = (drug_cnt > 1); count_days = end_date - start_date +1; format start_date end_date mmddyy6.; run;
The most problematic issue is with the data step that creates combineddrug. You may well need to set a length statement to accomodate the combined lengths of all of the names, making sure to allow for the inserted _ . The default as used is set to 200 characters. if that isn't enough then add something like Length combineddrug $ 500; Keep increasing the number until everything fits.
@ ballardw thanks a lot!!!!!!!!!!!!
That is exact what i want!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.