BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shasank
Quartz | Level 8

Hi SAS community,

 

Thank you for looking.

 

I would like your help in the following question.

 

I have a dataset with ID Drug and Startdate variables. Each observation means a drug given to each person on that specific data. I wanted to have a final dataset with a new variables that will be generated using following logic :-

 

To create 'Regimen1' variable :-

Drug1 given to the person + all the drugs given within a period of 10 days of drug1 Startdate

 

To create 'Regimen2' variable :-

Scan to see if there is any new Drug given to the person after Regimen1. If any found not in Regimen1 (lerts assume that new drug is name DrugX)  then Regimen2 is all drugs given in the interval of DrugX Startdate minus 3 days to Startdate of DrugX plus 10 days.

 

This should go on until there is no new drug given to the person.

 

The dataset looks like this.

 

PIDStartDateDrug
19-May-12A
110-May-12A
110-May-12B
130-May-12B
131-May-12A
11-Jun-12A
12-Jun-12B
130-Jun-12C
13-Jul-12A
127-Jul-12A
128-Jul-12D
217-May-12A
222-May-12B
225-May-12B
21-Jun-12C
22-Jun-12D
23-Jun-12B
326-Jun-12A
329-Jun-12A
327-Jul-12B
325-Aug-12B
426-Jun-12A
427-Jun-12B
428-Jun-12C
425-Aug-12A
426-Aug-12B
427-Aug-12B

 

The output should look like this.

 

PIDRegimen1Regimen2Regimen3
1A + B A + CA+D
2A+BB+C+D.
3AB .
4A+B+C . .

 

As told each ID represents 1 person.

Drug1 namely A was the 1st drug given to person 1. The Startdate of A was on 9-May-12. So, all drugs in the interval of  is 9/May/12 + 10days = 19/May/12 is Regimen1. only Drug A and Drug B were in that intervel. So, Regimen1 will have drugs A and B under it. as shown in the output table.

Next, Scan for other drugs given to person 1 and if any drug other than A or B (Regimen1). Under person 1 Drug C was given on 30/Jun/12 which is not in Regimen1. 

So, Regimen2 should have all the drugs given in the interval of drug C Startdate (30/Jun/12) minus 3 = 27/Jun/12 to Drug C startdate plus 10days = 10/Jul/12. So, Regimen2 should have Drugs A and C.

This logic is same for creation of Regimen3.

Regimen2 should contain drug not in Regimen1.

Regimen3 should trigger when there is a drug not in Regimen2  and  trigger the chain of Newdrug startdate - 3 to Newdrugstartdate+10.

 

This should be looped until there are no new drugs given to each person.

The sequence of drugs under the variable 'Regimen#' is not important. it can be A + B or B + A. 

 

Thank you for your time and help SAS Masters.

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@shasank Try this and let me know

 

data have;

    input PID StartDate :anydtdte. Drug $;

    format startDate yymmdd10.;

    datalines;

1 10MAY2012 arb

1 10MAY2012 acl

1 17MAY2012 arb

1 17MAY2012 acl

1 30MAY2012 rlo

1 30MAY2012 rlo

1 30MAY2012 rlo

1 26JUN2012 rlo

1 19JUL2012 rlo

1 27AUG2012 rlo

1 25SEP2012 rlo

2 19FEB2008 eva

2 19FEB2008 eva

2 19FEB2008 arb

2 19FEB2008 arb

2 19FEB2008 emc

2 28FEB2010 ymc

3 09NOV2011 isp

3 27NOV2011 emc

;

run;

 

proc sql;
select count(distinct Drug) into : reg_limit TRIMMED
from have;
quit;

 

 

data want;

if _N_ = 1 then do;

      if 0 then set have;

    declare hash h(dataset: "have",multidata: 'y');

    h.defineKey('pid');

    h.defineData('StartDate','drug');

    h.defineDone();

 end;

do until(last.pid);

      set have(rename=(startdate=_startdate drug=_drug));

      by pid;

      array r(*) $100 reg1-reg&reg_limit;

if first.pid then

            do;

                  call missing(of r(*));

                  _count=1;

                  r(_count)=_drug;

                  do while(h.do_over(key: pid) eq 0);

                  if    intck('day', _startdate, startdate)<=10 and findw(r(_count), strip(drug))=0  then

                        do;                          

                              r(_count)=catx('+', r(_count), drug);                      

                        end;

                  end;

            end;

else do;

      if findw(r(_count), strip(_drug))=0 then

            do;

                  _count+1;

                  r(_count)=_drug;

                  do while(h.do_over(key: pid) eq 0); 

                  if intnx('day', _startdate, -3)<=startdate<=intnx('day', _startdate, 10) and findw(r(_count), strip(drug))=0 then

                        do;

                              r(_count)=catx('+', r(_count), drug);

                        end;

                  end;

            end;

      end;

end;

keep pid reg:;

run;

View solution in original post

27 REPLIES 27
Reeza
Super User

Here's a basic way, probably not as efficient as a hash solution. 

 

1. Sort by Person, Drug

2. Identify the first of each Drug - assumes no prior history not included which is unlikely realistically. You can create a data set here with only the first drug start date. 

3. Use a SQL join to find any other drugs assigned within 10 days

4. Create your regiments by combing the rows using CATX. 

5. Transpose to a wide format -> this approach doesn't require any knowledge of how many possible combinations there are ahead of time. 

 

Steps 1, 2 & 5 should be relatively easy. 

For 

#3

https://communities.sas.com/t5/Base-SAS-Programming/finding-values-happened-30-days-before-a-referen...

Lots of examples on here

 

#4:

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

 

If you have questions or issues, post your code and explain the issue. 

PS. Please make sure to include your data as a data step:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 


@shasank wrote:

Hi SAS community,

 

Thank you for looking.

 

I would like your help in the following question.

 

I have a dataset with ID Drug and Startdate variables. Each observation means a drug given to each person on that specific data. I wanted to have a final dataset with a new variables that will be generated using following logic :-

 

To create 'Regimen1' variable :-

Drug1 given to the person + all the drugs given within a period of 10 days of drug1 Startdate

 

To create 'Regimen2' variable :-

Scan to see if there is any new Drug given to the person after Regimen1. If any found not in Regimen1 (lerts assume that new drug is name DrugX)  then Regimen2 is all drugs given in the interval of DrugX Startdate minus 3 days to Startdate of DrugX plus 10 days.

 

This should go on until there is no new drug given to the person.

 

The dataset looks like this.

 

PID StartDate Drug
1 9-May-12 A
1 10-May-12 A
1 10-May-12 B
1 30-May-12 B
1 31-May-12 A
1 1-Jun-12 A
1 2-Jun-12 B
1 30-Jun-12 C
1 3-Jul-12 A
1 27-Jul-12 A
1 28-Jul-12 D
2 17-May-12 A
2 22-May-12 B
2 25-May-12 B
2 1-Jun-12 C
2 2-Jun-12 D
2 3-Jun-12 B
3 26-Jun-12 A
3 29-Jun-12 A
3 27-Jul-12 B
3 25-Aug-12 B
4 26-Jun-12 A
4 27-Jun-12 B
4 28-Jun-12 C
4 25-Aug-12 A
4 26-Aug-12 B
4 27-Aug-12 B

 

The output should look like this.

 

PID Regimen1 Regimen2 Regimen3
1 A + B  A + C A+D
2 A+B B+C+D .
3 A B  .
4 A+B+C  .  .

 

As told each ID represents 1 person.

Drug1 namely A was the 1st drug given to person 1. The Startdate of A was on 9-May-12. So, all drugs in the interval of  is 9/May/12 + 10days = 19/May/12 is Regimen1. only Drug A and Drug B were in that intervel. So, Regimen1 will have drugs A and B under it. as shown in the output table.

Next, Scan for other drugs given to person 1 and if any drug other than A or B (Regimen1). Under person 1 Drug C was given on 30/Jun/12 which is not in Regimen1. 

So, Regimen2 should have all the drugs given in the interval of drug C Startdate (30/Jun/12) minus 3 = 27/Jun/12 to Drug C startdate plus 10days = 10/Jul/12. So, Regimen2 should have Drugs A and C.

This logic is same for creation of Regimen3.

Regimen2 should contain drug not in Regimen1.

Regimen3 should trigger when there is a drug not in Regimen2  and  trigger the chain of Newdrug startdate - 3 to Newdrugstartdate+10.

 

This should be looped until there are no new drugs given to each person.

The sequence of drugs under the variable 'Regimen#' is not important. it can be A + B or B + A. 

 

Thank you for your time and help SAS Masters.

 


 

 

shasank
Quartz | Level 8

Hi Reeza,

 

This is what I have worked on. I get zero observations performing Proc SQL. 

 

please you identify any errors?

 

data Fulldata;
input ID Startdate Drug $;
informat Startdate date9.;
format startdate date9.;
cards;
1	9-May-12	A
1	10-May-12	A
1	10-May-12	B
1	30-May-12	B
1	31-May-12	A
1	1-Jun-12	A
1	2-Jun-12	B
1	30-Jun-12	C
1	3-Jul-12	A
1	27-Jul-12	A
1	28-Jul-12	D
2	17-May-12	A
2	22-May-12	B
2	25-May-12	B
2	1-Jun-12	C
2	2-Jun-12	D
2	3-Jun-12	B
3	26-Jun-12	A
3	29-Jun-12	A
3	27-Jul-12	B
3	25-Aug-12	B
4	26-Jun-12	A
4	27-Jun-12	B
4	28-Jun-12	C
4	25-Aug-12	A
4	26-Aug-12	B
4	27-Aug-12	B
run;

proc sort data = Fulldata out = b;
by ID drug ;
run;

data firstchem;
set b;
by ID Drug;
if first.drug and First.drug then output;
Firstdate = startdate;
drop startdate;
run;

proc sql;
create table preD as
select a.*, b.*, a.Startdate-b.Firstdate as days_before
from Fulldata as a
join firstchem as b
on a.id=b.id
where 0< (a.Startdate-b.Firstdate) < 3;
quit;

proc sql;
create table postD as 
select a.*, b.*, b.Firstdate-a.Startdate  as days_after
from Fulldata as a
join firstchem as b
on a.id=b.id
where 0< (b.Firstdate-a.Startdate) < 10;
quit;
Reeza
Super User
Verify your data is being read in correctly, given what I see here, I suspect not. You'll also want to join on making sure the drug isn't the same drug.
shasank
Quartz | Level 8
Hi Reeza,
That was a typo. This is the syntax I used.
data firstchem;
set b;
by ID Drug;
if first.id and First.drug then output;
Firstdate = startdate;
drop startdate;
run;
Yet I get same missing
Reeza
Super User

Here's a start then, I'm working on Reg1, within 10 days. 

Notice the 4th & 5th records in the output. You didn't specify how you'd want to deal with that situation. 

 

data firstchem;
set b;
by ID Drug;
if first.drug then output;
rename StartDate = FirstDate;
run;


proc sql;
create table pre10 as
select FC.ID, FC.FirstDate, FD.StartDate, FC.Drug as Drug1, FD.Drug as Drug2, FD.StartDate-FC.Firstdate as Days_Diff
from firstchem as FC
join fulldata as FD
on FC.id=FD.id
and FC.drug ne FD.drug
and FD.StartDate >= FC.FirstDate
and FD.DRUG > FC.DRUG 
where calculated Days_Diff between 0 and 10;
quit;
shasank
Quartz | Level 8
Hi Reeza,

The logic is the same for the all the variable. Regimen2 should be compared
to regimen1. Regimen3 with Regimen2. Regimen4 with Regimen3. The regimen
creation also has the same logic of new drug minus 3 to new drug plus 10
for all regimens.
I hope I am clear.
novinosrin
Tourmaline | Level 20

@shasank  As much as I appreciate to have taken the effort to clarify your requirement in a much better way than what you did in the other thread - https://communities.sas.com/t5/SAS-Data-Management/How-to-create-a-drug-course-using-dates/m-p/42469... , your desperation is rather worrying.

Well, well, well here you go

I challenge you to clean my code and make it easily readable as i am tired 

data have;

input PID   StartDate : date7.      Drug $;

format StartDate date7.;

datalines;

1     9-May-12    A

1     10-May-12   A

1     10-May-12   B

1     30-May-12   B

1     31-May-12   A

1     1-Jun-12    A

1     2-Jun-12    B

1     30-Jun-12   C

1     3-Jul-12    A

1     27-Jul-12   A

1     28-Jul-12   D

2     17-May-12   A

2     22-May-12   B

2     25-May-12   B

2     1-Jun-12    C

2     2-Jun-12    D

2     3-Jun-12    B

3     26-Jun-12   A

3     29-Jun-12   A

3     27-Jul-12   B

3     25-Aug-12   B

4     26-Jun-12   A

4     27-Jun-12   B

4     28-Jun-12   C

4     25-Aug-12   A

4     26-Aug-12   B

4     27-Aug-12   B

;

 

 

proc sql;

select count(distinct Drug) into : reg_limit TRIMMED

from have;

quit;

 

%put &reg_limit;

 

data want2;

if _N_ = 1 then do;

      if 0 then set have;

    declare hash h(dataset: "have",multidata: 'y');

    h.defineKey('pid');

    h.defineData('StartDate','drug');

    h.defineDone();

 end;

      length _t $500;

      set have(rename=(startdate=_startdate drug=_drug));

      by pid;

      array t(*) $100 reg1-reg&reg_limit;

      retain _t t;

      if first.pid then do;

            call missing(of t(*),_t);

            _count=0;

            _count+1;

            t(_count)=_drug;

            _n=0;

            do while(h.do_over(key: pid) eq 0);

                  k=intck('day', _startdate, startdate);

                  if k<=10 and drug ne _drug then  f=1;

                  else f=0;

                  if f=1 then

                        do;

                              _n+1;

                              if _n=1 then  t(_count)=catx('+', _drug, drug);

                              else if countc(strip(_t), strip(drug))=0 then   t(_count)=catx('+', t(_count), drug);

                        end;

                       

                        _t=cats(of t(*));

                  *output;

            end;

           

            _t=cats(of t(*));

      end;

      else

            do;

                  check=countc(strip(_t), strip(_drug));

                  if countc(strip(_t), strip(_drug))=0 then

                  do;

                        _count+1;

                        t(_count)=_drug;

                        do while(h.do_over(key: pid) eq 0);                   

                              if intnx('day', _startdate, -3)<=startdate<=intnx('day', _startdate, 10) and drug ne _drug  then t(_count)=catx('+', t(_count), drug);

                               

                              _t=cats(of t(*));

                                    *output;

                        end;

                       

                  end;

            end;

     

  if last.pid then output;

  keep pid reg: ;

     

run;

 

shasank
Quartz | Level 8
Thank you very much for you help and also for sharing your expertise.
novinosrin
Tourmaline | Level 20

@shasank No worries. I know where to shorten my code, remove flag variables and make it a touch more classy and neat. I am too tired for anything now as I had classes all day. Since I saw your repeated requests mentioning every super user/proc star in the morning till afternoon and a new thread with the same question of course with good effort in explaining the requirement, I was emotionally moved to help. So I did after coming back from class. 

 

Either way, I am gonna go home now and sleep. I'll see if I can clean the code and add some comments tomorrow if I don't have much of my own homework. Take care and good night!

shasank
Quartz | Level 8

HI @novinosrin , 

 

Thank you for your reply and I have been trying on your code and making small changes to make it work for me but the code doesn't seem to work my actual data. I donot understand why.

 

Here is the raw data which is de-identified

 

data have;
    input PID StartDate :anydtdte. Drug $;
    format startDate yymmdd10.;
    datalines;
1 10MAY2012 arb
1 10MAY2012 acl
1 17MAY2012 arb
1 17MAY2012 acl
1 30MAY2012 rlo
1 30MAY2012 rlo
1 30MAY2012 rlo
1 26JUN2012 rlo
1 19JUL2012 rlo
1 27AUG2012 rlo
1 25SEP2012 rlo
2 19FEB2008 eva
2 19FEB2008 eva
2 19FEB2008 arb
2 19FEB2008 arb
2 19FEB2008 emc
2 28FEB2010 ymc
3 09NOV2011 isp
3 27NOV2011 emc
;
run;

proc sort data=have;
    by PID Startdate Drug;
run;

proc sql;
    select count(distinct Drug) into : reg_limit TRIMMED from have;
quit;

%put &reg_limit;

data want2;
    if _N_=1 then
        do;

            if 0 then
                set have;
            declare hash h(dataset: "have", multidata: 'y');
            h.defineKey('pid');
            h.defineData('StartDate', 'drug');
            h.defineDone();
        end;
    length _t $500;
    set have(rename=(startdate=_startdate drug=_drug));
    by pid;
    array t(*) $100 Line1-Line&reg_limit;
    retain _t t;

    if first.pid then
        do;
            call missing(of t(*), _t);
            _count=0;
            _count+1;
            t(_count)=_drug;
            _n=0;

            do while(h.do_over(key: pid) eq 0);
                k=intck('day', _startdate, startdate);

                if k<=10 and drug ne _drug then
                    f=1;
                else
                    f=0;

                if f=1 then
                    do;
                        _n+1;

                        if _n=1 then
                            t(_count)=catx('+', _drug, drug);
                        else if countc(strip(_t), strip(drug))=0 then
                            t(_count)=catx('+', t(_count), drug);
                    end;
                _t=cats(of t(*));
                *output;
            end;
            _t=cats(of t(*));
        end;
    else
        do;
            check=countc(strip(_t), strip(_drug));

            if countc(strip(_t), strip(_drug))=0 then
                do;
                    _count+1;
                    t(_count)=_drug;

                    do while(h.do_over(key: pid) eq 0);

                        if intnx('day', _startdate, -3)<=startdate<=intnx('day', _startdate, 10) 
                            and drug ne _drug then
                                t(_count)=catx('+', t(_count), drug);
                        _t=cats(of t(*));
                        *output;
                    end;
                end;
        end;

    if last.pid then
        output;
    keep pid Line:;
run;

/////***********************************************************//////

 

 

 This data doesn't seem to yield the same result as expected from the model dataset I have posted before. Can you let me know why is that happening.

If you observe the below output I renamed Reg to line. The drug rlo is not coming up under line2/reg2 for PID 1 ans same goes for PID 2

This is the output I get is below:

Please suggest.

 

image.png

 

Reeza
Super User
Your posted code was garbled in the post, please modify it.
shasank
Quartz | Level 8
Hi Reeza,

Here is the edited post. I hope this is clear.
Reeza
Super User

@shasank I edited it to make it back into the code editor and formatted it. 

 

Regarding your question, check that your variable types and lengths are the same as what you get in the sample data you provided. Essentially, there's a difference in the source data, since only you have the source data, it's harder for us to say how it's different.

 

PS. I did edit this post to add some content.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 27 replies
  • 3398 views
  • 0 likes
  • 5 in conversation