Solved: How to identify a change across observations and flag it to create a n...

shasank · Posted 01-05-2018 03:28 PM

Hi SAS community,

Thank you for looking.

I would like your help in the following question.

I have a dataset with ID Drug and Startdate variables. Each observation means a drug given to each person on that specific data. I wanted to have a final dataset with a new variables that will be generated using following logic :-

To create 'Regimen1' variable :-

Drug1 given to the person + all the drugs given within a period of 10 days of drug1 Startdate

To create 'Regimen2' variable :-

Scan to see if there is any new Drug given to the person after Regimen1. If any found not in Regimen1 (lerts assume that new drug is name DrugX) then Regimen2 is all drugs given in the interval of DrugX Startdate minus 3 days to Startdate of DrugX plus 10 days.

This should go on until there is no new drug given to the person.

The dataset looks like this.

PID	StartDate	Drug
1	9-May-12	A
1	10-May-12	A
1	10-May-12	B
1	30-May-12	B
1	31-May-12	A
1	1-Jun-12	A
1	2-Jun-12	B
1	30-Jun-12	C
1	3-Jul-12	A
1	27-Jul-12	A
1	28-Jul-12	D
2	17-May-12	A
2	22-May-12	B
2	25-May-12	B
2	1-Jun-12	C
2	2-Jun-12	D
2	3-Jun-12	B
3	26-Jun-12	A
3	29-Jun-12	A
3	27-Jul-12	B
3	25-Aug-12	B
4	26-Jun-12	A
4	27-Jun-12	B
4	28-Jun-12	C
4	25-Aug-12	A
4	26-Aug-12	B
4	27-Aug-12	B

The output should look like this.

PID	Regimen1	Regimen2	Regimen3
1	A + B	A + C	A+D
2	A+B	B+C+D	.
3	A	B	.
4	A+B+C	.	.

As told each ID represents 1 person.

Drug1 namely A was the 1st drug given to person 1. The Startdate of A was on 9-May-12. So, all drugs in the interval of is 9/May/12 + 10days = 19/May/12 is Regimen1. only Drug A and Drug B were in that intervel. So, Regimen1 will have drugs A and B under it. as shown in the output table.

Next, Scan for other drugs given to person 1 and if any drug other than A or B (Regimen1). Under person 1 Drug C was given on 30/Jun/12 which is not in Regimen1.

So, Regimen2 should have all the drugs given in the interval of drug C Startdate (30/Jun/12) minus 3 = 27/Jun/12 to Drug C startdate plus 10days = 10/Jul/12. So, Regimen2 should have Drugs A and C.

This logic is same for creation of Regimen3.

Regimen2 should contain drug not in Regimen1.

Regimen3 should trigger when there is a drug not in Regimen2 and trigger the chain of Newdrug startdate - 3 to Newdrugstartdate+10.

This should be looped until there are no new drugs given to each person.

The sequence of drugs under the variable 'Regimen#' is not important. it can be A + B or B + A.

Thank you for your time and help SAS Masters.

novinosrin · Posted 01-09-2018 04:32 PM

@shasank Try this and let me know

data have;

input PID StartDate :anydtdte. Drug $;

format startDate yymmdd10.;

datalines;

1 10MAY2012 arb

1 10MAY2012 acl

1 17MAY2012 arb

1 17MAY2012 acl

1 30MAY2012 rlo

1 26JUN2012 rlo

1 19JUL2012 rlo

1 27AUG2012 rlo

1 25SEP2012 rlo

2 19FEB2008 eva

2 19FEB2008 arb

2 19FEB2008 emc

2 28FEB2010 ymc

3 09NOV2011 isp

3 27NOV2011 emc

;

run;

proc sql;
select count(distinct Drug) into : reg_limit TRIMMED
from have;
quit;

data want;

if _N_ = 1 then do;

if 0 then set have;

declare hash h(dataset: "have",multidata: 'y');

h.defineKey('pid');

h.defineData('StartDate','drug');

h.defineDone();

end;

do until(last.pid);

set have(rename=(startdate=_startdate drug=_drug));

by pid;

array r(*) $100 reg1-reg&reg_limit;

if first.pid then

do;

call missing(of r(*));

_count=1;

r(_count)=_drug;

do while(h.do_over(key: pid) eq 0);

if intck('day', _startdate, startdate)<=10 and findw(r(_count), strip(drug))=0 then

do;

r(_count)=catx('+', r(_count), drug);

end;

else do;

if findw(r(_count), strip(_drug))=0 then

do;

_count+1;

r(_count)=_drug;

do while(h.do_over(key: pid) eq 0);

if intnx('day', _startdate, -3)<=startdate<=intnx('day', _startdate, 10) and findw(r(_count), strip(drug))=0 then

do;

r(_count)=catx('+', r(_count), drug);

end;

keep pid reg:;

run;

View solution in original post

Reeza · Posted 01-05-2018 03:41 PM

Here's a basic way, probably not as efficient as a hash solution.

1. Sort by Person, Drug

2. Identify the first of each Drug - assumes no prior history not included which is unlikely realistically. You can create a data set here with only the first drug start date.

3. Use a SQL join to find any other drugs assigned within 10 days

4. Create your regiments by combing the rows using CATX.

5. Transpose to a wide format -> this approach doesn't require any knowledge of how many possible combinations there are ahead of time.

Steps 1, 2 & 5 should be relatively easy.

For

#3

https://communities.sas.com/t5/Base-SAS-Programming/finding-values-happened-30-days-before-a-referen...

Lots of examples on here

#4:

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

If you have questions or issues, post your code and explain the issue.

PS. Please make sure to include your data as a data step:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

@shasank wrote:

Hi SAS community,

Thank you for looking.

I would like your help in the following question.

I have a dataset with ID Drug and Startdate variables. Each observation means a drug given to each person on that specific data. I wanted to have a final dataset with a new variables that will be generated using following logic :-

To create 'Regimen1' variable :-

Drug1 given to the person + all the drugs given within a period of 10 days of drug1 Startdate

To create 'Regimen2' variable :-

Scan to see if there is any new Drug given to the person after Regimen1. If any found not in Regimen1 (lerts assume that new drug is name DrugX) then Regimen2 is all drugs given in the interval of DrugX Startdate minus 3 days to Startdate of DrugX plus 10 days.

This should go on until there is no new drug given to the person.

The dataset looks like this.

PID StartDate Drug

1 9-May-12 A

1 10-May-12 A

1 10-May-12 B

1 30-May-12 B

1 31-May-12 A

1 1-Jun-12 A

1 2-Jun-12 B

1 30-Jun-12 C

1 3-Jul-12 A

1 27-Jul-12 A

1 28-Jul-12 D

2 17-May-12 A

2 22-May-12 B

2 25-May-12 B

2 1-Jun-12 C

2 2-Jun-12 D

2 3-Jun-12 B

3 26-Jun-12 A

3 29-Jun-12 A

3 27-Jul-12 B

3 25-Aug-12 B

4 26-Jun-12 A

4 27-Jun-12 B

4 28-Jun-12 C

4 25-Aug-12 A

4 26-Aug-12 B

4 27-Aug-12 B

The output should look like this.

PID Regimen1 Regimen2 Regimen3

1 A + B A + C A+D

2 A+B B+C+D .

3 A B .

4 A+B+C . .

As told each ID represents 1 person.

Drug1 namely A was the 1st drug given to person 1. The Startdate of A was on 9-May-12. So, all drugs in the interval of is 9/May/12 + 10days = 19/May/12 is Regimen1. only Drug A and Drug B were in that intervel. So, Regimen1 will have drugs A and B under it. as shown in the output table.

Next, Scan for other drugs given to person 1 and if any drug other than A or B (Regimen1). Under person 1 Drug C was given on 30/Jun/12 which is not in Regimen1.

So, Regimen2 should have all the drugs given in the interval of drug C Startdate (30/Jun/12) minus 3 = 27/Jun/12 to Drug C startdate plus 10days = 10/Jul/12. So, Regimen2 should have Drugs A and C.

This logic is same for creation of Regimen3.

Regimen2 should contain drug not in Regimen1.

Regimen3 should trigger when there is a drug not in Regimen2 and trigger the chain of Newdrug startdate - 3 to Newdrugstartdate+10.

This should be looped until there are no new drugs given to each person.

The sequence of drugs under the variable 'Regimen#' is not important. it can be A + B or B + A.

Thank you for your time and help SAS Masters.

shasank · Posted 01-05-2018 06:21 PM

Hi Reeza,

This is what I have worked on. I get zero observations performing Proc SQL.

please you identify any errors?

data Fulldata;
input ID Startdate Drug $;
informat Startdate date9.;
format startdate date9.;
cards;
1	9-May-12	A
1	10-May-12	A
1	10-May-12	B
1	30-May-12	B
1	31-May-12	A
1	1-Jun-12	A
1	2-Jun-12	B
1	30-Jun-12	C
1	3-Jul-12	A
1	27-Jul-12	A
1	28-Jul-12	D
2	17-May-12	A
2	22-May-12	B
2	25-May-12	B
2	1-Jun-12	C
2	2-Jun-12	D
2	3-Jun-12	B
3	26-Jun-12	A
3	29-Jun-12	A
3	27-Jul-12	B
3	25-Aug-12	B
4	26-Jun-12	A
4	27-Jun-12	B
4	28-Jun-12	C
4	25-Aug-12	A
4	26-Aug-12	B
4	27-Aug-12	B
run;

proc sort data = Fulldata out = b;
by ID drug ;
run;

data firstchem;
set b;
by ID Drug;
if first.drug and First.drug then output;
Firstdate = startdate;
drop startdate;
run;

proc sql;
create table preD as
select a.*, b.*, a.Startdate-b.Firstdate as days_before
from Fulldata as a
join firstchem as b
on a.id=b.id
where 0< (a.Startdate-b.Firstdate) < 3;
quit;

proc sql;
create table postD as 
select a.*, b.*, b.Firstdate-a.Startdate  as days_after
from Fulldata as a
join firstchem as b
on a.id=b.id
where 0< (b.Firstdate-a.Startdate) < 10;
quit;

Reeza · Posted 01-05-2018 06:23 PM

Verify your data is being read in correctly, given what I see here, I suspect not. You'll also want to join on making sure the drug isn't the same drug.

shasank · Posted 01-05-2018 06:46 PM

Hi Reeza,
That was a typo. This is the syntax I used.
data firstchem;
set b;
by ID Drug;
if first.id and First.drug then output;
Firstdate = startdate;
drop startdate;
run;
Yet I get same missing

Reeza · Posted 01-05-2018 06:59 PM

Here's a start then, I'm working on Reg1, within 10 days.

Notice the 4th & 5th records in the output. You didn't specify how you'd want to deal with that situation.

data firstchem;
set b;
by ID Drug;
if first.drug then output;
rename StartDate = FirstDate;
run;


proc sql;
create table pre10 as
select FC.ID, FC.FirstDate, FD.StartDate, FC.Drug as Drug1, FD.Drug as Drug2, FD.StartDate-FC.Firstdate as Days_Diff
from firstchem as FC
join fulldata as FD
on FC.id=FD.id
and FC.drug ne FD.drug
and FD.StartDate >= FC.FirstDate
and FD.DRUG > FC.DRUG 
where calculated Days_Diff between 0 and 10;
quit;

shasank · Posted 01-05-2018 07:24 PM

Hi Reeza,

The logic is the same for the all the variable. Regimen2 should be compared
to regimen1. Regimen3 with Regimen2. Regimen4 with Regimen3. The regimen
creation also has the same logic of new drug minus 3 to new drug plus 10
for all regimens.
I hope I am clear.

novinosrin · Posted 01-05-2018 08:16 PM

@shasank As much as I appreciate to have taken the effort to clarify your requirement in a much better way than what you did in the other thread - https://communities.sas.com/t5/SAS-Data-Management/How-to-create-a-drug-course-using-dates/m-p/42469... , your desperation is rather worrying.

Well, well, well here you go

I challenge you to clean my code and make it easily readable as i am tired

data have;

input PID StartDate : date7. Drug $;

format StartDate date7.;

datalines;

1 9-May-12 A

1 10-May-12 A

1 10-May-12 B

1 30-May-12 B

1 31-May-12 A

1 1-Jun-12 A

1 2-Jun-12 B

1 30-Jun-12 C

1 3-Jul-12 A

1 27-Jul-12 A

1 28-Jul-12 D

2 17-May-12 A

2 22-May-12 B

2 25-May-12 B

2 1-Jun-12 C

2 2-Jun-12 D

2 3-Jun-12 B

3 26-Jun-12 A

3 29-Jun-12 A

3 27-Jul-12 B

3 25-Aug-12 B

4 26-Jun-12 A

4 27-Jun-12 B

4 28-Jun-12 C

4 25-Aug-12 A

4 26-Aug-12 B

4 27-Aug-12 B

;

proc sql;

select count(distinct Drug) into : reg_limit TRIMMED

from have;

quit;

%put &reg_limit;

data want2;

if _N_ = 1 then do;

if 0 then set have;

declare hash h(dataset: "have",multidata: 'y');

h.defineKey('pid');

h.defineData('StartDate','drug');

h.defineDone();

end;

length _t $500;

set have(rename=(startdate=_startdate drug=_drug));

by pid;

array t(*) $100 reg1-reg&reg_limit;

retain _t t;

if first.pid then do;

call missing(of t(*),_t);

_count=0;

_count+1;

t(_count)=_drug;

_n=0;

do while(h.do_over(key: pid) eq 0);

k=intck('day', _startdate, startdate);

if k<=10 and drug ne _drug then f=1;

else f=0;

if f=1 then

do;

_n+1;

if _n=1 then t(_count)=catx('+', _drug, drug);

else if countc(strip(_t), strip(drug))=0 then t(_count)=catx('+', t(_count), drug);

end;

_t=cats(of t(*));

*output;

end;

_t=cats(of t(*));

end;

else

do;

check=countc(strip(_t), strip(_drug));

if countc(strip(_t), strip(_drug))=0 then

do;

_count+1;

t(_count)=_drug;

do while(h.do_over(key: pid) eq 0);

if intnx('day', _startdate, -3)<=startdate<=intnx('day', _startdate, 10) and drug ne _drug then t(_count)=catx('+', t(_count), drug);

_t=cats(of t(*));

*output;

end;

if last.pid then output;

keep pid reg: ;

run;

shasank · Posted 01-05-2018 08:31 PM

Thank you very much for you help and also for sharing your expertise.

novinosrin · Posted 01-05-2018 08:46 PM

@shasank No worries. I know where to shorten my code, remove flag variables and make it a touch more classy and neat. I am too tired for anything now as I had classes all day. Since I saw your repeated requests mentioning every super user/proc star in the morning till afternoon and a new thread with the same question of course with good effort in explaining the requirement, I was emotionally moved to help. So I did after coming back from class.

Either way, I am gonna go home now and sleep. I'll see if I can clean the code and add some comments tomorrow if I don't have much of my own homework. Take care and good night!

shasank · Posted 01-05-2018 08:49 PM

Thank you Soo much.

shasank · Posted 01-08-2018 03:07 PM

HI @novinosrin ,

Thank you for your reply and I have been trying on your code and making small changes to make it work for me but the code doesn't seem to work my actual data. I donot understand why.

Here is the raw data which is de-identified

data have;
    input PID StartDate :anydtdte. Drug $;
    format startDate yymmdd10.;
    datalines;
1 10MAY2012 arb
1 10MAY2012 acl
1 17MAY2012 arb
1 17MAY2012 acl
1 30MAY2012 rlo
1 30MAY2012 rlo
1 30MAY2012 rlo
1 26JUN2012 rlo
1 19JUL2012 rlo
1 27AUG2012 rlo
1 25SEP2012 rlo
2 19FEB2008 eva
2 19FEB2008 eva
2 19FEB2008 arb
2 19FEB2008 arb
2 19FEB2008 emc
2 28FEB2010 ymc
3 09NOV2011 isp
3 27NOV2011 emc
;
run;

proc sort data=have;
    by PID Startdate Drug;
run;

proc sql;
    select count(distinct Drug) into : reg_limit TRIMMED from have;
quit;

%put &reg_limit;

data want2;
    if _N_=1 then
        do;

            if 0 then
                set have;
            declare hash h(dataset: "have", multidata: 'y');
            h.defineKey('pid');
            h.defineData('StartDate', 'drug');
            h.defineDone();
        end;
    length _t $500;
    set have(rename=(startdate=_startdate drug=_drug));
    by pid;
    array t(*) $100 Line1-Line&reg_limit;
    retain _t t;

    if first.pid then
        do;
            call missing(of t(*), _t);
            _count=0;
            _count+1;
            t(_count)=_drug;
            _n=0;

            do while(h.do_over(key: pid) eq 0);
                k=intck('day', _startdate, startdate);

                if k<=10 and drug ne _drug then
                    f=1;
                else
                    f=0;

                if f=1 then
                    do;
                        _n+1;

                        if _n=1 then
                            t(_count)=catx('+', _drug, drug);
                        else if countc(strip(_t), strip(drug))=0 then
                            t(_count)=catx('+', t(_count), drug);
                    end;
                _t=cats(of t(*));
                *output;
            end;
            _t=cats(of t(*));
        end;
    else
        do;
            check=countc(strip(_t), strip(_drug));

            if countc(strip(_t), strip(_drug))=0 then
                do;
                    _count+1;
                    t(_count)=_drug;

                    do while(h.do_over(key: pid) eq 0);

                        if intnx('day', _startdate, -3)<=startdate<=intnx('day', _startdate, 10) 
                            and drug ne _drug then
                                t(_count)=catx('+', t(_count), drug);
                        _t=cats(of t(*));
                        *output;
                    end;
                end;
        end;

    if last.pid then
        output;
    keep pid Line:;
run;

/////***********************************************************//////

This data doesn't seem to yield the same result as expected from the model dataset I have posted before. Can you let me know why is that happening.

If you observe the below output I renamed Reg to line. The drug rlo is not coming up under line2/reg2 for PID 1 ans same goes for PID 2

This is the output I get is below:

Please suggest.

Reeza · Posted 01-08-2018 04:05 PM

Your posted code was garbled in the post, please modify it.

shasank · Posted 01-08-2018 04:09 PM

Hi Reeza,

Here is the edited post. I hope this is clear.

Reeza · Posted 01-08-2018 04:11 PM

@shasank I edited it to make it back into the code editor and formatted it.

Regarding your question, check that your variable types and lengths are the same as what you get in the sample data you provided. Essentially, there's a difference in the source data, since only you have the source data, it's harder for us to say how it's different.

PS. I did edit this post to add some content.

How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Re: How to identify a change across observations and flag it to create a new variable

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away