Difference between end of current drug and start of next same drug is more than 60,consider them two different usages otherwise club them as single usages with minimum start and maximum end.
INPUT DATA
DRUG S.DATE E.DATE
A 24-Sep-19 03-Dec-19
A 17-Jan-20 21-Feb-20
A 21 -MAY-20 20-Jun-20
B 30-Jun-20 29-Aug-20
B 13-OCT-20 12-NOV-20
OUPUT DATA
DRUG S.DATE E.DATE
A 24-Sep-19 21-Feb-20
A 21-May-20 21-Jun-20
B 30-Jul-20 12-NOV-20
Something like this should work. Untested as you did not provide the data as a program that we could use.
data WANT;
set HAVE;
by DRUG S_DATE;
retain S;
L=lag(E_DATE);
if first.DRUG then S=S_DATE;
if S_DATE - L > 60 then do;
E=L;
output;
S=S_DATE;
end;
if last.DRUG then do;
E=L;
output;
end;
keep DRUG S E;
run;
data a;
length sdate edate$15.;
input DRUG$ SDATE $ EDATE $;
cards;
A 24-Sep-19 03-Dec-19
A 17-Jan-20 21-Feb-20
A 21-MAY-20 20-Jun-20
B 30-Jun-20 29-Aug-20
B 13-OCT-20 12-NOV-20
;
run;
Here is the data and the solution provided does not provide the desired result it gives the following result
A | 17-Jan-20 | 20-Jun-20 |
B | 13-OCT-20 | 12-NOV-20 |
Hi @AmitParmar
The general idea would be to look ahead using a second set statement with firstobs=2.
Save the start date and compare the end date to the start date in next record.
If the difference > 60 or next record is a new drug then set start = saved start date and output.
Otherwise continue.
If you provide working code to create the input data set I will make a working example for you. Your real data must contain other variables beside the drug and dates, and these will be taken from the last record unless something is done to save these too.
Specify first consecutive pair where the gap between two visit date is >50.
ID VISIT_DATE
A 24-Jul-15
A 13-Aug-15
A 12-Sep-15
A 16-Dec-15
A 15-Mar-16
B 24-Jul-15
B 22-Oct-15
B 25-Jan-16
B 04-Feb-16
B 15-Mar-16
I guess we cannot use INTCK function so how gonna we calculate the time period and happy the condition.
Dates are counts of days, so the simple difference can be used.
Which of the observations do you want in your result?
I have merged, please don't double-post questions.
Hi @AmitParmar
I have tried to make a different code, you could use for experiments. It is split up in two parts:
The first data step performs a full analysis, and the next extracts relevant records based on the analysis.
data have;
informat Visit_Date anydtdte.;
format Visit_Date date.;
input ID$ Visit_Date;
cards;
A 24-Jul-15
A 13-Aug-15
A 12-Sep-15
A 16-Dec-15
A 15-Mar-16
B 24-Jul-15
B 22-Oct-15
B 25-Jan-16
B 04-Feb-16
B 15-Mar-16
;
run;
* Sort data to make sure input is ordered for further processing;
proc sort data=have;
by ID Visit_Date;
run;
* Perform af full analysis;
* Using LAG function to get previous date and Look-ahead SET statement to get nex date;
data temp (drop=Next_ID);
set have end=eof;
by ID;
format Prev_Visit_Date date.;
* Initiate break point variables;
Retain Break_Before Break_After;
if first.ID then call missing(Break_Before, Break_After);
* Calculate difference to previous record within same ID;
Prev_Visit_Date = lag(Visit_Date);
if not first.ID then Diff_To_Prev = (Visit_Date - Prev_Visit_Date);
else Diff_To_Prev = .;
* Get values from next record;
if not eof then set have ( firstobs=2 keep=Visit_Date ID rename=(ID = Next_ID Visit_Date=Next_Visit_Date));
* Calculate difference to next record within same ID;
if not last.ID then Diff_To_Next = (Next_Visit_Date - Visit_Date);
* Set Break Points;
if Diff_To_Prev > 50 then Break_Before + 1;
else Break_Before = .;
if Diff_To_Next > 50 then Break_After + 1;
else Break_After = .;
run;
* Get wanted data;
* This gives the first pair;
data want;
set temp;
keep ID Visit_Date;
if Break_Before = 1 or Break_After = 1;
run;
* Get wanted data;
* This gives all pairs, doubling records being in two pairs;
data want2;
set temp;
keep ID Visit_Date;
if Break_Before ne . then output;
if Break_After ne . then output;
run;
Thank you so much I appreciate your help, sir...
Hi Erik,
I have provided the dataset above please check and help me with the desired output.
Thanks
Hi @AmitParmar
Please check
data a;
length sdate edate$15.;
input DRUG$ SDATE $ EDATE $;
cards;
A 24-Sep-19 03-Dec-19
A 17-Jan-20 21-Feb-20
A 21-MAY-20 20-Jun-20
B 30-Jun-20 29-Aug-20
B 13-OCT-20 12-NOV-20
;
run;
* Convert the dates to SAS dates;
* It must be "real" SAS dates to perform date calculations;
* Internal SAS date value is the number of days since 01jan1960;
data have (drop = sdate edate);
set a;
format Date_Start Date_End date.;
Date_Start = input(sdate,anydtdte.);
Date_End = input(edate,anydtdte.);
run;
* Sort data;
* Date_End included in sort to get highest Date_End
in case of more than one record with same drug/Date_Start;
proc sort data=have;
by DRUG Date_Start Date_End;
run;
data want (drop=Date_Start_current Date_Start_Next Drug_Next);
set have (rename=(Date_Start=Date_Start_current)) end=eof;
by DRUG;
retain Date_Start;
format Date_Start date.;
if not eof then do;
* Get values from next record;
set have (firstobs=2 keep=Date_Start DRUG rename=(Date_Start=Date_Start_Next DRUG=Drug_Next));
end;
if first.DRUG then Date_Start = Date_Start_Current;
if Drug_Next ne DRUG or eof or (Date_Start_Next - Date_End) > 50 then do;
output;
Date_Start = Date_Start_Current;
Date_Start_Next = .;
end;
else do;
if Date_Start_Next = . then Date_Start_Next = Date_Start_Current;
end;
run;
if this helps:
Thank you so much for this explanation it's crystal clear.
Hi Sir, Can you please look this up for me
Specify first consecutive pair where the gap between two visit date is >50.
ID VISIT_DATE
A 24-Jul-15
A 13-Aug-15
A 12-Sep-15
A 16-Dec-15
A 15-Mar-16
B 24-Jul-15
B 22-Oct-15
B 25-Jan-16
B 04-Feb-16
B 15-Mar-16
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.