Hi ,
I have data that looks like -
ID | Drug | Episode | start_date | end_date |
1 | A | 1 | 1-Jan | 30-Mar |
1 | A | 2 | 1-May | 3-Jul |
1 | A | 3 | 28-Sep | 28-Oct |
1 | A | 4 | 1-Nov | 30-Nov |
1 | B | 1 | 1-Apr | 10-May |
1 | B | 2 | 2-Nov | 28-Dec |
1 | C | 1 | 1-Jul | 2-Aug |
data abc;
input ID $ drug $ episode start_date date9. end_date date9.;
format start_date end_date date9.;
informat start_date end_date date9.;
datalines ;
1 A 1 01Jan2012 30Mar2012
1 A 2 01May2012 03Jul2012
1 A 3 28Sep2012 28Oct2012
1 A 4 01Nov2012 30Dec2012
1 B 1 01Apr2012 10May2012
1 B 2 02Nov2012 28Dec2012
1 B 3 01Jan2012 30Mar2012
1 C 1 01Jul2012 02Aug2012
;
run;
Here we have subjects and the the drugs they take. A new episode of one drug means that the person discontinued.
If the start date ( start date of 1st episode) of second drug consumed , lies in between the episodes of first drug , then we will ignore all the further episodes of 1st drug.
Eg. here 1 april ( start date of drug B) lies after the first episode of drug A, so episode 2,3,4 of drug A would be deleted.
Similarly the start date for drug C lies after the end date of episode 1 for drug B then episode 2 of drug B would be deleted.
The maximum number of episodes a subject can have is 15.
The resultant dataset should look like -
ID | Drug | Episode | start_date | end_date |
1 | A | 1 | 1-Jan | 30-Mar |
1 | B | 1 | 1-Apr | 10-May |
1 | C | 1 | 1-Jul | 2-Aug |
Can you please post what you expect as result? And please post data in usable form, especially the date-variables are important, because a date is not a date if the year is missing.
Please post example data in a WORKING data step with datalines, and make sure you have usable data. Dates without years are useless.
Shouldn't we have two observations for drug B as the last observation of B ends (30mar2012) before the start of C?
Anyway, this is what I come up with:
data abc;
input ID $ drug $ episode start_date :date9. end_date :date9.;
format start_date end_date date9.;
informat start_date end_date date9.;
datalines ;
1 A 1 01Jan2012 30Mar2012
1 A 2 01May2012 03Jul2012
1 A 3 28Sep2012 28Oct2012
1 A 4 01Nov2012 30Dec2012
1 B 1 01Apr2012 10May2012
1 B 2 02Nov2012 28Dec2012
1 B 3 01Jan2012 30Mar2012
1 C 1 01Jul2012 02Aug2012
;
data firsts (keep=id count start_date rename=(start_date=first_date));
set abc;
retain count;
by id drug;
if first.id then count = 0;
if first.drug;
count + 1;
run;
data want;
set abc;
by id drug;
retain count;
if _n_ = 1
then do;
declare hash f (dataset:"firsts");
f.definekey("id","count");
f.definedata("first_date");
f.definedone();
call missing(first_date);
end;
if first.id then count = 1;
if first.drug then count + 1;
rc = f.find();
put id= drug= count= rc= start_date first_date;
if rc = 0 and start_date > first_date then delete;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.