Dear
Hi Colleagues,
I am new to SAS (and to programming)! I am dealing with a dataset that contains patients (id) and medications (var=medicines) and their startdates and end_dates. Kindly note that the medicines are quite many (but won’t exceed 10) and are separated by a comma. I would like to select rows when a patient was exposed to atleast one medicine for the first time. The order of medicines should not matter since I am interested in retaining patients exposed to one or more medicines for the first time (once a patient is exposed to a single medicine that they haven’t been exposed before, I would like to retain that row).
Data I have
id | medicines | Start_date | End_date |
|
1 | abc,tdf,rtv,lpv | 01jan2012 | 02dec2012 |
|
1 | abc,tfd,drv | 03dec2012 | 03dec2014 |
|
1 | lpv,tdf | 04dec2014 | . |
|
2 | abc,tdf,rtv,lpv, idv | 01jan2012 | 02dec2012 |
|
2 | abc,tfd,drv, | 03dec2012 | 03dec2014 |
|
2 | tdf,abc ,rtv,lpv | 04dec2014 | 06dec2016 |
|
2 | Idv,abc | 07dec2017 | 11march2019 |
|
2 | idv | 12march 2019 | . |
|
Data I want;
id | medicines | Start_date | End_date |
|
1 | abc,tdf,rtv,lpv | 01jan2012 | 02dec2012 |
|
1 | Abc,tfd,drv | 03dec2012 | 03dec2014 |
|
2 | abc,tdf,rtv,lpv, idv | 01jan2012 | 02dec2012 |
|
2 | abc,tfd,drv | 03dec2012 | 03dec2014 |
|
Will someone hack the code for me please?
Thanks a bunch.
Dathan Byonanebye
Hi @DathanMD
Here is an attempt to do this:
data vital;
infile datalines dlm="09"x dsd missover;
input id medicines:$100. Start_date:date9. End_date:date9.;
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv, idv 01jan2012 02dec2012
2 abc,tfd,drv, 03dec2012 03dec2014
2 tdf,abc ,rtv,lpv 04dec2014 06dec2016
2 Idv,abc 07dec2017 11mar2019
2 idv 12march2019 .
;
run;
data vital_exp;
set vital;
count + 1;
do i=1 to countw(medicines);
medicine_unit = scan(upcase(medicines),i);
output;
end;
drop i;
run;
proc sort data=vital_exp;
by id medicine_unit Start_date;
run;
data vital_select;
set vital_exp;
by id medicine_unit Start_date;
medicines=propcase(medicines);
if first.medicine_unit then output;
run;
proc sort data=vital_select out=want(drop=count medicine_unit) nodupkey;
by id medicines count;
run;
You can do something like this..
data have;
input id medicines $ 3-22 (Start_date End_date) (:date9.);
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv,idv 01jan2012 02dec2012
2 abc,tfd,drv 03dec2012 03dec2014
2 tdf,abc,rtv,lpv 04dec2014 06dec2016
2 idv,abc 07dec2017 11mar2019
2 idv 12mar2019 .
;
data want(drop=m i flag);
declare hash h ();
h.definekey ('id', 'm');
h.definedone();
do until (lr);
set have end=lr;
flag=0;
do i = 1 to countw(medicines, ',');
m = scan(medicines, i, ',');
if h.check() ne 0 then flag=1;
h.ref();
end;
if flag then output;
end;
run;
Result:
id medicines Start_date End_date 1 abc,tdf,rtv,lpv 01JAN2012 02DEC2012 1 abc,tfd,drv 03DEC2012 03DEC2014 2 abc,tdf,rtv,lpv,idv 01JAN2012 02DEC2012 2 abc,tfd,drv 03DEC2012 03DEC2014
Nice use of hash table! 🙂
Small additional thing, if I may.
I would add Strip() function to
m = STRIP(scan(medicines, i, ','));
When I was playing your code I've noticed that if there is space between comma and drug name SAS is keeping that space, e.g.
data have;
input id medicines $ 3-22 (Start_date End_date) (:date9.);
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv,idv 01jan2012 02dec2012
2 abc,tfd,drv 03dec2012 03dec2014
2 tdf,abc,rtv,lpv 04dec2014 06dec2016
2 idv,abc 07dec2017 11mar2019
2 idv, xxx 12mar2019 .
2 xxx . .
;
run;
last two rows.
All the best
Bart
@yabwon thanks 🙂 And yes, I agree completely.
Hi @DathanMD
Here is an attempt to do this:
data vital;
infile datalines dlm="09"x dsd missover;
input id medicines:$100. Start_date:date9. End_date:date9.;
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv, idv 01jan2012 02dec2012
2 abc,tfd,drv, 03dec2012 03dec2014
2 tdf,abc ,rtv,lpv 04dec2014 06dec2016
2 Idv,abc 07dec2017 11mar2019
2 idv 12march2019 .
;
run;
data vital_exp;
set vital;
count + 1;
do i=1 to countw(medicines);
medicine_unit = scan(upcase(medicines),i);
output;
end;
drop i;
run;
proc sort data=vital_exp;
by id medicine_unit Start_date;
run;
data vital_select;
set vital_exp;
by id medicine_unit Start_date;
medicines=propcase(medicines);
if first.medicine_unit then output;
run;
proc sort data=vital_select out=want(drop=count medicine_unit) nodupkey;
by id medicines count;
run;
Likes this?
data WANT;
set HAVE;
length OLDMEDS NEWMEDS MED $20;
retain OLDMEDS;
keep ID MEDS START_DATE END_DATE NEWMEDS;
by ID;
if first.ID then call missing(OLDMEDS);
do MEDNO=1 to countw(MEDS);
MED=scan(MEDS,MEDNO);
if ^find(OLDMEDS,MED,'t') then do;
OLDMEDS=catx(',',OLDMEDS,MED);
NEWMEDS=catx(',',NEWMEDS,MED);
end;
end;
if NEWMEDS ne ' ' then output;
run;
ID | MEDS | Start_date | End_date | NEWMEDS |
---|---|---|---|---|
1 | abc,tdf,rtv,lpv | 01JAN2012 | 02DEC2012 | abc,tdf,rtv,lpv |
1 | abc,tfd,drv | 03DEC2012 | 03DEC2014 | tfd,drv |
2 | abc,tdf,rtv,lpv,idv | 01JAN2012 | 02DEC2012 | abc,tdf,rtv,lpv,idv |
2 | abc,tfd,drv | 03DEC2012 | 03DEC2014 | tfd,drv |
Really like tracking the new meds, as they are the core feature of the processing.
You might want to make OLDMEDS longer to accomodate the new meds it is accumulating.
length OLDMEDS $500 NEWMEDS MED $20;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.