- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear
Hi Colleagues,
I am new to SAS (and to programming)! I am dealing with a dataset that contains patients (id) and medications (var=medicines) and their startdates and end_dates. Kindly note that the medicines are quite many (but won’t exceed 10) and are separated by a comma. I would like to select rows when a patient was exposed to atleast one medicine for the first time. The order of medicines should not matter since I am interested in retaining patients exposed to one or more medicines for the first time (once a patient is exposed to a single medicine that they haven’t been exposed before, I would like to retain that row).
Data I have
id | medicines | Start_date | End_date |
|
1 | abc,tdf,rtv,lpv | 01jan2012 | 02dec2012 |
|
1 | abc,tfd,drv | 03dec2012 | 03dec2014 |
|
1 | lpv,tdf | 04dec2014 | . |
|
2 | abc,tdf,rtv,lpv, idv | 01jan2012 | 02dec2012 |
|
2 | abc,tfd,drv, | 03dec2012 | 03dec2014 |
|
2 | tdf,abc ,rtv,lpv | 04dec2014 | 06dec2016 |
|
2 | Idv,abc | 07dec2017 | 11march2019 |
|
2 | idv | 12march 2019 | . |
|
Data I want;
id | medicines | Start_date | End_date |
|
1 | abc,tdf,rtv,lpv | 01jan2012 | 02dec2012 |
|
1 | Abc,tfd,drv | 03dec2012 | 03dec2014 |
|
2 | abc,tdf,rtv,lpv, idv | 01jan2012 | 02dec2012 |
|
2 | abc,tfd,drv | 03dec2012 | 03dec2014 |
|
Will someone hack the code for me please?
Thanks a bunch.
Dathan Byonanebye
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @DathanMD
Here is an attempt to do this:
data vital;
infile datalines dlm="09"x dsd missover;
input id medicines:$100. Start_date:date9. End_date:date9.;
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv, idv 01jan2012 02dec2012
2 abc,tfd,drv, 03dec2012 03dec2014
2 tdf,abc ,rtv,lpv 04dec2014 06dec2016
2 Idv,abc 07dec2017 11mar2019
2 idv 12march2019 .
;
run;
data vital_exp;
set vital;
count + 1;
do i=1 to countw(medicines);
medicine_unit = scan(upcase(medicines),i);
output;
end;
drop i;
run;
proc sort data=vital_exp;
by id medicine_unit Start_date;
run;
data vital_select;
set vital_exp;
by id medicine_unit Start_date;
medicines=propcase(medicines);
if first.medicine_unit then output;
run;
proc sort data=vital_select out=want(drop=count medicine_unit) nodupkey;
by id medicines count;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can do something like this..
data have;
input id medicines $ 3-22 (Start_date End_date) (:date9.);
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv,idv 01jan2012 02dec2012
2 abc,tfd,drv 03dec2012 03dec2014
2 tdf,abc,rtv,lpv 04dec2014 06dec2016
2 idv,abc 07dec2017 11mar2019
2 idv 12mar2019 .
;
data want(drop=m i flag);
declare hash h ();
h.definekey ('id', 'm');
h.definedone();
do until (lr);
set have end=lr;
flag=0;
do i = 1 to countw(medicines, ',');
m = scan(medicines, i, ',');
if h.check() ne 0 then flag=1;
h.ref();
end;
if flag then output;
end;
run;
Result:
id medicines Start_date End_date 1 abc,tdf,rtv,lpv 01JAN2012 02DEC2012 1 abc,tfd,drv 03DEC2012 03DEC2014 2 abc,tdf,rtv,lpv,idv 01JAN2012 02DEC2012 2 abc,tfd,drv 03DEC2012 03DEC2014
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Nice use of hash table! 🙂
Small additional thing, if I may.
I would add Strip() function to
m = STRIP(scan(medicines, i, ','));
When I was playing your code I've noticed that if there is space between comma and drug name SAS is keeping that space, e.g.
data have;
input id medicines $ 3-22 (Start_date End_date) (:date9.);
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv,idv 01jan2012 02dec2012
2 abc,tfd,drv 03dec2012 03dec2014
2 tdf,abc,rtv,lpv 04dec2014 06dec2016
2 idv,abc 07dec2017 11mar2019
2 idv, xxx 12mar2019 .
2 xxx . .
;
run;
last two rows.
All the best
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@yabwon thanks 🙂 And yes, I agree completely.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @DathanMD
Here is an attempt to do this:
data vital;
infile datalines dlm="09"x dsd missover;
input id medicines:$100. Start_date:date9. End_date:date9.;
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv 01jan2012 02dec2012
1 abc,tfd,drv 03dec2012 03dec2014
1 lpv,tdf 04dec2014 .
2 abc,tdf,rtv,lpv, idv 01jan2012 02dec2012
2 abc,tfd,drv, 03dec2012 03dec2014
2 tdf,abc ,rtv,lpv 04dec2014 06dec2016
2 Idv,abc 07dec2017 11mar2019
2 idv 12march2019 .
;
run;
data vital_exp;
set vital;
count + 1;
do i=1 to countw(medicines);
medicine_unit = scan(upcase(medicines),i);
output;
end;
drop i;
run;
proc sort data=vital_exp;
by id medicine_unit Start_date;
run;
data vital_select;
set vital_exp;
by id medicine_unit Start_date;
medicines=propcase(medicines);
if first.medicine_unit then output;
run;
proc sort data=vital_select out=want(drop=count medicine_unit) nodupkey;
by id medicines count;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Likes this?
data WANT;
set HAVE;
length OLDMEDS NEWMEDS MED $20;
retain OLDMEDS;
keep ID MEDS START_DATE END_DATE NEWMEDS;
by ID;
if first.ID then call missing(OLDMEDS);
do MEDNO=1 to countw(MEDS);
MED=scan(MEDS,MEDNO);
if ^find(OLDMEDS,MED,'t') then do;
OLDMEDS=catx(',',OLDMEDS,MED);
NEWMEDS=catx(',',NEWMEDS,MED);
end;
end;
if NEWMEDS ne ' ' then output;
run;
ID | MEDS | Start_date | End_date | NEWMEDS |
---|---|---|---|---|
1 | abc,tdf,rtv,lpv | 01JAN2012 | 02DEC2012 | abc,tdf,rtv,lpv |
1 | abc,tfd,drv | 03DEC2012 | 03DEC2014 | tfd,drv |
2 | abc,tdf,rtv,lpv,idv | 01JAN2012 | 02DEC2012 | abc,tdf,rtv,lpv,idv |
2 | abc,tfd,drv | 03DEC2012 | 03DEC2014 | tfd,drv |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Really like tracking the new meds, as they are the core feature of the processing.
You might want to make OLDMEDS longer to accomodate the new meds it is accumulating.
length OLDMEDS $500 NEWMEDS MED $20;