BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DathanMD
Obsidian | Level 7

Dear

Hi Colleagues,

I am new to SAS (and to programming)! I am dealing with a dataset that contains patients (id) and medications (var=medicines) and their startdates and end_dates. Kindly note that the medicines are quite many (but won’t exceed 10) and are separated by a comma. I would like to select rows when a patient was exposed to atleast one medicine for the first time.  The order of medicines should not matter since I am interested in retaining patients exposed to one or more medicines for the first time (once a patient is exposed to a single medicine that they haven’t been exposed before, I would like to retain that row).

Data I have

id

medicines

Start_date

End_date

 

1

abc,tdf,rtv,lpv

01jan2012

02dec2012

 

1

abc,tfd,drv

03dec2012

03dec2014

 

1

lpv,tdf

04dec2014

.

 

2

abc,tdf,rtv,lpv, idv

01jan2012

02dec2012

 

2

abc,tfd,drv,

03dec2012

03dec2014

 

2

tdf,abc ,rtv,lpv

04dec2014

06dec2016

 

2

Idv,abc

07dec2017

11march2019

 

2

idv

12march 2019

.

 

 

Data I want;

id

medicines

Start_date

End_date

 

1

abc,tdf,rtv,lpv

01jan2012

02dec2012

 

1

Abc,tfd,drv

03dec2012

03dec2014

 

2

abc,tdf,rtv,lpv, idv

01jan2012

02dec2012

 

2

abc,tfd,drv

03dec2012

03dec2014

 

 

Will someone hack the code for me please?

Thanks a bunch.

Dathan Byonanebye

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @DathanMD 

Here is an attempt to do this:

data vital;
	infile datalines dlm="09"x dsd missover;
	input id medicines:$100. Start_date:date9. End_date:date9.;
	format Start_date End_date date9.;
	datalines;
1	abc,tdf,rtv,lpv	01jan2012	02dec2012
1	abc,tfd,drv	03dec2012	03dec2014 
1	lpv,tdf	04dec2014	.
2	abc,tdf,rtv,lpv, idv	01jan2012	02dec2012
2	abc,tfd,drv,	03dec2012	03dec2014
2	tdf,abc ,rtv,lpv	04dec2014	06dec2016 
2	Idv,abc	07dec2017	11mar2019
2	idv	12march2019	.
;
run;

data vital_exp;
	set vital;
	count + 1;
	do i=1 to countw(medicines);
		medicine_unit = scan(upcase(medicines),i);
		output;
	end;
	drop i;
run;

proc sort data=vital_exp;
	by id medicine_unit Start_date;
run;

data vital_select;
	set vital_exp;
	by id medicine_unit Start_date;
	medicines=propcase(medicines);
	if first.medicine_unit then output;
run;

proc sort data=vital_select out=want(drop=count medicine_unit) nodupkey;
	by id medicines count;
run;

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

You can do something like this..

 

data have;
input id medicines $ 3-22 (Start_date End_date) (:date9.);
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv     01jan2012 02dec2012
1 abc,tfd,drv         03dec2012 03dec2014
1 lpv,tdf             04dec2014 .        
2 abc,tdf,rtv,lpv,idv 01jan2012 02dec2012
2 abc,tfd,drv         03dec2012 03dec2014
2 tdf,abc,rtv,lpv     04dec2014 06dec2016
2 idv,abc             07dec2017 11mar2019
2 idv                 12mar2019 .        
;

data want(drop=m i flag);
    declare hash h ();
    h.definekey ('id', 'm');
    h.definedone();

    do until (lr);
        set have end=lr;
        flag=0;
        do i = 1 to countw(medicines, ',');
           m = scan(medicines, i, ',');
           if h.check() ne 0 then flag=1;
           h.ref();
        end;
        if flag then output;
    end;
run;

 

Result:

 

id  medicines            Start_date  End_date 
1   abc,tdf,rtv,lpv      01JAN2012   02DEC2012 
1   abc,tfd,drv          03DEC2012   03DEC2014 
2   abc,tdf,rtv,lpv,idv  01JAN2012   02DEC2012 
2   abc,tfd,drv          03DEC2012   03DEC2014 
yabwon
Onyx | Level 15

Nice use of hash table! 🙂

 

Small additional thing, if I may.

I would add Strip() function to 

m = STRIP(scan(medicines, i, ','));

When I was playing your code I've noticed that if there is space between comma and drug name SAS is keeping that space, e.g.

data have;
input id medicines $ 3-22 (Start_date End_date) (:date9.);
format Start_date End_date date9.;
datalines;
1 abc,tdf,rtv,lpv     01jan2012 02dec2012
1 abc,tfd,drv         03dec2012 03dec2014
1 lpv,tdf             04dec2014 .        
2 abc,tdf,rtv,lpv,idv 01jan2012 02dec2012
2 abc,tfd,drv         03dec2012 03dec2014
2 tdf,abc,rtv,lpv     04dec2014 06dec2016
2 idv,abc             07dec2017 11mar2019
2 idv, xxx            12mar2019 .        
2 xxx                 .         .
;
run;

last two rows.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



PeterClemmensen
Tourmaline | Level 20

@yabwon thanks 🙂 And yes, I agree completely.

ed_sas_member
Meteorite | Level 14

Hi @DathanMD 

Here is an attempt to do this:

data vital;
	infile datalines dlm="09"x dsd missover;
	input id medicines:$100. Start_date:date9. End_date:date9.;
	format Start_date End_date date9.;
	datalines;
1	abc,tdf,rtv,lpv	01jan2012	02dec2012
1	abc,tfd,drv	03dec2012	03dec2014 
1	lpv,tdf	04dec2014	.
2	abc,tdf,rtv,lpv, idv	01jan2012	02dec2012
2	abc,tfd,drv,	03dec2012	03dec2014
2	tdf,abc ,rtv,lpv	04dec2014	06dec2016 
2	Idv,abc	07dec2017	11mar2019
2	idv	12march2019	.
;
run;

data vital_exp;
	set vital;
	count + 1;
	do i=1 to countw(medicines);
		medicine_unit = scan(upcase(medicines),i);
		output;
	end;
	drop i;
run;

proc sort data=vital_exp;
	by id medicine_unit Start_date;
run;

data vital_select;
	set vital_exp;
	by id medicine_unit Start_date;
	medicines=propcase(medicines);
	if first.medicine_unit then output;
run;

proc sort data=vital_select out=want(drop=count medicine_unit) nodupkey;
	by id medicines count;
run;
ChrisNZ
Tourmaline | Level 20

Likes this?

data WANT;
 set HAVE;
 length OLDMEDS NEWMEDS MED $20;
 retain OLDMEDS;
 keep ID MEDS START_DATE END_DATE NEWMEDS;
 by ID;
 if first.ID then call missing(OLDMEDS);
 do MEDNO=1 to countw(MEDS);
   MED=scan(MEDS,MEDNO);
   if ^find(OLDMEDS,MED,'t') then do;
     OLDMEDS=catx(',',OLDMEDS,MED);
     NEWMEDS=catx(',',NEWMEDS,MED);
   end;
 end;
 if NEWMEDS ne ' ' then output;
run; 
ID MEDS Start_date End_date NEWMEDS
1 abc,tdf,rtv,lpv 01JAN2012 02DEC2012 abc,tdf,rtv,lpv
1 abc,tfd,drv 03DEC2012 03DEC2014 tfd,drv
2 abc,tdf,rtv,lpv,idv 01JAN2012 02DEC2012 abc,tdf,rtv,lpv,idv
2 abc,tfd,drv 03DEC2012 03DEC2014 tfd,drv

 

 

RichardDeVen
Barite | Level 11

Really like tracking the new meds, as they are the core feature of the processing.

 

You might want to make OLDMEDS longer to accomodate the new meds it is accumulating.

 

 length OLDMEDS $500 NEWMEDS MED $20;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1017 views
  • 5 likes
  • 6 in conversation