BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DathanMD
Obsidian | Level 7

Hi there,

I am a junior sas user and I would like your help to crack some code! I have a dataset of antibiotic use by patients <pt_id> over time in a long data format. The antibiotics (<treatment>) have start and end dates (table “have”).

Table have

pt_id

start_date

end_date

treatment

1

01JAN2019

15MAR2019

cipro,flagy,amoxicillin

1

16MAR2019

15APRIL2019

cipro,flagy,amoxicillin,midazolam

1

17DEC2019

18DEC2019

cipro,flagy,amoxicillin,penicillin

1

22FEB2020

26FEB2020

cipro,flagy,

2

01JAN2018

15MAR2018

cipro,amoxicillin

2

16MAR2018

15APR2018

flagy,amoxicillin

2

17DEC2018

18DEC2018

cipro,flagy,amoxicillin,pethidine

2

22FEB2019

26FEB2019

cipro,flagy,amoxicillin,clopidogrel

2

23JUL2020

29JUL2020

cipro,flagy,amoxicillin,warfarin

2

01SEPT2021

05SEPT2021

cipro,flagy,amoxicillin,tazo

2

06SEPT2021

16SEPT2021

cipro,flagy,amoxicillin,piperacin

3

05DEC2019

18DEC2019

cipro,flagy,amoxicillin

3

22FEB2021

26FEB2021

cipro,flagy,amoxicillin

3

01JAN2022

15MAR2022

cipro,flagy,amoxicillin

 

 

 

 

 

 

 

 

 

 

 

I would like to determine if a participant received a specific antibiotic (in this case Flagyl and amoxicillin within six months prior to the start of a given treatment (so the prior end date for that antibiotic falls within six months from the start of a new treatment combination).  So, table “want” creates new variables, <expo_flagyl_6m> and <expo_amoxacillin_6m> for prior exposure to flagyl and amoxicillin, respectively, within six months

Data want

pt_id

start_date

end_date

treatment

Expo_flagyl_6m

Expo_amoxacillin_6m

1

01JAN2019

15MAR2019

cipro,flagy,amoxicillin

0

0

1

16MAR2019

15APRIL2019

cipro,flagy,amoxicillin,midazolam

1

1

1

17DEC2019

18DEC2019

cipro,flagy,amoxicillin,penicillin

0

0

1

22FEB2020

26FEB2020

cipro,flagy,

1

1

2

01JAN2018

15MAR2018

cipro,amoxicillin

0

0

2

16MAR2018

15APR2018

flagy,amoxicillin

0

1

2

17DEC2018

18DEC2018

cipro,flagy,amoxicillin,pethidine

0

0

2

22FEB2019

26FEB2019

cipro,flagy,amoxicillin,clopidogrel

1

1

2

23JUL2020

29JUL2020

cipro,flagy,amoxicillin,warfarin

0

0

2

01SEPT2021

05SEPT2021

cipro,flagy,amoxicillin,tazo

0

0

2

06SEPT2021

16SEPT2021

cipro,flagy,amoxicillin,piperacin

1

1

3

05DEC2019

18DEC2019

cipro,flagy,amoxicillin

0

0

3

22FEB2021

26FEB2021

cipro,flagy,amoxicillin

0

0

3

01JAN2022

15MAR2022

cipro,flagy,amoxicillin

0

0

 

Is it possible to suggest a macro so that I can run other columns for exposure to other antibiotics?  Thank you so much for your help.

 

Dathan Byonanebye

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.

 

Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:

 

data want (drop=_:) ;
  set have;
  by pt_id;
  retain _mr_flagy    _mr_amox       ;   /*Most Recent flagy, amoxicillin END_DATES*/

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_:);

  fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
  am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);

  if findw(treatment,'flagy',',') then _mr_flagy=end_date;
  if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;

 

Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables.  It will make for a much smaller program, and much less risk of typing errors.  Below just uses arrays of size 2:

 

data want (drop=i _:) ;
  set have;
  by pt_id;

  array _mr_date{2}    _temporary_ ;
  array ablist{2} $11  _temporary_ ('flagy'     ,'amoxicillin'    );
  array ab_dummy{2}                  flagy_dummy  amoxicillin_dummy ;

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_date{*});
 

  do i=1 to 2;
    ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
    if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
  end;
run;

Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
andreas_lds
Jade | Level 19

Please post data in usable form: a data step using datalines.

Before developing a macro write code without any macro statements, so that you have working code when trying to generalize it.

 

 

Ksharp
Super User
data have;
input pt_id start_date : date9. end_date : date9.  treatment $80.;
format start_date end_date : date9.;
cards;
1
01JAN2019
15MAR2019
cipro,flagy,amoxicillin
1
16MAR2019
15APR2019
cipro,flagy,amoxicillin,midazolam
1
17DEC2019
18DEC2019
cipro,flagy,amoxicillin,penicillin
1
22FEB2020
26FEB2020
cipro,flagy,
2
01JAN2018
15MAR2018
cipro,amoxicillin
2
16MAR2018
15APR2018
flagy,amoxicillin
2
17DEC2018
18DEC2018
cipro,flagy,amoxicillin,pethidine
2
22FEB2019
26FEB2019
cipro,flagy,amoxicillin,clopidogrel
2
23JUL2020
29JUL2020
cipro,flagy,amoxicillin,warfarin
2
01SEP2021
05SEP2021
cipro,flagy,amoxicillin,tazo
2
06SEP2021
16SEP2021
cipro,flagy,amoxicillin,piperacin
3
05DEC2019
18DEC2019
cipro,flagy,amoxicillin
3
22FEB2021
26FEB2021
cipro,flagy,amoxicillin
3
01JAN2022
15MAR2022
cipro,flagy,amoxicillin
;

data flagy amoxicillin ;
set have;
do i=1 to countw(treatment,',');
 name=scan(treatment,i,',');
 do date=start_date to end_date;
   if name='flagy' then  output flagy;
   if name='amoxicillin' then  output amoxicillin;
 end;
end;
keep date pt_id ;
format date date9.;
run;

data want;
 if _n_=1 then do;
  declare hash h1(dataset:'flagy');
  h1.definekey('pt_id','date');
  h1.definedone();
  declare hash h2(dataset:'amoxicillin');
  h2.definekey('pt_id','date');
  h2.definedone();
 end;
set have;
Expo_flagyl_6m=0;
Expo_amoxacillin_6m=0;
do date=intnx('month',start_date,-6,'s') to start_date-1;
 if h1.check()=0 then Expo_flagyl_6m=1;
 if h2.check()=0 then Expo_amoxacillin_6m=1;
end;
drop date;
run;
DathanMD
Obsidian | Level 7

Thanks Ksharp,

I think your solution works best for a small dataset, I realise it involves splitting dates into days and this is creating a massive dataset and  "memory failure". I have about 50,000 patients with over 20 years of follow-up data

 

Dathan

 

Ksharp
Super User
You could split the big table into 100 small tables according to patient's ID and running my code 100 times . All these could be done by making a macro .
mkeintz
PROC Star

You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.

 

Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:

 

data want (drop=_:) ;
  set have;
  by pt_id;
  retain _mr_flagy    _mr_amox       ;   /*Most Recent flagy, amoxicillin END_DATES*/

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_:);

  fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
  am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);

  if findw(treatment,'flagy',',') then _mr_flagy=end_date;
  if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;

 

Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables.  It will make for a much smaller program, and much less risk of typing errors.  Below just uses arrays of size 2:

 

data want (drop=i _:) ;
  set have;
  by pt_id;

  array _mr_date{2}    _temporary_ ;
  array ablist{2} $11  _temporary_ ('flagy'     ,'amoxicillin'    );
  array ab_dummy{2}                  flagy_dummy  amoxicillin_dummy ;

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_date{*});
 

  do i=1 to 2;
    ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
    if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
  end;
run;

Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DathanMD
Obsidian | Level 7

Thanks Mkeintz,

I would like to set a flag for a given antibiotic if the preceding end_date(s) for that given PT_ID and antibiotic is less than six months prior to the current start_date. So I am not only scanning the recent preceding entry but also any prior treatments whose end dates fall within the given pt_id and treatment.

 

It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).

 

Thanks 

 

Dathan

 

mkeintz
PROC Star

@DathanMD wrote:

....

 

It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).

Not true.  Look at the statements:

  if findw(treatment,'flagy',',') then _mr_flagy=end_date;
  if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;

They update the most recent date only if the incoming record has that particular drug.  So, for example,  a FLAGY has end_date=31mar2021, and is followed by a two month range (01apr2021-31may2021) without FLAGY, the 31mar2021 will be retained.  Given that (1) your data are sorted by PT_ID/START_DATE, and (2) you do not have overlapping date ranges, that means that your will always have available the most recent previous END_DATE for each drug, even if it is not the immediately preceding record.

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 657 views
  • 1 like
  • 4 in conversation