BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DathanMD
Obsidian | Level 7

Hi there,

I am a junior sas user and I would like your help to crack some code! I have a dataset of antibiotic use by patients <pt_id> over time in a long data format. The antibiotics (<treatment>) have start and end dates (table “have”).

Table have

pt_id

start_date

end_date

treatment

1

01JAN2019

15MAR2019

cipro,flagy,amoxicillin

1

16MAR2019

15APRIL2019

cipro,flagy,amoxicillin,midazolam

1

17DEC2019

18DEC2019

cipro,flagy,amoxicillin,penicillin

1

22FEB2020

26FEB2020

cipro,flagy,

2

01JAN2018

15MAR2018

cipro,amoxicillin

2

16MAR2018

15APR2018

flagy,amoxicillin

2

17DEC2018

18DEC2018

cipro,flagy,amoxicillin,pethidine

2

22FEB2019

26FEB2019

cipro,flagy,amoxicillin,clopidogrel

2

23JUL2020

29JUL2020

cipro,flagy,amoxicillin,warfarin

2

01SEPT2021

05SEPT2021

cipro,flagy,amoxicillin,tazo

2

06SEPT2021

16SEPT2021

cipro,flagy,amoxicillin,piperacin

3

05DEC2019

18DEC2019

cipro,flagy,amoxicillin

3

22FEB2021

26FEB2021

cipro,flagy,amoxicillin

3

01JAN2022

15MAR2022

cipro,flagy,amoxicillin

 

 

 

 

 

 

 

 

 

 

 

I would like to determine if a participant received a specific antibiotic (in this case Flagyl and amoxicillin within six months prior to the start of a given treatment (so the prior end date for that antibiotic falls within six months from the start of a new treatment combination).  So, table “want” creates new variables, <expo_flagyl_6m> and <expo_amoxacillin_6m> for prior exposure to flagyl and amoxicillin, respectively, within six months

Data want

pt_id

start_date

end_date

treatment

Expo_flagyl_6m

Expo_amoxacillin_6m

1

01JAN2019

15MAR2019

cipro,flagy,amoxicillin

0

0

1

16MAR2019

15APRIL2019

cipro,flagy,amoxicillin,midazolam

1

1

1

17DEC2019

18DEC2019

cipro,flagy,amoxicillin,penicillin

0

0

1

22FEB2020

26FEB2020

cipro,flagy,

1

1

2

01JAN2018

15MAR2018

cipro,amoxicillin

0

0

2

16MAR2018

15APR2018

flagy,amoxicillin

0

1

2

17DEC2018

18DEC2018

cipro,flagy,amoxicillin,pethidine

0

0

2

22FEB2019

26FEB2019

cipro,flagy,amoxicillin,clopidogrel

1

1

2

23JUL2020

29JUL2020

cipro,flagy,amoxicillin,warfarin

0

0

2

01SEPT2021

05SEPT2021

cipro,flagy,amoxicillin,tazo

0

0

2

06SEPT2021

16SEPT2021

cipro,flagy,amoxicillin,piperacin

1

1

3

05DEC2019

18DEC2019

cipro,flagy,amoxicillin

0

0

3

22FEB2021

26FEB2021

cipro,flagy,amoxicillin

0

0

3

01JAN2022

15MAR2022

cipro,flagy,amoxicillin

0

0

 

Is it possible to suggest a macro so that I can run other columns for exposure to other antibiotics?  Thank you so much for your help.

 

Dathan Byonanebye

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.

 

Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:

 

data want (drop=_:) ;
  set have;
  by pt_id;
  retain _mr_flagy    _mr_amox       ;   /*Most Recent flagy, amoxicillin END_DATES*/

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_:);

  fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
  am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);

  if findw(treatment,'flagy',',') then _mr_flagy=end_date;
  if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;

 

Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables.  It will make for a much smaller program, and much less risk of typing errors.  Below just uses arrays of size 2:

 

data want (drop=i _:) ;
  set have;
  by pt_id;

  array _mr_date{2}    _temporary_ ;
  array ablist{2} $11  _temporary_ ('flagy'     ,'amoxicillin'    );
  array ab_dummy{2}                  flagy_dummy  amoxicillin_dummy ;

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_date{*});
 

  do i=1 to 2;
    ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
    if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
  end;
run;

Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
andreas_lds
Jade | Level 19

Please post data in usable form: a data step using datalines.

Before developing a macro write code without any macro statements, so that you have working code when trying to generalize it.

 

 

Ksharp
Super User
data have;
input pt_id start_date : date9. end_date : date9.  treatment $80.;
format start_date end_date : date9.;
cards;
1
01JAN2019
15MAR2019
cipro,flagy,amoxicillin
1
16MAR2019
15APR2019
cipro,flagy,amoxicillin,midazolam
1
17DEC2019
18DEC2019
cipro,flagy,amoxicillin,penicillin
1
22FEB2020
26FEB2020
cipro,flagy,
2
01JAN2018
15MAR2018
cipro,amoxicillin
2
16MAR2018
15APR2018
flagy,amoxicillin
2
17DEC2018
18DEC2018
cipro,flagy,amoxicillin,pethidine
2
22FEB2019
26FEB2019
cipro,flagy,amoxicillin,clopidogrel
2
23JUL2020
29JUL2020
cipro,flagy,amoxicillin,warfarin
2
01SEP2021
05SEP2021
cipro,flagy,amoxicillin,tazo
2
06SEP2021
16SEP2021
cipro,flagy,amoxicillin,piperacin
3
05DEC2019
18DEC2019
cipro,flagy,amoxicillin
3
22FEB2021
26FEB2021
cipro,flagy,amoxicillin
3
01JAN2022
15MAR2022
cipro,flagy,amoxicillin
;

data flagy amoxicillin ;
set have;
do i=1 to countw(treatment,',');
 name=scan(treatment,i,',');
 do date=start_date to end_date;
   if name='flagy' then  output flagy;
   if name='amoxicillin' then  output amoxicillin;
 end;
end;
keep date pt_id ;
format date date9.;
run;

data want;
 if _n_=1 then do;
  declare hash h1(dataset:'flagy');
  h1.definekey('pt_id','date');
  h1.definedone();
  declare hash h2(dataset:'amoxicillin');
  h2.definekey('pt_id','date');
  h2.definedone();
 end;
set have;
Expo_flagyl_6m=0;
Expo_amoxacillin_6m=0;
do date=intnx('month',start_date,-6,'s') to start_date-1;
 if h1.check()=0 then Expo_flagyl_6m=1;
 if h2.check()=0 then Expo_amoxacillin_6m=1;
end;
drop date;
run;
DathanMD
Obsidian | Level 7

Thanks Ksharp,

I think your solution works best for a small dataset, I realise it involves splitting dates into days and this is creating a massive dataset and  "memory failure". I have about 50,000 patients with over 20 years of follow-up data

 

Dathan

 

Ksharp
Super User
You could split the big table into 100 small tables according to patient's ID and running my code 100 times . All these could be done by making a macro .
mkeintz
PROC Star

You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.

 

Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:

 

data want (drop=_:) ;
  set have;
  by pt_id;
  retain _mr_flagy    _mr_amox       ;   /*Most Recent flagy, amoxicillin END_DATES*/

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_:);

  fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
  am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);

  if findw(treatment,'flagy',',') then _mr_flagy=end_date;
  if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;

 

Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables.  It will make for a much smaller program, and much less risk of typing errors.  Below just uses arrays of size 2:

 

data want (drop=i _:) ;
  set have;
  by pt_id;

  array _mr_date{2}    _temporary_ ;
  array ablist{2} $11  _temporary_ ('flagy'     ,'amoxicillin'    );
  array ab_dummy{2}                  flagy_dummy  amoxicillin_dummy ;

  _cutoff_date=intnx('month',start_date,-6,'s');
  if first.pt_id then call missing(of _mr_date{*});
 

  do i=1 to 2;
    ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
    if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
  end;
run;

Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DathanMD
Obsidian | Level 7

Thanks Mkeintz,

I would like to set a flag for a given antibiotic if the preceding end_date(s) for that given PT_ID and antibiotic is less than six months prior to the current start_date. So I am not only scanning the recent preceding entry but also any prior treatments whose end dates fall within the given pt_id and treatment.

 

It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).

 

Thanks 

 

Dathan

 

mkeintz
PROC Star

@DathanMD wrote:

....

 

It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).

Not true.  Look at the statements:

  if findw(treatment,'flagy',',') then _mr_flagy=end_date;
  if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;

They update the most recent date only if the incoming record has that particular drug.  So, for example,  a FLAGY has end_date=31mar2021, and is followed by a two month range (01apr2021-31may2021) without FLAGY, the 31mar2021 will be retained.  Given that (1) your data are sorted by PT_ID/START_DATE, and (2) you do not have overlapping date ranges, that means that your will always have available the most recent previous END_DATE for each drug, even if it is not the immediately preceding record.

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1558 views
  • 1 like
  • 4 in conversation