Hi there,
I am a junior sas user and I would like your help to crack some code! I have a dataset of antibiotic use by patients <pt_id> over time in a long data format. The antibiotics (<treatment>) have start and end dates (table “have”).
Table have
pt_id |
start_date |
end_date |
treatment |
1 |
01JAN2019 |
15MAR2019 |
cipro,flagy,amoxicillin |
1 |
16MAR2019 |
15APRIL2019 |
cipro,flagy,amoxicillin,midazolam |
1 |
17DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin,penicillin |
1 |
22FEB2020 |
26FEB2020 |
cipro,flagy, |
2 |
01JAN2018 |
15MAR2018 |
cipro,amoxicillin |
2 |
16MAR2018 |
15APR2018 |
flagy,amoxicillin |
2 |
17DEC2018 |
18DEC2018 |
cipro,flagy,amoxicillin,pethidine |
2 |
22FEB2019 |
26FEB2019 |
cipro,flagy,amoxicillin,clopidogrel |
2 |
23JUL2020 |
29JUL2020 |
cipro,flagy,amoxicillin,warfarin |
2 |
01SEPT2021 |
05SEPT2021 |
cipro,flagy,amoxicillin,tazo |
2 |
06SEPT2021 |
16SEPT2021 |
cipro,flagy,amoxicillin,piperacin |
3 |
05DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin |
3 |
22FEB2021 |
26FEB2021 |
cipro,flagy,amoxicillin |
3 |
01JAN2022 |
15MAR2022 |
cipro,flagy,amoxicillin |
I would like to determine if a participant received a specific antibiotic (in this case Flagyl and amoxicillin within six months prior to the start of a given treatment (so the prior end date for that antibiotic falls within six months from the start of a new treatment combination). So, table “want” creates new variables, <expo_flagyl_6m> and <expo_amoxacillin_6m> for prior exposure to flagyl and amoxicillin, respectively, within six months
Data want
pt_id |
start_date |
end_date |
treatment |
Expo_flagyl_6m |
Expo_amoxacillin_6m |
1 |
01JAN2019 |
15MAR2019 |
cipro,flagy,amoxicillin |
0 |
0 |
1 |
16MAR2019 |
15APRIL2019 |
cipro,flagy,amoxicillin,midazolam |
1 |
1 |
1 |
17DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin,penicillin |
0 |
0 |
1 |
22FEB2020 |
26FEB2020 |
cipro,flagy, |
1 |
1 |
2 |
01JAN2018 |
15MAR2018 |
cipro,amoxicillin |
0 |
0 |
2 |
16MAR2018 |
15APR2018 |
flagy,amoxicillin |
0 |
1 |
2 |
17DEC2018 |
18DEC2018 |
cipro,flagy,amoxicillin,pethidine |
0 |
0 |
2 |
22FEB2019 |
26FEB2019 |
cipro,flagy,amoxicillin,clopidogrel |
1 |
1 |
2 |
23JUL2020 |
29JUL2020 |
cipro,flagy,amoxicillin,warfarin |
0 |
0 |
2 |
01SEPT2021 |
05SEPT2021 |
cipro,flagy,amoxicillin,tazo |
0 |
0 |
2 |
06SEPT2021 |
16SEPT2021 |
cipro,flagy,amoxicillin,piperacin |
1 |
1 |
3 |
05DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin |
0 |
0 |
3 |
22FEB2021 |
26FEB2021 |
cipro,flagy,amoxicillin |
0 |
0 |
3 |
01JAN2022 |
15MAR2022 |
cipro,flagy,amoxicillin |
0 |
0 |
Is it possible to suggest a macro so that I can run other columns for exposure to other antibiotics? Thank you so much for your help.
Dathan Byonanebye
You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.
Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:
data want (drop=_:) ;
set have;
by pt_id;
retain _mr_flagy _mr_amox ; /*Most Recent flagy, amoxicillin END_DATES*/
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_:);
fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);
if findw(treatment,'flagy',',') then _mr_flagy=end_date;
if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;
Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables. It will make for a much smaller program, and much less risk of typing errors. Below just uses arrays of size 2:
data want (drop=i _:) ;
set have;
by pt_id;
array _mr_date{2} _temporary_ ;
array ablist{2} $11 _temporary_ ('flagy' ,'amoxicillin' );
array ab_dummy{2} flagy_dummy amoxicillin_dummy ;
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_date{*});
do i=1 to 2;
ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
end;
run;
Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.
Please post data in usable form: a data step using datalines.
Before developing a macro write code without any macro statements, so that you have working code when trying to generalize it.
data have;
input pt_id start_date : date9. end_date : date9. treatment $80.;
format start_date end_date : date9.;
cards;
1
01JAN2019
15MAR2019
cipro,flagy,amoxicillin
1
16MAR2019
15APR2019
cipro,flagy,amoxicillin,midazolam
1
17DEC2019
18DEC2019
cipro,flagy,amoxicillin,penicillin
1
22FEB2020
26FEB2020
cipro,flagy,
2
01JAN2018
15MAR2018
cipro,amoxicillin
2
16MAR2018
15APR2018
flagy,amoxicillin
2
17DEC2018
18DEC2018
cipro,flagy,amoxicillin,pethidine
2
22FEB2019
26FEB2019
cipro,flagy,amoxicillin,clopidogrel
2
23JUL2020
29JUL2020
cipro,flagy,amoxicillin,warfarin
2
01SEP2021
05SEP2021
cipro,flagy,amoxicillin,tazo
2
06SEP2021
16SEP2021
cipro,flagy,amoxicillin,piperacin
3
05DEC2019
18DEC2019
cipro,flagy,amoxicillin
3
22FEB2021
26FEB2021
cipro,flagy,amoxicillin
3
01JAN2022
15MAR2022
cipro,flagy,amoxicillin
;
data flagy amoxicillin ;
set have;
do i=1 to countw(treatment,',');
name=scan(treatment,i,',');
do date=start_date to end_date;
if name='flagy' then output flagy;
if name='amoxicillin' then output amoxicillin;
end;
end;
keep date pt_id ;
format date date9.;
run;
data want;
if _n_=1 then do;
declare hash h1(dataset:'flagy');
h1.definekey('pt_id','date');
h1.definedone();
declare hash h2(dataset:'amoxicillin');
h2.definekey('pt_id','date');
h2.definedone();
end;
set have;
Expo_flagyl_6m=0;
Expo_amoxacillin_6m=0;
do date=intnx('month',start_date,-6,'s') to start_date-1;
if h1.check()=0 then Expo_flagyl_6m=1;
if h2.check()=0 then Expo_amoxacillin_6m=1;
end;
drop date;
run;
Thanks Ksharp,
I think your solution works best for a small dataset, I realise it involves splitting dates into days and this is creating a massive dataset and "memory failure". I have about 50,000 patients with over 20 years of follow-up data
Dathan
You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.
Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:
data want (drop=_:) ;
set have;
by pt_id;
retain _mr_flagy _mr_amox ; /*Most Recent flagy, amoxicillin END_DATES*/
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_:);
fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);
if findw(treatment,'flagy',',') then _mr_flagy=end_date;
if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;
Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables. It will make for a much smaller program, and much less risk of typing errors. Below just uses arrays of size 2:
data want (drop=i _:) ;
set have;
by pt_id;
array _mr_date{2} _temporary_ ;
array ablist{2} $11 _temporary_ ('flagy' ,'amoxicillin' );
array ab_dummy{2} flagy_dummy amoxicillin_dummy ;
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_date{*});
do i=1 to 2;
ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
end;
run;
Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.
Thanks Mkeintz,
I would like to set a flag for a given antibiotic if the preceding end_date(s) for that given PT_ID and antibiotic is less than six months prior to the current start_date. So I am not only scanning the recent preceding entry but also any prior treatments whose end dates fall within the given pt_id and treatment.
It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).
Thanks
Dathan
@DathanMD wrote:
....
It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).
Not true. Look at the statements:
if findw(treatment,'flagy',',') then _mr_flagy=end_date;
if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
They update the most recent date only if the incoming record has that particular drug. So, for example, a FLAGY has end_date=31mar2021, and is followed by a two month range (01apr2021-31may2021) without FLAGY, the 31mar2021 will be retained. Given that (1) your data are sorted by PT_ID/START_DATE, and (2) you do not have overlapping date ranges, that means that your will always have available the most recent previous END_DATE for each drug, even if it is not the immediately preceding record.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.