Hi there,
I am a junior sas user and I would like your help to crack some code! I have a dataset of antibiotic use by patients <pt_id> over time in a long data format. The antibiotics (<treatment>) have start and end dates (table “have”).
Table have
pt_id |
start_date |
end_date |
treatment |
1 |
01JAN2019 |
15MAR2019 |
cipro,flagy,amoxicillin |
1 |
16MAR2019 |
15APRIL2019 |
cipro,flagy,amoxicillin,midazolam |
1 |
17DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin,penicillin |
1 |
22FEB2020 |
26FEB2020 |
cipro,flagy, |
2 |
01JAN2018 |
15MAR2018 |
cipro,amoxicillin |
2 |
16MAR2018 |
15APR2018 |
flagy,amoxicillin |
2 |
17DEC2018 |
18DEC2018 |
cipro,flagy,amoxicillin,pethidine |
2 |
22FEB2019 |
26FEB2019 |
cipro,flagy,amoxicillin,clopidogrel |
2 |
23JUL2020 |
29JUL2020 |
cipro,flagy,amoxicillin,warfarin |
2 |
01SEPT2021 |
05SEPT2021 |
cipro,flagy,amoxicillin,tazo |
2 |
06SEPT2021 |
16SEPT2021 |
cipro,flagy,amoxicillin,piperacin |
3 |
05DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin |
3 |
22FEB2021 |
26FEB2021 |
cipro,flagy,amoxicillin |
3 |
01JAN2022 |
15MAR2022 |
cipro,flagy,amoxicillin |
I would like to determine if a participant received a specific antibiotic (in this case Flagyl and amoxicillin within six months prior to the start of a given treatment (so the prior end date for that antibiotic falls within six months from the start of a new treatment combination). So, table “want” creates new variables, <expo_flagyl_6m> and <expo_amoxacillin_6m> for prior exposure to flagyl and amoxicillin, respectively, within six months
Data want
pt_id |
start_date |
end_date |
treatment |
Expo_flagyl_6m |
Expo_amoxacillin_6m |
1 |
01JAN2019 |
15MAR2019 |
cipro,flagy,amoxicillin |
0 |
0 |
1 |
16MAR2019 |
15APRIL2019 |
cipro,flagy,amoxicillin,midazolam |
1 |
1 |
1 |
17DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin,penicillin |
0 |
0 |
1 |
22FEB2020 |
26FEB2020 |
cipro,flagy, |
1 |
1 |
2 |
01JAN2018 |
15MAR2018 |
cipro,amoxicillin |
0 |
0 |
2 |
16MAR2018 |
15APR2018 |
flagy,amoxicillin |
0 |
1 |
2 |
17DEC2018 |
18DEC2018 |
cipro,flagy,amoxicillin,pethidine |
0 |
0 |
2 |
22FEB2019 |
26FEB2019 |
cipro,flagy,amoxicillin,clopidogrel |
1 |
1 |
2 |
23JUL2020 |
29JUL2020 |
cipro,flagy,amoxicillin,warfarin |
0 |
0 |
2 |
01SEPT2021 |
05SEPT2021 |
cipro,flagy,amoxicillin,tazo |
0 |
0 |
2 |
06SEPT2021 |
16SEPT2021 |
cipro,flagy,amoxicillin,piperacin |
1 |
1 |
3 |
05DEC2019 |
18DEC2019 |
cipro,flagy,amoxicillin |
0 |
0 |
3 |
22FEB2021 |
26FEB2021 |
cipro,flagy,amoxicillin |
0 |
0 |
3 |
01JAN2022 |
15MAR2022 |
cipro,flagy,amoxicillin |
0 |
0 |
Is it possible to suggest a macro so that I can run other columns for exposure to other antibiotics? Thank you so much for your help.
Dathan Byonanebye
You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.
Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:
data want (drop=_:) ;
set have;
by pt_id;
retain _mr_flagy _mr_amox ; /*Most Recent flagy, amoxicillin END_DATES*/
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_:);
fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);
if findw(treatment,'flagy',',') then _mr_flagy=end_date;
if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;
Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables. It will make for a much smaller program, and much less risk of typing errors. Below just uses arrays of size 2:
data want (drop=i _:) ;
set have;
by pt_id;
array _mr_date{2} _temporary_ ;
array ablist{2} $11 _temporary_ ('flagy' ,'amoxicillin' );
array ab_dummy{2} flagy_dummy amoxicillin_dummy ;
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_date{*});
do i=1 to 2;
ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
end;
run;
Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.
Please post data in usable form: a data step using datalines.
Before developing a macro write code without any macro statements, so that you have working code when trying to generalize it.
data have;
input pt_id start_date : date9. end_date : date9. treatment $80.;
format start_date end_date : date9.;
cards;
1
01JAN2019
15MAR2019
cipro,flagy,amoxicillin
1
16MAR2019
15APR2019
cipro,flagy,amoxicillin,midazolam
1
17DEC2019
18DEC2019
cipro,flagy,amoxicillin,penicillin
1
22FEB2020
26FEB2020
cipro,flagy,
2
01JAN2018
15MAR2018
cipro,amoxicillin
2
16MAR2018
15APR2018
flagy,amoxicillin
2
17DEC2018
18DEC2018
cipro,flagy,amoxicillin,pethidine
2
22FEB2019
26FEB2019
cipro,flagy,amoxicillin,clopidogrel
2
23JUL2020
29JUL2020
cipro,flagy,amoxicillin,warfarin
2
01SEP2021
05SEP2021
cipro,flagy,amoxicillin,tazo
2
06SEP2021
16SEP2021
cipro,flagy,amoxicillin,piperacin
3
05DEC2019
18DEC2019
cipro,flagy,amoxicillin
3
22FEB2021
26FEB2021
cipro,flagy,amoxicillin
3
01JAN2022
15MAR2022
cipro,flagy,amoxicillin
;
data flagy amoxicillin ;
set have;
do i=1 to countw(treatment,',');
name=scan(treatment,i,',');
do date=start_date to end_date;
if name='flagy' then output flagy;
if name='amoxicillin' then output amoxicillin;
end;
end;
keep date pt_id ;
format date date9.;
run;
data want;
if _n_=1 then do;
declare hash h1(dataset:'flagy');
h1.definekey('pt_id','date');
h1.definedone();
declare hash h2(dataset:'amoxicillin');
h2.definekey('pt_id','date');
h2.definedone();
end;
set have;
Expo_flagyl_6m=0;
Expo_amoxacillin_6m=0;
do date=intnx('month',start_date,-6,'s') to start_date-1;
if h1.check()=0 then Expo_flagyl_6m=1;
if h2.check()=0 then Expo_amoxacillin_6m=1;
end;
drop date;
run;
Thanks Ksharp,
I think your solution works best for a small dataset, I realise it involves splitting dates into days and this is creating a massive dataset and "memory failure". I have about 50,000 patients with over 20 years of follow-up data
Dathan
You apparently want to set a flag for a given antibiotic if the preceding end_date for that given PT_ID and antibiotic is less than six months prior to the current start_date.
Given that data is sorted by PT_ID/START_DATE, and there are no overlapping data ranges, you can:
data want (drop=_:) ;
set have;
by pt_id;
retain _mr_flagy _mr_amox ; /*Most Recent flagy, amoxicillin END_DATES*/
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_:);
fl_dummy=(first.pt_id=0 and _mr_flagy>=_cutoff_date);
am_dummy=(first.pt_id=0 and _mr_amox>=_cutoff_date);
if findw(treatment,'flagy',',') then _mr_flagy=end_date;
if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
run;
Edited note: If you have several antibiotics to check for, you might want to use arrays for the most-recent dates, the list of antibiotic names, and the set of dummy variables. It will make for a much smaller program, and much less risk of typing errors. Below just uses arrays of size 2:
data want (drop=i _:) ;
set have;
by pt_id;
array _mr_date{2} _temporary_ ;
array ablist{2} $11 _temporary_ ('flagy' ,'amoxicillin' );
array ab_dummy{2} flagy_dummy amoxicillin_dummy ;
_cutoff_date=intnx('month',start_date,-6,'s');
if first.pt_id then call missing(of _mr_date{*});
do i=1 to 2;
ab_dummy{i}=(first.pt_id=0 and _mr_date{i}>=_cutoff_date);
if findw(treatment,trim(ablist{i}),',') then _mr_date{i}=end_date;
end;
run;
Just be sure that the dummy variable assignment statement PRECEDES the updating of the most-recent-dates for the corresponding antibiotic.
Thanks Mkeintz,
I would like to set a flag for a given antibiotic if the preceding end_date(s) for that given PT_ID and antibiotic is less than six months prior to the current start_date. So I am not only scanning the recent preceding entry but also any prior treatments whose end dates fall within the given pt_id and treatment.
It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).
Thanks
Dathan
@DathanMD wrote:
....
It looks like your code looks out for the antibiotic in the most recent treatment (if within six months).
Not true. Look at the statements:
if findw(treatment,'flagy',',') then _mr_flagy=end_date;
if findw(treatment,'amoxicillin',',') then _mr_amox=end_date;
They update the most recent date only if the incoming record has that particular drug. So, for example, a FLAGY has end_date=31mar2021, and is followed by a two month range (01apr2021-31may2021) without FLAGY, the 31mar2021 will be retained. Given that (1) your data are sorted by PT_ID/START_DATE, and (2) you do not have overlapping date ranges, that means that your will always have available the most recent previous END_DATE for each drug, even if it is not the immediately preceding record.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.