Hi Team,
I have two datasets one with EXDOSE start and end dates and another one is with ADT(Date)
How to assign the EXDOSE in second datset where ADT is between STDT and ENDT , if ADT is not fall between the STDT and ENDT then assign the EXDOSE which is before ADT like in below.
data exdose;
length USUBJID $7;
format stdt endt date9.;
input USUBJID $ EXDOSE stdt :date9. endt :date9.;
datalines;
1361007 14 16MAY2025 18MAY2025
1361007 14 20MAY2025 30MAY2025
1361007 18 31MAY2025 14JUN2025
1361007 18 16JUN2025 18JUN2025
;
run;
data second_ds;
length USUBJID $7;
format adt date9.;
input USUBJID $ ADT :date9.;
datalines;
1361007 17MAY2025
1361007 19MAY2025
1361007 30MAY2025
1361007 15JUN2025
;
run;
Expect dataset :
for ADT 19MAY2025 which is not fall any date range , so 19AMAY2025 previous dose is 14 , so for this record 14 need to assign same for 15JUN2025 previous dose is 18 , first need to assign between date range , if still missing then assign previous dose
Thank you,
Raja
data exdose;
length USUBJID $7;
format stdt endt date9.;
input USUBJID $ EXDOSE stdt :date9. endt :date9.;
datalines;
1361007 14 16MAY2025 18MAY2025
1361007 14 20MAY2025 30MAY2025
1361007 18 31MAY2025 14JUN2025
1361007 18 16JUN2025 18JUN2025
;
run;
data second_ds;
length USUBJID $7;
format adt date9.;
input USUBJID $ ADT :date9.;
datalines;
1361007 17MAY2025
1361007 19MAY2025
1361007 30MAY2025
1361007 15JUN2025
;
run;
proc sql;
create table part1 as
select a.*,b.EXDOSE
from second_ds as a ,exdose as b
where a.USUBJID=b.USUBJID and a.ADT between b.stdt and b.endt;
create table part2 as
select a.*,b.EXDOSE
from (select * from second_ds where catx(' ',USUBJID,ADT) not in (select catx(' ',USUBJID,ADT) from part1)) as a,exdose as b
where a.USUBJID=b.USUBJID and a.ADT > b.endt
group by a.USUBJID,a.ADT
having b.endt=max(b.endt);
create table want as
select * from part1
union
select * from part2
order by USUBJID,ADT;
quit;
Here's an alternative. I added another record at the beginning of SECOND_DS just to make sure that it was working correctly - for ADT=10MAY, there are no doses available yet from EXDOSE, so that row ends up with missing values.
data exdose;
length USUBJID $7;
format stdt endt date9.;
input USUBJID $ EXDOSE stdt :date9. endt :date9.;
datalines;
1361007 14 16MAY2025 18MAY2025
1361007 14 20MAY2025 30MAY2025
1361007 18 31MAY2025 14JUN2025
1361007 18 16JUN2025 18JUN2025
;
run;
data second_ds;
length USUBJID $7;
format adt date9.;
input USUBJID $ ADT :date9.;
datalines;
1361007 10MAY2025
1361007 17MAY2025
1361007 19MAY2025
1361007 30MAY2025
1361007 15JUN2025
;
run;
proc sql noprint;
select max(nrecs) into :maxrecs trimmed
from (select usubjid, count(*) as nrecs from exdose group by usubjid);
quit;
data want;
set
exdose
second_ds (in=B)
;
sd=B;
run;
proc sort data=want; by usubjid sd stdt adt; run;
data want;
set want;
by usubjid;
array T {0:&maxrecs,3} _temporary_;
retain j;
if first.usubjid then do;
call missing(of T[*], i);
j=0;
end;
if not sd then do;
i+1;
T[i,1]=stdt;
T[i,2]=endt;
T[i,3]=exdose;
end;
else do;
do while (adt>T[j,2] and j<i);
j+1;
end;
if adt<T[j,1] then j=j-1;
stdt=T[j,1];
endt=T[j,2];
exdose=T[j,3];
output;
end;
keep usubjid stdt endt exdose adt;
run;
proc print data=want; run;
RESULT:
Hi @raja777pharma , it looks to me like you have two datasets, one of which is the exposure dosage (exdose) data, which plans to administer two dosage of treatment (14mg and 18mg, twice of each) to a subject on a certain date, and the date range is determined. In the other dataset, a disposition (second_ds) dataset, the date (i.e., the adt) for administer the treatment changed (or had adjustment) and two of the adt date are not in range, so for that two adt date, the dosage should be the one which was planned (in exdose dataset) to assign to the time range before that adt date. My code and output is as follows, and the result is the same as @quickbluefish 's (and a while later I found my steps in fact are the same with @Ksharp 's, except my steps is more basic and simple and @Ksharp combined all proc sql steps in one and use more advanced technique), please kindly let me know if I answered your question, thanks a lot!
data exdose;
input USUBJID $ EXDOSE
stdt :date9. endt :date9.;
format stdt endt date9.;
datalines;
1361007 14 16MAY2025 18MAY2025
1361007 14 20MAY2025 30MAY2025
1361007 18 31MAY2025 14JUN2025
1361007 18 16JUN2025 18JUN2025
;
run;
proc print data=exdose;run;
data second_ds;
input USUBJID $ ADT date9.;
format adt date9.;
datalines;
1361007 17MAY2025
1361007 19MAY2025
1361007 30MAY2025
1361007 15JUN2025
;
run;
proc print data=second_ds;run;
/*if adt date in the second_ds dataset
is not in range, according to request,
find the earliest date and last date
of the exposure dosage, and produce
a dataset that has two date ranges*/
proc sql;
create table t_range as
select usubjid,
exdose,
min(stdt) as stdtmin
format=date9.,
max(endt) as endtmax
format=date9.
from exdose
group by usubjid,
exdose;
select * from t_range;
quit;
proc sql;
create table t_range2 as
select ed.*,
tr.stdtmin,
tr.endtmax
from exdose as ed left join
t_range as tr
on ed.usubjid=tr.usubjid and
ed.exdose=tr.exdose;
select * from t_range2;
quit;
/*determine which date range the adt date
belongs to, and assign the correspondence
dosage to the adt date*/
proc sql;
create table final1 as
select ds2.*,
tr2.exdose
from second_ds as ds2 left join
t_range2 as tr2
on ds2.usubjid=tr2.usubjid
where tr2.stdt<=ds2.adt<=tr2.endt;
select * from final1;
quit;
proc sql;
create table final2 as
select ds2.*,
tr2.exdose
from second_ds as ds2 left join
t_range2 as tr2
on ds2.usubjid=tr2.usubjid
where tr2.stdtmin<=ds2.adt<=tr2.endtmax
except
select ds2.*,
tr2.exdose
from second_ds as ds2 left join
t_range2 as tr2
on ds2.usubjid=tr2.usubjid
where tr2.stdt<=ds2.adt<=tr2.endt;
select * from final2;
quit;
proc sql;
create table ds2_exdose as
select * from final1
union
select * from final2
order by usubjid,adt;
select * from ds2_exdose;
quit;
If you just want to remember the previous dose then you might consider just interleaving the observations instead trying to use SQL joins or data step MERGEs.
Setup datasets:
data exdose;
input USUBJID :$7. EXDOSE stdt :date. endt :date.;
format stdt endt date9.;
datalines;
1361007 14 16MAY2025 18MAY2025
1361007 14 20MAY2025 30MAY2025
1361007 18 31MAY2025 14JUN2025
1361007 18 16JUN2025 18JUN2025
;
data second_ds;
input USUBJID :$7. ADT :date.;
format adt date9.;
datalines;
1361007 17MAY2025
1361007 19MAY2025
1361007 30MAY2025
1361007 15JUN2025
;
Interleave them and remember the EXDOSE value into a new variable that is retained.
data want;
set exdose(in=in1 keep=usubjid exdose stdt rename=(stdt=adt))
second_ds(in=in2)
;
by usubjid adt;
if in1 then dose=exdose;
if in2 then output;
if last.usubjid then call missing(dose);
retain dose;
drop exdose ;
rename dose=exdose;
run;
Results
Obs USUBJID adt exdose 1 1361007 17MAY2025 14 2 1361007 19MAY2025 14 3 1361007 30MAY2025 14 4 1361007 15JUN2025 18
Hi Tom, thanks a lot for reply! I think this is very brilliant, I hope I can write code like this after a lot of practice. In fact, the key word set, retain, and if.first/last (and some other techniques that do not produce a visible output or a product that directly show how they work) are the techniques that I use least because I have not fully understand the logic (how they work) of them. It is good learning experience for me which let me know how to improve my code and thinking.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.