data work.forumq1 ;
infile datalines dsd dlm='|' truncover;
input USUBJID TRT :$3. ASTDT :F. ENDDT :F. DOSE DURATION_ON_DRUG ;
format TRT $3. ASTDT date9. ENDDT date9. ;
informat TRT $3. ASTDT date9. ENDDT date9. ;
label USUBJID='USUBJID' TRT='TRT' ASTDT='ASTDT' ENDDT='ENDDT'
DOSE='DOSE' DURATION_ON_DRUG='DURATION ON DRUG'
;
datalines4;
102|cab|21285|21303|20|19
102|cab|21304|21304|10|1
102|cab|21305|21305|5|1
102|cab|21306|21317|15|12
102|cab|21318|21333|7.5|16
203|cab|21356|21369|60|14
203|cab|21370|21384|50|15
203|cab|21385|21390|40|6
203|cab|21391|21411|30|21
203|cab|21412|21439|25|28
203|cab|21440|21443|30|4
203|cab|21444|21445|25|2
203|cab|21446|21454|50|9
;;;;
hi
i have this task to merge records if the duration on drug is <7 days to next record and do average dose while doing that. i tried to use retain and lag function but i am not able to perform this task. the goal is there should not be any records with duration on drug <7 days. can someone help me with this ?
note: these dates should be continuous then perform the function.
Hi @kk26 Here is a simple and easy two step solution.
1. Groupe the <7 as 1's and the rest as 0's as boolean/binary grouping into a temp dataset
2. sum the group of 1s
3. Output the summed 1s as collapsed into one and 0's groups as all
4. Notsorted is an option to group 1's and 0's in varying order, you will notice the logic when you visualize the temp dataset.
5. Read slowly and carefully. I hope you will grasp the idea
data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
;
data temp;
set have;
by USUBJID ;
n=DURATIONONDRUG<7;
if lag(n)=1 and lag(USUBJID)=USUBJID and n=0 or n=1 then n1=1;
else n1=0;
run;
data want;
set temp;
by USUBJID n1 notsorted;
retain dt;
if first.n1 and n1 then do;
dt=astdt;
s=0;
s1=0;
end;
s+DURATIONONDRUG;
s1+dose;
if last.n1 or n1=0;
if n1 then do;
astdt=dt;
DURATIONONDRUG=s;
ndays=intck('day',astdt,ENDDT)+1;
dose=s1/ndays;
end;
drop n: s: dt;
run;
Many of us will not open Excel or other Microsoft Office documents because they are a security threat.
Please provide the original data according to these instructions: How to create a data step version of your data AKA generate sample data for forums
data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
;
data want;
do until(last.usubjid);
set have;
by usubjid;
if DURATIONONDRUG<7 then do;
t=sum(t,DURATIONONDRUG);
continue;
end;
DURATIONONDRUG=sum(t,DURATIONONDRUG);
t=.;
output;
end;
drop t;
run;
hi , thank you so much for helping me on this.
still the start and end dates are not coming right , also the dose (need to be average)= sum of dose / days
for example:
i have tried doing this but it did not work. look like i am doing something wrong.
data want;
do until(last.usubjid);
set have;
by usubjid;
if DURATIONONDRUG<7 then do;
t=sum(t,DURATIONONDRUG);
d=sum(d,dose);
continue;
end;
DURATIONONDRUG=sum(t,DURATIONONDRUG);
dose=sum(d,dose)/DURATIONONDRUG;
t=.;
d=.;
output;
end;
drop t d;
run;
My apologies. I totally didn't even read the question or attempt to understand properly. Let me take a look a bit later once I am done at work. Pardon me!
You've described a simple task with 2 components:
data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
;
data want (drop=_:);
set have;
by usubjid;
_accum+durationondrug;
if durationondrug>=7 ;
durationondrug=_accum;
output;
_accum=0;
run;
This program assumes that no ID ends with a record having duration_on_drug<7, which would contaminate the accumulator for the next ID. If you want to protect against that, then, just after the BY USUBJID statement add:
if first.usubjid=1 then _accum=0;
Hi @kk26 Here is a simple and easy two step solution.
1. Groupe the <7 as 1's and the rest as 0's as boolean/binary grouping into a temp dataset
2. sum the group of 1s
3. Output the summed 1s as collapsed into one and 0's groups as all
4. Notsorted is an option to group 1's and 0's in varying order, you will notice the logic when you visualize the temp dataset.
5. Read slowly and carefully. I hope you will grasp the idea
data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
;
data temp;
set have;
by USUBJID ;
n=DURATIONONDRUG<7;
if lag(n)=1 and lag(USUBJID)=USUBJID and n=0 or n=1 then n1=1;
else n1=0;
run;
data want;
set temp;
by USUBJID n1 notsorted;
retain dt;
if first.n1 and n1 then do;
dt=astdt;
s=0;
s1=0;
end;
s+DURATIONONDRUG;
s1+dose;
if last.n1 or n1=0;
if n1 then do;
astdt=dt;
DURATIONONDRUG=s;
ndays=intck('day',astdt,ENDDT)+1;
dose=s1/ndays;
end;
drop n: s: dt;
run;
hi @novinosrin ,
data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
203 cab 28-Sep-18 15-Oct-18 80 18
203 cab 16-Oct-18 18-Oct-18 80 3
203 cab 19-Oct-18 22-Oct-18 80 4
203 cab 23-Oct-18 24-Oct-18 80 2
203 cab 25-Oct-18 26-Oct-18 50 3
204 can 20-Nov-18 24-Nov-18 35 5
;
hi @novinosrin ,
can we add condition so it stops merging to next records once the days reached >=7 count ?
in here the last two records for subject 203 has 5 days after adding but that is fine since there is no extra records to merge.
Hi @kk26 about to doze off, as i'm very tired. Can i take a look right in the morning(Eastern time US) if it's not too urgent? I'm just knackered. I could barely open my eyes
HI @kk26 First off, It appears the values in your latest HAVE(Before) and WANT(After) are not consistent. For example, the ENDT value in before 26-Oct-18
203 cab 25-Oct-18 26-Oct-18 50 3
And in the after equivalent it is 27-Oct-18. Also new USSubjid in before
204 can 20-Nov-18 24-Nov-18 35 5
and the equivalent in after is 301. Though trivial, I failed to notice until I was wondering how SAS can get basic math wrong. So, please going forward verify the details 🙂
Secondly, I have a feeling, AFTER for
204 can 20-Nov-18 24-Nov-18 35 5
should ideally be the average dose 7 in place of 35 to be in consistent with the logic to apply for the previous i.e.
203 cab 23-Oct-18 24-Oct-18 80 2
203 cab 25-Oct-18 27-Oct-18 50 3
resulting in
203 cab 23OCT2018 27OCT2018 26 5
So assuming the above makes sense, the following solution would meet it
data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
203 cab 28-Sep-18 15-Oct-18 80 18
203 cab 16-Oct-18 18-Oct-18 80 3
203 cab 19-Oct-18 22-Oct-18 80 4
203 cab 23-Oct-18 24-Oct-18 80 2
203 cab 25-Oct-18 27-Oct-18 50 3
204 can 20-Nov-18 24-Nov-18 35 5
;
data want;
do until(last.USUBJID);
set have;
by USUBJID;
if DURATIONONDRUG<7 then do;
if dt=. then dt=astdt;
s=sum(s,DURATIONONDRUG);
s1=sum(s1,dose);
if s>=7 or last.USUBJID then do;
astdt=dt;
ndays=intck('day',astdt,ENDDT)+1;
dose=s1/ndays;
DURATIONONDRUG=s;
output;
call missing(of s:,dt);
end;
continue;
end;
else if s and dt then do;
s=sum(s,DURATIONONDRUG);
s1=sum(s1,dose);
astdt=dt;
DURATIONONDRUG=s;
ndays=intck('day',astdt,ENDDT)+1;
dose=s1/ndays;
call missing(of s:,dt);
end;
output;
end;
drop s: dt ndays;
run;
Best Regards!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.