BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kk26
Calcite | Level 5
data work.forumq1 ;
  infile datalines dsd dlm='|' truncover;
  input USUBJID TRT :$3. ASTDT :F. ENDDT :F. DOSE DURATION_ON_DRUG ;
  format TRT $3. ASTDT date9. ENDDT date9. ;
  informat TRT $3. ASTDT date9. ENDDT date9. ;
  label USUBJID='USUBJID' TRT='TRT' ASTDT='ASTDT' ENDDT='ENDDT'
    DOSE='DOSE' DURATION_ON_DRUG='DURATION ON DRUG'
  ;
datalines4;
102|cab|21285|21303|20|19
102|cab|21304|21304|10|1
102|cab|21305|21305|5|1
102|cab|21306|21317|15|12
102|cab|21318|21333|7.5|16
203|cab|21356|21369|60|14
203|cab|21370|21384|50|15
203|cab|21385|21390|40|6
203|cab|21391|21411|30|21
203|cab|21412|21439|25|28
203|cab|21440|21443|30|4
203|cab|21444|21445|25|2
203|cab|21446|21454|50|9
;;;;

hi 

i have this task to merge records if the duration on drug is <7 days to next record and do average dose while doing that. i tried to use retain and lag function but i am not able to perform this task. the goal is there should not be any records with duration on drug <7 days.  can someone help me with this ? 

note: these dates should be continuous then perform the function.

 

 

forum question.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @kk26  Here is a simple and easy two step solution. 

 

1. Groupe the <7 as 1's and the rest as 0's as boolean/binary grouping into a temp dataset

2. sum the group of 1s

3. Output the summed 1s as collapsed into one and 0's groups as all

4. Notsorted is an option to group 1's and 0's in varying order, you will notice the logic when you visualize the temp dataset.

5. Read slowly and carefully. I hope you will grasp the idea

 


data have;
infile cards expandtabs truncover;
input USUBJID	TRT $	(ASTDT	ENDDT) (:date9.)	DOSE	DURATIONONDRUG;
format ASTDT	ENDDT date9.;
cards;
102	cab	11-Apr-18	29-Apr-18	20	19
102	cab	30-Apr-18	30-Apr-18	10	1
102	cab	1-May-18	1-May-18	5	1
102	cab	2-May-18	13-May-18	15	12
102	cab	14-May-18	29-May-18	7.5	16
203	cab	21-Jun-18	4-Jul-18	60	14
203	cab	5-Jul-18	19-Jul-18	50	15
203	cab	20-Jul-18	25-Jul-18	40	6
203	cab	26-Jul-18	15-Aug-18	30	21
203	cab	16-Aug-18	12-Sep-18	25	28
203	cab	13-Sep-18	16-Sep-18	30	4
203	cab	17-Sep-18	18-Sep-18	25	2
203	cab	19-Sep-18	27-Sep-18	50	9
;

data temp;
 set have;
 by USUBJID ;
 n=DURATIONONDRUG<7;
 if lag(n)=1 and lag(USUBJID)=USUBJID and n=0 or n=1 then n1=1;
 else n1=0;
run;

data want;
 set temp;
 by USUBJID n1 notsorted;
 retain dt;
 if first.n1 and n1 then do;
  dt=astdt;
  s=0;
  s1=0;
 end;
 s+DURATIONONDRUG;
 s1+dose;
 if last.n1 or n1=0;
  if n1 then do;
   astdt=dt;
   DURATIONONDRUG=s;
   ndays=intck('day',astdt,ENDDT)+1;
   dose=s1/ndays;
  end;
drop n: s: dt;
run;

 

 

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

Many of us will not open Excel or other Microsoft Office documents because they are a security threat.

 

Please provide the original data according to these instructions: How to create a data step version of your data AKA generate sample data for forums

--
Paige Miller
novinosrin
Tourmaline | Level 20



data have;
infile cards expandtabs truncover;
input USUBJID	TRT $	(ASTDT	ENDDT) (:date9.)	DOSE	DURATIONONDRUG;
format ASTDT	ENDDT date9.;
cards;
102	cab	11-Apr-18	29-Apr-18	20	19
102	cab	30-Apr-18	30-Apr-18	10	1
102	cab	1-May-18	1-May-18	5	1
102	cab	2-May-18	13-May-18	15	12
102	cab	14-May-18	29-May-18	7.5	16
203	cab	21-Jun-18	4-Jul-18	60	14
203	cab	5-Jul-18	19-Jul-18	50	15
203	cab	20-Jul-18	25-Jul-18	40	6
203	cab	26-Jul-18	15-Aug-18	30	21
203	cab	16-Aug-18	12-Sep-18	25	28
203	cab	13-Sep-18	16-Sep-18	30	4
203	cab	17-Sep-18	18-Sep-18	25	2
203	cab	19-Sep-18	27-Sep-18	50	9
;

data want;
 do until(last.usubjid);
  set have;
  by usubjid;
  if DURATIONONDRUG<7 then do;
   t=sum(t,DURATIONONDRUG);
   continue;
  end;
  DURATIONONDRUG=sum(t,DURATIONONDRUG);
  t=.;
  output;
 end;
 drop t;
run;

kk26
Calcite | Level 5

hi , thank you so much for helping me on this. 

still the start and end dates are not coming right , also the dose (need to be average)= sum of dose / days

for example:

 i have tried doing this but it did not work. look like i am doing something wrong. 

 

 

after.PNG

data want;
    do until(last.usubjid);
set have;
     by usubjid;
           if DURATIONONDRUG<7 then do;
                t=sum(t,DURATIONONDRUG);
                d=sum(d,dose);
            continue;
          end;
                 DURATIONONDRUG=sum(t,DURATIONONDRUG);
                 dose=sum(d,dose)/DURATIONONDRUG;
                        t=.;
                        d=.;
            output;
        end;
         drop t d;
run;

novinosrin
Tourmaline | Level 20

My apologies. I totally didn't even read the question or attempt to understand properly. Let me take a look a bit later once I am done at work. Pardon me!

mkeintz
PROC Star

You've described a simple task with 2 components:

 

  1. Accumulate days on drugs
  2. If current days on drugs > 7 then output and reset the accumulator.
data have;
infile cards expandtabs truncover;
input USUBJID	TRT $	(ASTDT	ENDDT) (:date9.)	DOSE	DURATIONONDRUG;
format ASTDT	ENDDT date9.;
cards;
102	cab	11-Apr-18	29-Apr-18	20	19
102	cab	30-Apr-18	30-Apr-18	10	1
102	cab	1-May-18	1-May-18	5	1
102	cab	2-May-18	13-May-18	15	12
102	cab	14-May-18	29-May-18	7.5	16
203	cab	21-Jun-18	4-Jul-18	60	14
203	cab	5-Jul-18	19-Jul-18	50	15
203	cab	20-Jul-18	25-Jul-18	40	6
203	cab	26-Jul-18	15-Aug-18	30	21
203	cab	16-Aug-18	12-Sep-18	25	28
203	cab	13-Sep-18	16-Sep-18	30	4
203	cab	17-Sep-18	18-Sep-18	25	2
203	cab	19-Sep-18	27-Sep-18	50	9
;

data want (drop=_:);
  set have;
  by usubjid;
  _accum+durationondrug;
  if durationondrug>=7 ;
  durationondrug=_accum;
  output;
  _accum=0;
run;
  

This program assumes that no ID ends with a record having  duration_on_drug<7, which would contaminate the accumulator for the next ID.  If you want to protect against that, then, just after the BY USUBJID statement add:

 

if first.usubjid=1 then _accum=0;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kk26
Calcite | Level 5
hi , i am glad that i asked my first question on this forum. i have learn something new today. you are awesome! please check my reply to @novinosrin.
novinosrin
Tourmaline | Level 20

Hi @kk26  Here is a simple and easy two step solution. 

 

1. Groupe the <7 as 1's and the rest as 0's as boolean/binary grouping into a temp dataset

2. sum the group of 1s

3. Output the summed 1s as collapsed into one and 0's groups as all

4. Notsorted is an option to group 1's and 0's in varying order, you will notice the logic when you visualize the temp dataset.

5. Read slowly and carefully. I hope you will grasp the idea

 


data have;
infile cards expandtabs truncover;
input USUBJID	TRT $	(ASTDT	ENDDT) (:date9.)	DOSE	DURATIONONDRUG;
format ASTDT	ENDDT date9.;
cards;
102	cab	11-Apr-18	29-Apr-18	20	19
102	cab	30-Apr-18	30-Apr-18	10	1
102	cab	1-May-18	1-May-18	5	1
102	cab	2-May-18	13-May-18	15	12
102	cab	14-May-18	29-May-18	7.5	16
203	cab	21-Jun-18	4-Jul-18	60	14
203	cab	5-Jul-18	19-Jul-18	50	15
203	cab	20-Jul-18	25-Jul-18	40	6
203	cab	26-Jul-18	15-Aug-18	30	21
203	cab	16-Aug-18	12-Sep-18	25	28
203	cab	13-Sep-18	16-Sep-18	30	4
203	cab	17-Sep-18	18-Sep-18	25	2
203	cab	19-Sep-18	27-Sep-18	50	9
;

data temp;
 set have;
 by USUBJID ;
 n=DURATIONONDRUG<7;
 if lag(n)=1 and lag(USUBJID)=USUBJID and n=0 or n=1 then n1=1;
 else n1=0;
run;

data want;
 set temp;
 by USUBJID n1 notsorted;
 retain dt;
 if first.n1 and n1 then do;
  dt=astdt;
  s=0;
  s1=0;
 end;
 s+DURATIONONDRUG;
 s1+dose;
 if last.n1 or n1=0;
  if n1 then do;
   astdt=dt;
   DURATIONONDRUG=s;
   ndays=intck('day',astdt,ENDDT)+1;
   dose=s1/ndays;
  end;
drop n: s: dt;
run;

 

 

kk26
Calcite | Level 5
hi @novinosrin,
this is amazing, it worked. i am trying to understand your logic and i will try and improve my own programming. thank you so much!
kk26
Calcite | Level 5

hi @novinosrin ,

 

data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
203 cab 28-Sep-18 15-Oct-18 80 18
203 cab 16-Oct-18 18-Oct-18 80 3
203 cab 19-Oct-18 22-Oct-18 80 4
203 cab 23-Oct-18 24-Oct-18 80 2
203 cab 25-Oct-18 26-Oct-18 50 3
204 can 20-Nov-18 24-Nov-18 35 5
;

 

hi @novinosrin ,

can we add condition so it stops merging to next records once the days reached >=7 count ? 

in here the last two records  for subject 203 has 5 days after adding but that is fine since there is no extra records to merge.

 

forum question - 2.PNG

 

 

novinosrin
Tourmaline | Level 20

Hi @kk26  about to doze off, as i'm very tired. Can i take a look right in the morning(Eastern time US) if it's not too urgent? I'm just knackered. I could barely open my eyes

kk26
Calcite | Level 5
hi @novinosrin ,
whenever you have time! thank you so much!
novinosrin
Tourmaline | Level 20

HI @kk26  First off, It appears the values in your latest HAVE(Before) and WANT(After) are not consistent. For example, the ENDT value in before 26-Oct-18

203 cab 25-Oct-18 26-Oct-18 50 3

 And in the after equivalent it is 27-Oct-18. Also new USSubjid in before 

204 can 20-Nov-18 24-Nov-18 35 5

 and the equivalent in after is 301. Though trivial, I failed to notice until I was wondering how SAS can get basic math wrong. So, please going forward verify the details 🙂

 

Secondly, I have a feeling, AFTER for 

204 can 20-Nov-18 24-Nov-18 35 5

 

should ideally be the average dose 7 in place of 35 to be in consistent with the logic to apply for  the previous i.e.

 

203 cab 23-Oct-18 24-Oct-18 80 2
203 cab 25-Oct-18 27-Oct-18 50 3

resulting in 

203	cab	23OCT2018	27OCT2018	26	5

So assuming the above makes sense, the following solution would meet it

 


data have;
infile cards expandtabs truncover;
input USUBJID TRT $ (ASTDT ENDDT) (:date9.) DOSE DURATIONONDRUG;
format ASTDT ENDDT date9.;
cards;
102 cab 11-Apr-18 29-Apr-18 20 19
102 cab 30-Apr-18 30-Apr-18 10 1
102 cab 1-May-18 1-May-18 5 1
102 cab 2-May-18 13-May-18 15 12
102 cab 14-May-18 29-May-18 7.5 16
203 cab 21-Jun-18 4-Jul-18 60 14
203 cab 5-Jul-18 19-Jul-18 50 15
203 cab 20-Jul-18 25-Jul-18 40 6
203 cab 26-Jul-18 15-Aug-18 30 21
203 cab 16-Aug-18 12-Sep-18 25 28
203 cab 13-Sep-18 16-Sep-18 30 4
203 cab 17-Sep-18 18-Sep-18 25 2
203 cab 19-Sep-18 27-Sep-18 50 9
203 cab 28-Sep-18 15-Oct-18 80 18
203 cab 16-Oct-18 18-Oct-18 80 3
203 cab 19-Oct-18 22-Oct-18 80 4
203 cab 23-Oct-18 24-Oct-18 80 2
203 cab 25-Oct-18 27-Oct-18 50 3
204 can 20-Nov-18 24-Nov-18 35 5
;


data want;
 do until(last.USUBJID);
  set have;
  by USUBJID;
  if DURATIONONDRUG<7  then do;
   if dt=. then   dt=astdt;
   s=sum(s,DURATIONONDRUG);
   s1=sum(s1,dose);
   if s>=7 or last.USUBJID then do;
    astdt=dt;
    ndays=intck('day',astdt,ENDDT)+1;
    dose=s1/ndays;
	DURATIONONDRUG=s;
	output;
	call missing(of s:,dt);
   end;
   continue;
  end;
  else if s and dt then do;
   s=sum(s,DURATIONONDRUG);
   s1=sum(s1,dose);
   astdt=dt;
   DURATIONONDRUG=s;
   ndays=intck('day',astdt,ENDDT)+1;
   dose=s1/ndays;
   call missing(of s:,dt);
  end;
  output;
 end;
 drop s: dt ndays;
run;

Best Regards!

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1280 views
  • 1 like
  • 4 in conversation