Hi all,
I have a data set as follows:
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20
02 Mar05 11 Nov05 18 Nov05 15
03 Jan10 13
04 Apr11 16 Aug12 17
I need to find a total dose for each ID, but if the date is the same then only the highest dose should be counted, i.e.
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4 Total Dose
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20 52
02 Mar05 11 Nov05 15 Nov05 18 29
03 Jan10 13 13
04 Apr11 16 Aug12 17 33
So far I have tried using arrays to create an indicator for whether the dates are the same, and then wanted to use this to find the maximum value but i haven't been table to figure out how to find a maximum of a subset of the data.
Any help would be appreciate!
Thanks
Cydney
Transpose to long, as it is much easier to find maximum/minimum/sum etc:
data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01  Jun09   15     Jun09    12      Sep09    17      Jan10   20
02  Mar05   11     Nov05   18      Nov05    15
03  Jan10   13     
04  Apr11    16     Aug12    17
;
proc transpose data=have out=trans1;
by id;
var date: dose:;
run;
data trans2;
set trans1;
where col1 ne .;
if substr(_name_,1,4) = "Date"
then date = col1;
else dose = col1;
event = input(compress(_name_,'','kd'),best.);
format date yymmn6.;
run;
data trans;
merge
  trans2 (keep=id event date where=(date ne .))
  trans2 (keep=id event dose where=(dose ne .))
;
by id event;
drop event;
run;
proc sort data=trans;
by id date dose;
run;
data want;
set trans;
by id date;
if last.date;
run;
proc report data=want;
column id date dose;
define id /group;
define date /display;
define dose /analysis sum;
break after id/ summarize;
run;
Result:
 ID    date       dose
  1  200906         15
     200909         17
     201001         20
  1                 52
  2  200503         11
     200511         18
  2                 29
  3  201001         13
  3                 13
  4  201104         16
     201208         17
  4                 33
					
				
			
			
				
			
			
			
				
			
			
			
			
			
		Are the date variables numeric or character?
They are numeric
So Jun09 represents 01Jun2009, correct?
Yes it does sorry i didn't include the days because they only need to be the same month not the same day, but yes you could say they are all on the 1st
Transpose to long, as it is much easier to find maximum/minimum/sum etc:
data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01  Jun09   15     Jun09    12      Sep09    17      Jan10   20
02  Mar05   11     Nov05   18      Nov05    15
03  Jan10   13     
04  Apr11    16     Aug12    17
;
proc transpose data=have out=trans1;
by id;
var date: dose:;
run;
data trans2;
set trans1;
where col1 ne .;
if substr(_name_,1,4) = "Date"
then date = col1;
else dose = col1;
event = input(compress(_name_,'','kd'),best.);
format date yymmn6.;
run;
data trans;
merge
  trans2 (keep=id event date where=(date ne .))
  trans2 (keep=id event dose where=(dose ne .))
;
by id event;
drop event;
run;
proc sort data=trans;
by id date dose;
run;
data want;
set trans;
by id date;
if last.date;
run;
proc report data=want;
column id date dose;
define id /group;
define date /display;
define dose /analysis sum;
break after id/ summarize;
run;
Result:
 ID    date       dose
  1  200906         15
     200909         17
     201001         20
  1                 52
  2  200503         11
     200511         18
  2                 29
  3  201001         13
  3                 13
  4  201104         16
     201208         17
  4                 33
					
				
			
			
				
			
			
			
			
			
			
			
		Thank you very much Kurt, with a bit of moving around this worked perfectly!
It should be easy enough to walk through a set of two arrays:
data want;
   set have;
   array date {4};
   array dose {4};
   total_dose = dose1;
   do k=2 to 4;
      if date{k} ne date{k-1} then total_dose + dose{k};
      else do;
         if dose{k} > dose{k-1} then total_dose + (dose{k} - dose{k-1});
      end;
   end;
run;
Note that this does not necessarily address what happens when there are three doses on the same date. That can be done, but requires a little more work.
Also note, this is untested code. So you need to test it out and report back whether it is successful.
Hi Astounding, thank you for the solution.
This is super easy and works really well for most of the observations! However there are a few people who have more than two doses on the same day (the maximum is 4). How would you go about extending this for multiple doses on the same day?
Thanks
Cydney
@CydneyLB: You already have a proc report solution that apparently meets your need, but here is a single datastep solution that I think is easier to modify and creates the Total_Dose variable in the dataset. It uses the qsort macro that Paul Dorfman proposed some years ago:
data have;
  infile datalines truncover;
  input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
  format date: yymmn6.;
  datalines;
01  Jun09   12     Jun09    15      Sep09    17      Jan10   20
02  Mar05   11     Nov05   15       Nov05    18
03  Jan10   13     Jan10   10       Jan10    12
04  Apr11    16     Aug12    17
;
run;
filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;
data want (drop=t_:);
  set have;
  array t_dates(4);
  array t_doses(4);
  array dates(4) date:;
  array doses(4) dose:;
  do t_i=1 to 4;
    t_dates(t_i)=dates(t_i);
    t_doses(t_i)=doses(t_i);
  end;
  %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
  %Qsort (Arr=t_dates t_doses, By=t_dates);
  
  do t_i=2 to 4;
    if t_dates(t_i) eq t_dates(t_i-1) then call missing(t_doses(t_i));
  end;
  Total_Dose=sum(of t_doses(*));
run;
Art, CEO, AnalystFinder.com
Hi Art,
Thank you for responding another good solution. This works great for looking if the doses were on the same day, however if i edit it to look at doses within 60 days of each other I cant seem to get the code to work?
Thanks
Cydney
@CydneyLB : Please provide an example dataset that shows a case where the code doesn't work and the result you expect from that data.
Art, CEO, AnalystFinder.com
@art297 :
Using an example very similar to the one above:
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20
02 Mar05 11 Nov05 15 Dec05 18
03 Jan10 13
04 Apr11 16 Aug12 17
I need to find a total dose for each ID, but if the date is within 60 days of another dose then only the highest dose should be counted, i.e.
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4 Total Dose
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20 52
02 Mar05 11 Nov05 15 Dec05 18 29
03 Jan10 13 13
04 Apr11 16 Aug12 17 33
Using your code I tried to adjust the condition as follows:
data have;
  infile datalines truncover;
  input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
  format date: yymmn6.;
  datalines;
01  Jun09   12     Jun09    15      Sep09    17      Jan10   20
02  Mar05   11     Nov05   15       Nov05    18
03  Jan10   13     Jan10   10       Jan10    12
04  Apr11    16     Aug12    17
;
run;
filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;
data want (drop=t_:);
  set have;
  array t_dates(4);
  array t_doses(4);
  array dates(4) date:;
  array doses(4) dose:;
  do t_i=1 to 4;
    t_dates(t_i)=dates(t_i);
    t_doses(t_i)=doses(t_i);
  end;
  %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
  %Qsort (Arr=t_dates t_doses, By=t_dates);
  
  do t_i=2 to 4;
    if t_dates(t_i) - t_dates(t_i-1) < 60 then call missing(t_doses(t_i));
  end;
  Total_Dose=sum(of t_doses(*));
run;However it gave the results:
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4 Total Dose
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20 52
02 Mar05 11 Nov05 15 Dec05 18 18
03 Jan10 13 .
04 Apr11 16 Aug12 17 17
Thanks
Cydney
@CydneyLB : What totals would you want given the following two records:
01  Jun09   15     Jul09    12    Aug09    17      Sep09    20                       ?
01  Dec18   15     Feb19    12    Apr19    17      Jun19    20                       ?
Art, CEO, AnalystFinder.com
@CydneyLB: You didn't answer my question but (I think), regardless of your answer, here is code that I think accomplishes your latest modification of the criteria. It incorporates both one %qsort as well as traversing the arrays with two do loops:
data have;
  infile datalines truncover;
  input ID Date1 :date9. Dose1 Date2 :date9. Dose2 Date3 :date9. Dose3 Date4 :date9. Dose4;
  format date: date9.;
  datalines;
01  1Jun2009   15     1Jun2009    12      1Sep2009    17    1Jan2010    20
02  5Mar2005   11     4Nov2005    15      20Dec2005   18
03  16Jan2010  13
04  14Apr2011  16     12Aug2012   17
05  1Jun2009   15     1Jul2009    12    1Aug2009    17      1Sep2009    20
06  1Dec2018   15     1Feb2019    17    1Apr2019    16      1Jun2019    20
;
run;
filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;
data want (drop=t_:);
  set have;
  array t_dates(4);
  array t_doses(4);
  array dates(4) date:;
  array doses(4) dose:;
  do t_i=1 to 4;
    t_dates(t_i)=dates(t_i);
    t_doses(t_i)=doses(t_i);
  end;
  %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
  
  do t_i=1 to 3;
    do t_j=t_i+1 to 4;
      if abs(intck('days', t_dates(t_i), t_dates(t_j), 'continuous')) le 60 then call missing(t_doses(t_j));
   end;
  end;
  Total_Dose=sum(of t_doses(*));
run;
Art, CEO, AnalystFinder.com
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.