Hi all,
I have a data set as follows:
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20
02 Mar05 11 Nov05 18 Nov05 15
03 Jan10 13
04 Apr11 16 Aug12 17
I need to find a total dose for each ID, but if the date is the same then only the highest dose should be counted, i.e.
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4 Total Dose
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20 52
02 Mar05 11 Nov05 15 Nov05 18 29
03 Jan10 13 13
04 Apr11 16 Aug12 17 33
So far I have tried using arrays to create an indicator for whether the dates are the same, and then wanted to use this to find the maximum value but i haven't been table to figure out how to find a maximum of a subset of the data.
Any help would be appreciate!
Thanks
Cydney
Transpose to long, as it is much easier to find maximum/minimum/sum etc:
data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20
02 Mar05 11 Nov05 18 Nov05 15
03 Jan10 13
04 Apr11 16 Aug12 17
;
proc transpose data=have out=trans1;
by id;
var date: dose:;
run;
data trans2;
set trans1;
where col1 ne .;
if substr(_name_,1,4) = "Date"
then date = col1;
else dose = col1;
event = input(compress(_name_,'','kd'),best.);
format date yymmn6.;
run;
data trans;
merge
trans2 (keep=id event date where=(date ne .))
trans2 (keep=id event dose where=(dose ne .))
;
by id event;
drop event;
run;
proc sort data=trans;
by id date dose;
run;
data want;
set trans;
by id date;
if last.date;
run;
proc report data=want;
column id date dose;
define id /group;
define date /display;
define dose /analysis sum;
break after id/ summarize;
run;
Result:
ID date dose 1 200906 15 200909 17 201001 20 1 52 2 200503 11 200511 18 2 29 3 201001 13 3 13 4 201104 16 201208 17 4 33
Are the date variables numeric or character?
They are numeric
So Jun09 represents 01Jun2009, correct?
Yes it does sorry i didn't include the days because they only need to be the same month not the same day, but yes you could say they are all on the 1st
Transpose to long, as it is much easier to find maximum/minimum/sum etc:
data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20
02 Mar05 11 Nov05 18 Nov05 15
03 Jan10 13
04 Apr11 16 Aug12 17
;
proc transpose data=have out=trans1;
by id;
var date: dose:;
run;
data trans2;
set trans1;
where col1 ne .;
if substr(_name_,1,4) = "Date"
then date = col1;
else dose = col1;
event = input(compress(_name_,'','kd'),best.);
format date yymmn6.;
run;
data trans;
merge
trans2 (keep=id event date where=(date ne .))
trans2 (keep=id event dose where=(dose ne .))
;
by id event;
drop event;
run;
proc sort data=trans;
by id date dose;
run;
data want;
set trans;
by id date;
if last.date;
run;
proc report data=want;
column id date dose;
define id /group;
define date /display;
define dose /analysis sum;
break after id/ summarize;
run;
Result:
ID date dose 1 200906 15 200909 17 201001 20 1 52 2 200503 11 200511 18 2 29 3 201001 13 3 13 4 201104 16 201208 17 4 33
Thank you very much Kurt, with a bit of moving around this worked perfectly!
It should be easy enough to walk through a set of two arrays:
data want;
set have;
array date {4};
array dose {4};
total_dose = dose1;
do k=2 to 4;
if date{k} ne date{k-1} then total_dose + dose{k};
else do;
if dose{k} > dose{k-1} then total_dose + (dose{k} - dose{k-1});
end;
end;
run;
Note that this does not necessarily address what happens when there are three doses on the same date. That can be done, but requires a little more work.
Also note, this is untested code. So you need to test it out and report back whether it is successful.
Hi Astounding, thank you for the solution.
This is super easy and works really well for most of the observations! However there are a few people who have more than two doses on the same day (the maximum is 4). How would you go about extending this for multiple doses on the same day?
Thanks
Cydney
@CydneyLB: You already have a proc report solution that apparently meets your need, but here is a single datastep solution that I think is easier to modify and creates the Total_Dose variable in the dataset. It uses the qsort macro that Paul Dorfman proposed some years ago:
data have; infile datalines truncover; input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4; format date: yymmn6.; datalines; 01 Jun09 12 Jun09 15 Sep09 17 Jan10 20 02 Mar05 11 Nov05 15 Nov05 18 03 Jan10 13 Jan10 10 Jan10 12 04 Apr11 16 Aug12 17 ; run; filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas'; %include qs ; data want (drop=t_:); set have; array t_dates(4); array t_doses(4); array dates(4) date:; array doses(4) dose:; do t_i=1 to 4; t_dates(t_i)=dates(t_i); t_doses(t_i)=doses(t_i); end; %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d); %Qsort (Arr=t_dates t_doses, By=t_dates); do t_i=2 to 4; if t_dates(t_i) eq t_dates(t_i-1) then call missing(t_doses(t_i)); end; Total_Dose=sum(of t_doses(*)); run;
Art, CEO, AnalystFinder.com
Hi Art,
Thank you for responding another good solution. This works great for looking if the doses were on the same day, however if i edit it to look at doses within 60 days of each other I cant seem to get the code to work?
Thanks
Cydney
@CydneyLB : Please provide an example dataset that shows a case where the code doesn't work and the result you expect from that data.
Art, CEO, AnalystFinder.com
@art297 :
Using an example very similar to the one above:
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20
02 Mar05 11 Nov05 15 Dec05 18
03 Jan10 13
04 Apr11 16 Aug12 17
I need to find a total dose for each ID, but if the date is within 60 days of another dose then only the highest dose should be counted, i.e.
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4 Total Dose
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20 52
02 Mar05 11 Nov05 15 Dec05 18 29
03 Jan10 13 13
04 Apr11 16 Aug12 17 33
Using your code I tried to adjust the condition as follows:
data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01 Jun09 12 Jun09 15 Sep09 17 Jan10 20
02 Mar05 11 Nov05 15 Nov05 18
03 Jan10 13 Jan10 10 Jan10 12
04 Apr11 16 Aug12 17
;
run;
filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;
data want (drop=t_:);
set have;
array t_dates(4);
array t_doses(4);
array dates(4) date:;
array doses(4) dose:;
do t_i=1 to 4;
t_dates(t_i)=dates(t_i);
t_doses(t_i)=doses(t_i);
end;
%Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
%Qsort (Arr=t_dates t_doses, By=t_dates);
do t_i=2 to 4;
if t_dates(t_i) - t_dates(t_i-1) < 60 then call missing(t_doses(t_i));
end;
Total_Dose=sum(of t_doses(*));
run;
However it gave the results:
ID Date1 Dose1 Date2 Dose2 Date3 Dose3 Date4 Dose4 Total Dose
01 Jun09 15 Jun09 12 Sep09 17 Jan10 20 52
02 Mar05 11 Nov05 15 Dec05 18 18
03 Jan10 13 .
04 Apr11 16 Aug12 17 17
Thanks
Cydney
@CydneyLB : What totals would you want given the following two records:
01 Jun09 15 Jul09 12 Aug09 17 Sep09 20 ?
01 Dec18 15 Feb19 12 Apr19 17 Jun19 20 ?
Art, CEO, AnalystFinder.com
@CydneyLB: You didn't answer my question but (I think), regardless of your answer, here is code that I think accomplishes your latest modification of the criteria. It incorporates both one %qsort as well as traversing the arrays with two do loops:
data have;
infile datalines truncover;
input ID Date1 :date9. Dose1 Date2 :date9. Dose2 Date3 :date9. Dose3 Date4 :date9. Dose4;
format date: date9.;
datalines;
01 1Jun2009 15 1Jun2009 12 1Sep2009 17 1Jan2010 20
02 5Mar2005 11 4Nov2005 15 20Dec2005 18
03 16Jan2010 13
04 14Apr2011 16 12Aug2012 17
05 1Jun2009 15 1Jul2009 12 1Aug2009 17 1Sep2009 20
06 1Dec2018 15 1Feb2019 17 1Apr2019 16 1Jun2019 20
;
run;
filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;
data want (drop=t_:);
set have;
array t_dates(4);
array t_doses(4);
array dates(4) date:;
array doses(4) dose:;
do t_i=1 to 4;
t_dates(t_i)=dates(t_i);
t_doses(t_i)=doses(t_i);
end;
%Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
do t_i=1 to 3;
do t_j=t_i+1 to 4;
if abs(intck('days', t_dates(t_i), t_dates(t_j), 'continuous')) le 60 then call missing(t_doses(t_j));
end;
end;
Total_Dose=sum(of t_doses(*));
run;
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.