BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CydneyLB
Obsidian | Level 7

Hi all,

 

I have a data set as follows:

 

ID  Date1 Dose1  Date2 Dose2  Date3 Dose3  Date4 Dose4

 

01  Jun09   15     Jun09    12      Sep09    17      Jan10   20

02  Mar05   11     Nov05   18      Nov05    15

03  Jan10   13     

04  Apr11    16     Aug12    17

 

 

 

I need to find a total dose for each ID, but if the date is the same then only the highest dose should be counted, i.e.

 

ID  Date1 Dose1  Date2 Dose2  Date3 Dose3  Date4 Dose4              Total Dose

 

01  Jun09   15     Jun09    12      Sep09    17      Jan10   20                       52

02  Mar05   11     Nov05   15      Nov05    18                                              29

03  Jan10   13                                                                                             13

04  Apr11    16     Aug12    17                                                                     33

 

 

So far I have tried using arrays to create an indicator for whether the dates are the same, and then wanted to use this to find the maximum value but i haven't been table to figure out how to find a maximum of a subset of the data.

 

Any help would be appreciate!

 

Thanks

Cydney

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Transpose to long, as it is much easier to find maximum/minimum/sum etc:

data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01  Jun09   15     Jun09    12      Sep09    17      Jan10   20
02  Mar05   11     Nov05   18      Nov05    15
03  Jan10   13     
04  Apr11    16     Aug12    17
;

proc transpose data=have out=trans1;
by id;
var date: dose:;
run;

data trans2;
set trans1;
where col1 ne .;
if substr(_name_,1,4) = "Date"
then date = col1;
else dose = col1;
event = input(compress(_name_,'','kd'),best.);
format date yymmn6.;
run;

data trans;
merge
  trans2 (keep=id event date where=(date ne .))
  trans2 (keep=id event dose where=(dose ne .))
;
by id event;
drop event;
run;

proc sort data=trans;
by id date dose;
run;

data want;
set trans;
by id date;
if last.date;
run;

proc report data=want;
column id date dose;
define id /group;
define date /display;
define dose /analysis sum;
break after id/ summarize;
run;

Result:

 ID    date       dose
  1  200906         15
     200909         17
     201001         20
  1                 52
  2  200503         11
     200511         18
  2                 29
  3  201001         13
  3                 13
  4  201104         16
     201208         17
  4                 33

View solution in original post

15 REPLIES 15
PeterClemmensen
Tourmaline | Level 20

Are the date variables numeric or character?

CydneyLB
Obsidian | Level 7

They are numeric

PeterClemmensen
Tourmaline | Level 20

So Jun09 represents 01Jun2009, correct?

CydneyLB
Obsidian | Level 7

Yes it does sorry i didn't include the days because they only need to be the same month not the same day, but yes you could say they are all on the 1st

Kurt_Bremser
Super User

Transpose to long, as it is much easier to find maximum/minimum/sum etc:

data have;
infile datalines truncover;
input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
format date: yymmn6.;
datalines;
01  Jun09   15     Jun09    12      Sep09    17      Jan10   20
02  Mar05   11     Nov05   18      Nov05    15
03  Jan10   13     
04  Apr11    16     Aug12    17
;

proc transpose data=have out=trans1;
by id;
var date: dose:;
run;

data trans2;
set trans1;
where col1 ne .;
if substr(_name_,1,4) = "Date"
then date = col1;
else dose = col1;
event = input(compress(_name_,'','kd'),best.);
format date yymmn6.;
run;

data trans;
merge
  trans2 (keep=id event date where=(date ne .))
  trans2 (keep=id event dose where=(dose ne .))
;
by id event;
drop event;
run;

proc sort data=trans;
by id date dose;
run;

data want;
set trans;
by id date;
if last.date;
run;

proc report data=want;
column id date dose;
define id /group;
define date /display;
define dose /analysis sum;
break after id/ summarize;
run;

Result:

 ID    date       dose
  1  200906         15
     200909         17
     201001         20
  1                 52
  2  200503         11
     200511         18
  2                 29
  3  201001         13
  3                 13
  4  201104         16
     201208         17
  4                 33
CydneyLB
Obsidian | Level 7

Thank you very much Kurt, with a bit of moving around this worked perfectly!

Astounding
PROC Star

It should be easy enough to walk through a set of two arrays:

 

data want;
   set have;
   array date {4};
   array dose {4};
   total_dose = dose1;
   do k=2 to 4;
      if date{k} ne date{k-1} then total_dose + dose{k};
      else do;
         if dose{k} > dose{k-1} then total_dose + (dose{k} - dose{k-1});
      end;
   end;
run;

Note that this does not necessarily address what happens when there are three doses on the same date.  That can be done, but requires a little more work.

 

Also note, this is untested code.  So you need to test it out and report back whether it is successful.

CydneyLB
Obsidian | Level 7

Hi Astounding, thank you for the solution.

 

This is super easy and works really well for most of the observations! However there are a few people who have more than two doses on the same day (the maximum is 4). How would you go about extending this for multiple doses on the same day?

 

Thanks

Cydney

 

 

art297
Opal | Level 21

@CydneyLB: You already have a proc report solution that apparently meets your need, but here is a single datastep solution that I think is easier to modify and creates the Total_Dose variable in the dataset. It uses the qsort macro that Paul Dorfman proposed some years ago:

data have;
  infile datalines truncover;
  input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
  format date: yymmn6.;
  datalines;
01  Jun09   12     Jun09    15      Sep09    17      Jan10   20
02  Mar05   11     Nov05   15       Nov05    18
03  Jan10   13     Jan10   10       Jan10    12
04  Apr11    16     Aug12    17
;
run;

filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;


data want (drop=t_:);
  set have;
  array t_dates(4);
  array t_doses(4);
  array dates(4) date:;
  array doses(4) dose:;
  do t_i=1 to 4;
    t_dates(t_i)=dates(t_i);
    t_doses(t_i)=doses(t_i);
  end;
  %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
  %Qsort (Arr=t_dates t_doses, By=t_dates);
  
  do t_i=2 to 4;
    if t_dates(t_i) eq t_dates(t_i-1) then call missing(t_doses(t_i));
  end;
  Total_Dose=sum(of t_doses(*));
run;

Art, CEO, AnalystFinder.com

 

 

CydneyLB
Obsidian | Level 7

Hi Art,

 

Thank you for responding another good solution. This works great for looking if the doses were on the same day, however if i edit it to look at doses within 60 days of  each other I cant seem to get the code to work?

 

Thanks

Cydney 

art297
Opal | Level 21

@CydneyLB : Please provide an example dataset that shows a case where the code doesn't work and the result you expect from that data.

 

Art, CEO, AnalystFinder.com

 

CydneyLB
Obsidian | Level 7

@art297 :

 

Using an example very similar to the one above:

 

 

ID  Date1 Dose1  Date2 Dose2  Date3 Dose3  Date4 Dose4

 

01  Jun09   15     Jun09    12      Sep09    17      Jan10   20

02  Mar05   11     Nov05   15     Dec05    18

03  Jan10   13     

04  Apr11    16     Aug12    17

 

 

I need to find a total dose for each ID, but if the date is within 60 days of another dose then only the highest dose should be counted, i.e.

 

ID  Date1 Dose1  Date2 Dose2  Date3 Dose3  Date4 Dose4              Total Dose

 

01  Jun09   15     Jun09    12      Sep09    17      Jan10   20                       52

02  Mar05   11     Nov05   15      Dec05    18                                              29

03  Jan10   13                                                                                             13

04  Apr11    16     Aug12    17                                                                     33

 

 

 

Using your code I tried to adjust the condition as follows:

 

data have;
  infile datalines truncover;
  input ID Date1 :monyy5. Dose1 Date2 :monyy5. Dose2 Date3 :monyy5. Dose3 Date4 :monyy5. Dose4;
  format date: yymmn6.;
  datalines;
01  Jun09   12     Jun09    15      Sep09    17      Jan10   20
02  Mar05   11     Nov05   15       Nov05    18
03  Jan10   13     Jan10   10       Jan10    12
04  Apr11    16     Aug12    17
;
run;

filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;


data want (drop=t_:);
  set have;
  array t_dates(4);
  array t_doses(4);
  array dates(4) date:;
  array doses(4) dose:;
  do t_i=1 to 4;
    t_dates(t_i)=dates(t_i);
    t_doses(t_i)=doses(t_i);
  end;
  %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
  %Qsort (Arr=t_dates t_doses, By=t_dates);
  
  do t_i=2 to 4;
    if t_dates(t_i) - t_dates(t_i-1) < 60 then call missing(t_doses(t_i));
  end;
  Total_Dose=sum(of t_doses(*));
run;

However it gave the results:

 

ID  Date1 Dose1  Date2 Dose2  Date3 Dose3  Date4 Dose4              Total Dose

 

01  Jun09   15     Jun09    12      Sep09    17      Jan10   20                       52

02  Mar05   11     Nov05   15      Dec05    18                                              18

03  Jan10   13                                                                                              .

04  Apr11    16     Aug12    17                                                                     17

 

 

Thanks

Cydney

art297
Opal | Level 21

@CydneyLB : What totals would you want given the following two records:

01  Jun09   15     Jul09    12    Aug09    17      Sep09    20                       ?
01  Dec18   15     Feb19    12    Apr19    17      Jun19    20                       ?

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

@CydneyLB: You didn't answer my question but (I think), regardless of your answer, here is code that I think accomplishes your latest modification of the criteria. It incorporates both one %qsort as well as traversing the arrays with two do loops:

data have;
  infile datalines truncover;
  input ID Date1 :date9. Dose1 Date2 :date9. Dose2 Date3 :date9. Dose3 Date4 :date9. Dose4;
  format date: date9.;
  datalines;
01  1Jun2009   15     1Jun2009    12      1Sep2009    17    1Jan2010    20
02  5Mar2005   11     4Nov2005    15      20Dec2005   18
03  16Jan2010  13
04  14Apr2011  16     12Aug2012   17
05  1Jun2009   15     1Jul2009    12    1Aug2009    17      1Sep2009    20
06  1Dec2018   15     1Feb2019    17    1Apr2019    16      1Jun2019    20
;
run;

filename qs url 'https://raw.githubusercontent.com/art297/qsort/master/qsort.sas';
%include qs ;


data want (drop=t_:);
  set have;
  array t_dates(4);
  array t_doses(4);
  array dates(4) date:;
  array doses(4) dose:;
  do t_i=1 to 4;
    t_dates(t_i)=dates(t_i);
    t_doses(t_i)=doses(t_i);
  end;
  %Qsort (Arr=t_dates t_doses, By=t_doses,seq=d);
  
  do t_i=1 to 3;
    do t_j=t_i+1 to 4;
      if abs(intck('days', t_dates(t_i), t_dates(t_j), 'continuous')) le 60 then call missing(t_doses(t_j));
   end;
  end;
  Total_Dose=sum(of t_doses(*));
run;

Art, CEO, AnalystFinder.com

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 1176 views
  • 5 likes
  • 5 in conversation