I have the following dataset.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
I want to split the data by start date/ end date and group treatment based on the date range.
Expected output are as follows.
State City           Stdt          endt          treatment
NY     NeyYork   1/1/2010 3/1/2010 Dialysis
MD    Baltimore 1/1/2011 1/31/2011 Dialysis
MD    Baltimore 2/1/2011 3/1/2011 Dialysis/Heart/Knee
MD    Baltimore 3/2/2011 5/1/2011 Dialysis/Heart/Knee/Urine
MD    Baltimore 5/2/2011 8/1/2011 Dialysis/Heart/Urine
MD    Baltimore 8/2/2011 12/1/2011 Dialysis/Heart/Urine
MD    Baltimore 12/2/2011 1/14/2015 Dialysis/Urine
MD    Baltimore 1/15/2015 2/14/2015 Urine
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.
Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.
if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
data temp;
 set have;
 do date=Stdt to endt;
  output;
 end;
 format date mmddyy10.;
 drop Stdt endt;
run;
proc sort data=temp nodupkey;by  State  city date Treatment;run;
data temp2;
 do until(last.date);
   set temp;
   by  State  city date;
   length Treatments $ 100;
   Treatments=catx('/',Treatments,Treatment);
 end;
 drop Treatment;
run;
data want;
 set temp2;
 by State  city Treatments notsorted;
 retain Stdt ;
 if first.Treatments then Stdt=date;
 if last.Treatments then do;endt=date;output;end;
 format Stdt endt mmddyy10.;
 drop date;
run;
Please explain the logic here. What are the steps that take you from the input data to the output data?
I have added the logic on main post.
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.
Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.
if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
data temp;
 set have;
 do date=Stdt to endt;
  output;
 end;
 format date mmddyy10.;
 drop Stdt endt;
run;
proc sort data=temp nodupkey;by  State  city date Treatment;run;
data temp2;
 do until(last.date);
   set temp;
   by  State  city date;
   length Treatments $ 100;
   Treatments=catx('/',Treatments,Treatment);
 end;
 drop Treatment;
run;
data want;
 set temp2;
 by State  city Treatments notsorted;
 retain Stdt ;
 if first.Treatments then Stdt=date;
 if last.Treatments then do;endt=date;output;end;
 format Stdt endt mmddyy10.;
 drop date;
run;
Thank you for the solution.
The solution is fine for the overlap dates. If we have dates without overlap like for New York, we got the start date as 01/01/2010 and the end date as 03/01/2011. But we do not have any program from 03/02/2010 to 12/31/2010 and there should be a separate line for this.
I have added one more line for New York on the previous dataset.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
NY NeyYork 01/01/2011 03/01/2011 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
I want to split the data by start date/ end date and group treatment based on the date range.
Expected output are as follows.
State City Stdt endt treatment
NY NeyYork 1/1/2010 3/1/2010 Dialysis
NY NeyYork 1/1/2011 3/1/2011 Dialysis
MD Baltimore 1/1/2011 1/31/2011 Dialysis
MD Baltimore 2/1/2011 3/1/2011 Dialysis/Heart/Knee
MD Baltimore 3/2/2011 5/1/2011 Dialysis/Heart/Knee/Urine
MD Baltimore 5/2/2011 12/1/2011 Dialysis/Heart/Urine
MD Baltimore 12/2/2011 1/14/2015 Dialysis/Urine
MD Baltimore 1/15/2015 2/14/2015 Urine
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.
Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.
if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.
OK. I did not take into account of this scenario .
Try this one :
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
NY NeyYork 01/01/2011 03/01/2011 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
data temp;
 set have;
 do date=Stdt to endt;
  output;
 end;
 format date mmddyy10.;
 drop Stdt endt;
run;
proc sort data=temp nodupkey;by  State  city date Treatment;run;
data temp2;
 do until(last.date);
   set temp;
   by  State  city date;
   length Treatments $ 100;
   Treatments=catx('/',Treatments,Treatment);
 end;
 drop Treatment;
run;
data temp3;
 set temp2;
 by State  city Treatments notsorted;
 if first.Treatments or dif(date) ne 1 then group+1;
run;
data want;
 set temp3;
 by group;
 retain Stdt ;
 if first.group then Stdt=date;
 if last.group then do;endt=date;output;end;
 format Stdt endt mmddyy10.;
 drop date;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.
Find more tutorials on the SAS Users YouTube channel.
