I have the following dataset.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
I want to split the data by start date/ end date and group treatment based on the date range.
Expected output are as follows.
State City Stdt endt treatment
NY NeyYork 1/1/2010 3/1/2010 Dialysis
MD Baltimore 1/1/2011 1/31/2011 Dialysis
MD Baltimore 2/1/2011 3/1/2011 Dialysis/Heart/Knee
MD Baltimore 3/2/2011 5/1/2011 Dialysis/Heart/Knee/Urine
MD Baltimore 5/2/2011 8/1/2011 Dialysis/Heart/Urine
MD Baltimore 8/2/2011 12/1/2011 Dialysis/Heart/Urine
MD Baltimore 12/2/2011 1/14/2015 Dialysis/Urine
MD Baltimore 1/15/2015 2/14/2015 Urine
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.
Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.
if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
data temp;
set have;
do date=Stdt to endt;
output;
end;
format date mmddyy10.;
drop Stdt endt;
run;
proc sort data=temp nodupkey;by State city date Treatment;run;
data temp2;
do until(last.date);
set temp;
by State city date;
length Treatments $ 100;
Treatments=catx('/',Treatments,Treatment);
end;
drop Treatment;
run;
data want;
set temp2;
by State city Treatments notsorted;
retain Stdt ;
if first.Treatments then Stdt=date;
if last.Treatments then do;endt=date;output;end;
format Stdt endt mmddyy10.;
drop date;
run;
Please explain the logic here. What are the steps that take you from the input data to the output data?
I have added the logic on main post.
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.
Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.
if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
data temp;
set have;
do date=Stdt to endt;
output;
end;
format date mmddyy10.;
drop Stdt endt;
run;
proc sort data=temp nodupkey;by State city date Treatment;run;
data temp2;
do until(last.date);
set temp;
by State city date;
length Treatments $ 100;
Treatments=catx('/',Treatments,Treatment);
end;
drop Treatment;
run;
data want;
set temp2;
by State city Treatments notsorted;
retain Stdt ;
if first.Treatments then Stdt=date;
if last.Treatments then do;endt=date;output;end;
format Stdt endt mmddyy10.;
drop date;
run;
Thank you for the solution.
The solution is fine for the overlap dates. If we have dates without overlap like for New York, we got the start date as 01/01/2010 and the end date as 03/01/2011. But we do not have any program from 03/02/2010 to 12/31/2010 and there should be a separate line for this.
I have added one more line for New York on the previous dataset.
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
NY NeyYork 01/01/2011 03/01/2011 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
I want to split the data by start date/ end date and group treatment based on the date range.
Expected output are as follows.
State City Stdt endt treatment
NY NeyYork 1/1/2010 3/1/2010 Dialysis
NY NeyYork 1/1/2011 3/1/2011 Dialysis
MD Baltimore 1/1/2011 1/31/2011 Dialysis
MD Baltimore 2/1/2011 3/1/2011 Dialysis/Heart/Knee
MD Baltimore 3/2/2011 5/1/2011 Dialysis/Heart/Knee/Urine
MD Baltimore 5/2/2011 12/1/2011 Dialysis/Heart/Urine
MD Baltimore 12/2/2011 1/14/2015 Dialysis/Urine
MD Baltimore 1/15/2015 2/14/2015 Urine
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.
Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.
if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.
OK. I did not take into account of this scenario .
Try this one :
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
NY NeyYork 01/01/2011 03/01/2011 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;
data temp;
set have;
do date=Stdt to endt;
output;
end;
format date mmddyy10.;
drop Stdt endt;
run;
proc sort data=temp nodupkey;by State city date Treatment;run;
data temp2;
do until(last.date);
set temp;
by State city date;
length Treatments $ 100;
Treatments=catx('/',Treatments,Treatment);
end;
drop Treatment;
run;
data temp3;
set temp2;
by State city Treatments notsorted;
if first.Treatments or dif(date) ne 1 then group+1;
run;
data want;
set temp3;
by group;
retain Stdt ;
if first.group then Stdt=date;
if last.group then do;endt=date;output;end;
format Stdt endt mmddyy10.;
drop date;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.
Find more tutorials on the SAS Users YouTube channel.