BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
AnimaCh44
Fluorite | Level 6

I have the following dataset.

data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;

format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;

 

I want to split the data by start date/ end date and group treatment based on the date range.

Expected output are as follows. 

State City           Stdt          endt          treatment
NY     NeyYork   1/1/2010 3/1/2010 Dialysis
MD    Baltimore 1/1/2011 1/31/2011 Dialysis
MD    Baltimore 2/1/2011 3/1/2011 Dialysis/Heart/Knee
MD    Baltimore 3/2/2011 5/1/2011 Dialysis/Heart/Knee/Urine
MD    Baltimore 5/2/2011 8/1/2011 Dialysis/Heart/Urine
MD    Baltimore 8/2/2011 12/1/2011 Dialysis/Heart/Urine
MD    Baltimore 12/2/2011 1/14/2015 Dialysis/Urine
MD    Baltimore 1/15/2015 2/14/2015 Urine

 

The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.

Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.

if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;

format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;

data temp;
 set have;
 do date=Stdt to endt;
  output;
 end;
 format date mmddyy10.;
 drop Stdt endt;
run;

proc sort data=temp nodupkey;by  State  city date Treatment;run;

data temp2;
 do until(last.date);
   set temp;
   by  State  city date;
   length Treatments $ 100;
   Treatments=catx('/',Treatments,Treatment);
 end;
 drop Treatment;
run;

data want;
 set temp2;
 by State  city Treatments notsorted;
 retain Stdt ;
 if first.Treatments then Stdt=date;
 if last.Treatments then do;endt=date;output;end;
 format Stdt endt mmddyy10.;
 drop date;
run;

Ksharp_0-1699237244927.png

 

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

Please explain the logic here. What are the steps that take you from the input data to the output data?

--
Paige Miller
AnimaCh44
Fluorite | Level 6

I have added the logic on main post.
The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.

Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.

if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.

Ksharp
Super User
data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;

format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;

data temp;
 set have;
 do date=Stdt to endt;
  output;
 end;
 format date mmddyy10.;
 drop Stdt endt;
run;

proc sort data=temp nodupkey;by  State  city date Treatment;run;

data temp2;
 do until(last.date);
   set temp;
   by  State  city date;
   length Treatments $ 100;
   Treatments=catx('/',Treatments,Treatment);
 end;
 drop Treatment;
run;

data want;
 set temp2;
 by State  city Treatments notsorted;
 retain Stdt ;
 if first.Treatments then Stdt=date;
 if last.Treatments then do;endt=date;output;end;
 format Stdt endt mmddyy10.;
 drop date;
run;

Ksharp_0-1699237244927.png

 

AnimaCh44
Fluorite | Level 6

Thank you for the solution. 

AnimaCh44
Fluorite | Level 6

The solution is fine for the overlap dates. If we have dates without overlap like for New York, we got the start date as 01/01/2010 and the end date as 03/01/2011. But we do not have any program from 03/02/2010 to 12/31/2010 and there should be a separate line for this. 

 

I have added one more line for New York on the previous dataset.

data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;

format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis

NY NeyYork 01/01/2011 03/01/2011 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;

 

I want to split the data by start date/ end date and group treatment based on the date range.

Expected output are as follows.

State City Stdt endt treatment
NY NeyYork 1/1/2010 3/1/2010 Dialysis
NY NeyYork 1/1/2011 3/1/2011 Dialysis
MD Baltimore 1/1/2011 1/31/2011 Dialysis
MD Baltimore 2/1/2011 3/1/2011 Dialysis/Heart/Knee
MD Baltimore 3/2/2011 5/1/2011 Dialysis/Heart/Knee/Urine
MD Baltimore 5/2/2011 12/1/2011 Dialysis/Heart/Urine
MD Baltimore 12/2/2011 1/14/2015 Dialysis/Urine
MD Baltimore 1/15/2015 2/14/2015 Urine

 

The logic is group all the treatment within that date range. First date range for Baltimore is 01/01/2011 to 08/01/2011 and treatment is Dialysis. Between 01/01/2011 to 08/01/2011 date range there are other start dates with different end dates and treatment like 02/01/2011, 03/02/2011, 04/15/2011, 02/01/2011.

Main logic is to group the treatment on specific date:- if we pick the random date 07/03/2011 for Baltimore then we have the Dialysis/Heart/Knee as treatment.

if we pick the random date as 09/01/2011 for Baltimore then we have the Heart/Urine/Dialysis as treatment.

Ksharp
Super User

OK. I did not take into account of this scenario .

Try this one :

 

data have;
input State $ city $ (Stdt endt)(:mmddyy10.) Treatment $;
format Stdt endt mmddyy10.;
datalines;
NY NeyYork 01/01/2010 03/01/2010 Dialysis
NY NeyYork 01/01/2011 03/01/2011 Dialysis
MD Baltimore 01/01/2011 08/01/2011 Dialysis
MD Baltimore 02/01/2011 12/01/2011 Heart
MD Baltimore 03/02/2011 02/14/2015 Urine
MD Baltimore 04/15/2011 01/14/2015 Dialysis
MD Baltimore 02/01/2011 05/01/2011 Knee
;
run;

data temp;
 set have;
 do date=Stdt to endt;
  output;
 end;
 format date mmddyy10.;
 drop Stdt endt;
run;

proc sort data=temp nodupkey;by  State  city date Treatment;run;

data temp2;
 do until(last.date);
   set temp;
   by  State  city date;
   length Treatments $ 100;
   Treatments=catx('/',Treatments,Treatment);
 end;
 drop Treatment;
run;

data temp3;
 set temp2;
 by State  city Treatments notsorted;
 if first.Treatments or dif(date) ne 1 then group+1;
run;
data want;
 set temp3;
 by group;
 retain Stdt ;
 if first.group then Stdt=date;
 if last.group then do;endt=date;output;end;
 format Stdt endt mmddyy10.;
 drop date;
run;

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Autotuning Deep Learning Models Using SAS

Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 5713 views
  • 2 likes
  • 3 in conversation