BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Pyrite | Level 9

Hi guys, 

suppose to have two data sets (files). File 1 is composed by time-periods with a label for each one and File2 that contains sub-periods without labels. I need to add labels to File2 based on the time interval from File1 so that if the period has Label "x" and the sub-period is contained in the period of File1, the sub-period will take the label from the period of File1. 

 

Can anyone help me please? 

 

 

 

 

data have1;
  input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA   ex005
0001 01MAY2015 31MAY2015 HospitalA   ex004
0001 01JUN2015 31DEC2015 HospitalC   ex005
0002 06FEB2018 08FEB2018 HospitalA   ex004
0002 09FEB2018 31AUG2018 HospitalC   ex005
0002 01SEP2018 31DEC2019 HospitalC   ex004
0003 01JAN2019 30SEP2019 HospitalD   ex008
0003 01OCT2019 31DEC2020 HospitalD   ex004
;

File2:

data have2;
  input ID :$20. Start :date9. End :date9.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 
0001 31JAN2015 15FEB2015 
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
......
;

File3 desired output:

data output;
  input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 HospitalA  ex005
0001 31JAN2015 15FEB2015 HospitalA  ex005
0001 15FEB2015 30APR2015 HospitalA  ex005 
0001 01MAY2015 15MAY2015 HospitalA  ex004
0001 16MAY2015 31MAY2015 HospitalA  ex004
0001 01JUN2015 15SEP2015 HospitalC  ex005
0001 16SEP2015 31DEC2015 HospitalC  ex005
......
;

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

If I am correct that some sub-periods are not contained within any intervals in have1, and you want those blanked for Label and Role, try this

 

data have1;
  input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;

data have2;
  input ID :$20. Start :date9. End :date9.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 
0001 31JAN2015 15FEB2015 
0001 15FEB2015 30APR2015 
0001 01MAY2015 15MAY2015 
0001 16MAY2015 31MAY2015 
0001 01JUN2015 15SEP2015 
0001 16SEP2015 31DEC2015 
;

data want(drop = s e);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'have1(rename = (Start = s End = e)', multidata : 'Y');
      h.definekey('ID');
      h.definedata(all : 'Y');
      h.definedone();
      dcl hiter i('h');
   end;

   set have2;

   if 0 then set have1(rename = (Start = s End = e));
   call missing(s, e, Label, Role);

   do while (i.next() = 0);
      if Start >= s and End <= e then leave;
      else call missing(Label, Role);
   end;

run;

 

Result:

 

ID    Start       End         Label      Role
0001  2015-01-01  2015-01-30  HospitalA  ex005
0003  2015-01-31  2015-02-15             
0001  2015-02-15  2015-04-30  HospitalA  ex005
0001  2015-05-01  2015-05-15  HospitalA  ex004
0003  2015-05-16  2015-05-31             
0001  2015-06-01  2015-09-15  HospitalC  ex005
0003  2015-09-16  2015-12-31             

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

A few questions:

 

  1. The period of have2 must be entirely contained by a period from have1, right?
  2. Can the period 01JAN2015 - 31MAY2015 be viewed as a single period? I.e. if a subperiod from have2 is 29APR2015 - 02MAY2015, does the Label become Hospital2 then?
  3. Does the IDs in have1 not matter?
NewUsrStat
Pyrite | Level 9

1) Yes, each sub-period in have2 is contained in one of the intervals in have1.
2) Unfortunately the period 01JAN2015 - 31MAY2015 cannot be viewed as a single period because there are additional variables in the file which I have not shown for simplicity.
3) The ID does matter because of the presence of additional information in the original file.

 

I will edit the examples by adding another variable.

PeterClemmensen
Tourmaline | Level 20

I don't see how your posted desired result match the logic you describe.

 

For example, in the 2nd obs of have 2, the sub period is 31JAN2015 - 15FEB2015. This sub-period is not fully contained by any period in have1 when 01JAN2015 - 31MAY2015 cannot be viewed as a single period.

 

 

PeterClemmensen
Tourmaline | Level 20

If I am correct that some sub-periods are not contained within any intervals in have1, and you want those blanked for Label and Role, try this

 

data have1;
  input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;

data have2;
  input ID :$20. Start :date9. End :date9.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 
0001 31JAN2015 15FEB2015 
0001 15FEB2015 30APR2015 
0001 01MAY2015 15MAY2015 
0001 16MAY2015 31MAY2015 
0001 01JUN2015 15SEP2015 
0001 16SEP2015 31DEC2015 
;

data want(drop = s e);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'have1(rename = (Start = s End = e)', multidata : 'Y');
      h.definekey('ID');
      h.definedata(all : 'Y');
      h.definedone();
      dcl hiter i('h');
   end;

   set have2;

   if 0 then set have1(rename = (Start = s End = e));
   call missing(s, e, Label, Role);

   do while (i.next() = 0);
      if Start >= s and End <= e then leave;
      else call missing(Label, Role);
   end;

run;

 

Result:

 

ID    Start       End         Label      Role
0001  2015-01-01  2015-01-30  HospitalA  ex005
0003  2015-01-31  2015-02-15             
0001  2015-02-15  2015-04-30  HospitalA  ex005
0001  2015-05-01  2015-05-15  HospitalA  ex004
0003  2015-05-16  2015-05-31             
0001  2015-06-01  2015-09-15  HospitalC  ex005
0003  2015-09-16  2015-12-31             
Ksharp
Super User
data have1;
  input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA   ex005
0001 01MAY2015 31MAY2015 HospitalA   ex004
0001 01JUN2015 31DEC2015 HospitalC   ex005
0002 06FEB2018 08FEB2018 HospitalA   ex004
0002 09FEB2018 31AUG2018 HospitalC   ex005
0002 01SEP2018 31DEC2019 HospitalC   ex004
0003 01JAN2019 30SEP2019 HospitalD   ex008
0003 01OCT2019 31DEC2020 HospitalD   ex004
;

data have2;
  input ID :$20. Start :date9. End :date9.;
  format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 
0001 31JAN2015 15FEB2015 
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
;

proc sql;
create table want as
select a.*,b.label,b.role
 from have2 as a left join have1 as b 
  on a.id=b.id and a.start between b.start and b.end 
               and a.end between b.start and b.end
 order by id,start;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 384 views
  • 1 like
  • 3 in conversation