Hi guys,
suppose to have two data sets (files). File 1 is composed by time-periods with a label for each one and File2 that contains sub-periods without labels. I need to add labels to File2 based on the time interval from File1 so that if the period has Label "x" and the sub-period is contained in the period of File1, the sub-period will take the label from the period of File1.
Can anyone help me please?
data have1;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;
File2:
data have2;
input ID :$20. Start :date9. End :date9.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015
0001 31JAN2015 15FEB2015
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
......
;
File3 desired output:
data output;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015 HospitalA ex005
0001 31JAN2015 15FEB2015 HospitalA ex005
0001 15FEB2015 30APR2015 HospitalA ex005
0001 01MAY2015 15MAY2015 HospitalA ex004
0001 16MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 15SEP2015 HospitalC ex005
0001 16SEP2015 31DEC2015 HospitalC ex005
......
;
If I am correct that some sub-periods are not contained within any intervals in have1, and you want those blanked for Label and Role, try this
data have1;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;
data have2;
input ID :$20. Start :date9. End :date9.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015
0001 31JAN2015 15FEB2015
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
;
data want(drop = s e);
if _N_ = 1 then do;
dcl hash h(dataset : 'have1(rename = (Start = s End = e)', multidata : 'Y');
h.definekey('ID');
h.definedata(all : 'Y');
h.definedone();
dcl hiter i('h');
end;
set have2;
if 0 then set have1(rename = (Start = s End = e));
call missing(s, e, Label, Role);
do while (i.next() = 0);
if Start >= s and End <= e then leave;
else call missing(Label, Role);
end;
run;
Result:
ID Start End Label Role 0001 2015-01-01 2015-01-30 HospitalA ex005 0003 2015-01-31 2015-02-15 0001 2015-02-15 2015-04-30 HospitalA ex005 0001 2015-05-01 2015-05-15 HospitalA ex004 0003 2015-05-16 2015-05-31 0001 2015-06-01 2015-09-15 HospitalC ex005 0003 2015-09-16 2015-12-31
A few questions:
1) Yes, each sub-period in have2 is contained in one of the intervals in have1.
2) Unfortunately the period 01JAN2015 - 31MAY2015 cannot be viewed as a single period because there are additional variables in the file which I have not shown for simplicity.
3) The ID does matter because of the presence of additional information in the original file.
I will edit the examples by adding another variable.
I don't see how your posted desired result match the logic you describe.
For example, in the 2nd obs of have 2, the sub period is 31JAN2015 - 15FEB2015. This sub-period is not fully contained by any period in have1 when 01JAN2015 - 31MAY2015 cannot be viewed as a single period.
If I am correct that some sub-periods are not contained within any intervals in have1, and you want those blanked for Label and Role, try this
data have1;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;
data have2;
input ID :$20. Start :date9. End :date9.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015
0001 31JAN2015 15FEB2015
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
;
data want(drop = s e);
if _N_ = 1 then do;
dcl hash h(dataset : 'have1(rename = (Start = s End = e)', multidata : 'Y');
h.definekey('ID');
h.definedata(all : 'Y');
h.definedone();
dcl hiter i('h');
end;
set have2;
if 0 then set have1(rename = (Start = s End = e));
call missing(s, e, Label, Role);
do while (i.next() = 0);
if Start >= s and End <= e then leave;
else call missing(Label, Role);
end;
run;
Result:
ID Start End Label Role 0001 2015-01-01 2015-01-30 HospitalA ex005 0003 2015-01-31 2015-02-15 0001 2015-02-15 2015-04-30 HospitalA ex005 0001 2015-05-01 2015-05-15 HospitalA ex004 0003 2015-05-16 2015-05-31 0001 2015-06-01 2015-09-15 HospitalC ex005 0003 2015-09-16 2015-12-31
data have1;
input ID :$20. Start :date9. End :date9. Label :$20. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA ex005
0001 01MAY2015 31MAY2015 HospitalA ex004
0001 01JUN2015 31DEC2015 HospitalC ex005
0002 06FEB2018 08FEB2018 HospitalA ex004
0002 09FEB2018 31AUG2018 HospitalC ex005
0002 01SEP2018 31DEC2019 HospitalC ex004
0003 01JAN2019 30SEP2019 HospitalD ex008
0003 01OCT2019 31DEC2020 HospitalD ex004
;
data have2;
input ID :$20. Start :date9. End :date9.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30JAN2015
0001 31JAN2015 15FEB2015
0001 15FEB2015 30APR2015
0001 01MAY2015 15MAY2015
0001 16MAY2015 31MAY2015
0001 01JUN2015 15SEP2015
0001 16SEP2015 31DEC2015
;
proc sql;
create table want as
select a.*,b.label,b.role
from have2 as a left join have1 as b
on a.id=b.id and a.start between b.start and b.end
and a.end between b.start and b.end
order by id,start;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.