Hello,
I would like to restrict a group of people who is continuously as a member between start and end date.
My data looks like below:
ID StartDate EndDate
1 3/19/2019 6/01/2019
1 7/7/2019 8/21/2019
2 3/6/2019 9/01/2019
Registration data is:
member_id 01/2019 02/2019 03/2019 ...
1 0 1 1
2 0 0 0
so the output data will only keep record '1 7/7/2019 8/21/2019' since person1 is a member between start and end dates.
Any recommendation will be appreciated.
Thank you!
Here is one approach. It might be easier to transpose your registration data, especially if start and end dates cover multiple years. I have assumed below that part month registrations should be included.
data member ;
infile cards dlm=',' ;
input id StartDate:mmddyy10. EndDate:mmddyy10. ;
StartMonth=intnx('MONTH',StartDate,0,'B') ;
EndMonth=intnx('MONTH',EndDate,0,'B') ;
format StartDate EndDate StartMonth EndMonth ddmmyy10. ;
cards ;
1,3/19/2018,6/01/2018
1,7/7/2019,8/21/2019
2,3/6/2019,9/01/2019
;
run ;
data registration ;
infile cards dlm=',' ;
input person_id year jan feb mar apr may jun jul aug sep oct nov dec ;
cards ;
1,2018,0,0,0,1,1,0,0,0,0,0,0,1
1,2019,1,1,1,1,1,1,1,1,1,1,1,1
2,2018,1,1,0,0,1,1,1,1,1,1,1,1
2,2019,1,1,1,1,1,1,0,0,0,0,0,0
;
run;
data registrationTransposed ;
set registration ;
array mon(12) jan feb mar apr may jun jul aug sep oct nov dec ;
do i=1 to 12 ;
date=mdy(i,1,year) ;
Flag=mon(i) ;
output ;
end ;
keep person_id date flag ;
format date monyy7. ;
run ;
proc sql ;
create table want as select distinct
m.ID
,m.StartDate
,m.EndDate
from member as m
inner join
registrationTransposed as r
on
m.id=r.Person_ID
and r.Date between m.StartMonth and m.EndMonth
group by 1,2,3
having sum(Flag)=intck('MONTH',m.StartMonth,m.EndMonth)+1 ;
;
quit ;
You can do this in a single DATA step by (1) reading in the registration months for a given ID, (2) populating a two-way registration history array indexed by year for the row and month for the column, (3) reading in the start_end_spans for the same ID and checking those spans against the registration history:
data start_end_spans ;
input id StartDate:mmddyy10. EndDate:mmddyy10. ;
format StartDate EndDate date9. ;
cards ;
1 3/19/2018 6/01/2018
1 7/7/2019 8/21/2019
2 3/6/2019 9/01/2019
run;
data registration_months;
input id year jan feb mar apr may jun jul aug sep oct nov dec ;
cards ;
1 2018 0 0 0 1 1 0 0 0 0 0 0 1
1 2019 1 1 1 1 1 1 1 1 1 1 1 1
2 2018 1 1 0 0 1 1 1 1 1 1 1 1
2 2019 1 1 1 1 1 1 0 0 0 0 0 0
run;
data want (keep=id startdate enddate);
set registration_months (in=inreg) start_end_spans (in=in_spans);
by id;
array reg_history{2018:2019,1:12} _temporary_;
if first.id then call missing(of reg_history{*});
array reg_months {12} jan--dec ;
if inreg then do m=1 to 12;
reg_history{year,m}=reg_months{m};
end;
if in_spans;
missing_month=0;
date=startdate;
do until (date>enddate or missing_month=1);
if reg_history{year(date),month(date)}^=1 then missing_month=1;
date=intnx('month',date,1,'b');
end;
if missing_month=0;
run;
Just be sure to declare the possible span of years ( I used 2018 through 2019 above) in the reg_history array statement.
@mkeintz impressive, would this even work, if begin and end date are not in the same year?
@andreas_lds wrote:
@mkeintz impressive, would this even work, if begin and end date are not in the same year?
Yes, because:
Here is another solution picking up the idea of transposing mentioned by @seemiyah
data reg_months_transformed;
set registration_months;
length date 8;
format date date9.;
array months[12] jan feb mar apr may jun jul aug sep oct nov dec;
do i = 1 to 12;
if months[i] = 1 then do;
date = mdy(i, 1, year);
output;
end;
end;
drop year jan feb mar apr may jun jul aug sep oct nov dec i;
run;
data want;
if 0 then set work.reg_months_transformed;
set start_end_spans;
if _n_ = 1 then do;
declare hash h(dataset: 'work.reg_months_transformed');
h.defineKey('id', 'date');
h.defineDone();
end;
s = intnx('month', StartDate, 0, 'b');
e = intnx('month', EndDate, 0, 'b');
complete = 1;
do i = 0 to intck('month', s, e) ;
date = intnx('month', StartDate, i, 'b');
complete = complete and (h.check() = 0);
end;
if complete;
drop e i s date complete;
run;
data start_end_spans ; input id StartDate:mmddyy10. EndDate:mmddyy10. ; format StartDate EndDate date9. ; cards ; 1 3/19/2018 6/01/2018 1 7/7/2019 8/21/2019 2 3/6/2019 9/01/2019 ; run; data want; set start_end_spans ; year=year(StartDate); array months[12] jan feb mar apr may jun jul aug sep oct nov dec; do i=1 to dim(months); months[i] = 0; end; do i = StartDate to EndDate; if month(i) ne m then months[month(i)]=1; m= month(i); end; drop m i; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.