BookmarkSubscribeRSS Feed
huhuhu
Obsidian | Level 7

Hello,

 

I would like to restrict a group of people who is continuously as a member between start and end date. 

 

My data looks like below:

ID   StartDate     EndDate

1     3/19/2019      6/01/2019

1     7/7/2019        8/21/2019

2     3/6/2019       9/01/2019

 

Registration data is:

member_id 01/2019  02/2019 03/2019   ...

1                 0               1            1

2                 0                0            0

 

so the output data will only keep record '1     7/7/2019        8/21/2019' since person1 is a member between start and end dates.

 

Any recommendation will be appreciated.

 

Thank you!

6 REPLIES 6
seemiyah
Fluorite | Level 6

Here is one approach. It might be easier to transpose your registration data, especially if start and end dates cover multiple years.  I have assumed below that part month registrations should be included. 

 

data member ;
	infile cards dlm=',' ;
	input id StartDate:mmddyy10. EndDate:mmddyy10. ;

	StartMonth=intnx('MONTH',StartDate,0,'B') ;
	EndMonth=intnx('MONTH',EndDate,0,'B') ;

	format StartDate EndDate StartMonth EndMonth ddmmyy10. ;
cards ;
1,3/19/2018,6/01/2018
1,7/7/2019,8/21/2019
2,3/6/2019,9/01/2019
;
run ;

data registration ;
	infile cards dlm=',' ;
	input person_id year jan feb mar apr may jun jul aug sep oct nov dec ;
cards ;
1,2018,0,0,0,1,1,0,0,0,0,0,0,1
1,2019,1,1,1,1,1,1,1,1,1,1,1,1
2,2018,1,1,0,0,1,1,1,1,1,1,1,1
2,2019,1,1,1,1,1,1,0,0,0,0,0,0
;
run;

data registrationTransposed ;
	set registration ;
	array mon(12) jan feb mar apr may jun jul aug sep oct nov dec ;
	do i=1 to 12 ;
		date=mdy(i,1,year) ;
		Flag=mon(i) ;
		output ;
	end ;
	keep person_id date flag ;
	format  date monyy7. ;
run ;

proc sql ;
	create table want as select distinct
 	m.ID
	,m.StartDate
	,m.EndDate

	from member as m
	inner join
	registrationTransposed as r

	on 
	m.id=r.Person_ID
	and r.Date between m.StartMonth and m.EndMonth

	group by 1,2,3

	having sum(Flag)=intck('MONTH',m.StartMonth,m.EndMonth)+1 ;
;
quit ;

 

mkeintz
PROC Star

You can do this in a single DATA step by (1) reading in the registration months for a given ID, (2) populating a two-way registration history array indexed by year for the row and month for the column, (3) reading in the start_end_spans for the same ID and checking those spans against the registration history:

 

data start_end_spans ;
  input id StartDate:mmddyy10. EndDate:mmddyy10. ;
  format StartDate EndDate date9. ;
cards ;
1 3/19/2018 6/01/2018
1 7/7/2019  8/21/2019
2 3/6/2019  9/01/2019
run;

data registration_months;
  input id year jan feb mar apr may jun jul aug sep oct nov dec ;
cards ;
1 2018 0 0 0 1 1 0 0 0 0 0 0 1
1 2019 1 1 1 1 1 1 1 1 1 1 1 1
2 2018 1 1 0 0 1 1 1 1 1 1 1 1
2 2019 1 1 1 1 1 1 0 0 0 0 0 0
run;

data want (keep=id startdate enddate);
  set registration_months (in=inreg) start_end_spans (in=in_spans);
  by id;

  array reg_history{2018:2019,1:12} _temporary_;
  if first.id then call missing(of reg_history{*});

  array reg_months {12} jan--dec ;
  if inreg then do m=1 to 12;
    reg_history{year,m}=reg_months{m};
  end;

  if in_spans;
  missing_month=0;
  date=startdate;
  do until (date>enddate or missing_month=1);
    if reg_history{year(date),month(date)}^=1 then missing_month=1;
    date=intnx('month',date,1,'b');
  end;
  if missing_month=0;
run;

Just be sure to declare the possible span of years ( I used 2018 through 2019 above) in the reg_history array statement.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

@mkeintz impressive, would this even work, if begin and end date are not in the same year?

mkeintz
PROC Star

@andreas_lds wrote:

@mkeintz impressive, would this even work, if begin and end date are not in the same year?


Yes, because:

  1. The registration history covers multiple years, so the available history is complete by the time a start_end_span observation is read..
  2. The DO UNTIL uses the intnx function to advance from startdate to enddate by 1 month increments while confirming registration status for the corresponding year/month.  The intnx function crosses year boundaries seamlessly.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

Here is another solution picking up the idea of transposing mentioned by @seemiyah

data reg_months_transformed;
   set registration_months;
   
   length date 8;
   format date date9.;
   
   array months[12] jan feb mar apr may jun jul aug sep oct nov dec;
   
   do i = 1 to 12;
      if months[i] = 1 then do;
         date = mdy(i, 1, year);
         output;
      end;
   end;
   
   drop year jan feb mar apr may jun jul aug sep oct nov dec i;
run;

data want;
   if 0 then set work.reg_months_transformed;
   set start_end_spans;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.reg_months_transformed');
      h.defineKey('id', 'date');
      h.defineDone();
   end;
   
   s = intnx('month', StartDate, 0, 'b');
   e = intnx('month', EndDate, 0, 'b');
   complete = 1;
   
   do i = 0 to intck('month', s, e) ;
      date = intnx('month', StartDate, i, 'b');
      complete = complete and (h.check() = 0);
   end;
   
   if complete;
   
   drop e i s date complete;
run;
Ksharp
Super User
data start_end_spans ;
  input id StartDate:mmddyy10. EndDate:mmddyy10. ;
  format StartDate EndDate date9. ;
cards ;
1 3/19/2018 6/01/2018
1 7/7/2019  8/21/2019
2 3/6/2019  9/01/2019
;
run;


data want;
   set start_end_spans ;
      year=year(StartDate);
   
   array months[12] jan feb mar apr may jun jul aug sep oct nov dec;
   do i=1 to dim(months);
    months[i] = 0;
   end; 
   do i = StartDate to EndDate;
     if month(i) ne m then  months[month(i)]=1;
	 m= month(i);
   end;
   drop m i;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1473 views
  • 6 likes
  • 5 in conversation