BookmarkSubscribeRSS Feed
huhuhu
Obsidian | Level 7

Hello,

 

I would like to restrict a group of people who is continuously as a member between start and end date. 

 

My data looks like below:

ID   StartDate     EndDate

1     3/19/2019      6/01/2019

1     7/7/2019        8/21/2019

2     3/6/2019       9/01/2019

 

Registration data is:

member_id 01/2019  02/2019 03/2019   ...

1                 0               1            1

2                 0                0            0

 

so the output data will only keep record '1     7/7/2019        8/21/2019' since person1 is a member between start and end dates.

 

Any recommendation will be appreciated.

 

Thank you!

6 REPLIES 6
seemiyah
Fluorite | Level 6

Here is one approach. It might be easier to transpose your registration data, especially if start and end dates cover multiple years.  I have assumed below that part month registrations should be included. 

 

data member ;
	infile cards dlm=',' ;
	input id StartDate:mmddyy10. EndDate:mmddyy10. ;

	StartMonth=intnx('MONTH',StartDate,0,'B') ;
	EndMonth=intnx('MONTH',EndDate,0,'B') ;

	format StartDate EndDate StartMonth EndMonth ddmmyy10. ;
cards ;
1,3/19/2018,6/01/2018
1,7/7/2019,8/21/2019
2,3/6/2019,9/01/2019
;
run ;

data registration ;
	infile cards dlm=',' ;
	input person_id year jan feb mar apr may jun jul aug sep oct nov dec ;
cards ;
1,2018,0,0,0,1,1,0,0,0,0,0,0,1
1,2019,1,1,1,1,1,1,1,1,1,1,1,1
2,2018,1,1,0,0,1,1,1,1,1,1,1,1
2,2019,1,1,1,1,1,1,0,0,0,0,0,0
;
run;

data registrationTransposed ;
	set registration ;
	array mon(12) jan feb mar apr may jun jul aug sep oct nov dec ;
	do i=1 to 12 ;
		date=mdy(i,1,year) ;
		Flag=mon(i) ;
		output ;
	end ;
	keep person_id date flag ;
	format  date monyy7. ;
run ;

proc sql ;
	create table want as select distinct
 	m.ID
	,m.StartDate
	,m.EndDate

	from member as m
	inner join
	registrationTransposed as r

	on 
	m.id=r.Person_ID
	and r.Date between m.StartMonth and m.EndMonth

	group by 1,2,3

	having sum(Flag)=intck('MONTH',m.StartMonth,m.EndMonth)+1 ;
;
quit ;

 

mkeintz
PROC Star

You can do this in a single DATA step by (1) reading in the registration months for a given ID, (2) populating a two-way registration history array indexed by year for the row and month for the column, (3) reading in the start_end_spans for the same ID and checking those spans against the registration history:

 

data start_end_spans ;
  input id StartDate:mmddyy10. EndDate:mmddyy10. ;
  format StartDate EndDate date9. ;
cards ;
1 3/19/2018 6/01/2018
1 7/7/2019  8/21/2019
2 3/6/2019  9/01/2019
run;

data registration_months;
  input id year jan feb mar apr may jun jul aug sep oct nov dec ;
cards ;
1 2018 0 0 0 1 1 0 0 0 0 0 0 1
1 2019 1 1 1 1 1 1 1 1 1 1 1 1
2 2018 1 1 0 0 1 1 1 1 1 1 1 1
2 2019 1 1 1 1 1 1 0 0 0 0 0 0
run;

data want (keep=id startdate enddate);
  set registration_months (in=inreg) start_end_spans (in=in_spans);
  by id;

  array reg_history{2018:2019,1:12} _temporary_;
  if first.id then call missing(of reg_history{*});

  array reg_months {12} jan--dec ;
  if inreg then do m=1 to 12;
    reg_history{year,m}=reg_months{m};
  end;

  if in_spans;
  missing_month=0;
  date=startdate;
  do until (date>enddate or missing_month=1);
    if reg_history{year(date),month(date)}^=1 then missing_month=1;
    date=intnx('month',date,1,'b');
  end;
  if missing_month=0;
run;

Just be sure to declare the possible span of years ( I used 2018 through 2019 above) in the reg_history array statement.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

@mkeintz impressive, would this even work, if begin and end date are not in the same year?

mkeintz
PROC Star

@andreas_lds wrote:

@mkeintz impressive, would this even work, if begin and end date are not in the same year?


Yes, because:

  1. The registration history covers multiple years, so the available history is complete by the time a start_end_span observation is read..
  2. The DO UNTIL uses the intnx function to advance from startdate to enddate by 1 month increments while confirming registration status for the corresponding year/month.  The intnx function crosses year boundaries seamlessly.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
andreas_lds
Jade | Level 19

Here is another solution picking up the idea of transposing mentioned by @seemiyah

data reg_months_transformed;
   set registration_months;
   
   length date 8;
   format date date9.;
   
   array months[12] jan feb mar apr may jun jul aug sep oct nov dec;
   
   do i = 1 to 12;
      if months[i] = 1 then do;
         date = mdy(i, 1, year);
         output;
      end;
   end;
   
   drop year jan feb mar apr may jun jul aug sep oct nov dec i;
run;

data want;
   if 0 then set work.reg_months_transformed;
   set start_end_spans;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.reg_months_transformed');
      h.defineKey('id', 'date');
      h.defineDone();
   end;
   
   s = intnx('month', StartDate, 0, 'b');
   e = intnx('month', EndDate, 0, 'b');
   complete = 1;
   
   do i = 0 to intck('month', s, e) ;
      date = intnx('month', StartDate, i, 'b');
      complete = complete and (h.check() = 0);
   end;
   
   if complete;
   
   drop e i s date complete;
run;
Ksharp
Super User
data start_end_spans ;
  input id StartDate:mmddyy10. EndDate:mmddyy10. ;
  format StartDate EndDate date9. ;
cards ;
1 3/19/2018 6/01/2018
1 7/7/2019  8/21/2019
2 3/6/2019  9/01/2019
;
run;


data want;
   set start_end_spans ;
      year=year(StartDate);
   
   array months[12] jan feb mar apr may jun jul aug sep oct nov dec;
   do i=1 to dim(months);
    months[i] = 0;
   end; 
   do i = StartDate to EndDate;
     if month(i) ne m then  months[month(i)]=1;
	 m= month(i);
   end;
   drop m i;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 997 views
  • 6 likes
  • 5 in conversation