BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
altijani
Quartz | Level 8

Hello Community. Bringing this back up after 2 years.

I have the following data that I need to use to measure the periods of membership overlap (if any) for the same individual but in multiple stores:

Obs_NumStore_IDMember_IDMember_FromMember_ToMember_Type
1AA12301/01/202012/31/2021Full
2BB12306/01/202112/31/2999Full
3AA45601/01/202012/31/2021Full
4BB45609/01/202110/31/2021Part
5AA78901/01/202012/31/2021Full
6BB91106/01/202110/01/2021Part
7AA41101/01/202012/31/2022Full
8BB41109/01/202112/31/2099Part
9CC41112/01/202112/31/2022Full

Few notes (the colors are to clarify the overlapping):

- Observations 1&2 are for the same ID in 2 different stores

- Observations 5& 6 are for distinct ID and for only one stores (no overlap)

- Observations 7-8-9 are for the same ID in 3 different stores

- The new dataset (below) should have 5 new columns to measure overlap from start UNTILL the end of 2021.

- The new column "Type_Overlap" is tricky in that it has to start with that store's membership type first. For example, Obs 7/8/9 are similar in 4 of the new 5 columns EXCEPT the type_overlap.

 

The new columns to be added to the data should be as follows:

Obs_NumStore_IDMember_IDMember_FromMember_ToMember_TypeMnths_Overlaps_till2021Num_Stores_OverlapType_OverlapOverlap_FromOverlap_To
1AA12301/01/202012/31/2021Full72Full/Full06/01/202112/31/2021
2BB12306/01/202112/31/2999Full72Full/Full06/01/202112/31/2021
3AA45601/01/202012/31/2021Full22Full/Part09/01/202110/31/2021
4BB45609/01/202110/31/2021Part22Part/Full09/01/202110/31/2021
5AA78901/01/202012/31/2021Full00Full  
6BB91106/01/202110/01/2021Part00Part  
7AA41101/01/202012/31/2022Full43Full/Part/Full09/01/202112/31/2021
8BB41109/01/202112/31/2099Part43Part/Full/Full09/01/202112/31/2021
9CC41112/01/202112/31/2022Full43Full/Full/Part09/01/202112/31/2021

 

Is there a way to do this in SAS in a meaningful code? My data is a lot much larger than the above, but I hope I captured all the scenarios.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming that no store_id occurs more than once per member_id (so that there won't be a false overlap inferred), then a DATA step passing through each ID twice will do.

 

It uses a 2-way array indexed by YEAR and MONTH to track overlaps.  

 

 

/*Combine and Condense Duplicate Records*/
%let HLB=2019;  /* History lower bound */
%let HUB=2021;  /* History upper bound */

data have;
infile cards expandtabs truncover;
input Obs_Num	Store_ID $	Member_ID $	Member_From :mmddyy10.	Member_To  :mmddyy10.	Member_Type $;
format Member_From Member_To mmddyy10.;
cards;
1	AA	123	01/01/2020	12/31/2021	Full
2	BB	123	06/01/2021	12/31/2999	Full
3	AA	456	01/01/2020	12/31/2021	Full
4	BB	456	09/01/2021	10/31/2021	Part
5	AA	789	01/01/2020	12/31/2021	Full
6	BB	911	06/01/2021	10/01/2021	Part
7	AA	411	01/01/2020	12/31/2022	Full
8	BB	411	09/01/2021	12/31/2099	Part
9	CC	411	12/01/2021	12/31/2022	Full
;

data want (drop=_:);
  /** Do a first pass through a single ID **/
  set have;
  by member_id notsorted;

  array mfreq{&HLB:&HUB,12} _temporary_  /*Frequency by year and month */;
  retain mnths_overlaps_till2021 num_stores_overlap type_overlap overlap_from overlap_to;
  length type_overlap $20;
  format overlap_from overlap_to mmddyy10. ;

  if first.member_id then do;
    call missing(of mfreq{*},of mnths_overlaps_till2021 -- overlap_to);
    mnths_overlaps_till2021 = 0;
  end;

  num_stores_overlap + not(first.member_id=1 and last.member_id=1);

  type_overlap=catx('/',type_overlap,member_type);

  _mdate=member_from;
  do while (_mdate< min(member_to,"31dec&HUB"d));
    _y=year(_mdate);
    _m=month(_mdate);
    mfreq{_y,_m}+1;
    
    if mfreq{_y,_m}=2 then do;
      overlap_from=min(overlap_from,mdy(_m,1,_y));
      overlap_to=max(overlap_to,intnx('month',mdy(_m,1,_y),0,'end'));
      mnths_overlaps_till2021 = intck('month',overlap_from,overlap_to)+1;
    end;
    _mdate=intnx('month',_mdate,1);
  end;

  /* Do a second pass through the same ID, and output */
  if last.member_id then do _i=1 to coalesce(dif(_n_),_n_);
    set have;
    output;
  end;
run;

 

 

Also, this assumes that all overlaps are nested (e.g. for 3 stores A,B, and C:  A/B overlap for 4 months, and   A/C and B/C overlap 1 month within that 4 month A/B overlap).  And you want the longest span of those nested overlaps.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
ballardw
Super User

What if a member has periods at the same store that do not over lap at all what would the output look like:

5	AA	789	01/01/2020	12/31/2021	Full
6	AA	789	01/01/2023	12/31/2023	Full

Or ta different stores

5	AA	789	01/01/2020	12/31/2021	Full
6	BB	789	01/01/2023	12/31/2023	Full

Same store with gap and overlap by different store

5	AA	789	01/01/2020	12/31/2021	Full
6	AA	789	01/01/2023	12/31/2024	Part
7	BB	789	06/01/2021	07/31/2023	Full

Multiple stores with gaps but no overlap as well for the same ID.

Should sequential non-overlapping dates be treated differently:

5	AA	789	01/01/2020	12/31/2021	Full
6	AA	789	01/01/2022	12/31/2022	Full
7	AA	789	01/01/2023	12/31/2023	Full

You will likely get better responses if you provide some example data in the form of a working data step.

 

I am also interested to a minor degree in exactly how the output data set is to be used.

 

Do you have any idea what the maximum number of values might be involved with this Type_overlap? If you data has enough time involved that has the potential of having dozens if not more values in one variable an it is seldom a good idea to ever have more than one value in a single variable.

 

mkeintz
PROC Star

Your overlap period only considers months through 31dec2021.   I get that.  But why is the overlap count for the three observations for ID 411 set to 4?  The date ranges are

  • 01jan2020:31dec2022
  • 01sep2021:31dec2099
    and
  • 01dec2021:31dec2022

Yes, the first two obs overlap for 4 months, but the last one only overlaps for 1, yet you assign a vale of 4 to all three obs.

 

What if the dates were

  • 01jan2020:31dec2021
  • 01jan2020:30jun2020
    and
  • 01jan2021:30apr2021

How many overlap months would those 3 ranges have?

 

What is your rule for measuring overlap?

 

Also 

  1. Do you always have 1 obs for each member_id/store_id combination?
  2. Are the data always sorted by member_id/member_from  member_from within each member_id?
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Obs_Num	Store_ID $	Member_ID $	Member_From :mmddyy10.	Member_To  :mmddyy10.	Member_Type $;
format Member_From Member_To mmddyy10.;
cards;
1	AA	123	01/01/2020	12/31/2021	Full
2	BB	123	06/01/2021	12/31/2999	Full
3	AA	456	01/01/2020	12/31/2021	Full
4	BB	456	09/01/2021	10/31/2021	Part
5	AA	789	01/01/2020	12/31/2021	Full
6	BB	911	06/01/2021	10/01/2021	Part
7	AA	411	01/01/2020	12/31/2022	Full
8	BB	411	09/01/2021	12/31/2099	Part
9	CC	411	12/01/2021	12/31/2022	Full
;
data have2;
 set have;
 if Member_To>'31dec2021'd then Member_To='31dec2021'd;
run;
data temp;
 set have2;
 do date= Member_From to Member_To;
   output;
 end;
 format date mmddyy10.;
 drop  Member_From Member_To;
run;
proc sort data=temp out=temp2 nodupkey;
by Member_ID date  Obs_Num	Store_ID;
run;
data temp3;
do until(last.date);
 set temp2;
 by Member_ID date ; 
 length _Obs_Num _Store_ID _Member_Type $ 200;
 _Obs_Num=catx('|',_Obs_Num,Obs_Num);
 _Store_ID=catx('|',_Store_ID,Store_ID);
 _Member_Type=catx('|',_Member_Type,Member_Type);
end;
if findc(_Obs_Num,'|') then output;
 drop  Obs_Num	Store_ID Member_Type;
run;
data temp4;
 set temp3;
 by Member_ID date ; 
 if first.Member_ID or dif(date) ne 1 then group+1;
run;
proc sql;
create table want1 as
select group,Member_ID,min(date) as Overlap_From format=mmddyy10.,max(date) as Overlap_To format=mmddyy10.,
intck('month',calculated Overlap_From,calculated Overlap_To,'c')+1 as Mnths_Overlaps_till2021,
max(countw(_Store_ID,'|')) as Num_Stores_Overlap
 from temp4
  group by group,Member_ID;

create table want2 as
select a.*,group,Overlap_From,Overlap_To,
coalesce(Mnths_Overlaps_till2021,0) as Mnths_Overlaps_till2021,
coalesce(Num_Stores_Overlap,0) as Num_Stores_Overlap
 from have as a left join want1 as b
  on a.Member_ID=b.Member_ID
   order by Obs_Num;

create table temp5 as
select distinct group,Member_ID,_Obs_Num,_Member_Type
 from temp4
  group by group,Member_ID
   having countw(_Obs_Num,'|')=max(countw(_Obs_Num,'|'));

create table want3 as
select a.*,_Obs_Num,coalescec(_Member_Type,Member_Type) as _Member_Type
 from want2 as a left join temp5 as b
  on a.group=b.group and a.Member_ID=b.Member_ID
   order by Obs_Num;
quit;

data want4;
 set want3;
 length Type_Overlap $ 200;
 if missing(_Obs_Num) or input(scan(_Obs_Num,1,'|'),best.)=Obs_Num then Type_Overlap=_Member_Type;
  else do;
    p=findw(_Obs_Num,strip(Obs_Num),'|','eit');
    Type_Overlap=scan(_Member_Type,p,'|');
	do i=1 to countw(_Member_Type,'|');
      if i ne p then Type_Overlap=catx('|',Type_Overlap,scan(_Member_Type,i,'|'));
	end;
  end;
drop p i  _Obs_Num _Member_Type;
run;
mkeintz
PROC Star

Assuming that no store_id occurs more than once per member_id (so that there won't be a false overlap inferred), then a DATA step passing through each ID twice will do.

 

It uses a 2-way array indexed by YEAR and MONTH to track overlaps.  

 

 

/*Combine and Condense Duplicate Records*/
%let HLB=2019;  /* History lower bound */
%let HUB=2021;  /* History upper bound */

data have;
infile cards expandtabs truncover;
input Obs_Num	Store_ID $	Member_ID $	Member_From :mmddyy10.	Member_To  :mmddyy10.	Member_Type $;
format Member_From Member_To mmddyy10.;
cards;
1	AA	123	01/01/2020	12/31/2021	Full
2	BB	123	06/01/2021	12/31/2999	Full
3	AA	456	01/01/2020	12/31/2021	Full
4	BB	456	09/01/2021	10/31/2021	Part
5	AA	789	01/01/2020	12/31/2021	Full
6	BB	911	06/01/2021	10/01/2021	Part
7	AA	411	01/01/2020	12/31/2022	Full
8	BB	411	09/01/2021	12/31/2099	Part
9	CC	411	12/01/2021	12/31/2022	Full
;

data want (drop=_:);
  /** Do a first pass through a single ID **/
  set have;
  by member_id notsorted;

  array mfreq{&HLB:&HUB,12} _temporary_  /*Frequency by year and month */;
  retain mnths_overlaps_till2021 num_stores_overlap type_overlap overlap_from overlap_to;
  length type_overlap $20;
  format overlap_from overlap_to mmddyy10. ;

  if first.member_id then do;
    call missing(of mfreq{*},of mnths_overlaps_till2021 -- overlap_to);
    mnths_overlaps_till2021 = 0;
  end;

  num_stores_overlap + not(first.member_id=1 and last.member_id=1);

  type_overlap=catx('/',type_overlap,member_type);

  _mdate=member_from;
  do while (_mdate< min(member_to,"31dec&HUB"d));
    _y=year(_mdate);
    _m=month(_mdate);
    mfreq{_y,_m}+1;
    
    if mfreq{_y,_m}=2 then do;
      overlap_from=min(overlap_from,mdy(_m,1,_y));
      overlap_to=max(overlap_to,intnx('month',mdy(_m,1,_y),0,'end'));
      mnths_overlaps_till2021 = intck('month',overlap_from,overlap_to)+1;
    end;
    _mdate=intnx('month',_mdate,1);
  end;

  /* Do a second pass through the same ID, and output */
  if last.member_id then do _i=1 to coalesce(dif(_n_),_n_);
    set have;
    output;
  end;
run;

 

 

Also, this assumes that all overlaps are nested (e.g. for 3 stores A,B, and C:  A/B overlap for 4 months, and   A/C and B/C overlap 1 month within that 4 month A/B overlap).  And you want the longest span of those nested overlaps.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 466 views
  • 0 likes
  • 4 in conversation