BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
altijani
Quartz | Level 8

Hello Community. Bringing this back up after 2 years.

I have the following data that I need to use to measure the periods of membership overlap (if any) for the same individual but in multiple stores:

Obs_NumStore_IDMember_IDMember_FromMember_ToMember_Type
1AA12301/01/202012/31/2021Full
2BB12306/01/202112/31/2999Full
3AA45601/01/202012/31/2021Full
4BB45609/01/202110/31/2021Part
5AA78901/01/202012/31/2021Full
6BB91106/01/202110/01/2021Part
7AA41101/01/202012/31/2022Full
8BB41109/01/202112/31/2099Part
9CC41112/01/202112/31/2022Full

Few notes (the colors are to clarify the overlapping):

- Observations 1&2 are for the same ID in 2 different stores

- Observations 5& 6 are for distinct ID and for only one stores (no overlap)

- Observations 7-8-9 are for the same ID in 3 different stores

- The new dataset (below) should have 5 new columns to measure overlap from start UNTILL the end of 2021.

- The new column "Type_Overlap" is tricky in that it has to start with that store's membership type first. For example, Obs 7/8/9 are similar in 4 of the new 5 columns EXCEPT the type_overlap.

 

The new columns to be added to the data should be as follows:

Obs_NumStore_IDMember_IDMember_FromMember_ToMember_TypeMnths_Overlaps_till2021Num_Stores_OverlapType_OverlapOverlap_FromOverlap_To
1AA12301/01/202012/31/2021Full72Full/Full06/01/202112/31/2021
2BB12306/01/202112/31/2999Full72Full/Full06/01/202112/31/2021
3AA45601/01/202012/31/2021Full22Full/Part09/01/202110/31/2021
4BB45609/01/202110/31/2021Part22Part/Full09/01/202110/31/2021
5AA78901/01/202012/31/2021Full00Full  
6BB91106/01/202110/01/2021Part00Part  
7AA41101/01/202012/31/2022Full43Full/Part/Full09/01/202112/31/2021
8BB41109/01/202112/31/2099Part43Part/Full/Full09/01/202112/31/2021
9CC41112/01/202112/31/2022Full43Full/Full/Part09/01/202112/31/2021

 

Is there a way to do this in SAS in a meaningful code? My data is a lot much larger than the above, but I hope I captured all the scenarios.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming that no store_id occurs more than once per member_id (so that there won't be a false overlap inferred), then a DATA step passing through each ID twice will do.

 

It uses a 2-way array indexed by YEAR and MONTH to track overlaps.  

 

 

/*Combine and Condense Duplicate Records*/
%let HLB=2019;  /* History lower bound */
%let HUB=2021;  /* History upper bound */

data have;
infile cards expandtabs truncover;
input Obs_Num	Store_ID $	Member_ID $	Member_From :mmddyy10.	Member_To  :mmddyy10.	Member_Type $;
format Member_From Member_To mmddyy10.;
cards;
1	AA	123	01/01/2020	12/31/2021	Full
2	BB	123	06/01/2021	12/31/2999	Full
3	AA	456	01/01/2020	12/31/2021	Full
4	BB	456	09/01/2021	10/31/2021	Part
5	AA	789	01/01/2020	12/31/2021	Full
6	BB	911	06/01/2021	10/01/2021	Part
7	AA	411	01/01/2020	12/31/2022	Full
8	BB	411	09/01/2021	12/31/2099	Part
9	CC	411	12/01/2021	12/31/2022	Full
;

data want (drop=_:);
  /** Do a first pass through a single ID **/
  set have;
  by member_id notsorted;

  array mfreq{&HLB:&HUB,12} _temporary_  /*Frequency by year and month */;
  retain mnths_overlaps_till2021 num_stores_overlap type_overlap overlap_from overlap_to;
  length type_overlap $20;
  format overlap_from overlap_to mmddyy10. ;

  if first.member_id then do;
    call missing(of mfreq{*},of mnths_overlaps_till2021 -- overlap_to);
    mnths_overlaps_till2021 = 0;
  end;

  num_stores_overlap + not(first.member_id=1 and last.member_id=1);

  type_overlap=catx('/',type_overlap,member_type);

  _mdate=member_from;
  do while (_mdate< min(member_to,"31dec&HUB"d));
    _y=year(_mdate);
    _m=month(_mdate);
    mfreq{_y,_m}+1;
    
    if mfreq{_y,_m}=2 then do;
      overlap_from=min(overlap_from,mdy(_m,1,_y));
      overlap_to=max(overlap_to,intnx('month',mdy(_m,1,_y),0,'end'));
      mnths_overlaps_till2021 = intck('month',overlap_from,overlap_to)+1;
    end;
    _mdate=intnx('month',_mdate,1);
  end;

  /* Do a second pass through the same ID, and output */
  if last.member_id then do _i=1 to coalesce(dif(_n_),_n_);
    set have;
    output;
  end;
run;

 

 

Also, this assumes that all overlaps are nested (e.g. for 3 stores A,B, and C:  A/B overlap for 4 months, and   A/C and B/C overlap 1 month within that 4 month A/B overlap).  And you want the longest span of those nested overlaps.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
ballardw
Super User

What if a member has periods at the same store that do not over lap at all what would the output look like:

5	AA	789	01/01/2020	12/31/2021	Full
6	AA	789	01/01/2023	12/31/2023	Full

Or ta different stores

5	AA	789	01/01/2020	12/31/2021	Full
6	BB	789	01/01/2023	12/31/2023	Full

Same store with gap and overlap by different store

5	AA	789	01/01/2020	12/31/2021	Full
6	AA	789	01/01/2023	12/31/2024	Part
7	BB	789	06/01/2021	07/31/2023	Full

Multiple stores with gaps but no overlap as well for the same ID.

Should sequential non-overlapping dates be treated differently:

5	AA	789	01/01/2020	12/31/2021	Full
6	AA	789	01/01/2022	12/31/2022	Full
7	AA	789	01/01/2023	12/31/2023	Full

You will likely get better responses if you provide some example data in the form of a working data step.

 

I am also interested to a minor degree in exactly how the output data set is to be used.

 

Do you have any idea what the maximum number of values might be involved with this Type_overlap? If you data has enough time involved that has the potential of having dozens if not more values in one variable an it is seldom a good idea to ever have more than one value in a single variable.

 

mkeintz
PROC Star

Your overlap period only considers months through 31dec2021.   I get that.  But why is the overlap count for the three observations for ID 411 set to 4?  The date ranges are

  • 01jan2020:31dec2022
  • 01sep2021:31dec2099
    and
  • 01dec2021:31dec2022

Yes, the first two obs overlap for 4 months, but the last one only overlaps for 1, yet you assign a vale of 4 to all three obs.

 

What if the dates were

  • 01jan2020:31dec2021
  • 01jan2020:30jun2020
    and
  • 01jan2021:30apr2021

How many overlap months would those 3 ranges have?

 

What is your rule for measuring overlap?

 

Also 

  1. Do you always have 1 obs for each member_id/store_id combination?
  2. Are the data always sorted by member_id/member_from  member_from within each member_id?
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Obs_Num	Store_ID $	Member_ID $	Member_From :mmddyy10.	Member_To  :mmddyy10.	Member_Type $;
format Member_From Member_To mmddyy10.;
cards;
1	AA	123	01/01/2020	12/31/2021	Full
2	BB	123	06/01/2021	12/31/2999	Full
3	AA	456	01/01/2020	12/31/2021	Full
4	BB	456	09/01/2021	10/31/2021	Part
5	AA	789	01/01/2020	12/31/2021	Full
6	BB	911	06/01/2021	10/01/2021	Part
7	AA	411	01/01/2020	12/31/2022	Full
8	BB	411	09/01/2021	12/31/2099	Part
9	CC	411	12/01/2021	12/31/2022	Full
;
data have2;
 set have;
 if Member_To>'31dec2021'd then Member_To='31dec2021'd;
run;
data temp;
 set have2;
 do date= Member_From to Member_To;
   output;
 end;
 format date mmddyy10.;
 drop  Member_From Member_To;
run;
proc sort data=temp out=temp2 nodupkey;
by Member_ID date  Obs_Num	Store_ID;
run;
data temp3;
do until(last.date);
 set temp2;
 by Member_ID date ; 
 length _Obs_Num _Store_ID _Member_Type $ 200;
 _Obs_Num=catx('|',_Obs_Num,Obs_Num);
 _Store_ID=catx('|',_Store_ID,Store_ID);
 _Member_Type=catx('|',_Member_Type,Member_Type);
end;
if findc(_Obs_Num,'|') then output;
 drop  Obs_Num	Store_ID Member_Type;
run;
data temp4;
 set temp3;
 by Member_ID date ; 
 if first.Member_ID or dif(date) ne 1 then group+1;
run;
proc sql;
create table want1 as
select group,Member_ID,min(date) as Overlap_From format=mmddyy10.,max(date) as Overlap_To format=mmddyy10.,
intck('month',calculated Overlap_From,calculated Overlap_To,'c')+1 as Mnths_Overlaps_till2021,
max(countw(_Store_ID,'|')) as Num_Stores_Overlap
 from temp4
  group by group,Member_ID;

create table want2 as
select a.*,group,Overlap_From,Overlap_To,
coalesce(Mnths_Overlaps_till2021,0) as Mnths_Overlaps_till2021,
coalesce(Num_Stores_Overlap,0) as Num_Stores_Overlap
 from have as a left join want1 as b
  on a.Member_ID=b.Member_ID
   order by Obs_Num;

create table temp5 as
select distinct group,Member_ID,_Obs_Num,_Member_Type
 from temp4
  group by group,Member_ID
   having countw(_Obs_Num,'|')=max(countw(_Obs_Num,'|'));

create table want3 as
select a.*,_Obs_Num,coalescec(_Member_Type,Member_Type) as _Member_Type
 from want2 as a left join temp5 as b
  on a.group=b.group and a.Member_ID=b.Member_ID
   order by Obs_Num;
quit;

data want4;
 set want3;
 length Type_Overlap $ 200;
 if missing(_Obs_Num) or input(scan(_Obs_Num,1,'|'),best.)=Obs_Num then Type_Overlap=_Member_Type;
  else do;
    p=findw(_Obs_Num,strip(Obs_Num),'|','eit');
    Type_Overlap=scan(_Member_Type,p,'|');
	do i=1 to countw(_Member_Type,'|');
      if i ne p then Type_Overlap=catx('|',Type_Overlap,scan(_Member_Type,i,'|'));
	end;
  end;
drop p i  _Obs_Num _Member_Type;
run;
mkeintz
PROC Star

Assuming that no store_id occurs more than once per member_id (so that there won't be a false overlap inferred), then a DATA step passing through each ID twice will do.

 

It uses a 2-way array indexed by YEAR and MONTH to track overlaps.  

 

 

/*Combine and Condense Duplicate Records*/
%let HLB=2019;  /* History lower bound */
%let HUB=2021;  /* History upper bound */

data have;
infile cards expandtabs truncover;
input Obs_Num	Store_ID $	Member_ID $	Member_From :mmddyy10.	Member_To  :mmddyy10.	Member_Type $;
format Member_From Member_To mmddyy10.;
cards;
1	AA	123	01/01/2020	12/31/2021	Full
2	BB	123	06/01/2021	12/31/2999	Full
3	AA	456	01/01/2020	12/31/2021	Full
4	BB	456	09/01/2021	10/31/2021	Part
5	AA	789	01/01/2020	12/31/2021	Full
6	BB	911	06/01/2021	10/01/2021	Part
7	AA	411	01/01/2020	12/31/2022	Full
8	BB	411	09/01/2021	12/31/2099	Part
9	CC	411	12/01/2021	12/31/2022	Full
;

data want (drop=_:);
  /** Do a first pass through a single ID **/
  set have;
  by member_id notsorted;

  array mfreq{&HLB:&HUB,12} _temporary_  /*Frequency by year and month */;
  retain mnths_overlaps_till2021 num_stores_overlap type_overlap overlap_from overlap_to;
  length type_overlap $20;
  format overlap_from overlap_to mmddyy10. ;

  if first.member_id then do;
    call missing(of mfreq{*},of mnths_overlaps_till2021 -- overlap_to);
    mnths_overlaps_till2021 = 0;
  end;

  num_stores_overlap + not(first.member_id=1 and last.member_id=1);

  type_overlap=catx('/',type_overlap,member_type);

  _mdate=member_from;
  do while (_mdate< min(member_to,"31dec&HUB"d));
    _y=year(_mdate);
    _m=month(_mdate);
    mfreq{_y,_m}+1;
    
    if mfreq{_y,_m}=2 then do;
      overlap_from=min(overlap_from,mdy(_m,1,_y));
      overlap_to=max(overlap_to,intnx('month',mdy(_m,1,_y),0,'end'));
      mnths_overlaps_till2021 = intck('month',overlap_from,overlap_to)+1;
    end;
    _mdate=intnx('month',_mdate,1);
  end;

  /* Do a second pass through the same ID, and output */
  if last.member_id then do _i=1 to coalesce(dif(_n_),_n_);
    set have;
    output;
  end;
run;

 

 

Also, this assumes that all overlaps are nested (e.g. for 3 stores A,B, and C:  A/B overlap for 4 months, and   A/C and B/C overlap 1 month within that 4 month A/B overlap).  And you want the longest span of those nested overlaps.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 934 views
  • 0 likes
  • 4 in conversation