BookmarkSubscribeRSS Feed
Ronein
Onyx | Level 15

Hello

I have a data set with following columns:

Group ID

individual ID

month YYYYMM

so for a specific Group ID can have multiple individual IDS.

My task- I need to identify if there is a change over time in the family structure.

For example:

Family 123456: 

in 202401  family structure is 111,222,333  and in 202402  111,222,444  so since it is not same then Indincator=1

What is the way to calculate it?

Why my code provide wrong result?

Data have;
Input month GroupID  IndividualID;
GROUP_Id_MONTH=catx('-',GroupID,month);
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
;
Run;

/*Data Want;*/
/*input GroupID Ind_Change;*/
/*cards;*/
/*123456 1*/
/*135798 0*/
/*;*/
/*Run;*/

PROC SORT DATA=have  ;BY GROUP_Id_MONTH;rUN;
data _WANT_;
length cat $20.;
do until (last.GROUP_Id_MONTH);
set have;
by GROUP_Id_MONTH notsorted;
cat=catx(',',cat,IndividualID);
end;
drop IndividualID;

lag_cat=LAG1(cat);
if FIRST.GROUP_Id_MONTH THEN DO;
lag_cat=.;
Ind_Change=.;
end;
else do;
IF lag_cat ne cat then Ind_Change=1;
end;
retain Ind_Change;
Run;
 

 

 

 

6 REPLIES 6
quickbluefish
Barite | Level 11

How about this?  This assumes your individualID variable is numeric.  Also, it's comparing the current month with whatever the previous month (if any) was - so if there are 3 unique months for a GroupID, then, e.g., month 3 is being compared to month 2, not month 1.  

Data have;
Input month GroupID IndividualID;
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
;
Run;

proc sql noprint;
select max(inid) into :maxInID trimmed
from (select groupID, count(distinct individualID) as inid from have group by groupID);
quit;

proc sort data=have; by groupID month individualID; run;

data want;
set have;
by groupID month;
array ID {*} InID1-InID&maxInID;
array T {&maxInID} _temporary_;
length isfirstGRP 3;
retain ID isfirstGRP;
if first.month then do;
    i=0;
    isfirstGRP=first.groupID;
    call missing(of ID[*]);
end;
i+1;
ID[i]=individualID;
if last.month then do;
    indicator=0;
    do i=1 to dim(ID);
        if ID[i]^=T[i] then do;
            if isfirstGRP=0 then indicator=1;
        end;
        T[i]=ID[i];
    end;
    output;
end;
drop i individualID;
run;

proc print data=want; run;
quickbluefish
Barite | Level 11

The output from above is transposed.  If you need it back in its original structure, you'd just modify the last data step as follows (see comment, "changes below here"):

data want;
set have;
by groupID month;
array ID {*} InID1-InID&maxInID;
array T {&maxInID} _temporary_;
length isfirstGRP 3;
retain ID isfirstGRP;
if first.month then do;
    i=0;
    isfirstGRP=first.groupID;
    call missing(of ID[*]);
end;
i+1;
ID[i]=individualID;
if last.month then do;
    indicator=0;
    do i=1 to dim(ID);
        if ID[i]^=T[i] then do;
            if isfirstGRP=0 then indicator=1;
        end;
        T[i]=ID[i];
    end;
    ** changes below here ;
    do i=1 to dim(ID);
    	if missing(ID[i]) then leave;
    	individualID=ID[i];
    	output;
    end;
end;
drop i InID:;
run;
Tom
Super User Tom
Super User

Just aggregate for each MONTH nested under each GROUPID.  Tricky part is knowing when you are on the first month for a GROUPID since it could have multiple observations.  So make your own month number counter.

 

First let's add a duplicate month so the test data is more complete:

data have;
  input GroupID Month IndividualID ;
cards;
123456 202401 111
123456 202401 222
123456 202401 333
123456 202402 111
123456 202402 222
123456 202403 444
123456 202403 111
123456 202403 222
123456 202402 444
135798 202401 555
135798 202401 666
135798 202402 555
135798 202402 666
;

So generate the list for a month.  lag the value.  Make sure not to use the lagged value on the first month for a group.  Write the observation before incrementing the month counter so the output dataset is not confusing.

proc sort data=have;
  by GroupID month IndividualID ;
run;

data want;
  do until(last.month);
    set have;
    by GroupID month IndividualID ;
    if first.groupid then monthno=1; 
    length all_ids $200;
    all_ids=catx('-',all_ids,IndividualID);
  end;
  lag_ids = lag(all_ids);
  if monthno=1 then do;
    changed=0;
    lag_ids=' ';
  end;
  else changed= (lag_ids ne all_ids);
  output;
  monthno+1;
  drop individualid;
run;

Results

        Group
Obs      ID       Month    monthno    all_ids        lag_ids        changed

 1     123456    202401       1       111-222-333                      0
 2     123456    202402       2       111-222-444    111-222-333       1
 3     123456    202403       3       111-222-444    111-222-444       0
 4     135798    202401       1       555-666                          0
 5     135798    202402       2       555-666        555-666           0

Or just remember the first month for each group.

data want;
  do until(last.month);
    set have;
    by GroupID month IndividualID ;
    if first.groupid then firstmonth=month;
    retain firstmonth;
    length all_ids $200;
    all_ids=catx('-',all_ids,IndividualID);
  end;
  changed= (all_ids ne lag(all_ids));
  if month=firstmonth then changed=0;
  drop individualid firstmonth;
run;
Ksharp
Super User

Does your  family structure always matched one to one ? a.k.a  data is balanced for each GroupID and Month .

And you didn't post desired output yet ?

 

Assuming I understood what you mean.

Data have;
Input month GroupID  IndividualID;
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
;
Run;

proc sql;
create table want as
select *,
case when count(distinct month)=(select count(distinct month) from have where GroupID=a.GroupID) then 0
 else 1 end as Indincator
 from have as a
  group by GroupID,IndividualID 
    order by GroupID,Month,IndividualID;
quit;

Ksharp_0-1736475570876.png

 

 

yabwon
Amethyst | Level 16

One more approach with hash tables, I've added third group (123457) for tests.

 

Data have;
Input month GroupID  IndividualID;
GROUP_Id_MONTH=catx('-',GroupID,month);
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
202401 123457 111
202401 123457 222
202401 123457 333
202402 123457 111
202402 123457 222
;
Run;

data want;
  declare hash family();
  family.defineKey('IndividualID');
  family.defineDone();

  family.clear();
  call missing(flag, period, fs);

  do until(last.GroupID);
    set have;
    by GroupID month notsorted;
    period+first.month;

    if 1=period then                                 /* create first family setup in the first period */
      family.add();                                  /* add an individual to the first family */
    else                                             /* compare family setup in other periods */
      do;
        if first.month then fs=1;        
                       else fs+1;                    /* other family size */
        flag + (family.check())                      /* check for new member */
             + (family.num_items NE fs)*last.month   /* size change (e.g., from 3 to 2) */
        ;
      end;
  end;

  flag = flag&1;
  output;
  keep GroupID flag;
run;

proc print;
run;

Bart

 

[EDIT:]

And of course with arrays:

data want2;
  array ids[100] _temporary_; /* 100 - assumed max family size */
  call missing(flag, of ids[*], period, ffs, ofs);

  do until(last.GroupID);
    set have;
    by GroupID month notsorted;
    period+first.month;

    if 1=period then /* create first family setup in the first period */
      do;
        ffs+1;                 /* first family size */
        ids[ffs]=IndividualID; /* add individual to the first family */
      end;
    else /* compare family setup in other periods */
      do;
        if first.month then ofs=1;        
                       else ofs+1;       /* other family size */
        flag + (IndividualID not in IDS) /* check for new member */
             + (ffs NE ofs)*last.month   /* size change (e.g., from 3 to 2) */
        ;
      end;
  end;

  flag = flag&1;
  output;
  keep GroupID flag;
run;
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Amethyst | Level 16

One more for not sorted groups of families, see the data.

 

/* HASH TABLES - data not sorted family groups */
Data have;
Input month GroupID  IndividualID;
GROUP_Id_MONTH=catx('-',GroupID,month);
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202401 123457 1111
202401 123457 2222
202401 123457 3333
202402 135798 555
202402 135798 666
202402 123457 1111
202402 123457 2222
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
;
Run;

data _null_;
  if 1=_N_ then 
    do;
      declare hash family(ordered:"A"); /* create first family setup in the first period */
      family.defineKey('GroupID', 'IndividualID');
      family.defineDone();

      declare hash grpIDflag(); /* store flag for group */
      grpIDflag.defineKey('GroupID');
      grpIDflag.defineData('GroupID', 'flag');
      grpIDflag.defineDone();
    end;
  

  do fs = 1 by 1 until(last.GroupID);
    set have end=_E_;
    by month GroupID notsorted;

    if grpIDflag.find() then family.add();        /* add an individual to the first family */
    else                                          /* compare family setup in other periods */
      flag + (family.check())                     /* check for new member */
           + (family.num_items NE fs)*last.month  /* size change (e.g., from 3 to 2) */
      ;  
  end;

  flag = flag&1;
  grpIDflag.replace();
  
  if _E_;
  grpIDflag.output(dataset:'want3');
  /* family.output(dataset:'firstFamily'); */
  keep GroupID flag;
run;

proc print data=want3;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4009 views
  • 0 likes
  • 5 in conversation