Hello
I have a data set with following columns:
Group ID
individual ID
month YYYYMM
so for a specific Group ID can have multiple individual IDS.
My task- I need to identify if there is a change over time in the family structure.
For example:
Family 123456:
in 202401 family structure is 111,222,333 and in 202402 111,222,444 so since it is not same then Indincator=1
What is the way to calculate it?
Why my code provide wrong result?
Data have;
Input month GroupID IndividualID;
GROUP_Id_MONTH=catx('-',GroupID,month);
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
;
Run;
/*Data Want;*/
/*input GroupID Ind_Change;*/
/*cards;*/
/*123456 1*/
/*135798 0*/
/*;*/
/*Run;*/
PROC SORT DATA=have ;BY GROUP_Id_MONTH;rUN;
data _WANT_;
length cat $20.;
do until (last.GROUP_Id_MONTH);
set have;
by GROUP_Id_MONTH notsorted;
cat=catx(',',cat,IndividualID);
end;
drop IndividualID;
lag_cat=LAG1(cat);
if FIRST.GROUP_Id_MONTH THEN DO;
lag_cat=.;
Ind_Change=.;
end;
else do;
IF lag_cat ne cat then Ind_Change=1;
end;
retain Ind_Change;
Run;
How about this? This assumes your individualID variable is numeric. Also, it's comparing the current month with whatever the previous month (if any) was - so if there are 3 unique months for a GroupID, then, e.g., month 3 is being compared to month 2, not month 1.
Data have;
Input month GroupID IndividualID;
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
;
Run;
proc sql noprint;
select max(inid) into :maxInID trimmed
from (select groupID, count(distinct individualID) as inid from have group by groupID);
quit;
proc sort data=have; by groupID month individualID; run;
data want;
set have;
by groupID month;
array ID {*} InID1-InID&maxInID;
array T {&maxInID} _temporary_;
length isfirstGRP 3;
retain ID isfirstGRP;
if first.month then do;
i=0;
isfirstGRP=first.groupID;
call missing(of ID[*]);
end;
i+1;
ID[i]=individualID;
if last.month then do;
indicator=0;
do i=1 to dim(ID);
if ID[i]^=T[i] then do;
if isfirstGRP=0 then indicator=1;
end;
T[i]=ID[i];
end;
output;
end;
drop i individualID;
run;
proc print data=want; run;
The output from above is transposed. If you need it back in its original structure, you'd just modify the last data step as follows (see comment, "changes below here"):
data want;
set have;
by groupID month;
array ID {*} InID1-InID&maxInID;
array T {&maxInID} _temporary_;
length isfirstGRP 3;
retain ID isfirstGRP;
if first.month then do;
i=0;
isfirstGRP=first.groupID;
call missing(of ID[*]);
end;
i+1;
ID[i]=individualID;
if last.month then do;
indicator=0;
do i=1 to dim(ID);
if ID[i]^=T[i] then do;
if isfirstGRP=0 then indicator=1;
end;
T[i]=ID[i];
end;
** changes below here ;
do i=1 to dim(ID);
if missing(ID[i]) then leave;
individualID=ID[i];
output;
end;
end;
drop i InID:;
run;
Just aggregate for each MONTH nested under each GROUPID. Tricky part is knowing when you are on the first month for a GROUPID since it could have multiple observations. So make your own month number counter.
First let's add a duplicate month so the test data is more complete:
data have;
input GroupID Month IndividualID ;
cards;
123456 202401 111
123456 202401 222
123456 202401 333
123456 202402 111
123456 202402 222
123456 202403 444
123456 202403 111
123456 202403 222
123456 202402 444
135798 202401 555
135798 202401 666
135798 202402 555
135798 202402 666
;
So generate the list for a month. lag the value. Make sure not to use the lagged value on the first month for a group. Write the observation before incrementing the month counter so the output dataset is not confusing.
proc sort data=have;
by GroupID month IndividualID ;
run;
data want;
do until(last.month);
set have;
by GroupID month IndividualID ;
if first.groupid then monthno=1;
length all_ids $200;
all_ids=catx('-',all_ids,IndividualID);
end;
lag_ids = lag(all_ids);
if monthno=1 then do;
changed=0;
lag_ids=' ';
end;
else changed= (lag_ids ne all_ids);
output;
monthno+1;
drop individualid;
run;
Results
Group Obs ID Month monthno all_ids lag_ids changed 1 123456 202401 1 111-222-333 0 2 123456 202402 2 111-222-444 111-222-333 1 3 123456 202403 3 111-222-444 111-222-444 0 4 135798 202401 1 555-666 0 5 135798 202402 2 555-666 555-666 0
Or just remember the first month for each group.
data want;
do until(last.month);
set have;
by GroupID month IndividualID ;
if first.groupid then firstmonth=month;
retain firstmonth;
length all_ids $200;
all_ids=catx('-',all_ids,IndividualID);
end;
changed= (all_ids ne lag(all_ids));
if month=firstmonth then changed=0;
drop individualid firstmonth;
run;
Does your family structure always matched one to one ? a.k.a data is balanced for each GroupID and Month .
And you didn't post desired output yet ?
Assuming I understood what you mean.
Data have;
Input month GroupID IndividualID;
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
;
Run;
proc sql;
create table want as
select *,
case when count(distinct month)=(select count(distinct month) from have where GroupID=a.GroupID) then 0
else 1 end as Indincator
from have as a
group by GroupID,IndividualID
order by GroupID,Month,IndividualID;
quit;
One more approach with hash tables, I've added third group (123457) for tests.
Data have;
Input month GroupID IndividualID;
GROUP_Id_MONTH=catx('-',GroupID,month);
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
202402 135798 555
202402 135798 666
202401 123457 111
202401 123457 222
202401 123457 333
202402 123457 111
202402 123457 222
;
Run;
data want;
declare hash family();
family.defineKey('IndividualID');
family.defineDone();
family.clear();
call missing(flag, period, fs);
do until(last.GroupID);
set have;
by GroupID month notsorted;
period+first.month;
if 1=period then /* create first family setup in the first period */
family.add(); /* add an individual to the first family */
else /* compare family setup in other periods */
do;
if first.month then fs=1;
else fs+1; /* other family size */
flag + (family.check()) /* check for new member */
+ (family.num_items NE fs)*last.month /* size change (e.g., from 3 to 2) */
;
end;
end;
flag = flag&1;
output;
keep GroupID flag;
run;
proc print;
run;
Bart
[EDIT:]
And of course with arrays:
data want2;
array ids[100] _temporary_; /* 100 - assumed max family size */
call missing(flag, of ids[*], period, ffs, ofs);
do until(last.GroupID);
set have;
by GroupID month notsorted;
period+first.month;
if 1=period then /* create first family setup in the first period */
do;
ffs+1; /* first family size */
ids[ffs]=IndividualID; /* add individual to the first family */
end;
else /* compare family setup in other periods */
do;
if first.month then ofs=1;
else ofs+1; /* other family size */
flag + (IndividualID not in IDS) /* check for new member */
+ (ffs NE ofs)*last.month /* size change (e.g., from 3 to 2) */
;
end;
end;
flag = flag&1;
output;
keep GroupID flag;
run;
One more for not sorted groups of families, see the data.
/* HASH TABLES - data not sorted family groups */
Data have;
Input month GroupID IndividualID;
GROUP_Id_MONTH=catx('-',GroupID,month);
cards;
202401 123456 111
202401 123456 222
202401 123456 333
202401 123457 1111
202401 123457 2222
202401 123457 3333
202402 135798 555
202402 135798 666
202402 123457 1111
202402 123457 2222
202402 123456 111
202402 123456 222
202402 123456 444
202401 135798 555
202401 135798 666
;
Run;
data _null_;
if 1=_N_ then
do;
declare hash family(ordered:"A"); /* create first family setup in the first period */
family.defineKey('GroupID', 'IndividualID');
family.defineDone();
declare hash grpIDflag(); /* store flag for group */
grpIDflag.defineKey('GroupID');
grpIDflag.defineData('GroupID', 'flag');
grpIDflag.defineDone();
end;
do fs = 1 by 1 until(last.GroupID);
set have end=_E_;
by month GroupID notsorted;
if grpIDflag.find() then family.add(); /* add an individual to the first family */
else /* compare family setup in other periods */
flag + (family.check()) /* check for new member */
+ (family.num_items NE fs)*last.month /* size change (e.g., from 3 to 2) */
;
end;
flag = flag&1;
grpIDflag.replace();
if _E_;
grpIDflag.output(dataset:'want3');
/* family.output(dataset:'firstFamily'); */
keep GroupID flag;
run;
proc print data=want3;
run;
Bart
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.