Thank you, sir @Kurt_Bremser. I was too lazy to create bigger samples, but you made it easy to test, so here is a run with 1 million families:
data have;
  do famid = 1 to 1e6;
    do indid = 1 to 5;
      do implicate = 1 to 5;
        imp_inc = 10000; /* just some value */
        output;
      end;
    end;
  end;
run;
NOTE: The data set WORK.HAVE has 25000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 1.63 seconds
cpu time 1.62 seconds
data want1;
  array t(999) _temporary_;   /* imp_inc of indid 1, indexed by implicate */
  call missing(of t(*));      /* reset the array for each family */
  do until(last.famid);       /* DOW loop: one family per step iteration */
    set have;
    by famid;
    if indid=1 then t(implicate)=imp_inc;
    else imp_inc=t(implicate);
    output;
  end;
run;
NOTE: There were 25000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT1 has 25000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 3.87 seconds
cpu time 3.29 seconds
data want (drop=rc);
  set have;
  by famid;
  if _n_ = 1
  then do;                    /* create the hash object once */
    declare hash h();
    h.definekey('implicate');
    h.definedata('imp_inc');
    h.definedone();
  end;
  if indid = 1
  then rc = h.add();          /* store imp_inc of indid 1 per implicate */
  else rc = h.find();         /* overwrite imp_inc with the stored value */
  if last.famid then rc = h.clear();  /* empty the hash before the next family */
run;
NOTE: There were 25000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 25000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 5.82 seconds
cpu time 5.70 seconds
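
Not part of the run above, but a quick way to confirm that both steps produce the same output before reading too much into the timings could be a PROC COMPARE on the two results. This is only a sketch. Since imp_inc is the constant 10000 in this test data, it checks that the outputs match but would not catch a logic difference between the two approaches.

```sas
/* Sketch: check that the hash result (WANT) and the array result (WANT1) match */
proc compare base=want compare=want1;
run;
```

To test the logic itself, the generated imp_inc would have to differ between individuals, which the sample above does not do.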