Creating a new dummy variable (dum) as 1 for current year if at least past two consecutive year (t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.
To be noted, this is a panel data structure. I have over 10,000 different PERMNO. Really appreciate if someone can help, thanks a lot
Data Have;
PERMNO | Fyear | div |
10100 | 1980 | 0 |
10100 | 1981 | 0 |
10100 | 1982 | 1 |
10100 | 1983 | 1 |
10100 | 1984 | 0 |
10100 | 1985 | 1 |
10100 | 1986 | 1 |
10100 | 1987 | 1 |
10100 | 1988 | 0 |
10100 | 1989 | 0 |
10100 | 1990 | 0 |
10100 | 1991 | 1 |
10100 | 1992 | 1 |
10100 | 1993 | 1 |
10100 | 1994 | 1 |
10100 | 1995 | 0 |
Data want;
PERMNO | Fyear | div | dum |
10100 | 1980 | 0 | 0 |
10100 | 1981 | 0 | 0 |
10100 | 1982 | 1 | 0 |
10100 | 1983 | 1 | 0 |
10100 | 1984 | 0 | 0 |
10100 | 1985 | 1 | 0 |
10100 | 1986 | 1 | 0 |
10100 | 1987 | 1 | 0 |
10100 | 1988 | 0 | 0 |
10100 | 1989 | 0 | 0 |
10100 | 1990 | 0 | 0 |
10100 | 1991 | 1 | 1 |
10100 | 1992 | 1 | 0 |
10100 | 1993 | 1 | 0 |
10100 | 1994 | 1 | 0 |
10100 | 1995 | 0 | 0 |
Here is an alternative solution
data want(keep = PERMNO Fyear div dummy);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1);
set have curobs = c;
cum = 0;
do i = lbound(v) to hbound(v);
p = (c + i);
if p > n | p le 0 then leave;
set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
if permno ne pn then leave;
if d = v[i] then cum + 1;
else leave;
end;
dummy = (cum = 6);
run;
Try this. I added another PERMNO for demonstration. Should be reasonably fast even for large data and many PERMNO.
Feel free to ask.
data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;
data have(drop = c d y rc);
array v {-2 : 2} _temporary_ (0 0 1 1 1);
dcl hash h();
h.definekey("PERMNO", "Fyear");
h.definedata("div");
h.definedone();
do until (last.PERMNO);
set have;
by PERMNO;
h.add();
end;
do until (last.PERMNO);
set have;
by PERMNO;
c = 0;
d = div;
do y = -2 to 2;
if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
end;
dummy = (c=5);
div = d;
output;
end;
rc = h.clear();
run;
Result:
Obs PERMNO Fyear div dummy 1 10100 1980 0 0 2 10100 1981 0 0 3 10100 1982 1 0 4 10100 1983 1 0 5 10100 1984 0 0 6 10100 1985 1 0 7 10100 1986 1 0 8 10100 1987 1 0 9 10100 1988 0 0 10 10100 1989 0 0 11 10100 1990 0 0 12 10100 1991 1 1 13 10100 1992 1 0 14 10100 1993 1 0 15 10100 1994 1 0 16 10100 1995 0 0 17 10101 1980 0 0 18 10101 1981 0 0 19 10101 1982 1 0 20 10101 1983 1 0 21 10101 1984 0 0 22 10101 1985 1 0 23 10101 1986 1 0 24 10101 1987 1 0 25 10101 1988 0 0 26 10101 1989 0 0 27 10101 1990 0 0 28 10101 1991 1 1 29 10101 1992 1 0 30 10101 1993 1 0 31 10101 1994 1 0 32 10101 1995 0 0
Dear Peter,
thanks for your prompt response, it seems working using your data. I am not exactly sure what the code is doing. What if I change the rules now, say
instead of at least past two consecutive year, I am now requiring past three consecutive year (t-3, t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.
I will have to explore other rules, but helping with this change will help me to understand what your code is doing, thank you.
No Problem. I made the changes and commented where the changes are done.
Let me know if it works 🙂
data have(drop = c d y rc);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1); /* Here */
dcl hash h();
h.definekey("PERMNO", "Fyear");
h.definedata("div");
h.definedone();
do until (last.PERMNO);
set have;
by PERMNO;
h.add();
end;
do until (last.PERMNO);
set have;
by PERMNO;
c = 0;
d = div;
do y = -3 to 2; /* Here */
if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
end;
dummy = (c = 6); /* Here */
div = d;
output;
end;
rc = h.clear();
run;
Dear Peter,
I am very sure it will work. I will try with my actual dataset tomorrow, and will let you know. thanks for your great work
Anytime. Please remember to close the thread if it does 🙂
Also, welcome to the community!
Here is an alternative solution
data want(keep = PERMNO Fyear div dummy);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1);
set have curobs = c;
cum = 0;
do i = lbound(v) to hbound(v);
p = (c + i);
if p > n | p le 0 then leave;
set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
if permno ne pn then leave;
if d = v[i] then cum + 1;
else leave;
end;
dummy = (cum = 6);
run;
Thank you, it works perfectly!
EDIT: @Sandi1, just edited the code a bit. Does the same thing, but you only need to change one line for the different logic.
data have(drop = c d y rc);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1); /* Here */
dcl hash h();
h.definekey("PERMNO", "Fyear");
h.definedata("d");
h.definedone();
do until (last.PERMNO);
set have;
by PERMNO;
d = div;
h.add();
end;
do until (last.PERMNO);
set have;
by PERMNO;
c = 0;
do y = lbound(v) to hbound(v);
if h.find(key : PERMNO, key : Fyear + y) = 0 and d = v[y] then c + 1;
end;
dummy = (c = dim(v));
output;
end;
rc = h.clear();
run;
Assuming there is not gap between years.
data have; input PERMNO Fyear div; datalines; 10100 1980 0 10100 1981 0 10100 1982 1 10100 1983 1 10100 1984 0 10100 1985 1 10100 1986 1 10100 1987 1 10100 1988 0 10100 1989 0 10100 1990 0 10100 1991 1 10100 1992 1 10100 1993 1 10100 1994 1 10100 1995 0 10101 1980 0 10101 1981 0 10101 1982 1 10101 1983 1 10101 1984 0 10101 1985 1 10101 1986 1 10101 1987 1 10101 1988 0 10101 1989 0 10101 1990 0 10101 1991 1 10101 1992 1 10101 1993 1 10101 1994 1 10101 1995 0 ; data want; merge have have(keep=PERMNO div rename=(PERMNO=_PERMNO div=_div) firstobs=2) have(keep=PERMNO div rename=(PERMNO=__PERMNO div=__div) firstobs=3); if lag2(PERMNO)=PERMNO and lag(PERMNO)=PERMNO and lag2(div)=0 and lag(div)=0 and PERMNO=__PERMNO and PERMNO=_PERMNO and __div=1 and _div=1 and div=1 then dummy=1; else dummy=0; drop _: ; run;
I don't know which way is better. good luck.
data dum (Keep=PERMNO Fyear dum sortedby=PERMNO Fyear);
if 0 then set have;
length journey $5; /*FIFO array*/
journey="";
retain journey;
do until (last.PERMNO); /*DO-Whitlock*/
set have(rename=(Fyear=Fyear_));
by PERMNO;
lag_Fyear_=lag(Fyear_);
if not first.PERMNO and lag_Fyear_^=Fyear_-1 then do;
putlog "ERROR: years not consecutive" _ALL_;
/*Check, Just in case*/
_ERROR_=1;
end;
if length(Journey)=5
/*"pop" first "div" from journey (FIFO rules) if full, IOW slide journey once to the left*/
then journey=substr(journey,2,4);
/*push this "div" onto the end*/
journey=strip(strip(journey)||put(div,1.0));
dum = journey in ("00111" );
Fyear=Fyear_-2;
if length(Journey)=5
then output;
end;
data want;
update have dum;
by PERMNO Fyear;
run;
Thanks for your thoughts. Appreciate
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.