Creating a new dummy variable (dum) as 1 for current year if at least past two consecutive year (t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.
To be noted, this is a panel data structure. I have over 10,000 different PERMNO. Really appreciate if someone can help, thanks a lot
Data Have;
PERMNO | Fyear | div |
10100 | 1980 | 0 |
10100 | 1981 | 0 |
10100 | 1982 | 1 |
10100 | 1983 | 1 |
10100 | 1984 | 0 |
10100 | 1985 | 1 |
10100 | 1986 | 1 |
10100 | 1987 | 1 |
10100 | 1988 | 0 |
10100 | 1989 | 0 |
10100 | 1990 | 0 |
10100 | 1991 | 1 |
10100 | 1992 | 1 |
10100 | 1993 | 1 |
10100 | 1994 | 1 |
10100 | 1995 | 0 |
Data want;
PERMNO | Fyear | div | dum |
10100 | 1980 | 0 | 0 |
10100 | 1981 | 0 | 0 |
10100 | 1982 | 1 | 0 |
10100 | 1983 | 1 | 0 |
10100 | 1984 | 0 | 0 |
10100 | 1985 | 1 | 0 |
10100 | 1986 | 1 | 0 |
10100 | 1987 | 1 | 0 |
10100 | 1988 | 0 | 0 |
10100 | 1989 | 0 | 0 |
10100 | 1990 | 0 | 0 |
10100 | 1991 | 1 | 1 |
10100 | 1992 | 1 | 0 |
10100 | 1993 | 1 | 0 |
10100 | 1994 | 1 | 0 |
10100 | 1995 | 0 | 0 |
Here is an alternative solution
data want(keep = PERMNO Fyear div dummy);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1);
set have curobs = c;
cum = 0;
do i = lbound(v) to hbound(v);
p = (c + i);
if p > n | p le 0 then leave;
set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
if permno ne pn then leave;
if d = v[i] then cum + 1;
else leave;
end;
dummy = (cum = 6);
run;
Try this. I added another PERMNO for demonstration. Should be reasonably fast even for large data and many PERMNO.
Feel free to ask.
data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;
data have(drop = c d y rc);
array v {-2 : 2} _temporary_ (0 0 1 1 1);
dcl hash h();
h.definekey("PERMNO", "Fyear");
h.definedata("div");
h.definedone();
do until (last.PERMNO);
set have;
by PERMNO;
h.add();
end;
do until (last.PERMNO);
set have;
by PERMNO;
c = 0;
d = div;
do y = -2 to 2;
if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
end;
dummy = (c=5);
div = d;
output;
end;
rc = h.clear();
run;
Result:
Obs PERMNO Fyear div dummy 1 10100 1980 0 0 2 10100 1981 0 0 3 10100 1982 1 0 4 10100 1983 1 0 5 10100 1984 0 0 6 10100 1985 1 0 7 10100 1986 1 0 8 10100 1987 1 0 9 10100 1988 0 0 10 10100 1989 0 0 11 10100 1990 0 0 12 10100 1991 1 1 13 10100 1992 1 0 14 10100 1993 1 0 15 10100 1994 1 0 16 10100 1995 0 0 17 10101 1980 0 0 18 10101 1981 0 0 19 10101 1982 1 0 20 10101 1983 1 0 21 10101 1984 0 0 22 10101 1985 1 0 23 10101 1986 1 0 24 10101 1987 1 0 25 10101 1988 0 0 26 10101 1989 0 0 27 10101 1990 0 0 28 10101 1991 1 1 29 10101 1992 1 0 30 10101 1993 1 0 31 10101 1994 1 0 32 10101 1995 0 0
Dear Peter,
thanks for your prompt response, it seems working using your data. I am not exactly sure what the code is doing. What if I change the rules now, say
instead of at least past two consecutive year, I am now requiring past three consecutive year (t-3, t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.
I will have to explore other rules, but helping with this change will help me to understand what your code is doing, thank you.
No Problem. I made the changes and commented where the changes are done.
Let me know if it works 🙂
data have(drop = c d y rc);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1); /* Here */
dcl hash h();
h.definekey("PERMNO", "Fyear");
h.definedata("div");
h.definedone();
do until (last.PERMNO);
set have;
by PERMNO;
h.add();
end;
do until (last.PERMNO);
set have;
by PERMNO;
c = 0;
d = div;
do y = -3 to 2; /* Here */
if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
end;
dummy = (c = 6); /* Here */
div = d;
output;
end;
rc = h.clear();
run;
Dear Peter,
I am very sure it will work. I will try with my actual dataset tomorrow, and will let you know. thanks for your great work
Anytime. Please remember to close the thread if it does 🙂
Also, welcome to the community!
Here is an alternative solution
data want(keep = PERMNO Fyear div dummy);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1);
set have curobs = c;
cum = 0;
do i = lbound(v) to hbound(v);
p = (c + i);
if p > n | p le 0 then leave;
set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
if permno ne pn then leave;
if d = v[i] then cum + 1;
else leave;
end;
dummy = (cum = 6);
run;
Thank you, it works perfectly!
EDIT: @Sandi1, just edited the code a bit. Does the same thing, but you only need to change one line for the different logic.
data have(drop = c d y rc);
array v {-3 : 2} _temporary_ (0 0 0 1 1 1); /* Here */
dcl hash h();
h.definekey("PERMNO", "Fyear");
h.definedata("d");
h.definedone();
do until (last.PERMNO);
set have;
by PERMNO;
d = div;
h.add();
end;
do until (last.PERMNO);
set have;
by PERMNO;
c = 0;
do y = lbound(v) to hbound(v);
if h.find(key : PERMNO, key : Fyear + y) = 0 and d = v[y] then c + 1;
end;
dummy = (c = dim(v));
output;
end;
rc = h.clear();
run;
Assuming there is not gap between years.
data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;
data want;
merge have have(keep=PERMNO div rename=(PERMNO=_PERMNO div=_div) firstobs=2)
have(keep=PERMNO div rename=(PERMNO=__PERMNO div=__div) firstobs=3);
if lag2(PERMNO)=PERMNO and lag(PERMNO)=PERMNO and lag2(div)=0 and lag(div)=0 and
PERMNO=__PERMNO and PERMNO=_PERMNO and __div=1 and _div=1 and div=1 then dummy=1;
else dummy=0;
drop _: ;
run;
I don't know which way is better. good luck.
data dum (Keep=PERMNO Fyear dum sortedby=PERMNO Fyear);
if 0 then set have;
length journey $5; /*FIFO array*/
journey="";
retain journey;
do until (last.PERMNO); /*DO-Whitlock*/
set have(rename=(Fyear=Fyear_));
by PERMNO;
lag_Fyear_=lag(Fyear_);
if not first.PERMNO and lag_Fyear_^=Fyear_-1 then do;
putlog "ERROR: years not consecutive" _ALL_;
/*Check, Just in case*/
_ERROR_=1;
end;
if length(Journey)=5
/*"pop" first "div" from journey (FIFO rules) if full, IOW slide journey once to the left*/
then journey=substr(journey,2,4);
/*push this "div" onto the end*/
journey=strip(strip(journey)||put(div,1.0));
dum = journey in ("00111" );
Fyear=Fyear_-2;
if length(Journey)=5
then output;
end;
data want;
update have dum;
by PERMNO Fyear;
run;
Thanks for your thoughts. Appreciate
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.