BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sandi1
Fluorite | Level 6

Creating a new dummy variable (dum) as 1 for current year if at least past two consecutive year (t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.

To be noted, this is a panel data structure. I have over 10,000 different PERMNO. Really appreciate if someone can help, thanks a lot

 

Data Have;

 

PERMNO

Fyear

div

10100

1980

0

10100

1981

0

10100

1982

1

10100

1983

1

10100

1984

0

10100

1985

1

10100

1986

1

10100

1987

1

10100

1988

0

10100

1989

0

10100

1990

0

10100

1991

1

10100

1992

1

10100

1993

1

10100

1994

1

10100

1995

0

 

Data want;

PERMNO

Fyear

div

dum

10100

1980

0

0

10100

1981

0

0

10100

1982

1

0

10100

1983

1

0

10100

1984

0

0

10100

1985

1

0

10100

1986

1

0

10100

1987

1

0

10100

1988

0

0

10100

1989

0

0

10100

1990

0

0

10100

1991

1

1

10100

1992

1

0

10100

1993

1

0

10100

1994

1

0

10100

1995

0

0

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Here is an alternative solution

 

data want(keep = PERMNO Fyear div dummy);
   
   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);

   set have curobs = c;
   cum = 0;

   do i = lbound(v) to hbound(v);
      p = (c + i);
      if p > n | p le 0 then leave;
      set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
      if permno ne pn then leave;
      if d = v[i] then cum + 1;
      else leave;
   end;

   dummy = (cum = 6);

run;

View solution in original post

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

Try this. I added another PERMNO for demonstration. Should be reasonably fast even for large data and many PERMNO.

 

Feel free to ask.

 

data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0 
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0 
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;

data have(drop = c d y rc);

   array v {-2 : 2} _temporary_ (0 0 1 1 1);

   dcl hash h();
   h.definekey("PERMNO", "Fyear");
   h.definedata("div");
   h.definedone();

   do until (last.PERMNO);
      set have;
      by PERMNO;
      h.add();
   end;

   do until (last.PERMNO);
      set have;
      by PERMNO;
  
      c = 0;
      d = div;

      do y = -2 to 2;
         if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
      end;

      dummy = (c=5);
      div = d;

      output;
   end;

   rc = h.clear();

run;

 

 

Result:

 

 

Spoiler
Obs  PERMNO  Fyear  div  dummy 
1    10100   1980   0    0 
2    10100   1981   0    0 
3    10100   1982   1    0 
4    10100   1983   1    0 
5    10100   1984   0    0 
6    10100   1985   1    0 
7    10100   1986   1    0 
8    10100   1987   1    0 
9    10100   1988   0    0 
10   10100   1989   0    0 
11   10100   1990   0    0 
12   10100   1991   1    1 
13   10100   1992   1    0 
14   10100   1993   1    0 
15   10100   1994   1    0 
16   10100   1995   0    0 
17   10101   1980   0    0 
18   10101   1981   0    0 
19   10101   1982   1    0 
20   10101   1983   1    0 
21   10101   1984   0    0 
22   10101   1985   1    0 
23   10101   1986   1    0 
24   10101   1987   1    0 
25   10101   1988   0    0 
26   10101   1989   0    0 
27   10101   1990   0    0 
28   10101   1991   1    1 
29   10101   1992   1    0 
30   10101   1993   1    0 
31   10101   1994   1    0 
32   10101   1995   0    0 

 

Sandi1
Fluorite | Level 6

Dear Peter,

 

thanks for your prompt response, it seems working using your data. I am not exactly sure what the code is doing. What if I change the rules now, say 

instead of at least past two consecutive year, I am now requiring past three consecutive year (t-3, t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.

I will have to explore other rules, but helping with this change will help me to understand what your code is doing, thank you. 

PeterClemmensen
Tourmaline | Level 20

No Problem. I made the changes and commented where the changes are done.

 

Let me know if it works 🙂

 

data have(drop = c d y rc);

   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);   /* Here */

   dcl hash h();
   h.definekey("PERMNO", "Fyear");
   h.definedata("div");
   h.definedone();

   do until (last.PERMNO);
      set have;
      by PERMNO;
      h.add();
   end;

   do until (last.PERMNO);
      set have;
      by PERMNO;
  
      c = 0;
      d = div;

      do y = -3 to 2;                            /* Here */
         if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
      end;

      dummy = (c = 6);                           /* Here */
      div = d;

      output;
   end;

   rc = h.clear();

run;
Sandi1
Fluorite | Level 6

Dear Peter,

 

I am very sure it will work. I will try with my actual dataset tomorrow, and will let you know. thanks for your great work

PeterClemmensen
Tourmaline | Level 20

Anytime. Please remember to close the thread if it does 🙂

 

Also, welcome to the community!

PeterClemmensen
Tourmaline | Level 20

Here is an alternative solution

 

data want(keep = PERMNO Fyear div dummy);
   
   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);

   set have curobs = c;
   cum = 0;

   do i = lbound(v) to hbound(v);
      p = (c + i);
      if p > n | p le 0 then leave;
      set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
      if permno ne pn then leave;
      if d = v[i] then cum + 1;
      else leave;
   end;

   dummy = (cum = 6);

run;
Sandi1
Fluorite | Level 6

Thank you, it works perfectly! 

PeterClemmensen
Tourmaline | Level 20

EDIT: @Sandi1, just edited the code a bit. Does the same thing, but you only need to change one line for the different logic.

 

data have(drop = c d y rc);

   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);   /* Here */

   dcl hash h();
   h.definekey("PERMNO", "Fyear");
   h.definedata("d");
   h.definedone();

   do until (last.PERMNO);
      set have;
      by PERMNO;
      d = div;
      h.add();
   end;

   do until (last.PERMNO);
      set have;
      by PERMNO;
 
      c = 0;
      do y = lbound(v) to hbound(v);
         if h.find(key : PERMNO, key : Fyear + y) = 0 and d = v[y] then c + 1;
      end;
      dummy = (c = dim(v));

      output;
   end;

   rc = h.clear();

run;
Ksharp
Super User

Assuming there is not gap between years.

 

data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0 
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0 
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;
data want;
 merge have have(keep=PERMNO div rename=(PERMNO=_PERMNO div=_div) firstobs=2)
            have(keep=PERMNO div rename=(PERMNO=__PERMNO div=__div) firstobs=3);
 if lag2(PERMNO)=PERMNO and lag(PERMNO)=PERMNO and lag2(div)=0 and lag(div)=0 and
    PERMNO=__PERMNO and PERMNO=_PERMNO and __div=1 and _div=1 and div=1 then dummy=1;
 else dummy=0;
drop _: ;
run;
Sandi1
Fluorite | Level 6
Thanks for your thoughts. Appreciate
PhilC
Rhodochrosite | Level 12

I don't know which way is better. good luck.

data dum (Keep=PERMNO Fyear dum sortedby=PERMNO Fyear);
  if 0 then set have;
  length journey $5; /*FIFO array*/
    journey="";
    retain journey;

  do until (last.PERMNO); /*DO-Whitlock*/
    set have(rename=(Fyear=Fyear_));
      by PERMNO;
    lag_Fyear_=lag(Fyear_);
    if not first.PERMNO and lag_Fyear_^=Fyear_-1 then do; 
      putlog "ERROR: years not consecutive" _ALL_;
       /*Check, Just in case*/
      _ERROR_=1;
    end;

    if length(Journey)=5
      /*"pop" first "div" from journey (FIFO rules) if full, IOW slide journey once to the left*/
      then journey=substr(journey,2,4); 

    /*push this "div" onto the end*/
    journey=strip(strip(journey)||put(div,1.0));

    dum = journey in ("00111" );
    Fyear=Fyear_-2;
    if length(Journey)=5 
      then output;
  end;
data want;
  update have dum;
    by PERMNO Fyear;
run;
Sandi1
Fluorite | Level 6

Thanks for your thoughts. Appreciate

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2540 views
  • 7 likes
  • 4 in conversation