BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sandi1
Fluorite | Level 6

Creating a new dummy variable (dum) as 1 for current year if at least past two consecutive year (t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.

To be noted, this is a panel data structure. I have over 10,000 different PERMNO. Really appreciate if someone can help, thanks a lot

 

Data Have;

 

PERMNO

Fyear

div

10100

1980

0

10100

1981

0

10100

1982

1

10100

1983

1

10100

1984

0

10100

1985

1

10100

1986

1

10100

1987

1

10100

1988

0

10100

1989

0

10100

1990

0

10100

1991

1

10100

1992

1

10100

1993

1

10100

1994

1

10100

1995

0

 

Data want;

PERMNO

Fyear

div

dum

10100

1980

0

0

10100

1981

0

0

10100

1982

1

0

10100

1983

1

0

10100

1984

0

0

10100

1985

1

0

10100

1986

1

0

10100

1987

1

0

10100

1988

0

0

10100

1989

0

0

10100

1990

0

0

10100

1991

1

1

10100

1992

1

0

10100

1993

1

0

10100

1994

1

0

10100

1995

0

0

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Here is an alternative solution

 

data want(keep = PERMNO Fyear div dummy);
   
   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);

   set have curobs = c;
   cum = 0;

   do i = lbound(v) to hbound(v);
      p = (c + i);
      if p > n | p le 0 then leave;
      set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
      if permno ne pn then leave;
      if d = v[i] then cum + 1;
      else leave;
   end;

   dummy = (cum = 6);

run;

View solution in original post

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

Try this. I added another PERMNO for demonstration. Should be reasonably fast even for large data and many PERMNO.

 

Feel free to ask.

 

data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0 
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0 
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;

data have(drop = c d y rc);

   array v {-2 : 2} _temporary_ (0 0 1 1 1);

   dcl hash h();
   h.definekey("PERMNO", "Fyear");
   h.definedata("div");
   h.definedone();

   do until (last.PERMNO);
      set have;
      by PERMNO;
      h.add();
   end;

   do until (last.PERMNO);
      set have;
      by PERMNO;
  
      c = 0;
      d = div;

      do y = -2 to 2;
         if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
      end;

      dummy = (c=5);
      div = d;

      output;
   end;

   rc = h.clear();

run;

 

 

Result:

 

 

Spoiler
Obs  PERMNO  Fyear  div  dummy 
1    10100   1980   0    0 
2    10100   1981   0    0 
3    10100   1982   1    0 
4    10100   1983   1    0 
5    10100   1984   0    0 
6    10100   1985   1    0 
7    10100   1986   1    0 
8    10100   1987   1    0 
9    10100   1988   0    0 
10   10100   1989   0    0 
11   10100   1990   0    0 
12   10100   1991   1    1 
13   10100   1992   1    0 
14   10100   1993   1    0 
15   10100   1994   1    0 
16   10100   1995   0    0 
17   10101   1980   0    0 
18   10101   1981   0    0 
19   10101   1982   1    0 
20   10101   1983   1    0 
21   10101   1984   0    0 
22   10101   1985   1    0 
23   10101   1986   1    0 
24   10101   1987   1    0 
25   10101   1988   0    0 
26   10101   1989   0    0 
27   10101   1990   0    0 
28   10101   1991   1    1 
29   10101   1992   1    0 
30   10101   1993   1    0 
31   10101   1994   1    0 
32   10101   1995   0    0 

 

Sandi1
Fluorite | Level 6

Dear Peter,

 

thanks for your prompt response, it seems working using your data. I am not exactly sure what the code is doing. What if I change the rules now, say 

instead of at least past two consecutive year, I am now requiring past three consecutive year (t-3, t-2, t-1) has div = 0, and current year (t) div = 1, and at least next two consecutive year (t+1, t+2) has div = 1.

I will have to explore other rules, but helping with this change will help me to understand what your code is doing, thank you. 

PeterClemmensen
Tourmaline | Level 20

No Problem. I made the changes and commented where the changes are done.

 

Let me know if it works 🙂

 

data have(drop = c d y rc);

   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);   /* Here */

   dcl hash h();
   h.definekey("PERMNO", "Fyear");
   h.definedata("div");
   h.definedone();

   do until (last.PERMNO);
      set have;
      by PERMNO;
      h.add();
   end;

   do until (last.PERMNO);
      set have;
      by PERMNO;
  
      c = 0;
      d = div;

      do y = -3 to 2;                            /* Here */
         if h.find(key : PERMNO, key : Fyear + y) = 0 and div = v[y] then c + 1;
      end;

      dummy = (c = 6);                           /* Here */
      div = d;

      output;
   end;

   rc = h.clear();

run;
Sandi1
Fluorite | Level 6

Dear Peter,

 

I am very sure it will work. I will try with my actual dataset tomorrow, and will let you know. thanks for your great work

PeterClemmensen
Tourmaline | Level 20

Anytime. Please remember to close the thread if it does 🙂

 

Also, welcome to the community!

PeterClemmensen
Tourmaline | Level 20

Here is an alternative solution

 

data want(keep = PERMNO Fyear div dummy);
   
   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);

   set have curobs = c;
   cum = 0;

   do i = lbound(v) to hbound(v);
      p = (c + i);
      if p > n | p le 0 then leave;
      set have(keep = permno div rename = (permno = pn div = d)) point = p nobs = n;
      if permno ne pn then leave;
      if d = v[i] then cum + 1;
      else leave;
   end;

   dummy = (cum = 6);

run;
Sandi1
Fluorite | Level 6

Thank you, it works perfectly! 

PeterClemmensen
Tourmaline | Level 20

EDIT: @Sandi1, just edited the code a bit. Does the same thing, but you only need to change one line for the different logic.

 

data have(drop = c d y rc);

   array v {-3 : 2} _temporary_ (0 0 0 1 1 1);   /* Here */

   dcl hash h();
   h.definekey("PERMNO", "Fyear");
   h.definedata("d");
   h.definedone();

   do until (last.PERMNO);
      set have;
      by PERMNO;
      d = div;
      h.add();
   end;

   do until (last.PERMNO);
      set have;
      by PERMNO;
 
      c = 0;
      do y = lbound(v) to hbound(v);
         if h.find(key : PERMNO, key : Fyear + y) = 0 and d = v[y] then c + 1;
      end;
      dummy = (c = dim(v));

      output;
   end;

   rc = h.clear();

run;
Ksharp
Super User

Assuming there is not gap between years.

 

data have;
input PERMNO Fyear div;
datalines;
10100 1980 0
10100 1981 0 
10100 1982 1
10100 1983 1
10100 1984 0
10100 1985 1
10100 1986 1
10100 1987 1
10100 1988 0
10100 1989 0
10100 1990 0
10100 1991 1
10100 1992 1
10100 1993 1
10100 1994 1
10100 1995 0
10101 1980 0
10101 1981 0 
10101 1982 1
10101 1983 1
10101 1984 0
10101 1985 1
10101 1986 1
10101 1987 1
10101 1988 0
10101 1989 0
10101 1990 0
10101 1991 1
10101 1992 1
10101 1993 1
10101 1994 1
10101 1995 0
;
data want;
 merge have have(keep=PERMNO div rename=(PERMNO=_PERMNO div=_div) firstobs=2)
            have(keep=PERMNO div rename=(PERMNO=__PERMNO div=__div) firstobs=3);
 if lag2(PERMNO)=PERMNO and lag(PERMNO)=PERMNO and lag2(div)=0 and lag(div)=0 and
    PERMNO=__PERMNO and PERMNO=_PERMNO and __div=1 and _div=1 and div=1 then dummy=1;
 else dummy=0;
drop _: ;
run;
Sandi1
Fluorite | Level 6
Thanks for your thoughts. Appreciate
PhilC
Rhodochrosite | Level 12

I don't know which way is better. good luck.

data dum (Keep=PERMNO Fyear dum sortedby=PERMNO Fyear);
  if 0 then set have;
  length journey $5; /*FIFO array*/
    journey="";
    retain journey;

  do until (last.PERMNO); /*DO-Whitlock*/
    set have(rename=(Fyear=Fyear_));
      by PERMNO;
    lag_Fyear_=lag(Fyear_);
    if not first.PERMNO and lag_Fyear_^=Fyear_-1 then do; 
      putlog "ERROR: years not consecutive" _ALL_;
       /*Check, Just in case*/
      _ERROR_=1;
    end;

    if length(Journey)=5
      /*"pop" first "div" from journey (FIFO rules) if full, IOW slide journey once to the left*/
      then journey=substr(journey,2,4); 

    /*push this "div" onto the end*/
    journey=strip(strip(journey)||put(div,1.0));

    dum = journey in ("00111" );
    Fyear=Fyear_-2;
    if length(Journey)=5 
      then output;
  end;
data want;
  update have dum;
    by PERMNO Fyear;
run;
Sandi1
Fluorite | Level 6

Thanks for your thoughts. Appreciate

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1058 views
  • 7 likes
  • 4 in conversation