BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JOLSAS
Quartz | Level 8

Hi all,

I'm quite new to SAS and I have this project that is due in two weeks. While I'm learning to program, what I need to do first is quite a big challenge. So I need your help!

Basically I have a data set that has variables "execid", which is the IDs for company executives, "gvkey", which is the IDs for the companies. I also have a variable "year" indicating different years.

The dataset looks like this:

GVKEYEXECIDYEAR
1055321996
1055321997
1055321998
1055321999
1055322000
13250221998
13250221999
13250222001
107831992
107831993
101361996
101361997
101361998
101361999
101362000
101362001

This says executive #2 works in the company 10553 from 1996 to 2000, he also works in the company 132502 from 1998 to 2001. Executive #3 works in the company 1078 from 1992 to 1993. Executive #6 works in the company 1013 from 1996 to 2001. Also, an executive may work in three or even four companies in the data period: 1992 to 2006.

Now the task is to select those executives that "are employed for at least three years in each of at least two different firms".

So in the case of the above example, executive 2 would qualify. #3 wouldn't qualify since he only works in 1078 for two years. #6 wouldn't qualify either since s/he only worked for one company.

While I can do some simple defining variables, sorting, compute frequencies, etc. This is a little too much for me. My whole data set has over 100,000 items and Iexpect only a small part of them would qualify.

Can someone please help me? I'm desperate.

Many thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

You are going to have to check a number of cases to ensure I correctly interpreted and coded for the additional requirements.  The following uses what is called a DOW loop:

data have;

  input GVKEY          EXECID          YEAR;

  cards;

10553          2          1996

10553          2          1997

10553          2          1998

10553          2          1999

10553          2          2000

132502          2          1998

132502          2          1999

132502          2          2000

132502          2          2001

1078          3          1992

1078          3          1993

1013          6          1996

1013          6          1997

1013          6          1998

1013          6          1999

1013          6          2000

1013          6          2001

;

data want (keep=gvkey execid year);

  do until(last.execid);

    set have;

    by EXECID;

    last_year=lag(year);

    last_gvkey=lag(gvkey);

    if first.execid then firms=0;

    if gvkey ne last_gvkey then do;

      counter=1;

      if firms eq 1 then do;

        start_year2=year;

        gvkey2=gvkey;

            end;

      else do;

        start_year=year;

        gvkey1=gvkey;

      end;

    end;

    else do;

      if year eq last_year+1 then do;

        counter+1;

        if counter eq 3 then do;

          firms+1;

        end;

        if counter ge 3 then do;

          if firms eq 1 then end_year=year;

          else end_year2=year;

        end;

      end;

      else do;

        counter=1;

        if firms eq 0 then do;

          start_year=year;

                gvkey1=gvkey;

              end;

              else if firms eq 1 then do;

          start_year2=year;

          gvkey2=gvkey;

        end;

      end;

    end;

  end;

  do until(last.execid);

    set have;

    by EXECID;

    if firms eq 2 and

     ((gvkey eq gvkey1 and start_year<=year<=end_year) or

     (gvkey eq gvkey2 and start_year2<=year<=end_year2))

     then output;

  end;

run;

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

Dear Desperate,

It really isn't that difficult.  However, with the example data you provided, no execs would be selected as Exec 2 was missing a year with the second company.  I added that year in the following data and possible code:

data have;

  input GVKEY          EXECID          YEAR;

  cards;

10553          2          1996

10553          2          1997

10553          2          1998

10553          2          1999

10553          2          2000

132502          2          1998

132502          2          1999

132502          2          2000

132502          2          2001

1078          3          1992

1078          3          1993

1013          6          1996

1013          6          1997

1013          6          1998

1013          6          1999

1013          6          2000

1013          6          2001

;

data want (drop=last_year counter firms);

  set have;

  by EXECID GVKEY;

  last_year=lag(year);

  if first.execid then firms=0;

  if first.gvkey then counter=1;

  else do;

    if year eq last_year+1 then do;

      counter+1;

      if counter eq 3 then do;

         firms+1;

         if firms eq 2 then do;

           output;

           firms=-99;

         end;

      end;

    end;

    else counter=1;

  end;

run;

You, of course, have to review the code to ensure that it really does what you want.  And, moreso, to understand it so that you can modify it to accomodate other conditions you might be asked to evaluate.

JOLSAS
Quartz | Level 8

Thank you so much for your reply!

I'm still studying your program but I've run it and it did select those executives.

I guess I didn't explain it clearly though. I would need all the years for the executives. So in the case of the executive #2, I would want the years in the firm 10553 from 1996 to 2000, and all the years in the firm 132502 from 1998 to 2001. This is to say, there are nine items for the executive #2 in the "want" data set (actually eight items if using the data set I provided, since #2 has three years in the second company, which satisfies the minimum required years conditionSmiley Happy).

Therefore, "want" would contain all the firm-years for each executive who satisfies the condition.

Sorry I didn't explain clearly. Would you show me how to do this again?

Thank you.

Ksharp
Super User

OK. Just as Art.T said ,It is not too difficult.But I am curious that why this poster will appear at statistical forum. Smiley Happy

data have;
  input GVKEY          EXECID          YEAR;
  cards;
10553          2          1996
10553          2          1997
10553          2          1998
10553          2          1999
10553          2          2000
132502          2          1998
132502          2          1999
132502          2          2000
132502          2          2001
1078          3          1992
1078          3          1993
1013          6          1996
1013          6          1997
1013          6          1998
1013          6          1999
1013          6          2000
1013          6          2001
;
run;
proc sql noprint;
 create table want as
  select *
   from have
    group by execid
     having count(distinct gvkey) ge 2 and count(distinct year) ge 3;
quit;
     

Ksharp

JOLSAS
Quartz | Level 8

Well this project I'm working on is a statistical one. But I guess you're right, this particular question is not a statistical one. Sorry!:smileysilly:

art297
Opal | Level 21

You are going to have to check a number of cases to ensure I correctly interpreted and coded for the additional requirements.  The following uses what is called a DOW loop:

data have;

  input GVKEY          EXECID          YEAR;

  cards;

10553          2          1996

10553          2          1997

10553          2          1998

10553          2          1999

10553          2          2000

132502          2          1998

132502          2          1999

132502          2          2000

132502          2          2001

1078          3          1992

1078          3          1993

1013          6          1996

1013          6          1997

1013          6          1998

1013          6          1999

1013          6          2000

1013          6          2001

;

data want (keep=gvkey execid year);

  do until(last.execid);

    set have;

    by EXECID;

    last_year=lag(year);

    last_gvkey=lag(gvkey);

    if first.execid then firms=0;

    if gvkey ne last_gvkey then do;

      counter=1;

      if firms eq 1 then do;

        start_year2=year;

        gvkey2=gvkey;

            end;

      else do;

        start_year=year;

        gvkey1=gvkey;

      end;

    end;

    else do;

      if year eq last_year+1 then do;

        counter+1;

        if counter eq 3 then do;

          firms+1;

        end;

        if counter ge 3 then do;

          if firms eq 1 then end_year=year;

          else end_year2=year;

        end;

      end;

      else do;

        counter=1;

        if firms eq 0 then do;

          start_year=year;

                gvkey1=gvkey;

              end;

              else if firms eq 1 then do;

          start_year2=year;

          gvkey2=gvkey;

        end;

      end;

    end;

  end;

  do until(last.execid);

    set have;

    by EXECID;

    if firms eq 2 and

     ((gvkey eq gvkey1 and start_year<=year<=end_year) or

     (gvkey eq gvkey2 and start_year2<=year<=end_year2))

     then output;

  end;

run;

JOLSAS
Quartz | Level 8

It worked perfectly!!!!!!!!!! Thank you sooooo much!!!!

God, I would have never done this by myself.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1318 views
  • 0 likes
  • 3 in conversation