Hi all,
I'm quite new to SAS and I have this project that is due in two weeks. While I'm learning to program, what I need to do first is quite a big challenge. So I need your help!
Basically I have a data set that has variables "execid", which is the IDs for company executives, "gvkey", which is the IDs for the companies. I also have a variable "year" indicating different years.
The dataset looks like this:
GVKEY | EXECID | YEAR |
10553 | 2 | 1996 |
10553 | 2 | 1997 |
10553 | 2 | 1998 |
10553 | 2 | 1999 |
10553 | 2 | 2000 |
132502 | 2 | 1998 |
132502 | 2 | 1999 |
132502 | 2 | 2001 |
1078 | 3 | 1992 |
1078 | 3 | 1993 |
1013 | 6 | 1996 |
1013 | 6 | 1997 |
1013 | 6 | 1998 |
1013 | 6 | 1999 |
1013 | 6 | 2000 |
1013 | 6 | 2001 |
This says executive #2 works in the company 10553 from 1996 to 2000, he also works in the company 132502 from 1998 to 2001. Executive #3 works in the company 1078 from 1992 to 1993. Executive #6 works in the company 1013 from 1996 to 2001. Also, an executive may work in three or even four companies in the data period: 1992 to 2006.
Now the task is to select those executives that "are employed for at least three years in each of at least two different firms".
So in the case of the above example, executive 2 would qualify. #3 wouldn't qualify since he only works in 1078 for two years. #6 wouldn't qualify either since s/he only worked for one company.
While I can do some simple defining variables, sorting, compute frequencies, etc. This is a little too much for me. My whole data set has over 100,000 items and Iexpect only a small part of them would qualify.
Can someone please help me? I'm desperate.
Many thanks!
You are going to have to check a number of cases to ensure I correctly interpreted and coded for the additional requirements. The following uses what is called a DOW loop:
data have;
input GVKEY EXECID YEAR;
cards;
10553 2 1996
10553 2 1997
10553 2 1998
10553 2 1999
10553 2 2000
132502 2 1998
132502 2 1999
132502 2 2000
132502 2 2001
1078 3 1992
1078 3 1993
1013 6 1996
1013 6 1997
1013 6 1998
1013 6 1999
1013 6 2000
1013 6 2001
;
data want (keep=gvkey execid year);
do until(last.execid);
set have;
by EXECID;
last_year=lag(year);
last_gvkey=lag(gvkey);
if first.execid then firms=0;
if gvkey ne last_gvkey then do;
counter=1;
if firms eq 1 then do;
start_year2=year;
gvkey2=gvkey;
end;
else do;
start_year=year;
gvkey1=gvkey;
end;
end;
else do;
if year eq last_year+1 then do;
counter+1;
if counter eq 3 then do;
firms+1;
end;
if counter ge 3 then do;
if firms eq 1 then end_year=year;
else end_year2=year;
end;
end;
else do;
counter=1;
if firms eq 0 then do;
start_year=year;
gvkey1=gvkey;
end;
else if firms eq 1 then do;
start_year2=year;
gvkey2=gvkey;
end;
end;
end;
end;
do until(last.execid);
set have;
by EXECID;
if firms eq 2 and
((gvkey eq gvkey1 and start_year<=year<=end_year) or
(gvkey eq gvkey2 and start_year2<=year<=end_year2))
then output;
end;
run;
Dear Desperate,
It really isn't that difficult. However, with the example data you provided, no execs would be selected as Exec 2 was missing a year with the second company. I added that year in the following data and possible code:
data have;
input GVKEY EXECID YEAR;
cards;
10553 2 1996
10553 2 1997
10553 2 1998
10553 2 1999
10553 2 2000
132502 2 1998
132502 2 1999
132502 2 2000
132502 2 2001
1078 3 1992
1078 3 1993
1013 6 1996
1013 6 1997
1013 6 1998
1013 6 1999
1013 6 2000
1013 6 2001
;
data want (drop=last_year counter firms);
set have;
by EXECID GVKEY;
last_year=lag(year);
if first.execid then firms=0;
if first.gvkey then counter=1;
else do;
if year eq last_year+1 then do;
counter+1;
if counter eq 3 then do;
firms+1;
if firms eq 2 then do;
output;
firms=-99;
end;
end;
end;
else counter=1;
end;
run;
You, of course, have to review the code to ensure that it really does what you want. And, moreso, to understand it so that you can modify it to accomodate other conditions you might be asked to evaluate.
Thank you so much for your reply!
I'm still studying your program but I've run it and it did select those executives.
I guess I didn't explain it clearly though. I would need all the years for the executives. So in the case of the executive #2, I would want the years in the firm 10553 from 1996 to 2000, and all the years in the firm 132502 from 1998 to 2001. This is to say, there are nine items for the executive #2 in the "want" data set (actually eight items if using the data set I provided, since #2 has three years in the second company, which satisfies the minimum required years condition).
Therefore, "want" would contain all the firm-years for each executive who satisfies the condition.
Sorry I didn't explain clearly. Would you show me how to do this again?
Thank you.
OK. Just as Art.T said ,It is not too difficult.But I am curious that why this poster will appear at statistical forum.
data have; input GVKEY EXECID YEAR; cards; 10553 2 1996 10553 2 1997 10553 2 1998 10553 2 1999 10553 2 2000 132502 2 1998 132502 2 1999 132502 2 2000 132502 2 2001 1078 3 1992 1078 3 1993 1013 6 1996 1013 6 1997 1013 6 1998 1013 6 1999 1013 6 2000 1013 6 2001 ; run; proc sql noprint; create table want as select * from have group by execid having count(distinct gvkey) ge 2 and count(distinct year) ge 3; quit;
Ksharp
Well this project I'm working on is a statistical one. But I guess you're right, this particular question is not a statistical one. Sorry!:smileysilly:
You are going to have to check a number of cases to ensure I correctly interpreted and coded for the additional requirements. The following uses what is called a DOW loop:
data have;
input GVKEY EXECID YEAR;
cards;
10553 2 1996
10553 2 1997
10553 2 1998
10553 2 1999
10553 2 2000
132502 2 1998
132502 2 1999
132502 2 2000
132502 2 2001
1078 3 1992
1078 3 1993
1013 6 1996
1013 6 1997
1013 6 1998
1013 6 1999
1013 6 2000
1013 6 2001
;
data want (keep=gvkey execid year);
do until(last.execid);
set have;
by EXECID;
last_year=lag(year);
last_gvkey=lag(gvkey);
if first.execid then firms=0;
if gvkey ne last_gvkey then do;
counter=1;
if firms eq 1 then do;
start_year2=year;
gvkey2=gvkey;
end;
else do;
start_year=year;
gvkey1=gvkey;
end;
end;
else do;
if year eq last_year+1 then do;
counter+1;
if counter eq 3 then do;
firms+1;
end;
if counter ge 3 then do;
if firms eq 1 then end_year=year;
else end_year2=year;
end;
end;
else do;
counter=1;
if firms eq 0 then do;
start_year=year;
gvkey1=gvkey;
end;
else if firms eq 1 then do;
start_year2=year;
gvkey2=gvkey;
end;
end;
end;
end;
do until(last.execid);
set have;
by EXECID;
if firms eq 2 and
((gvkey eq gvkey1 and start_year<=year<=end_year) or
(gvkey eq gvkey2 and start_year2<=year<=end_year2))
then output;
end;
run;
It worked perfectly!!!!!!!!!! Thank you sooooo much!!!!
God, I would have never done this by myself.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.