Hello,
This is my first time submitting a question! Exciting stuff!
So, I am trying to keep the first set of observations based off of two variables. Here is an example dataset:
Have:
datadate | gvkey | eventdate |
3/31/2001 | 5 | 10/1/2004 |
6/30/2001 | 5 | 10/1/2004 |
9/30/2001 | 5 | 10/1/2004 |
12/31/2001 | 5 | 10/1/2004 |
3/31/2002 | 5 | 10/1/2004 |
6/30/2002 | 5 | 11/4/2005 |
9/30/2002 | 5 | 11/4/2005 |
12/31/2002 | 5 | 11/4/2005 |
3/31/2003 | 5 | 11/4/2005 |
6/30/2003 | 5 | 11/4/2005 |
Want:
3/31/2001 | 5 | 10/1/2004 |
6/30/2001 | 5 | 10/1/2004 |
9/30/2001 | 5 | 10/1/2004 |
12/31/2001 | 5 | 10/1/2004 |
3/31/2002 | 5 | 10/1/2004 |
To be clear, what I would like to pull from this dataset is the first five rows. So, all observations that have the first event date for each gvkey. What I'm having trouble doing is pulling all five observations. I can pull just the first observation based off of event date and gvkey, but I haven't figured out a way to pull all five. Thanks!
@Reeza My understanding is that the OP wants just the 1st event records for each GVkey regardless of how many records the 1st event may contain
data have;
input datadate :mmddyy10. gvkey $ eventdate :mmddyy10.;
format datadate eventdate mmddyy10.;
cards;
3/31/2001 001577 10/1/2004
6/30/2001 001577 10/1/2004
9/30/2001 001577 10/1/2004
12/31/2001 001577 10/1/2004
3/31/2002 001577 10/1/2004
6/30/2002 001577 11/4/2005
9/30/2002 001577 11/4/2005
12/31/2002 001577 11/4/2005
3/31/2003 001577 11/4/2005
6/30/2003 001577 11/4/2005
3/31/2001 004324 9/3/2005
6/30/2001 004324 9/3/2005
9/30/2001 004324 9/3/2005
12/31/2001 004324 9/3/2005
3/31/2002 004324 9/3/2005
6/30/2002 004324 9/3/2005
9/30/2002 004324 10/12/2006
12/31/2002 004324 10/12/2006
3/31/2003 004324 10/12/2006
6/30/2003 004324 10/12/2006
;
data want;
set have;
by gvkey eventdate;
if first.gvkey then n=1;
else if first.eventdate then n+1;
if n=1;
drop n;
run;
datadate | gvkey | eventdate |
---|---|---|
03/31/2001 | 001577 | 10/01/2004 |
06/30/2001 | 001577 | 10/01/2004 |
09/30/2001 | 001577 | 10/01/2004 |
12/31/2001 | 001577 | 10/01/2004 |
03/31/2002 | 001577 | 10/01/2004 |
03/31/2001 | 004324 | 09/03/2005 |
06/30/2001 | 004324 | 09/03/2005 |
09/30/2001 | 004324 | 09/03/2005 |
12/31/2001 | 004324 | 09/03/2005 |
03/31/2002 | 004324 | 09/03/2005 |
06/30/2002 | 004324 | 09/03/2005 |
And you know it's 5 because of the value in GVKEY?
Is this done once or does it need to be done for each group?
data want;
set have;
by eventDate gvkey;
if first.eventDate then count=0;
count+1;
if count<= gvkey;
run;
@Chutzler wrote:
Hello,
This is my first time submitting a question! Exciting stuff!
So, I am trying to keep the first set of observations based off of two variables. Here is an example dataset:
Have:
datadate gvkey eventdate 3/31/2001 5 10/1/2004 6/30/2001 5 10/1/2004 9/30/2001 5 10/1/2004 12/31/2001 5 10/1/2004 3/31/2002 5 10/1/2004 6/30/2002 5 11/4/2005 9/30/2002 5 11/4/2005 12/31/2002 5 11/4/2005 3/31/2003 5 11/4/2005 6/30/2003 5 11/4/2005 Want:
3/31/2001 5 10/1/2004 6/30/2001 5 10/1/2004 9/30/2001 5 10/1/2004 12/31/2001 5 10/1/2004 3/31/2002 5 10/1/2004
To be clear, what I would like to pull from this dataset is the first five rows. So, all observations that have the first event date for each gvkey. What I'm having trouble doing is pulling all five observations. I can pull just the first observation based off of event date and gvkey, but I haven't figured out a way to pull all five. Thanks!
Sorry, I should have clarified. It needs to be done for each group. I don't know it's 5 because of the value of the GVKEY. It could be more or less than five. The GVKEYS are typically six digit identifiers. Maybe this will be more clear:
Have:
datadate | gvkey | eventdate |
3/31/2001 | 001577 | 10/1/2004 |
6/30/2001 | 001577 | 10/1/2004 |
9/30/2001 | 001577 | 10/1/2004 |
12/31/2001 | 001577 | 10/1/2004 |
3/31/2002 | 001577 | 10/1/2004 |
6/30/2002 | 001577 | 11/4/2005 |
9/30/2002 | 001577 | 11/4/2005 |
12/31/2002 | 001577 | 11/4/2005 |
3/31/2003 | 001577 | 11/4/2005 |
6/30/2003 | 001577 | 11/4/2005 |
3/31/2001 | 004324 | 9/3/2005 |
6/30/2001 | 004324 | 9/3/2005 |
9/30/2001 | 004324 | 9/3/2005 |
12/31/2001 | 004324 | 9/3/2005 |
3/31/2002 | 004324 | 9/3/2005 |
6/30/2002 | 004324 | 9/3/2005 |
9/30/2002 | 004324 | 10/12/2006 |
12/31/2002 | 004324 | 10/12/2006 |
3/31/2003 | 004324 | 10/12/2006 |
6/30/2003 | 004324 | 10/12/2006 |
Want:
datadate | gvkey | eventdate |
3/31/2001 | 001577 | 10/1/2004 |
6/30/2001 | 001577 | 10/1/2004 |
9/30/2001 | 001577 | 10/1/2004 |
12/31/2001 | 001577 | 10/1/2004 |
3/31/2002 | 001577 | 10/1/2004 |
3/31/2001 | 004324 | 9/3/2005 |
6/30/2001 | 004324 | 9/3/2005 |
9/30/2001 | 004324 | 9/3/2005 |
12/31/2001 | 004324 | 9/3/2005 |
3/31/2002 | 004324 | 9/3/2005 |
6/30/2002 | 004324 | 9/3/2005 |
So why is it 5 for GVKEY=001577 and 6 for GVKEY=004324?
How do you know that?
@Reeza My understanding is that the OP wants just the 1st event records for each GVkey regardless of how many records the 1st event may contain
data have;
input datadate :mmddyy10. gvkey $ eventdate :mmddyy10.;
format datadate eventdate mmddyy10.;
cards;
3/31/2001 001577 10/1/2004
6/30/2001 001577 10/1/2004
9/30/2001 001577 10/1/2004
12/31/2001 001577 10/1/2004
3/31/2002 001577 10/1/2004
6/30/2002 001577 11/4/2005
9/30/2002 001577 11/4/2005
12/31/2002 001577 11/4/2005
3/31/2003 001577 11/4/2005
6/30/2003 001577 11/4/2005
3/31/2001 004324 9/3/2005
6/30/2001 004324 9/3/2005
9/30/2001 004324 9/3/2005
12/31/2001 004324 9/3/2005
3/31/2002 004324 9/3/2005
6/30/2002 004324 9/3/2005
9/30/2002 004324 10/12/2006
12/31/2002 004324 10/12/2006
3/31/2003 004324 10/12/2006
6/30/2003 004324 10/12/2006
;
data want;
set have;
by gvkey eventdate;
if first.gvkey then n=1;
else if first.eventdate then n+1;
if n=1;
drop n;
run;
datadate | gvkey | eventdate |
---|---|---|
03/31/2001 | 001577 | 10/01/2004 |
06/30/2001 | 001577 | 10/01/2004 |
09/30/2001 | 001577 | 10/01/2004 |
12/31/2001 | 001577 | 10/01/2004 |
03/31/2002 | 001577 | 10/01/2004 |
03/31/2001 | 004324 | 09/03/2005 |
06/30/2001 | 004324 | 09/03/2005 |
09/30/2001 | 004324 | 09/03/2005 |
12/31/2001 | 004324 | 09/03/2005 |
03/31/2002 | 004324 | 09/03/2005 |
06/30/2002 | 004324 | 09/03/2005 |
Awesome, this worked! Thanks so much!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.