BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Chutzler
Calcite | Level 5

Hello,

 

This is my first time submitting a question! Exciting stuff!

 

So, I am trying to keep the first set of observations based off of two variables. Here is an example dataset:

Have:

datadategvkeyeventdate
3/31/2001510/1/2004
6/30/2001510/1/2004
9/30/2001510/1/2004
12/31/2001510/1/2004
3/31/2002510/1/2004
6/30/2002511/4/2005
9/30/2002511/4/2005
12/31/2002511/4/2005
3/31/2003511/4/2005
6/30/2003511/4/2005

 Want:

3/31/2001510/1/2004
6/30/2001510/1/2004
9/30/2001510/1/2004
12/31/2001510/1/2004
3/31/2002510/1/2004

 

To be clear, what I would like to pull from this dataset is the first five rows. So, all observations that have the first event date for each gvkey. What I'm having trouble doing is pulling all five observations. I can pull just the first observation based off of event date and gvkey, but I haven't figured out a way to pull all five. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@Reeza  My understanding is that the OP wants just the 1st event records for each GVkey regardless of how many records the 1st event may contain

 


data have;
input datadate	:mmddyy10. gvkey $	eventdate :mmddyy10.;
format datadate eventdate mmddyy10.;
cards;
3/31/2001	001577	10/1/2004
6/30/2001	001577	10/1/2004
9/30/2001	001577	10/1/2004
12/31/2001	001577	10/1/2004
3/31/2002	001577	10/1/2004
6/30/2002	001577	11/4/2005
9/30/2002	001577	11/4/2005
12/31/2002	001577	11/4/2005
3/31/2003	001577	11/4/2005
6/30/2003	001577	11/4/2005
3/31/2001	004324	9/3/2005
6/30/2001	004324	9/3/2005
9/30/2001	004324	9/3/2005
12/31/2001	004324	9/3/2005
3/31/2002	004324	9/3/2005
6/30/2002	004324	9/3/2005
9/30/2002	004324	10/12/2006
12/31/2002	004324	10/12/2006
3/31/2003	004324	10/12/2006
6/30/2003	004324	10/12/2006
;


data want;
  set have;
  by gvkey eventdate;
  if first.gvkey then n=1;
  else if first.eventdate then n+1;
  if n=1;
  drop n;
run;
datadate gvkey eventdate
03/31/2001 001577 10/01/2004
06/30/2001 001577 10/01/2004
09/30/2001 001577 10/01/2004
12/31/2001 001577 10/01/2004
03/31/2002 001577 10/01/2004
03/31/2001 004324 09/03/2005
06/30/2001 004324 09/03/2005
09/30/2001 004324 09/03/2005
12/31/2001 004324 09/03/2005
03/31/2002 004324 09/03/2005
06/30/2002 004324 09/03/2005

 

 

View solution in original post

5 REPLIES 5
Reeza
Super User

And you know it's 5 because of the value in GVKEY?

Is this done once or does it need to be done for each group?

 

data want;
set have;

by eventDate gvkey;
if first.eventDate then count=0;

count+1;

if count<= gvkey;

run;

@Chutzler wrote:

Hello,

 

This is my first time submitting a question! Exciting stuff!

 

So, I am trying to keep the first set of observations based off of two variables. Here is an example dataset:

Have:

datadate gvkey eventdate
3/31/2001 5 10/1/2004
6/30/2001 5 10/1/2004
9/30/2001 5 10/1/2004
12/31/2001 5 10/1/2004
3/31/2002 5 10/1/2004
6/30/2002 5 11/4/2005
9/30/2002 5 11/4/2005
12/31/2002 5 11/4/2005
3/31/2003 5 11/4/2005
6/30/2003 5 11/4/2005

 Want:

3/31/2001 5 10/1/2004
6/30/2001 5 10/1/2004
9/30/2001 5 10/1/2004
12/31/2001 5 10/1/2004
3/31/2002 5 10/1/2004

 

To be clear, what I would like to pull from this dataset is the first five rows. So, all observations that have the first event date for each gvkey. What I'm having trouble doing is pulling all five observations. I can pull just the first observation based off of event date and gvkey, but I haven't figured out a way to pull all five. Thanks!


 

Chutzler
Calcite | Level 5

Sorry, I should have clarified. It needs to be done for each group. I don't know it's 5 because of the value of the GVKEY. It could be more or less than five. The GVKEYS are typically six digit identifiers. Maybe this will be more clear:

Have:

datadategvkeyeventdate
3/31/200100157710/1/2004
6/30/200100157710/1/2004
9/30/200100157710/1/2004
12/31/200100157710/1/2004
3/31/200200157710/1/2004
6/30/200200157711/4/2005
9/30/200200157711/4/2005
12/31/200200157711/4/2005
3/31/200300157711/4/2005
6/30/200300157711/4/2005
3/31/20010043249/3/2005
6/30/20010043249/3/2005
9/30/20010043249/3/2005
12/31/20010043249/3/2005
3/31/20020043249/3/2005
6/30/20020043249/3/2005
9/30/200200432410/12/2006
12/31/200200432410/12/2006
3/31/200300432410/12/2006
6/30/200300432410/12/2006

 

Want:

datadategvkeyeventdate
3/31/200100157710/1/2004
6/30/200100157710/1/2004
9/30/200100157710/1/2004
12/31/200100157710/1/2004
3/31/200200157710/1/2004
3/31/20010043249/3/2005
6/30/20010043249/3/2005
9/30/20010043249/3/2005
12/31/20010043249/3/2005
3/31/20020043249/3/2005
6/30/20020043249/3/2005
Reeza
Super User

So why is it 5 for GVKEY=001577 and 6 for GVKEY=004324?
How do you know that?

novinosrin
Tourmaline | Level 20

@Reeza  My understanding is that the OP wants just the 1st event records for each GVkey regardless of how many records the 1st event may contain

 


data have;
input datadate	:mmddyy10. gvkey $	eventdate :mmddyy10.;
format datadate eventdate mmddyy10.;
cards;
3/31/2001	001577	10/1/2004
6/30/2001	001577	10/1/2004
9/30/2001	001577	10/1/2004
12/31/2001	001577	10/1/2004
3/31/2002	001577	10/1/2004
6/30/2002	001577	11/4/2005
9/30/2002	001577	11/4/2005
12/31/2002	001577	11/4/2005
3/31/2003	001577	11/4/2005
6/30/2003	001577	11/4/2005
3/31/2001	004324	9/3/2005
6/30/2001	004324	9/3/2005
9/30/2001	004324	9/3/2005
12/31/2001	004324	9/3/2005
3/31/2002	004324	9/3/2005
6/30/2002	004324	9/3/2005
9/30/2002	004324	10/12/2006
12/31/2002	004324	10/12/2006
3/31/2003	004324	10/12/2006
6/30/2003	004324	10/12/2006
;


data want;
  set have;
  by gvkey eventdate;
  if first.gvkey then n=1;
  else if first.eventdate then n+1;
  if n=1;
  drop n;
run;
datadate gvkey eventdate
03/31/2001 001577 10/01/2004
06/30/2001 001577 10/01/2004
09/30/2001 001577 10/01/2004
12/31/2001 001577 10/01/2004
03/31/2002 001577 10/01/2004
03/31/2001 004324 09/03/2005
06/30/2001 004324 09/03/2005
09/30/2001 004324 09/03/2005
12/31/2001 004324 09/03/2005
03/31/2002 004324 09/03/2005
06/30/2002 004324 09/03/2005

 

 

Chutzler
Calcite | Level 5

Awesome, this worked! Thanks so much!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 749 views
  • 2 likes
  • 3 in conversation