Solved: Choose the last observations in the same firm-year

joon1 · Posted 09-13-2019 05:57 PM

Hi Madam/Sir,

I would like to choose the last observation out of two observations in the same firm-year. For instance, I would like to choose observations # 2, 4, and 6 and drop observations #1,2, and 3. It will be highly appreciative if you can provide the programming code. Thank you. Joon1

Obs CIKCODE cyear ANNUALREPORTDATE dirind123456

0000002024	2016	20160101	10
0000002024	2016	20161201	11
0000010254	2014	20140301	3
0000010254	2014	20141201	5
0000018169	2011	20110101	4
0000018169	2011	20111201	5

Reeza · Posted 09-13-2019 06:04 PM

proc sort data=have;
by cikcode cyear annualReportdate;
run;

data want;
set have;
by cikcode cyear annualreportdate;
if last.cyear;
run;

@joon1 wrote:

Hi Madam/Sir,

I would like to choose the last observation out of two observations in the same firm-year. For instance, I would like to choose observations # 2, 4, and 6 and drop observations #1,2, and 3. It will be highly appreciative if you can provide the programming code. Thank you. Joon1

Obs CIKCODE cyear ANNUALREPORTDATE dirind123456

0000002024 2016 20160101 10

0000002024 2016 20161201 11

0000010254 2014 20140301 3

0000010254 2014 20141201 5

0000018169 2011 20110101 4

0000018169 2011 20111201 5

View solution in original post

Reeza · Posted 09-13-2019 06:04 PM

proc sort data=have;
by cikcode cyear annualReportdate;
run;

data want;
set have;
by cikcode cyear annualreportdate;
if last.cyear;
run;

@joon1 wrote:

Hi Madam/Sir,

I would like to choose the last observation out of two observations in the same firm-year. For instance, I would like to choose observations # 2, 4, and 6 and drop observations #1,2, and 3. It will be highly appreciative if you can provide the programming code. Thank you. Joon1

Obs CIKCODE cyear ANNUALREPORTDATE dirind123456

0000002024 2016 20160101 10

0000002024 2016 20161201 11

0000010254 2014 20140301 3

0000010254 2014 20141201 5

0000018169 2011 20110101 4

0000018169 2011 20111201 5

ballardw · Posted 09-13-2019 06:05 PM

if the data are in order as your example implies:

Data want;
   set have;
   by cikcode cyear;
   if last.cyear;
run;

hashman · Posted 09-13-2019 11:31 PM

@joon1:

If your file is already properly sorted, just use the DATA step posted by @ballardw and @Reeza.

If it's not sorted, a single-step solution can be:

data have ;                                                                                                                                                                                                                                                     
  input CIKCODE cyear ANNUALREPORTDATE dirind123456 ;                                                                                                                                                                                                           
  cards ;                                                                                                                                                                                                                                                       
0000002024  2016  20160101  10                                                                                                                                                                                                                                  
0000002024  2016  20161201  11                                                                                                                                                                                                                                  
0000010254  2014  20140301   3                                                                                                                                                                                                                                  
0000010254  2014  20141201   5                                                                                                                                                                                                                                  
0000018169  2011  20110101   4                                                                                                                                                                                                                                  
0000018169  2011  20111201   5                                                                                                                                                                                                                                  
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data _null_ ;                                                                                                                                                                                                                                                   
  dcl hash h (dataset:"have", duplicate:"r", ordered:"a") ;                                                                                                                                                                                                     
  h.definekey ("cikcode", "cyear") ;                                                                                                                                                                                                                            
  h.definedata (all:"y") ;                                                                                                                                                                                                                                      
  h.definedone () ;                                                                                                                                                                                                                                             
  h.output (dataset:"want") ;                                                                                                                                                                                                                                   
  stop ;                                                                                                                                                                                                                                                        
  set have ;                                                                                                                                                                                                                                                    
run ;

This assumes that you have enough memory to hold the output data set, which for the sheet in your Excel workbook is guaranteed. But if in reality you have a much bigger file, here is another version requiring the hash table to hold only the record identifier in its data portion (rather than all the non-key variables):

data want ;                                                                                                                                                                                                                                                     
  dcl hash h (ordered:"a") ;                                                                                                                                                                                                                                    
  h.definekey ("cikcode", "cyear") ;                                                                                                                                                                                                                            
  h.definedata ("rid") ;                                                                                                                                                                                                                                        
  h.definedone () ;                                                                                                                                                                                                                                             
  do rid = 1 by 1 until (z) ;                                                                                                                                                                                                                                   
    set have end = z ;                                                                                                                                                                                                                                          
    h.replace() ;                                                                                                                                                                                                                                               
  end ;                                                                                                                                                                                                                                                         
  dcl hiter hi ("h") ;                                                                                                                                                                                                                                          
  do while (hi.next() = 0) ;                                                                                                                                                                                                                                    
    set have point = rid ;                                                                                                                                                                                                                                      
    output ;                                                                                                                                                                                                                                                    
  end ;                                                                                                                                                                                                                                                         
run ;

Suit yourself.

Kind regards

Paul D.

Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Re: Choose the last observations in the same firm-year

Registration is open

SAS Training: Just a Click Away