Hi Madam/Sir,
I would like to choose the last observation out of two observations in the same firm-year. For instance, I would like to choose observations # 2, 4, and 6 and drop observations #1,2, and 3. It will be highly appreciative if you can provide the programming code. Thank you. Joon1
0000002024 | 2016 | 20160101 | 10 |
0000002024 | 2016 | 20161201 | 11 |
0000010254 | 2014 | 20140301 | 3 |
0000010254 | 2014 | 20141201 | 5 |
0000018169 | 2011 | 20110101 | 4 |
0000018169 | 2011 | 20111201 | 5 |
proc sort data=have;
by cikcode cyear annualReportdate;
run;
data want;
set have;
by cikcode cyear annualreportdate;
if last.cyear;
run;
@joon1 wrote:
Hi Madam/Sir,
I would like to choose the last observation out of two observations in the same firm-year. For instance, I would like to choose observations # 2, 4, and 6 and drop observations #1,2, and 3. It will be highly appreciative if you can provide the programming code. Thank you. Joon1
Obs CIKCODE cyear ANNUALREPORTDATE dirind123456
0000002024 2016 20160101 10 0000002024 2016 20161201 11 0000010254 2014 20140301 3 0000010254 2014 20141201 5 0000018169 2011 20110101 4 0000018169 2011 20111201 5
proc sort data=have;
by cikcode cyear annualReportdate;
run;
data want;
set have;
by cikcode cyear annualreportdate;
if last.cyear;
run;
@joon1 wrote:
Hi Madam/Sir,
I would like to choose the last observation out of two observations in the same firm-year. For instance, I would like to choose observations # 2, 4, and 6 and drop observations #1,2, and 3. It will be highly appreciative if you can provide the programming code. Thank you. Joon1
Obs CIKCODE cyear ANNUALREPORTDATE dirind123456
0000002024 2016 20160101 10 0000002024 2016 20161201 11 0000010254 2014 20140301 3 0000010254 2014 20141201 5 0000018169 2011 20110101 4 0000018169 2011 20111201 5
if the data are in order as your example implies:
Data want; set have; by cikcode cyear; if last.cyear; run;
If your file is already properly sorted, just use the DATA step posted by @ballardw and @Reeza.
If it's not sorted, a single-step solution can be:
data have ;
input CIKCODE cyear ANNUALREPORTDATE dirind123456 ;
cards ;
0000002024 2016 20160101 10
0000002024 2016 20161201 11
0000010254 2014 20140301 3
0000010254 2014 20141201 5
0000018169 2011 20110101 4
0000018169 2011 20111201 5
;
run ;
data _null_ ;
dcl hash h (dataset:"have", duplicate:"r", ordered:"a") ;
h.definekey ("cikcode", "cyear") ;
h.definedata (all:"y") ;
h.definedone () ;
h.output (dataset:"want") ;
stop ;
set have ;
run ;
This assumes that you have enough memory to hold the output data set, which for the sheet in your Excel workbook is guaranteed. But if in reality you have a much bigger file, here is another version requiring the hash table to hold only the record identifier in its data portion (rather than all the non-key variables):
data want ;
dcl hash h (ordered:"a") ;
h.definekey ("cikcode", "cyear") ;
h.definedata ("rid") ;
h.definedone () ;
do rid = 1 by 1 until (z) ;
set have end = z ;
h.replace() ;
end ;
dcl hiter hi ("h") ;
do while (hi.next() = 0) ;
set have point = rid ;
output ;
end ;
run ;
Suit yourself.
Kind regards
Paul D.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.