Dear Madam/Sir,
I would like to remain only the last observation by multiple groups (ticker prd_yr anndats)
ID is not coded by three multiple groups and last.id are not recognized using the following code. The sample dataset is attached.
proc sort data=m2b; by ticker prd_yr anndats; run;
proc freq data=m2b; tables ticker*prd_yr*anndats/noprint out=m2c;
data m2c; set m2c; id=_n_; run;
proc sort data=m2c; by ticker prd_yr anndats; run;
data m2d; merge m2b m2c; by ticker prd_yr anndats;
proc sort data=m2d; by ticker prd_yr anndats; run;
data m3; set m2d; by ticker prd_yr anndats;
if last.id; run;
Any help will be highly appreciated.
Sincerely,
Joon1
Hello @joon1,
If you want to select the last observation of each ticker-prd_yr BY group, use if last.prd_yr.
data want;
set m2d;
by ticker prd_yr anndats id;
if last.prd_yr;
run;
The result will be the same with the shorter BY statement
by ticker prd_yr;
but including anndats id implies a check that dataset M2D is sorted by ticker prd_yr and within the ticker-prd_yr BY groups by anndats id (otherwise an error message would occur). So, in particular, the selected observations will be those with the latest anndats (within their ticker-prd_yr BY group).
Note that you defined variable id in a way that ticker-prd_yr-anndats-id BY groups are identical with ticker-prd_yr-anndats BY groups.
You need to add ID to your PROC SORT and DATA step BY statements. As long as you include it as one of the BY variables, and it can be in any position, you can use LAST.ID.
Thanks for your quick response.
I have inserted 'id' in the code. However, the result is the same as before. ID is not assigned properly and last.id is not working. I want to remain the last observation (red colored). Any help will be highly appreciated.
proc sort data=m2b; by ticker prd_yr anndats; run;
proc freq data=m2b; tables ticker*prd_yr*anndats/noprint out=m2c;
data m2c; set m2c; id=_n_; run;
proc sort data=m2c; by ticker prd_yr anndats id; run;
data m2d; merge m2b m2c; by ticker prd_yr anndats;
proc sort data=m2d; by ticker prd_yr anndats id; run;
data m3; set m2d; by ticker prd_yr anndats id;
if last.id; run;
000R | 2014 | 12 | 2014-05-01 | 1 | EPS | ANN | -1 | -0.9 | -0.9000 | -0.95000 |
000R | 2014 | 12 | 2014-07-31 | 2 | EPS | ANN | -1.17 | -1.04 | -1.0400 | -1.10500 |
000R | 2014 | 12 | 2014-10-29 | 3 | EPS | ANN | -1.07 | -1.04 | -1.0400 | -1.05500 |
000R | 2015 | 12 | 2015-03-19 | 4 | EPS | ANN | -0.72 | -0.53 | -0.5300 | -0.62500 |
000R | 2015 | 12 | 2015-08-04 | 5 | EPS | ANN | -0.68 | -0.6 | -0.6000 | -0.64000 |
000R | 2015 | 12 | 2015-10-29 | 6 | EPS | ANN | -0.57 | -0.54 | -0.5400 | -0.55500 |
Hi @joon1
You get the expected result from your code.
If you want the red lines only, data should still be sorted as specifies to get the "correct" last record,
but the subsetting if-statment should be "if last.prd_mon".
Hello @joon1,
If you want to select the last observation of each ticker-prd_yr BY group, use if last.prd_yr.
data want;
set m2d;
by ticker prd_yr anndats id;
if last.prd_yr;
run;
The result will be the same with the shorter BY statement
by ticker prd_yr;
but including anndats id implies a check that dataset M2D is sorted by ticker prd_yr and within the ticker-prd_yr BY groups by anndats id (otherwise an error message would occur). So, in particular, the selected observations will be those with the latest anndats (within their ticker-prd_yr BY group).
Note that you defined variable id in a way that ticker-prd_yr-anndats-id BY groups are identical with ticker-prd_yr-anndats BY groups.
It works! Thank you so much, FreelanceReinh.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.