Desktop productivity for business analysts and programmers

select max from group of data

Reply
Not applicable
Posts: 0

select max from group of data

I have data that looks like this:

Claim# LossYr RptYr Amount
A 6 6 10000
A 6 7 20000
B 7 7 10000
B 7 8 5000
C 7 7 10000
C 7 8 10000
E 8 8 20000
F 7 7 5000
F 7 8 5000
G 7 7 5000
H 6 6 10000
H 6 7 5000

What I would like is to only see, for each claim #, the row relating to the highest report year (the value in the third column)

So for A, I only want to see the 2nd row. For G, there is only 1 row, and I want to see that row. Any suggestions?
Not applicable
Posts: 0

Re: select max from group of data

Posted in reply to deleted_user
What I tried to do was select all the data and select max for report year and group by everything else. That did not work.
Not applicable
Posts: 0

Re: select max from group of data

Posted in reply to deleted_user
I actually just thought of a way to do it, but I don't particularly like it. I created an ID using the function "cats" using the first 3 variables. The I created a new table which did not have the amount column and had the max from column 3 (report year). Then I recreated the ID and joined back to the original dataset below. It actually does exactly what I want, but I feel like there should be a more elegant way. Any input would be greatly appreciated.

I feel like I'm talking to myself here!
Community Manager
Posts: 3,457

Re: select max from group of data

Posted in reply to deleted_user
There might be a fancy way to do it via the Rank task with a Group By variable.

Use the Rank task to rank RptYr by Claim#. Set the option to rank Largest to Smallest. The result will have a rank_RptYr variable with values like 1,2, etc.

Then use Query Builder to subset with just the rows where rank_RptYr = 1.

Chris
Community Manager
Posts: 3,457

Re: select max from group of data

Posted in reply to ChrisHemedinger
Here's the program *generated* by EG with my point and click approach (includes your sample data).

[pre]
data claims;
length claim $ 1 LossYr 3 ReptYr 3 Amount 8;
input claim LossYr ReptYr Amount;
datalines;
A 6 6 10000
A 6 7 20000
B 7 7 10000
B 7 8 5000
C 7 7 10000
C 7 8 10000
E 8 8 20000
F 7 7 5000
F 7 8 5000
G 7 7 5000
H 6 6 10000
H 6 7 5000
;

/* generated by Rank Data task */
PROC SORT
DATA=WORK.CLAIMS
OUT=WORK.SORTTempTableSorted
;
BY claim;
RUN;
PROC RANK DATA = WORK.SORTTempTableSorted
DESCENDING
TIES=MEAN
OUT=WORK.RANKEDCLAIMS;
BY claim;
VAR ReptYr;
RANKS rank_ReptYr ;

/* generated by Query Builder */
PROC SQL;
CREATE TABLE WORK.RecentClaims AS
SELECT t1.claim,
t1.LossYr,
t1.ReptYr,
t1.Amount
FROM WORK.RANKEDCLAIMS AS t1
WHERE t1.rank_ReptYr = 1;
QUIT;
[/pre]
Not applicable
Posts: 0

Re: select max from group of data

Posted in reply to deleted_user
More talking to myself.

I'm in the process of reading "The Little SAS Book" and I just stumbled across another way that I think I could do what I want. It talks about an automatic variable called FIRST.variable and LAST.variable. I feel like I could use LAST.Claim# and as long as I sort by claim number and report year, that would work. However, I'm not sure if this functions exists in SAS EG or not and if so, how do I use it.
Community Manager
Posts: 3,457

Re: select max from group of data

Posted in reply to deleted_user
And yes, a simple DATA step can do the job as well. Sort the data by claim and RptYr, then use the BY statement and FIRST. syntax to keep just the top years.

Chris
Not applicable
Posts: 0

Re: select max from group of data

Posted in reply to deleted_user
Thanks for your help Chris. I used the Rank Task and it worked great.
Ask a Question
Discussion stats
  • 7 replies
  • 491 views
  • 0 likes
  • 2 in conversation