Desktop productivity for business analysts and programmers

select max from group of data

Reply
N/A
Posts: 0

select max from group of data

I have data that looks like this:

Claim# LossYr RptYr Amount
A 6 6 10000
A 6 7 20000
B 7 7 10000
B 7 8 5000
C 7 7 10000
C 7 8 10000
E 8 8 20000
F 7 7 5000
F 7 8 5000
G 7 7 5000
H 6 6 10000
H 6 7 5000

What I would like is to only see, for each claim #, the row relating to the highest report year (the value in the third column)

So for A, I only want to see the 2nd row. For G, there is only 1 row, and I want to see that row. Any suggestions?
N/A
Posts: 0

Re: select max from group of data

What I tried to do was select all the data and select max for report year and group by everything else. That did not work.
N/A
Posts: 0

Re: select max from group of data

I actually just thought of a way to do it, but I don't particularly like it. I created an ID using the function "cats" using the first 3 variables. The I created a new table which did not have the amount column and had the max from column 3 (report year). Then I recreated the ID and joined back to the original dataset below. It actually does exactly what I want, but I feel like there should be a more elegant way. Any input would be greatly appreciated.

I feel like I'm talking to myself here!
Community Manager
Posts: 2,882

Re: select max from group of data

There might be a fancy way to do it via the Rank task with a Group By variable.

Use the Rank task to rank RptYr by Claim#. Set the option to rank Largest to Smallest. The result will have a rank_RptYr variable with values like 1,2, etc.

Then use Query Builder to subset with just the rows where rank_RptYr = 1.

Chris
Community Manager
Posts: 2,882

Re: select max from group of data

Here's the program *generated* by EG with my point and click approach (includes your sample data).

[pre]
data claims;
length claim $ 1 LossYr 3 ReptYr 3 Amount 8;
input claim LossYr ReptYr Amount;
datalines;
A 6 6 10000
A 6 7 20000
B 7 7 10000
B 7 8 5000
C 7 7 10000
C 7 8 10000
E 8 8 20000
F 7 7 5000
F 7 8 5000
G 7 7 5000
H 6 6 10000
H 6 7 5000
;

/* generated by Rank Data task */
PROC SORT
DATA=WORK.CLAIMS
OUT=WORK.SORTTempTableSorted
;
BY claim;
RUN;
PROC RANK DATA = WORK.SORTTempTableSorted
DESCENDING
TIES=MEAN
OUT=WORK.RANKEDCLAIMS;
BY claim;
VAR ReptYr;
RANKS rank_ReptYr ;

/* generated by Query Builder */
PROC SQL;
CREATE TABLE WORK.RecentClaims AS
SELECT t1.claim,
t1.LossYr,
t1.ReptYr,
t1.Amount
FROM WORK.RANKEDCLAIMS AS t1
WHERE t1.rank_ReptYr = 1;
QUIT;
[/pre]
N/A
Posts: 0

Re: select max from group of data

More talking to myself.

I'm in the process of reading "The Little SAS Book" and I just stumbled across another way that I think I could do what I want. It talks about an automatic variable called FIRST.variable and LAST.variable. I feel like I could use LAST.Claim# and as long as I sort by claim number and report year, that would work. However, I'm not sure if this functions exists in SAS EG or not and if so, how do I use it.
Community Manager
Posts: 2,882

Re: select max from group of data

And yes, a simple DATA step can do the job as well. Sort the data by claim and RptYr, then use the BY statement and FIRST. syntax to keep just the top years.

Chris
N/A
Posts: 0

Re: select max from group of data

Thanks for your help Chris. I used the Rank Task and it worked great.
Ask a Question
Discussion stats
  • 7 replies
  • 218 views
  • 0 likes
  • 2 in conversation