
Finding the maximum value of each unique ID while keeping all other columns


Hi, I am using SAS to try to calculate the maximum value of a variable for each unique person. However, I want to retain the other columns of the dataset for the row that contains that maximum value. I've tried to use PROC SQL to accomplish this, but I have no idea how to make sure that the other columns also transfer over. For example, my dataset has 250,000 rows, but only 32,000 unique individuals. I want to find the maximum date for each unique person, and for the row that contains the maximum, I also want to retain all other columns. This is my code so far:

 

proc sql;
    create table test as
    select ID, max(date_var) from dataset
    group by 1;
quit;

Any help would be great. Thanks.


Re: Finding the maximum value of each unique ID while keeping all other columns


PROC SORT followed by a DATA step works easily as well: sort by ID and the variable of interest, then keep the last row of each ID group.

 

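/* Sort so the largest value of variable is the last row within each Id */
proc sort data=have;
    by Id variable;
run;

/* Keep only the last row per Id, i.e. the row holding the maximum */
data want;
    set have;
    by Id;
    if last.Id;
run;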

However, if the maximum value may occur on more than one row within an ID and you want to keep all of those rows, SQL is easier. You can use the HAVING clause to keep only the rows of interest.

 

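proc sql;
    create table want as
    /* SAS remerges the group maximum back onto every row of the group */
    select *, max(variable) as max_value
    from have
    group by Id
    /* keep only the rows whose value equals the maximum for their Id */
    having max(variable) = variable;
quit;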

 





 


Re: Finding the maximum value of each unique ID while keeping all other columns

@corkee

The Proc Sort / Data Step option @Reeza posted will likely perform better than the SQL on a dataset of this size.

 

If an ID can have more than one record with the same maximum date_var, the SQL will return all of those rows, whereas the Proc Sort / Data Step will return only one row per ID. Your pick!
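
To see the difference on tied maximums, here is a minimal sketch using made-up data. The dataset names (have, sorted, want_one, want_all) and the note column are illustrative assumptions, not taken from the original post; only ID and date_var come from the question.

```sas
/* Hypothetical sample data: ID 1 has two rows sharing its maximum date */
data have;
    input ID date_var :date9. note $;
    format date_var date9.;
    datalines;
1 01JAN2020 a
1 15MAR2020 b
1 15MAR2020 c
2 10FEB2020 d
;
run;

/* Approach 1: sort, then keep the last row per ID (one row per ID) */
proc sort data=have out=sorted;
    by ID date_var;
run;

data want_one;
    set sorted;
    by ID;
    if last.ID;
run;

/* Approach 2: SQL with HAVING keeps every row that ties for the maximum */
proc sql;
    create table want_all as
    select *
    from have
    group by ID
    having date_var = max(date_var);
quit;
```

With this sample data, want_one should contain two rows (one per ID), while want_all should contain three, because both 15MAR2020 rows for ID 1 are kept. Writing the sorted data to a separate dataset with out= is a choice made here so that have is left untouched.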
