Hi SAS Community,
I have a data set where I am trying to find the max grade for each ID. This is simple enough using Proc SQL and then using a group by statement.
There are two problems that I am running into , the first is I am trying to identify the model where the max grade comes from and assign that model for each row with the same ID. In the example below, for ID #1 model b has a higher grade of 2 than a that only has a grade of 1, so model b would be assigned as the max_ model for ID #1..
The second problem that I am trying to tackle is the same as the first problem with an added nuance, If there are multiple models that have the same highest grade for the ID, I want to pick the one one with the highest amount value. In the example below, for ID #2 has two models have the same max grade of 3. Since model a has a higher value, I would assign a to every row for ID 2
This is an example of my data
id model grade amount
1 a 1 10
1 b 2 10
2 a 3 20
2 b 3 10
This is what I envision my data to look like in the end
id model grade amount max_grade max_model
1 a 1 10 2 b
1 b 2 10 2 b
2 a 3 20 3 a
2 b 3 10 3 a
Please reach out if you have any questions or need me to clarify anything.
Best,
Tom
You will need a data step:
proc sort data=have; by id descending grade descending amount; run;
data want;
set have;
by id;
retain max_grade max_model;
if first.id then do;
max_grade = grade;
max_model = model;
end;
run;
solves both problems.
PROC SUMMARY allows you to identify the ID value (where ID can be any variable) that produces the maximum. Here is an example:
As far as the second problem, I think you'd have to do some coding in a DATA step or SQL.
Use BY group + SORT
Something like this, untested.
proc sort data=have;
by ID descending grade descending amount;
run;
data max;
set have;
by ID ;
if first.ID;
rename grade=max_grade model = max_model;
drop amount;
run;
data want;
merge have max;
by ID;
run;
@Tommy1 wrote:
Hi SAS Community,
I have a data set where I am trying to find the max grade for each ID. This is simple enough using Proc SQL and then using a group by statement.
There are two problems that I am running into , the first is I am trying to identify the model where the max grade comes from and assign that model for each row with the same ID. In the example below, for ID #1 model b has a higher grade of 2 than a that only has a grade of 1, so model b would be assigned as the max_ model for ID #1..
The second problem that I am trying to tackle is the same as the first problem with an added nuance, If there are multiple models that have the same highest grade for the ID, I want to pick the one one with the highest amount value. In the example below, for ID #2 has two models have the same max grade of 3. Since model a has a higher value, I would assign a to every row for ID 2
This is an example of my data
id model grade amount
1 a 1 10
1 b 2 10
2 a 3 20
2 b 3 10
This is what I envision my data to look like in the end
id model grade amount max_grade max_model
1 a 1 10 2 b
1 b 2 10 2 b
2 a 3 20 3 a
2 b 3 10 3 a
Please reach out if you have any questions or need me to clarify anything.
Best,
Tom
For huge data sets, wouldn't a SORT be more time consuming than PROC SUMMARY?
I think this tackles it:
data have;
id=1; model='a'; grade=1; amount=10; output;
id=1; model='b'; grade=2; amount=10; output;
id=2; model='a'; grade=3; amount=20; output;
id=2; model='b'; grade=3; amount=10; output;
run;
proc sort data=have out=have_sort;
by id descending grade descending amount model;
run;
data want (drop=max_amount);
set have_sort;
by id;
retain max_grade max_amount max_model;
if first.id then do; max_grade = grade; max_model = model; max_amount=amount; end;
if grade > max_grade then do; max_grade = grade; max_model = model; end;
else if grade=max_grade and amount > max_amount then do; max_grade = grade; max_model = model; end;
run;
You could also do this in proc sql qith nested queries, but that second criteria would be a bit tricky.
Not hard in SQL if you know what to group and boolean expressions
data have;
input id model $ grade amount;
cards;
1 a 1 10
1 b 2 10
2 a 3 20
2 b 3 10
;
proc sql;
create table want(drop=t) as
select *, max(t) as max_model
from
(select *,ifc(max(amount)=amount,model,' ') as t
from
(select *, max(grade) as max_grade from have group by id)
group by id,max_grade)
group by id,max_grade;
quit;
data have;
input id model $ grade amount;
cards;
1 a 1 10
1 b 2 10
2 a 3 20
2 b 3 10
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("max_grade") ;
h.definedata ("_amt", "_m") ;
h.definedone () ;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
length max_model _m $32;
if grade<max_grade then continue;
max_grade=grade;
max_model=model;
if h.find()=0 and _amt>amount then max_model=_m;
else do; _amt=amount;_m=model;end;
h.replace();
end;
do _n_=1 to _n_;
set have;
output;
end;
drop _:;
h.clear();
run;
You will need a data step:
proc sort data=have; by id descending grade descending amount; run;
data want;
set have;
by id;
retain max_grade max_model;
if first.id then do;
max_grade = grade;
max_model = model;
end;
run;
solves both problems.
Wow! Thank you all so much for all the solutions! The knowledge and speed that you responded to my post blew me away. This shows just how wonderful a place the SAS community is. I wish I could accept multiple solutions because there were so many different ways that you all helped me approach the problem. I am not sure if there is a certain etiquette for which solution to pick, but I went with the one that was the most simple and easiest for me to implement with my data.
Thank You!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.