I need to find the most frequent level of character variables by group, and the sum of several numeric variables by the same group. If the most frequent level is not unique, I would like the last level with in the group (the data is sorted). Here is an example of what I want to acheive:
Original data:
id | city | color | numeric_variable |
123 | New York | blue | 4 |
123 | LA | blue | 3 |
123 | Seattle | green | 2 |
456 | New York | yellow | 3 |
456 | LA | green | 2 |
456 | LA | red | 3 |
456 | Seattle | blue | 5 |
789 | Seattle | yellow | 2 |
789 | New York | green | 3 |
789 | Seattle | yellow | 5 |
789 | Seattle | yellow | 2 |
789 | Seattle | green | 4 |
Desired outcome:
id | city | color | sum |
123 | Seattle | blue | 9 |
456 | LA | blue | 13 |
789 | Seattle | yellow | 16 |
For the sum, using group by id and SUM() within PROC SQL works well. Is there a MODE function within PROC SQL?
I am using Enterprise Guide version 5.1 (5.100.0.12019) (32-bit).
Thank you!
I'm not aware of any mode function (unfortunately).
But you can calculate the mode by using the proc univariate.
Not sure though if can calculate all your measures in one go.
Well it can be done in SQL but it ain't straightforward
data a;
input id city &:$20. color :$8. num;
datalines;
123 New York blue 4
123 LA blue 3
123 Seattle green 2
456 New York yellow 3
456 LA green 2
456 LA red 3
456 Seattle blue 5
789 Seattle yellow 2
789 New York green 3
789 Seattle yellow 5
789 Seattle yellow 2
789 Seattle green 4
;
data T;
set a;
order = _n_;
run;
proc sql;
create table cityModes as
select id, city
from
(select id, city, last
from
(select id, city, count(*) as n, max(order) as last
from T
group by id, city)
group by id
having n = max(n))
group by id
having last=max(last);
create table colorModes as
select id, color
from
(select id, color, last
from
(select id, color, count(*) as n, max(order) as last
from T
group by id, color)
group by id
having n = max(n))
group by id
having last=max(last);
create table want as
select T.id, a.city, b.color, sum(T.num) as sum
from
T inner join
cityModes as a on T.id=a.id inner join
colorModes as b on a.id=b.id
group by T.id, a.city, b.color;
select * from want;
quit;
Thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.