Hi,
I started learning SAS one week ago. I know some basics, such as DATA steps, loops, the BY statement and how it creates the FIRST. and LAST. variables, and some procedures such as PROC SORT.
I have a dataset divided into groups, and each group has a different number of observations.
Here is an example of what that dataset looks like:
x y
a 0
a 5
a 22
b 22
b 3
c 5
c 22
c 14
c 9
So in this case, 22 appears in all 3 groups, more than any other value, so the result should be 22.
I just want to know the idea of how to implement that. I thought of comparing each observation with all the other observations, but that makes no sense for a big dataset.
I think we need a little more information. Are you just looking to get the mode? Or are you trying to only keep records that have that max value? Here's one way to show the most frequent value, but I don't know if it's what you want.
data have;
input x $ y;
datalines;
a 0
a 5
a 22
b 22
b 3
c 5
c 22
c 14
c 9
;
run;
/* Count how often each y occurs; ORDER=FREQ puts the most frequent value first */
proc freq data = have order = freq noprint;
tables y / out = want;
run;
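If the same y can occur more than once within an x group, the PROC FREQ step above counts rows rather than groups. Here is a minimal sketch of one way around that, assuming the `have` dataset from above. The names `pairs` and `group_counts` are made up for illustration.

```sas
/* Keep one row per (x, y) pair so a value repeated inside a group counts once */
proc sort data=have out=pairs nodupkey;
    by x y;
run;

/* Each remaining row is one group that contains y, so COUNT = number of groups */
proc freq data=pairs order=freq noprint;
    tables y / out=group_counts;
run;
```

With the sample data there are no repeats within a group, so this gives the same counts as the plain PROC FREQ step.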
Can the same Y value appear more than once within an X group? What if several values tie for most frequent?
Something like this could get you started:
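/* OUTOBS=1 keeps only the first row of the sorted result */
proc sql outobs=1;
create table want as
/* number of distinct groups (x) in which each y appears */
select y, count(distinct x) as num_groups
from have
group by y
/* most groups first; ties broken by the smaller y */
order by 2 desc, 1;
quit;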
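Because of OUTOBS=1, the query above returns a single row even when several values tie for the top count. If you would rather keep every tied value, one possible sketch is below. It again assumes the `have` dataset, and `y_groups` and `want_ties` are illustrative names only.

```sas
proc sql;
    /* Count the distinct groups (x) that contain each value of y */
    create table y_groups as
    select y, count(distinct x) as num_groups
    from have
    group by y;

    /* Keep every y whose count equals the overall maximum, so ties survive */
    create table want_ties as
    select y, num_groups
    from y_groups
    where num_groups = (select max(num_groups) from y_groups);
quit;
```

For the sample data only 22 reaches the maximum of 3 groups, so `want_ties` would hold one row.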
You're essentially looking for distinct counts per item rather than per group, so reframing the question that way makes it fairly trivial.
https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas
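Since you already know the BY statement and the FIRST./LAST. variables, the same distinct count can also be sketched with PROC SORT and a DATA step. This is only one possible approach, not necessarily what the linked tutorial does, and `pairs_by_y` and `counts` are names invented for this example.

```sas
/* One row per (y, x) pair, ordered by y so each y forms a BY group */
proc sort data=have out=pairs_by_y nodupkey;
    by y x;
run;

data counts;
    set pairs_by_y;
    by y;
    if first.y then num_groups = 0;  /* reset at the start of each y */
    num_groups + 1;                  /* sum statement: value is retained across rows */
    if last.y then output;           /* one row per y with its group count */
    drop x;
run;

/* Put the value that appears in the most groups on the first row */
proc sort data=counts;
    by descending num_groups y;
run;
```

The NODUPKEY sort does the "distinct" part, and the DATA step does the counting, so no observation is ever compared against every other observation.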