Hi,
How do I find the most frequent response for each person across a list of variables? Ideally I'd like to know the frequency count of the "most frequent response".
If it's a tie, either would be fine (although it would be nice if there is a way to choose without long SAS programming code)
Also my data are string variables (with values 01-10) data but if code-wise it's easier to do numerical, I can make them numerical.
data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 01 01 02 02
5 01
;
run;
Want:
ID X1 X2 X3 X4 X5 MostFreq Count
1 07 04 07 07 07 07 3
2 04 05 04 04 05 04 3
3 02 02 03 02 2
4 01 01 02 02 01 2 (or 02 2)
5 05 05 1
Hi @Solph,
Try this:
data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;
This is for character variables x1, x2, ... For numeric variables it could be simplified.
Sorry, posted it before I finished composing. My question was if there is some efficient way to do it? I don't know if SAS has some ready to use code to do multiple response (like SPSS has) and write out the values into the datatset, or like the max/min functions (xx=max(of x1-x5) that can easily identify the max/min values out of a list of variables. The code I can come up with to get the most frequent response is a really really long one.
Thanks.
The fastest way is to transpose your data, from wide to long, use a proc freq and then re-transpose to desired structure.
Do you need help coding that?
Proc transpose can flip your data each way.
You meant to have the vertical data? I think it worked perfectly. Here is what I did after transposing. Let me know if there is a better way.
proc sort; by id;
proc freq; by id; table x /out=xx (drop=percent); run;
proc print noobs; run;
It outputted ID, x (01-10) and count for each ID and each X value.
id | x | COUNT |
---|---|---|
1 | 04 | 1 |
1 | 07 | 4 |
2 | 04 | 3 |
2 | 05 | 2 |
Thanks again.
I would recommend an approach that gives you a little more flexibility ... get 5 counts instead of 1 maximum count. For example:
data want;
set have;
array x {5};
array count {5};
do _i_=1 to 5;
count{_i_}=0;
do _n_=1 to 5;
if x{_i_} = x{_n_} then count{_i_} + 1;
end;
end;
run;
At the end of this, COUNT1 will be the number of times that the X1 answer appears, COUNT2 the number of times that the X2 answer appears, etc. You can easily add to this DATA step with statements such as these:
maxcount = max(of count1-count5);
maxvar = whichn(maxcount, of count1-count5);
varname = vname(x{maxvar});
This way you can easily detect ties for the maximum count, detect answers that only appear once, etc.
I thought about using array to count but got stuck for the rest. I liked the code because it indeed gives more flexibliity for further manipulation, but
- I need to assign a new variable associating varname values (e.g. x1 in the first row) to varaible name x1's value of 07, so that I can create a variable with a value of 07? How do I do it
- How come count for ID=5 didn't look right? Is it because it's counting missing? (Looks like it, for ID 3 and 4 too).
id | x1 | x2 | x3 | x4 | x5 | count1 | count2 | count3 | count4 | count5 | _i_ | maxcount | maxvar | varname |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 07 | 04 | 07 | 07 | 07 | 4 | 1 | 4 | 4 | 4 | 6 | 4 | 1 | x1 |
2 | 04 | 05 | 04 | 04 | 05 | 3 | 2 | 3 | 3 | 2 | 6 | 3 | 1 | x1 |
3 | 02 | 02 | 03 | 2 | 2 | 2 | 1 | 2 | 6 | 2 | 1 | x1 | ||
4 | 01 | 01 | 02 | 02 | 2 | 2 | 2 | 1 | 2 | 6 | 2 | 1 | x1 | |
5 | 01 | 1 | 4 | 4 | 4 | 4 | 6 | 4 | 2 | x2 |
A few variations, then ...
If you don't want to count missing values, change one line in the program:
if x{_i_} > ' ' then do _n_=1 to 5;
The counts for the missing values will remain at zero.
It looks like you can get MAXCOUNT, MAXVAR, and VARNAME. To also get MAXVALUE, you could utilize MAXVAR:
maxvalue = x{maxvar};
Hi @Solph,
Try this:
data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;
This is for character variables x1, x2, ... For numeric variables it could be simplified.
data want(drop=i);
set have;
array x{*} x:;
do i=1 to dim(x);
MostFreq=max(of x(*));
Count=countc(cats(of x(*)),MostFreq,'i');
end;
run;
Thanks all for the various approaches.
FreelanceReinhard's code is ready to go, for the most frequent response. Thanks a lot (so that I don't need to write additional code or revise code).
stat_sas's code works when I want the max value and its count.
I've filed away all for future reference. Thanks again for your help.
You are right. I thougt this is based on maximum number but this is not the case.
If your variable are Alphabeta NOT digit , try the following code :
data have;
infile cards truncover;
input id ( x1 x2 x3 x4 x5) ($) ;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 . 03 .
4 01 01 02 02
5 01 . . .
;
run;
data want;
if _n_=1 then do;
length key $ 40;
declare hash h();
declare hiter hi('h');
h.definekey('key');
h.definedata('key','count');
h.definedone();
end;
set have;
array xx{*} $ x:;
do i=1 to dim(xx);
if not missing(xx{i}) then do;
key=xx{i};
if h.find()=0 then count+1;
else count=1;
h.replace();
end;
end;
most_freq=0;
do while(hi.next()=0);
if most_freq lt count then do;
mode=key;
most_freq=count;
end;
end;
h.clear();
drop i key count ;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.