Find most frequent response across multiple variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 101
Accepted Solution

Find most frequent response across multiple variables

Hi,

 

How do I find the most frequent response for each person across a list of variables? Ideally I'd like to know the frequency count of the "most frequent  response". 

 

 

If it's a tie, either would be fine (although it would be nice if there is a way to choose without long SAS programming code)

Also my data are string variables (with values 01-10) data but if code-wise it's easier to do numerical, I can make them numerical.

 

data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02    03
4 01 01 02 02
5 01
;
run;

 

Want:
ID X1 X2 X3 X4 X5 MostFreq Count
1 07 04 07 07 07   07 3
2 04 05 04 04 05   04 3
3 02 02 03             02 2
4 01 01 02 02        01  2 (or 02  2)
5 05                        05  1


Accepted Solutions
Solution
‎05-11-2016 03:35 PM
Trusted Advisor
Posts: 1,115

Re: Find most frequent response across multiple variables

Hi @Solph,

 

Try this:

data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
  if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;

This is for character variables x1, x2, ... For numeric variables it could be simplified.

View solution in original post


All Replies
Frequent Contributor
Posts: 101

Re: Find the most frequent response across multiple variables

Sorry, posted it before I finished composing. My question was if there is some efficient way to do it? I don't know if SAS has some ready to use code to do multiple response (like SPSS has) and write out the values into the datatset, or like the max/min functions (xx=max(of x1-x5) that can easily identify the max/min values out of a list of variables. The code I can come up with to get the most frequent response is a really really long one.

 

Thanks. 

Super User
Posts: 17,734

Re: Find the most frequent response across multiple variables

The fastest way is to transpose your data, from wide to long,  use a proc freq and then re-transpose to desired structure.  

 

Do you need help coding that? 

Proc transpose can flip your data each way. 

Frequent Contributor
Posts: 101

Re: Find the most frequent response across multiple variables

You meant to have the vertical data? I think it worked perfectly. Here is what I did after transposing. Let me know if there is a better way.


proc sort; by id;
proc freq; by id; table x /out=xx (drop=percent); run;
proc print noobs; run;

 

It outputted ID, x (01-10) and count for each ID and each X value.

 

 

id x COUNT
1 04 1
1 07 4
2 04 3
2 05 2

 

Thanks again.

Super User
Posts: 5,069

Re: Find most frequent response across multiple variables

I would recommend an approach that gives you a little more flexibility ... get 5 counts instead of 1 maximum count.  For example:

 

data want;

set have;

array x {5};

array count {5};

do _i_=1 to 5;

count{_i_}=0;

do _n_=1 to 5;

   if x{_i_} = x{_n_} then count{_i_} + 1;

end;

end;

run;

 

At the end of this, COUNT1 will be the number of times that the X1 answer appears, COUNT2 the number of times that the X2 answer appears, etc.  You can easily add to this DATA step with statements such as these:

 

maxcount = max(of count1-count5);

maxvar = whichn(maxcount, of count1-count5);

varname = vname(x{maxvar});

 

This way you can easily detect ties for the maximum count, detect answers that only appear once, etc.

Frequent Contributor
Posts: 101

Re: Find most frequent response across multiple variables

I thought about using array to count but got stuck for the rest. I liked the code because it indeed gives more flexibliity for further manipulation, but

- I need to assign a new variable associating varname values (e.g. x1 in the first row) to varaible name x1's value of 07, so that I can create a variable with a value of 07? How do I do it

- How come count for ID=5 didn't look right? Is it because it's counting missing? (Looks like it, for ID 3 and 4 too).

 

 

id x1 x2 x3 x4 x5 count1 count2 count3 count4 count5 _i_ maxcount maxvar varname
1 07 04 07 07 07 4 1 4 4 4 6 4 1 x1
2 04 05 04 04 05 3 2 3 3 2 6 3 1 x1
3 02   02 03   2 2 2 1 2 6 2 1 x1
4 01 01 02   02 2 2 2 1 2 6 2 1 x1
5 01         1 4 4 4 4 6 4 2 x2

 

Super User
Posts: 5,069

Re: Find most frequent response across multiple variables

A few variations, then ...

 

If  you don't want to count missing values, change one line in the program:

 

if x{_i_} > ' ' then do _n_=1 to 5;

 

The counts for the missing values will remain at zero.

 

It looks like you can get MAXCOUNT, MAXVAR, and VARNAME.  To also get MAXVALUE, you could utilize MAXVAR:

 

maxvalue = x{maxvar};

Solution
‎05-11-2016 03:35 PM
Trusted Advisor
Posts: 1,115

Re: Find most frequent response across multiple variables

Hi @Solph,

 

Try this:

data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
  if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;

This is for character variables x1, x2, ... For numeric variables it could be simplified.

Trusted Advisor
Posts: 1,204

Re: Find most frequent response across multiple variables

data want(drop=i);
set have;
array x{*} x:;
do i=1 to dim(x);
MostFreq=max(of x(*));
Count=countc(cats(of x(*)),MostFreq,'i');
end;
run;

Trusted Advisor
Posts: 1,115

Re: Find most frequent response across multiple variables

[ Edited ]
stat_sas wrote:

MostFreq=max(of x(*));


@stat_sas: Why should the largest value coincide with the most frequent value?

Frequent Contributor
Posts: 101

Re: Find most frequent response across multiple variables

Thanks all for the various approaches.

 

FreelanceReinhard's code is ready to go, for the most frequent response. Thanks a lot (so that I don't need to write additional code or revise code).
stat_sas's code works when I want the max value and its count.

 

I've filed away all for future reference. Thanks again for your help.

Trusted Advisor
Posts: 1,204

Re: Find most frequent response across multiple variables

You are right. I thougt this is based on maximum number but this is not the case.

Super User
Posts: 9,662

Re: Find most frequent response across multiple variables

If your variable are Alphabeta NOT digit , try the following code :

 

data have; 
infile cards truncover;
input id ( x1  x2  x3  x4  x5) ($) ;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02  .  03 .
4 01 01 02 02
5 01 . . .
;
run;

data want;
if _n_=1 then do;
 length key $ 40;
 declare hash h();
 declare hiter hi('h');
 h.definekey('key');
 h.definedata('key','count');
 h.definedone();
end;
 set have;
 array xx{*} $ x:;
 do i=1 to dim(xx);
  if not missing(xx{i}) then do;
   key=xx{i};
   if h.find()=0 then count+1;
    else count=1;
   h.replace();
  end;
 end;
 most_freq=0;
 do while(hi.next()=0);
  if most_freq lt count then do;
   mode=key;
   most_freq=count;
  end;
 end;
h.clear();
drop i key count ;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 691 views
  • 0 likes
  • 6 in conversation