Solved: Re: Find most frequent response across multiple variables

Solph · Posted 05-11-2016 12:03 PM

Hi,

How do I find the most frequent response for each person across a list of variables? Ideally I'd like to know the frequency count of the "most frequent response".

If it's a tie, either would be fine (although it would be nice if there is a way to choose without long SAS programming code)

Also my data are string variables (with values 01-10) data but if code-wise it's easier to do numerical, I can make them numerical.

data have;
input id 1 x1 $ 4-5 x2 $ 7-8 x3 $ 10-11 x4 $ 13-14 x5 $ 16-17;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02 03
4 01 01 02 02
5 01
;
run;

Want:
ID X1 X2 X3 X4 X5 MostFreq Count
1 07 04 07 07 07 07 3
2 04 05 04 04 05 04 3
3 02 02 03 02 2
4 01 01 02 02 01 2 (or 02 2)
5 05 05 1

FreelanceReinh · Posted 05-11-2016 12:33 PM

Hi @Solph,

Try this:

data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
  if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;

This is for character variables x1, x2, ... For numeric variables it could be simplified.

View solution in original post

Solph · Posted 05-11-2016 12:11 PM

Sorry, posted it before I finished composing. My question was if there is some efficient way to do it? I don't know if SAS has some ready to use code to do multiple response (like SPSS has) and write out the values into the datatset, or like the max/min functions (xx=max(of x1-x5) that can easily identify the max/min values out of a list of variables. The code I can come up with to get the most frequent response is a really really long one.

Thanks.

Reeza · Posted 05-11-2016 12:13 PM

The fastest way is to transpose your data, from wide to long, use a proc freq and then re-transpose to desired structure.

Do you need help coding that?

Proc transpose can flip your data each way.

Solph · Posted 05-11-2016 12:50 PM

You meant to have the vertical data? I think it worked perfectly. Here is what I did after transposing. Let me know if there is a better way.

proc sort; by id;
proc freq; by id; table x /out=xx (drop=percent); run;
proc print noobs; run;

It outputted ID, x (01-10) and count for each ID and each X value.

id	x	COUNT
1	04	1
1	07	4
2	04	3
2	05	2

Thanks again.

Astounding · Posted 05-11-2016 12:32 PM

I would recommend an approach that gives you a little more flexibility ... get 5 counts instead of 1 maximum count. For example:

data want;

set have;

array x {5};

array count {5};

do _i_=1 to 5;

count{_i_}=0;

do _n_=1 to 5;

if x{_i_} = x{_n_} then count{_i_} + 1;

end;

run;

At the end of this, COUNT1 will be the number of times that the X1 answer appears, COUNT2 the number of times that the X2 answer appears, etc. You can easily add to this DATA step with statements such as these:

maxcount = max(of count1-count5);

maxvar = whichn(maxcount, of count1-count5);

varname = vname(x{maxvar});

This way you can easily detect ties for the maximum count, detect answers that only appear once, etc.

Solph · Posted 05-11-2016 01:02 PM

I thought about using array to count but got stuck for the rest. I liked the code because it indeed gives more flexibliity for further manipulation, but

- I need to assign a new variable associating varname values (e.g. x1 in the first row) to varaible name x1's value of 07, so that I can create a variable with a value of 07? How do I do it

- How come count for ID=5 didn't look right? Is it because it's counting missing? (Looks like it, for ID 3 and 4 too).

id	x1	x2	x3	x4	x5	count1	count2	count3	count4	count5	_i_	maxcount	maxvar	varname
1	07	04	07	07	07	4	1	4	4	4	6	4	1	x1
2	04	05	04	04	05	3	2	3	3	2	6	3	1	x1
3	02		02	03		2	2	2	1	2	6	2	1	x1
4	01	01	02		02	2	2	2	1	2	6	2	1	x1
5	01					1	4	4	4	4	6	4	2	x2

Astounding · Posted 05-11-2016 01:14 PM

A few variations, then ...

If you don't want to count missing values, change one line in the program:

if x{_i_} > ' ' then do _n_=1 to 5;

The counts for the missing values will remain at zero.

It looks like you can get MAXCOUNT, MAXVAR, and VARNAME. To also get MAXVALUE, you could utilize MAXVAR:

maxvalue = x{maxvar};

FreelanceReinh · Posted 05-11-2016 12:33 PM

Hi @Solph,

Try this:

data want;
set have;
length MostFreq $2;
array x x:;
array _t[10] _temporary_;
call missing(of _t[*]);
do _n_=1 to dim(x);
  if x[_n_] ne ' ' then _t[input(x[_n_],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
run;

This is for character variables x1, x2, ... For numeric variables it could be simplified.

stat_sas · Posted 05-11-2016 12:46 PM

data want(drop=i);
set have;
array x{*} x:;
do i=1 to dim(x);
MostFreq=max(of x(*));
Count=countc(cats(of x(*)),MostFreq,'i');
end;
run;

FreelanceReinh · Posted 05-11-2016 12:54 PM

@stat_sas wrote:

MostFreq=max(of x(*));

@stat_sas: Why should the largest value coincide with the most frequent value?

Solph · Posted 05-11-2016 03:40 PM

Thanks all for the various approaches.

FreelanceReinhard's code is ready to go, for the most frequent response. Thanks a lot (so that I don't need to write additional code or revise code).
stat_sas's code works when I want the max value and its count.

I've filed away all for future reference. Thanks again for your help.

stat_sas · Posted 05-12-2016 01:27 PM

You are right. I thougt this is based on maximum number but this is not the case.

Ksharp · Posted 05-11-2016 10:33 PM

If your variable are Alphabeta NOT digit , try the following code :

data have; 
infile cards truncover;
input id ( x1  x2  x3  x4  x5) ($) ;
cards;
1 07 04 07 07 07
2 04 05 04 04 05
3 02 02  .  03 .
4 01 01 02 02
5 01 . . .
;
run;

data want;
if _n_=1 then do;
 length key $ 40;
 declare hash h();
 declare hiter hi('h');
 h.definekey('key');
 h.definedata('key','count');
 h.definedone();
end;
 set have;
 array xx{*} $ x:;
 do i=1 to dim(xx);
  if not missing(xx{i}) then do;
   key=xx{i};
   if h.find()=0 then count+1;
    else count=1;
   h.replace();
  end;
 end;
 most_freq=0;
 do while(hi.next()=0);
  if most_freq lt count then do;
   mode=key;
   most_freq=count;
  end;
 end;
h.clear();
drop i key count ;
run;

Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find the most frequent response across multiple variables

Re: Find the most frequent response across multiple variables

Re: Find the most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

Re: Find most frequent response across multiple variables

id	x1	x2	x3	x4	x5	count1	count2	count3	count4	count5	_i_	maxcount	maxvar	varname
1	07	04	07	07	07	4	1	4	4	4	6	4	1	x1
2	04	05	04	04	05	3	2	3	3	2	6	3	1	x1
3	02		02	03		2	2	2	1	2	6	2	1	x1
4	01	01	02		02	2	2	2	1	2	6	2	1	x1
5	01					1	4	4	4	4	6	4	2	x2

id	x1	x2	x3	x4	x5	count1	count2	count3	count4	count5	_i_	maxcount	maxvar	varname
1	07	04	07	07	07	4	1	4	4	4	6	4	1	x1
2	04	05	04	04	05	3	2	3	3	2	6	3	1	x1
3	02		02	03		2	2	2	1	2	6	2	1	x1
4	01	01	02		02	2	2	2	1	2	6	2	1	x1
5	01					1	4	4	4	4	6	4	2	x2

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away

id	x1	x2	x3	x4	x5	count1	count2	count3	count4	count5	_i_	maxcount	maxvar	varname
1	07	04	07	07	07	4	1	4	4	4	6	4	1	x1
2	04	05	04	04	05	3	2	3	3	2	6	3	1	x1
3	02		02	03		2	2	2	1	2	6	2	1	x1
4	01	01	02		02	2	2	2	1	2	6	2	1	x1
5	01					1	4	4	4	4	6	4	2	x2