I've a data set and would like to find the most frequent value from variables, along with max number of response and the count for the most frequent value.
data have; input id 1 fy2004 $ 3-4 fy2005 $ 6-7 fy2006 $ 9-10 fy2007 $ 12-13 mostfreq $ 17-18 count 20 total_count 22;
datalines;
1 13 13 13 13 13 4 4
2 14 14 14 14 3 3
3 12 12 12 05 12 3 4
4 01 01 02 01 2 4
5 01 01 12 12 02 2 4
;
data want;
set have;
length MostFreq $2;
array list fy:;
array _t[10] _temporary_;
call missing(of _t[*]);
do i=1 to dim(list);
if list[i] ne '' then _t[input(list[i],2.)]+1;
end;
Count=max(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
drop i;
run;
The code came from https://communities.sas.com/t5/General-SAS-Programming/Find-most-frequent-response-across-multiple-v.... I modified it but don't know why it didn't work on my data. I also need a third variable for total count of responses.
Much appeciated for your help.
@Solph wrote:
Sorry. I've a follow up question. Hope you can help out. What if my smallest number is 00, not 01. The code wouldn't work when the starting number is 00. Thanks in advance.
No problem. As @Reeza said, define the temporary array as
array _t[0:99] _temporary_;
in order to count the number of '00' responses in _t[0], the number of '01' responses in _t[1], ..., and the number of '99' responses (if any) in _t[99]. There is one more change required, though: The WHICHN function used in the definition of MostFreq now yields numbers 1, 2, 3, ... if the maximum count is found in _t[0], _t[1], _t[2], ... To compensate for that, we have to subtract 1 from those values. Then they match the responses again.
MostFreq=put(whichn(Count, of _t[*])-1,z2.);
Hello @Solph,
Happy to see that my 2016 solution is still in use. 🙂
Back then your possible response values were '01' - '10', which was the reason for dimension 10 of the temporary array _t. Now, apparently, you need at least dimension 14. If you like, you can also use 99 to be on the safe side (e.g. for future additional questions). Note that this array contains the count for response '01' in its first element (_t[1]), the count for response '02' in the second and so on.
For the total count you can insert the line
Total_Count=dim(list)-cmiss(of list[*]);
This will result in the number of non-missing values among the variables in array list.
Edit: Of course, a shorter formula for the total count is
Total_Count=sum(of _t[*]);
However, there is one case where the two would differ: If all responses are missing, the first formula gives 0, whereas the second results in a missing value together with the familiar note "Missing values were generated ..." in the log (which would be triggered anyway by the formula for Count, though).
Thanks a lot. It worked perfectly. Don't know why I didn't catch it. The code below reflected your input and now worked.
data want;
set have;
length MostFreq $2;
array list fy:;
array _t[99] _temporary_; *Set the max possible value to be 99;
call missing(of _t[*]);
do i=1 to dim(list);
if list[i] ne '' then _t[input(list[i],2.)]+1;
end;
Count=max(of _t[*]);
Total_Count=sum(of _t[*]);
MostFreq=put(whichn(Count, of _t[*]),z2.);
drop i;
run;
Sorry. I've a follow up question. Hope you can help out. What if my smallest number is 00, not 01. The code wouldn't work when the starting number is 00. Thanks in advance.
data have; input id 1 fy2004 $ 3-4 fy2005 $ 6-7 fy2006 $ 9-10 fy2007 $ 12-13 _Mostfreq $ 17-18 _Count 20 _Total_count 22;
datalines;
1 00 13 13 13 13 3 4
2 14 14 14 14 3 3
3 12 12 12 05 12 3 4
4 01 01 02 01 2 3
5 00 00 12 12 00 2 4
;
@Solph wrote:
Sorry. I've a follow up question. Hope you can help out. What if my smallest number is 00, not 01. The code wouldn't work when the starting number is 00. Thanks in advance.
No problem. As @Reeza said, define the temporary array as
array _t[0:99] _temporary_;
in order to count the number of '00' responses in _t[0], the number of '01' responses in _t[1], ..., and the number of '99' responses (if any) in _t[99]. There is one more change required, though: The WHICHN function used in the definition of MostFreq now yields numbers 1, 2, 3, ... if the maximum count is found in _t[0], _t[1], _t[2], ... To compensate for that, we have to subtract 1 from those values. Then they match the responses again.
MostFreq=put(whichn(Count, of _t[*])-1,z2.);
Thanks a lot FreelanceReinhard and Reeza. They totally addressed my needs. So the code below worked! Thanks so much.
data have; input id 1 fy2004 $ 3-4 fy2005 $ 6-7 fy2006 $ 9-10 fy2007 $ 12-13 _Mostfreq $ 17-18 _Count 20 _Total_count 22; datalines; 1 00 13 13 13 13 3 4 2 14 14 14 14 3 3 3 12 12 12 05 12 3 4 4 01 01 02 01 2 3 5 00 00 12 12 00 2 4 ; data want; set have; length MostFreq $2; array list fy:; array _t[0:99] _temporary_; *Set the max possoble value, in this example 99; call missing(of _t[*]); do i=1 to dim(list); if list[i] ne '' then _t[input(list[i],2.)]+1; end; Count=max(of _t[*]); Total_Count=sum(of _t[*]); *MostFreq=put(whichn(Count, of _t[*]),z2.); MostFreq=put(whichn(Count, of _t[*])-1,z2.); drop i; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.