02-10-2012 07:23 AM
I'm trying to use PROC MEANS to calculate the mode and write it to an output data set, but it seems that when a value of the classification variable occurs only once, the mode comes back as missing.
Two questions: is that correct, and is there any way around it?
A little bit of code is attached to demonstrate.
Many thanks in advance for your help!
02-10-2012 08:26 AM
Not a perfect alternative, but the following works for all cases except D, where no value occurs more than once:
proc univariate data=test modes;
   by char; var num;
   output out=test2 mode=mode;
run;
02-10-2012 12:10 PM
Thanks, Art. That's the solution I came up with in the meantime, but it takes between a quarter and a third longer to run, so I was hoping I was missing something simple in PROC MEANS.
But at least I can say I have tried to find out when my customers start chuntering!
02-10-2012 05:20 PM
The only difference appears to be when N=1, so you can just set MODE equal to the value itself (with one observation, that value is also the MIN, SUM, MEAN, and MAX).
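data test;
   input char $ num @@;
   datalines;
A 9 B 8 B 8 B 7 B 7 C 8 D 6 D 5 E 5 E 5
;
proc summary data=test nway;
   class char; var num;
   output out=test2 mode=mode n=n min=min;
run;
data test2; set test2;
   if n eq 1 then mode=min;   /* with N=1 the only value is the mode */
run;
proc univariate data=test noprint;
   by char; var num;
   output out=test3 mode=mode n=n min=min;
run;
proc compare base=test2 compare=test3; run;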
02-10-2012 05:23 PM
Thanks! I was wondering what you were referring to.
There is another problem, though: PROC MEANS and PROC SUMMARY don't output multiple modes when they exist. As I recall, they simply report the lowest one.
02-10-2012 05:57 PM
To get multimodal output you need to use the ODS OUTPUT data set from PROC UNIVARIATE. I can't recall the table name.
I'm not sure when the mode is an interesting statistic. I can't recall ever putting it into a pharma data summary.
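A minimal sketch of that approach, assuming the ODS table produced by the MODES option is named Modes (my reading of the PROC UNIVARIATE ODS table list). The output data set name allmodes is just illustrative, and the sample data is the one used elsewhere in this thread.

```sas
/* Sample data from this thread */
data test;
   input char $ num;
   datalines;
A 9
B 8
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;

/* BY processing needs the data sorted by the BY variable */
proc sort data=test;
   by char;
run;

/* MODES lists every mode; ODS OUTPUT saves that table to a data set.
   Do not add NOPRINT here, or there is no ODS table to capture. */
ods select Modes;
ods output Modes=allmodes;
proc univariate data=test modes;
   by char;
   var num;
run;

proc print data=allmodes;
run;
```

Group B (two 8s and two 7s) should appear with both modes listed. Check how the groups with no repeated value (A, C, D) come out before relying on this.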
02-10-2012 06:01 PM
It is definitely of interest if one is trying to use statistics to analyze data, as a multimodal distribution could easily violate a statistic's assumptions. Of course, I am not a statistician.
02-12-2012 10:18 PM
I think SAS is right.
The mode is the value with the highest frequency. So if you have only one value, or several values share the same frequency, SAS doesn't know which value is the mode.
data test;
   input char $ num;
   datalines;
A 9
B 8
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;
proc freq data=test noprint;
   table char*num / out=freq nopercent nocum;
run;
proc sort data=freq;
   by char descending count;
run;
data want(keep=char num);
   set freq;
   by char;
   if first.char;
run;
02-12-2012 10:42 PM
Ksharp, I don't understand what you are saying. If there is an N-way tie for the most frequent response, all N values involved in the tie constitute the modes of the distribution.
02-13-2012 04:49 AM
Oops... my bad.
data test;
   input char $ num;
   datalines;
A 9
B 9
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;
proc freq data=test noprint;
   table char*num / out=freq nopercent nocum;
run;
proc sort data=freq;
   by char descending count;
run;
data want(keep=char num);
   set freq;
   by char count notsorted;
   retain found;
   if first.char then found=0;
   if not found then output;
   if last.count then found=1;
run;
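For comparison, the same tie-aware rule (every value sharing the highest count is a mode) can be sketched in PROC SQL on the same sample data. This is an alternative technique, not the approach posted above: it counts each CHAR/NUM pair, then keeps every pair whose count equals the group maximum. It relies on PROC SQL remerging the group maximum back onto each row, which SAS allows and reports with a NOTE in the log. The table names counts and want are illustrative.

```sas
/* Same sample data as the program above */
data test;
   input char $ num;
   datalines;
A 9
B 9
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;

proc sql;
   /* frequency of each value within each CHAR group */
   create table counts as
      select char, num, count(*) as freq
      from test
      group by char, num;
   /* keep every value tied for the highest frequency in its group */
   create table want as
      select char, num, freq
      from counts
      group by char
      having freq = max(freq)
      order by char, num;
quit;
```

With this data, want should hold six rows: A 9, B 7, C 8, D 5, D 6, and E 5. D keeps both values because they tie at one occurrence each.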