Hi Guys,
I'm trying to use PROC MEANS to calculate the mode and write it to an output dataset, but it seems that if a level of the classification variable has only one observation, it returns a missing value.
Two questions: is that correct? And is there any way around this?
Little bit of code attached to demonstrate.
Many thanks in advance for your help!
Not a perfect alternative, but the following will work for all cases except D, where no value is repeated:
proc univariate data=test modes;
class char;
var num;
output out=test2 mode=mode;
run;
Thanks Art, that's a solution I have come up with in the meantime, but it takes between a quarter and a third longer to execute, so I was hoping I was missing something really simple in PROC MEANS.
But at least I can say I have tried to find out when my customers start chuntering!
Thanks again.
If n=1, set mode equal to sum, mean, min or max.
DN: Please explain your comment.
The only difference appears to be when N=1, so you can just set MODE equal to that single value (MIN, SUM, MEAN and MAX are all the same when N=1).
data test;
input char $ num;
datalines;
A 9
B 8
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;
proc summary data=test nway;
class char;
var num;
output out=test2 mode=mode n=n min=min;
run;
data test2;
set test2;
if n eq 1 then mode=min;
run;
proc print;
run;
proc univariate data=test noprint;
class char;
var num;
output out=test3 mode=mode n=n min=min;
run;
Proc print;
run;
Proc compare base=test2 compare=test3;
run;
Thanks! I was wondering what you were referring to.
There is another problem, though, in that proc means and summary don't output multiple modes when they exist. They simply select (as I recall) the lowest value.
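A quick way to check that recollection is a small tie case. This is only a sketch: the data set names `tie` and `tie_mode` are made up for illustration, and the values are a hypothetical two-way tie. As far as I know the documentation says the lowest mode is reported when there is more than one, so the printed MODE would be expected to be the smaller of the two tied values, but it is worth confirming on your own release.

```sas
/* Hypothetical data: 7 and 8 each occur twice, so both are modes. */
data tie;
  input num @@;
  datalines;
8 8 7 7
;
run;

/* Ask PROC MEANS for the mode and write it to an output data set. */
proc means data=tie noprint;
  var num;
  output out=tie_mode mode=mode;
run;

/* Only one MODE value is written, even though two values are tied. */
proc print data=tie_mode;
run;
```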
To get multi-modal output you need to use the ODS OUTPUT data set from UNIVARIATE. I can't recall the name.
I don't know when the mode is an interesting statistic. I can't recall ever putting it into a pharma data summary.
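For reference, the ODS OUTPUT approach might look something like the sketch below. It assumes the TEST data set posted earlier in the thread, and it assumes the ODS table produced by the MODES option is called `Modes`. ODS TRACE ON writes the actual table names to the log, so the name can be confirmed before relying on it. The output data set name `all_modes` is made up for illustration.

```sas
/* Write ODS table names to the log so the assumed name can be verified. */
ods trace on;

/* Assumption: the table of modes is named Modes.         */
/* all_modes is a hypothetical name for the captured data. */
ods output Modes=all_modes;

/* Do not use NOPRINT here: it suppresses the ODS tables too. */
proc univariate data=test modes;
  class char;
  var num;
run;

ods trace off;

/* Inspect the captured modes for each CHAR group. */
proc print data=all_modes;
run;
```

How groups with no repeated value (such as D in the sample data) show up in this table is something to check in the log and the printed output rather than assume.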
It is definitely of interest if one is trying to use stats to analyze their data, as a multi-modal distribution could easily violate the statistic's assumptions. Of course, I am not a statistician.
Multi what?
OK, besides not being a programmer or statistician, I can't type either! I changed multi-nomal to multi-modal in my response.
I think SAS is right.
The mode is the value with the highest frequency. So if you have only one value, or
several values with the same frequency, SAS doesn't know which value is the mode.
data test;
input char $ num;
datalines;
A 9
B 8
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;
proc freq data=test noprint;
table char*num/out=freq nopercent nocum;
run;
proc sort data=freq;
by char descending count;
run;
data want(keep=char num);
set freq;
by char;
if first.char;
run;
Ksharp
Ksharp, I don't understand what you are saying. If there is an N-way tie for the most frequent response, all N values involved in the tie constitute the modes of the distribution.
Oops, my bad.
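data test;
input char $ num;
datalines;
A 9
B 9
B 8
B 7
B 7
C 8
D 6
D 5
E 5
E 5
;
run;
/* Count how often each value occurs within each CHAR group. */
proc freq data=test noprint;
table char*num/out=freq nopercent nocum;
run;
/* Put the most frequent values first within each group. */
proc sort data=freq;
by char descending count;
run;
/* Keep every value that shares the highest count in its group. */
data want(keep=char num);
set freq;
by char count notsorted;
retain found;
if first.char then found=0;
if not found then output;
if last.count then found=1;
run;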
Ksharp