Hi All,
Can anybody please guide me on how to find the MODE in a SAS DATA step?
With the help of PROC UNIVARIATE or PROC MEANS I can get the answer, but I want to find it in a DATA step. See the example below:
There are 5 columns and 10 observations in the table. The columns are a1, a2, a3, a4, a5.
I want to find the mean and mode for each of the 10 rows, based on those five columns.
Data AA;
set ZZ;
Avg = Mean(a1, a2, a3, a4, a5);
Mode = ? /* how to find the mode, the same way as the mean? */
run;
Thanks,
KP
MODE: i.e., the most frequently occurring value
Maybe I'm misreading something, but wouldn't PROC MEANS or PROC UNIVARIATE give you the mode down the column and not across the columns?
I don't know of a mode function in the DATA step, and to get it (as I understand it) from PROC UNIVARIATE or PROC MEANS I think you would have to transpose the data.
EJ
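For reference, the transpose route mentioned above could look roughly like the sketch below. It assumes the input table ZZ with numeric columns a1-a5 from the question; the names row_id, zz_id, long and row_modes are made up for illustration. How PROC MEANS reports a tie between several modes should be checked in its documentation.

```sas
/* Sketch only: assumes table ZZ with numeric columns a1-a5.        */
/* row_id, zz_id, long and row_modes are hypothetical names.        */
data zz_id;
  set zz;
  row_id = _n_;                     /* identifies the original row  */
run;

/* One output row per (row_id, column); the values land in COL1.   */
proc transpose data=zz_id out=long (rename=(col1=value));
  by row_id;
  var a1-a5;
run;

/* The mode down the transposed column is the mode across the row. */
proc means data=long noprint;
  by row_id;
  var value;
  output out=row_modes (drop=_type_ _freq_) mode=mode;
run;
```

The result has one row per original observation, so it can be merged back onto ZZ_ID by row_id if the mode is needed alongside the original columns.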
Hi EJ,
Yes, you are right. PROC UNIVARIATE can give me the answer vertically, but I am looking for it horizontally.
Transposing the table would work, but practically it's not possible, as I am talking about finding the MODE for each of more than 100,000 rows.
There is no simple one-line MODE for the DATA step. I think that is partly because there would have to be tie-breaking rule decisions. For your data, if all of the variables have different values, which is the mode? What if two variables share one value and two more share a different value, such as 1 1 3 5 5?
If you really (I mean really) want one, here is one. It can be tweaked to work on character variables as well. If there is a tie, it chooses randomly (I think):
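To make the tie-breaking question concrete, here is a minimal sketch of one possible rule: on a tie, take the smallest of the tied values. It is not a built-in function, just one way to pin the rule down. It assumes the table ZZ with numeric columns a1-a5 from the question; the helper variables _s1-_s5, _i and _run are hypothetical. Missing values are treated like any other value here, and CALL SORTN places them first.

```sas
/* Sketch only: assumes table ZZ with numeric columns a1-a5.       */
data want (drop=_i _run _s1-_s5);
  set zz;
  array vals {5} a1-a5;
  array s    {5} _s1-_s5;            /* working copy to be sorted   */

  do _i = 1 to 5;
    s{_i} = vals{_i};
  end;
  call sortn(of s{*});               /* ascending order             */

  /* Scan runs of equal neighbours in the sorted copy.             */
  mode     = s{1};
  maxcount = 1;
  _run     = 1;
  do _i = 2 to 5;
    if s{_i} = s{_i-1} then _run = _run + 1;
    else _run = 1;
    /* Strict ">" keeps the earlier (smaller) value on a tie.      */
    if _run > maxcount then do;
      maxcount = _run;
      mode     = s{_i};
    end;
  end;
run;
```

For the row 1 1 3 5 5, the values 1 and 5 both occur twice, and this rule returns 1. Changing the comparison to ">=" would return the largest tied value instead.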
data have;
input v1-v5;
cards;
1 1 1 2 3
1 2 2 3 4
1 2 3 4 5
1 2 3 2 4
;
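data want (drop=rc rename=(value=mode));
  /* h counts how often each distinct value occurs in the current row */
  declare hash h();
  h.definekey('value');
  h.definedata('value','count');
  h.definedone();
  /* h1 is keyed by count in descending order, so its first item has the highest count */
  declare hash h1(ordered:'d');
  h1.definekey('count');
  h1.definedata('value','count');
  h1.definedone();
  declare hiter hi('h');
  declare hiter hi1('h1');
  set have;
  array v v:;
  do over v;
    if h.find(key:v) ne 0 then do; count=1; value=v; h.replace(); end;
    else do; value=v; count+1; h.replace(); end;
  end;
  /* copy every (value, count) pair from h into h1 */
  do rc=hi.first() by 0 while (rc=0);
    h1.replace();
    rc=hi.next();
  end;
  /* load the value with the highest count into VALUE, which is renamed to MODE */
  hi1.first();
run;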
Haikuo
Haikuo,
Won't your program have a pretty big performance issue as the number of observations grows? Would it be better to declare the hashes only once and clear them at the end of each DATA step iteration, ready for the next observation?
Good point, DN. Honestly, I have no idea which is more efficient: re-declaring the hashes, or calling h.clear() for each observation. I have seen both; this one was chosen merely because it makes the code shorter.
Thanks for pointing it out, and OP, be aware of this if you use my code.
Haikuo
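For comparison, a declare-once variant might look like the sketch below. It is a simplified rewrite, not Haikuo's code: it uses a single hash, tracks the running maximum while counting, and so needs neither the second hash nor the iterators. It assumes the HAVE data set with v1-v5 from above; i and maxcount are names introduced for illustration. Whether it is actually faster was not measured in this thread.

```sas
/* Sketch only: assumes HAVE with numeric variables v1-v5.         */
data want (drop=rc value count i);
  if _n_ = 1 then do;                /* create the hash only once   */
    declare hash h();
    h.definekey('value');
    h.definedata('value', 'count');
    h.definedone();
  end;
  set have;
  array v {*} v1-v5;

  h.clear();                         /* empty it for this row       */
  mode     = .;
  maxcount = 0;
  do i = 1 to dim(v);
    value = v{i};
    if h.find() ne 0 then count = 1; /* first time this value seen  */
    else count = count + 1;          /* find() loaded the old count */
    rc = h.replace();
    /* Strict ">": on a tie, the value that reached the count first wins. */
    if count > maxcount then do;
      maxcount = count;
      mode     = value;
    end;
  end;
run;
```

With the sample rows above this gives modes 1, 2, 1 and 2. In the third row every value occurs once, so the first one is kept.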
Here's the DATA step version. To simplify things, I assume you know how many numeric variables you will want to process. If need be, macro language could count them anyway.
data want;
  set have;
  /* 150 is the assumed number of numeric variables in HAVE */
  array nums {150} _numeric_;
  array counts {150} _temporary_;
  do _n_=1 to 149;
    counts{_n_}=1;
    /* count later elements equal to the current one */
    do _i_=_n_+1 to 150;
      if nums{_n_} = nums{_i_} then counts{_n_} + 1;
    end;
    if counts{_n_} > maxcount then do;
      mode = nums{_n_};
      maxcount = counts{_n_};
    end;
  end;
run;
In the case of ties, it just takes the first one.
Good luck.
Two second thoughts here. First, it's unfair to call this the DATA step solution since both are DATA step solutions. How about the ARRAY solution? And second, there's no need for a second array (unless you need to locate ties for the mode). A single variable would do. So ...
data want;
  set have;
  array nums {150} _numeric_;
  do _n_=1 to 149;
    count=1;
    do _i_ = _n_+1 to 150;
      if nums{_n_} = nums{_i_} then count + 1;
    end;
    if count > maxcount then do;
      mode = nums{_n_};
      maxcount = count;
    end;
  end;
run;
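On the point about knowing the number of variables in advance: one alternative to counting them with macro language is to let SAS size the array with {*} and use DIM() for the loop bounds. The sketch below is a variation on the ARRAY solution under that assumption. The names _i, _j and _count are made up for illustration, and it assumes HAVE does not already contain variables called mode or maxcount.

```sas
/* Sketch only: the array is sized from the numeric variables in HAVE. */
data want (drop=_i _j _count);
  set have;
  /* Only numeric variables defined so far (those read from HAVE)  */
  /* are included, because the array statement follows SET.        */
  array nums {*} _numeric_;

  mode     = .;
  maxcount = 0;
  do _i = 1 to dim(nums);
    _count = 1;
    do _j = _i + 1 to dim(nums);     /* compare with later elements */
      if nums{_j} = nums{_i} then _count = _count + 1;
    end;
    /* Strict ">" keeps the first value found on a tie.             */
    if _count > maxcount then do;
      maxcount = _count;
      mode     = nums{_i};
    end;
  end;
run;
```

As in the versions above, ties go to the first value encountered, and missing values are counted like any other value.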