KrunalPatel
Calcite | Level 5

Hi All,

Can anybody please guide me on how to find the MODE in a SAS DATA step?

With PROC UNIVARIATE or PROC MEANS I can get the answer, but I want to find it in a SAS DATA step. See the example below:

There are 5 columns and 10 observations in the table. Columns: a1, a2, a3, a4, a5

I want to find the mean and the mode for each of the 10 rows, based on those five columns.

Data AA;

set ZZ;

Avg = Mean(a1, a2, a3, a4, a5);

Mode = ? /* how do I find the mode, the same way as the mean? */

run;

Thanks,

KP

9 REPLIES
KrunalPatel
Calcite | Level 5

MODE: i.e., the most frequently occurring value

esjackso
Quartz | Level 8

Maybe I'm misreading something, but wouldn't MEANS or UNIVARIATE give you the mode down the column and not across the columns?

I don't know of a mode function in the DATA step, and to get it (as I understand it) from UNIVARIATE or MEANS I think you would have to transpose the data.

EJ

KrunalPatel
Calcite | Level 5

Hi EJ,

Yes, you are right: PROC UNIVARIATE can give me the answer vertically, but I am looking for it horizontally.

Transposing the table would work, but in practice it's not feasible, as I am talking about finding the MODE for each of more than 100,000 rows.

ballardw
Super User

There is no simple one-line MODE for the DATA step, I think partly because there would have to be tie-breaking rule decisions. For your data, if all of the variables have different values, which is the mode? What if two variables share one value and two more share a different value, such as 1 1 3 5 5?
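To make the tie-breaking question concrete, here is a minimal sketch of one possible rule: sort a copy of the row, then look for the longest run of equal values, so that the smallest value wins a tie. It assumes the original poster's data set ZZ with columns a1-a5; the scratch variables _s1-_s5, prev, and runlen are hypothetical names introduced only for this illustration.

```sas
data want (drop=i prev runlen _s:);
  set zz;
  array a {5} a1-a5;
  array s {5} _s1-_s5;            /* scratch copy so a1-a5 keep their order */
  do i = 1 to dim(a);
    s{i} = a{i};
  end;
  call sortn(of s{*});            /* ascending sort; missing values come first */
  mode = .;
  maxcount = 0;
  prev = .;
  runlen = 0;
  do i = 1 to dim(s);
    if missing(s{i}) then continue;   /* ignore missing values */
    if s{i} = prev then runlen = runlen + 1;
    else runlen = 1;
    /* strict > means the smallest value wins a tie */
    if runlen > maxcount then do;
      maxcount = runlen;
      mode = s{i};
    end;
    prev = s{i};
  end;
run;
```

With this rule, the row 1 1 3 5 5 gives mode=1 and maxcount=2. A row whose values are all different ends with maxcount=1, which can be used to flag "no real mode" if that is the decision you prefer.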

Haikuo
Onyx | Level 15

If you really (I mean really) want one, here is one. It can be tweaked to work on character values as well. If there is a tie, it chooses randomly (I think):

data have;
  input v1-v5;
  cards;
1 1 1 2 3
1 2 2 3 4
1 2 3 4 5
1 2 3 2 4
;

data want (drop=rc rename=(value=mode));
  /* h counts how often each value occurs in the current row */
  declare hash h();
  h.definekey('value');
  h.definedata('value','count');
  h.definedone();
  /* h1 re-keys those entries by count, highest count first */
  declare hash h1(ordered:'d');
  h1.definekey('count');
  h1.definedata('value','count');
  h1.definedone();
  declare hiter hi('h');
  declare hiter hi1('h1');
  set have;
  array v v:;
  do over v;
    if h.find(key:v) ne 0 then do; count=1; value=v; h.replace(); end;
    else do; value=v; count+1; h.replace(); end;
  end;
  /* copy every (value, count) pair into h1; equal counts overwrite each other */
  do rc=hi.first() by 0 while (rc=0);
    h1.replace();
    rc=hi.next();
  end;
  /* the first entry of h1 holds the highest count, so value is the mode */
  hi1.first();
run;

Haikuo

data_null__
Jade | Level 19

Haikuo,

Will your program have a pretty big performance issue as the number of observations grows? Would it be better to declare the hashes only one time and clear them at the end of each DATA step iteration, ready for the next observation?

Haikuo
Onyx | Level 15

Good point, DN. Honestly, I have no idea which is more efficient: re-declaring the hashes or calling h.clear() for each obs. I have seen both; I chose this one merely because the code is shorter.

Thanks for pointing it out. OP, be aware of this if you use my code.
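For comparison, the declare-once alternative discussed above might look like the sketch below. It is not a benchmarked replacement: it assumes the same HAVE data set with v1-v5, uses a single hash that is emptied with the CLEAR method at the start of each observation, and tracks the running maximum in ordinary variables instead of a second ordered hash. The names i and maxcount are introduced here for illustration.

```sas
data want (drop=i rc value count maxcount);
  if _n_ = 1 then do;
    /* create the hash once, on the first iteration only */
    declare hash h();
    h.definekey('value');
    h.definedata('count');
    h.definedone();
  end;
  set have;
  array v {*} v1-v5;
  rc = h.clear();                 /* empty the hash for this observation */
  maxcount = 0;
  mode = .;
  do i = 1 to dim(v);
    if missing(v{i}) then continue;   /* ignore missing values */
    value = v{i};
    if h.find() ne 0 then count = 0;  /* value not seen yet in this row */
    count = count + 1;
    rc = h.replace();
    /* strict > means the value that reaches the top count first wins a tie */
    if count > maxcount then do;
      maxcount = count;
      mode = value;
    end;
  end;
run;
```

Unlike the two-hash version, the tie rule here is explicit: among values with the same frequency, the one that reaches that frequency first (reading left to right) is kept. Whether this is actually faster than re-declaring the hashes would need to be timed on the real data.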

Haikuo

Astounding
PROC Star

Here's the DATA step version.  To simplify things, I assume you know how many numeric variables you will want to process.  If need be, macro language could count them anyway.

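data want;
  set have;
  array nums {150} _numeric_;        /* assumes exactly 150 numeric variables */
  array counts {150} _temporary_;
  do _n_=1 to 149;
      counts{_n_}=1;
      /* count later occurrences of nums{_n_} */
      do _i_=_n_+1 to 150;
          if nums{_n_} = nums{_i_} then counts{_n_} + 1;
      end;
      /* maxcount starts each row as missing, so the first comparison is true */
      if counts{_n_} > maxcount then do;
         mode = nums{_n_};
         maxcount = counts{_n_};
      end;
  end;
run;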

In the case of ties, it just takes the first one.

Good luck.

Astounding
PROC Star

Two second thoughts here.  First, it's unfair to call this the DATA step solution since both are DATA step solutions.  How about the ARRAY solution?  And second, there's no need for a second array (unless you need to locate ties for the mode).  A single variable would do.  So ...

data want;

  set have;

  array nums {150} _numeric_;

  do _n_=1 to 149;

     count=1;

     do _i_ = _n_+1 to 150;

        if nums{_n_} = nums{_i_} then count + 1;

     end;

     if count > maxcount then do;

        mode = nums{_n_};

        maxcount = count;

     end;

  end;

run;
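If the number of variables is not known in advance, one option that avoids macro code is to let the array size itself and loop to DIM(). The sketch below adapts the single-counter version to the original poster's columns a1-a5 in ZZ; those names come from the question, while i, j, count, and maxcount are helper variables that are dropped from the output.

```sas
data want (drop=i j count maxcount);
  set zz;
  array nums {*} a1-a5;           /* {*} lets SAS count the elements */
  maxcount = 0;
  mode = .;
  do i = 1 to dim(nums);
    if missing(nums{i}) then continue;   /* ignore missing values */
    count = 0;
    /* count nums{i} itself plus every later match */
    do j = i to dim(nums);
      if nums{j} = nums{i} then count = count + 1;
    end;
    /* strict > keeps the first value in column order on a tie */
    if count > maxcount then do;
      mode = nums{i};
      maxcount = count;
    end;
  end;
run;
```

Replacing a1-a5 with another variable list changes the set of columns without touching the loops.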
