Finding MODE in SAS data step


06-06-2013 06:54 AM

Hi All,

Can anybody please guide me on how to find the MODE in a SAS DATA step?

With PROC UNIVARIATE or PROC MEANS I can get the answer, but I want to find it in a DATA step. See the example below:

There are 5 columns and 10 observations in the table. The columns are a1, a2, a3, a4, a5.

I want to find the mean and mode for each of the 10 rows, based on those five columns.

    Data AA;
      set ZZ;
      Avg = Mean(a1, a2, a3, a4, a5);
      Mode = ? /* how to find mode as like mean?*/
    run;

Thanks,

KP

06-06-2013 06:55 AM

MODE, i.e. the most frequently occurring value.

06-06-2013 08:16 AM

Maybe I'm misreading something, but wouldn't MEANS or UNIVARIATE give you the mode down each column rather than across the columns?

I don't know of a MODE function in the DATA step, and to get the row-wise mode (as I understand what you want) from UNIVARIATE or MEANS, I think you would have to transpose the data.
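For comparison, a minimal sketch of the transpose route described above, assuming the OP's column names a1-a5 and a small made-up HAVE data set; ROW_ID, LONG, ROW_MODES and WANT are hypothetical names. It relies on the MODE statistic keyword of PROC MEANS, leaves tie handling to PROC MEANS, and has not been benchmarked on anything like 100,000 rows.

```sas
/* hypothetical sample data using the OP's column names */
data have;
  input a1-a5;
  datalines;
1 1 1 2 3
1 2 2 3 4
1 2 3 2 4
;
run;

/* add a row identifier so each original row stays its own BY group */
data have_id;
  set have;
  row_id = _n_;
run;

/* one output row per (row_id, column) pair; COL1 holds the value */
proc transpose data=have_id out=long(rename=(col1=value));
  by row_id;
  var a1-a5;
run;

/* MODE statistic computed within each original row */
proc means data=long noprint;
  by row_id;
  var value;
  output out=row_modes(drop=_type_ _freq_) mode=mode;
run;

/* put the mode back next to the original columns */
data want;
  merge have_id row_modes;
  by row_id;
run;
```

The extra passes over the data (transpose, summarize, merge) are what the OP objects to below; the DATA step answers further down avoid them.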

EJ

06-06-2013 12:06 PM

Hi EJ,

Yes, you are right: PROC UNIVARIATE can give me the answer vertically (down each column), but I am looking for it horizontally (across each row).

Transposing the table would work, but it is not practical, as I need to find the MODE for each of more than 100,000 rows.

06-06-2013 11:48 AM

There is no simple one-line MODE function for the DATA step, partly, I think, because it would need tie-breaking rules. For your data, if all of the variables have different values, which one is the mode? And what if two variables share one value and two others share a different value, such as 1 1 3 5 5?
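To make the tie question concrete, here is a sketch that picks one explicit rule (report the smallest of the tied values) and also returns how many distinct values tie for the highest count, so ties can be spotted afterwards. The rule, the variable names MAXCOUNT and N_MODES, and the sample rows are assumptions for illustration; missing values are treated as just another value.

```sas
/* hypothetical rows: a unique mode, the 1 1 3 5 5 tie, and all-distinct values */
data have;
  input a1-a5;
  datalines;
1 1 1 2 3
1 1 3 5 5
1 2 3 4 5
;
run;

data want (drop=i j c seen);
  set have;
  array a {*} a1-a5;

  /* pass 1: the highest frequency of any value in the row */
  maxcount = 0;
  do i = 1 to dim(a);
    c = 0;
    do j = 1 to dim(a);
      if a{j} = a{i} then c + 1;
    end;
    maxcount = max(maxcount, c);
  end;

  /* pass 2: count the distinct values reaching MAXCOUNT and keep the smallest */
  n_modes = 0;
  mode = .;
  do i = 1 to dim(a);
    /* skip a value that already appeared earlier in the row */
    seen = 0;
    do j = 1 to i - 1;
      if a{j} = a{i} then seen = 1;
    end;
    if seen then continue;
    c = 0;
    do j = 1 to dim(a);
      if a{j} = a{i} then c + 1;
    end;
    if c = maxcount then do;
      n_modes + 1;
      if missing(mode) or a{i} < mode then mode = a{i};
    end;
  end;
run;
```

For 1 1 3 5 5 this reports MODE=1 with N_MODES=2; for 1 2 3 4 5 every value ties, so N_MODES=5. Whether either row should have a mode at all is exactly the decision the rule has to make.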

06-06-2013 04:12 PM

If you really (I mean really) want one, here is one. It can be tweaked to work on character variables as well. If there is a tie, it picks one more or less arbitrarily (I think):

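    data have;
      input v1-v5;
      cards;
    1 1 1 2 3
    1 2 2 3 4
    1 2 3 4 5
    1 2 3 2 4
    ;

    data want (drop=rc rename=(value=mode));
      /* h: one item per distinct value in the current row, with its count */
      declare hash h();
      h.definekey('value');
      h.definedata('value','count');
      h.definedone();
      /* h1: the same items keyed by count, iterated in descending order;
         values with equal counts overwrite each other here */
      declare hash h1(ordered:'d');
      h1.definekey('count');
      h1.definedata('value','count');
      h1.definedone();
      declare hiter hi('h');
      declare hiter hi1('h1');
      set have;
      array v v:;
      /* tally each value in the row */
      do over v;
        if h.find(key:v) ne 0 then do; count=1; value=v; h.replace(); end;
        else do; value=v; count+1; h.replace(); end;
      end;
      /* copy every (value, count) pair from h into h1 */
      do rc=hi.first() by 0 while (rc=0);
        h1.replace();
        rc=hi.next();
      end;
      /* the first item of h1 has the highest count, so VALUE holds the mode */
      hi1.first();
    run;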

Haikuo
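A sketch of what the character tweak mentioned above might look like; this is an assumption, not code from the thread. The main change is giving VALUE a character length with a LENGTH statement before the hash objects are defined; the rest follows the program above, with made-up character data in HAVE_C.

```sas
/* hypothetical character data */
data have_c;
  input c1 $ c2 $ c3 $ c4 $ c5 $;
  datalines;
a a b c a
x y y z w
;
run;

data want_c (drop=rc rename=(value=mode));
  /* main change: VALUE is declared as a character host variable */
  length value $ 8;
  declare hash h();
  h.definekey('value');
  h.definedata('value','count');
  h.definedone();
  declare hash h1(ordered:'d');
  h1.definekey('count');
  h1.definedata('value','count');
  h1.definedone();
  declare hiter hi('h');
  declare hiter hi1('h1');
  set have_c;
  array c $ c1-c5;
  /* tally each character value in the row */
  do over c;
    if h.find(key:c) ne 0 then do; count=1; value=c; h.replace(); end;
    else do; value=c; count+1; h.replace(); end;
  end;
  /* copy every (value, count) pair into h1, ordered by descending count */
  do rc=hi.first() by 0 while (rc=0);
    h1.replace();
    rc=hi.next();
  end;
  /* the first item of h1 carries the highest count */
  hi1.first();
run;
```

The length of 8 matches the default length that list input gives C1-C5 here; longer strings would need a longer LENGTH.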

06-07-2013 12:36 PM

Haikuo,

Will your program have a pretty big performance issue as the number of observations grows? Would it be better to declare the hash objects only once and clear them at the end of each DATA step iteration, ready for the next observation?

06-07-2013 12:46 PM

Good point, DN. Honestly, I have no idea which is more efficient: re-declaring the hash objects or calling h.clear() for each observation. I have seen both; I chose this one merely because it makes the code shorter.

Thanks for pointing it out. OP, be aware of this if you use my code.

Haikuo
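A sketch of DN's suggestion, under assumptions: the hash object is declared once (on the first iteration) and emptied with the CLEAR method after each row. It also swaps out the second hash and both iterators for a second pass over the row that looks up each value's count, so on ties the value met first in the row wins. It has not been timed against the original, and the data and names are illustrative.

```sas
/* hypothetical data: the four sample rows from the post above */
data have;
  input v1-v5;
  datalines;
1 1 1 2 3
1 2 2 3 4
1 2 3 4 5
1 2 3 2 4
;
run;

data want (keep=v1-v5 mode maxcount);
  /* declare the hash object once, on the first iteration only */
  if _n_ = 1 then do;
    declare hash h();
    h.definekey('value');
    h.definedata('value', 'count');
    h.definedone();
  end;

  set have;
  array v {*} v1-v5;

  /* tally how often each value occurs in the current row */
  do i = 1 to dim(v);
    if h.find(key: v{i}) = 0 then count = count + 1;
    else count = 1;
    value = v{i};
    rc = h.replace();
  end;

  /* look up each value's count; strict > keeps the first value on ties */
  maxcount = 0;
  do i = 1 to dim(v);
    rc = h.find(key: v{i});
    if count > maxcount then do;
      maxcount = count;
      mode = v{i};
    end;
  end;

  /* empty the hash so the next row starts from scratch */
  rc = h.clear();
run;
```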

06-06-2013 06:53 PM

Here's the DATA step version. To simplify things, I assume you know how many numeric variables you want to process. If need be, the macro language could count them.

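    data want;
      set have;
      array nums {150} _numeric_;
      array counts {150} _temporary_;
      /* _n_ is borrowed as the outer loop index */
      do _n_=1 to 149;
        /* how often nums{_n_} occurs from position _n_ onward */
        counts{_n_}=1;
        do _i_=_n_+1 to 150;
          if nums{_n_} = nums{_i_} then counts{_n_} + 1;
        end;
        /* strict > keeps the first value on ties */
        if counts{_n_} > maxcount then do;
          mode = nums{_n_};
          maxcount = counts{_n_};
        end;
      end;
    run;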

In the case of ties, it just takes the first one.

Good luck.

06-06-2013 08:19 PM

Two second thoughts here. First, it's unfair to call this the DATA step solution, since both are DATA step solutions; how about the ARRAY solution? Second, there's no need for a second array (unless you need to locate ties for the mode). A single variable would do. So ...

    data want;
      set have;
      array nums {150} _numeric_;
      do _n_=1 to 149;
        /* a single counter replaces the COUNTS array */
        count=1;
        do _i_ = _n_+1 to 150;
          if nums{_n_} = nums{_i_} then count + 1;
        end;
        if count > maxcount then do;
          mode = nums{_n_};
          maxcount = count;
        end;
      end;
    run;
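A sketch of the same single-counter idea that avoids hard-coding 150 (and counting variables with macro code) by sizing the array with {*} and looping to DIM(nums). Assumptions: the made-up v1-v5 data below, and that every numeric variable in HAVE should take part (an ID column would be counted too). The loop indexes are renamed I and J here for clarity.

```sas
/* hypothetical data: five numeric variables */
data have;
  input v1-v5;
  datalines;
1 1 1 2 3
1 2 2 3 4
1 2 3 4 5
1 2 3 2 4
;
run;

data want (drop=i j count);
  set have;
  /* {*} sizes the array from the numeric variables defined so far, i.e. those in HAVE */
  array nums {*} _numeric_;
  maxcount = 0;
  do i = 1 to dim(nums);
    /* how often nums{i} occurs from position i onward */
    count = 1;
    do j = i + 1 to dim(nums);
      if nums{i} = nums{j} then count + 1;
    end;
    /* strict > keeps the first value on ties */
    if count > maxcount then do;
      mode = nums{i};
      maxcount = count;
    end;
  end;
run;
```

Because MAXCOUNT, MODE, COUNT, I and J first appear after the ARRAY statement, they are not swept into _NUMERIC_.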