DATA Step, Macro, Functions and more

Calculating Mode across variables..

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Calculating Mode across variables..

I have 20 different values for the same person from different sources (Stored in 20 variables Age1, Age2, ... Age20) and need to select one final value of age. We decided to go with Maximum occurred value (Mode) of age. If the person has one mode then that's the value or if more than one then randomly pick one. To do this I need to calculate Mode across variables as we calculate Mean or Sum(Age1, Age2,...Age20) using SAS Function. I couldn't find similar function for calculating MODE. I can transpose and calculate modes for each sub-group but the problem is dataset is very large (4-million records and 20 variables) and I need their mode recombined to their IDs. So better way is if I can find MODEs. across the variables Is there any way to calculate this?


Accepted Solutions
Solution
‎07-08-2014 08:51 AM
Super User
Posts: 9,687

Re: Calculating Mode across variables..

OH, Very sorry about that. I just found a fatal error which could generate your kind error. I should wirte     if _n_ eq 1    NOT   if _n_  , otherwise SAS will create a Hash Object for every Obs . Sorry about that again.

data want(drop=k count i max ran);

if _n_ eq 1 then do;

declare hash ha();

declare hiter hi('ha');

  ha.definekey('k');

  ha.definedata('k','count');

  ha.definedone();

declare hash random(ordered:'a');

declare hiter hir('random');

  random.definekey('ran');

  random.definedata('k','count');

  random.definedone();

end;

set x;

  array x{*} age: ;

do i=1 to dim(x);

if not missing(x{i}) then do;

     k=x{i};

     if ha.find()=0 then do;count+1;ha.replace();end;

       else do;count=1;ha.add();end;

end;

end;

do while(hi.next()=0);

ran=ranuni(-1);random.add();

end;

max=0;

do while(hir.next()=0);

if count gt max then do;max=count;mod=k;end;

end;

ha.clear();     random.clear();

run;

Xia Keshan

Message was edited by: xia keshan

View solution in original post


All Replies
Trusted Advisor
Posts: 1,621

Re: Calculating Mode across variables..

Google finds an example of a SAS program that computes mode

http://www.tek-tips.com/viewthread.cfm?qid=1461025

There's no need to post the same question in multiple forums

Super User
Posts: 9,687

Re: Calculating Mode across variables..

OK. Here is :

data x(drop=i j);
 array x{*} a1-a6 ;
 do i=1 to 100;
  do j=1 to dim(x);
   x{j}=ceil(ranuni(0)*20);
  end;
  output;
 end;
run;
data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();
end;
 set x;
  array x{*} a1-a6 ;
do i=1 to dim(x);
 k=x{i};
 if ha.find()=0 then do;count+1;ha.replace();end;
  else do;count=1;ha.add();end;
end;
max=0;
do while(hi.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();
run;
  

Xia Keshan

Occasional Contributor
Posts: 10

Re: Calculating Mode across variables..

Hi KSharp,

Thank you very much for the reply and codes. This code perfectly works for non-missing observations. I've lots of missing (Like some IDs have only 2-values, some has 4 and some has 10 or 20). These codes generates lots of missing values for MODE if So I need to calculate MODE for available values across columns(Variables) and also want to find out multi-modes if bimodal. I check other softwares and Excel (2010) has these functions MODE.SNGL and MODE.MULT. I am looking for similar results as Number of records are very large and in SAS.

Super User
Posts: 9,687

Re: Calculating Mode across variables..

It is for non-missing value.

data x(drop=i j);
 array x{*} a1-a6 ;
 do i=1 to 100;
  do j=1 to dim(x);
   x{j}=ifn(ranuni(0) < 0.2,., ceil(ranuni(0)*20) );
  end;
  output;
 end;
run;
data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();
end;
 set x;
  array x{*} a1-a6 ;
do i=1 to dim(x);
 if not missing(x{i}) then do;
     k=x{i};
     if ha.find()=0 then do;count+1;ha.replace();end;
       else do;count=1;ha.add();end;
 end;
end;
max=0;
do while(hi.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();
run;
  

"also want to find out multi-modes if bimodal."

About it. I also can do it, but I need more details and the output you want.




Xia Keshan

Occasional Contributor
Posts: 10

Re: Calculating Mode across variables..

Here is Sample data and answer(Output) expectation:

Sample Fake Data ValuesAnswer expected
IDAge1Age2Age3Age4Age5Age6
A0110.0...10.011.010 (Mode Value)
A03...13.112.013.5Random one from all different values
A05.12.012.012.012.513.012 (Mode Value)
A0711.811.011.0.13.010.011 (Mode Value)
A0914.014.010.0.10.015.0random one from two (Mode Values)
A11..17.0.16.217.017 (Mode Value)
A1311.0.11.311.011.311.011 (Mode Value)
A15...10.711.010.710.7 (Mode Value)
A17.14.0.12.114.012.1random one from two (Mode Values)
A1914.0.....14 (Mode Value)
A21.12.0....12 (Mode Value)
A23....12.014.0Random one from all different values
A2512.513.012.713.511.912.6Random one from all different values

Thank you very much for the answer.

Hitesh

Respected Advisor
Posts: 3,124

Re: Calculating Mode across variables..

Hitesh,

Ksharp is on Beijing time, so he may not be able to get back to you in a while. But Ksharp's new code seems to work out of box. Have you actually tried it?

Regards,

Haikuo

Super User
Posts: 9,687

Re: Calculating Mode across variables..

I have already given you solution. Didn't you try it ?

data x;
infile cards truncover;
input ID     $  Age1     Age2     Age3     Age4     Age5     Age6     ;
cards;
A01     10.0     .     .     .     10.0     11.0     10 
A03     .     .     .     13.1     12.0     13.5     
A05     .     12.0     12.0     12.0     12.5     13.0     12 
A07     11.8     11.0     11.0     .     13.0     10.0     11 
A09     14.0     14.0     10.0     .     10.0     15.0     
A11     .     .     17.0     .     16.2     17.0     17 
A13     11.0     .     11.3     11.0     11.3     11.0     11
A15     .     .     .     10.7     11.0     10.7     10.7 
A17     .     14.0     .     12.1     14.0     12.1     
A19     14.0     .     .     .     .     .     14 
A21     .     12.0     .     .     .     .     12 
A23     .     .     .     .     12.0     14.0     
A25     12.5     13.0     12.7     13.5     11.9     12.6
;
run;

data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();
end;
 set x;
  array x{*} age: ;
do i=1 to dim(x);
 if not missing(x{i}) then do;
     k=x{i};
     if ha.find()=0 then do;count+1;ha.replace();end;
       else do;count=1;ha.add();end;
 end;
end;
max=0;
do while(hi.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();
run;

Xia Keshan

Occasional Contributor
Posts: 10

Re: Calculating Mode across variables..

Hi Xia,

Thank you very much. This is very much helpful!

Hitesh

Occasional Contributor
Posts: 19

Re: Calculating Mode across variables..

Hi Xia,

Thank you very much for providing the codes.

The code works but gives same value of mode selected (results) all the runs which may causes some bias. To eliminate this bias I want random selection in two different situations, which are:

1. If there are multiple modes (two or three), pick one mode value randomly

2. If there is no mode, pick one value randomly from available values in the columns

Your help is appreciated.

Hitesh

Super User
Posts: 9,687

Re: Calculating Mode across variables..

OK. No problem. Let me know if it worked.

 
data x;
infile cards truncover;
input ID     $  Age1     Age2     Age3     Age4     Age5     Age6     ;
cards;
A01     10.0     .     .     .     10.0     11.0     10 
A03     .     .     .     13.1     12.0     13.5     
A05     .     12.0     12.0     12.0     12.5     13.0     12 
A07     11.8     11.0     11.0     .     13.0     10.0     11 
A09     14.0     14.0     10.0     .     10.0     15.0     
A11     .     .     17.0     .     16.2     17.0     17 
A13     11.0     .     11.3     11.0     11.3     11.0     11
A15     .     .     .     10.7     11.0     10.7     10.7 
A17     .     14.0     .     12.1     14.0     12.1     
A19     14.0     .     .     .     .     .     14 
A21     .     12.0     .     .     .     .     12 
A23     .     .     .     .     12.0     14.0     
A25     12.5     13.0     12.7     13.5     11.9     12.6
;
run;

data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();

 declare hash random(ordered:'a');
 declare hiter hir('random');
  random.definekey('ran');
  random.definedata('k','count');
  random.definedone();

end;
 set x;
  array x{*} age: ;
do i=1 to dim(x);
 if not missing(x{i}) then do;
     k=x{i};
     if ha.find()=0 then do;count+1;ha.replace();end;
       else do;count=1;ha.add();end;
 end;
end;
do while(hi.next()=0);
 ran=ceil(ranuni(-1)*1000000);random.add();
end;
max=0;
do while(hir.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();     random.clear();
run;

Xia Keshan

Occasional Contributor
Posts: 10

Re: Calculating Mode across variables..

Hi KSharp,

Sorry for bothering you again..

The program creating Hash/Hiter works perfectly and serves all my purposes for small data..up to 10,000 records but,

for files larger than that gives ERROR Message and stops processing. The error message is:

ERROR: Hash object added 0 items when memory failure occurred.

FATAL: Insufficient memory to execute DATA Step program. Aborted during Execution phase.

ERROR: SAS system stopped processing this step because of insufficient memory.

I've tried two different recommendations:

1. MEMSIZE in config file changed to 4G, didn't help.

2. used HASHEXP:20 in Declare statement "declare hash ha(hashexp:20)" in your syntax.

Please, let me know if you have solution for this problem. I've 3.3Million records and 16 Age variables for each record.

Once again, thank you very much for all your help.

Hitesh

Solution
‎07-08-2014 08:51 AM
Super User
Posts: 9,687

Re: Calculating Mode across variables..

OH, Very sorry about that. I just found a fatal error which could generate your kind error. I should wirte     if _n_ eq 1    NOT   if _n_  , otherwise SAS will create a Hash Object for every Obs . Sorry about that again.

data want(drop=k count i max ran);

if _n_ eq 1 then do;

declare hash ha();

declare hiter hi('ha');

  ha.definekey('k');

  ha.definedata('k','count');

  ha.definedone();

declare hash random(ordered:'a');

declare hiter hir('random');

  random.definekey('ran');

  random.definedata('k','count');

  random.definedone();

end;

set x;

  array x{*} age: ;

do i=1 to dim(x);

if not missing(x{i}) then do;

     k=x{i};

     if ha.find()=0 then do;count+1;ha.replace();end;

       else do;count=1;ha.add();end;

end;

end;

do while(hi.next()=0);

ran=ranuni(-1);random.add();

end;

max=0;

do while(hir.next()=0);

if count gt max then do;max=count;mod=k;end;

end;

ha.clear();     random.clear();

run;

Xia Keshan

Message was edited by: xia keshan

Occasional Contributor
Posts: 10

Re: Calculating Mode across variables..

Thank you very much, Xia !

Super User
Posts: 9,687

Re: Calculating Mode across variables..

You are welcome. and Don't forget to let me know if it worked . Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 1925 views
  • 9 likes
  • 6 in conversation