BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Hitesh
Fluorite | Level 6

I have 20 different values for the same person from different sources (Stored in 20 variables Age1, Age2, ... Age20) and need to select one final value of age. We decided to go with Maximum occurred value (Mode) of age. If the person has one mode then that's the value or if more than one then randomly pick one. To do this I need to calculate Mode across variables as we calculate Mean or Sum(Age1, Age2,...Age20) using SAS Function. I couldn't find similar function for calculating MODE. I can transpose and calculate modes for each sub-group but the problem is dataset is very large (4-million records and 20 variables) and I need their mode recombined to their IDs. So better way is if I can find MODEs. across the variables Is there any way to calculate this?

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OH, Very sorry about that. I just found a fatal error which could generate your kind error. I should wirte     if _n_ eq 1    NOT   if _n_  , otherwise SAS will create a Hash Object for every Obs . Sorry about that again.

data want(drop=k count i max ran);

if _n_ eq 1 then do;

declare hash ha();

declare hiter hi('ha');

  ha.definekey('k');

  ha.definedata('k','count');

  ha.definedone();

declare hash random(ordered:'a');

declare hiter hir('random');

  random.definekey('ran');

  random.definedata('k','count');

  random.definedone();

end;

set x;

  array x{*} age: ;

do i=1 to dim(x);

if not missing(x{i}) then do;

     k=x{i};

     if ha.find()=0 then do;count+1;ha.replace();end;

       else do;count=1;ha.add();end;

end;

end;

do while(hi.next()=0);

ran=ranuni(-1);random.add();

end;

max=0;

do while(hir.next()=0);

if count gt max then do;max=count;mod=k;end;

end;

ha.clear();     random.clear();

run;

Xia Keshan

Message was edited by: xia keshan

View solution in original post

18 REPLIES 18
PaigeMiller
Diamond | Level 26

Google finds an example of a SAS program that computes mode

http://www.tek-tips.com/viewthread.cfm?qid=1461025

There's no need to post the same question in multiple forums

--
Paige Miller
Ksharp
Super User

OK. Here is :

data x(drop=i j);
 array x{*} a1-a6 ;
 do i=1 to 100;
  do j=1 to dim(x);
   x{j}=ceil(ranuni(0)*20);
  end;
  output;
 end;
run;
data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();
end;
 set x;
  array x{*} a1-a6 ;
do i=1 to dim(x);
 k=x{i};
 if ha.find()=0 then do;count+1;ha.replace();end;
  else do;count=1;ha.add();end;
end;
max=0;
do while(hi.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();
run;
  

Xia Keshan

Hitesh
Fluorite | Level 6

Hi KSharp,

Thank you very much for the reply and codes. This code perfectly works for non-missing observations. I've lots of missing (Like some IDs have only 2-values, some has 4 and some has 10 or 20). These codes generates lots of missing values for MODE if So I need to calculate MODE for available values across columns(Variables) and also want to find out multi-modes if bimodal. I check other softwares and Excel (2010) has these functions MODE.SNGL and MODE.MULT. I am looking for similar results as Number of records are very large and in SAS.

Ksharp
Super User

It is for non-missing value.

data x(drop=i j);
 array x{*} a1-a6 ;
 do i=1 to 100;
  do j=1 to dim(x);
   x{j}=ifn(ranuni(0) < 0.2,., ceil(ranuni(0)*20) );
  end;
  output;
 end;
run;
data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();
end;
 set x;
  array x{*} a1-a6 ;
do i=1 to dim(x);
 if not missing(x{i}) then do;
     k=x{i};
     if ha.find()=0 then do;count+1;ha.replace();end;
       else do;count=1;ha.add();end;
 end;
end;
max=0;
do while(hi.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();
run;
  

"also want to find out multi-modes if bimodal."

About it. I also can do it, but I need more details and the output you want.




Xia Keshan

Hitesh
Fluorite | Level 6

Here is Sample data and answer(Output) expectation:

Sample Fake Data ValuesAnswer expected
IDAge1Age2Age3Age4Age5Age6
A0110.0...10.011.010 (Mode Value)
A03...13.112.013.5Random one from all different values
A05.12.012.012.012.513.012 (Mode Value)
A0711.811.011.0.13.010.011 (Mode Value)
A0914.014.010.0.10.015.0random one from two (Mode Values)
A11..17.0.16.217.017 (Mode Value)
A1311.0.11.311.011.311.011 (Mode Value)
A15...10.711.010.710.7 (Mode Value)
A17.14.0.12.114.012.1random one from two (Mode Values)
A1914.0.....14 (Mode Value)
A21.12.0....12 (Mode Value)
A23....12.014.0Random one from all different values
A2512.513.012.713.511.912.6Random one from all different values

Thank you very much for the answer.

Hitesh

Haikuo
Onyx | Level 15

Hitesh,

Ksharp is on Beijing time, so he may not be able to get back to you in a while. But Ksharp's new code seems to work out of box. Have you actually tried it?

Regards,

Haikuo

Ksharp
Super User

I have already given you solution. Didn't you try it ?

data x;
infile cards truncover;
input ID     $  Age1     Age2     Age3     Age4     Age5     Age6     ;
cards;
A01     10.0     .     .     .     10.0     11.0     10 
A03     .     .     .     13.1     12.0     13.5     
A05     .     12.0     12.0     12.0     12.5     13.0     12 
A07     11.8     11.0     11.0     .     13.0     10.0     11 
A09     14.0     14.0     10.0     .     10.0     15.0     
A11     .     .     17.0     .     16.2     17.0     17 
A13     11.0     .     11.3     11.0     11.3     11.0     11
A15     .     .     .     10.7     11.0     10.7     10.7 
A17     .     14.0     .     12.1     14.0     12.1     
A19     14.0     .     .     .     .     .     14 
A21     .     12.0     .     .     .     .     12 
A23     .     .     .     .     12.0     14.0     
A25     12.5     13.0     12.7     13.5     11.9     12.6
;
run;

data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();
end;
 set x;
  array x{*} age: ;
do i=1 to dim(x);
 if not missing(x{i}) then do;
     k=x{i};
     if ha.find()=0 then do;count+1;ha.replace();end;
       else do;count=1;ha.add();end;
 end;
end;
max=0;
do while(hi.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();
run;

Xia Keshan

Hitesh
Fluorite | Level 6

Hi Xia,

Thank you very much. This is very much helpful!

Hitesh

devlekhu_98
Fluorite | Level 6

Hi Xia,

Thank you very much for providing the codes.

The code works but gives same value of mode selected (results) all the runs which may causes some bias. To eliminate this bias I want random selection in two different situations, which are:

1. If there are multiple modes (two or three), pick one mode value randomly

2. If there is no mode, pick one value randomly from available values in the columns

Your help is appreciated.

Hitesh

Ksharp
Super User

OK. No problem. Let me know if it worked.

 
data x;
infile cards truncover;
input ID     $  Age1     Age2     Age3     Age4     Age5     Age6     ;
cards;
A01     10.0     .     .     .     10.0     11.0     10 
A03     .     .     .     13.1     12.0     13.5     
A05     .     12.0     12.0     12.0     12.5     13.0     12 
A07     11.8     11.0     11.0     .     13.0     10.0     11 
A09     14.0     14.0     10.0     .     10.0     15.0     
A11     .     .     17.0     .     16.2     17.0     17 
A13     11.0     .     11.3     11.0     11.3     11.0     11
A15     .     .     .     10.7     11.0     10.7     10.7 
A17     .     14.0     .     12.1     14.0     12.1     
A19     14.0     .     .     .     .     .     14 
A21     .     12.0     .     .     .     .     12 
A23     .     .     .     .     12.0     14.0     
A25     12.5     13.0     12.7     13.5     11.9     12.6
;
run;

data want(drop=k count i max);
if _n_ then do;
 declare hash ha();
 declare hiter hi('ha');
  ha.definekey('k');
  ha.definedata('k','count');
  ha.definedone();

 declare hash random(ordered:'a');
 declare hiter hir('random');
  random.definekey('ran');
  random.definedata('k','count');
  random.definedone();

end;
 set x;
  array x{*} age: ;
do i=1 to dim(x);
 if not missing(x{i}) then do;
     k=x{i};
     if ha.find()=0 then do;count+1;ha.replace();end;
       else do;count=1;ha.add();end;
 end;
end;
do while(hi.next()=0);
 ran=ceil(ranuni(-1)*1000000);random.add();
end;
max=0;
do while(hir.next()=0);
 if count gt max then do;max=count;mod=k;end;
end;
ha.clear();     random.clear();
run;

Xia Keshan

Hitesh
Fluorite | Level 6

Hi KSharp,

Sorry for bothering you again..

The program creating Hash/Hiter works perfectly and serves all my purposes for small data..up to 10,000 records but,

for files larger than that gives ERROR Message and stops processing. The error message is:

ERROR: Hash object added 0 items when memory failure occurred.

FATAL: Insufficient memory to execute DATA Step program. Aborted during Execution phase.

ERROR: SAS system stopped processing this step because of insufficient memory.

I've tried two different recommendations:

1. MEMSIZE in config file changed to 4G, didn't help.

2. used HASHEXP:20 in Declare statement "declare hash ha(hashexp:20)" in your syntax.

Please, let me know if you have solution for this problem. I've 3.3Million records and 16 Age variables for each record.

Once again, thank you very much for all your help.

Hitesh

Ksharp
Super User

OH, Very sorry about that. I just found a fatal error which could generate your kind error. I should wirte     if _n_ eq 1    NOT   if _n_  , otherwise SAS will create a Hash Object for every Obs . Sorry about that again.

data want(drop=k count i max ran);

if _n_ eq 1 then do;

declare hash ha();

declare hiter hi('ha');

  ha.definekey('k');

  ha.definedata('k','count');

  ha.definedone();

declare hash random(ordered:'a');

declare hiter hir('random');

  random.definekey('ran');

  random.definedata('k','count');

  random.definedone();

end;

set x;

  array x{*} age: ;

do i=1 to dim(x);

if not missing(x{i}) then do;

     k=x{i};

     if ha.find()=0 then do;count+1;ha.replace();end;

       else do;count=1;ha.add();end;

end;

end;

do while(hi.next()=0);

ran=ranuni(-1);random.add();

end;

max=0;

do while(hir.next()=0);

if count gt max then do;max=count;mod=k;end;

end;

ha.clear();     random.clear();

run;

Xia Keshan

Message was edited by: xia keshan

Hitesh
Fluorite | Level 6

Thank you very much, Xia !

Ksharp
Super User

You are welcome. and Don't forget to let me know if it worked . Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 7333 views
  • 10 likes
  • 6 in conversation