BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jorheej
Obsidian | Level 7

I'm working with over 40 billion rows and 31 columns

and I want to check modes of each rows but I have no idea how to do it..

 

For example,

 A   B C  D  E  mode

ㅇ ㅇ ㅇ ㅁ ㅁ   ㅇ

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input a b c d e f;
cards;
1 2 3 4 5 6
2 2 1 1 1 1
4 5 6 7 8 8
1 1 2 2 2 2
;

data want;
if _n_=1 then do;
length key count 8;
  declare hash h();
  declare hiter hi('h');
  h.definekey('key');
  h.definedata('key','count');
  h.definedone();
end;
set have;
array x{*}  a b c d e f;
h.clear();max=.;
do i=1 to dim(x);
  if not missing(x{i}) then do;
    key=x{i};
 if h.find()=0 then count=count+1;
  else count=1;
 h.replace();
  end;
end;
do while(hi.next()=0);
 if count>max then do;max=count;mode=key;end;
end;
drop i max key count;
run;

BTW, If you have SAS/IML ,that could be more succinct . @Rick_SAS would love to IML code.

View solution in original post

14 REPLIES 14
Quentin
Super User

Perhaps a hash-table approach like this answer:

https://communities.sas.com/t5/SAS-Programming/Calculate-mode-for-each-row/td-p/468952

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
sbxkoenk
SAS Super FREQ

Hello,

 

You can transpose, such that you need to find the mode (modus) for each column.
You can then do that with PROC FREQTAB, which is the CAS-enabled version of PROC FREQ.

But with 40 billion columns you might run into problems (??).
With respect to "SAS: what is maximum number of columns in dataset" --> For all practical purposes, the answer to this question is "unlimited". But that's theory. You will probably need to split into several datasets.

 

Cheers,

Koen

Quentin
Super User

Yeah, 40B transposing 30 columns is .. a lot.  : )

 

@jorheej , what is the range of each column, and are they integers?  Depending on the range, you might be able to calculate the mode with an array, rather than a hash table.  

 

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
jorheej
Obsidian | Level 7
they are integers and the range is from 0 to 5,000,000(approximately).
I guess the links above uses both hash table and array and it works!
Thank you for your reply
jorheej
Obsidian | Level 7
Unfortunately, this dataset is not appropriate to use transpose but I'll remember the process of your comment 😄
Thank you for your reply.
jorheej
Obsidian | Level 7
It works for me. Thank you!
Ksharp
Super User
data have;
input a b c d e f;
cards;
1 2 3 4 5 6
2 2 1 1 1 1
4 5 6 7 8 8
1 1 2 2 2 2
;

data want;
if _n_=1 then do;
length key count 8;
  declare hash h();
  declare hiter hi('h');
  h.definekey('key');
  h.definedata('key','count');
  h.definedone();
end;
set have;
array x{*}  a b c d e f;
h.clear();max=.;
do i=1 to dim(x);
  if not missing(x{i}) then do;
    key=x{i};
 if h.find()=0 then count=count+1;
  else count=1;
 h.replace();
  end;
end;
do while(hi.next()=0);
 if count>max then do;max=count;mode=key;end;
end;
drop i max key count;
run;

BTW, If you have SAS/IML ,that could be more succinct . @Rick_SAS would love to IML code.

jorheej
Obsidian | Level 7
Hi, thank you so much and I guess the one I tried according to @Quentin is also your solution.
that one works perfectly but it doesn't.
Error was occurred and it says It has to be data or something in statement max=.;

set have;
array x{*} a b c d e f;
h.clear();max=.; <- here

What do you think that I've done wrong?
Quentin
Super User

I don't see how this bit:

h.clear();
max=.;

Cojuld cause an error, unless you have a character variable named MAX in your input data, and it is character, and you have the undocumented option dsoptions=Note2Err turned on.  Which seems unlikely.

Can you post the log from running this step, showing all the code in the step and the error message?

And when you post the log, please click the </> button above the message box. It says "insert code" when you hover, but it's good for inserting any fixed fpnt.  It makes it easier to read.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
jorheej
Obsidian | Level 7
oh i didn't know this function :D

Thank you so much. It turns out I made mistype at the line above.

N0ow those codes work. 

Have a nice day !

Ksharp
Super User
Can you post your real dataset ? so I can test it .
And also can you post FULL log .
jorheej
Obsidian | Level 7
Sorry for that, the codes were right and I mistyped that ^^;;
Thank you so much for your solutions, I hope you have a great day
ballardw
Super User

Are the variables numeric or character or a mix?

jorheej
Obsidian | Level 7
Some of variables are character but I wanted to use numeric ones only for calculate mode!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1652 views
  • 7 likes
  • 5 in conversation