BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jorheej
Obsidian | Level 7

I'm working with over 40 billion rows and 31 columns

and I want to check modes of each rows but I have no idea how to do it..

 

For example,

 A   B C  D  E  mode

ㅇ ㅇ ㅇ ㅁ ㅁ   ㅇ

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input a b c d e f;
cards;
1 2 3 4 5 6
2 2 1 1 1 1
4 5 6 7 8 8
1 1 2 2 2 2
;

data want;
if _n_=1 then do;
length key count 8;
  declare hash h();
  declare hiter hi('h');
  h.definekey('key');
  h.definedata('key','count');
  h.definedone();
end;
set have;
array x{*}  a b c d e f;
h.clear();max=.;
do i=1 to dim(x);
  if not missing(x{i}) then do;
    key=x{i};
 if h.find()=0 then count=count+1;
  else count=1;
 h.replace();
  end;
end;
do while(hi.next()=0);
 if count>max then do;max=count;mode=key;end;
end;
drop i max key count;
run;

BTW, If you have SAS/IML ,that could be more succinct . @Rick_SAS would love to IML code.

View solution in original post

14 REPLIES 14
Quentin
Super User

Perhaps a hash-table approach like this answer:

https://communities.sas.com/t5/SAS-Programming/Calculate-mode-for-each-row/td-p/468952

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
sbxkoenk
SAS Super FREQ

Hello,

 

You can transpose, such that you need to find the mode (modus) for each column.
You can then do that with PROC FREQTAB, which is the CAS-enabled version of PROC FREQ.

But with 40 billion columns you might run into problems (??).
With respect to "SAS: what is maximum number of columns in dataset" --> For all practical purposes, the answer to this question is "unlimited". But that's theory. You will probably need to split into several datasets.

 

Cheers,

Koen

Quentin
Super User

Yeah, 40B transposing 30 columns is .. a lot.  : )

 

@jorheej , what is the range of each column, and are they integers?  Depending on the range, you might be able to calculate the mode with an array, rather than a hash table.  

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
jorheej
Obsidian | Level 7
they are integers and the range is from 0 to 5,000,000(approximately).
I guess the links above uses both hash table and array and it works!
Thank you for your reply
jorheej
Obsidian | Level 7
Unfortunately, this dataset is not appropriate to use transpose but I'll remember the process of your comment 😄
Thank you for your reply.
jorheej
Obsidian | Level 7
It works for me. Thank you!
Ksharp
Super User
data have;
input a b c d e f;
cards;
1 2 3 4 5 6
2 2 1 1 1 1
4 5 6 7 8 8
1 1 2 2 2 2
;

data want;
if _n_=1 then do;
length key count 8;
  declare hash h();
  declare hiter hi('h');
  h.definekey('key');
  h.definedata('key','count');
  h.definedone();
end;
set have;
array x{*}  a b c d e f;
h.clear();max=.;
do i=1 to dim(x);
  if not missing(x{i}) then do;
    key=x{i};
 if h.find()=0 then count=count+1;
  else count=1;
 h.replace();
  end;
end;
do while(hi.next()=0);
 if count>max then do;max=count;mode=key;end;
end;
drop i max key count;
run;

BTW, If you have SAS/IML ,that could be more succinct . @Rick_SAS would love to IML code.

jorheej
Obsidian | Level 7
Hi, thank you so much and I guess the one I tried according to @Quentin is also your solution.
that one works perfectly but it doesn't.
Error was occurred and it says It has to be data or something in statement max=.;

set have;
array x{*} a b c d e f;
h.clear();max=.; <- here

What do you think that I've done wrong?
Quentin
Super User

I don't see how this bit:

h.clear();
max=.;

Cojuld cause an error, unless you have a character variable named MAX in your input data, and it is character, and you have the undocumented option dsoptions=Note2Err turned on.  Which seems unlikely.

Can you post the log from running this step, showing all the code in the step and the error message?

And when you post the log, please click the </> button above the message box. It says "insert code" when you hover, but it's good for inserting any fixed fpnt.  It makes it easier to read.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
jorheej
Obsidian | Level 7
oh i didn't know this function :D

Thank you so much. It turns out I made mistype at the line above.

N0ow those codes work. 

Have a nice day !

Ksharp
Super User
Can you post your real dataset ? so I can test it .
And also can you post FULL log .
jorheej
Obsidian | Level 7
Sorry for that, the codes were right and I mistyped that ^^;;
Thank you so much for your solutions, I hope you have a great day
ballardw
Super User

Are the variables numeric or character or a mix?

jorheej
Obsidian | Level 7
Some of variables are character but I wanted to use numeric ones only for calculate mode!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 1137 views
  • 7 likes
  • 5 in conversation