I'm working with over 40 billion rows and 31 columns
and I want to check modes of each rows but I have no idea how to do it..
For example,
A B C D E mode
ㅇ ㅇ ㅇ ㅁ ㅁ ㅇ
data have;
input a b c d e f;
cards;
1 2 3 4 5 6
2 2 1 1 1 1
4 5 6 7 8 8
1 1 2 2 2 2
;
data want;
if _n_=1 then do;
length key count 8;
declare hash h();
declare hiter hi('h');
h.definekey('key');
h.definedata('key','count');
h.definedone();
end;
set have;
array x{*} a b c d e f;
h.clear();max=.;
do i=1 to dim(x);
if not missing(x{i}) then do;
key=x{i};
if h.find()=0 then count=count+1;
else count=1;
h.replace();
end;
end;
do while(hi.next()=0);
if count>max then do;max=count;mode=key;end;
end;
drop i max key count;
run;
BTW, If you have SAS/IML ,that could be more succinct . @Rick_SAS would love to IML code.
Perhaps a hash-table approach like this answer:
https://communities.sas.com/t5/SAS-Programming/Calculate-mode-for-each-row/td-p/468952
Hello,
You can transpose, such that you need to find the mode (modus) for each column.
You can then do that with PROC FREQTAB, which is the CAS-enabled version of PROC FREQ.
But with 40 billion columns you might run into problems (??).
With respect to "SAS: what is maximum number of columns in dataset" --> For all practical purposes, the answer to this question is "unlimited". But that's theory. You will probably need to split into several datasets.
Cheers,
Koen
Yeah, 40B transposing 30 columns is .. a lot. : )
@jorheej , what is the range of each column, and are they integers? Depending on the range, you might be able to calculate the mode with an array, rather than a hash table.
data have;
input a b c d e f;
cards;
1 2 3 4 5 6
2 2 1 1 1 1
4 5 6 7 8 8
1 1 2 2 2 2
;
data want;
if _n_=1 then do;
length key count 8;
declare hash h();
declare hiter hi('h');
h.definekey('key');
h.definedata('key','count');
h.definedone();
end;
set have;
array x{*} a b c d e f;
h.clear();max=.;
do i=1 to dim(x);
if not missing(x{i}) then do;
key=x{i};
if h.find()=0 then count=count+1;
else count=1;
h.replace();
end;
end;
do while(hi.next()=0);
if count>max then do;max=count;mode=key;end;
end;
drop i max key count;
run;
BTW, If you have SAS/IML ,that could be more succinct . @Rick_SAS would love to IML code.
I don't see how this bit:
h.clear(); max=.;
Cojuld cause an error, unless you have a character variable named MAX in your input data, and it is character, and you have the undocumented option dsoptions=Note2Err turned on. Which seems unlikely.
Can you post the log from running this step, showing all the code in the step and the error message?
And when you post the log, please click the </> button above the message box. It says "insert code" when you hover, but it's good for inserting any fixed fpnt. It makes it easier to read.
oh i didn't know this function :D
Thank you so much. It turns out I made mistype at the line above.
N0ow those codes work.
Have a nice day !
Are the variables numeric or character or a mix?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.