DATA Step, Macro, Functions and more

Calculate mode for each row

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Calculate mode for each row

Hey guys,

I have a dataset like as follows:

 

data work.class1;

input date $ cusip_hld $ value1 $ value2 $ value3$ value4 $;

datalines;

01DEC05 A 3 5 3 2 3

01DEC05 B 2 4 2 1 2 

01JAN06 A 1 2 3 4 1

01JAN06 C 4 4 4 0 1 

;

run;

 

Is there an opportunity to add a column "classValue" which is calculated as the mode  for each row, i.e. the mode for each 

date-cusip_hld-combination? (With mode I mean the value which appears most often, for example, the mode of 01JAN06 and Cusip_hld = C would be 4 in my table) 

 

I found some general solutions here, but I have 2 specific issues which makes it hard for me to reproduce those.

First, it is possible (and often), that there are missing values in the value_i Columns. I only want to calculate the mode of the non-missing values in the row.

Second, the code is in a macro. In each Iteration, one column value_i is added. The number of value_i columns is saved in the macro variable &fundNumber (if that helps). 

 

I´m really thankful of any suggestions or solutions.

 

Kind regards,

Zlatan

 

 


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 10,695

Re: Calculate mode for each row

Posted in reply to mrzlatan91
data work.class1;
input date $ cusip_hld $ value1 $ value2 $ value3$ value4 $;
datalines;
01DEC05 A 3 5 3 2 3
01DEC05 B 2 4 2 1 2 
01JAN06 A 1 2 3 4 1
01JAN06 C 4 4 4 0 1 
;

run;

data fin_mode (drop=k count i max ran);
if _n_ eq 1 then do;
declare hash ha();
declare hiter hi('ha');
ha.definekey('k');
ha.definedata('k','count');
ha.definedone();
declare hash random(ordered:'a');
declare hiter hir('random');
random.definekey('ran');
random.definedata('k','count');
random.definedone();
end;
set class1;
array x{*} value: ;
do i=1 to dim(x);
if not missing(x{i}) then do;
k=x{i};
if ha.find()=0 then do;count+1;ha.replace();end;
else do;count=1;ha.add();end;
end;
end;

do while(hi.next()=0);
ran=ranuni(-1);random.add();
end;
max=0;
do while(hir.next()=0);
if count gt max then do;max=count;mod=k;end;
end;
ha.clear(); random.clear();
run;

View solution in original post


All Replies
PROC Star
Posts: 1,593

Re: Calculate mode for each row

Posted in reply to mrzlatan91

@mrzlatan91  hello mate!, long time.  Please also post a sample of your wanted output for responders to know the exact requirement.

 

Anyways, I am going home now. Have a great weekend

PROC Star
Posts: 1,593

Re: Calculate mode for each row

Posted in reply to mrzlatan91
data work.class1;

input date $ cusip_hld $ value1 $ value2 $ value3$ value4 $ value5 $;

datalines;
01DEC05 A 3 5 3 2 3
01DEC05 B 2 4 2 1 2 
01JAN06 A 1 2 3 4 1
01JAN06 C 4 4 4 0 1 
;

run;

proc transpose data= class1 out=t;
by date cusip_hld notsorted;
var value:;
run;

proc freq data=t order=freq noprint;
by 	date cusip_hld notsorted;
tables col1/out=t1(drop=percent) ;
run;

data want;
set t1;
by date cusip_hld notsorted;
if first.cusip_hld;
mode=col1;
run;

Occasional Contributor
Posts: 19

Re: Calculate mode for each row

Posted in reply to novinosrin

Hey mate, 

first, I want to thank you so much for your solution.

 

Sorry that I forgot to post my wanted dataset, but I guess this is exactly what i wanted (if column mode contains the mode of the values)

Unfortunately, as I said, I have a lot of missing values, s.t. in column mode (and col1 as well) are only missing values, s.t. I can´t 

proof if the result is correct.

Could you kindly tell me how to fix the problem with the missing values? Unfortunately, I´m an absolute SAS Noob.

 

To understand it: table T represents the transposed table, _name_ is like a counting variable and col1 contains the values.

Table T1 counts how often a specific value for a cusip-date combination exists.

And Table Want contains all date-cusip combinations, count represents how often the mode value exists for that 

combination and mode represents the mode value, but what does column col1 then stand for?

 

Have a nice day mate.

 

PROC Star
Posts: 1,593

Re: Calculate mode for each row

Posted in reply to mrzlatan91

@mrzlatan91  I'm glad you got a great solution by the champion @Ksharp whose code I tend to copy all the time for my needs. Going forward, please do post a wanted sample like your input sample to make it easy to understand.

Solution
2 weeks ago
Super User
Posts: 10,695

Re: Calculate mode for each row

Posted in reply to mrzlatan91
data work.class1;
input date $ cusip_hld $ value1 $ value2 $ value3$ value4 $;
datalines;
01DEC05 A 3 5 3 2 3
01DEC05 B 2 4 2 1 2 
01JAN06 A 1 2 3 4 1
01JAN06 C 4 4 4 0 1 
;

run;

data fin_mode (drop=k count i max ran);
if _n_ eq 1 then do;
declare hash ha();
declare hiter hi('ha');
ha.definekey('k');
ha.definedata('k','count');
ha.definedone();
declare hash random(ordered:'a');
declare hiter hir('random');
random.definekey('ran');
random.definedata('k','count');
random.definedone();
end;
set class1;
array x{*} value: ;
do i=1 to dim(x);
if not missing(x{i}) then do;
k=x{i};
if ha.find()=0 then do;count+1;ha.replace();end;
else do;count=1;ha.add();end;
end;
end;

do while(hi.next()=0);
ran=ranuni(-1);random.add();
end;
max=0;
do while(hir.next()=0);
if count gt max then do;max=count;mod=k;end;
end;
ha.clear(); random.clear();
run;
Occasional Contributor
Posts: 19

Re: Calculate mode for each row

Thank you very much. It works

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 185 views
  • 2 likes
  • 3 in conversation