Hi All,
I would like to count the maximum number of value columns with the same value across any number of columns in the same row. For a small number of variables, this can be done with a bunch of conditional statements. However, this approach quickly falls apart as the number of variables increases. Is there a more elegant, dynamic approach? To illustrate:
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want;
set have;
if col1 = col2 = col3 = col4 then count = 4;
else if col1 = col2 = col3 then count = 3;
else if col1 = col2 = col4 then count = 3;
else if col1 = col3 = col4 then count = 3;
else if col2 = col3 = col4 then count = 3;
else if col2 = col3 = col4 then count = 3;
else if col1 = col2 then count = 2;
else if col1 = col3 then count = 2;
else if col1 = col4 then count = 2;
else if col2 = col3 then count = 2;
else if col2 = col4 then count = 2;
else if col3 = col4 then count = 2;
else count = 1;
run;
I found this code from Messr. Tabachneck to be close but it counts the number of matching pairs rather than the number of columns with the same value.
data want (drop=i j);
set have;
array vars{*} col1-col4;
count=0;
do i=1 to dim(vars)-1;
do j=i+1 to dim(vars);
count=sum(count, vars(i) eq vars(j));
end;
end;
run;
Best,
kbk
Another brute force approach using Array():
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want;
set have;
array col col1-col4;
length _cat $100.;
_cat=catx(',',of col(*));
do over col;
do _i=1 to dim(col);
_ct+(col=scan(_cat,_i));
end;
count=max(count,_ct);
_ct=0;
end;
drop _:;
run;
Haikuo
What happens when the following is true:
else if col1 = col2 then count = 2;
else if col3 = col4 then count = 2;
Except if you have more than 4 columns?
Reeza, I am assuming you are asking if the column values consist of [a a b b]. In this case the number needed would be 2. Column values of [a a a b b b b] would yield a count of 4, due to the fact of having 4 b's.
Why not transpose the data, run a proc freq, and take the top one?
Reeza, thanks. I have transposed the dataset but am not sure the best way to "run a proc freq and take the top one."
PROC TRANSPOSE DATA=WORK.have OUT=WORK.have_tran PREFIX=Column;
VAR col1-col4;
run;
PROC FREQ DATA=WORK.have_tran;
TABLES Column1 / OUT=WORK.Col1WANT;
TABLES Column2 / OUT=WORK.Col2WANT;
TABLES Column3 / OUT=WORK.Col3WANT;
TABLES Column4 / OUT=WORK.Col4WANT;
TABLES Column5 / OUT=WORK.Col5WANT;
RUN;
You have some more efficient suggestions above, but the following is what I had in mind. More of a beginner method I suppose
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
*add in a row count;
data have_transposed;
set have;
row=_n_;
array col(4) $ col1-col4;
do i=1 to 4;
column=i;
value=col(i);
output;
end;
keep row column value;
run;
*count the results and order them descending;
proc freq data=have_transposed order=freq noprint;
by row;
table value/out=counts;
run;
*take the top one;
data want;
set counts;
by row;
if first.row;
run;
Use call routine sortc. Like this :
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want(keep=best bestCount);
set have;
array c{*} col:;
call sortc(of c{*});
last = c{1};
best = last; /* So that best as the same length as last and col1 */
do i = 1 to dim(c);
same = c{i} = last;
count = sum(count, same);
if count > bestCount then do;
best = coalescec(last, c{i});
bestCount = count;
end;
if not same then count = 0;
last = c{i};
end;
output;
run;
PG
I had a similar idea but I don't know how it works. :smileyshocked:
I used peek and poke to save the array and put it back.
Thanks for the help everyone!
Another brute force approach using Array():
data have;
input col1 $ col2 $ col3 $ col4 $;
cards;
a b b b
a b b d
c b c b
d d d d
a d a a
;
data want;
set have;
array col col1-col4;
length _cat $100.;
_cat=catx(',',of col(*));
do over col;
do _i=1 to dim(col);
_ct+(col=scan(_cat,_i));
end;
count=max(count,_ct);
_ct=0;
end;
drop _:;
run;
Haikuo
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.